Proceedings of the International Conference on Computer, Information Technology and Intelligent Computing (CITIC 2022)

Objectivity and Subjectivity Classification with BERT for Bahasa Melayu

Authors
Wing Kin Chong1, Hu Ng1, *, Timothy Tzen Vun Yap1, Wooi King Soo1, Vik Tor Goh2, Dong Theng Cher3
1Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Malaysia
2Faculty of Engineering, Multimedia University, Cyberjaya, Malaysia
3SIRIM Berhad, Shah Alam, Malaysia
*Corresponding author. Email: nghu@mmu.edu.my
Corresponding Author
Hu Ng
Available Online 27 December 2022.
DOI
10.2991/978-94-6463-094-7_20
Keywords
Objectivity; Word2Vec; Subjectivity classification; BERT; Sentiment classification
Abstract

This research presents the notion of subjectivity and objectivity in the Bahasa Melayu language. Word2Vec and BERT word embedding models are created for subjectivity classification and sentiment classification. Each type of embedding (Word2Vec and BERT) is developed on three corpora: Wikipedia data as the objectivity dataset, Twitter data as the subjectivity dataset, and a combination of both. A pre-trained BERT embedding model, Bert-Base-Bahasa-Cased, is used as a reference. First, the datasets are fed into each embedding model to be embedded as vectors. Subjectivity classification and sentiment classification are then carried out with a 70:30 train-test split, using Logistic Regression, Random Forest, and Double Layer Neural Network classifiers. Logistic Regression on the Bert-Base-Bahasa-Cased model achieved the highest results: 99.95% in subjectivity classification and 74.30% in sentiment classification.
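
The evaluation protocol described in the abstract (embed each text as a vector, split 70:30, fit a classifier, score on the held-out part) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the two-dimensional toy vectors stand in for real Word2Vec or BERT embeddings, the labels 0 (objective) and 1 (subjective) are an assumed encoding, and the gradient-descent logistic regression and all function names are hypothetical stand-ins for whatever library the paper actually used.

```python
import math
import random


def make_toy_embeddings(n_per_class=50, seed=0):
    """Generate stand-in 'embeddings': two well-separated 2-D clusters.

    Label 0 plays the role of objective text, label 1 of subjective text.
    Real inputs would be Word2Vec or BERT vectors, not random points.
    """
    rng = random.Random(seed)
    rows = []
    for label, centre in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            vector = [rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)]
            rows.append((vector, label))
    return rows


def train_test_split(rows, test_ratio=0.3, seed=0):
    """Shuffle a copy of the rows and hold out test_ratio of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    """Probability that vector x belongs to class 1."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic_regression(train, epochs=200, lr=0.1):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    dim = len(train[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(train)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in train:
            err = predict_proba((w, b), x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(model, rows):
    """Fraction of rows whose thresholded prediction matches the label."""
    correct = sum((predict_proba(model, x) >= 0.5) == (y == 1) for x, y in rows)
    return correct / len(rows)


data = make_toy_embeddings()
train, test = train_test_split(data)       # 70:30 split
model = fit_logistic_regression(train)
test_accuracy = accuracy(model, test)
print(f"train={len(train)} test={len(test)} accuracy={test_accuracy:.2f}")
```

The same split and scoring steps would apply unchanged to the other classifiers named in the abstract; only the fitting function would differ.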

Copyright
© 2022 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Computer, Information Technology and Intelligent Computing (CITIC 2022)
Series
Atlantis Highlights in Computer Sciences
Publication Date
27 December 2022
ISBN
978-94-6463-094-7
ISSN
2589-4900

Cite this article

TY  - CONF
AU  - Wing Kin Chong
AU  - Hu Ng
AU  - Timothy Tzen Vun Yap
AU  - Wooi King Soo
AU  - Vik Tor Goh
AU  - Dong Theng Cher
PY  - 2022
DA  - 2022/12/27
TI  - Objectivity and Subjectivity Classification with BERT for Bahasa Melayu
BT  - Proceedings of the International Conference on Computer, Information Technology and Intelligent Computing (CITIC 2022)
PB  - Atlantis Press
SP  - 246
EP  - 257
SN  - 2589-4900
UR  - https://doi.org/10.2991/978-94-6463-094-7_20
DO  - 10.2991/978-94-6463-094-7_20
ID  - Chong2022
ER  -