Proceedings of the 2015 International Symposium on Computers & Informatics

A Novel Term Selection Approach in sLDA for Imbalanced Text Categorization

Authors
Zhenyan Liu, Dan Meng, Weiping Wang, Yong Wang, Chenhao Bai
Corresponding Author
Zhenyan Liu
Available Online January 2015.
DOI
10.2991/isci-15.2015.231
Keywords
sLDA; imbalanced dataset; text categorization; topic model.
Abstract

Supervised Latent Dirichlet Allocation (sLDA) is a probabilistic topic model of labelled documents, which performs better than unsupervised LDA for text categorization. However, earlier sLDA experiments relied on the default assumption that the corpus is balanced, that is, that each class contains approximately the same number of samples, and they chose the vocabulary by tf-idf. When the corpus is imbalanced, tf-idf tends to choose terms from the majority classes and to ignore terms from the minority ones, which severely degrades the performance of the text classifier. This paper therefore proposes a new term selection approach that chooses discriminative terms fairly from every category. Experimental results show that using this approach in sLDA for imbalanced text categorization greatly improves the recall and precision of the minority classes, and that it is superior to tf-idf.
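
The contrast described in the abstract can be illustrated with a small sketch. It is not the authors' method: the abstract does not state their scoring function, so the per-class score used here (the share of in-class documents containing a term minus the share of out-of-class documents containing it) is an assumption, as are the function names `select_global` and `select_per_class` and the toy corpus.

```python
import math
from collections import Counter, defaultdict

def tfidf_scores(docs):
    """Corpus-level tf-idf: sum over documents of tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    scores = Counter()
    for d in docs:
        for t, tf in Counter(d).items():
            scores[t] += tf * math.log(n / df[t])
    return scores

def select_global(docs, k):
    """Baseline: keep the k terms with the highest corpus-level tf-idf."""
    scores = tfidf_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def select_per_class(docs, labels, k):
    """Hypothetical class-fair selection: every class gets an equal quota.

    Terms are ranked inside each class by an assumed discriminative score,
    P(term in doc | class) - P(term in doc | other classes).
    """
    classes = sorted(set(labels))
    quota = k // len(classes)
    by_class = defaultdict(list)
    for d, y in zip(docs, labels):
        by_class[y].append(set(d))
    selected = []
    for c in classes:
        inside = by_class[c]
        outside = [d for y in classes if y != c for d in by_class[y]]
        vocab = set().union(*inside)

        def score(t):
            p_in = sum(t in d for d in inside) / len(inside)
            p_out = sum(t in d for d in outside) / len(outside) if outside else 0.0
            return p_in - p_out

        ranked = sorted(vocab, key=lambda t: (-score(t), t))
        selected.extend([t for t in ranked if t not in selected][:quota])
    return selected

# Toy imbalanced corpus: six "sports" documents, two "finance" documents.
docs = [
    ["match", "goal", "goal", "team"],
    ["team", "goal", "coach"],
    ["match", "coach", "goal", "goal"],
    ["team", "match", "goal"],
    ["coach", "team", "match"],
    ["goal", "match", "team"],
    ["stock", "bank", "team"],
    ["stock", "market", "match"],
]
labels = ["sports"] * 6 + ["finance"] * 2

global_terms = select_global(docs, 2)
fair_terms = select_per_class(docs, labels, 2)
print("global tf-idf:", global_terms)
print("per-class quota:", fair_terms)
```

On this toy corpus, both of the top two terms from the global tf-idf ranking come from the majority class, while the per-class quota keeps one term from the minority class. The equal quota is only one possible way to read "fairly choose"; a quota weighted by class size or by classifier error would be an equally valid assumption.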

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2015 International Symposium on Computers & Informatics
Series
Advances in Computer Science Research
Publication Date
January 2015
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Zhenyan Liu
AU  - Dan Meng
AU  - Weiping Wang
AU  - Yong Wang
AU  - Chenhao Bai
PY  - 2015/01
DA  - 2015/01
TI  - A Novel Term Selection Approach in sLDA for Imbalanced Text Categorization
BT  - Proceedings of the 2015 International Symposium on Computers & Informatics
PB  - Atlantis Press
SP  - 1733
EP  - 1740
SN  - 2352-538X
UR  - https://doi.org/10.2991/isci-15.2015.231
DO  - 10.2991/isci-15.2015.231
ID  - Liu2015/01
ER  -