Proceedings of the 2016 International Conference on Intelligent Control and Computer Application

The Research of Chinese Short-text Classification Based on Domain Keyword Set Extension and HowNet

Authors
Xiangdong Li, Fan Gao, Cong Ding
Corresponding Author
Xiangdong Li
Available Online January 2016.
DOI
10.2991/icca-16.2016.57
Keywords
Short-text classification, Keyword set, LDA, Feature extension, HowNet
Abstract

To extend the features of short texts and improve short-text classification performance, this paper extracts the high-frequency words and topic core words of each class in the training set as a domain keyword set, working at two feature granularities: keyword and latent topic. It derives the topic probability distribution of each test text using an LDA model; when the probability of a topic exceeds a certain threshold, the keywords of that topic are added to the test text. The semantic similarity between the test text and the domain keyword set of each category is then calculated using HowNet. Experimental results show that the proposed method can effectively improve short-text classification performance.
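
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the LDA topic distribution of the test text has already been computed, and every name in it (extend_text, classify, exact_match_sim, the sample tokens and the 0.5 threshold) is hypothetical. The abstract does not say how word-level similarities are aggregated into a text-to-set score, so averaging the best match per word is an assumption, and the exact-match function is only a stand-in for a HowNet-based word similarity.

```python
from typing import Callable, Dict, List, Sequence


def extend_text(words: Sequence[str], topic_probs: Sequence[float],
                topic_keywords: Sequence[Sequence[str]],
                threshold: float) -> List[str]:
    """Append the keywords of every topic whose probability exceeds threshold."""
    extended = list(words)
    for topic_id, prob in enumerate(topic_probs):
        if prob > threshold:
            for keyword in topic_keywords[topic_id]:
                if keyword not in extended:
                    extended.append(keyword)
    return extended


def classify(words: Sequence[str], domain_keywords: Dict[str, Sequence[str]],
             word_sim: Callable[[str, str], float]) -> str:
    """Return the class whose domain keyword set is most similar to the text."""
    def set_similarity(keywords: Sequence[str]) -> float:
        if not words or not keywords:
            return 0.0
        # Assumed aggregation: average, over the text's words, of the best
        # match found in the keyword set.
        best = [max(word_sim(w, k) for k in keywords) for w in words]
        return sum(best) / len(words)

    return max(domain_keywords, key=lambda c: set_similarity(domain_keywords[c]))


def exact_match_sim(a: str, b: str) -> float:
    """Placeholder similarity; a HowNet-based measure would replace this."""
    return 1.0 if a == b else 0.0


# Toy data: placeholder tokens standing in for segmented Chinese words.
topic_keywords = [["stock", "fund"], ["match", "team"]]
domain_keywords = {
    "finance": ["stock", "fund", "bank"],
    "sports": ["match", "team", "goal"],
}
text = ["stock", "rises"]

extended = extend_text(text, [0.8, 0.2], topic_keywords, threshold=0.5)
label = classify(extended, domain_keywords, exact_match_sim)
```

In this toy run only the first topic passes the threshold, so its keywords are appended to the text before the class with the highest similarity score is selected.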

Copyright
© 2016, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2016 International Conference on Intelligent Control and Computer Application
Series
Advances in Computer Science Research
Publication Date
January 2016
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Xiangdong Li
AU  - Fan Gao
AU  - Cong Ding
PY  - 2016/01
DA  - 2016/01
TI  - The Research of Chinese Short-text Classification Based on Domain Keyword Set Extension and HowNet
BT  - Proceedings of the 2016 International Conference on Intelligent Control and Computer Application
PB  - Atlantis Press
SP  - 244
EP  - 247
SN  - 2352-538X
UR  - https://doi.org/10.2991/icca-16.2016.57
DO  - 10.2991/icca-16.2016.57
ID  - Li2016/01
ER  -