A Novel Improved TFIDF Algorithm
- https://doi.org/10.2991/csss-14.2014.37
- text categorization; TFIDF; semantics; information entropy; information gain
The feature weighting algorithm has a great effect on the accuracy of text categorization. The classical Term Frequency-Inverse Document Frequency (TFIDF) algorithm ignores the semantic relationships between terms in the document set, which reduces the accuracy of term weight calculation. To weight terms more accurately, the paper introduces the semantic association between words and proposes an improved algorithm (S-TFIDFIGE) that combines semantics, information entropy, and information gain. Experimental results show that the proposed method achieves better classification results than the traditional TFIDF and other improved algorithms.
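The abstract does not give the S-TFIDFIGE formula, so the following is only a minimal sketch of the standard ingredients it names: the classical TFIDF weight, tf(t, d) · log(N / df(t)), and the information gain of a term with respect to class labels, computed from information entropy. The function names (`tfidf`, `entropy`, `information_gain`) and the toy corpus are assumptions made for illustration; the semantic-association component and the way the paper combines these quantities are not reproduced here.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): tokenized documents and their class labels.
docs = [
    ["ball", "goal", "team"],
    ["team", "match", "goal"],
    ["stock", "market", "team"],
    ["market", "price", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]


def tfidf(documents):
    """Classical TFIDF: relative term frequency times log(N / df)."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def entropy(class_labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(class_labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(class_labels).values())


def information_gain(term, documents, class_labels):
    """Entropy reduction from splitting documents on presence of `term`."""
    present = [y for d, y in zip(documents, class_labels) if term in d]
    absent = [y for d, y in zip(documents, class_labels) if term not in d]
    n = len(documents)
    conditional = sum(
        len(part) / n * entropy(part) for part in (present, absent) if part
    )
    return entropy(class_labels) - conditional


weights = tfidf(docs)
ig_goal = information_gain("goal", docs, labels)
ig_team = information_gain("team", docs, labels)
```

In this toy corpus, "team" occurs in three of the four documents and in both classes, so it receives a lower TFIDF weight than "ball" in the first document and a lower information gain than "goal", which occurs only in the "sport" class. Classical TFIDF alone has no access to the class labels, which is the kind of gap that class-aware measures such as information gain are meant to fill.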
- © 2014, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Yang KeHua
AU - Ye Dan
PY - 2014/06
DA - 2014/06
TI - A Novel Improved TFIDF Algorithm
BT - Proceedings of the 3rd International Conference on Computer Science and Service System
PB - Atlantis Press
SP - 162
EP - 166
SN - 1951-6851
UR - https://doi.org/10.2991/csss-14.2014.37
DO - https://doi.org/10.2991/csss-14.2014.37
ID - KeHua2014/06
ER -