Proceedings of the 3rd International Conference on Computer Science and Service System

A Novel Improved TFIDF Algorithm

Authors
Yang KeHua, Ye Dan
Corresponding Author
Yang KeHua
Available Online June 2014.
DOI
10.2991/csss-14.2014.37
Keywords
text categorization; TFIDF; semantics; information entropy; information gain
Abstract

Feature weighting algorithms have a great effect on the accuracy of text categorization. The classical Term Frequency-Inverse Document Frequency algorithm (TFIDF) ignores the semantic relationships between terms in the document set, which impairs the accuracy of term weight calculation. To calculate term weights more accurately, the paper introduces the semantic association between words and proposes a new improved algorithm (S-TFIDFIGE) that combines semantics, information entropy, and information gain. Experimental results show that the proposed method achieves better classification results than the traditional TFIDF and other improved algorithms.
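
For orientation, the classical TFIDF baseline that the abstract refers to can be sketched as below. This is a minimal illustration, not the S-TFIDFIGE algorithm proposed in the paper. It assumes one common formulation (relative term frequency multiplied by log(N / df)); the function name `tfidf` and the toy documents are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def tfidf(docs):
    """Classical TF-IDF weights for a list of tokenized documents.

    Assumed formulation: tf = count / document length, idf = log(N / df).
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: "car" and "automobile" are near-synonyms.
docs = [
    ["car", "engine", "road"],
    ["automobile", "engine", "wheel"],
    ["recipe", "flour", "sugar"],
]
weights = tfidf(docs)
```

In this toy corpus, "engine" receives a lower weight than "car" in the first document because it occurs in two documents rather than one. The terms "car" and "automobile" are weighted as entirely unrelated strings, which is the kind of semantic relationship the abstract says classical TFIDF ignores.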

Copyright
© 2014, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 3rd International Conference on Computer Science and Service System
Series
Advances in Intelligent Systems Research
Publication Date
June 2014
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Yang KeHua
AU  - Ye Dan
PY  - 2014/06
DA  - 2014/06
TI  - A Novel Improved TFIDF Algorithm
BT  - Proceedings of the 3rd International Conference on Computer Science and Service System
PB  - Atlantis Press
SP  - 162
EP  - 166
SN  - 1951-6851
UR  - https://doi.org/10.2991/csss-14.2014.37
DO  - 10.2991/csss-14.2014.37
ID  - KeHua2014/06
ER  -