Proceedings of the 2016 4th International Conference on Electrical & Electronics Engineering and Computer Science (ICEEECS 2016)

Improving Index Term Extraction for Chinese Books with Professional Score

Authors
Shu'qi Lv, Ning Li, Ying'ai Tian
Corresponding Author
Shu'qi Lv
Available Online December 2016.
DOI
10.2991/iceeecs-16.2016.161
Keywords
Back-of-the-book Index; Index Term Extraction; Wikipedia; PageRank
Abstract

This paper investigates the current state of index term extraction for Chinese books. To improve the performance of traditional key phrase extraction methods when they are used to extract index terms, we propose a novel feature, the professional score, which evaluates the importance of each candidate. Wikipedia is used to identify whether a candidate is a meaningful keyword in the domain of the book. We then borrow the idea of the PageRank algorithm to calculate the professional score of each candidate, making full use of the category structure and citation relationships in Wikipedia. To evaluate how much the proposed feature improves index term extraction for Chinese books, we compare traditional TF-IDF with a method that combines TF-IDF and the professional score. The precision, recall and F-measure obtained by the combined method are 54%, 35% and 46% higher, respectively, than those obtained by traditional TF-IDF.
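
The abstract does not spell out how the professional score is computed or how it is combined with TF-IDF, so the following Python sketch is only an illustration of the general idea under stated assumptions. It runs a plain power-iteration PageRank over a toy link graph standing in for Wikipedia links, and multiplies the resulting rank by a smoothed TF-IDF value. The function names (pagerank, tf_idf, combined_score), the damping factor of 0.85, the multiplicative combination, and the rule that candidates absent from the graph score zero are all assumptions made for this example, not the authors' method.

```python
from math import log


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of linked nodes.

    The graph is a stand-in for Wikipedia category/citation links (assumption).
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def tf_idf(term, doc_tokens, corpus):
    """Term frequency times a smoothed inverse document frequency."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    idf = log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def combined_score(term, doc_tokens, corpus, professional):
    """Hypothetical combination: TF-IDF weighted by the professional score.

    Candidates missing from the graph get a professional score of 0, which
    filters them out (an assumption about how Wikipedia membership is used).
    """
    return tf_idf(term, doc_tokens, corpus) * professional.get(term, 0.0)
```

A caller would build the link graph from Wikipedia data for the book's domain, compute the ranks once, and then sort the book's candidate terms by combined_score. A weighted sum would be an equally plausible way to combine the two features.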

Copyright
© 2016, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the 2016 4th International Conference on Electrical & Electronics Engineering and Computer Science (ICEEECS 2016)
Series
Advances in Computer Science Research
Publication Date
December 2016
ISBN
10.2991/iceeecs-16.2016.161
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Shu'qi Lv
AU  - Ning Li
AU  - Ying'ai Tian
PY  - 2016/12
DA  - 2016/12
TI  - Improving Index Term Extraction for Chinese Books with Professional Score
BT  - Proceedings of the 2016 4th International Conference on Electrical & Electronics Engineering and Computer Science (ICEEECS 2016)
PB  - Atlantis Press
SP  - 823
EP  - 830
SN  - 2352-538X
UR  - https://doi.org/10.2991/iceeecs-16.2016.161
DO  - 10.2991/iceeecs-16.2016.161
ID  - Lv2016/12
ER  -