Improving Index Term Extraction for Chinese Books with Professional Score
- 10.2991/iceeecs-16.2016.161
- Back-of-the-book Index; Index Term Extraction; Wikipedia; PageRank
We investigate the current state of index term extraction for Chinese books. To improve the performance of traditional key phrase extraction methods on this task, we propose a novel feature, the professional score, which measures the importance of each candidate term. Wikipedia is used to identify whether a candidate is a meaningful keyword in the domain of the book. We then borrow the idea of the PageRank algorithm to calculate the professional score of each candidate, making full use of the category structure and citing relationships in Wikipedia. To evaluate how much the proposed feature improves index term extraction for Chinese books, we compare traditional TF-IDF against a method that combines TF-IDF with the professional score. The precision, recall and F-measure obtained by the combined method are 54%, 35% and 46% higher, respectively, than those obtained by traditional TF-IDF.
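The abstract does not give the exact formula for the professional score or say how it is combined with TF-IDF, so the following is only a minimal sketch of the general idea under stated assumptions. It runs a plain power-iteration PageRank over a toy link graph standing in for Wikipedia's citing relationships (the category structure is not modeled), and it multiplies the resulting score by TF-IDF. The graph, the function names (`pagerank`, `tf_idf`, `combined_score`), the smoothed IDF variant and the multiplicative combination are all hypothetical choices made for illustration, not the authors' method.

```python
import math
from collections import Counter


def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF with a smoothed IDF (an assumed variant, always positive)."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def combined_score(term, doc_tokens, corpus, ranks):
    """Assumed combination: TF-IDF times the PageRank-style score.

    Terms absent from the graph score 0, mimicking the use of Wikipedia
    as a filter for meaningful domain keywords.
    """
    return tf_idf(term, doc_tokens, corpus) * ranks.get(term, 0.0)


# Hypothetical toy graph standing in for Wikipedia article links.
links = {
    "algorithm": ["structure"],
    "sorting": ["algorithm", "structure"],
    "graph": ["algorithm"],
    "structure": ["algorithm"],
}
ranks = pagerank(links)

# Hypothetical tokenized book and background corpus.
book = ["algorithm", "sorting", "the", "algorithm"]
corpus = [book, ["the", "graph"], ["the", "sorting"]]

for term in sorted(set(book)):
    print(term, round(combined_score(term, book, corpus, ranks), 4))
```

In this toy setup a frequent but non-domain token such as "the" is dropped because it has no node in the graph, while a term that many other domain articles link to is promoted relative to its TF-IDF score alone.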
- © 2016, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Shu'qi Lv
AU  - Ning Li
AU  - Ying'ai Tian
PY  - 2016/12
DA  - 2016/12
TI  - Improving Index Term Extraction for Chinese Books with Professional Score
BT  - Proceedings of the 2016 4th International Conference on Electrical & Electronics Engineering and Computer Science (ICEEECS 2016)
PB  - Atlantis Press
SP  - 823
EP  - 830
SN  - 2352-538X
UR  - https://doi.org/10.2991/iceeecs-16.2016.161
DO  - 10.2991/iceeecs-16.2016.161
ID  - Lv2016/12
ER  -