Similarity calculation based on Mongolian news corpus
- Yaowen Gao, Feilong Bao, Guanglai Gao
- Corresponding Author
- Yaowen Gao
Available Online May 2018.
- https://doi.org/10.2991/amcce-18.2018.31
- Similarity, Mongolian, Vector Space Model.
- Similarity calculation is an important part of new event detection, and computing text similarity effectively can remove redundant information and make user queries more efficient. This paper studies how to calculate the similarity between Mongolian news texts. Because the Mongolian news corpus is not standardized, it must be preprocessed before any later processing, which also improves efficiency. Preprocessing consists of code conversion, text proofreading, stop-word removal, and suffix removal. The news texts are then mapped to vectors with a vector space model, and the similarity between the vectors is calculated with the cosine formula. Finally, precision, recall, and F-measure are used as the evaluation criteria for the experimental results. The results show that the experimental method performs better than the manual approach.
- Open Access
- This is an open access article distributed under the CC BY-NC license.
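The vector space model, cosine similarity, and evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the texts have already been preprocessed into token lists, it weights terms by raw term frequency (the abstract does not state the weighting scheme), and the function names `term_vector`, `cosine_similarity`, and `precision_recall_f` are hypothetical.

```python
import math
from collections import Counter


def term_vector(tokens):
    """Map a preprocessed token list to a sparse term-frequency vector.

    Assumption: raw term frequency is used as the weight.
    """
    return Counter(tokens)


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors (dicts of term -> weight)."""
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def precision_recall_f(predicted, relevant):
    """Precision, recall, and balanced F-measure for two sets of item ids."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Placeholder tokens stand in for preprocessed Mongolian word stems.
doc_a = term_vector(["w1", "w2", "w2", "w3"])
doc_b = term_vector(["w2", "w3", "w4"])
similarity = cosine_similarity(doc_a, doc_b)
```

In a setup like this, two news texts would be judged similar when `similarity` exceeds a chosen threshold, and the pairs judged similar would be compared against a manually labeled set with `precision_recall_f`. The threshold is a free parameter that the abstract does not specify.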
Cite this article
TY - CONF
AU - Yaowen Gao
AU - Feilong Bao
AU - Guanglai Gao
PY - 2018/05
DA - 2018/05
TI - Similarity calculation based on Mongolian news corpus
BT - 2018 3rd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2018)
PB - Atlantis Press
SN - 2352-5401
UR - https://doi.org/10.2991/amcce-18.2018.31
DO - https://doi.org/10.2991/amcce-18.2018.31
ID - Gao2018/05
ER -