Proceedings of the 2017 5th International Conference on Frontiers of Manufacturing Science and Measuring Technology (FMSMT 2017)

A Method for Calculating the Similarity of TF - IDF Texts for Synonyms in Biomedical Domains

Authors
Miao Hao, Ke Fan
Corresponding Author
Miao Hao
Available Online April 2017.
DOI
https://doi.org/10.2991/fmsmt-17.2017.117
Keywords
TF-IDF texts, Synonyms, Biomedical domains
Abstract
Traditional text similarity calculation mostly relies on the TF-IDF method. TF-IDF builds a term-frequency vector for each text and takes the cosine between two vectors as the similarity of the texts. The algorithm is widely used in search engines and information retrieval systems, but its handling of vocabulary is not ideal: the model does not perceive synonymy between professional phrases and treats synonyms as different words when calculating similarity. Taking synonyms in the biomedical field as an example, this paper embeds a synonym recognition function in the TF-IDF model. The method first acquires synonyms of biomedical vocabulary and builds a synonym lexicon, then identifies synonyms within the TF-IDF model and calculates improved weights for the phrases. Experimental results show that the method effectively improves the precision of text similarity calculation in the biomedical field and is more effective than the traditional TF-IDF text similarity method.
Open Access
This is an open access article distributed under the CC BY-NC license.


Cite this article

TY  - CONF
AU  - Miao Hao
AU  - Ke Fan
PY  - 2017/04
DA  - 2017/04
TI  - A Method for Calculating the Similarity of TF - IDF Texts for Synonyms in Biomedical Domains
BT  - Proceedings of the 2017 5th International Conference on Frontiers of Manufacturing Science and Measuring Technology (FMSMT 2017)
PB  - Atlantis Press
SP  - 578
EP  - 583
SN  - 2352-5401
UR  - https://doi.org/10.2991/fmsmt-17.2017.117
DO  - https://doi.org/10.2991/fmsmt-17.2017.117
ID  - Hao2017/04
ER  -