Proceedings of the International Conference on Science and Technology (ICST 2018)

Document Retrieval System Based on Topic Clustering Method

Authors
P.M. Prihatini, I.K.G.D. Putra, I.A.D. Giriantari, M. Sudarma
Corresponding Author
P.M. Prihatini
Available Online December 2018.
DOI
10.2991/icst-18.2018.118
Keywords
document retrieval; topic model; clustering
Abstract

Document retrieval aims to find documents in a collection of unstructured text that meet a user's information needs. A document retrieval system requires a search engine to perform the entire process automatically: processing the text of the documents in the collection, feature selection, feature extraction, query text processing, and searching for documents relevant to the query. Three main factors affect search engine performance: the feature selection method, the method of weighting features in the document collection, and the method of searching documents in the collection. This paper applies one method to each of these factors. For feature selection, Term Frequency-Inverse Document Frequency based on Luhn's idea is used to select document features. For feature weighting, Fuzzy Gibbs Latent Dirichlet Allocation is used as the feature extraction method that weights the document features. To search for documents relevant to the query, the paper uses a Document Retrieval based on Topic Clustering method. In this method, all documents are clustered based on the feature weights obtained through feature extraction. Clusters that are relevant to the combination of query terms are selected, and all documents in those clusters are displayed as search results. The results show that this method can retrieve the set of documents in the clusters that are relevant to the query. Because the method eliminates the query-document distance calculation from the retrieval process, the search process is expected to run faster.
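
The cluster-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`cluster_by_topic`, `select_topic`, `retrieve`), the rule of assigning each document to its highest-weighted topic, and the toy weights are all assumptions made for illustration. In the paper, the weights would come from Fuzzy Gibbs Latent Dirichlet Allocation applied to features selected with TF-IDF. The point shown is that the query is matched against topics rather than against individual documents, so no query-document distance is computed.

```python
from collections import defaultdict


def cluster_by_topic(doc_topic):
    """Group document ids by their highest-weighted topic (assumed rule)."""
    clusters = defaultdict(list)
    for doc_id, weights in doc_topic.items():
        best_topic = max(range(len(weights)), key=weights.__getitem__)
        clusters[best_topic].append(doc_id)
    return dict(clusters)


def select_topic(query_terms, topic_term):
    """Pick the topic whose term weights best cover the query terms."""
    scores = {
        topic: sum(weights.get(term, 0.0) for term in query_terms)
        for topic, weights in topic_term.items()
    }
    best = max(scores, key=scores.get)
    # No topic contains any query term: nothing to return.
    return best if scores[best] > 0 else None


def retrieve(query, doc_topic, topic_term):
    """Return every document in the cluster selected for the query."""
    clusters = cluster_by_topic(doc_topic)
    topic = select_topic(query.lower().split(), topic_term)
    return sorted(clusters.get(topic, []))


# Hypothetical toy weights standing in for the output of a topic model.
doc_topic = {
    "d1": [0.9, 0.1],
    "d2": [0.2, 0.8],
    "d3": [0.7, 0.3],
}
topic_term = {
    0: {"retrieval": 0.5, "query": 0.3},
    1: {"cluster": 0.6, "topic": 0.4},
}

print(retrieve("query retrieval", doc_topic, topic_term))
```

In this sketch the cost of answering a query depends on the number of topics and query terms, not on the number of documents, which is the reason the abstract expects a faster search.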

Copyright
© 2018, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the International Conference on Science and Technology (ICST 2018)
Series
Atlantis Highlights in Engineering
Publication Date
December 2018
ISSN
2589-4943

Cite this article

TY  - CONF
AU  - P.M. Prihatini
AU  - I.K.G.D. Putra
AU  - I.A.D. Giriantari
AU  - M. Sudarma
PY  - 2018/12
DA  - 2018/12
TI  - Document Retrieval System Based on Topic Clustering Method
BT  - Proceedings of the International Conference on Science and Technology (ICST 2018)
PB  - Atlantis Press
SP  - 568
EP  - 573
SN  - 2589-4943
UR  - https://doi.org/10.2991/icst-18.2018.118
DO  - 10.2991/icst-18.2018.118
ID  - Prihatini2018/12
ER  -