Multi-Document Text Summarization Based on Semantic Clustering and Selection of Representative Sentences Using Latent Dirichlet Allocation
- DOI
- 10.2991/aisr.k.200424.029
- Keywords
- multi-document, semantic clustering, text summarization, LDA
- Abstract
Online news text has become one of the most important sources of information in this information age. A large volume of online news is produced every day, but many articles cover the same content with a different narrative, which makes it difficult for readers to get complete information. A solution is therefore needed that retrieves information from several online news texts more effectively and efficiently, in the form of multi-document text summarization. This research uses extractive summarization, which arranges sentences from the source documents into a shorter form. Latent Semantic Indexing (LSI) and Similarity-Based Histogram Clustering (SHC) are used to create semantic sentence clusters, and Latent Dirichlet Allocation (LDA) is used to select representative sentences from the resulting clusters. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is used to measure the test results. The proposed method reaches a ROUGE-1 multi-F-measure value of 0.481.
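To make the reported metric concrete, the following is a minimal sketch of how a ROUGE-1 F-measure can be computed from unigram overlap between a candidate summary and one reference summary. It is not the authors' evaluation code. The function name `rouge_1`, the lowercase whitespace tokenization, and the example sentences are illustrative assumptions, and the abstract does not specify how scores over multiple references are combined.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 against a single reference.

    Assumption: tokens are lowercase whitespace-separated words;
    the paper's actual preprocessing is not stated in the abstract.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())

    # Clipped matches: each unigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())

    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical example sentences, not data from the paper.
scores = rouge_1(
    "the flood hit the city",
    "the flood hit the coastal city on monday",
)
print(scores)
```

In this example all five candidate unigrams occur in the eight-word reference, so precision is 1.0, recall is 5/8, and the F-measure is their harmonic mean.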
- Copyright
- © 2020, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Oktefvia Aruda LISJANA
AU  - Dian Palupi RINI
AU  - Novi YUSLIANI
PY  - 2020
DA  - 2020/05/06
TI  - Multi-Document Text Summarization Based on Semantic Clustering and Selection of Representative Sentences Using Latent Dirichlet Allocation
BT  - Proceedings of the Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019)
PB  - Atlantis Press
SP  - 203
EP  - 206
SN  - 1951-6851
UR  - https://doi.org/10.2991/aisr.k.200424.029
DO  - 10.2991/aisr.k.200424.029
ID  - LISJANA2020
ER  -