Proceedings of the Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019)

Multi-Document Text Summarization Based on Semantic Clustering and Selection of Representative Sentences Using Latent Dirichlet Allocation

Authors
Oktefvia Aruda LISJANA, Dian Palupi RINI, Novi YUSLIANI
Corresponding Author
Dian Palupi RINI
Available Online 6 May 2020.
DOI
10.2991/aisr.k.200424.029
Keywords
multi-document, semantic clustering, text summarization, LDA
Abstract

Online news text has become one of the most important sources of information in this information age. A large volume of online news is produced every day, and many articles cover the same content with different narratives, which makes it difficult for readers to obtain complete information. A solution is therefore needed that retrieves information from several online news texts effectively and efficiently, in the form of multi-document text summarization. This research uses extractive summarization, which arranges sentences from the source documents into a shorter form. Latent Semantic Indexing (LSI) and Similarity-Based Histogram Clustering (SHC) are used to create semantic sentence clusters, and Latent Dirichlet Allocation (LDA) is used to select representative sentences from the formed clusters. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is used to measure the test results. The proposed method reaches a ROUGE-1 multi-F-measure value of 0.481.
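To make the evaluation metric concrete, the following is a minimal sketch of how a ROUGE-1 F-measure can be computed for a single candidate summary against a single reference summary. It is not the authors' evaluation code: the function name `rouge_1`, the whitespace tokenization, the lowercasing, and the example sentences are all assumptions made for illustration, and the abstract does not state how scores were aggregated across documents or references.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap recall, precision, and F-measure for one summary pair.

    Hypothetical helper: tokenization is plain lowercased whitespace splitting,
    which is an assumption and not taken from the paper.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a unigram is matched at most as often as it
    # occurs in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    # Harmonic mean of precision and recall; zero when nothing overlaps.
    f_measure = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f_measure": f_measure}


# Illustrative sentences only; they do not come from the paper's data set.
scores = rouge_1("the flood hit the city on monday", "the flood struck the city")
print(scores)
```

In this example four of the five reference unigrams are matched by a seven-word candidate, so recall is 4/5, precision is 4/7, and the F-measure is 2/3.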

Copyright
© 2020, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019)
Series
Advances in Intelligent Systems Research
Publication Date
6 May 2020
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Oktefvia Aruda LISJANA
AU  - Dian Palupi RINI
AU  - Novi YUSLIANI
PY  - 2020
DA  - 2020/05/06
TI  - Multi-Document Text Summarization Based on Semantic Clustering and Selection of Representative Sentences Using Latent Dirichlet Allocation
BT  - Proceedings of the Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019)
PB  - Atlantis Press
SP  - 203
EP  - 206
SN  - 1951-6851
UR  - https://doi.org/10.2991/aisr.k.200424.029
DO  - 10.2991/aisr.k.200424.029
ID  - LISJANA2020
ER  -