A Biterm-based Dirichlet Process Topic Model for Short Texts

Pan Yali; Yin Jian; Liu Shaopeng; Li Jing

doi:10.2991/csss-14.2014.71

Corresponding Author

Pan Yali

Available Online June 2014.

DOI: 10.2991/csss-14.2014.71
Keywords: Dirichlet Process; Clustering; Biterm; Short Texts; Topic Mining
Abstract: Topic models are widely used in many fields (e.g. context analysis) to discover latent topics. In document modeling, conventional topic models (e.g. latent Dirichlet allocation and its variants) perform well on documents of normal length. However, severe data sparsity makes topic modeling of short texts difficult and unreliable. To tackle this problem, the recently proposed biterm topic model learns topics by directly modeling the generation of word co-occurrence patterns at the corpus level rather than at the document level, but it requires the number of topics to be set manually. In this paper, we propose a Dirichlet process model based on word co-occurrence that makes topic mining from short texts more automatic. We also design a Markov chain Monte Carlo sampling scheme for posterior inference in our model, which extends the sampling algorithm based on the Chinese restaurant process. Finally, we conduct experiments on real data. The results show that our method outperforms the baseline in topic quality and perplexity and is more flexible.
Copyright: © 2014, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
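
The two building blocks named in the abstract, biterms and the Chinese restaurant process, can be illustrated with a minimal Python sketch. It is not the authors' model or sampler: the function names (`extract_biterms`, `crp_prior`, `sample_crp_assignments`) and the concentration parameter `alpha` are hypothetical, a biterm is assumed to be any unordered pair of tokens from the same short text, and only the Chinese restaurant process prior is shown, without the word-likelihood term that a full posterior sampler would need.

```python
import random
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered token pair (biterm) from one short text.

    Assumption: the whole short text is a single co-occurrence window.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def crp_prior(topic_counts, alpha):
    """Chinese restaurant process prior for the next biterm's topic.

    topic_counts[k] is the number of biterms currently assigned to topic k.
    The returned list has one probability per existing topic, followed by
    the probability of opening a new topic.
    """
    total = sum(topic_counts) + alpha
    return [n / total for n in topic_counts] + [alpha / total]


def sample_crp_assignments(n_items, alpha, seed=0):
    """Draw topic assignments for n_items biterms from the CRP prior alone."""
    rng = random.Random(seed)
    counts = []        # biterms per topic; grows when a new topic is opened
    assignments = []
    for _ in range(n_items):
        probs = crp_prior(counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)   # the last index means "open a new topic"
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    print(extract_biterms(["visit", "apple", "store"]))
    print(crp_prior([3, 1], alpha=1.0))
    print(sample_crp_assignments(10, alpha=1.0))
```

Because the number of topics grows only when the "new topic" option is drawn, it does not have to be fixed in advance, which is the property the abstract contrasts with the biterm topic model.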

Volume Title: Proceedings of the 3rd International Conference on Computer Science and Service System
Series: Advances in Intelligent Systems Research
Publication Date: June 2014
ISBN: 978-94-6252-012-7
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Pan Yali
AU  - Yin Jian
AU  - Liu Shaopeng
AU  - Li Jing
PY  - 2014/06
DA  - 2014/06
TI  - A Biterm-based Dirichlet Process Topic Model for Short Texts
BT  - Proceedings of the 3rd International Conference on Computer Science and Service System
PB  - Atlantis Press
SP  - 301
EP  - 304
SN  - 1951-6851
UR  - https://doi.org/10.2991/csss-14.2014.71
DO  - 10.2991/csss-14.2014.71
ID  - Yali2014/06
ER  -
