Proceedings of the 2016 International Conference on Education, Management, Computer and Society

Identification Of Seed Users Via Short Messages Based On Hadoop

Authors
Zhiwei Ye, Pingjian Zhang
Corresponding Author
Zhiwei Ye
Available Online January 2016.
DOI
10.2991/emcs-16.2016.456
Keywords
Corpus processing; Parallel association rule mining; Inverted index hash table; MPI; Speedup
Abstract

With the rapid growth of text processing technology, many knowledge discovery approaches have been introduced to handle large corpora. Data mining methods such as clustering and categorization have found wide application in corpus processing, and association rule mining has recently gained a place in this field as well. However, because a corpus contains a huge number of "items", traditional association rule mining algorithms face serious effectiveness and efficiency challenges. In this paper, a new parallel association rule mining algorithm customized for corpora is developed and implemented using the MPI programming interface. The main ideas are to adopt a distributed inverted index hash table and to design a communication scheme based on "chessboard decomposition" to accelerate the generation of candidate itemsets. Experiments were devised and conducted on the Tianhe-II Supercomputer of the Guangzhou National Super Computing Center. The experimental results show that the new algorithm achieves a speedup of 16 when using 49 processes.
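As a rough illustration of why an inverted index helps with candidate itemsets, the following single-process Python sketch counts the support of term pairs by intersecting posting sets instead of rescanning the corpus. It is not the authors' implementation: the distributed hash table, the MPI communication and the "chessboard decomposition" are not described in enough detail here to reproduce, and the function names, the toy corpus and the min_support value are assumptions made for the example. The last helper only restates the reported figures as parallel efficiency (speedup divided by process count).

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(documents):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, terms in enumerate(documents):
        for term in terms:
            index[term].add(doc_id)
    return index


def frequent_pairs(index, min_support):
    """Return {(a, b): support} for term pairs found in >= min_support documents."""
    # Only terms that are frequent on their own can form a frequent pair.
    frequent_terms = sorted(t for t, docs in index.items() if len(docs) >= min_support)
    result = {}
    for a, b in combinations(frequent_terms, 2):
        # Support of {a, b} is the size of the intersection of the posting sets.
        support = len(index[a] & index[b])
        if support >= min_support:
            result[(a, b)] = support
    return result


def parallel_efficiency(speedup, processes):
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / processes


# Hypothetical toy corpus: each document is a list of terms ("items").
docs = [
    ["data", "mining", "corpus"],
    ["data", "mining", "index"],
    ["corpus", "index", "data"],
    ["mining", "corpus"],
]
index = build_inverted_index(docs)
pairs = frequent_pairs(index, min_support=2)
efficiency = parallel_efficiency(16, 49)
```

With the figures quoted in the abstract, a speedup of 16 on 49 processes corresponds to a parallel efficiency of roughly 0.33.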

Copyright
© 2016, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the 2016 International Conference on Education, Management, Computer and Society
Series
Advances in Computer Science Research
Publication Date
January 2016
ISBN
10.2991/emcs-16.2016.456
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Zhiwei Ye
AU  - Pingjian Zhang
PY  - 2016/01
DA  - 2016/01
TI  - Identification Of Seed Users Via Short Messages Based On Hadoop
BT  - Proceedings of the 2016 International Conference on Education, Management, Computer and Society
PB  - Atlantis Press
SP  - 1814
EP  - 1817
SN  - 2352-538X
UR  - https://doi.org/10.2991/emcs-16.2016.456
DO  - 10.2991/emcs-16.2016.456
ID  - Ye2016/01
ER  -