Proceedings of the International Conference on Advances in Mechanical Engineering and Industrial Informatics

Research on Parallelized Stream Data Micro Clustering Algorithm

Authors
Ke Ma, Lingjuan Li, Yimu Ji, Shengmei Luo, Tao Wen
Corresponding Author
Ke Ma
Available Online April 2015.
DOI
10.2991/ameii-15.2015.116
Keywords
clustering; stream data; distributed algorithm; MapReduce; Micro-clustering
Abstract

The analysis and mining of stream data has been an active research topic in recent years. To improve clustering efficiency, this paper proposes PSDMC (Parallelized Stream Data Micro Clustering), a MapReduce-based algorithm for the micro-clustering phase of the CluStream algorithm. PSDMC stores real-time stream data in a series of containers according to arrival time. Each map node produces real-time local micro-clusters per unit of time (for example, 1 second). The reduce node merges these local micro-clusters into real-time global micro-clusters using DBSCAN together with the micro-clustering method of CluStream. The global micro-clusters are then used to update the local micro-clusters on every map node and to create snapshots that are stored in the Pyramidal Time Frame. Analysis shows that the efficiency of PSDMC increases nearly linearly with the number of map nodes while clustering accuracy is still guaranteed.
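The abstract only outlines the map/reduce split, so the Python sketch below illustrates it under our own assumptions. It uses the CluStream cluster-feature (CF) vector (per-dimension sums of values and squared values, sums of timestamps and squared timestamps, and a point count), a map step that folds each point of one unit-time container into the nearest local micro-cluster or opens a new one, and a reduce step that runs DBSCAN over the local centroids and merges each dense group by adding CF vectors. The names `MicroCluster`, `map_local`, `dbscan`, `reduce_global` and the parameters `boundary`, `eps`, `min_pts` are hypothetical; the fixed fallback boundary, the factor 2 on the radius, and keeping DBSCAN noise as separate global micro-clusters are illustrative choices, not details taken from the paper. Feeding global micro-clusters back to the map nodes, CluStream's cap on the number of micro-clusters, and Pyramidal Time Frame snapshots are omitted.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch of the PSDMC outline; not the authors' implementation.


@dataclass
class MicroCluster:
    """CluStream-style cluster feature vector; every field is additive."""
    cf1x: list   # per-dimension sum of values
    cf2x: list   # per-dimension sum of squared values
    cf1t: float  # sum of timestamps
    cf2t: float  # sum of squared timestamps
    n: int       # number of points absorbed

    @classmethod
    def from_point(cls, x, t):
        return cls(list(x), [v * v for v in x], t, t * t, 1)

    def merge(self, other):
        # Additivity: the CF vector of a union is the element-wise sum
        return MicroCluster([a + b for a, b in zip(self.cf1x, other.cf1x)],
                            [a + b for a, b in zip(self.cf2x, other.cf2x)],
                            self.cf1t + other.cf1t, self.cf2t + other.cf2t,
                            self.n + other.n)

    def centroid(self):
        return [s / self.n for s in self.cf1x]

    def radius(self):
        # RMS distance of the absorbed points from the centroid
        var = sum(s2 / self.n - (s1 / self.n) ** 2
                  for s1, s2 in zip(self.cf1x, self.cf2x))
        return math.sqrt(max(0.0, var))


def map_local(window, boundary):
    """Map step: fold one unit-time container of (point, timestamp) pairs into local micro-clusters."""
    mcs = []
    for x, t in window:
        if mcs:
            i = min(range(len(mcs)), key=lambda k: math.dist(mcs[k].centroid(), x))
            # A one-point micro-cluster has radius 0, so fall back to a fixed boundary (assumption)
            limit = max(boundary, 2.0 * mcs[i].radius())
            if math.dist(mcs[i].centroid(), x) <= limit:
                mcs[i] = mcs[i].merge(MicroCluster.from_point(x, t))
                continue
        mcs.append(MicroCluster.from_point(x, t))
    return mcs


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            more = neighbours(j)
            if len(more) >= min_pts:  # only core points expand the cluster
                queue.extend(more)
        cluster_id += 1
    return labels


def reduce_global(local_lists, eps, min_pts):
    """Reduce step: DBSCAN over local centroids, then merge each dense group's CF vectors."""
    mcs = [mc for local in local_lists for mc in local]
    labels = dbscan([mc.centroid() for mc in mcs], eps, min_pts)
    groups, isolated = {}, []
    for mc, label in zip(mcs, labels):
        if label == -1:
            isolated.append(mc)  # keep sparse micro-clusters as they are
        elif label in groups:
            groups[label] = groups[label].merge(mc)
        else:
            groups[label] = mc
    return list(groups.values()) + isolated


# Two map nodes, each holding one unit-time container
node_a = [((0.0, 0.0), 1.0), ((0.1, 0.0), 1.0), ((5.0, 5.0), 1.0)]
node_b = [((0.0, 0.1), 1.0), ((5.1, 5.0), 1.0)]
local = [map_local(node_a, boundary=0.5), map_local(node_b, boundary=0.5)]
global_mcs = reduce_global(local, eps=0.5, min_pts=2)
print(sorted(mc.n for mc in global_mcs))  # [2, 3]
```

In this sketch the reduce node never revisits raw points: because CF vectors are additive, merging local micro-clusters from any number of map nodes reduces to element-wise addition, which is what allows the map side to be scaled out.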

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Series
Advances in Engineering Research
Publication Date
April 2015
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Ke Ma
AU  - Lingjuan Li
AU  - Yimu Ji
AU  - Shengmei Luo
AU  - Tao Wen
PY  - 2015/04
DA  - 2015/04
TI  - Research on Parallelized Stream Data Micro Clustering Algorithm
BT  - Proceedings of the International Conference on Advances in Mechanical Engineering and Industrial Informatics
PB  - Atlantis Press
SP  - 629
EP  - 634
SN  - 2352-5401
UR  - https://doi.org/10.2991/ameii-15.2015.116
DO  - 10.2991/ameii-15.2015.116
ID  - Ma2015/04
ER  -