Mining on the subset of raw data set based on clustering

Yuling Ma

doi:10.2991/asei-15.2015.50


Authors: Yuling Ma

Corresponding Author: Yuling Ma

Available Online May 2015.

DOI: 10.2991/asei-15.2015.50
Keywords: Big data era; Clustering algorithm; Association rule mining; ID3; Subspace; PAC learnable; Sample complexity.
Abstract: As informatization advances, the amount of data accumulated across all walks of life is increasing exponentially. The emergence of massive data sets challenges traditional machine learning and data mining algorithms. Much recent research addresses this problem, including distributed machine learning, GPU acceleration, and algorithmic optimization. Even so, when the amount of data is very large, as with data from the biological field, mining the data directly is still time-consuming and memory-consuming. In the big data era, what should we do before mining? In this paper, we propose a subset mining method: it finds a representative subset of the raw data using clustering and related algorithms, and then applies data mining algorithms to that subset. Theory and experiments both support the correctness of the method, and its advantage becomes more pronounced as the data set grows.
Copyright: © 2015, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
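
The two-stage idea in the abstract (select a representative subset, then mine only that subset) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not say which clustering method or selection rule is used, so plain k-means and proportional sampling per cluster are assumptions here, and the names `kmeans`, `representative_subset`, and `fraction` are hypothetical.

```python
import random
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels


def representative_subset(labels, fraction, seed=0):
    """Sample roughly the same fraction of indices from every cluster."""
    rng = random.Random(seed)
    chosen = []
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        n = max(1, round(fraction * len(idx)))  # at least one point per cluster
        chosen.extend(rng.sample(idx, n))
    return sorted(chosen)


# Synthetic stand-in for a raw data set: two Gaussian blobs of 200 points each.
rng = random.Random(42)
data = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
        + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)])

labels = kmeans(data, k=2)
subset_idx = representative_subset(labels, fraction=0.1)
subset = [data[i] for i in subset_idx]

# Any mining algorithm would now run on `subset` instead of `data`.
print(len(data), "->", len(subset), Counter(labels[i] for i in subset_idx))
```

Sampling within each cluster, rather than uniformly over the raw data, is one way to keep small clusters represented in the subset; whether the paper uses this rule is not stated in the abstract.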


Volume Title: Proceedings of the 2015 International conference on Applied Science and Engineering Innovation
Series: Advances in Engineering Research
Publication Date: May 2015
ISBN: 978-94-62520-94-3
ISSN: 2352-5401

Cite this article


TY  - CONF
AU  - Yuling Ma
PY  - 2015/05
DA  - 2015/05
TI  - Mining on the subset of raw data set based on clustering
BT  - Proceedings of the 2015 International conference on Applied Science and Engineering Innovation
PB  - Atlantis Press
SP  - 243
EP  - 246
SN  - 2352-5401
UR  - https://doi.org/10.2991/asei-15.2015.50
DO  - 10.2991/asei-15.2015.50
ID  - Ma2015/05
ER  -
