Mining on the subset of raw data set based on clustering
- 10.2991/asei-15.2015.50
- Big data era; Clustering algorithm; Association rule mining; ID3; Subspace; PAC learnable; Sample complexity.
With the advance of information technology, the amount of data accumulated in all walks of life is increasing exponentially. The emergence of massive data challenges traditional machine learning and data mining algorithms. Much recent research addresses this problem, including distributed machine learning, GPU acceleration, and algorithm optimization. Even so, when the amount of data is very large, as with data from the biological field, mining the data directly is still time-consuming and memory-consuming. In the big data era, what should we do first before mining? In this paper, we propose a subset mining method: it finds a representative subset of the raw data using clustering and related algorithms, and then applies data mining algorithms to that subset. Both theory and experiments support the correctness of the method, and its advantage becomes more pronounced as the dataset grows very large.
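As a rough illustration of the subset-selection idea, the sketch below clusters the raw data and then samples the same fraction of points from every cluster. The abstract does not state which clustering algorithm is used or how the subset is drawn, so plain k-means, per-cluster random sampling, and the names `kmeans`, `representative_subset`, `k`, and `fraction` are all assumptions made for this example, not the paper's method.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples of floats (assumed algorithm).

    Returns one cluster label per point.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def representative_subset(points, k, fraction, seed=0):
    """Sample the same fraction from every cluster (at least one point each)."""
    rng = random.Random(seed)
    labels = kmeans(points, k, seed=seed)
    subset = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if members:
            n = max(1, round(fraction * len(members)))
            subset.extend(rng.sample(members, n))
    return subset

# Toy data (hypothetical): two 2-D Gaussian blobs of 500 points each.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(500)]

# Keep roughly 10% of the raw data; a mining algorithm would then run on `subset`.
subset = representative_subset(data, k=2, fraction=0.1)
```

Sampling per cluster rather than uniformly over the whole data set is one way to keep small clusters represented in the subset. The mining step itself (for example association rule mining or ID3, both named in the keywords) is left out of this sketch.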
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Yuling Ma
PY - 2015/05
DA - 2015/05
TI - Mining on the subset of raw data set based on clustering
BT - Proceedings of the 2015 International conference on Applied Science and Engineering Innovation
PB - Atlantis Press
SP - 243
EP - 246
SN - 2352-5401
UR - https://doi.org/10.2991/asei-15.2015.50
DO - 10.2991/asei-15.2015.50
ID - Ma2015/05
ER -