Development and Design of General Data Mining System
- 10.2991/itoec-15.2015.26
- Data Mining; Optimal Design; discretization
In this paper, we focus on top-down discretization methods and propose a new supervised discretization method based on class-feature correlation, built on the definition of a class-feature contingency factor. The proposed method takes the distribution of all samples into account to generate an ideal discretization scheme. It maintains a high interdependence between the target class and the discretized attribute while avoiding overfitting. An empirical evaluation of seven discretization algorithms on real UCI datasets shows that the new algorithm yields a better discretization scheme, which improves the accuracy of decision tree classification. The approach also achieves promising results in discretization execution time and in the number of generated rules.
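To make the idea of top-down supervised discretization concrete, the following is a minimal sketch and not the method proposed in the paper. The abstract does not define the class-feature contingency factor, so the sketch substitutes the well-known CAIM class-attribute interdependence criterion as a stand-in. The function names `caim` and `discretize` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def caim(values, labels, cuts):
    """CAIM score of the scheme given by interior cut points (stand-in criterion)."""
    bounds = [float("-inf")] + sorted(cuts) + [float("inf")]
    score = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Class counts of the samples falling in the interval (lo, hi].
        counts = Counter(y for x, y in zip(values, labels) if lo < x <= hi)
        total = sum(counts.values())
        if total:
            score += max(counts.values()) ** 2 / total
    return score / (len(bounds) - 1)


def discretize(values, labels):
    """Greedy top-down discretization: start with one interval, add cuts one at a time."""
    distinct = sorted(set(values))
    # Candidate boundaries are midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    n_classes = len(set(labels))
    cuts, best = [], caim(values, labels, [])
    while candidates:
        score, cut = max((caim(values, labels, cuts + [c]), c) for c in candidates)
        # Stop once the score no longer improves and there are at least
        # as many intervals as classes.
        if score <= best and len(cuts) + 1 >= n_classes:
            break
        cuts.append(cut)
        candidates.remove(cut)
        best = score
    return sorted(cuts)
```

With a toy attribute such as `[1, 2, 3, 10, 11, 12]` and labels `aaabbb`, calling `discretize` places a single cut between the two class clusters. Because every sample contributes to the score at each step, the scheme reflects the full class distribution, which is the property the abstract emphasizes.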
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Baowen Chen
PY - 2015/03
DA - 2015/03
TI - Development and Design of General Data Mining System
BT - Proceedings of the 2015 Information Technology and Mechatronics Engineering Conference
PB - Atlantis Press
SP - 120
EP - 123
SN - 2352-538X
UR - https://doi.org/10.2991/itoec-15.2015.26
DO - 10.2991/itoec-15.2015.26
ID - Chen2015/03
ER -