Economic deterministic ensemble classifiers with probabilistic output using for robust quantification: study of unbalanced educational datasets
- 10.2991/iscde-19.2019.128
- Educational Data Mining, monitoring, individual and collective learning outcomes, Weka, classification, assessment, deterministic and probabilistic forecasts, ensemble classification and quantification
The overall goal of our work is to find economic and robust supervised machine learning methods that are adequate for both individual and collective Student Performance Forecast (SPF). Individual SPF is handled by well-known classification methods, whereas collective SPF is the subject of quantification learning algorithms, which address the novel task of predicting the frequency of classes in a test sample, e.g. the number of students with an unsatisfactory grade. The need to revise classification methods is shown by a review of 86 SPFs from developing countries. The analysis shows that most of these SPFs report high overall accuracy for classifiers based on the decision tree (J48), Naïve Bayes (NB), Multilayer Perceptron (MLP), k-Nearest Neighbor (k-NN), and Support Vector Machine (SVM) algorithms, but do not take into account the accuracy of the forecast for the minority class. Consequently, given the class imbalance in the samples, a “useful forecast” with an F1 metric above 50% (75%) is obtained in only 1/2 (1/5) of the forecasts. The pivotal part of the work is a study of the efficacy factors of binary SPF (data type, algorithm, sample balancing, number of classes, etc.). Another important finding is that classifiers with a probabilistic Naïve Bayes kernel behave more stably across different EDM datasets, outperforming MLP, J48, SVM, and k-NN based classifiers, which sometimes achieve good forecasts but sometimes fail in prediction. Finally, drawing on these experimental findings about the relationship between algorithm and data, we construct heterogeneous ensembles of 3 to 15 members containing strong, moderate, and weak classifiers that produce deterministic individual SPF by simple voting. We also propose a heuristic for generating individual probabilistic predictions and aggregating them into an overall frequency forecast, i.e. for solving the quantification task.
The proposed methods of ensemble forecasting and ensemble quantification can become the basis for new economic and robust solutions to various real-world problems in the field of machine learning.
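The relationship between simple voting and quantification described above can be illustrated with a minimal Python sketch. The abstract does not specify how the probabilistic outputs are generated, so the sketch makes an assumption: the share of ensemble members voting for a class is treated as that class's probability for the student, and these probabilities are summed over the sample to estimate the class frequency. The function names (`majority_vote`, `vote_probability`, `quantify`) and the vote data are hypothetical and serve only as illustration; this is not the authors' implementation.

```python
from collections import Counter


def majority_vote(votes):
    """Deterministic individual forecast: the label chosen by most members.

    An odd ensemble size avoids ties in a binary task.
    """
    return Counter(votes).most_common(1)[0][0]


def vote_probability(votes, label):
    """Assumed probabilistic output: share of members voting for `label`."""
    return sum(1 for v in votes if v == label) / len(votes)


def quantify(all_votes, label):
    """Estimate how many samples belong to `label` in two ways.

    hard: count the samples whose majority vote equals `label`.
    soft: sum the per-sample vote shares for `label` (expected count).
    """
    hard = sum(1 for votes in all_votes if majority_vote(votes) == label)
    soft = sum(vote_probability(votes, label) for votes in all_votes)
    return hard, soft


# Hypothetical votes of a 3-member ensemble for 4 students.
votes = [
    ["fail", "fail", "pass"],
    ["pass", "pass", "pass"],
    ["fail", "pass", "pass"],
    ["pass", "fail", "pass"],
]

hard_count, soft_count = quantify(votes, "fail")
print(hard_count, round(soft_count, 3))
```

In this toy sample the hard count sees only the one student with a majority of "fail" votes, while the soft estimate also accumulates the minority votes cast for the other students. This is one plausible way in which aggregated probabilistic outputs can differ from counting deterministic forecasts on unbalanced data.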
- © 2019, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - S.M. Abdullaev
AU  - Y.K. Salal
PY  - 2019/12
DA  - 2019/12
TI  - Economic deterministic ensemble classifiers with probabilistic output using for robust quantification: study of unbalanced educational datasets
BT  - Proceedings of the International Scientific and Practical Conference on Digital Economy (ISCDE 2019)
PB  - Atlantis Press
SP  - 375
EP  - 382
SN  - 2352-5428
UR  - https://doi.org/10.2991/iscde-19.2019.128
DO  - 10.2991/iscde-19.2019.128
ID  - Abdullaev2019/12
ER  -