Proceedings of the International Scientific and Practical Conference on Digital Economy (ISCDE 2019)

Economic deterministic ensemble classifiers with probabilistic output using for robust quantification: study of unbalanced educational datasets

Authors
S.M. Abdullaev, Y.K. Salal
Corresponding Author
S.M. Abdullaev
Available Online December 2019.
DOI
10.2991/iscde-19.2019.128
Keywords
Educational Data Mining, monitoring, individual and collective learning outcomes, Weka, classification, assessment, deterministic and probabilistic forecasts, ensemble classification and quantification
Abstract

The overall goal of our work is to find economic and robust supervised machine learning methods that are adequate for both individual and collective Student Performance Forecasts (SPF). Individual SPF is the subject of well-known classification methods, whereas collective SPF is the subject of quantification learning algorithms, which address the novel task of predicting the frequency of classes in a test sample, e.g. the number of students with an unsatisfactory grade. A review of 86 SPFs in developing countries shows the need to revise classification methods. The analysis shows that most of the reviewed SPFs report high overall accuracy for classifiers based on the decision tree (J48), Naïve Bayes (NB), Multilayer Perceptron (MLP), k-Nearest Neighbor (k-NN), and Support Vector Machine (SVM) algorithms, but do not take into account the accuracy of the forecast for the minority class. As a result, given the imbalance in the samples, a “useful forecast” with an F1 metric above 50% (75%) is obtained in only ½ (1/5) of the forecasts. We then present a pivotal study of the efficacy factors of binary SPF (data type, algorithm, sample balancing, number of classes, etc.). Another important finding is that classifiers with a probabilistic Naïve Bayes kernel behave more stably across different EDM datasets, outperforming MLP-, J48-, SVM- and k-NN-based classifiers, which sometimes achieved good forecasts but sometimes failed in prediction. Drawing on all of the above experimental findings about the relationship between algorithm and data, we construct heterogeneous ensembles of 3 to 15 members, containing strong, moderate and weak classifiers, for deterministic individual SPF by simple voting, and we heuristically propose how individual probabilistic predictions can be generated and aggregated for overall frequency forecasting, i.e. how to solve the quantification task.
The proposed methods of ensemble forecasting and ensemble quantification can become the basis for new economic and robust solutions to various real-world problems in the field of machine learning.
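
The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the probabilistic output is generated, so the per-student share of ensemble votes is used here as an assumed stand-in, and all function names and the toy data are hypothetical. The sketch shows (a) why overall accuracy can hide a failed forecast of the minority class, (b) simple majority voting for the individual forecast, and (c) two ways to estimate the class frequency for the collective forecast.

```python
from typing import List, Sequence

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of students whose label was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score for one class (here the minority class, label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def majority_vote(member_preds: Sequence[Sequence[int]]) -> List[int]:
    """Simple voting: label 1 if more than half of the members predict 1."""
    n_members = len(member_preds)
    return [1 if 2 * sum(votes) > n_members else 0 for votes in zip(*member_preds)]

def vote_fraction(member_preds: Sequence[Sequence[int]]) -> List[float]:
    """Assumed probabilistic output: share of members voting 1 for each student."""
    n_members = len(member_preds)
    return [sum(votes) / n_members for votes in zip(*member_preds)]

def classify_and_count(labels: Sequence[int]) -> float:
    """Frequency estimate from hard labels."""
    return sum(labels) / len(labels)

def probabilistic_count(probs: Sequence[float]) -> float:
    """Frequency estimate from averaged per-student probabilities."""
    return sum(probs) / len(probs)

# Toy, made-up sample: 1 = unsatisfactory grade (minority class), 0 = otherwise.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * len(y_true)  # trivial forecast that ignores the minority class

# Three hypothetical ensemble members, each wrong on some students.
members = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
]

ensemble_labels = majority_vote(members)
ensemble_probs = vote_fraction(members)

print("trivial forecast accuracy:", accuracy(y_true, always_majority))
print("trivial forecast minority F1:", f1_for_class(y_true, always_majority))
print("ensemble minority F1:", f1_for_class(y_true, ensemble_labels))
print("frequency from labels:", classify_and_count(ensemble_labels))
print("frequency from vote shares:", probabilistic_count(ensemble_probs))
```

In this toy sample the trivial forecast looks accurate overall while its F1 for the minority class is zero, which is the failure mode the review points to. The two frequency estimates happen to agree here; on real data they generally differ, and which aggregation works better is an empirical question.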

Copyright
© 2019, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the International Scientific and Practical Conference on Digital Economy (ISCDE 2019)
Series
Advances in Economics, Business and Management Research
Publication Date
December 2019
ISBN
10.2991/iscde-19.2019.128
ISSN
2352-5428

Cite this article

TY  - CONF
AU  - S.M. Abdullaev
AU  - Y.K. Salal
PY  - 2019/12
DA  - 2019/12
TI  - Economic deterministic ensemble classifiers with probabilistic output using for robust quantification: study of unbalanced educational datasets
BT  - Proceedings of the International Scientific and Practical Conference on Digital Economy (ISCDE 2019)
PB  - Atlantis Press
SP  - 375
EP  - 382
SN  - 2352-5428
UR  - https://doi.org/10.2991/iscde-19.2019.128
DO  - 10.2991/iscde-19.2019.128
ID  - Abdullaev2019/12
ER  -