Bagging-Based Logistic Regression With Spark: A Medical Data Mining Method
- 10.2991/ameii-16.2016.288
- Medical Data Mining, Bagging, Logistic Regression, Spark
Medical data, held in various organizational forms, is voluminous and heterogeneous, so efficient data mining techniques are needed to explore the development patterns of diverse diseases. However, many single-node data analysis tools lack sufficient memory and computing power, so distributed and parallel computing is in great demand. In this paper, we propose a comprehensive medical data mining method consisting of data preprocessing and bagging-based logistic regression with Spark (the BLR algorithm), which is improved for better compatibility with Spark, a fast parallel computing framework. Experimental results indicated that although the BLR algorithm took slightly longer to run than logistic regression (LR), its accuracy was 2.12% higher than that of LR, and it outperformed LR on other common evaluation metrics.
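The general idea behind bagging a logistic regression model can be illustrated with a minimal single-machine sketch. This is not the BLR algorithm from the paper and does not use Spark: it assumes plain Python, batch gradient descent, bootstrap resampling, and majority voting. All function names, hyperparameters, and the toy data are hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(rows, labels, epochs=200, lr=0.1):
    """Fit one logistic regression model by batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)  # last entry is the bias term
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            z = sum(w * v for w, v in zip(weights, x)) + weights[-1]
            err = sigmoid(z) - y
            for j, v in enumerate(x):
                grad[j] += err * v
            grad[-1] += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict_lr(weights, x):
    """Return the 0/1 class predicted by a single model."""
    z = sum(w * v for w, v in zip(weights, x)) + weights[-1]
    return 1 if sigmoid(z) >= 0.5 else 0


def train_bagged_lr(rows, labels, n_models=5, seed=0):
    """Train n_models LR models, each on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_lr([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict_bagged(models, x):
    """Combine the individual predictions by majority vote."""
    votes = sum(predict_lr(w, x) for w in models)
    return 1 if 2 * votes > len(models) else 0


# Toy, made-up data: two well-separated clusters (not medical data).
rows = [[-2.0, -1.5], [-1.5, -2.0], [-2.5, -1.0], [-1.0, -2.5],
        [2.0, 1.5], [1.5, 2.0], [2.5, 1.0], [1.0, 2.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
models = train_bagged_lr(rows, labels)
```

In a distributed setting, each bootstrap model could in principle be trained independently of the others, which is one reason bagging pairs naturally with a parallel framework. How the paper partitions the work on Spark is not stated in the abstract.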
- © 2016, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Jian Pan
AU - Yiang Hua
AU - Xingtian Liu
AU - Zhiqiang Chen
AU - Zhaofeng Yan
PY - 2016/04
DA - 2016/04
TI - Bagging-Based Logistic Regression With Spark: A Medical Data Mining Method
BT - Proceedings of the 2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016)
PB - Atlantis Press
SN - 2352-5401
UR - https://doi.org/10.2991/ameii-16.2016.288
DO - 10.2991/ameii-16.2016.288
ID - Pan2016/04
ER -