International Journal of Computational Intelligence Systems

Volume 10, Issue 1, 2017, Pages 647 - 662

A Combined-Learning Based Framework for Improved Software Fault Prediction

Authors
Chubato Wondaferaw Yohannese (freewwin@yahoo.com), Tianrui Li (trli@swjtu.edu.cn)
School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
Received 21 August 2016, Accepted 10 January 2017, Available Online 25 January 2017.
DOI
10.2991/ijcis.2017.10.1.43
Keywords
Software Fault Prediction; Software Metrics; Feature Selection; Data Balancing; Machine Learning
Abstract

Software Fault Prediction (SFP) is vital for predicting the fault-proneness of software modules. It allows software engineers to focus development activities on fault-prone modules and thereby prioritize and optimize tests, improve software quality, and make better use of resources. Machine learning has been successfully applied to the classification problems that arise in SFP. Nevertheless, the variety of software metrics, the presence of redundant and irrelevant features, and the imbalanced nature of software datasets make these classification problems increasingly challenging. The objective of this study is therefore to independently examine software metrics with multiple Feature Selection (FS) techniques combined with Data Balancing (DB) using the Synthetic Minority Oversampling Technique, in order to improve classification performance. Accordingly, a new framework is proposed that handles these challenges in combination on both Object Oriented Metrics (OOM) and Static Code Metrics (SCM) datasets. The experimental results confirm that prediction performance could be compromised without suitable Feature Selection Techniques (FST). To mitigate that, data must be balanced. Thus our combined technique assures robust performance. Furthermore, the combination of Random Forest (RF) with Information Gain (IG) FS yields the highest Receiver Operating Characteristic (ROC) value (0.993) and is the best combination when SCM are used, whereas the combination of RF with Correlation-based Feature Selection (CFS) yields the highest ROC value (0.909) and is the best choice when OOM are used. Therefore, as shown in this study, the software metrics used to predict the fault-proneness of software modules must be carefully examined, and a suitable FST for those metrics must be cautiously selected. Moreover, DB must be applied in order to obtain robust performance. In addition, by dealing with the challenges mentioned above, the proposed framework ensures remarkable classification performance and lays the pathway to software quality assurance.
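
The two preprocessing ideas named in the abstract, ranking features by Information Gain and balancing classes by synthetic minority oversampling, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the toy metric names (`high_complexity`, `many_comments`), the toy data, and the simplified interpolation routine `smote_like` are all assumptions made for illustration, and the Random Forest classifier and ROC evaluation are omitted.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE-style oversampling (illustrative assumption):
    each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy data: binarized metrics per module, label 1 = faulty.
names = ["high_complexity", "many_comments"]
rows = [(1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (0, 1), (1, 0), (0, 1)]
labels = [1, 1, 0, 0, 0, 0, 1, 0]

# Rank features by information gain, highest first.
ranked = sorted(
    ((name, information_gain([r[i] for r in rows], labels))
     for i, name in enumerate(names)),
    key=lambda pair: pair[1],
    reverse=True,
)

# Hypothetical continuous metrics for the three faulty (minority) modules;
# two synthetic samples bring the minority count up to the majority's five.
faulty_modules = [(10.0, 2.0), (12.0, 3.0), (11.0, 2.5)]
synthetic = smote_like(faulty_modules, n_new=2)
```

In a full pipeline the top-ranked features would be kept, the balanced data passed to a classifier, and performance compared with and without each step; established libraries would normally replace these hand-written helpers.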

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).

Journal
International Journal of Computational Intelligence Systems
Volume-Issue
10 - 1
Pages
647 - 662
Publication Date
2017/01/25
ISSN (Online)
1875-6883
ISSN (Print)
1875-6891

Cite this article

TY  - JOUR
AU  - Chubato Wondaferaw Yohannese
AU  - Tianrui Li
PY  - 2017
DA  - 2017/01/25
TI  - A Combined-Learning Based Framework for Improved Software Fault Prediction
JO  - International Journal of Computational Intelligence Systems
SP  - 647
EP  - 662
VL  - 10
IS  - 1
SN  - 1875-6883
UR  - https://doi.org/10.2991/ijcis.2017.10.1.43
DO  - 10.2991/ijcis.2017.10.1.43
ID  - Yohannese2017
ER  -