International Journal of Computational Intelligence Systems

Volume 12, Issue 1, November 2018, Pages 282 - 298

An Empirical Study for Enhanced Software Defect Prediction Using a Learning-Based Framework

Authors
Kamal Bashir (1, 2), Tianrui Li (1, *), Chubato Wondaferaw Yohannese (1)
(1) School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
(2) Department of Information Technology, College of Computer Science and Information Technology, Karary University, Omdurman 12304, Sudan
(*) Corresponding author. Email: trli@swjtu.edu.cn

Received 25 March 2018, Accepted 11 January 2019, Available Online 28 January 2019.
DOI
10.2991/ijcis.2018.125905638
Keywords
Software defect prediction; Feature selection; Data sampling; Noise filtering
Abstract

The objective of software defect prediction (SDP) is to identify defect-prone modules. This is achieved by constructing prediction models from datasets obtained by mining historical software repositories. However, data mined from these repositories often suffer from high dimensionality, class imbalance, and mislabeled instances, which degrade classification performance and increase model complexity. To mitigate these effects, this paper proposes an integrated preprocessing framework in which feature selection (FS), data balancing (DB), and noise filtering (NF) techniques are combined to deal with the factors that degrade learning performance. We apply the proposed framework to three types of software metrics, namely static code metrics (SCM), object-oriented metrics (OOM), and combined metrics (CombM), and build models under four scenarios (S): (S1) original data; (S2) FS subsets; (S3) FS subsets after DB using random under-sampling (RUS) and the synthetic minority oversampling technique (SMOTE); and (S4) FS subsets after DB (RUS and SMOTE) and NF using the iterative partitioning filter (IPF) and iterative noise filtering based on the fusion of classifiers (INFFC). Empirical results show that (1) the integrated preprocessing of FS, DB, and NF improves the performance of all the models built for SDP; (2) for all FS methods, model performance improves progressively from S2 through S4 across all the software metrics; (3) model performance under S4 is statistically significantly better than under S3 for all the software metrics; and (4) achieving optimal model performance for SDP requires appropriate implementation of the proposed framework. The results also validate the effectiveness of our proposal and provide guidelines for obtaining quality training data that enhances model performance for SDP.
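
To make the DB step in scenarios S3 and S4 concrete, the following is a minimal pure-Python sketch of the two sampling techniques named in the abstract, RUS and SMOTE. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy data, the neighbour count `k`, and the fixed seeds are all hypothetical, and FS and NF (IPF, INFFC) are not shown.

```python
import math
import random


def random_under_sample(X, y, minority=1, seed=0):
    """RUS: randomly drop majority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    n_keep = min(len(min_idx), len(maj_idx))
    kept = sorted(min_idx + rng.sample(maj_idx, n_keep))
    return [X[i] for i in kept], [y[i] for i in kept]


def smote(X, y, minority=1, k=3, seed=0):
    """SMOTE-style oversampling: add synthetic minority rows by interpolating
    between a minority row and one of its k nearest minority neighbours.
    Assumes at least two minority rows and numeric features."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_rows)
    n_needed = max(n_majority - len(minority_rows), 0)
    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        # k nearest minority neighbours of the chosen row (Euclidean distance)
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: math.dist(base, r),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_new.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new


# Toy data (hypothetical): 3 defective modules (label 1), 7 clean modules (label 0)
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2],
     [5.0, 5.1], [5.2, 4.9], [4.8, 5.3], [5.1, 5.0],
     [4.9, 4.8], [5.3, 5.2], [5.0, 4.7]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

X_rus, y_rus = random_under_sample(X, y)
X_sm, y_sm = smote(X, y, k=2)

print(y_rus.count(0), y_rus.count(1))  # 3 3
print(y_sm.count(0), y_sm.count(1))    # 7 7
```

RUS balances the classes by discarding majority rows, so the training set shrinks. SMOTE balances them by adding interpolated minority rows, so it grows. In both cases, resampling would normally be applied to the training split only.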

Copyright
© 2019 The Authors. Published by Atlantis Press SARL.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

Journal
International Journal of Computational Intelligence Systems
Volume-Issue
12 - 1
Pages
282 - 298
Publication Date
2019/01/28
ISSN (Online)
1875-6883
ISSN (Print)
1875-6891

Cite this article

TY  - JOUR
AU  - Kamal Bashir
AU  - Tianrui Li
AU  - Chubato Wondaferaw Yohannese
PY  - 2019
DA  - 2019/01/28
TI  - An Empirical Study for Enhanced Software Defect Prediction Using a Learning-Based Framework
JO  - International Journal of Computational Intelligence Systems
SP  - 282
EP  - 298
VL  - 12
IS  - 1
SN  - 1875-6883
UR  - https://doi.org/10.2991/ijcis.2018.125905638
DO  - 10.2991/ijcis.2018.125905638
ID  - Bashir2019
ER  -