DocumentCode :
2840197
Title :
Exploring Software Quality Classification with a Wrapper-Based Feature Ranking Technique
Author :
Gao, Kehan ; Khoshgoftaar, Taghi ; Napolitano, Amri
Author_Institution :
Eastern Connecticut State Univ., Willimantic, CT, USA
fYear :
2009
fDate :
2-4 Nov. 2009
Firstpage :
67
Lastpage :
74
Abstract :
Feature selection is a process of selecting a subset of relevant features for building learning models. It is an important activity for data preprocessing used in software quality modeling and other data mining problems. Feature selection algorithms can be divided into two categories, feature ranking and feature subset selection. Feature ranking orders the features by a criterion and a user selects some of the features that are appropriate for a given scenario. Feature subset selection techniques search the space of possible feature subsets and evaluate the suitability of each. This paper investigates performance-metric-based feature ranking techniques by using the multilayer perceptron (MLP) learner with nine different performance metrics. The nine performance metrics include overall accuracy (OA), default F-measure (DFM), default geometric mean (DGM), default arithmetic mean (DAM), area under ROC (AUC), area under PRC (PRC), best F-measure (BFM), best geometric mean (BGM) and best arithmetic mean (BAM). The goal of the paper is to study the effect of the different performance metrics on the feature ranking results, which in turn influence the classification performance. We assessed the performance of the classification models constructed on those selected feature subsets through an empirical case study that was carried out on six data sets of real-world software systems. The results demonstrate that AUC, PRC, BFM, BGM and BAM as performance metrics for feature ranking outperformed the other performance metrics, OA, DFM, DGM and DAM, unanimously across all the data sets and therefore are recommended based on this study. In addition, the performances of the classification models were maintained or even improved when over 85 percent of the features were eliminated from the original data sets.
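The wrapper-based ranking idea described in the abstract can be illustrated with a minimal sketch: build one model per individual feature, score each model with a chosen performance metric, and order the features by that score. This is not the authors' implementation. It rests on several assumptions made for illustration: a one-dimensional logistic regression stands in for the MLP learner, the data are synthetic, models are scored on the training data rather than by cross-validation, and all names (`train_logistic_1d`, `rank_features`, `make_demo_data`) are hypothetical.

```python
import math
import random


def train_logistic_1d(xs, ys, epochs=200, lr=0.1):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent.

    Assumption: a stand-in for the MLP learner named in the abstract.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return lambda x: 1.0 / (1.0 + math.exp(-(w * x + b)))


def auc(scores, ys):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def overall_accuracy(scores, ys, threshold=0.5):
    """Fraction of correct predictions at a fixed (default) threshold."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, ys))
    return correct / len(ys)


def rank_features(X, y, metric):
    """Rank feature indices, best first, by the metric of a one-feature model."""
    results = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        model = train_logistic_1d(column, y)
        scores = [model(v) for v in column]
        results.append((metric(scores, y), j))
    return [j for _, j in sorted(results, key=lambda t: -t[0])]


def make_demo_data(seed=0, n_pos=30, n_neg=170):
    """Synthetic, imbalanced data: feature 0 is informative, feature 1
    weakly informative, feature 2 pure noise (all values are made up)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, count in ((1, n_pos), (0, n_neg)):
        for _ in range(count):
            X.append([
                rng.gauss(2.0 * label, 1.0),
                rng.gauss(0.5 * label, 1.0),
                rng.gauss(0.0, 1.0),
            ])
            y.append(label)
    return X, y


X, y = make_demo_data()
ranking = rank_features(X, y, auc)
print("Ranking by AUC:", ranking)
print("Ranking by OA: ", rank_features(X, y, overall_accuracy))
```

Passing a different metric function to `rank_features` changes only the scoring step, which is the design point the abstract studies: the same learner can produce different feature rankings depending on the metric used to evaluate it.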
Keywords :
data mining; learning (artificial intelligence); multilayer perceptrons; sensitivity analysis; software metrics; software quality; area under PRC; area under ROC; best F-measure; best arithmetic mean; best geometric mean; data mining; data preprocessing; default F-measure; default arithmetic mean; default geometric mean; feature selection; learning models; multilayer perceptron; overall accuracy; software quality classification; wrapper-based feature ranking technique; Arithmetic; Data mining; Data preprocessing; Design for manufacture; Magnesium compounds; Measurement; Multilayer perceptrons; Partial response channels; Software quality; Software systems; feature ranking technique; performance metric; software quality modeling;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Tools with Artificial Intelligence, 2009. ICTAI '09. 21st International Conference on
Conference_Location :
Newark, NJ
ISSN :
1082-3409
Print_ISBN :
978-1-4244-5619-2
Electronic_ISBN :
1082-3409
Type :
conf
DOI :
10.1109/ICTAI.2009.24
Filename :
5364717