DocumentCode :
237261
Title :
A Mutual Information-Based Hybrid Feature Selection Method for Software Cost Estimation Using Feature Clustering
Author :
Qin Liu ; Shihai Shi ; Hongming Zhu ; Jiakai Xiao
Author_Institution :
Sch. of Software Eng., Tongji Univ., Shanghai, China
fYear :
2014
fDate :
21-25 July 2014
Firstpage :
27
Lastpage :
32
Abstract :
Feature selection methods are designed to obtain the optimal feature subset from the original features to give the most accurate prediction. So far, supervised and unsupervised feature selection methods have been discussed and developed separately. However, for some data sets the two can be combined into a hybrid feature selection method. In this paper, we propose a mutual information-based (MI-based) hybrid feature selection method using feature clustering. In the unsupervised learning stage, the original features are grouped into several clusters by agglomerative hierarchical clustering, based on their similarity to each other. Then, in the supervised learning stage, the feature in each cluster that maximizes the similarity with the response feature, which represents the class label, is selected as the representative feature. These representative features compose the feature subset. Our contributions include (1) the newly proposed feature selection method and (2) the application of feature clustering to software cost estimation. The proposed method employs a wrapper approach, so it can evaluate the prediction performance of each feature subset to determine the optimal one. The experimental results in software cost estimation demonstrate that the proposed method outperforms the supervised feature selection methods INMIFS and mRMRFS by at least 11.5% and 14.8% on the ISBSG R8 and Desharnais data sets in terms of the PRED(0.25) value.
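The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes features are already discretized, uses plain mutual information as the similarity measure, and uses average linkage for the agglomerative step. The function names, the toy data, and the fixed cluster count k are all hypothetical. The wrapper stage, which would evaluate a predictor on each candidate subset, is omitted; only the PRED(0.25) metric it would rely on is shown, assuming the usual definition (the fraction of estimates within 25% of the actual value).

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def cluster_features(columns, k):
    """Unsupervised stage: merge feature names bottom-up until k clusters remain.

    Cluster similarity is the average pairwise MI (average linkage is an
    assumption of this sketch, not something the abstract specifies).
    """
    sim = {
        (a, b): mutual_information(columns[a], columns[b])
        for a in columns for b in columns if a != b
    }

    def linkage(c1, c2):
        return sum(sim[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in columns]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def select_representatives(columns, response, k):
    """Supervised stage: keep, per cluster, the feature with the highest MI to the response."""
    return [
        max(cluster, key=lambda f: mutual_information(columns[f], response))
        for cluster in cluster_features(columns, k)
    ]


def pred(actual, predicted, level=0.25):
    """PRED(level): fraction of estimates whose relative error is at most `level`."""
    hits = sum(abs(a - p) / a <= level for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical toy data: four discretized features and a discretized effort class.
columns = {
    "size":      [0, 0, 1, 1, 2, 2, 0, 1],
    "size_copy": [0, 0, 1, 1, 2, 2, 0, 1],
    "team":      [0, 1, 0, 1, 0, 1, 1, 0],
    "noise":     [1, 1, 1, 0, 0, 0, 0, 1],
}
effort = [0, 0, 1, 1, 2, 2, 0, 1]

subset = select_representatives(columns, effort, k=3)
print(subset)
```

In this toy example the redundant pair "size" and "size_copy" falls into one cluster, so only one of them reaches the selected subset. In a wrapper setting one would repeat the selection for several values of k and keep the subset that gives the best PRED(0.25) for the chosen estimator.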
Keywords :
feature selection; pattern clustering; software cost estimation; unsupervised learning; feature clustering; feature similarity; mutual information-based hybrid feature selection method; Cognition; Entropy; Estimation; Mutual information; Random variables; Redundancy; Software
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Software and Applications Conference (COMPSAC), 2014 IEEE 38th Annual
Conference_Location :
Vasteras
Type :
conf
DOI :
10.1109/COMPSAC.2014.99
Filename :
6899197