DocumentCode
1813686
Title
Robust ensemble feature selection for high dimensional data sets
Author
Ben Brahim, Afef ; Limam, Mohamed
Author_Institution
LARODEC, Univ. of Tunis, Tunis, Tunisia
fYear
2013
fDate
1-5 July 2013
Firstpage
151
Lastpage
157
Abstract
Feature selection is an important and frequently used data preprocessing technique for data mining on large-scale data sets. Several feature selection methods exist in the literature; each uses a specific feature evaluation criterion and may produce a different feature subset even when applied to the same data set. No one resulting subset is better than the others; each is the best subset of the whole feature space with respect to its own evaluation criterion. Finding a way to take advantage of different feature selection methods simultaneously is a challenging data mining problem. Recently, the concept of ensemble feature selection has been introduced to help solve this problem: multiple feature selections are combined in order to produce more robust feature subsets and better classification results. However, one of the most critical decisions when performing ensemble feature selection is the choice of aggregation technique for combining the feature lists produced by the multiple algorithms into a single decision for each feature. In this paper, we propose a robust feature aggregation technique to combine the results of three different filter methods. Our aggregation technique measures each feature selection algorithm's confidence and its conflict with the other algorithms in order to assign a reliability factor that guides the final feature selection. Experiments on high-dimensional data sets show that the proposed approach outperforms the single feature selection algorithms as well as two well-known aggregation methods in terms of classification performance.
Keywords
data mining; pattern classification; aggregation technique; confidence measurement; conflict measurement; data mining problem; data preprocessing; feature subsets; filter methods; high dimensional data sets; reliability factor; robust ensemble feature selection; Breast cancer; Data mining; Decision trees; Machine learning algorithms; Robustness; Support vector machines
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computing and Simulation (HPCS), 2013 International Conference on
Conference_Location
Helsinki
Print_ISBN
978-1-4799-0836-3
Type
conf
DOI
10.1109/HPCSim.2013.6641406
Filename
6641406