Title :
Comparison of hybrid feature selection models on gene expression data
Author :
Saengsiri, Patharawut ; Wichian, Sageemas Na ; Meesad, Phayung ; Herwig, Unger
Author_Institution :
Dept. of Inf. Technol., KMUTNB, Bangkok, Thailand
Abstract :
Microarray data contain expression levels for thousands of genes. However, most of these genes are not associated with cancer, and their presence leads to the curse of dimensionality. A central challenge with microarray data is therefore feature selection, which searches for subsets of informative genes. Current techniques focus on filter and wrapper approaches to discover such subsets. The filter approach is faster than the wrapper approach, whereas the wrapper approach achieves higher accuracy. It is more beneficial, however, to reduce processing time and increase accuracy simultaneously when searching for subsets of genes. This paper therefore compares hybrid feature selection models on gene expression datasets in four steps: 1) filter a subgroup of genes using Correlation-based Feature Selection (CFS), Gain Ratio (GR), and Information Gain (INFO); 2) pass the output of each filter method to a wrapper approach based on the Support Vector Machine (SVM) classifier and two heuristic searches, Greedy Search (GS) and Genetic Algorithm (GA); 3) generate the hybrid feature selection models CFSSVMGA, CFSSVMGS, GRSVMGA, GRSVMGS, INFOSVMGA, and INFOSVMGS; 4) compare performance using precision, recall, F-measure, and accuracy rate. The experimental results show that the CFSSVMGA model outperformed the other models on three public gene expression datasets.
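The filter-then-wrapper idea in steps 1 and 2 can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration. Information gain is computed on each gene binarized at its median. A nearest-centroid classifier with leave-one-out scoring stands in for the SVM. Greedy forward selection stands in for the paper's Greedy Search. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """Information gain of one gene, binarized at its median (an assumption)."""
    cut = sorted(column)[len(column) // 2]
    split = [v >= cut for v in column]
    gain = entropy(labels)
    for side in (True, False):
        subset = [y for s, y in zip(split, labels) if s == side]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def filter_step(X, y, k):
    """Step 1 (filter): keep the k genes with the highest information gain."""
    scores = [(information_gain([row[j] for row in X], y), j)
              for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, key=lambda t: (-t[0], t[1]))[:k]]

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    This classifier is a simple stand-in for the SVM used in the paper.
    """
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            if rows:
                centroids[label] = [sum(r[g] for r in rows) / len(rows)
                                    for g in genes]
        point = [X[i][g] for g in genes]
        pred = min(centroids, key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(point, centroids[c])))
        correct += pred == y[i]
    return correct / len(X)

def greedy_wrapper(X, y, candidates):
    """Step 2 (wrapper): greedy forward search over the filtered genes."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved, choice = False, None
        for g in candidates:
            if g in selected:
                continue
            score = loo_accuracy(X, y, selected + [g])
            if score > best:
                best, choice, improved = score, g, True
        if improved:
            selected.append(choice)
    return selected, best

# Hypothetical toy data: 6 samples x 4 genes; only gene 0 separates the classes.
X = [
    [5.0, 1.0, 0.30, 2.0],
    [5.2, 0.9, 0.10, 2.1],
    [4.9, 1.1, 0.40, 1.9],
    [1.0, 1.0, 0.20, 2.0],
    [1.1, 0.9, 0.35, 2.2],
    [0.9, 1.2, 0.15, 1.8],
]
y = ["tumor"] * 3 + ["normal"] * 3

filtered = filter_step(X, y, k=2)                 # cheap ranking over all genes
selected, accuracy = greedy_wrapper(X, y, filtered)  # costly search on survivors
print(filtered, selected, accuracy)
```

The design point is that the inexpensive filter shrinks the candidate set before the more expensive wrapper search runs, which is how a hybrid model aims to cut processing time without giving up the wrapper's accuracy.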
Keywords :
bioinformatics; feature extraction; filtering theory; genetic algorithms; greedy algorithms; pattern classification; support vector machines; correlation based feature selection; filter approach; gain ratio; gene expression dataset; genetic algorithm; greedy search; heuristic searches; hybrid feature selection models; information gain; microarray data; support vector machine classifier; time process reduction; wrapper approach; Accuracy; Cancer; Classification algorithms; Colon; Filtering algorithms; Gene expression; Support vector machines; feature selection; gene expression; support vector machine;
Conference_Title :
Knowledge Engineering, 2010 8th International Conference on ICT and
Conference_Location :
Bangkok
Print_ISBN :
978-1-4244-9874-1
Electronic_ISBN :
2157-0981
DOI :
10.1109/ICTKE.2010.5692905