Author :
Hua, Dong ; Chen, Dechang ; Youssef, Abdou
Abstract :
Gene selection with microarray data is an important task in the study of genomics. The goal is to identify the optimal subset of genes that achieves maximum discrimination power across samples (e.g., tumor types) with minimum redundancy among the selected genes. The problem is essentially NP-complete, so approximation algorithms are typically used, including individual ranking and sequential forward selection. Going from the input microarray data to the output set of selected genes typically involves multiple steps: preprocessing, discretization, discrimination modeling, redundancy modeling, optimization formulation, classification, and evaluation, with a number of options (techniques) available for each step. Putting these together, we introduce the concept of customization for gene selection in this paper: configuring the entire scenario so that various, possibly trivial, techniques work together to deliver superior performance, rather than focusing on one technique within a single step (e.g., discrimination power modeling). One configuration following the principle of simplicity is constructed in this paper, and experiments show that it identifies genes effectively.
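
To make the two approximation strategies named in the abstract concrete, the following is a minimal sketch, not the configuration constructed in the paper. It assumes expression values have already been discretized, and it assumes mutual information as the measure of both discrimination power (gene versus class label) and redundancy (gene versus gene), combined as relevance minus mean redundancy. The function names, the scoring rule, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def individual_ranking(genes, labels, k):
    """Rank genes by relevance alone; redundancy among genes is ignored."""
    relevance = {g: mutual_information(v, labels) for g, v in genes.items()}
    return sorted(genes, key=lambda g: (-relevance[g], g))[:k]


def forward_selection(genes, labels, k):
    """Greedy sequential forward selection.

    Assumed score (illustrative only): relevance to the labels minus the
    mean mutual information with the genes selected so far.
    """
    relevance = {g: mutual_information(v, labels) for g, v in genes.items()}
    selected, remaining = [], set(genes)
    while remaining and len(selected) < k:
        def score(g):
            if not selected:
                return relevance[g]
            redundancy = sum(
                mutual_information(genes[g], genes[s]) for s in selected
            ) / len(selected)
            return relevance[g] - redundancy
        # Iterate in sorted order so ties are broken deterministically by name.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 samples, 4 classes, 4 discretized genes.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
genes = {
    "gene_a": [0, 0, 0, 0, 1, 1, 1, 1],  # separates classes {0,1} from {2,3}
    "gene_b": [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate of gene_a
    "gene_c": [0, 0, 1, 1, 0, 0, 1, 1],  # complementary to gene_a
    "gene_d": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the labels
}

ranked = individual_ranking(genes, labels, 2)
chosen = forward_selection(genes, labels, 2)
print(ranked, chosen)
```

On this toy data, individual ranking returns the duplicated pair gene_a and gene_b, while forward selection replaces the duplicate with the complementary gene_c. This is the redundancy effect that the abstract's stated goal is meant to avoid.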