Title :
Multiclass Gene Selection Using Pareto-Fronts
Author :
Rajapakse, Jagath C. ; Mundra, Piyushkumar A.
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
Abstract :
Filter methods are often used to select genes for multiclass sample classification from microarray data. Such techniques tend to be biased toward a few classes that are easily distinguishable from the others, owing to imbalances in strong features and sample sizes across classes. This bias can lead to the selection of redundant genes while relevant genes are missed, resulting in poor classification of tissue samples. In this manuscript, we propose to decompose multiclass ranking statistics into class-specific statistics and then use Pareto-front analysis to select genes. This alleviates the bias induced by the intrinsic characteristics of dominating classes. The use of Pareto-front analysis is demonstrated on two filter criteria commonly used for gene selection: F-score and KW-score. Experiments with both synthetic and real benchmark data sets showed a significant improvement in classification performance and a reduction in redundancy among top-ranked genes.
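The two steps named in the abstract (class-specific statistics, then Pareto-front selection) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so the decomposition below is an assumption: each class contributes its own term of the one-way ANOVA between-class sum of squares, divided by the pooled within-class variance, so that the terms sum to (K - 1) times the usual F-statistic. The function names `class_specific_f_scores` and `pareto_front` are hypothetical and are not taken from the paper.

```python
import numpy as np


def class_specific_f_scores(X, y):
    """Per-class F-like scores (assumed decomposition, not the paper's own).

    X: (n_samples, n_genes) expression matrix; y: (n_samples,) class labels.
    Returns an (n_genes, n_classes) array. Genes with zero pooled
    within-class variance would cause a division by zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_samples, n_classes = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)

    between = np.empty((X.shape[1], n_classes))
    within_ss = np.zeros(X.shape[1])
    for j, c in enumerate(classes):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        # Contribution of class c to the between-class sum of squares.
        between[:, j] = Xc.shape[0] * (mean_c - grand_mean) ** 2
        within_ss += ((Xc - mean_c) ** 2).sum(axis=0)

    pooled_var = within_ss / (n_samples - n_classes)
    return between / pooled_var[:, None]


def pareto_front(scores):
    """Indices of non-dominated rows, treating every column as 'larger is better'.

    Row i is dominated if some other row is >= on every column and
    strictly > on at least one column.
    """
    scores = np.asarray(scores, dtype=float)
    front = []
    for i in range(scores.shape[0]):
        at_least_as_good = (scores >= scores[i]).all(axis=1)
        strictly_better = (scores > scores[i]).any(axis=1)
        if not (at_least_as_good & strictly_better).any():
            front.append(i)
    return front
```

Calling `pareto_front(class_specific_f_scores(X, y))` returns the genes that no other gene beats on every class at once. A gene that separates only a hard-to-distinguish class can therefore survive even when its aggregate score is low, which is the bias reduction the abstract describes. The pairwise comparison is quadratic in the number of genes, which is acceptable for a sketch but may need a faster non-dominated sort on full microarray data.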
Keywords :
Pareto analysis; biology computing; genetics; genomics; F-score; KW-score; Pareto-front analysis; filter methods; microarray data; multiclass gene selection; tissue samples; Benchmark testing; Bioinformatics; Cancer; Computational biology; Gene expression; Redundancy; Training; Aggregation statistics; gene selection; multiobjective evolutionary optimization; Algorithms; Computational Biology; Databases, Genetic; Gene Expression Profiling; Humans; Models, Genetic; Models, Statistical; Neoplasms; Statistics, Nonparametric
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2013.1