Title :
Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data
Author :
Shah, Mohak ; Marchand, Mario ; Corbeil, Jacques
Author_Institution :
Accenture, Chicago, IL, USA
Abstract :
One of the objectives of designing feature selection learning algorithms is to obtain classifiers that depend on a small number of attributes and have verifiable future performance guarantees. There are few, if any, approaches that successfully address the two goals simultaneously. To the best of our knowledge, algorithms that give theoretical bounds on future performance have not yet been proposed for the classification of gene expression data. In this work, we investigate the premise of learning a conjunction (or disjunction) of decision stumps in Occam's Razor, Sample Compression, and PAC-Bayes learning settings for identifying a small subset of attributes that can be used to perform reliable classification tasks. We apply the proposed approaches to gene identification from DNA microarray data and compare our results to those of the well-known successful approaches proposed for the task. We show that our algorithm not only finds hypotheses with a much smaller number of genes while giving competitive classification accuracy but also has tight risk guarantees on future performance, unlike other approaches. The proposed approaches are general and extensible in terms of both designing novel algorithms and application to other domains.
Keywords :
biology computing; learning (artificial intelligence); molecular biophysics; pattern classification; DNA microarray data; Occam Razor learning; PAC-Bayes learning; decision stump; feature selection; gene expression data classification; gene identification; microarray data learning; sample compression learning; Algorithm design and analysis; Bayesian methods; Classification algorithms; Data processing; Encoding; Pattern analysis; Upper bound; Microarray data classification; risk bounds
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
DOI :
10.1109/TPAMI.2011.82