Title :
Bayesian class discovery in microarray datasets
Author :
Roth, Volker ; Lange, Tilman
Author_Institution :
Inst. for Computational Sci., ETH Zurich, Switzerland
fDate :
5/1/2004
Abstract :
A novel approach to class discovery in gene expression datasets is presented. In the context of clinical diagnosis, the central goal of class discovery algorithms is to simultaneously find putative (sub-)types of diseases and to identify informative subsets of genes with disease-type-specific expression profiles. In contrast to many other approaches in the literature, the method presented implements a wrapper strategy for feature selection, in the sense that the features are selected directly by optimizing the discriminative power of the partitioning algorithm used. The usual combinatorial problems associated with wrapper approaches are overcome by a Bayesian inference mechanism. On the technical side, we present an efficient optimization algorithm with a guaranteed local convergence property. The only free parameter of the optimization method is selected by a resampling-based stability analysis. Experiments with leukemia and lymphoma datasets demonstrate that our method is able to correctly infer partitions and corresponding subsets of genes, both of which are relevant in a biological sense. Moreover, the frequently observed problem of ambiguities caused by different but equally high-scoring partitions is successfully overcome by the proposed model selection method.
Keywords :
Bayes methods; arrays; biology computing; cancer; data analysis; feature extraction; genetics; medical computing; patient diagnosis; Bayesian class discovery; Bayesian inference mechanism; class discovery algorithms; clinical diagnosis; disease-type specific expression profile; diseases; feature selection; gene expression datasets; leukemia; lymphoma; microarray datasets; model selection method; resampling-based stability analysis; wrapper strategy; Bayesian methods; Clinical diagnosis; Convergence; Diseases; Gene expression; Inference algorithms; Inference mechanisms; Optimization methods; Partitioning algorithms; Stability analysis; Algorithms; Bayes Theorem; Cluster Analysis; Databases, Nucleic Acid; Gene Expression Profiling; Genetic Screening; Humans; Leukemia; Models, Genetic; Models, Statistical; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity; Sequence Alignment; Sequence Analysis, DNA
Journal_Title :
Biomedical Engineering, IEEE Transactions on
DOI :
10.1109/TBME.2004.824139