DocumentCode :
1016389
Title :
Modeling and Visualizing Uncertainty in Gene Expression Clusters Using Dirichlet Process Mixtures
Author :
Rasmussen, Carl Edward ; de la Cruz, B.J. ; Ghahramani, Zoubin ; Wild, David L.
Author_Institution :
Dept. of Eng., Univ. of Cambridge, Cambridge, UK
Volume :
6
Issue :
4
fYear :
2009
Firstpage :
615
Lastpage :
628
Abstract :
Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study of the application of nonparametric Bayesian clustering methods to high-dimensional non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of their gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles, and these extend previously published cluster analyses of this data.
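The pairwise measure described in the abstract can be illustrated with a minimal sketch. It assumes that posterior samples of cluster assignments are already available (for example, from an MCMC sampler over the DPM, which is not shown here), and it assumes the dissimilarity is taken as one minus the co-clustering probability; the abstract does not state the exact form. The function names and the toy input are hypothetical.

```python
def coclustering_probability(samples):
    """Estimate P(gene i and gene j share a cluster) from posterior draws.

    `samples` is a list of label vectors, one per draw; samples[s][i] is the
    cluster label of gene i in draw s. Labels only need to be consistent
    within a single draw, so relabeling between draws does not matter.
    """
    n_genes = len(samples[0])
    counts = [[0] * n_genes for _ in range(n_genes)]
    for labels in samples:
        for i in range(n_genes):
            for j in range(n_genes):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    n_draws = len(samples)
    return [[c / n_draws for c in row] for row in counts]


def dissimilarity(prob):
    """Assumed form of the dissimilarity: 1 - co-clustering probability."""
    return [[1.0 - p for p in row] for row in prob]


# Hypothetical toy input: 4 posterior draws over 4 genes.
toy_samples = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 2, 2],
]
P = coclustering_probability(toy_samples)
D = dissimilarity(P)
```

A symmetric matrix such as `D`, with zeros on the diagonal, is the kind of input a standard linkage routine accepts; for instance, it could be condensed with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` to draw a dendrogram.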
Keywords :
Bayes methods; Gaussian distribution; genetics; molecular biophysics; molecular clusters; probability; Dirichlet process mixtures; Gaussian covariances; Rosetta compendium; bootstrap approach; clustering methods; gene expression clusters; hierarchical clustering; high-dimensional nontime series; nonparametric Bayesian alternative; standard linkage algorithms; uncertainty modeling; Bayesian methods; Biological system modeling; Clustering algorithms; Clustering methods; Couplings; Data analysis; Data visualization; Gene expression; Measurement standards; Uncertainty; Bioinformatics (genome or protein) databases; Clustering; Monte Carlo; association rules; biology and genetics; classification; statistical computing; stochastic processes; Algorithms; Artificial Intelligence; Bayes Theorem; Cluster Analysis; Computational Biology; Gene Expression Profiling; Models, Genetic; Models, Statistical; Monte Carlo Method; Multigene Family; Normal Distribution; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Sequence Alignment; Sequence Analysis, DNA; Software; Stochastic Processes
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2007.70269
Filename :
4407680