DocumentCode
1496133
Title
TCLUST: A Fast Method for Clustering Genome-Scale Expression Data
Author
Dost, Banu ; Wu, Chunlei ; Su, Andrew ; Bafna, Vineet
Author_Institution
Dept. of Comput. Sci. & Eng., Univ. of California, San Diego, CA, USA
Volume
8
Issue
3
fYear
2011
Firstpage
808
Lastpage
818
Abstract
Genes with a common function are often hypothesized to have correlated expression levels in mRNA expression data, motivating the development of clustering algorithms for gene expression data sets. We observe that existing approaches do not scale well for large data sets, and indeed did not converge for the data set considered here. We present a novel clustering method, TCLUST, that exploits coconnectedness to efficiently cluster large, sparse expression data. We compare our approach with two existing clustering methods, CAST and K-means, which have previously been applied to clustering of gene-expression data with good performance results. Using a number of metrics, TCLUST is shown to be superior to or at least competitive with the other methods, while being much faster. We have applied this clustering algorithm to a genome-scale gene-expression data set and used gene set enrichment analysis to discover highly significant biological clusters.
Keywords
bioinformatics; data analysis; genetics; genomics; molecular biophysics; TCLUST algorithm; data clustering; data set; genome-scale expression; k-means clustering; Algorithm design and analysis; Bioinformatics; Biological information theory; Clustering algorithms; Clustering methods; Current measurement; Error analysis; Gene expression; Genomics; Probes; Microarray expression; clustering; coconnectedness; graph algorithms; Algorithms; Animals; Cluster Analysis; Computer Simulation; Databases, Genetic; Gene Expression Profiling; Genomics; Mice; Mice, Inbred Strains; Models, Molecular; Oligonucleotide Array Sequence Analysis
fLanguage
English
Journal_Title
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
1545-5963
Type
jour
DOI
10.1109/TCBB.2010.34
Filename
5467032