Regional Information Center for Science and Technology (RICEST) - Hierarchical Clustering of High-Throughput Expression Data Based on General Dependences

DocumentCode :

86546

Title :

Hierarchical Clustering of High-Throughput Expression Data Based on General Dependences

Author :

Tianwei Yu ; Hesen Peng

Author_Institution :

Dept. of Biostat. & Bioinf., Emory Univ., Atlanta, GA, USA

Volume :

Issue :

Year :

2013

Date :

July-Aug. 2013

Firstpage :

1080

Lastpage :

1085

Abstract :

High-throughput expression technologies, such as gene expression arrays and liquid chromatography-mass spectrometry (LC-MS), measure thousands of features (genes or metabolites) on a continuous scale. In such data, both linear and nonlinear relations exist between features. Nonlinear relations can reflect critical regulation patterns in the biological system, but traditional clustering methods based on linear associations neither identify nor use them. Clustering based on general dependences, i.e., both linear and nonlinear relations, is hampered by the high dimensionality and high noise level of the data. We developed a sensitive nonparametric measure of general dependence between (groups of) random variables in high dimensions. Based on this dependence measure, we developed a hierarchical clustering method. In simulation studies, the method outperformed correlation- and mutual information (MI)-based hierarchical clustering methods in clustering features with nonlinear dependences. We applied the method to a microarray data set measuring gene expression in a cell-cycle time series to show that it generates biologically relevant results. The R code is available at http://userwww.service.emory.edu/~tyu8/GDHC.
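As a rough illustration of the idea of clustering features by a general (linear or nonlinear) dependence measure, here is a minimal Python sketch. The abstract does not specify the authors' dependence measure, so this sketch substitutes sample distance correlation as a stand-in; it is not the published method and not the authors' R code. All function names (`distance_correlation`, `average_linkage`, `cluster_features`) are hypothetical, and the naive average-linkage loop is an assumption chosen for brevity.

```python
import math


def _centered_distances(x):
    """Double-centered matrix of pairwise absolute differences for a 1-D sample."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation: a stand-in general dependence measure
    (an assumption of this sketch, not the measure from the paper)."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dvar = mean_product(a, a) * mean_product(b, b)
    if dvar <= 0.0:  # at least one variable is constant
        return 0.0
    return math.sqrt(max(mean_product(a, b), 0.0) / math.sqrt(dvar))


def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering with average linkage.
    dist is a symmetric matrix of pairwise dissimilarities."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
                d = total / (len(clusters[p]) * len(clusters[q]))
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]
    return [sorted(c) for c in clusters]


def cluster_features(features, n_clusters):
    """Cluster features (lists of measurements over the same samples)
    using 1 - distance correlation as the dissimilarity."""
    m = len(features)
    dist = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            dist[i][j] = dist[j][i] = 1.0 - distance_correlation(features[i], features[j])
    return average_linkage(dist, n_clusters)


if __name__ == "__main__":
    grid = [-1 + 2 * k / 20 for k in range(21)]
    features = [
        grid,
        [3 * v + 2 for v in grid],      # linear in grid
        [v * v for v in grid],          # nonlinear in grid
        [1 - 2 * v * v for v in grid],  # linear in grid squared
    ]
    print(cluster_features(features, 2))
```

On a grid that is symmetric about zero, the Pearson correlation between x and x squared is zero, while distance correlation is positive, which is the kind of nonlinear relation a purely linear similarity would miss. For real expression data with thousands of features, this quadratic-cost sketch would be far too slow; it is meant only to show the structure of the approach.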

Keywords :

bioinformatics; cellular biophysics; genetics; lab-on-a-chip; statistical analysis; LC-MS method; biological system critical regulation pattern; cell-cycle time series; correlation-based hierarchical clustering method; data high dimensionality effect; data high noise level effect; feature linear relation; feature nonlinear dependence clustering; feature nonlinear relation; gene expression array; gene expression measurement; general dependence sensitive nonparametric measure; high dimension random variable; high-throughput expression data; high-throughput expression technology; linear association; liquid chromatography-mass spectrometry; metabolite; microarray data set; mutual information-based hierarchical clustering method; simulation study; Bioinformatics; Clustering methods; Couplings; Noise; Random variables; Standards; Vectors; Algorithms; clustering; similarity measures

Language :

English

Journal_Title :

Computational Biology and Bioinformatics, IEEE/ACM Transactions on

Publisher :

IEEE

ISSN :

1545-5963

Type :

Journal article

DOI :

10.1109/TCBB.2013.99

Filename :

6582410

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=86546