Title :
A Novel Approach for Discovering Overlapping Clusters in Gene Expression Data
Author :
Ma, P.C.H. ; Chan, Keith C C
Author_Institution :
Dept. of Comput., Hong Kong Polytech. Univ., Hong Kong
fDate :
7/1/2009
Abstract :
Many existing clustering algorithms have been used to identify coexpressed genes in gene expression data. These algorithms mainly partition the data, in the sense that each gene is allowed to belong to only one cluster. Since proteins typically interact with different groups of proteins in order to serve different biological roles, the genes that produce these proteins are expected to coexpress with more than one group of genes. In other words, some genes are expected to belong to more than one cluster. This poses a challenge to gene expression data clustering, as overlapping clusters need to be discovered in a noisy environment. For this task, we propose in this paper an effective information-theoretic approach, which consists of an initial clustering phase and a second reclustering phase. The proposed approach has been tested with both simulated and real expression data. Experimental results show that it can improve the performance of existing clustering algorithms and can effectively uncover interesting patterns in noisy gene expression data, so that overlapping clusters can be discovered based on these patterns.
Keywords :
bioinformatics; genetics; molecular biophysics; proteins; gene expression data; noisy environment; overlapping clustering algorithm; proteins interaction; reclustering phase analysis; Biological information theory; Clustering algorithms; Data mining; Gene expression; Genetic communication; Information theory; Partitioning algorithms; Proteins; RNA; Testing; Working environment noise; Bioinformatics; data mining; gene expression data clustering; information theory; Algorithms; Cluster Analysis; Computer Simulation; Databases, Genetic; Gene Expression; Gene Expression Profiling; Information Theory; Models, Genetic; Models, Statistical; Yeasts
Journal_Title :
Biomedical Engineering, IEEE Transactions on
DOI :
10.1109/TBME.2009.2015055