Title :
Gene Expression Clustering: A Novel Graph Partitioning Approach
Author :
Chen, Yanhua ; Dong, Ming ; Rege, Manjeet
Author_Institution :
Wayne State Univ., Detroit
Abstract :
To help understand how genes are affected by different disease conditions in a biological system, clustering is typically performed on gene expression data. In this paper, we propose to solve the clustering problem using a graph-theoretical approach, and apply a novel graph partitioning model, isoperimetric graph partitioning (IGP), to group biological samples from gene expression data. The IGP algorithm has several advantages over the well-established spectral graph partitioning (SGP) model. First, IGP requires only the solution of a sparse system of linear equations instead of the eigen-problem in the SGP model. Second, IGP avoids degenerate cases produced by the spectral approach and thereby achieves a partition with higher accuracy. Moreover, we integrate unsupervised gene selection into the proposed approach through two-way ordering of gene expression data, so that irrelevant or redundant genes are eliminated and the clustering result improves. We evaluate our approach on several well-known problems involving gene expression profiles of colon cancer and leukemia subtypes. Our experimental results demonstrate that IGP consistently outperforms SGP and produces results closer to the original labeling of the sample sets provided by domain experts. Furthermore, the clustering accuracy improves significantly when IGP is integrated with unsupervised gene (feature) selection.
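Illustrative note: the abstract's first claim, that the partition comes from a linear system rather than an eigen-problem, can be sketched as below. This is a minimal sketch of the general isoperimetric formulation and not the authors' implementation. The function names, the choice of the highest-degree node as the "ground" node, the ratio-based threshold sweep, and the toy similarity matrix `W` are all assumptions made for illustration. A dense pure-Python solver stands in for the sparse solver a real data set would need.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Dense and O(n^3): a stand-in for a sparse solver, fine for tiny demos.
    """
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def isoperimetric_bipartition(W):
    """Split a connected graph into two groups.

    W is a symmetric similarity matrix with a zero diagonal (assumed input).
    Returns a list of 0/1 labels, one per node.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Assumption: ground the node with the largest degree.
    ground = max(range(n), key=lambda i: deg[i])
    keep = [i for i in range(n) if i != ground]
    # Reduced Laplacian L0 = D - W with the ground row and column removed.
    L0 = [[(deg[i] if i == j else 0.0) - W[i][j] for j in keep] for i in keep]
    d0 = [deg[i] for i in keep]
    x0 = solve_linear(L0, d0)  # the linear system L0 x0 = d0
    x = [0.0] * n              # the ground node keeps potential 0
    for i, v in zip(keep, x0):
        x[i] = v
    # Sweep thresholds over the sorted potentials and keep the cut with the
    # smallest ratio cut(S) / min(vol(S), vol(complement of S)).
    order = sorted(range(n), key=lambda i: x[i])
    total = sum(deg)
    best_ratio, best_k = float("inf"), 1
    for k in range(1, n):
        S = set(order[:k])
        cut = sum(W[i][j] for i in S for j in range(n) if j not in S)
        vol = sum(deg[i] for i in S)
        ratio = cut / min(vol, total - vol)
        if ratio < best_ratio:
            best_ratio, best_k = ratio, k
    S = set(order[:best_k])
    return [0 if i in S else 1 for i in range(n)]


# Toy example: two tight groups of three "samples" joined by one weak link.
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
labels = isoperimetric_bipartition(W)
print(labels)
```

On this toy graph the weak link between nodes 2 and 3 is the cut that the sweep selects, so the two triangles receive different labels. In a gene expression setting the nodes would be samples and `W` would hold pairwise similarities; how the paper builds those weights is not stated in the abstract.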
Keywords :
biology computing; data analysis; diseases; eigenvalues and eigenfunctions; genetics; graph theory; pattern clustering; sparse matrices; unsupervised learning; biological system; disease; eigen-problem; gene expression data clustering; isoperimetric graph partitioning; linear equation; sparse system; spectral graph partitioning; unsupervised gene selection; Biological system modeling; Biological systems; Clustering algorithms; Colon; Data analysis; Diseases; Equations; Gene expression; Partitioning algorithms; Performance analysis;
Conference_Title :
Neural Networks, 2007. IJCNN 2007. International Joint Conference on
Conference_Location :
Orlando, FL
Print_ISBN :
978-1-4244-1379-9
Electronic_ISBN :
1098-7576
DOI :
10.1109/IJCNN.2007.4371187