DocumentCode :
952365
Title :
Gene Expression Data Analysis Using a Novel Approach to Biclustering Combining Discrete and Continuous Data
Author :
Christinat, Yann ; Wachmann, Bernd ; Zhang, Lei
Author_Institution :
Lab. for Comput. Biol. & Bioinf., Ecole Polytech. Fed. de Lausanne, Lausanne
Volume :
5
Issue :
4
fYear :
2008
Firstpage :
583
Lastpage :
593
Abstract :
Many different methods exist for pattern detection in gene expression data. In contrast to classical methods, biclustering has the ability to cluster a group of genes together with a group of conditions (replicates, set of patients or drug compounds). However, since the problem is NP-complex, most algorithms use heuristic search functions and therefore might converge towards local maxima. By using the results of biclustering on discrete data as a starting point for a local search function on continuous data, our algorithm avoids the problem of heuristic initialization. Similar to OPSM, our algorithm aims to detect biclusters whose rows and columns can be ordered such that row values are growing across the bicluster's columns and vice-versa. Results have been generated on the yeast genome (Saccharomyces cerevisiae), a human cancer dataset and random data. Results on the yeast genome showed that 89% of the one hundred biggest non-overlapping biclusters were enriched with Gene Ontology annotations. A comparison with OPSM and ISA demonstrated a better efficiency when using gene and condition orders. We present results on random and real datasets that show the ability of our algorithm to capture statistically significant and biologically relevant biclusters.
Keywords :
DNA; arrays; biochemistry; cancer; data analysis; data mining; genetics; medical computing; microorganisms; ontologies (artificial intelligence); pattern clustering; statistical analysis; tumours; DNA microarray analysis; Gene Ontology annotations; NP-complex problem; Saccharomyces cerevisiae; biclustering approach; continuous data; data mining; discrete data; gene expression data analysis; human cancer dataset; local search function; order-preserving submatrices; pattern detection; random data; yeast genome; Bioinformatics (genome or protein) databases; Data and knowledge visualization; Data mining; Graph and tree search strategies; Machine learning; Algorithms; Cluster Analysis; Data Interpretation, Statistical; Gene Expression Profiling; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Proteome; Signal Transduction;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2007.70251
Filename :
4359900