Title :
A Biclustering Algorithm to Discover Functional Modules from ENCODE ChIP-Seq Data
Author :
Wu, Chao ; Bakshi, Ankita ; Aronow, Bruce ; Jegga, Anil ; Bhatnagar, Rohit
Author_Institution :
Dept. of Comput. Sci., Univ. of Cincinnati, Cincinnati, OH, USA
Abstract :
A number of biclustering algorithms have been introduced to discover local gene expression patterns in microarray data. In addition, high-throughput biological techniques such as ChIP-seq have generated massive genome-wide data and offer ideal opportunities where biclustering can help unveil underlying biological mechanisms. Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) has been used to identify how transcription factors (TFs) and other chromatin-associated proteins influence binding mechanisms. In these data, peaks indicate binding events, with possible binding locations and their strengths. It is essential that the values associated with the peaks in a selected bicluster be as close to each other as possible. Here we present a novel framework capable of finding statistically significant biclusters in this type of real-valued dataset. An ideal bicluster should contain similar values with very low variance. We applied our algorithm to ChIP-seq datasets recently released by the ENCODE project and uncovered meaningful biclusters of genes and TFs, which can be interpreted as local combinatorial regulation patterns of TFs. We also compared our proposed method with several competing biclustering algorithms and show that it outperforms them in unveiling this type of pattern.
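The low-variance criterion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; it only scores a candidate bicluster (a subset of rows and columns) by the variance of its values. The function name `bicluster_variance` and the toy gene-by-TF matrix are assumptions made for illustration, and the numbers are invented rather than taken from ENCODE.

```python
import numpy as np


def bicluster_variance(data, rows, cols):
    """Population variance of the values in the submatrix data[rows, cols].

    Hypothetical helper for illustration only; a lower value means the
    selected genes (rows) and TFs (columns) share more similar peak scores.
    """
    sub = data[np.ix_(rows, cols)]  # select the row/column cross product
    return float(sub.var())


# Toy gene-by-TF matrix of peak scores (made-up values, not ENCODE data).
scores = np.array([
    [5.0, 5.1, 0.2, 9.0],
    [4.9, 5.0, 7.5, 0.1],
    [5.1, 4.9, 3.3, 6.2],
    [0.3, 8.8, 1.0, 2.4],
])

# Genes 0-2 under TFs 0-1 have nearly identical values: a tight bicluster.
tight = bicluster_variance(scores, [0, 1, 2], [0, 1])

# Genes 0, 1, 3 under TFs 2-3 have scattered values: a poor candidate.
loose = bicluster_variance(scores, [0, 1, 3], [2, 3])

print(f"tight: {tight:.4f}, loose: {loose:.4f}")
```

A search procedure would compare many such candidates and keep those with low variance; how candidates are generated and how statistical significance is assessed is left open here, since the abstract gives no details.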
Keywords :
DNA; biology computing; data analysis; genetics; genomics; pattern clustering; ENCODE ChIP-seq data; TF; biclustering algorithms; binding events; binding locations; binding mechanisms; chromatin immunoprecipitation; chromatin-associated proteins; encyclopedia of DNA elements; functional module discovery; high-throughput biological techniques; local combinatorial regulation patterns; local gene expression patterns; massively parallel sequencing; microarray data; real-valued datasets; statistically significant biclusters; transcription factors; Algorithm design and analysis; Clustering algorithms; Gene expression; Measurement; Sociology; Standards; Statistics; Biclustering; ChIP-seq analysis; combinatorial regulation of TFs;
Conference_Title :
Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4799-3143-9
DOI :
10.1109/ICDMW.2013.96