Title :
An information theoretic approach for the discovery of irregular and repetitive patterns in genomic data
Author :
Davis, Willard ; Kalyanaraman, Ananth ; Cook, Diane
Author_Institution :
Sch. of Electr. Eng. & Comput. Sci., Washington State Univ., Pullman, WA
Abstract :
The unprecedented rate at which genomic data is accumulated underscores the need to develop highly efficient analytical capabilities. Traditionally, most post-sequencing effort has focused on the identification and annotation of genes along with their promoters and regulatory elements. However, much of the vast sequence space outside the gene space remains unexplored because of a lack of appropriate computational tools. Here, we propose a new approach for exploring and describing a genome without biasing the search process towards already known structural entities. Our primary objective is to discover novel conserved patterns that would typically fall outside the scope of current repeat-finding tools because of irregularities in their structure. The output is a hierarchy of patterns with arbitrary structural characteristics. A hierarchical representation captures the genomic sequence content at an abstract level and offers novel ways to examine the information contained in it. Our approach is an information-theoretic search process that uses pattern matching techniques for processing the sequence data. Preliminary evaluation on the Drosophila genome has uncovered a number of irregular patterns. Discovering new patterns is an important problem in both whole-genome and comparative genomics applications. The proposed approach can provide an information-theoretic framework for conducting pattern and knowledge discovery on genomic data.
Keywords :
bioinformatics; cellular biophysics; data mining; genetics; information theory; molecular biophysics; Drosophila genome; genes; genomic data; genomic sequence content; irregular patterns; knowledge discovery; pattern matching; repetitive patterns; Biological information theory; Diseases; Evolution (biology); Frequency; Genomics; Humans; Scattering; Sequences; Software tools
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08. IEEE Symposium on
Conference_Location :
Sun Valley, ID
Print_ISBN :
978-1-4244-1778-0
Electronic_ISBN :
978-1-4244-1779-7
DOI :
10.1109/CIBCB.2008.4675756