DocumentCode
3394270
Title
An information theoretic approach for the discovery of irregular and repetitive patterns in genomic data
Author
Davis, Willard ; Kalyanaraman, Ananth ; Cook, Diane
Author_Institution
Sch. of Electr. Eng. & Comput. Sci., Washington State Univ., Pullman, WA
fYear
2008
fDate
15-17 Sept. 2008
Firstpage
30
Lastpage
37
Abstract
The unprecedented rate at which genomic data is accumulated underscores the need to develop highly efficient analytical capabilities. Traditionally, most post-sequencing effort has focused on the identification and annotation of genes along with their promoters and regulatory elements. However, much of the vast region outside the gene space remains unexplored because of a lack of appropriate computational tools. Here, we propose a new approach for exploring and describing a genome without biasing the search process towards already known structural entities. Our primary objective is to discover novel conserved patterns that would typically fall outside the scope of the current suite of repeat-finding tools because of irregularities in their structure. The output is a hierarchy of patterns with arbitrary structural characteristics. A hierarchical representation captures the genomic sequence content at an abstract level and offers novel ways to examine the information contained in it. Our approach is an information-theoretic search process that uses pattern matching techniques to process the sequence data. Preliminary evaluation on the Drosophila genome has uncovered a number of irregular patterns. Discovering new patterns is an important problem in both whole-genome and comparative genomic application domains. The proposed approach can provide an information-theoretic framework for conducting pattern and knowledge discovery on genomic data.
Keywords
bioinformatics; cellular biophysics; data mining; genetics; information theory; molecular biophysics; Drosophila genome; genes; genomic data; genomic sequence content; information theory; irregular patterns; knowledge discovery; pattern matching; repetitive patterns; Bioinformatics; Biological information theory; Diseases; Evolution (biology); Frequency; Genomics; Humans; Scattering; Sequences; Software tools;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08. IEEE Symposium on
Conference_Location
Sun Valley, ID
Print_ISBN
978-1-4244-1778-0
Electronic_ISBN
978-1-4244-1779-7
Type
conf
DOI
10.1109/CIBCB.2008.4675756
Filename
4675756