• DocumentCode
    3394270
  • Title
    An information theoretic approach for the discovery of irregular and repetitive patterns in genomic data
  • Author
    Davis, Willard ; Kalyanaraman, Ananth ; Cook, Diane
  • Author_Institution
    Sch. of Electr. Eng. & Comput. Sci., Washington State Univ., Pullman, WA
  • fYear
    2008
  • fDate
    15-17 Sept. 2008
  • Firstpage
    30
  • Lastpage
    37
  • Abstract
    The unprecedented rate at which genomic data is accumulated underscores the need to develop highly efficient analytical capabilities. Traditionally, most post-sequencing effort has focused on the identification and annotation of genes along with their promoters and regulatory elements. However, much of the vast region outside the gene-space remains unexplored because of a lack of appropriate computational tools. Here, we propose a new approach for exploring and describing a genome without biasing the search process towards already known structural entities. Our primary objective is to discover novel conserved patterns that would typically fall outside the scope of the current suite of repeat-finding tools because of irregularities in their structure. The output is a hierarchy of patterns with arbitrary structural characteristics. A hierarchical representation captures the genomic sequence content at an abstract level and offers novel ways to examine the information it contains. Our approach is an information-theoretic search process that uses pattern matching techniques for processing the sequence data. Preliminary evaluation on the Drosophila genome has identified a number of irregular patterns. Discovering new patterns is an important problem in both whole-genome and comparative genomics application domains. The proposed approach can provide an information-theoretic framework for conducting pattern and knowledge discovery on genomic data.
  • Keywords
    bioinformatics; cellular biophysics; data mining; genetics; information theory; molecular biophysics; Drosophila genome; genes; genomic data; genomic sequence content; irregular patterns; knowledge discovery; pattern matching; repetitive patterns; Bioinformatics; Biological information theory; Diseases; Evolution (biology); Frequency; Genomics; Humans; Scattering; Sequences; Software tools
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08. IEEE Symposium on
  • Conference_Location
    Sun Valley, ID
  • Print_ISBN
    978-1-4244-1778-0
  • Electronic_ISBN
    978-1-4244-1779-7
  • Type
    conf
  • DOI
    10.1109/CIBCB.2008.4675756
  • Filename
    4675756