  • DocumentCode
    2341623
  • Title
    DNA sequence compression using the Burrows-Wheeler Transform
  • Author
    Adjeroh, Don ; Zhang, Yong ; Mukherjee, Amar ; Powell, Matt ; Bell, Tim
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Central Florida, Orlando, FL, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    303
  • Lastpage
    313
  • Abstract
    We investigate off-line dictionary-oriented approaches to DNA sequence compression based on the Burrows-Wheeler Transform (BWT). The preponderance of short repeating patterns is an important phenomenon in biological sequences. Here, we propose off-line methods to compress DNA sequences that exploit the different repetition structures inherent in such sequences. Repetition analysis is performed using the relationship between the BWT and important pattern matching data structures, such as the suffix tree and the suffix array. We discuss how the proposed approach can be incorporated into the BWT compression pipeline.
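    The record gives no code. As a rough illustration of the BWT/suffix-array link the abstract relies on (not the authors' method), here is a minimal Python sketch. It assumes a naive suffix-array construction and a `$` sentinel that does not occur in the input and sorts before A, C, G and T. The function names `suffix_array`, `bwt_from_suffix_array` and `longest_repeat` are hypothetical. Each BWT symbol is the character that cyclically precedes the corresponding sorted suffix. The longest repeated substring is the largest longest-common-prefix (LCP) between suffixes that are adjacent in sorted order, which is one simple form of repetition analysis over a suffix array.

```python
def suffix_array(s):
    # Naive construction: sort suffix start positions lexicographically.
    # Fine for illustration; real tools use linear or near-linear algorithms.
    return sorted(range(len(s)), key=lambda i: s[i:])


def bwt_from_suffix_array(s, sentinel="$"):
    # Assumption: the sentinel is absent from s and sorts before all other symbols.
    t = s + sentinel
    sa = suffix_array(t)
    # The BWT symbol for each sorted suffix is the character just before it;
    # t[i - 1] with i == 0 wraps around to the sentinel at the end.
    return "".join(t[i - 1] for i in sa)


def longest_repeat(s):
    # Longest substring occurring at least twice: the maximum LCP of
    # suffixes that are neighbours in sorted (suffix-array) order.
    sa = suffix_array(s)
    best = ""
    for a, b in zip(sa, sa[1:]):
        k = 0
        while a + k < len(s) and b + k < len(s) and s[a + k] == s[b + k]:
            k += 1
        if k > len(best):
            best = s[a:a + k]
    return best


if __name__ == "__main__":
    dna = "ACGTACGTTT"
    print(bwt_from_suffix_array(dna))
    print(longest_repeat(dna))
```

    For the toy sequence "ACGTACGTTT", `longest_repeat` returns "ACGT". Sorting whole suffix slices costs O(n^2 log n) time, so this sketch only illustrates the relationship; it is not suitable for genome-scale input.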
  • Keywords
    DNA; biology computing; data compression; data structures; medical computing; pattern matching; sequences; transforms; Burrows-Wheeler Transform; DNA sequence compression; biological sequences; offline dictionary; pattern matching data structures; repetition structures; short repeating patterns; suffix array; suffix tree; Bioinformatics; Computer science; DNA; Databases; Genomics; Humans; Organisms; Proteins; RNA; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society
  • Print_ISBN
    0-7695-1653-X
  • Type
    conf
  • DOI
    10.1109/CSB.2002.1039352
  • Filename
    1039352