• DocumentCode
    3036684
  • Title

    Bi-Directional Context Modeling with Combinatorial Structuring for Genome Sequence Compression

  • Author

    Wenrui Dai ; Hongkai Xiong

  • Author_Institution
    Dept. of Electron. Eng., Shanghai Jiao Tong Univ., Shanghai, China
  • fYear
    2015
  • fDate
    7-9 April 2015
  • Firstpage
    442
  • Lastpage
    442
  • Abstract
    Summary form only given. This paper proposes a bi-directional context modeling (BCM) technique for reference-free genome sequence compression, which constructs its contexts by combining arbitrary predicted symbols in two directions corresponding to approximate repeats and non-repeat regions. Thus, BCM can sequentially predict DNA sequences with weighted conditional probabilities that simultaneously exploit the correlations among matched approximate repeats and fit the variable-order statistics in non-repeat regions. Moreover, BCM eliminates the overhead of pointer information for specifying approximate repeats, because the model is synchronized between the encoder and the decoder. In theory, we show that the upper bounds on the excess model redundancy incurred by BCM vanish as the sequence size grows. Experimental results show that BCM outperforms state-of-the-art reference-free compressors such as FCM and CTW+LZ.
  • Keywords
    DNA; biology computing; genomics; probability; statistics; BCM technique; DNA sequence prediction; approximate repeats; bi-directional context modeling; combinatorial structuring; decoder; encoder; excess model redundancy; nonrepeat region; reference-free genome sequence compression; variable-order statistics; weighted conditional probabilities; Bidirectional control; Bioinformatics; Context; Context modeling; DNA; Encoding; Genomics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference (DCC), 2015
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Type

    conf

  • DOI
    10.1109/DCC.2015.67
  • Filename
    7149305