• DocumentCode
    838859
  • Title

    Symbol-Based Modeling and Coding of Block Markov Sources

  • Author

    Nagy, Dániel A. ; Gyorgy, Andras ; Linder, Tamás

  • Author_Institution
    Dept. of Math. & Stat., Queen´´s Univ., Kingston, Ont.
  • Volume
    52
  • Issue
    12
  • fYear
    2006
  • Firstpage
    5570
  • Lastpage
    5578
  • Abstract
    Industry-standard lossless compression algorithms (such as LZW) are usually implemented so that they work on bytes as symbols. Experiments indicate that data for which bytes are not the natural choice of symbols compress poorly using these implementations, while algorithms working on a bit level perform reasonably on byte-based data in addition to having computational advantages resulting from operating on a small alphabet. In this correspondence, we offer an information-theoretic explanation to these experimental results by assessing the redundancy (which is approximated by the divergence rate of two source distributions) of a bit-based model when applied to a byte-based source. More specifically, we study the problem of approximating a block Markov source (our model for byte-based data) with higher order Markov sources (which model bit-based Markov encoders), and show that the divergence rate between a block Markov source and the best matching higher order Markov model for that source converges to zero exponentially fast as the memory of the model increases. This result is applied to obtain bounds on the redundancy of certain symbol-based universal codes when they are used for byte-aligned sources
  • Keywords
    Markov processes; block codes; redundancy; source coding; block Markov sources coding; industry-standard lossless compression algorithm; redundancy; symbol-based modeling; Compression algorithms; Councils; Data compression; Entropy; Genetics; Mathematics; Proteins; Redundancy; Scholarships; Statistics; Binary codes; block Markov sources; byte-aligned sources; higher order Markov modeling; lossless coding;
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0018-9448
  • Type

    jour

  • DOI
    10.1109/TIT.2006.885539
  • Filename
    4016311