• DocumentCode
    3437840
  • Title
    Near-lossless compression of large alphabet sources
  • Author
    Kelly, Benjamin G. ; Wagner, Aaron B.
  • Author_Institution
    GE Global Res., Knowledge Discovery Lab., Niskayuna, NY, USA
  • fYear
    2012
  • fDate
    21-23 March 2012
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    We study universal compression of independent and identically distributed sources over large alphabets using fixed-rate codes. To model large alphabets, we use sequences of discrete alphabets that increase in size with the blocklength. We show that universal compression is possible using deterministic codes provided that the alphabet growth is sub-linear in the blocklength. For linear alphabet growth, we show that universal compression is not possible, even if the use of randomized encoders and decoders is permitted. However, if only the decoder is provided with the source distribution, then randomized universal coding is always possible for any growth rate. For the non-universal case in which the goal is to compress a source generated by a known sequence of distributions, we show that compression at the entropy of the source sequence is possible if and only if the ratio of the squared logarithm of the alphabet size to the blocklength goes to zero.
  • Keywords
    data compression; decoding; encoding; entropy; deterministic codes; discrete alphabet sequence; distributed sources; distribution sequence; fixed-rate codes; large alphabet modeling; large alphabet sources; large alphabets; linear alphabet growth; near-lossless compression; randomized decoders; randomized encoders; randomized universal coding; source distribution; square logarithm; sub-linear; universal compression; Error probability; Indexes; Random variables;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Sciences and Systems (CISS), 2012 46th Annual Conference on
  • Conference_Location
    Princeton, NJ
  • Print_ISBN
    978-1-4673-3139-5
  • Electronic_ISBN
    978-1-4673-3138-8
  • Type
    conf
  • DOI
    10.1109/CISS.2012.6310917
  • Filename
    6310917