DocumentCode
3437840
Title
Near-lossless compression of large alphabet sources
Author
Kelly, Benjamin G. ; Wagner, Aaron B.
Author_Institution
GE Global Res., Knowledge Discovery Lab., Niskayuna, NY, USA
fYear
2012
fDate
21-23 March 2012
Firstpage
1
Lastpage
6
Abstract
We study universal compression of independent and identically distributed sources over large alphabets using fixed-rate codes. To model large alphabets, we use sequences of discrete alphabets that increase in size with the blocklength. We show that universal compression is possible using deterministic codes provided that the alphabet growth is sub-linear in the blocklength. For linear alphabet growth, we show that universal compression is not possible, even if the use of randomized encoders and decoders is permitted. However, if only the decoder is provided with the source distribution, then randomized universal coding is always possible for any growth rate. For the non-universal case in which the goal is to compress a source generated by a known sequence of distributions, we show that compression at the entropy of the source sequence is possible if and only if the ratio of the square logarithm of the alphabet size to the blocklength goes to zero.
Keywords
data compression; decoding; encoding; entropy; deterministic codes; discrete alphabet sequence; distributed sources; distribution sequence; entropy; fixed-rate codes; large alphabet modeling; large alphabet sources; large alphabets; linear alphabet growth; near-lossless compression; randomized decoders; randomized encoders; randomized universal coding; source distribution; square logarithm; sub-linear; universal compression; Decoding; Encoding; Entropy; Error probability; Indexes; Manganese; Random variables;
fLanguage
English
Publisher
ieee
Conference_Titel
Information Sciences and Systems (CISS), 2012 46th Annual Conference on
Conference_Location
Princeton, NJ
Print_ISBN
978-1-4673-3139-5
Electronic_ISBN
978-1-4673-3138-8
Type
conf
DOI
10.1109/CISS.2012.6310917
Filename
6310917
Link To Document