DocumentCode :
1010468
Title :
Universal compression of memoryless sources over unknown alphabets
Author :
Orlitsky, Alon ; Santhanam, Narayana P. ; Zhang, Junan
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of California, La Jolla, CA, USA
Volume :
50
Issue :
7
fYear :
2004
fDate :
7/1/2004
Firstpage :
1469
Lastpage :
1481
Abstract :
It has long been known that the compression redundancy of independent and identically distributed (i.i.d.) strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols, and its pattern, the order in which the symbols appear. Concentrating on the latter, we show that the patterns of i.i.d. strings over all, including infinite and even unknown, alphabets, can be compressed with diminishing redundancy, both in block and sequentially, and that the compression can be performed in linear time. To establish these results, we show that the number of patterns is the Bell number, that the number of patterns with a given number of symbols is the Stirling number of the second kind, and that the redundancy of patterns can be bounded using results of Hardy and Ramanujan on the number of integer partitions. The results also imply an asymptotically optimal solution for the Good-Turing probability-estimation problem.
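As an illustration of the counting claims in the abstract, the following is a minimal sketch, not the authors' compression method. It assumes the pattern of a string is obtained by replacing each symbol with the order of its first appearance, and it checks by brute force, for small lengths only, that the number of patterns matches the Bell number and that the number of patterns with a given number of distinct symbols matches the Stirling number of the second kind. The function names `pattern`, `count_patterns`, and `bell` are hypothetical and chosen only for this example.

```python
from itertools import product


def pattern(s):
    """Replace each symbol by the 1-based order of its first appearance."""
    index = {}
    out = []
    for ch in s:
        if ch not in index:
            index[ch] = len(index) + 1
        out.append(index[ch])
    return tuple(out)


def count_patterns(n, m=None):
    """Brute-force count of distinct patterns of length n (small n only).

    If m is given, count only patterns with exactly m distinct symbols.
    An alphabet of n symbols suffices, since a length-n string contains
    at most n distinct symbols.
    """
    patterns = {pattern(w) for w in product(range(n), repeat=n)}
    if m is not None:
        patterns = {p for p in patterns if max(p) == m}
    return len(patterns)


def bell(n):
    """Bell number B(n), computed with the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[-1]


if __name__ == "__main__":
    print(pattern("abracadabra"))
    for n in range(1, 6):
        print(n, count_patterns(n), bell(n))
```

The brute-force enumeration grows as n^n, so it is only a sanity check for very small n; it says nothing about the linear-time compression the abstract refers to.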
Keywords :
data compression; redundancy; source coding; Bell number; Good-Turing probability-estimation problem; Stirling number; asymptotically optimal solution; compression redundancy; i.i.d.; independent and identically distributed strings; integer partitions; memoryless sources; pattern number; universal compression; unknown alphabets; Computer science; Encoding; Entropy; H infinity control; Image coding; Information theory; Pixel; Probability distribution;
fLanguage :
English
Journal_Title :
Information Theory, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9448
Type :
jour
DOI :
10.1109/TIT.2004.830761
Filename :
1306545