Title :
Compressing multisets using tries
Author :
Gripon, Vincent ; Rabbat, Michael ; Skachek, Vitaly ; Gross, Warren J.
Author_Institution :
Dept. of Electr. & Comput. Eng., McGill Univ., Montreal, QC, Canada
Abstract :
We consider the problem of efficient and lossless representation of a multiset of m words drawn with repetition from a set of size 2^n. One expects that encoding the (unordered) multiset should lead to significant savings in rate as compared to encoding an (ordered) sequence with the same words, since information about the order of words in the sequence corresponds to a permutation. We propose and analyze a practical multiset encoder/decoder based on the trie data structure. The act of encoding requires O(m(n + log m)) operations, and decoding requires O(mn) operations. Of particular interest is the case where cardinality of the multiset scales as m = (1/c) 2^n for some c > 1, as n → ∞. Under this scaling, and when the words in the multiset are drawn independently and uniformly, we show that the proposed encoding leads to an arbitrary improvement in rate over encoding an ordered sequence with the same words. Moreover, the expected length of the proposed codes in this setting is asymptotically within a constant factor of 5/3 of the lower bound.
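The rate saving described in the abstract can be illustrated with a small counting sketch. It is not the trie-based encoder of the paper. It only compares the cost of an ordered sequence of m words of n bits each (m·n bits) with the standard counting lower bound for a multiset, log2 of the number of size-m multisets over 2^n symbols, which by the stars-and-bars argument is C(2^n + m - 1, m). The function names and the sample values n = 10, c = 2 are illustrative assumptions, not taken from the paper.

```python
import math


def ordered_sequence_bits(n: int, m: int) -> int:
    """Bits needed to store m words of n bits each, order preserved."""
    return m * n


def multiset_lower_bound_bits(n: int, m: int) -> float:
    """Counting lower bound for an unordered multiset (an assumption-free
    combinatorial fact, not the paper's code length): log2 of the number
    of multisets of size m drawn from an alphabet of 2**n words."""
    count = math.comb(2**n + m - 1, m)  # stars and bars
    return math.log2(count)


if __name__ == "__main__":
    n, c = 10, 2          # hypothetical sample values
    m = 2**n // c         # the scaling m = (1/c) 2^n from the abstract
    seq = ordered_sequence_bits(n, m)
    low = multiset_lower_bound_bits(n, m)
    print(f"ordered sequence: {seq} bits")
    print(f"multiset lower bound: {low:.1f} bits")
    print(f"ratio: {seq / low:.2f}")
```

Under the scaling m = (1/c) 2^n the ordered cost grows like m·n while the counting bound grows only linearly in m, which is consistent with the abstract's statement that the gap in rate becomes arbitrarily large as n increases.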
Keywords :
computational complexity; data compression; data structures; set theory; O(m(n + log m)) operation; O(mn) operation; constant factor; multiset cardinality; multiset compression; multiset decoder; multiset encoder; multiset encoding; multiset lossless representation; trie data structure; Channel coding; Complexity theory; Conferences; Decoding; Entropy; Manganese;
Conference_Title :
Information Theory Workshop (ITW), 2012 IEEE
Conference_Location :
Lausanne
Print_ISBN :
978-1-4673-0224-1
Electronic_ISBN :
978-1-4673-0222-7
DOI :
10.1109/ITW.2012.6404756