Title :
Wavelet Trees: From Theory to Practice
Author :
Grossi, Roberto ; Vitter, Jeffrey Scott ; Xu, Bojian
Author_Institution :
Dipt. di Inf., Univ. di Pisa, Pisa, Italy
Abstract :
The wavelet tree data structure is a space-efficient technique for rank and select queries that generalizes from binary characters to an arbitrary multicharacter alphabet. It has become a key tool in modern full-text indexing and data compression because of its capabilities in compressing, indexing, and searching. We present a comparative study of its practical performance regarding a wide range of options on the dimensions of different coding schemes and tree shapes. Our results are both theoretical and experimental: (1) We show that the run-length δ coding size of wavelet trees achieves the 0-order empirical entropy size of the original string with leading constant 1, when the string´s 0-order empirical entropy is asymptotically less than the logarithm of the alphabet size. This result complements the previous works that are dedicated to analyzing run-length γ-encoded wavelet trees. It also reveals the scenarios when run-length δ encoding becomes practical. (2) We introduce a full generic package of wavelet trees for a wide range of options on the dimensions of coding schemes and tree shapes. Our experimental study reveals the practical performance of the various modifications.
Keywords :
data compression; tree data structures; wavelet transforms; alphabet size; data compression; multicharacter alphabet; space efficient technique; tree shapes; wavelet tree data structure; Arrays; Bismuth; Encoding; Entropy; Indexing; Shape; Vegetation; data compression; full-text indexing; pattern matching; wavelet tree;
Conference_Titel :
Data Compression, Communications and Processing (CCP), 2011 First International Conference on
Conference_Location :
Palinuro
Print_ISBN :
978-1-4577-1458-0
Electronic_ISBN :
978-0-7695-4528-8
DOI :
10.1109/CCP.2011.16