DocumentCode :
147043
Title :
A Practical Implementation of Compressed Suffix Arrays with Applications to Self-Indexing
Author :
Huo, Hongwei ; Chen, Longgang ; Vitter, Jeffrey Scott ; Nekrich, Yakov
Author_Institution :
Xidian Univ., Xi'an, China
fYear :
2014
fDate :
26-28 March 2014
Firstpage :
292
Lastpage :
301
Abstract :
In this paper we develop a simple and practical text indexing scheme for compressed suffix arrays (CSA). For a text of n characters, our CSA can be constructed in linear time and needs 2nH_k + n + o(n) bits of space for any k ≤ c log_σ n - 1 and any constant c < 1, where H_k denotes the kth-order entropy. We compare the performance of our method with two established compressed indexing methods, the FM-index and the Sad-CSA. Experiments on the Canterbury Corpus and the Pizza&Chili Corpus show significant advantages of our algorithm over two other indexes in terms of compression and query time. Our storage scheme achieves better performance on all types of data present in these two corpora, except for evenly distributed data, such as DNA. The source code for our CSA is available online.
Keywords :
computational complexity; data structures; indexing; text analysis; Canterbury corpus; DNA; FM-index; Pizza&Chili corpus; Sad-CSA; compressed indexing methods; compressed suffix arrays; data structures; distributed data; kth order entropy; linear time; query time; source code; text indexing scheme; Arrays; Context; Decoding; Distributed databases; Encoding; Entropy; Indexes;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference (DCC), 2014
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Type :
conf
DOI :
10.1109/DCC.2014.49
Filename :
6824437