Title :
Searching in compressed dictionaries
Author :
Klein, Shmuel T. ; Shapira, Dana
Author_Institution :
Dept. of Comput. Sci., Bar-Ilan Univ., Ramat-Gan, Israel
Abstract :
We introduce two new methods to represent a prefix omission method (POM) file so that direct search can be done in these compressed dictionaries. Processing is typically about twice as fast with the Fibonacci variant as with the Huffman-based algorithm, or as decoding a Huffman-encoded POM file and searching the uncompressed version. For small files, the important application since dictionaries are usually kept in small chunks, the Fibonacci variant is much faster than either decoding and searching or the POM-Huffman method. Even though its compression may be slightly inferior to that of the character version of Huffman coding (though still generally better than the bit version), this may well be a price worth paying for faster processing.
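The prefix omission method relies on the fact that consecutive entries of a sorted dictionary often share a prefix. Each entry is stored as the length of the prefix it shares with its predecessor, followed by the remaining suffix. The Python sketch below illustrates that idea. It also shows the standard Fibonacci code for positive integers, in which every codeword ends in "11" and contains no other pair of adjacent 1s, so codeword boundaries can be located directly in the bit stream. All function names are hypothetical. The sketch is not the authors' encoding and does not reproduce how the paper combines POM with Huffman or Fibonacci codes.

```python
from typing import List, Tuple


def pom_encode(words: List[str]) -> List[Tuple[int, str]]:
    """Prefix omission (illustrative): store each sorted word as
    (length of prefix shared with the previous word, remaining suffix)."""
    encoded = []
    prev = ""
    for w in sorted(words):
        k = 0
        while k < min(len(prev), len(w)) and prev[k] == w[k]:
            k += 1
        encoded.append((k, w[k:]))
        prev = w
    return encoded


def pom_decode(entries: List[Tuple[int, str]]) -> List[str]:
    """Rebuild each word from the first k characters of its predecessor."""
    words = []
    prev = ""
    for k, suffix in entries:
        prev = prev[:k] + suffix
        words.append(prev)
    return words


def fibonacci_encode(n: int) -> str:
    """Standard Fibonacci code: greedy Zeckendorf bits, least significant
    first (weights 1, 2, 3, 5, 8, ...), followed by a terminating '1'."""
    if n < 1:
        raise ValueError("Fibonacci codes are defined for n >= 1")
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    bits = [0] * len(fibs)
    remainder = n
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= remainder:
            bits[i] = 1
            remainder -= fibs[i]
    while bits[-1] == 0:  # drop unused high-order positions
        bits.pop()
    return "".join(map(str, bits)) + "1"


def fibonacci_decode_stream(bits: str) -> List[int]:
    """Split a concatenation of Fibonacci codewords at each '11'
    terminator and decode every codeword."""
    values = []
    fibs = [1, 2]
    value, i, prev = 0, 0, "0"
    for b in bits:
        if b == "1" and prev == "1":  # end of the current codeword
            values.append(value)
            value, i, prev = 0, 0, "0"
            continue
        if i >= len(fibs):
            fibs.append(fibs[-1] + fibs[-2])
        if b == "1":
            value += fibs[i]
        i += 1
        prev = b
    return values
```

For example, `pom_encode(["compress", "compressed", "compression", "computer"])` yields `[(0, "compress"), (8, "ed"), (8, "ion"), (4, "uter")]`. `fibonacci_encode(12)` yields `"101011"`, because 12 = 8 + 3 + 1.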
Keywords :
Huffman codes; data compression; dictionaries; information retrieval system evaluation; search problems; text analysis; Fibonacci variant; Huffman based algorithm; POM file; compressed dictionaries; direct search; performance; prefix omission method; Computer science; Decoding; Dictionaries; Encoding; Information retrieval; Large-scale systems; Natural languages; Pattern matching; Production systems
Conference_Title :
Data Compression Conference, 2002. Proceedings. DCC 2002
Print_ISBN :
0-7695-1477-4
DOI :
10.1109/DCC.2002.999952