Title :
Text compression for dynamic document databases
Author :
Moffat, Alistair ; Zobel, Justin ; Sharman, Neil
Author_Institution :
Department of Computer Science, University of Melbourne, Parkville, Vic., Australia
Abstract :
For compression of text databases, semi-static word-based methods provide good performance in terms of both speed and disk space, but two problems arise. First, the memory requirements for the compression model during decoding can be unacceptably high. Second, the need to handle document insertions means that the collection must be periodically recompressed if compression efficiency is to be maintained on dynamic collections. The authors show that with careful management the impact of both of these drawbacks can be kept small. Experiments with a word-based model and over 500 Mb of text show that excellent compression rates can be retained even in the presence of severe memory limitations on the decoder, and after significant expansion in the amount of stored text.
Keywords :
data compression; database management systems; document handling; compression rates; decoding; document insertion handling; dynamic collection; dynamic document databases; memory limitations; memory requirements; semi-static word-based methods; stored text; text database compression; word-based model; Computer Society; Costs; Databases; Decoding; Frequency; Government; Huffman coding; Indexing; Packaging; Statistical distributions
Journal_Title :
IEEE Transactions on Knowledge and Data Engineering