DocumentCode :
1950576
Title :
Static compression for dynamic texts
Author :
Moffat, Alistair ; Sharman, Neil ; Zobel, Justin
Author_Institution :
Dept. of Comput. Sci., Melbourne Univ., Parkville, Vic., Australia
fYear :
1994
fDate :
29-31 Mar 1994
Firstpage :
126
Lastpage :
135
Abstract :
The authors have explored the particular needs of large information retrieval systems, in which hundreds of megabytes of data are stored, retrieval is non-sequential, and new text is continually being appended. It has been shown that the word-based model can be adapted to cope well both with dynamic environments and with situations in which decode-time memory is limited. In the latter case as little as 100 Kb of main memory is sufficient to achieve excellent compression, provided a suitable choice of tokens is used as the compression lexicon. To solve the former problem a new paradigm of compression has been introduced, in which some components of the compression model are required to remain static to ensure that all parts of the text can be decoded, and some parts are extensible, so that new text can also influence the assignment of codewords. An additional heuristic, Swap-to-Near-the-Front, allows collections to be seeded with as little as 1/1000 of their final text with minimal loss of compression efficiency. The resulting "almost static" compression method is ideal for large dynamic collections.
Keywords :
data compression; image coding; information retrieval systems; word processing; codewords; compression efficiency; compression lexicon; compression model; decode-time memory; dynamic environments; dynamic texts; large dynamic collections; static compression; tokens; word-based model; Australia Council; CD-ROMs; Collaborative work; Computer science; Databases; Decoding; Facsimile; Huffman coding; Information retrieval; Workstations
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference, 1994. DCC '94. Proceedings
Conference_Location :
Snowbird, UT
Print_ISBN :
0-8186-5637-9
Type :
conf
DOI :
10.1109/DCC.1994.305920
Filename :
305920