DocumentCode
3254339
Title
Model based concordance compression
Author
Bookstein, Abraham ; Klein, Shmuel T. ; Raita, Timo
Author_Institution
Center for Inf. & Language Studies, Chicago Univ., IL, USA
fYear
1992
fDate
24-27 March 1992
Firstpage
82
Lastpage
91
Abstract
The authors discuss concordance compression using the framework now customary in compression theory. They begin by creating a mathematical model of concordance generation, and then use optimal compression engines, such as Huffman or arithmetic coding, to do the actual compression. It should be noted that in the context of a static information retrieval system, compression and decompression are not symmetrical tasks. Compression is done only once, while building the system, whereas decompression is needed during the processing of every query and directly affects the response time. One may thus use extensive and costly preprocessing for compression, provided reasonably fast decompression methods are possible. Moreover, compression is applied to the full files (text, concordance, etc.), but decompression is needed only for (possibly many) short pieces, which may be accessed at random by means of pointers to their exact locations. Therefore the use of adaptive methods based on tables that systematically change from the beginning to the end of the file is ruled out. However, their concern is less the speed of encoding or decoding than relating concordance compression conceptually to the modern approach of data compression, and testing the effectiveness of their models.
Keywords
data compression; encoding; indexing; information retrieval systems; Huffman coding; adaptive methods; arithmetic coding; compression theory; concordance generation; decoding; decompression; decompression methods; information retrieval system; mathematical model; optimal compression engines; preprocessing; tables; testing; Arithmetic; Computer science; Data structures; Databases; Delay; Dictionaries; Engines; Information retrieval; Mathematical model
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Compression Conference, 1992. DCC '92.
Conference_Location
Snowbird, UT, USA
Print_ISBN
0-8186-2717-4
Type
conf
DOI
10.1109/DCC.1992.227473
Filename
227473