DocumentCode :
2084418
Title :
Using inverted files to compress text
Author :
Ristov, Strahil
Author_Institution :
Ruder Boskovic Inst., Zagreb, Croatia
fYear :
2002
fDate :
2002
Firstpage :
443
Abstract :
This is the first report on a new approach to text compression. The approach represents the text file by a compressed inverted file index together with a very compact lexicon that includes every word in the text. The index is compressed using standard index compression techniques, and the lexicon is compressed with an original dictionary compression method that gives better compression results than existing procedures. The compression procedure is complex, but decompression time is linear in the file size, although decompression requires two passes and hence cannot be performed online. First experiments show that this method, when refined, can be competitive for larger texts where only decompression needs to run in real time.
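The core idea of the abstract, storing a text as an inverted index plus a lexicon and rebuilding it from them, can be illustrated with a minimal Python sketch. This is not the paper's method: the tokenization rule (runs of word and non-word characters, so that separators are kept), the plain gap encoding of positions, and the function names build_index and rebuild_text are all assumptions made for illustration. The dictionary keys stand in for the lexicon, and the paper's lexicon compression is not modelled at all.

```python
import re
from itertools import accumulate

def build_index(text):
    """Map each token to the gaps between its successive positions.

    Assumption: tokens are maximal runs of word or non-word characters,
    so whitespace and punctuation are indexed like ordinary words.
    """
    tokens = re.findall(r"\w+|\W+", text)
    positions = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)
    # Gap (delta) encoding: first position, then successive differences.
    index = {tok: [p[0]] + [b - a for a, b in zip(p, p[1:])]
             for tok, p in positions.items()}
    return index, len(tokens)

def rebuild_text(index, n_tokens):
    """Place every token at its positions, then join the filled slots."""
    slots = [None] * n_tokens
    for tok, gaps in index.items():
        for pos in accumulate(gaps):  # running sum restores positions
            slots[pos] = tok
    return "".join(slots)

text = "to be or not to be"
index, n = build_index(text)
restored = rebuild_text(index, n)
```

In this sketch the rebuild step touches each token occurrence once, so its cost grows linearly with the text, but no part of the output is final until every posting list has been processed, which is one way to see why such a scheme cannot decompress online.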
Keywords :
data compression; compact lexicon; compressed inverted file index; decompression time; lexicon compression; standard index compression techniques; text compression; Application software; Computer applications; Data compression; Image coding; Image reconstruction; Indexing; Information technology; Multidimensional systems; Time measurement;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Technology Interfaces, 2002. ITI 2002. Proceedings of the 24th International Conference on
ISSN :
1330-1012
Print_ISBN :
953-96769-5-9
Type :
conf
DOI :
10.1109/ITI.2002.1024714
Filename :
1024714