DocumentCode :
519882
Title :
Language-independent word-based text compression with fast decompression
Author :
Grabowski, Szymon ; Swacha, Jakub
Author_Institution :
Comput. Eng. Dept., Tech. Univ. of Lodz, Lodz, Poland
fYear :
2010
fDate :
20-23 April 2010
Firstpage :
158
Lastpage :
162
Abstract :
A classic idea for improving text compression is to replace words with references to a text dictionary, which is either external or stored together with the archive. We advocate the second option, since even for a single language (e.g., English) it is practically impossible for one dictionary to fit the different sorts of modern texts well. There are basically two problems to solve: how to assign codewords to individual words from the parsed text, and how to represent the dictionary compactly. The resulting data are the input for a backend compressor. Since in many scenarios texts are decompressed (read) more often than they are compressed (written), we focus on LZ77 backend compression algorithms, in particular Deflate, used in the zip/gzip standards, whose well-known asset is very fast decompression.
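The general scheme outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the tokenization rule, the frequency-ranked variable-length byte codes, the NUL-separated dictionary layout and the 4-byte length header are all assumptions chosen for brevity, and the helper names (`encode_index`, `compress`, `decompress`) are hypothetical. The only part taken from the abstract is the overall shape: words are replaced by codewords, the dictionary is stored with the archive, and Deflate (via the standard `zlib` module) serves as the backend. The sketch assumes the input text contains no NUL character.

```python
import re
import zlib
from collections import Counter

# Alternating runs of word and non-word characters; concatenating the
# tokens reproduces the input exactly. \w is Unicode-aware for str input.
TOKEN_RE = re.compile(r"\w+|\W+")


def encode_index(i):
    """Variable-length byte code: 7 payload bits per byte, low bits first;
    the high bit marks that another byte follows (an assumed scheme)."""
    out = bytearray()
    while True:
        low = i & 0x7F
        i >>= 7
        if i:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def compress(text):
    tokens = TOKEN_RE.findall(text)
    # Frequent tokens get small indices and therefore shorter codewords.
    vocab = [tok for tok, _ in Counter(tokens).most_common()]
    index = {tok: i for i, tok in enumerate(vocab)}
    # Dictionary stored with the archive, NUL-separated (assumed layout).
    dictionary = "\0".join(vocab).encode("utf-8")
    body = b"".join(encode_index(index[tok]) for tok in tokens)
    payload = len(dictionary).to_bytes(4, "big") + dictionary + body
    return zlib.compress(payload, 9)  # Deflate backend


def decompress(blob):
    payload = zlib.decompress(blob)
    dict_len = int.from_bytes(payload[:4], "big")
    vocab = payload[4:4 + dict_len].decode("utf-8").split("\0")
    out = []
    value = 0
    shift = 0
    for byte in payload[4 + dict_len:]:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # more bytes belong to this codeword
        else:
            out.append(vocab[value])
            value = 0
            shift = 0
    return "".join(out)
```

A round trip such as `decompress(compress(text)) == text` should hold for any text without NUL characters. Whether this preprocessing actually beats plain Deflate depends on the input size and vocabulary, since the stored dictionary adds overhead on short texts.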
Keywords :
data compression; text analysis; word processing; Deflate; LZ77 backend compression algorithms; codewords; fast decompression; language independent word based text compression; parsed text; text dictionary; zip-gzip standards; Cascading style sheets; Compression algorithms; DNA; Dictionaries; HTML; Natural languages; Postal services; Protein sequence; Spatial databases; XML; byte codes; dictionary compression; text compression;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Perspective Technologies and Methods in MEMS Design (MEMSTECH), 2010 Proceedings of VIth International Conference on
Conference_Location :
Lviv
Print_ISBN :
978-1-4244-7325-0
Electronic_ISBN :
978-966-2191-11-0
Type :
conf
Filename :
5499297