DocumentCode
519882
Title
Language-independent word-based text compression with fast decompression
Author
Grabowski, Szymon ; Swacha, Jakub
Author_Institution
Comput. Eng. Dept., Tech. Univ. of Lodz, Lodz, Poland
fYear
2010
fDate
20-23 April 2010
Firstpage
158
Lastpage
162
Abstract
A classic way to improve text compression is to replace words with references to a text dictionary, which is either external or stored together with the archive. We advocate the second option because, even with a single language in mind (e.g., English), it is practically impossible for one dictionary to fit the different kinds of modern texts well. There are basically two problems to solve: how to assign codewords to individual words from the parsed text, and how to represent the dictionary compactly. The resulting data are input to a backend compressor. Since in many scenarios texts are decompressed (read) more often than compressed (written), we focus on LZ77 backend compression algorithms, in particular Deflate, used in the zip/gzip standards, whose well-known asset is very fast decompression.
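The pipeline outlined in the abstract (parse the text into words, replace each word with a codeword, store the dictionary with the archive, and pass the result to Deflate) can be illustrated with a minimal sketch. This is not the authors' scheme: the abstract does not specify the codeword assignment or the dictionary layout, so the frequency-ranked variable-length byte codes, the NUL-separated dictionary, and the helper names `encode_varint`, `decode_varint`, `compress`, and `decompress` are all assumptions made for illustration. The sketch also assumes the input text contains no NUL characters.

```python
import re
import zlib
from collections import Counter

# Split text into alternating runs of word and non-word characters.
# Python's \w is Unicode-aware, so no language-specific rules are needed.
TOKEN = re.compile(r"\w+|\W+")


def encode_varint(n):
    """Encode a non-negative integer as a variable-length byte code (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode one variable-length integer from buf at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def compress(text):
    """Replace tokens with dictionary indexes, then apply Deflate via zlib."""
    tokens = TOKEN.findall(text)
    # Assumption: frequent tokens get the smallest (shortest) codewords.
    vocab = [tok for tok, _ in Counter(tokens).most_common()]
    index = {tok: i for i, tok in enumerate(vocab)}
    # Assumption: the dictionary is stored in the archive as NUL-separated UTF-8.
    dictionary = "\0".join(vocab).encode("utf-8")
    body = b"".join(encode_varint(index[tok]) for tok in tokens)
    payload = encode_varint(len(dictionary)) + dictionary + body
    return zlib.compress(payload, 9)


def decompress(blob):
    """Invert compress(): inflate, read the dictionary, then expand the codewords."""
    payload = zlib.decompress(blob)
    dict_len, pos = decode_varint(payload, 0)
    vocab = payload[pos:pos + dict_len].decode("utf-8").split("\0")
    pos += dict_len
    words = []
    while pos < len(payload):
        i, pos = decode_varint(payload, pos)
        words.append(vocab[i])
    return "".join(words)
```

Decompression here needs only one inflate pass and one table lookup per token, which is the property that makes an LZ77-family backend attractive when texts are read more often than they are written. Whether this preprocessing actually reduces the output size depends on the input; no compression figures are claimed for this sketch.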
Keywords
data compression; text analysis; word processing; Deflate; LZ77 backend compression algorithms; codewords; fast decompression; language independent word based text compression; parsed text; text dictionary; zip-gzip standards; Cascading style sheets; Compression algorithms; DNA; Dictionaries; HTML; Natural languages; Postal services; Protein sequence; Spatial databases; XML; byte codes; dictionary compression; text compression
fLanguage
English
Publisher
IEEE
Conference_Titel
Perspective Technologies and Methods in MEMS Design (MEMSTECH), 2010 Proceedings of VIth International Conference on
Conference_Location
Lviv
Print_ISBN
978-1-4244-7325-0
Electronic_ISBN
978-966-2191-11-0
Type
conf
Filename
5499297