Title of article
Lempel-Ziv compression of highly structured documents
Author/Authors
Joaquín Adiego, Gonzalo Navarro, Pablo de la Fuente
Issue Information
Monthly, with serial issue number, year 2007
Pages
18
From page
461
To page
478
Abstract
The authors describe Lempel-Ziv to Compress Structure (LZCS), a novel Lempel-Ziv approach suitable for compressing structured documents. LZCS takes advantage of repeated substructures that may appear in the documents by replacing them with a backward reference to their previous occurrence. The result of the LZCS transformation is still a valid structured document, which is human-readable and can be transmitted over ASCII channels. Moreover, LZCS-transformed documents are easy to search, display, access at random, and navigate. In a second stage, the transformed documents can be further compressed using any semistatic technique, so that all of those operations can still be performed efficiently, or using any adaptive technique to boost compression. LZCS is especially efficient in the compression of collections of highly structured data, such as extensible markup language (XML) forms, invoices, e-commerce, and Web-service exchange documents. The comparison with other structure-aware and standard compressors shows that LZCS is a competitive choice for this type of document, whereas the others are not well suited to supporting navigation or random access. When combined with an adaptive compressor, LZCS obtains by far the best compression ratios.
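The backward-reference idea described in the abstract can be illustrated with a small sketch. This is not the authors' LZCS implementation or reference format: the `backref` marker element, its `to` attribute, the preorder numbering of elements, and the function names `compress` and `decompress` are all assumptions made for illustration. The sketch only shows that a repeated subtree can be replaced by a pointer to its first occurrence while the result remains a well-formed XML document.

```python
import copy
import xml.etree.ElementTree as ET

REF_TAG = "backref"  # hypothetical marker element; assumed absent from the input


def _serialize(elem):
    """Serialize an element without the text that follows it (its tail)."""
    tail, elem.tail = elem.tail, None
    try:
        return ET.tostring(elem, encoding="unicode")
    finally:
        elem.tail = tail


def compress(xml_text):
    """Replace each repeated subtree with a reference to its first occurrence."""
    root = ET.fromstring(xml_text)
    first_seen = {}  # serialized subtree -> preorder number of first occurrence
    counter = 0

    def visit(parent):
        nonlocal counter
        for pos, child in enumerate(list(parent)):
            counter += 1
            key = _serialize(child)
            target = first_seen.get(key)
            ref = None
            if target is not None:
                ref = ET.Element(REF_TAG, {"to": str(target)})
            # Substitute only when the reference is shorter than the subtree.
            if ref is not None and len(_serialize(ref)) < len(key):
                ref.tail = child.tail
                parent[pos] = ref
            else:
                first_seen.setdefault(key, counter)
                visit(child)

    visit(root)
    return ET.tostring(root, encoding="unicode")


def decompress(xml_text):
    """Expand every reference back into a copy of the subtree it points to."""
    root = ET.fromstring(xml_text)
    by_number = {}  # preorder number -> element in the transformed document
    counter = 0

    def visit(parent):
        nonlocal counter
        for pos, child in enumerate(list(parent)):
            counter += 1
            if child.tag == REF_TAG:
                clone = copy.deepcopy(by_number[int(child.get("to"))])
                clone.tail = child.tail
                parent[pos] = clone
            else:
                by_number[counter] = child
                visit(child)

    visit(root)
    return ET.tostring(root, encoding="unicode")
```

Because both functions number elements in the same preorder over the transformed document, a reference always points to a subtree that has already been fully expanded. The transformed text can be parsed by any XML parser, which is the property the abstract emphasizes; the second-stage semistatic or adaptive compression is not covered by this sketch.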
Journal title
Journal of the American Society for Information Science and Technology
Serial Year
2007
Record number
993465