DocumentCode :
2708200
Title :
AXECHOP: a grammar-based compressor for XML
Author :
Leighton, Gregory ; Diamond, Jim ; Müldner, Tomasz
Author_Institution :
Jodrey Sch. of Comput. Sci., Acadia Univ., Wolfville, NS, Canada
fYear :
2005
fDate :
29-31 March 2005
Firstpage :
467
Abstract :
Summary form only given. XML is gaining widespread acceptance as a standard for storing and transmitting structured data. One of the drawbacks of XML is that it is quite verbose: an XML representation of a set of data can easily be ten times as large as a more economical representation of the data. To overcome this limitation, we present a compression scheme tailored specifically to XML named AXECHOP. The compression strategy used in AXECHOP begins by dividing the source XML document into structural and data segments. The former is represented using a byte tokenization scheme that preserves the original structure of the document (i.e., it maintains the proper nesting and ordering of elements, attributes, and data values). The MPM compression algorithm is used to generate a context-free grammar capable of deriving this original structure, and the grammar is passed through an adaptive arithmetic coder before being written to the compressed file. The document's data is organized into a series of containers (where container membership is determined by the identity of the XML element or attribute that encloses the data) and then the Burrows-Wheeler transform (BWT) is applied to the contents of each dictionary, with the results being appended to the compressed file.
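The structure/data split and the per-container BWT described in the abstract can be illustrated with a rough sketch. This is not the AXECHOP implementation: the token values (`END`, `DATA`), the `@` prefix for attribute names, the `\x1f` value separator, and all function names are assumptions made for illustration, the BWT is the naive sorted-rotations version, and the MPM grammar and arithmetic-coding stages are omitted entirely.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

END = 0    # hypothetical token: closes the current element
DATA = 1   # hypothetical token: a data value occurs here


def split_structure_and_data(xml_text):
    """Split XML into a structural token stream and per-name data containers."""
    root = ET.fromstring(xml_text)
    name_to_token = {}               # element/attribute name -> integer token
    structure = []                   # preserves nesting and ordering
    containers = defaultdict(list)   # enclosing name -> data values, in order

    def token_for(name):
        # Names get tokens 2, 3, 4, ... in order of first appearance.
        return name_to_token.setdefault(name, len(name_to_token) + 2)

    def visit(elem):
        structure.append(token_for(elem.tag))
        for attr, value in elem.attrib.items():
            structure.append(token_for("@" + attr))
            structure.append(DATA)
            containers["@" + attr].append(value)
        if elem.text and elem.text.strip():
            structure.append(DATA)
            containers[elem.tag].append(elem.text.strip())
        for child in elem:
            visit(child)
            # Text following a child belongs to the enclosing element.
            if child.tail and child.tail.strip():
                structure.append(DATA)
                containers[elem.tag].append(child.tail.strip())
        structure.append(END)

    visit(root)
    return structure, dict(containers), name_to_token


def bwt(text, sentinel="\0"):
    """Naive Burrows-Wheeler transform via sorted rotations (quadratic space)."""
    text += sentinel  # NUL cannot occur in XML 1.0 text, so it is a safe end marker
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def transform_containers(containers, separator="\x1f"):
    """Apply the BWT to the concatenated contents of each container."""
    return {name: bwt(separator.join(values)) for name, values in containers.items()}


if __name__ == "__main__":
    doc = '<a x="1"><b>hi</b><b>yo</b></a>'
    structure, containers, names = split_structure_and_data(doc)
    print(structure)
    print(containers)
    print(transform_containers(containers))
```

Grouping values by their enclosing element or attribute name places similar strings next to each other, which is the property the BWT exploits; in the scheme the abstract describes, the token stream would go on to the grammar-based stage rather than being stored directly.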
Keywords :
XML; adaptive codes; arithmetic codes; context-free grammars; data compression; data structures; transforms; AXECHOP; Burrows-Wheeler transform; MPM compression algorithm; XML representation; adaptive arithmetic coder; byte tokenization scheme; container membership; context-free grammar; data segments; document structural segments; grammar-based compressor; structured data; Arithmetic; Compression algorithms; Computer science; Containers; Data compression; Dictionaries; Length measurement; Testing; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference, 2005. Proceedings. DCC 2005
ISSN :
1068-0314
Print_ISBN :
0-7695-2309-9
Type :
conf
DOI :
10.1109/DCC.2005.20
Filename :
1402224