DocumentCode
3421246
Title
Tradeoffs in XML database compression
Author
Cheney, James
Author_Institution
Edinburgh Univ., UK
fYear
2006
fDate
28-30 March 2006
Firstpage
392
Lastpage
401
Abstract
Large XML data files, or XML databases, are now a common way to distribute scientific and bibliographic data, and storing such data efficiently is an important concern. A number of approaches to XML compression have been proposed in the last five years. The most competitive approaches employ one or more statistical text compressors based on PPM or arithmetic coding in which some of the context is provided by the XML document structure. The purpose of this paper is to investigate the relationship between the extant proposals in more detail. We review the two main statistical modeling approaches proposed so far, and evaluate their performance on two representative XML databases. Our main finding is that while a recently-proposed multiple-model approach can provide better overall compression for large databases, it uses much more memory and converges more slowly than an older single-model approach.
Keywords
XML; arithmetic codes; data compression; database management systems; statistical analysis; XML data files; XML database compression; XML document structure; arithmetic coding; multiple-model approach; statistical modeling approaches; statistical text compressors; Arithmetic; Compressors; Containers; Context modeling; Data compression; Databases; Proposals; Proteins; Switches
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Compression Conference, 2006. DCC 2006. Proceedings
ISSN
1068-0314
Print_ISBN
0-7695-2545-8
Type
conf
DOI
10.1109/DCC.2006.79
Filename
1607274