• DocumentCode
    3421246
  • Title
    Tradeoffs in XML database compression
  • Author
    Cheney, James
  • Author_Institution
    Edinburgh Univ., UK
  • fYear
    2006
  • fDate
    28-30 March 2006
  • Firstpage
    392
  • Lastpage
    401
  • Abstract
    Large XML data files, or XML databases, are now a common way to distribute scientific and bibliographic data, and storing such data efficiently is an important concern. A number of approaches to XML compression have been proposed in the last five years. The most competitive approaches employ one or more statistical text compressors based on PPM or arithmetic coding in which some of the context is provided by the XML document structure. The purpose of this paper is to investigate the relationship between the extant proposals in more detail. We review the two main statistical modeling approaches proposed so far, and evaluate their performance on two representative XML databases. Our main finding is that while a recently proposed multiple-model approach can provide better overall compression for large databases, it uses much more memory and converges more slowly than an older single-model approach.
  • Keywords
    XML; arithmetic codes; data compression; database management systems; statistical analysis; XML data files; XML database compression; XML document structure; arithmetic coding; multiple-model approach; statistical modeling approaches; statistical text compressors; Arithmetic; Compressors; Containers; Context modeling; Databases; Proposals; Proteins; Switches
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 2006. DCC 2006. Proceedings
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-2545-8
  • Type
    conf
  • DOI
    10.1109/DCC.2006.79
  • Filename
    1607274