  • DocumentCode
    147103
  • Title
    Towards Markup-Aware Text Compression
  • Author
    Moore, John P. T.; Kheirkhahzadeh, Antonio D.; Bagale, Jiva N.
  • Author_Institution
    Univ. of West London, London, UK
  • fYear
    2014
  • fDate
    26-28 March 2014
  • Firstpage
    417
  • Lastpage
    417
  • Abstract
    Although text compression can be applied successfully to markup languages, conventional compressors operate without any semantic knowledge of the data types present within the markup. In this paper we illustrate how this added knowledge can be used to develop a hybrid tool that combines traditional text compression with markup-awareness to produce smaller output than existing well-known text compression tools. Our results show that, for highly structured markup, it is possible to improve the level of compression by around 20% compared to the best-performing existing tool we studied. We describe the limitations of our approach and discuss potential implementation options, with the overall goal of producing a practical Unix-like tool.
  • Keywords
    XML; data compression; text analysis; Unix-like tool; XML data; XML markup; markup-aware text compression; markup-awareness; Educational institutions; Hybrid power systems; Protocols; Roads; Runtime; XML compression; text compression
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference (DCC), 2014
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Type
    conf
  • DOI
    10.1109/DCC.2014.80
  • Filename
    6824469