Title of article :
XML tree structure compression using RePair
Author/Authors :
Markus Lohrey, Sebastian Maneth, Roy Mennicke
Issue Information :
Journal issue with serial number, year 2013
Pages :
18
From page :
1150
To page :
1167
Abstract :
XML tree structures can conveniently be represented using ordered unranked trees. Due to the repetitiveness of XML markup these trees can be compressed effectively using dictionary-based methods, such as minimal directed acyclic graphs (DAGs) or straight-line context-free (SLCF) tree grammars. While minimal SLCF tree grammars are in general smaller than minimal DAGs, they cannot be computed in polynomial time unless P = NP. Here, we present a new linear time algorithm for computing small SLCF tree grammars, called TreeRePair, and show that it greatly outperforms the best known previous algorithm BPLEX. TreeRePair is a generalization to trees of Larsson and Moffat's RePair string compression algorithm. SLCF tree grammars can be used as efficient memory representations of trees. Using TreeRePair, we are able to produce the smallest queryable memory representation of ordered trees that we are aware of. Our investigations over a large corpus of commonly used XML documents show that tree traversals over TreeRePair grammars are 14 times slower than over pointer structures and 5 times slower than over succinct trees, while memory consumption is only 1/43 and 1/6, respectively. With respect to file compression we are able to show that a Huffman-based coding of TreeRePair grammars gives compression ratios comparable to the best known XML file compressors.
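To illustrate the simplest dictionary-based method the abstract mentions, the following is a minimal sketch of building a minimal DAG by sharing identical subtrees. It is not the TreeRePair algorithm and is not taken from the paper; the tree encoding as `(label, children)` pairs, the function names `minimal_dag` and `tree_size`, and the toy `catalog` document are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Assumed encoding: a tree is a (label, children) pair,
# where children is a list of trees in document order.
DagNode = Tuple[str, Tuple[int, ...]]


def minimal_dag(tree) -> Tuple[int, List[DagNode]]:
    """Return (root_id, nodes) with nodes[i] = (label, child_ids).

    Identical subtrees receive the same id, so each distinct
    subtree is stored exactly once.
    """
    ids: Dict[DagNode, int] = {}
    nodes: List[DagNode] = []

    def visit(node) -> int:
        label, children = node
        # Children are processed first, so equal subtrees map to equal ids.
        key = (label, tuple(visit(child) for child in children))
        if key not in ids:
            ids[key] = len(nodes)
            nodes.append(key)
        return ids[key]

    return visit(tree), nodes


def tree_size(tree) -> int:
    """Count the nodes of the uncompressed tree."""
    _, children = tree
    return 1 + sum(tree_size(child) for child in children)


# Hypothetical toy document with three structurally identical items.
item = ("item", [("name", []), ("price", [])])
doc = ("catalog", [item, item, item])

root, nodes = minimal_dag(doc)
print(tree_size(doc), "tree nodes ->", len(nodes), "DAG nodes")
```

On this toy input the 10-node tree collapses to 4 DAG nodes, because the repeated `item` subtree is stored once. The sketch uses recursion for brevity, so very deep trees would need an iterative traversal. A DAG can only share complete subtrees; SLCF tree grammars, as the abstract notes, can be smaller in general.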
Keywords :
Tree structure compression , Memory representation , XML
Journal title :
Information Systems
Serial Year :
2013
Record number :
1230350