DocumentCode :
2548249
Title :
Approximate Validity of XML Streaming Data
Author :
Cheng, Huang ; Jun, Li ; de Rougemont, M.
Author_Institution :
Univ. Paris-Sud, Orsay
fYear :
2008
fDate :
20-22 July 2008
Firstpage :
149
Lastpage :
156
Abstract :
We present a SAX implementation of the statistical embedding associated with XML data, introduced in [1], [2], which makes it possible to decide ε-validity efficiently with respect to any DTD or Schema, for the Edit Distance with Moves. The embedding associates a generalized k-gram with unranked labelled trees (with k = 1/ε), from which any regular property can be approximately decided. We show how to compute the k-gram exactly with a SAX implementation using a memory of size d, the depth of the tree, and how to compute an approximate k-gram with queues of size M = 2k and a global memory of size 2k in the worst case. Experiments on large XML files from the XML benchmark project confirm the error analysis for various values of M.
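As a rough illustration of the streaming idea in the abstract, the sketch below uses Python's standard xml.sax module to count k-grams of labels along root-to-node paths while keeping only a stack of the currently open elements, so the working memory is bounded by the tree depth d. This is a simplified stand-in and not the generalized k-gram of [1], [2], whose definition the abstract does not give; the names PathKGramHandler and path_kgram_frequencies are hypothetical.

```python
import xml.sax
from collections import Counter


class PathKGramHandler(xml.sax.ContentHandler):
    """Counts k-grams of element labels along root-to-node paths.

    Assumption: this path-based k-gram is a simplification chosen for
    illustration; it is not the paper's generalized k-gram.
    """

    def __init__(self, k):
        super().__init__()
        self.k = k
        self.stack = []          # labels of open elements; size never exceeds depth d
        self.counts = Counter()  # k-gram (tuple of labels) -> occurrences
        self.max_depth = 0

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.max_depth = max(self.max_depth, len(self.stack))
        # Record the k labels ending at the element that was just opened.
        if len(self.stack) >= self.k:
            self.counts[tuple(self.stack[-self.k:])] += 1

    def endElement(self, name):
        self.stack.pop()


def path_kgram_frequencies(xml_bytes, k):
    """Return (normalized k-gram frequencies, maximum stack depth)."""
    handler = PathKGramHandler(k)
    xml.sax.parseString(xml_bytes, handler)
    total = sum(handler.counts.values())
    freqs = {g: n / total for g, n in handler.counts.items()} if total else {}
    return freqs, handler.max_depth


doc = b"<a><b><c/></b><b><c/><d/></b></a>"
freqs, depth = path_kgram_frequencies(doc, 2)
for gram, f in sorted(freqs.items()):
    print(gram, round(f, 2))
```

With k = 1/ε as in the abstract, a tolerance of ε = 0.5 would correspond to k = 2 in this sketch. The frequency vector it produces only hints at how a statistic of the stream can be maintained in one pass; deciding ε-validity against a DTD requires the construction described in the paper.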
Keywords :
XML; approximation theory; statistical analysis; tree data structures; SAX implementation; approximate XML streaming data validity; generalized k-gram labelled tree; statistical data embedding; unranked labelled tree; Benchmark testing; Error analysis; Information management; Sampling methods; Scalability; Search problems; Tree graphs; Virtual manufacturing; Web mining; XML; XML Approximation Web-mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web-Age Information Management, 2008. WAIM '08. The Ninth International Conference on
Conference_Location :
Zhangjiajie, Hunan
Print_ISBN :
978-0-7695-3185-4
Electronic_ISBN :
978-0-7695-3185-4
Type :
conf
DOI :
10.1109/WAIM.2008.97
Filename :
4597008