Title :
SketchTree: Approximate Tree Pattern Counts over Streaming Labeled Trees
Author :
Rao, Praveen ; Moon, Bongki
Author_Institution :
University of Arizona
Abstract :
In recent years, there has been a rising interest in developing online approximation algorithms for data streams. Some of the key challenges are posed by the fact that streaming data can be read only once in a fixed order of arrival and only a limited amount of memory is available for storage. In this paper, we address the problem of approximately counting tree patterns over a stream of labeled trees (e.g., XML documents). We propose a new approximation algorithm called SketchTree that computes a synopsis of the stream in a single pass by processing each tree only once. Using a limited amount of memory, SketchTree provides approximate answers for both ordered and unordered tree pattern counts. Furthermore, we discuss a class of count queries that can be handled by SketchTree and their utility. We provide theoretical analyses to show that our algorithm has provably strong guarantees on the error bounds. Experiments on real datasets demonstrate that SketchTree can indeed estimate tree pattern counts within 10-15% relative error with high confidence under various situations.
Keywords :
Algorithm design and analysis; Approximation algorithms; Biology computing; Computer science; Frequency estimation; Monitoring; Moon; Sensor systems and applications; Web and internet services; XML;
Conference_Titel :
Data Engineering, 2006. ICDE '06. Proceedings of the 22nd International Conference on
Print_ISBN :
0-7695-2570-9
DOI :
10.1109/ICDE.2006.141