• DocumentCode
    1208177
  • Title
    Scalable Filtering of Multiple Generalized-Tree-Pattern Queries over XML Streams
  • Author
    Chen, Songting ; Li, Hua-Gang ; Tatemura, Junchi ; Hsiung, Wang-Pin ; Agrawal, Divyakant ; Candan, K. Selçuk
  • Author_Institution
    Turn Inc., Redwood City, CA
  • Volume
    20
  • Issue
    12
  • fYear
    2008
  • Firstpage
    1627
  • Lastpage
    1640
  • Abstract
    An XML publish/subscribe system needs to filter a large number of queries over XML streams. Most existing systems consider filtering only simple XPath statements. In this paper, we focus on filtering the more complex generalized-tree-pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that the TOP encodings can be efficiently produced via shared bottom-up path matching. Second, with the aid of this TOP encoding, we can (1) achieve polynomial time and space complexity for post-processing, (2) avoid redundant predicate evaluations, (3) allow an efficient, duplicate-free, merge-join-based algorithm for merging multiple encoded path matches, and (4) simplify the processing of GTP queries. Overall, our approach maximizes the sharing opportunity across queries by exploiting suffix as well as prefix sharing. At the same time, our TOP encodings allow efficient post-processing for GTP queries. Extensive performance studies show that our GFilter solution not only achieves significantly better filtering performance than state-of-the-art algorithms, but is also capable of efficiently filtering the more complex GTP queries.
  • Keywords
    XML; computational complexity; merging; middleware; pattern matching; query processing; tree data structures; NFA; XML stream; join-based algorithm; polynomial time complexity; prefix sharing; publish-subscribe system; scalable multiple generalized-tree-pattern query filtering; shared bottom-up path matching; space complexity; suffix sharing; tree-of-path encoding; XML filtering; XML streams; generalized-tree-pattern queries; result encoding
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2008.83
  • Filename
    4509431