• DocumentCode
    3023755
  • Title
    Why Not Semijoins for Streams, When Distributed?
  • Author
    Tri Tran ; Byung Suk Lee ; Bovee, M.W.

  • Author_Institution
    Univ. of Vermont, Burlington
  • fYear
    2007
  • fDate
    1-5 July 2007
  • Firstpage
    27
  • Lastpage
    27
  • Abstract
    This paper addresses a semijoin-based window join algorithm over distributed data streams. In distributed stream query processing, data streams arriving at remote sites need to be shipped to the processing site for query execution. This typically introduces high communication overhead over the network. Our observation is that the semijoin, which is effective at reducing communication overhead in distributed database query processing, can also be effective in distributed stream query processing. The challenge, of course, lies in the streaming nature of tuples, the processing of which is fundamentally different from processing a set of tuples. We address this challenge by first adapting the window-based stream join to a distributed environment. The resulting join algorithm (called simple join) uses the idea of exporting a window to the query processing site. We then adopt the semijoin to reduce the communication overhead (in return for a marginal increase in the processing overhead). The resulting semijoin-based join algorithm uses the ideas of a mirror window and a partial tuple. That is, it creates a copy of a remote window at the processing site and sends a partial tuple to probe for matching tuples before sending a full tuple. Finally, we analyze the two join algorithms using our proposed cost models and verify the analysis results through a set of experiments.
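    The mirror-window and partial-tuple ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give its details, so the count-based windows, the equality join on a single `key` field, and the names `ProcessingSite`, `probe_with_partial`, `join_full`, `local_arrival`, and `run_semijoin` are all assumptions made for illustration. The sketch covers only remote arrivals: the remote site first sends the join key (the partial tuple) and ships the full tuple only when the probe finds a match.

```python
from collections import deque


class ProcessingSite:
    """Processing site: a local window plus a key-only mirror of the remote window."""

    def __init__(self, window_size):
        # Count-based sliding windows (an assumption; the abstract does not specify).
        self.local_window = deque(maxlen=window_size)   # full local tuples
        self.mirror_window = deque(maxlen=window_size)  # join keys of remote tuples

    def probe_with_partial(self, key):
        """Receive a partial tuple (join key only) and probe the local window."""
        self.mirror_window.append(key)
        return any(t["key"] == key for t in self.local_window)

    def join_full(self, remote_tuple):
        """Join a full remote tuple against the local window."""
        return [(remote_tuple, t) for t in self.local_window
                if t["key"] == remote_tuple["key"]]

    def local_arrival(self, local_tuple):
        """Insert a local tuple; report whether the mirror holds a matching key."""
        self.local_window.append(local_tuple)
        return local_tuple["key"] in self.mirror_window


def run_semijoin(site, remote_tuples):
    """Ship each remote tuple in full only if its partial tuple finds a match."""
    results, full_shipped = [], 0
    for r in remote_tuples:
        if site.probe_with_partial(r["key"]):
            full_shipped += 1
            results.extend(site.join_full(r))
    return results, full_shipped


# Hypothetical data: two local tuples, three remote arrivals.
site = ProcessingSite(window_size=3)
site.local_arrival({"key": 1, "payload": "a"})
site.local_arrival({"key": 2, "payload": "b"})
remote = [{"key": 2, "payload": "x"},
          {"key": 9, "payload": "y"},
          {"key": 1, "payload": "z"}]
results, full_shipped = run_semijoin(site, remote)
print(len(results), full_shipped)
```

    In this toy run, only two of the three remote tuples are shipped in full; the third costs just a key. A simple join in the abstract's sense would instead ship every remote tuple in full, which is the communication overhead the semijoin variant aims to cut.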
  • Keywords
    distributed databases; query processing; communication overhead; distributed data streams; distributed database query processing; distributed stream query processing; query execution; remote window; semijoin-based window join algorithm; Algorithm design and analysis; Computer science; Costs; Distributed databases; Mirrors; Monitoring; Probes; Query processing; Strontium; Telecommunication traffic;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Telecommunications, 2007. ICDT '07. Second International Conference on
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    0-7695-2910-0
  • Electronic_ISBN
    0-7695-2910-0
  • Type
    conf
  • DOI
    10.1109/ICDT.2007.38
  • Filename
    4270593