  • DocumentCode
    3144222
  • Title
    Join queries on uncertain data: Semantics and efficient processing
  • Author
    Ge, Tingjian
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Kentucky, Lexington, KY, USA
  • fYear
    2011
  • fDate
    11-16 April 2011
  • Firstpage
    697
  • Lastpage
    708
  • Abstract
    Uncertain data is quite common nowadays in a variety of modern database applications. At the same time, the join operation is one of the most important but expensive operations in SQL. However, join queries on uncertain data have not been adequately addressed thus far. In this paper, we study the SQL join operation on uncertain attributes. We observe and formalize two kinds of join operations on such data, namely v-join and d-join. They are each useful for different applications. Using probability theory, we then devise efficient query processing algorithms for these join operations. Specifically, we use probability bounds that are based on the moments of random variables to either early accept or early reject a candidate v-join result tuple. We also devise an indexing mechanism and an algorithm called Two-End Zigzag Join to further save I/O costs. For d-join, we first observe that it can be reduced to a special form of similarity join in a multidimensional space. We then design an efficient algorithm called condensed d-join and an optimal condensation scheme based on dynamic programming. Finally, we perform a comprehensive empirical study using both real datasets and synthetic datasets.
  • Keywords
    SQL; data handling; database management systems; dynamic programming; probability; query processing; SQL join operation; d-join data; indexing mechanism; join queries; optimal condensation scheme; probability theory; query processing algorithms; two-end zigzag; uncertain data handling; v-join data; Algorithm design and analysis; Indexing; Probabilistic logic; Random variables; Reactive power; Uncertainty
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Engineering (ICDE), 2011 IEEE 27th International Conference on
  • Conference_Location
    Hannover
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4244-8959-6
  • Electronic_ISBN
    1063-6382
  • Type
    conf
  • DOI
    10.1109/ICDE.2011.5767888
  • Filename
    5767888