• DocumentCode
    610315
  • Title
    Cleaning uncertain data for top-k queries
  • Author
    Luyi Mo ; Cheng, Russell ; Xiang Li ; Cheung, David Wai-lok ; Yang, Xiaoping S.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Hong Kong, Hong Kong, China
  • fYear
    2013
  • fDate
    8-12 April 2013
  • Firstpage
    134
  • Lastpage
    145
  • Abstract
    The information managed in emerging applications, such as sensor networks, location-based services, and data integration, is inherently imprecise. To handle data uncertainty, probabilistic databases have recently been developed. In this paper, we study how to quantify the ambiguity of answers returned by a probabilistic top-k query. We develop efficient algorithms to compute the quality of this query under the possible world semantics. We further address the cleaning of a probabilistic database in order to improve top-k query quality. Cleaning involves reducing the ambiguity associated with the database entities. For example, the uncertainty of a temperature value acquired from a sensor can be reduced, or cleaned, by requesting its newest value from the sensor. While this “cleaning operation” may produce a better query result, it may incur a cost and may fail. We investigate the problem of selecting entities to be cleaned under a limited budget. In particular, we propose an optimal solution and several heuristics. Experiments show that the greedy algorithm is efficient and close to optimal.
  • Keywords
    data integration; query processing; cleaning operation; data uncertainty handling; location-based services; optimal solution; probabilistic databases; probabilistic top-k query; sensor networks; temperature value uncertainty; top-k queries; top-k query quality; uncertain data cleaning; world semantics; Cleaning; Motion pictures; Probabilistic logic; Query processing; Semantics; Uncertainty
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Engineering (ICDE), 2013 IEEE 29th International Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4673-4909-3
  • Electronic_ISBN
    1063-6382
  • Type
    conf
  • DOI
    10.1109/ICDE.2013.6544820
  • Filename
    6544820