Title :
Query Aware Determinization of Uncertain Objects
Author :
Xu, Jie ; Kalashnikov, Dmitri V. ; Mehrotra, Sharad
Author_Institution :
Dept. of Comput. Sci., Univ. of California at Irvine, Irvine, CA, USA
Abstract :
This paper considers the problem of determinizing probabilistic data so that such data can be stored in legacy systems that accept only deterministic input. Probabilistic data may be generated by automated data analysis/enrichment techniques such as entity resolution, information extraction, and speech processing. The legacy system may correspond to pre-existing web applications such as Flickr, Picasa, etc. The goal is to generate a deterministic representation of probabilistic data that optimizes the quality of the end-application built on deterministic data. We explore this determinization problem in the context of two different data processing tasks: triggers and selection queries. We show that approaches such as thresholding or top-1 selection, traditionally used for determinization, lead to suboptimal performance for such applications. Instead, we develop a query-aware strategy and show its advantages over existing solutions through a comprehensive empirical evaluation over real and synthetic datasets.
Keywords :
Internet; data analysis; query processing; Web applications; automated data analysis-enrichment techniques; legacy systems; probabilistic data deterministic representation; probabilistic data determinization; query aware determinization; selection queries; trigger; uncertain objects; Approximation algorithms; Data processing; Earthquakes; Measurement; Probabilistic logic; Speech; Speech recognition; branch and bound algorithm; data quality; determinization; query workload; uncertain data;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2013.170