DocumentCode :
1791530
Title :
BayesWipe: A multimodal system for data cleaning and consistent query answering on structured bigdata
Author :
De, Sushovan ; Yuheng Hu ; Yi Chen ; Kambhampati, S.
Author_Institution :
Dept. of Comput. Sci. & Eng., Arizona State Univ., Tempe, AZ, USA
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
15
Lastpage :
24
Abstract :
Recent efforts in data cleaning of structured data have focused exclusively on problems like data deduplication, record matching, and data standardization; none of these focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like CFDs (which have to be provided by domain experts, or learned from a clean sample of the database). In this paper, we provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly. We thus avoid the necessity for a domain expert or clean master data. We also show how to efficiently perform consistent query answering using this model over a dirty database, in case write permissions to the database are unavailable. We evaluate our methods over both synthetic and real data.
Keywords :
Big Data; learning (artificial intelligence); query processing; BayesWipe system; Bayesian generative model; attribute values correction; data cleaning; data deduplication; data standardization; database; learning; query answering; record matching; statistical error model; structured Big Data; Bayes methods; Cleaning; Data models; Mathematical model; Query processing; databases; query rewriting; uncertainty; web databases
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004207
Filename :
7004207