DocumentCode :
1446972
Title :
Usher: Improving Data Quality with Dynamic Forms
Author :
Chen, Kuang ; Chen, Harr ; Conway, Neil ; Hellerstein, Joseph M. ; Parikh, Tapan S.
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of California, Berkeley, Berkeley, CA, USA
Volume :
23
Issue :
8
fYear :
2011
Firstpage :
1138
Lastpage :
1153
Abstract :
Data quality is a critical problem in modern databases. Data-entry forms present the first and arguably best opportunity for detecting and mitigating errors, but there has been little research into automatic methods for improving data quality at entry time. In this paper, we propose Usher, an end-to-end system for form design, entry, and data quality assurance. Using previous form submissions, Usher learns a probabilistic model over the questions of the form. Usher then applies this model at every step of the data-entry process to improve data quality. Before entry, it induces a form layout that captures the most important data values of a form instance as quickly as possible and reduces the complexity of error-prone questions. During entry, it dynamically adapts the form to the values being entered by providing real-time interface feedback, reasking questions with dubious responses, and simplifying questions by reformulating them. After entry, it revisits question responses that it deems likely to have been entered incorrectly by reasking the question or a reformulation thereof. We evaluate these components of Usher using two real-world data sets. Our results demonstrate that Usher can improve data quality considerably at a reduced cost when compared to current practice.
Keywords :
data handling; database management systems; information science; peer-to-peer computing; user interfaces; Usher; data entry process; data quality; end-to-end system; probabilistic model; real time interface feedback; Adaptation model; Bayesian methods; Cleaning; Data models; Databases; Predictive models; Probabilistic logic; Data quality; adaptive form; data entry; form design
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2011.31
Filename :
5710916