• DocumentCode
    610342
  • Title
    Holistic data cleaning: Putting violations into context
  • Author
    Xu Chu ; Ilyas, I.F. ; Papotti, P.
  • Author_Institution
    Univ. of Waterloo, Waterloo, ON, Canada
  • fYear
    2013
  • fDate
    8-12 April 2013
  • Firstpage
    458
  • Lastpage
    469
  • Abstract
    Data cleaning is an important problem, and data quality rules are the most promising way to address it with a declarative approach. Previous work has focused on specific formalisms, such as functional dependencies (FDs), conditional functional dependencies (CFDs), and matching dependencies (MDs), and those have always been studied in isolation. Moreover, such techniques are usually applied in a pipeline or interleaved. In this work we tackle the problem in a novel, unified framework. First, we let users specify quality rules using denial constraints with ad-hoc predicates. This language subsumes existing formalisms and can express rules involving numerical values, with predicates such as “greater than” and “less than”. More importantly, we exploit the interaction of the heterogeneous constraints by encoding them in a conflict hypergraph. Such a holistic view of the conflicts is the starting point for a novel definition of repair context, which allows us to automatically compute repairs of better quality than previous approaches in the literature. Experimental results on real datasets show that the holistic approach outperforms previous algorithms in terms of quality and efficiency of the repair.
  • Keywords
    constraint handling; data handling; graph theory; pattern matching; CFD; MD; ad-hoc predicate; conditional functional dependencies; conflict hypergraph; data quality rule; data repair; declarative approach; denial constraint; heterogeneous constraint; holistic data cleaning; matching dependencies; numerical value; Cities and towns; Cleaning; Context; Databases; Maintenance engineering; Proposals; Remuneration;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2013 IEEE 29th International Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4673-4909-3
  • Electronic_ISBN
    1063-6382
  • Type
    conf
  • DOI
    10.1109/ICDE.2013.6544847
  • Filename
    6544847