• DocumentCode
    3123797
  • Title
    Discovering Conditional Functional Dependencies
  • Author
    Fan, Wenfei; Geerts, Floris; Lakshmanan, Laks V. S.; Xiong, Ming
  • Author_Institution
    Univ. of Edinburgh, Edinburgh
  • fYear
    2009
  • fDate
    March 29 2009-April 2 2009
  • Firstpage
    1231
  • Lastpage
    1234
  • Abstract
    This paper investigates the discovery of conditional functional dependencies (CFDs). CFDs are a recent extension of functional dependencies (FDs) that supports patterns of semantically related constants, and they can be used as rules for cleaning relational data. However, finding CFDs is an expensive process that involves intensive manual effort. To effectively identify data cleaning rules, we develop techniques for discovering CFDs from sample relations. We provide three methods for CFD discovery. The first, referred to as CFDMiner, is based on techniques for mining closed itemsets, and is used to discover constant CFDs, namely, CFDs with constant patterns only. The other two algorithms are developed for discovering general CFDs. The first of these, referred to as CTANE, is a levelwise algorithm that extends TANE, a well-known algorithm for mining FDs. The other, referred to as FastCFD, is based on the depth-first approach used in FastFD, a method for discovering FDs, and leverages closed-itemset mining to reduce the search space. Our experimental results demonstrate the following. (a) CFDMiner can be multiple orders of magnitude faster than CTANE and FastCFD for constant CFD discovery. (b) CTANE works well when a given sample relation is large, but it does not scale well with the arity of the relation. (c) FastCFD is far more efficient than CTANE when the arity of the relation is large.
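Illustrative note (not part of the paper's abstract): the sketch below shows what it means for a relation to satisfy a CFD, which is the property the discovery algorithms search for. It is a minimal checker, not CFDMiner, CTANE, or FastCFD. The function names, the "_" wildcard convention, and the sample rows are all assumptions made for illustration.

```python
# A CFD is modelled here as (lhs attributes, rhs attribute, pattern), where the
# pattern maps each attribute to a constant or to the wildcard "_".
# All names and data below are hypothetical.
WILDCARD = "_"


def matches(value, pattern_value):
    """A value matches a pattern entry if the entry is a wildcard or equal to it."""
    return pattern_value == WILDCARD or value == pattern_value


def satisfies_cfd(rows, lhs, rhs, pattern):
    """Return True if `rows` (a list of dicts) satisfies the CFD (lhs -> rhs, pattern)."""
    seen = {}  # lhs values -> rhs value observed for them
    for row in rows:
        # The CFD only constrains rows whose lhs values match the pattern.
        if not all(matches(row[a], pattern[a]) for a in lhs):
            continue
        # The rhs value must match the rhs pattern (a constant or a wildcard).
        if not matches(row[rhs], pattern[rhs]):
            return False
        # Rows that agree on the lhs must also agree on the rhs.
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


rows = [
    {"CC": "44", "AC": "131", "CITY": "EDI"},
    {"CC": "44", "AC": "131", "CITY": "EDI"},
    {"CC": "01", "AC": "908", "CITY": "MH"},
    {"CC": "01", "AC": "908", "CITY": "NYC"},
]

# Constant CFD: country code 44 and area code 131 imply city EDI.
constant_ok = satisfies_cfd(
    rows, ["CC", "AC"], "CITY", {"CC": "44", "AC": "131", "CITY": "EDI"}
)

# Variable CFD: within country code 01, area code determines city.
variable_ok = satisfies_cfd(
    rows, ["CC", "AC"], "CITY", {"CC": "01", "AC": WILDCARD, "CITY": WILDCARD}
)
```

With these sample rows, the constant CFD holds, while the variable CFD restricted to country code 01 is violated by the two rows that share area code 908 but list different cities.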
  • Keywords
    data mining; CFDMiner; CTANE; FastCFD; closed-itemset mining; conditional functional dependencies; data cleaning; relational data; Cities and towns; Cleaning; Computational fluid dynamics; Data engineering; Itemsets; USA Councils; functional dependencies
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
  • Conference_Location
    Shanghai
  • ISSN
    1084-4627
  • Print_ISBN
    978-1-4244-3422-0
  • Electronic_ISBN
    1084-4627
  • Type
    conf
  • DOI
    10.1109/ICDE.2009.208
  • Filename
    4812508