• DocumentCode
    2168205
  • Title
    HADCLEAN: A hybrid approach to data cleaning in data warehouses
  • Author
    Paul, A. ; Ganesan, V. ; Challa, J.S. ; Sharma, Yogesh

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Syst., Birla Inst. of Technol. & Sci., Pilani, India
  • fYear
    2012
  • fDate
    13-15 March 2012
  • Firstpage
    136
  • Lastpage
    142
  • Abstract
    Data cleaning is an important part of the data warehouse management process. It is not an easy process, as many different types of unclean data (bad data, incomplete data, typos, etc.) can be present. Also, whether data is clean or dirty depends heavily on the nature and source of the raw data. Many attempts have been made to clean data using blocking algorithms, phonetic algorithms, etc. This paper presents HADCLEAN, a hybrid approach to data cleaning that combines modified versions of the PNRS and transitive closure algorithms.
  • Keywords
    data analysis; data warehouses; HADCLEAN; PNRS; blocking algorithm; data cleaning; data warehouse management process; hybrid approach; phonetic algorithm; raw data; transitive closure algorithm; Algorithm design and analysis; Cleaning; Data warehouses; Dictionaries; Heuristic algorithms; Mobile communication; Standards; data warehouse; near miss; transitive closure
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2012 International Conference on Information Retrieval & Knowledge Management (CAMP)
  • Conference_Location
    Kuala Lumpur
  • Print_ISBN
    978-1-4673-1091-8
  • Type
    conf

  • DOI
    10.1109/InfRKM.2012.6205022
  • Filename
    6205022