DocumentCode
2168205
Title
HADCLEAN: A hybrid approach to data cleaning in data warehouses
Author
Paul, A. ; Ganesan, V. ; Challa, J.S. ; Sharma, Yogesh
Author_Institution
Dept. of Comput. Sci. & Inf. Syst., Birla Inst. of Technol. & Sci., Pilani, India
fYear
2012
fDate
13-15 March 2012
Firstpage
136
Lastpage
142
Abstract
Data cleaning is an important part of the data warehouse management process. It is not an easy process, because many different types of unclean data (bad data, incomplete data, typos, etc.) can be present. Also, whether data is clean or dirty depends heavily on the nature and source of the raw data. Many attempts have been made to clean data using blocking algorithms, phonetic algorithms, etc. This paper presents HADCLEAN, a hybrid approach to data cleaning that combines modified versions of the PNRS and transitive closure algorithms.
Keywords
data analysis; data warehouses; HADCLEAN; PNRS; blocking algorithm; data cleaning; data warehouse management process; hybrid approach; phonetic algorithm; raw data; transitive closure algorithm; Algorithm design and analysis; Cleaning; Data warehouses; Dictionaries; Heuristic algorithms; Mobile communication; Standards; HADCLEAN; PNRS; data warehouse; near miss; phonetic algorithm; transitive closure
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Retrieval & Knowledge Management (CAMP), 2012 International Conference on
Conference_Location
Kuala Lumpur
Print_ISBN
978-1-4673-1091-8
Type
conf
DOI
10.1109/InfRKM.2012.6205022
Filename
6205022