DocumentCode
2835036
Title
Handling Noisy Data using Attribute Selection and Smart Tokens
Author
Tamilselvi, Jebamalar J. ; Saravanan, V.
Author_Institution
Dept. of Comput. Applic., Karunya Univ., Coimbatore
fYear
2008
fDate
Aug. 29 2008-Sept. 2 2008
Firstpage
770
Lastpage
774
Abstract
Data cleaning is the process of identifying the problems expected when integrating data from a single source or from different sources. Many problems can occur in a data warehouse while data is being loaded or integrated, and the main one is noisy data. Noisy data results from misused abbreviations, data entry mistakes, duplicate records, and spelling errors. The proposed algorithm handles noisy data efficiently by expanding abbreviations, removing unimportant characters, and eliminating duplicates. An attribute selection algorithm is applied before token formation. Together, the attribute selection and token formation algorithms reduce the complexity of the data cleaning process and allow data to be cleaned flexibly and with little effort. This research work uses smart tokens to increase the speed of the mining process and to improve the quality of the data.
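The steps named in the abstract (abbreviation expansion, removal of unimportant characters, token formation over selected attributes, and duplicate elimination) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abbreviation table, the three-character token rule, and the function names `clean_value`, `make_token`, and `remove_duplicates` are all assumptions made for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical abbreviation table; the paper's actual table is not given in the abstract.
ABBREVIATIONS = {"st": "street", "rd": "road", "dr": "doctor"}


def clean_value(value):
    """Lowercase, replace non-alphanumeric characters with spaces, expand known abbreviations."""
    words = re.sub(r"[^a-z0-9\s]", " ", value.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def make_token(record, attributes):
    """Form a token from the selected attributes.

    Assumed rule: keep numeric words whole and the first three
    characters of every other word.
    """
    parts = []
    for attr in attributes:
        for word in clean_value(str(record.get(attr, ""))).split():
            parts.append(word if word.isdigit() else word[:3])
    return "".join(parts)


def remove_duplicates(records, attributes):
    """Keep the first record seen for each distinct token."""
    seen = set()
    unique = []
    for record in records:
        token = make_token(record, attributes)
        if token not in seen:
            seen.add(token)
            unique.append(record)
    return unique
```

Under these assumptions, records such as "Dr. John Smith, 12 Main St." and "doctor john smith, 12, Main Street" yield the same token and are treated as duplicates. Comparing short tokens rather than full field values is what would make the matching step cheap.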
Keywords
data integrity; data mining; data warehouses; attribute selection; data cleaning; data integration; data warehouse; noisy data handling; smart tokens; Cleaning; Computer applications; Computer science; Databases; Information resources; Information technology; Sorting; Data Quality; Data Warehousing
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Information Technology, 2008. ICCSIT '08. International Conference on
Conference_Location
Singapore
Print_ISBN
978-0-7695-3308-7
Type
conf
DOI
10.1109/ICCSIT.2008.62
Filename
4624972