  • DocumentCode
    2835036
  • Title
    Handling Noisy Data using Attribute Selection and Smart Tokens
  • Author
    Tamilselvi, Jebamalar J.; Saravanan, V.
  • Author_Institution
    Dept. of Comput. Applic., Karunya Univ., Coimbatore
  • fYear
    2008
  • fDate
    Aug. 29, 2008 - Sept. 2, 2008
  • Firstpage
    770
  • Lastpage
    774
  • Abstract
    Data cleaning is the process of identifying or determining expected problems when integrating data from different sources or from a single source. Many problems can occur in a data warehouse while data is being loaded or integrated, and the main one is noisy data. Noisy data arises from misused abbreviations, data entry mistakes, duplicate records, and spelling errors. The proposed algorithm is designed to handle noisy data efficiently by expanding abbreviations, removing unimportant characters, and eliminating duplicates. An attribute selection algorithm selects the relevant attributes before token formation. Together, the attribute selection and token formation algorithms reduce the complexity of the data cleaning process and allow data to be cleaned flexibly, with little effort and without ambiguity. This work uses smart tokens to speed up the mining process and improve data quality.
  • Keywords
    data integrity; data mining; data warehouses; attribute selection; data cleaning; data integration; data mining; data warehouse; noisy data handling; smart tokens; Cleaning; Computer applications; Computer science; Data mining; Data warehouses; Databases; Information resources; Information technology; Sorting; Data Cleaning; Data Quality; Data Warehousing; Smart Tokens;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer Science and Information Technology, 2008. ICCSIT '08. International Conference on
  • Conference_Location
    Singapore
  • Print_ISBN
    978-0-7695-3308-7
  • Type
    conf
  • DOI
    10.1109/ICCSIT.2008.62
  • Filename
    4624972
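The abstract names the cleaning steps (attribute selection, then token formation by expanding abbreviations and removing unimportant characters, then duplicate elimination) without giving their details. The Python below is a minimal sketch under stated assumptions, not the authors' algorithm. The abbreviation table `ABBREVIATIONS`, the attribute-selection criterion (fill-rate and distinctness thresholds), the token format, and the exact-match comparison of sorted tokens are all hypothetical illustrations; the normalized string built here is only a stand-in for the paper's smart tokens.

```python
import re
from typing import Dict, List, Optional

Record = Dict[str, str]

# Hypothetical abbreviation table; the paper's actual dictionary is not given.
ABBREVIATIONS: Dict[str, str] = {"st": "street", "rd": "road", "dept": "department"}


def make_token(value: str) -> str:
    """Lowercase, replace unimportant characters (punctuation) with spaces,
    and expand known abbreviations word by word."""
    words = re.sub(r"[^\w\s]", " ", value.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def select_attributes(records: List[Record], min_distinct: float = 0.5,
                      max_missing: float = 0.2) -> List[str]:
    """Assumed criterion: keep attributes that are mostly filled and whose
    normalized values are varied but not unique per row (a per-row unique
    column, such as a surrogate ID, cannot reveal duplicates)."""
    if not records:
        return []
    n = len(records)
    selected = []
    for attr in records[0]:
        tokens = [make_token(r.get(attr, "")) for r in records]
        filled = [t for t in tokens if t]
        missing_ratio = 1 - len(filled) / n
        distinct_ratio = len(set(filled)) / n
        if missing_ratio <= max_missing and min_distinct <= distinct_ratio < 1.0:
            selected.append(attr)
    return selected


def record_token(record: Record, attrs: List[str]) -> str:
    """Join the normalized values of the selected attributes into one token."""
    return "|".join(make_token(record.get(a, "")) for a in attrs)


def remove_duplicates(records: List[Record], attrs: List[str]) -> List[Record]:
    """Sort records by token and keep the first record of each run of equal
    tokens (exact-match comparison of neighbours after sorting)."""
    keyed = sorted(((record_token(r, attrs), r) for r in records), key=lambda p: p[0])
    unique: List[Record] = []
    prev: Optional[str] = None
    for tok, rec in keyed:
        if tok != prev:
            unique.append(rec)
            prev = tok
    return unique


if __name__ == "__main__":
    data = [
        {"id": "1", "name": "John Smith", "address": "12 Main St."},
        {"id": "2", "name": "john smith", "address": "12 Main Street"},
        {"id": "3", "name": "Mary Jones", "address": "4 Park Rd"},
    ]
    attrs = select_attributes(data)        # ['name', 'address']
    cleaned = remove_duplicates(data, attrs)
    print([r["id"] for r in cleaned])      # ['1', '3']
```

In this toy run the `id` column is dropped because every value is unique, and records 1 and 2 collapse to the same token once "St." is expanded and the punctuation and case differences are removed. Real noisy data would also need approximate matching for spelling errors, which this exact-match sketch does not attempt.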