• DocumentCode
    3645086
  • Title

    Optimization of Algorithm to Identification of Duplicate Tuples through Similarity Phonetic Based on Multithreading

  • Author

    Tiago Luis Andrade;Rogeria Cristiane Gratao de Souza;Maurizio Babini;Carlos Roberto Valêncio

  • Author_Institution
    Depto. de Cienc. de Comput. e Estatistica, Univ. Estadual Paulista - Unesp, Sao Jose do Rio Preto, Brazil
  • fYear
    2011
  • Firstpage
    299
  • Lastpage
    304
  • Abstract
    To ensure greater reliability and consistency of the data stored in a database, the data cleaning stage is placed early in the Knowledge Discovery in Databases (KDD) process. It is responsible for eliminating problems and adjusting the data for the later stages, especially data mining. Such problems occur at both the instance level and the schema level and include missing values, null values, duplicate tuples, and values outside the domain, among others. Several algorithms have been developed to perform the cleaning step in databases; some of them were designed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as its original contribution an optimization of an algorithm for detecting duplicate tuples in databases through phonetic similarity, based on multithreading and requiring no training data, together with a language-independent environment to support it.
  • Keywords
    "Databases","Manuals","Encoding","Algorithm design and analysis","Multithreading","Instruction sets","Reliability"
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2011 12th International Conference on
  • Print_ISBN
    978-1-4577-1807-6
  • Type

    conf

  • DOI
    10.1109/PDCAT.2011.58
  • Filename
    6118917