• DocumentCode
    1624906
  • Title

    Efficient Priority Queue algorithm and Strainer mode Technique for identification and eradication of duplications in XML records

  • Author

    Preetha, Evangeline D. ; Padmasree ; Anandhakumar, P. ; Deepti Raj, G. ; Rajendran, T.

  • Author_Institution
    Dept. of Comput. Technol., Anna Univ., Chennai, India
  • fYear
    2013
  • Firstpage
    106
  • Lastpage
    113
  • Abstract
    Detecting duplicates in a database is necessary, but eradicating the detected duplicates is also an important task. In order to retrieve valuable data, some form of data preprocessing must be performed. Data scrubbing, or extracting quality data, is one such preprocessing technique, and its most decisive task is detecting duplicate records. Databases may contain duplicate records due to data entry errors, standardized abbreviations, or differences in the schemas of the records. If the database contains duplicate records, it is difficult both to examine the database and to mine the desired data. In this paper, we identify duplicate records in a bibliographical XML database using a simple yet efficient algorithm built on the structure of a Priority Queue. The duplicate records are then eliminated using the Strainer mode Technique, which helps maintain reasonable data quality in the database. Compared with the existing method, the proposed method performs best with 0.8 as the threshold value for duplicate detection.
  • Keywords
    XML; document handling; queueing theory; relational databases; XML records; bibliographical XML database; data extraction; data preprocessing techniques; data quality; data scrubbing; duplicate detection; duplication eradication; duplication identification; extensible markup language; priority queue algorithm; strainer mode technique; threshold value; Abstracts; Complexity theory; Standards; Wireless sensor networks; Duplicate Eradication; Priority Queue Algorithm; Strainer;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computing (ICoAC), 2013 Fifth International Conference on
  • Conference_Location
    Chennai
  • Print_ISBN
    978-1-4799-3447-8
  • Type

    conf

  • DOI
    10.1109/ICoAC.2013.6921935
  • Filename
    6921935