DocumentCode :
1624906
Title :
Efficient Priority Queue algorithm and Strainer mode Technique for identification and eradication of duplications in XML records
Author :
Preetha, Evangeline D. ; Padmasree ; Anandhakumar, P. ; Deepti Raj, G. ; Rajendran, T.
Author_Institution :
Dept. of Comput. Technol., Anna Univ., Chennai, India
fYear :
2013
Firstpage :
106
Lastpage :
113
Abstract :
Detecting duplicates in a database is necessary, but eradicating the detected duplicates is also an important task. In order to retrieve valuable data, some form of data preprocessing must be executed. Data scrubbing, or extracting quality data, is one such preprocessing technique, and its most decisive task is detecting duplicate records. Databases may contain duplicate records due to data entry errors, differences in standardized abbreviations, or differences in the schemas of the records. If the database contains duplicate records, it is difficult both to examine the database and to mine the desired data. In this paper, we identify duplicate records in a bibliographical XML database using a simple yet efficient algorithm built on the structure of a Priority Queue. The duplicate records are then eliminated using the Strainer mode Technique, which helps maintain reasonable data quality in the database. Compared with the existing method, the proposed method performs best at a threshold value of 0.8 for duplicate detection.
Keywords :
XML; document handling; queueing theory; relational databases; XML records; bibliographical XML database; data extraction; data preprocessing techniques; data quality; data scrubbing; duplicate detection; duplication eradication; duplication identification; extensible markup language; priority queue algorithm; strainer mode technique; threshold value; Abstracts; Complexity theory; Standards; Wireless sensor networks; Duplicate Eradication; Priority Queue Algorithm; Strainer;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Computing (ICoAC), 2013 Fifth International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4799-3447-8
Type :
conf
DOI :
10.1109/ICoAC.2013.6921935
Filename :
6921935