DocumentCode :
2455991
Title :
XML Duplicate Detection with Improved network pruning algorithm
Author :
Borate, Vishal Kisan ; Giri, Sudipta
Author_Institution :
Dept. of Inf. Technol., MAEER's MIT Coll. of Eng., Pune, India
fYear :
2015
fDate :
8-10 Jan. 2015
Firstpage :
1
Lastpage :
5
Abstract :
Duplicate detection is a critical task for any organization's database. Duplicates are representations of the same real-world entity or object that appear with different structures and in different formats. Duplicates can occur in relational data, in complex data, and in hierarchical data such as XML. Much prior work addresses duplicate detection in relational data, but attention has recently shifted to XML data, because XML is very popular for data storage and is used extensively for data exchange between organizations. We present an extensive literature survey on this topic and propose a duplicate detection method that combines ideas from existing papers with some original ideas of our own. In addition to improving efficiency and effectiveness, the method also accounts for typographical errors when comparing two XML elements. To test the correctness of the Improved network pruning method, we compare it with an existing duplicate detection system, focusing on how it achieves higher precision and recall values on the various datasets used.
Keywords :
XML; database management systems; electronic data interchange; XML duplicate detection; XML elements; complex data; data exchange; data storing; database; hierarchical data; network pruning algorithm; relational data; typographical errors; Bayes methods; Color; Data mining; Databases; Image color analysis; Object recognition; XML; Bayesian Network; Data Duplication; Data Mining; KDD; Precision; Recall; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pervasive Computing (ICPC), 2015 International Conference on
Conference_Location :
Pune
Type :
conf
DOI :
10.1109/PERVASIVE.2015.7087007
Filename :
7087007
Link To Document :