DocumentCode
2455991
Title
XML Duplicate Detection with Improved network pruning algorithm
Author
Borate, Vishal Kisan ; Giri, Sudipta
Author_Institution
Dept. of Inf. Technol., MAEER's MIT Coll. of Eng., Pune, India
fYear
2015
fDate
8-10 Jan. 2015
Firstpage
1
Lastpage
5
Abstract
Duplicate detection is a critical task for the databases of any organization. Duplicates are different representations of the same real-world entity or object, stored with different structures and in different formats. Duplicates can occur in relational data, in complex data, and in hierarchical data such as XML. Much earlier work has addressed duplicate detection in relational data, but attention has recently shifted to XML data, because XML is popular for data storage and is used extensively for data exchange between organizations. We conducted an extensive literature survey on this topic and propose a duplicate detection method that combines ideas from existing papers with some of our own. In addition to improving efficiency and effectiveness, the method also checks for typographical errors when comparing two XML elements. To test the correctness of the Improved network pruning method, we compare it with an existing duplicate detection system, focusing on how it achieves higher precision and recall on the various datasets used.
Keywords
XML; database management systems; electronic data interchange; XML duplicate detection; XML elements; complex data; data exchange; data storing; database; hierarchical data; network pruning algorithm; relational data; typographical errors; Bayes methods; Color; Data mining; Databases; Image color analysis; Object recognition; Bayesian Network; Data Duplication; Data Mining; KDD; Precision; Recall
fLanguage
English
Publisher
ieee
Conference_Titel
Pervasive Computing (ICPC), 2015 International Conference on
Conference_Location
Pune
Type
conf
DOI
10.1109/PERVASIVE.2015.7087007
Filename
7087007