DocumentCode :
2135659
Title :
XML Documents Clustering Research Based on Weighted Cosine Measure
Author :
Li, Wei ; Li, Xiong-Fei ; Zhao, Yan
Author_Institution :
Coll. of Comput. Sci. & Technol., Jilin Univ., Changchun, China
fYear :
2010
fDate :
18-22 Aug. 2010
Firstpage :
95
Lastpage :
100
Abstract :
Recently, a large amount of work has been done in XML data mining. However, most existing work focuses on snapshot XML data, whereas XML data is dynamic in practical applications. To mine the knowledge hidden in frozen structures (FS), the parts of an XML document that change little or not at all over its change history, we present a method for clustering XML documents via FS. We also propose the weighted cosine measure (WCM), an improvement on the traditional cosine measure, which is used to calculate the similarity between two clusters. In addition, an agglomerative hierarchical method is used during the clustering process. Experimental results indicate that the proposed solution performs well: XML documents can be clustered effectively, and the results obtained with the WCM are better than those obtained with the traditional cosine measure. The XML documents in each cluster share similar structures that rarely change.
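The abstract does not give the WCM formula, so the following is only a minimal sketch of a generic weighted cosine similarity, not the authors' definition. The function name `weighted_cosine`, the dictionary representation of structure features, and the per-feature `weights` are all assumptions made for illustration. With every weight equal to 1, it reduces to the traditional cosine measure.

```python
import math


def weighted_cosine(a, b, weights):
    """Generic weighted cosine similarity between two sparse feature vectors.

    a, b    -- dicts mapping a feature name to its value (hypothetical
               representation of a document's or cluster's structures)
    weights -- dict mapping a feature name to a non-negative weight;
               features missing from this dict get weight 1.0 (assumption)
    """
    keys = set(a) | set(b)
    dot = sum(weights.get(k, 1.0) * a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(weights.get(k, 1.0) * a.get(k, 0.0) ** 2 for k in keys))
    norm_b = math.sqrt(sum(weights.get(k, 1.0) * b.get(k, 0.0) ** 2 for k in keys))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero (or fully zero-weighted) vector has no direction.
        return 0.0
    return dot / (norm_a * norm_b)
```

In an agglomerative hierarchical setting, a measure of this kind would typically be evaluated for each pair of clusters, and the most similar pair merged at each step.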
Keywords :
XML; data mining; document handling; pattern clustering; WCM; XML data mining; XML documents clustering research; frozen structures; weighted cosine measurement; Algorithm design and analysis; Clustering algorithms; Companies; Merging; Weight measurement; agglomerative hierarchical method; document clustering; frozen structure; weighted cosine measure
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Frontier of Computer Science and Technology (FCST), 2010 Fifth International Conference on
Conference_Location :
Changchun, Jilin Province
Print_ISBN :
978-1-4244-7779-1
Type :
conf
DOI :
10.1109/FCST.2010.46
Filename :
5575647