DocumentCode :
2054697
Title :
Retrieving Representative Structures from XML Documents Using Clustering Techniques
Author :
Huang, Yin-Fu ; Liou, Po-Lun
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Yunlin Univ. of Sci. & Technol., Douliou, Taiwan
fYear :
2011
fDate :
12-14 Sept. 2011
Firstpage :
332
Lastpage :
339
Abstract :
In this paper, we addressed the problem of finding the common structures in a collection of XML documents. Since an XML document can be represented as a tree, the problem of clustering a collection of XML documents can be treated as the problem of clustering a collection of tree-structured documents. First, we used a SOM (Self-Organizing Map) with the Jaccard coefficient to cluster the XML documents. Then, an efficient sequential mining method called GST was applied to find the maximum frequent sequences. Finally, we merged the maximum frequent sequences to produce the common structures in each cluster.
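As a rough illustration of the similarity measure named in the abstract, the sketch below computes the Jaccard coefficient between two XML documents. The abstract does not say how documents are encoded before clustering, so representing each document as the set of its root-to-node tag paths is an assumption made here for illustration, and the helper names `tag_paths` and `jaccard` are hypothetical. The SOM clustering, GST mining, and merging steps are not shown.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in an XML document.

    Assumption: a document is summarized by its distinct tag paths;
    the paper may use a different structural encoding.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


# Two small example documents (made up for this sketch).
doc_a = "<book><title/><author><name/></author></book>"
doc_b = "<book><title/><author><email/></author></book>"

paths_a = tag_paths(doc_a)
paths_b = tag_paths(doc_b)
similarity = jaccard(paths_a, paths_b)
print(similarity)  # 3 shared paths out of 5 distinct paths -> 0.6
```

A clustering step could use `1 - similarity` as a distance between documents; how the paper feeds the coefficient into the SOM is not stated in this record.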
Keywords :
XML; information retrieval; pattern clustering; self-organising feature maps; tree data structures; Jaccard coefficient; SOM; XML documents; clustering techniques; representative structure retrieval; self-organizing map; tree structured documents; Clustering algorithms; Clustering methods; Data mining; Electronic mail; Filtering; Merging; XML document; clustering; common structure; sequential pattern mining; tree-structured
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligence and Security Informatics Conference (EISIC), 2011 European
Conference_Location :
Athens
Print_ISBN :
978-1-4577-1464-1
Electronic_ISBN :
978-0-7695-4406-9
Type :
conf
DOI :
10.1109/EISIC.2011.16
Filename :
6061227