• DocumentCode
    2515133
  • Title
    Clustering XML Search Results Based on Content and Structure Similarity
  • Author
    Min-Juan, Zhong ; Chang-Xuan, Wan ; De-Xi, Liu ; Xian-Pei, Jiao
  • Author_Institution
    Sch. of Inf. & Technol., Jiangxi Univ. of Finance & Econ., Nanchang, China
  • fYear
    2011
  • fDate
    5-6 Nov. 2011
  • Firstpage
    10
  • Lastpage
    14
  • Abstract
    Clustering XML search results is an effective way to improve performance. However, the key problem is how to measure the similarity between XML documents. In this paper, we propose a semantic similarity measure that combines content with structure, in which a variety of XML document features, including term element frequency, term inverse element frequency, the semantic weight of the tag label, and the level information of the term, are analyzed and applied to compute the similarity between XML documents. In addition, two new evaluation measures for clustering quality, namely ClusterRatio_Relevant and DocuRatio_Relevant, are introduced, motivated by observations of the distribution of relevant documents and by the fact that the collection has no classification information. Experimental results show that the proposed similarity method (CAS measure) outperforms traditional document clustering (CO measure) on ClusterRatio_Relevant and DocuRatio_Relevant and produces better clustering quality.
  • Keywords
    XML; document handling; pattern classification; pattern clustering; ClusterRatio_Relevant; DocuRatio_Relevant; XML documents; XML search results clustering; classification information; content similarity; documents distribution; structure similarity; Educational institutions; Frequency measurement; Multimedia systems; Performance evaluation; Semantics; Weight measurement; XML Clustering; node level; relevant cluster ratio; relevant document distribution ratio; tag weight
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Management of e-Commerce and e-Government (ICMeCG), 2011 Fifth International Conference on
  • Conference_Location
    Hubei
  • Print_ISBN
    978-1-4577-1659-1
  • Type
    conf
  • DOI
    10.1109/ICMeCG.2011.28
  • Filename
    6092622