• DocumentCode
    3259121
  • Title
    Improving the Results and Performance of Clustering Bit-encoded XML Documents
  • Author
    Kozielski, Michal
  • Author_Institution
    Inst. of Informatics, Silesian Univ. of Technol., Gliwice
  • fYear
    2006
  • fDate
    Dec. 2006
  • Firstpage
    60
  • Lastpage
    64
  • Abstract
    Clustering XML documents according to their structure is one of the techniques that may improve the effectiveness of XML document storage and retrieval. One of the existing approaches to this problem is to encode the XML document structure as a string of bits and to cluster the resulting feature vectors. The weaknesses of this method are the high dimensionality and sparseness of the feature vectors. The paper presents four methods for reducing the dimensionality of the bit feature vectors. Two of these methods are novel. They are dedicated to XML documents and should be applied during the encoding process. The results show good efficiency of these inner-encoding methods and their ability to improve clustering results in some cases. The methods presented in the paper are tested on two datasets of XML documents with different characteristics.
  • Keywords
    XML; pattern clustering; XML document clustering; bit-encoded XML document; inner-encoding; storage retrieval; Algorithm design and analysis; Clustering algorithms; Data mining; Database systems; Encoding; Equations; Informatics; Performance analysis; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2702-7
  • Type
    conf
  • DOI
    10.1109/ICDMW.2006.97
  • Filename
    4063599