  • DocumentCode
    2862555
  • Title
    Collaborative Clustering of XML Documents
  • Author
    Greco, Sergio ; Gullo, Francesco ; Ponti, Giovanni ; Tagarelli, Andrea
  • Author_Institution
    Dept. of Electron., Comput. & Syst. Sci. (DEIS), Univ. of Calabria, Arcavacata di Rende, Italy
  • fYear
    2009
  • fDate
    22-25 Sept. 2009
  • Firstpage
    579
  • Lastpage
    586
  • Abstract
    This paper presents a distributed collaborative approach to XML document clustering. According to a previous study, XML documents are mapped to a transactional domain, based on a data representation model which exploits the notion of XML tree tuple. This XML transactional model is well-suited to the identification of semantically cohesive substructures from XML documents, according to structure as well as content information. The proposed clustering framework employs a centroid-based partitional clustering paradigm in a distributed environment. Each peer in the network is allowed to compute a local clustering solution over its own data, then exchanges cluster centroids with other peers. The exchanged centroids correspond to recommendations offered by a peer to peers allowed to compute global representatives. Exploiting these recommendations, each peer becomes responsible for computing a global set of centroids for a given set of clusters. The overall clustering solution is hence computed in a collaborative way according to data from all the peers. Our approach has been evaluated on real XML document collections varying the number of peers. Results have shown that collaborative clustering leads to accurate overall clustering solutions with a relatively low load in the network.
  • Keywords
    XML; data structures; groupware; pattern clustering; peer-to-peer computing; trees (mathematics); XML documents; XML tree tuple; centroid-based partitional clustering; collaborative clustering; data representation; distributed collaborative approach; Collaborative work; Computer networks; Concurrent computing; Distributed computing; Information retrieval; International collaboration; Parallel processing; Peer to peer computing; Query processing; XML structure and content information; collaborative distributed clustering; transactional data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing Workshops, 2009. ICPPW '09. International Conference on
  • Conference_Location
    Vienna
  • ISSN
    1530-2016
  • Print_ISBN
    978-1-4244-4923-1
  • Electronic_ISBN
    1530-2016
  • Type
    conf
  • DOI
    10.1109/ICPPW.2009.58
  • Filename
    5366138
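
The collaborative scheme summarized in the abstract (local clustering on each peer, exchange of cluster summaries, computation of global centroids) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the Jaccard similarity, the majority-item centroid rule, the round-robin choice of the peer responsible for each cluster, and all function names are assumptions made for illustration, and the peers are simulated in a single process rather than over a network. Each XML document is assumed to be already mapped to a transaction, represented here as a set of items.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two item sets (assumed proximity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def assign(transactions, centroids):
    """Label each transaction with the index of its most similar centroid."""
    return [max(range(len(centroids)), key=lambda j: jaccard(t, centroids[j]))
            for t in transactions]


def local_summaries(transactions, labels, k):
    """Per cluster, the item counts and cluster size a peer would send out."""
    counts = [Counter() for _ in range(k)]
    sizes = [0] * k
    for t, j in zip(transactions, labels):
        counts[j].update(t)
        sizes[j] += 1
    return list(zip(counts, sizes))


def global_centroid(summaries, fallback):
    """Merge the summaries received for one cluster into a global centroid.

    Assumed rule: keep the items contained in at least half of the members.
    """
    total, size = Counter(), 0
    for item_counts, n in summaries:
        total.update(item_counts)
        size += n
    if size == 0:
        return fallback  # empty cluster: keep the previous centroid
    return frozenset(item for item, c in total.items() if 2 * c >= size)


def collaborative_clustering(peers, centroids, rounds=5):
    """Alternate local assignment and collaborative centroid updates."""
    k = len(centroids)
    for _ in range(rounds):
        # Local step: every peer clusters its own data only.
        per_peer = [local_summaries(data, assign(data, centroids), k)
                    for data in peers]
        # Collaborative step: in a real network, cluster j would be handled
        # by peer j % len(peers) (assumed); here it is simulated in-process.
        centroids = [global_centroid([s[j] for s in per_peer], centroids[j])
                     for j in range(k)]
    return centroids, [assign(data, centroids) for data in peers]


# Two hypothetical peers, each holding three transactions.
peers = [
    [frozenset({"a", "b"}), frozenset({"a", "b", "c"}), frozenset({"x", "y"})],
    [frozenset({"x", "y", "z"}), frozenset({"a", "b"}), frozenset({"x", "z"})],
]
centroids, labels = collaborative_clustering(
    peers, [frozenset({"a"}), frozenset({"x"})])
print(centroids, labels)
```

In this sketch only per-cluster item counts and sizes leave a peer, never the transactions themselves, which is one simple way to keep the exchanged data small.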