• DocumentCode
    443977
  • Title
    Semantic based clustering of Web documents
  • Author
    Lin, Tsau Young; Chiang, I-Jen
  • Author_Institution
    Dept. of Comput. Sci., San Jose State Univ., CA, USA
  • Volume
    1
  • fYear
    2005
  • fDate
    25-27 July 2005
  • Firstpage
    189
  • Abstract
    A new methodology is developed that structures the semantics of a collection of documents into the geometry of a simplicial complex: a primitive concept is represented by a top-dimensional simplex, and a connected component represents a concept. Based on these structures, documents can be clustered into meaningful classes. Experiments with three different data sets drawn from web pages and medical literature show that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass, and hierarchical agglomerative clustering (HAC). This abstract geometric model appears to capture the intrinsic semantics of the documents.
  • Keywords
    document handling; geometry; pattern clustering; semantic Web; Web document; Web page; abstract geometric model; data set; semantic document collection; simplicial complex geometry; unsupervised clustering; Biomedical informatics; Clustering algorithms; Computer science; Geometry; Humans; Microcomputers; Skeleton; Solid modeling; Topology; Web pages; clustering; document; polyhedron; semantics; web;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Granular Computing, 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9017-2
  • Type
    conf
  • DOI
    10.1109/GRC.2005.1547264
  • Filename
    1547264
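The abstract's core idea is that frequently co-occurring terms form simplexes, maximal simplexes act as primitive concepts, connected components of simplexes act as concepts, and documents are clustered by concept. The Python below is a minimal illustrative sketch of that idea, not the authors' algorithm. Several assumptions are not from the record: each document is reduced to a set of keywords; a term set counts as a simplex when it appears in at least `min_support` documents; two maximal simplexes are connected when they share at least `q + 1` vertices; and each document joins the component it overlaps most. All function names and parameters (`frequent_termsets`, `maximal_simplexes`, `concept_components`, `cluster_documents`, `min_support`, `max_size`, `q`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_termsets(docs, min_support, max_size=3):
    """Hypothetical: term sets of size 1..max_size found in >= min_support docs."""
    frequent = []
    for k in range(1, max_size + 1):
        counts = Counter()
        for terms in docs:
            for combo in combinations(sorted(terms), k):
                counts[combo] += 1
        level = [frozenset(c) for c, n in counts.items() if n >= min_support]
        if not level:
            break  # no frequent k-sets means no frequent (k+1)-sets either
        frequent.extend(level)
    return frequent


def maximal_simplexes(termsets):
    """Keep only simplexes that are not faces of a larger frequent term set."""
    return [s for s in termsets if not any(s < t for t in termsets)]


def concept_components(simplexes, q=0):
    """Union-find over simplexes sharing >= q + 1 vertices (assumed connectivity rule)."""
    parent = list(range(len(simplexes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(simplexes)), 2):
        if len(simplexes[i] & simplexes[j]) >= q + 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, s in enumerate(simplexes):
        groups.setdefault(find(i), set()).update(s)
    return list(groups.values())  # each component's vertex set is one "concept"


def cluster_documents(docs, concepts):
    """Assign each doc to the concept with the largest term overlap; -1 if none."""
    labels = []
    for terms in docs:
        overlaps = [len(terms & c) for c in concepts]
        best = max(range(len(concepts)), key=lambda i: overlaps[i], default=-1)
        labels.append(best if best >= 0 and overlaps[best] > 0 else -1)
    return labels


# Toy keyword sets (invented for illustration, not the paper's data).
docs = [
    {"gene", "protein", "cell"},
    {"gene", "protein", "dna"},
    {"gene", "protein", "cell"},
    {"web", "page", "link"},
    {"web", "page", "search"},
    {"web", "page", "link"},
]
simplexes = maximal_simplexes(frequent_termsets(docs, min_support=2))
concepts = concept_components(simplexes, q=0)
labels = cluster_documents(docs, concepts)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

With `min_support=2`, the maximal simplexes are {cell, gene, protein} and {link, page, web}. They share no vertex, so they form two separate concepts, and every document lands in the cluster whose terms it shares. Raising `q` makes connectivity stricter and splits concepts further. The brute-force enumeration of term combinations is only meant for toy inputs.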