DocumentCode
443977
Title
Semantic based clustering of Web documents
Author
Lin, Tsau Young ; Chiang, I-Jen
Author_Institution
Dept. of Comput. Sci., San Jose State Univ., CA, USA
Volume
1
fYear
2005
fDate
25-27 July 2005
Firstpage
189
Abstract
A new methodology is developed that structures the semantics of a collection of documents into the geometry of a simplicial complex: a primitive concept is represented by a top-dimensional simplex, and a connected component represents a concept. Based on these structures, documents can be clustered into meaningful classes. Experiments with three different data sets from web pages and medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms, such as k-means, AutoClass and hierarchical agglomerative clustering (HAC). This abstract geometric model seems to have captured the intrinsic semantics of the documents.
Keywords
document handling; geometry; pattern clustering; semantic Web; Web document; Web page; abstract geometric model; data set; semantic document collection; simplicial complex geometry; unsupervised clustering; Biomedical informatics; Clustering algorithms; Computer science; Geometry; Humans; Microcomputers; Skeleton; Solid modeling; Topology; Web pages; clustering; document; polyhedron; semantics; web
fLanguage
English
Publisher
ieee
Conference_Titel
Granular Computing, 2005 IEEE International Conference on
Print_ISBN
0-7803-9017-2
Type
conf
DOI
10.1109/GRC.2005.1547264
Filename
1547264