DocumentCode
2796438
Title
PH-SSBM: Phrase Semantic Similarity Based Model for Document Clustering
Author
Gad, Walaa K. ; Kamel, Mohamed S.
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
Volume
2
fYear
2009
fDate
Nov. 30 2009-Dec. 1 2009
Firstpage
197
Lastpage
200
Abstract
In this paper, a novel document representation model, the phrase semantic similarity based model (PH-SSBM), is proposed. This model combines phrase analysis and word analysis, using WordNet as background knowledge, to explore better document representations for clustering. The PH-SSBM assigns semantic weights to both document words and phrases. The new weights reflect the semantic relatedness between document terms and capture the semantic information in the documents. The PH-SSBM computes similarity between documents based on matching terms (phrases and words) and their semantic weights. Experimental results show that the PH-SSBM, in conjunction with WordNet, yields a promising performance improvement for text clustering.
Keywords
text analysis; word processing; PH-SSBM; WordNet; document clustering; document representation; matching terms; phrases semantic similarity based model; text clustering; Clustering algorithms; Entropy; Fellows; Frequency; Knowledge acquisition; Ontologies; Performance evaluation; Speech; Testing; Text mining; Clustering; Phrases-based analysis; semantic similarity
fLanguage
English
Publisher
IEEE
Conference_Titel
Knowledge Acquisition and Modeling, 2009. KAM '09. Second International Symposium on
Conference_Location
Wuhan
Print_ISBN
978-0-7695-3888-4
Type
conf
DOI
10.1109/KAM.2009.191
Filename
5362122