• DocumentCode
    2796438
  • Title

    PH-SSBM: Phrase Semantic Similarity Based Model for Document Clustering

  • Author

    Gad, Walaa K. ; Kamel, Mohamed S.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
  • Volume
    2
  • fYear
    2009
  • fDate
    Nov. 30 2009-Dec. 1 2009
  • Firstpage
    197
  • Lastpage
    200
  • Abstract
    In this paper, a novel document representation model, the phrases semantic similarity based model (PH-SSBM), is proposed. This model combines phrase analysis and word analysis, using WordNet as background knowledge, to explore better ways of representing documents for clustering. The PH-SSBM assigns semantic weights to both document words and phrases. The new weights reflect the semantic relatedness between document terms and capture the semantic information in the documents. The PH-SSBM finds similarity between documents based on matching terms (phrases and words) and their semantic weights. Experimental results show that the PH-SSBM, in conjunction with WordNet, yields a promising performance improvement for text clustering.
  • Keywords
    text analysis; word processing; PH-SSBM; WordNet; document clustering; document representation; matching terms; phrases semantic similarity based model; text clustering; Clustering algorithms; Entropy; Fellows; Frequency; Knowledge acquisition; Ontologies; Performance evaluation; Speech; Testing; Text mining; Clustering; Phrases-based analysis; semantic similarity
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge Acquisition and Modeling, 2009. KAM '09. Second International Symposium on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-0-7695-3888-4
  • Type

    conf

  • DOI
    10.1109/KAM.2009.191
  • Filename
    5362122