• Title of article

    Exploiting noun phrases and semantic relationships for text document clustering

  • Author/Authors

    Hai-Tao Zheng, Bo-Yeong Kang, Hong-Gee Kim

  • Issue Information
    Journal with serial issue number, year 2009
  • Pages
    14
  • From page
    2249
  • To page
    2262
  • Abstract
    Text document clustering plays an important role in providing better document retrieval, document browsing, and text mining. Traditionally, clustering techniques do not consider the semantic relationships between words, such as synonymy and hypernymy. To exploit semantic relationships, ontologies such as WordNet have been used to improve clustering results. However, WordNet-based clustering methods mostly rely on single-term analysis of text; they do not perform any phrase-based analysis. In addition, these methods utilize synonymy to identify concepts and only explore hypernymy to calculate concept frequencies, without considering other semantic relationships such as hyponymy. To address these issues, we combine detection of noun phrases with the use of WordNet as background knowledge to explore better ways of representing documents semantically for clustering. First, based on noun phrases as well as single-term analysis, we exploit different document representation methods to analyze the effectiveness of hypernymy, hyponymy, holonymy, and meronymy. Second, we choose the most effective method and compare it with the WordNet-based clustering method proposed by others. The experimental results show that the semantic relationships rank, from most to least effective for clustering, as follows: hypernymy, hyponymy, meronymy, and holonymy. Moreover, we found that noun phrase analysis improves the WordNet-based clustering method.
  • Keywords
    Holonymy, Meronymy, Ontology, WordNet, Text document clustering, Noun phrase, Hypernymy, Hyponymy
  • Journal title
    Information Sciences
  • Serial Year
    2009
  • Record number

    1213650
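
  • Illustrative sketch

    The abstract describes identifying concepts from noun phrases as well as single terms, then using hypernymy to calculate concept frequencies. The Python sketch below is one possible reading of that idea and is not the authors' implementation. Everything in it is an assumption made for illustration: the `HYPERNYMS` dictionary is a hypothetical toy hierarchy standing in for WordNet, the greedy longest-match phrase detection replaces real noun phrase detection, and adding a count of 1 to each ancestor is an arbitrary weighting choice.

```python
from collections import Counter

# Hypothetical toy hierarchy (concept -> direct hypernym).
# A stand-in for WordNet; not taken from the article.
HYPERNYMS = {
    "sports car": "car",
    "car": "vehicle",
    "truck": "vehicle",
}

# Every concept the toy hierarchy knows about.
VOCAB = set(HYPERNYMS) | set(HYPERNYMS.values())


def hypernym_chain(concept, depth):
    """Return up to `depth` ancestors of `concept`, nearest first."""
    chain = []
    current = concept
    for _ in range(depth):
        parent = HYPERNYMS.get(current)
        if parent is None:
            break
        chain.append(parent)
        current = parent
    return chain


def extract_concepts(tokens, max_len=2):
    """Greedy longest match: prefer a known phrase over its single terms."""
    concepts = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            if i + n > len(tokens):
                continue  # phrase would run past the end of the text
            candidate = " ".join(tokens[i:i + n])
            if candidate in VOCAB:
                concepts.append(candidate)
                i += n
                break
        else:
            i += 1  # no known concept starts at this token
    return concepts


def concept_frequencies(tokens, depth=2):
    """Count each detected concept and credit its hypernyms as well."""
    counts = Counter()
    for concept in extract_concepts(tokens):
        counts[concept] += 1
        for ancestor in hypernym_chain(concept, depth):
            counts[ancestor] += 1
    return counts


if __name__ == "__main__":
    doc = "the sports car passed a truck".split()
    print(concept_frequencies(doc))
```

    In this toy example, "sports car" is matched as a phrase rather than as the separate terms "sports" and "car", and both "sports car" and "truck" contribute to the shared ancestor "vehicle". Two documents that mention different vehicles therefore end up with overlapping concept vectors. A version covering hyponymy, holonymy, or meronymy would need additional relation maps, which this sketch does not include.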