Title of article :
Exploiting noun phrases and semantic relationships for text document clustering
Author/Authors :
Hai-Tao Zheng, Bo-Yeong Kang, Hong-Gee Kim
Issue Information :
Journal issue with serial number, year 2009
Pages :
14
From page :
2249
To page :
2262
Abstract :
Text document clustering plays an important role in providing better document retrieval, document browsing, and text mining. Traditionally, clustering techniques do not consider the semantic relationships between words, such as synonymy and hypernymy. To exploit semantic relationships, ontologies such as WordNet have been used to improve clustering results. However, WordNet-based clustering methods mostly rely on single-term analysis of text; they do not perform any phrase-based analysis. In addition, these methods utilize synonymy to identify concepts and explore only hypernymy to calculate concept frequencies, without considering other semantic relationships such as hyponymy. To address these issues, we combine detection of noun phrases with the use of WordNet as background knowledge to explore better ways of representing documents semantically for clustering. First, based on noun phrases as well as single-term analysis, we exploit different document representation methods to analyze the effectiveness of hypernymy, hyponymy, holonymy, and meronymy. Second, we choose the most effective method and compare it with the WordNet-based clustering method proposed by others. The experimental results show that the semantic relationships rank in effectiveness for clustering as follows (from highest to lowest): hypernymy, hyponymy, meronymy, and holonymy. Moreover, we found that noun phrase analysis improves the WordNet-based clustering method.
Keywords :
Holonymy, Meronymy, Ontology, WordNet, Text document clustering, Noun phrase, Hypernymy, Hyponymy
Journal title :
Information Sciences
Serial Year :
2009
Record number :
1213650