DocumentCode
495513
Title
Ensemble Similarity Measures for Clustering Terms
Author
Ittoo, Ashwin ; Maruster, Laura
Author_Institution
Fac. of Econ. & Bus., Univ. of Groningen, Groningen, Netherlands
Volume
4
fYear
2009
fDate
March 31 2009-April 2 2009
Firstpage
315
Lastpage
319
Abstract
Clustering semantically related terms is crucial for many applications, such as document categorization and word sense disambiguation. However, automatically identifying semantically similar terms is challenging. We present a novel approach for automatically determining the degree of relatedness between terms to facilitate their subsequent clustering. Using the analogy of ensemble classifiers in machine learning, we combine multiple techniques, such as contextual similarity and semantic relatedness, to boost the accuracy of our computations. A new method, based on Yarowsky's word sense disambiguation approach, to generate high-quality topic signatures for contextual similarity computations is presented. A technique to measure semantic relatedness between multi-word terms, based on the work of Hirst and St. Onge, is also proposed. Experimental evaluation reveals that our method outperforms comparable related work. We also investigate the effects of assigning different importance levels to the different similarity measures based on the corpus characteristics.
Keywords
learning (artificial intelligence); pattern classification; pattern clustering; text analysis; contextual similarity; document categorization; ensemble classifier; ensemble similarity measure; high-quality topic signature; machine learning; semantically related term clustering; text analysis; word sense disambiguation; Application software; Computer science; Glass; Length measurement; Machine learning; Mutual information; Ontologies; Position measurement; Pressing; natural language processing; semantic relatedness; text clustering;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Information Engineering, 2009 WRI World Congress on
Conference_Location
Los Angeles, CA
Print_ISBN
978-0-7695-3507-4
Type
conf
DOI
10.1109/CSIE.2009.764
Filename
5171010