• DocumentCode
    495513
  • Title

    Ensemble Similarity Measures for Clustering Terms

  • Author

    Ittoo, Ashwin ; Maruster, Laura

  • Author_Institution
    Fac. of Econ. & Bus., Univ. of Groningen, Groningen, Netherlands
  • Volume
    4
  • fYear
    2009
  • fDate
    March 31 2009-April 2 2009
  • Firstpage
    315
  • Lastpage
    319
  • Abstract
    Clustering semantically related terms is crucial for many applications, such as document categorization and word sense disambiguation. However, automatically identifying semantically similar terms is challenging. We present a novel approach for automatically determining the degree of relatedness between terms to facilitate their subsequent clustering. Using the analogy of ensemble classifiers in machine learning, we combine multiple techniques, such as contextual similarity and semantic relatedness, to boost the accuracy of our computations. A new method, based on Yarowsky's word sense disambiguation approach, is presented for generating high-quality topic signatures for contextual similarity computations. A technique to measure semantic relatedness between multi-word terms, based on the work of Hirst and St-Onge, is also proposed. Experimental evaluation reveals that our method outperforms comparable related work. We also investigate the effects of assigning different importance levels to the different similarity measures based on the corpus characteristics.
  • Keywords
    learning (artificial intelligence); pattern classification; pattern clustering; text analysis; contextual similarity; document categorization; ensemble classifier; ensemble similarity measure; high-quality topic signature; machine learning; semantically related term clustering; text analysis; word sense disambiguation; Application software; Computer science; Glass; Length measurement; Machine learning; Mutual information; Ontologies; Position measurement; Pressing; natural language processing; semantic relatedness; text clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Engineering, 2009 WRI World Congress on
  • Conference_Location
    Los Angeles, CA
  • Print_ISBN
    978-0-7695-3507-4
  • Type

    conf

  • DOI
    10.1109/CSIE.2009.764
  • Filename
    5171010