Title :
Combining statistical similarity measures for automatic induction of semantic classes
Author :
Pangos, Apostolos ; Iosif, Elias ; Potamianos, Alexandros ; Fosler-Lussier, Eric
Author_Institution :
Dept. of Electron. & Comput. Eng., Tech. Univ. Crete, Chania
Abstract :
In this paper, an unsupervised semantic class induction algorithm is proposed that is based on the principle that similarity of context implies similarity of meaning. Two semantic similarity metrics that are variations of the vector product distance are used to measure the semantic distance between words and to automatically generate semantic classes. The first metric computes "wide-context" similarity between words using a "bag-of-words" model, while the second metric computes "narrow-context" similarity using a bigram language model. A hybrid metric, defined as a linear combination of the wide- and narrow-context metrics, is also proposed and evaluated. To cluster words into semantic classes, an iterative clustering algorithm is used. The semantic metrics are evaluated on two corpora: a semantically heterogeneous Web news domain (HR-Net) and an application-specific travel reservation corpus (ATIS). For the hybrid metric, semantic class member precision of 85% is achieved at 17% recall for the HR-Net task, and precision of 85% is achieved at 55% recall for the ATIS task.
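The abstract does not give the exact form of the two metrics, so the following is only a minimal sketch of how a wide-context score, a narrow-context score, and their linear combination could be computed. It assumes plain cosine similarity as a stand-in for the paper's "variations of the vector product distance", whole sentences as the wide context, and immediate left and right neighbours as the narrow (bigram) context. The function names, the toy corpus, and the weight `lam` are hypothetical and do not come from the paper; the iterative clustering step is not shown.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Normalized vector product (cosine) of two sparse count vectors."""
    dot = sum(c * v[k] for k, c in u.items() if k in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def wide_context(word, sentences):
    """Bag-of-words vector: counts of the other words in sentences containing `word`."""
    vec = Counter()
    for s in sentences:
        if word in s:
            vec.update(w for w in s if w != word)
    return vec


def narrow_context(word, sentences):
    """Counts of the immediate left and right neighbours of `word` (bigram contexts)."""
    left, right = Counter(), Counter()
    for s in sentences:
        padded = ["<s>"] + s + ["</s>"]  # sentence-boundary markers (assumed)
        for i in range(1, len(padded) - 1):
            if padded[i] == word:
                left[padded[i - 1]] += 1
                right[padded[i + 1]] += 1
    return left, right


def hybrid_similarity(w1, w2, sentences, lam=0.5):
    """Linear combination of wide- and narrow-context similarity; `lam` is assumed."""
    wide = cosine(wide_context(w1, sentences), wide_context(w2, sentences))
    l1, r1 = narrow_context(w1, sentences)
    l2, r2 = narrow_context(w2, sentences)
    narrow = 0.5 * (cosine(l1, l2) + cosine(r1, r2))
    return lam * wide + (1 - lam) * narrow


# Toy travel-style corpus, invented for illustration only.
sentences = [
    ["fly", "to", "boston", "tomorrow"],
    ["fly", "to", "denver", "tomorrow"],
    ["show", "me", "fares"],
]
same_class = hybrid_similarity("boston", "denver", sentences)
different_class = hybrid_similarity("boston", "fares", sentences)
```

In this toy corpus, "boston" and "denver" share all of their contexts and therefore score higher than "boston" and "fares", which share none. A clustering step would then merge the highest-scoring pairs into candidate semantic classes.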
Keywords :
iterative methods; natural languages; pattern clustering; unsupervised learning; application-specific travel reservation corpus; bigram language model; heterogeneous Web news domain; iterative clustering algorithm; statistical similarity measures; unsupervised semantic class induction algorithm; vector product distance; Application software; Clustering algorithms; Computer science; Data mining; Induction generators; Iterative algorithms; Natural language processing; Natural languages; Ontologies; Speech
Conference_Title :
Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on
Conference_Location :
San Juan
Print_ISBN :
0-7803-9478-X
Electronic_ISBN :
0-7803-9479-8
DOI :
10.1109/ASRU.2005.1566510