Author_Institution :
Grad. Sch. of Inf. Sci., Univ. of Tokyo, Tokyo, Japan
Abstract :
The relational similarity between two word pairs is defined as the similarity between the semantic relations that hold within each pair. For example, the semantic relation "is a large" holds between the two words in each of the word pairs (lion, cat) and (ostrich, bird), because a lion is a large cat and the ostrich is the largest living bird on earth. Consequently, the two word pairs (lion, cat) and (ostrich, bird) are considered to be relationally similar. A high degree of relational similarity can be observed between analogous pairs of words. Measuring the relational similarity between word pairs is important in numerous natural language processing tasks, such as solving word analogy questions, classifying noun-modifier relations, and disambiguating word senses. We propose a supervised ranking-based method that uses information retrieved from a Web search engine to detect word pairs that are relationally similar to a given word pair. First, each pair of words is represented by a vector of automatically extracted lexical patterns. Then a ranking Support Vector Machine is trained to recognize word pairs whose semantic relations are similar to those of a given word pair. To train and evaluate the proposed method, we use a benchmark dataset that contains 374 SAT multiple-choice word-analogy questions. To represent the relations that exist between two word pairs, we experiment with 11 different feature functions, including both symmetric and asymmetric feature functions. Our experimental results show that the proposed ranking-based approach outperforms several previously proposed relational similarity measures on this benchmark dataset, achieving an SAT score of 46.9.
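The pattern-vector representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the trained ranking SVM and the 11 feature functions are replaced here by plain cosine similarity, the pattern extractor simply keeps the words found between the two members of a pair, and the snippets in `TOY_SNIPPETS` are invented stand-ins for text that would be retrieved from a Web search engine. All function and variable names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for search-engine snippets (invented for illustration).
TOY_SNIPPETS = {
    ("lion", "cat"): ["The lion is a large cat found in Africa."],
    ("ostrich", "bird"): ["The ostrich is a large bird that cannot fly."],
    ("wheel", "car"): ["Each wheel of the car was replaced."],
}


def extract_patterns(snippets, a, b):
    """Count lexical patterns of the form 'X <words> Y' linking a to b."""
    regex = re.compile(
        r"\b%s\b(.{1,40}?)\b%s\b" % (re.escape(a), re.escape(b)),
        re.IGNORECASE,
    )
    patterns = Counter()
    for text in snippets:
        for match in regex.finditer(text):
            middle = " ".join(match.group(1).split())
            if middle:  # skip pairs that are directly adjacent
                patterns["X " + middle + " Y"] += 1
    return patterns


def cosine(u, v):
    """Cosine similarity between two sparse pattern-frequency vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(stem, candidates, snippets_by_pair):
    """Order candidate pairs by similarity to the stem pair (best first).

    Assumption: cosine similarity stands in for the learned ranking function.
    """
    stem_vec = extract_patterns(snippets_by_pair[stem], *stem)
    scored = []
    for pair in candidates:
        vec = extract_patterns(snippets_by_pair[pair], *pair)
        scored.append((cosine(stem_vec, vec), pair))
    scored.sort(key=lambda item: -item[0])
    return [pair for _, pair in scored]


if __name__ == "__main__":
    ranking = rank_candidates(
        ("lion", "cat"),
        [("wheel", "car"), ("ostrich", "bird")],
        TOY_SNIPPETS,
    )
    print(ranking)
```

In this toy setting, both (lion, cat) and (ostrich, bird) yield the pattern "X is a large Y", so (ostrich, bird) ranks above (wheel, car). A supervised ranker such as the one proposed would instead learn from the SAT questions which patterns matter.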
Keywords :
Internet; computational linguistics; feature extraction; information retrieval; search engines; support vector machines; text analysis; word processing; Web search engine; feature function; lexical pattern extraction; natural language processing; noun-modifier relation; relational similarity; semantic relation; supervised ranking; word analogy; word pair detection; word pair recognition; word sense disambiguation; Birds; Kernel; Measurement; Semantics; Web search; ranking SVMs