DocumentCode :
3540620
Title :
New information content metric and nominalization relation for a new WordNet-based method to measure the semantic relatedness
Author :
Taieb, Mohamed Ali Hadj ; Ben Aouicha, Mohamed ; Tmar, Mohamed ; Ben Hamadou, Abdelmajid
Author_Institution :
MIRACL Lab., FSEGS, Sfax, Tunisia
fYear :
2011
fDate :
1-2 Sept. 2011
Firstpage :
51
Lastpage :
58
Abstract :
Semantic similarity techniques compute the semantic similarity (shared information) between two concepts using language or domain resources such as ontologies, taxonomies, and corpora. They are important components of most Information Retrieval (IR) and knowledge-based systems. Taking semantics into account requires external semantic resources coupled with the initial documentation, and semantic similarity measurements over these resources are needed to compare concepts. This paper presents a new approach for measuring semantic relatedness between words and concepts. It combines a new information content (IC) metric based on the WordNet thesaurus with the nominalization relation provided by the Java WordNet Library (JWNL). Specifically, the proposed method makes thorough use of the hypernym/hyponym relation (the noun and verb “is a” taxonomy) without external corpus statistics. The method uses the subgraph formed by the hypernyms of the concept in question, which inherits all the features of its hypernyms, and quantifies the contribution of each concept in this subgraph to the concept's information content. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value, 0.70, with a benchmark based on human similarity judgments, in particular a large dataset of 260 Finkelstein word pairs (Appendix 1 and 2).
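The idea of deriving information content from the hypernym subgraph alone, without corpus statistics, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy taxonomy is not WordNet, and the `contribution` weight is a hypothetical placeholder, not the metric defined in the paper (the abstract does not give the formula).

```python
from typing import Dict, List, Set

# Toy "is a" taxonomy (hypothetical data, not WordNet): concept -> direct hypernyms.
HYPERNYMS: Dict[str, List[str]] = {
    "entity": [],
    "animal": ["entity"],
    "pet": ["entity"],
    "dog": ["animal", "pet"],
    "cat": ["animal", "pet"],
    "puppy": ["dog"],
}


def hypernym_subgraph(concept: str) -> Set[str]:
    """Return the concept together with all its direct and indirect hypernyms."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(HYPERNYMS[node])
    return seen


def hyponym_count(concept: str) -> int:
    """Count concepts that have `concept` as a direct or indirect hypernym."""
    return sum(
        1 for c in HYPERNYMS if c != concept and concept in hypernym_subgraph(c)
    )


def contribution(concept: str) -> float:
    """Placeholder weight (assumption): general concepts with many hyponyms count less."""
    return 1.0 / (1 + hyponym_count(concept))


def information_content(concept: str) -> float:
    """Sum the placeholder contributions over the concept's hypernym subgraph."""
    return sum(contribution(c) for c in hypernym_subgraph(concept))
```

In this sketch a specific concept such as `"puppy"` accumulates contributions from every concept on its hypernym paths, so it receives a higher value than a general concept such as `"entity"`, which is the qualitative behaviour expected of an intrinsic IC measure.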
Keywords :
Java; information retrieval; knowledge based systems; natural language processing; ontologies (artificial intelligence); semantic Web; statistical analysis; Finkelstein word pairs; Java WordNet library; WordNet based method; WordNet thesaurus; corpora; domain resources; human similarity judgments; information content metric; information retrieval; knowledge based systems; language resources; nominalization relation; ontologies; semantic relatedness; semantic similarity techniques; statistical information; taxonomies; Humans; Integrated circuits; Intelligent systems; Joining processes; Measurement; Semantics; Taxonomy; Information Content; JWNL; Nominalization; Semantic Similarity; WordNet;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cybernetic Intelligent Systems (CIS), 2011 IEEE 10th International Conference on
Conference_Location :
London
Print_ISBN :
978-1-4673-0687-4
Type :
conf
DOI :
10.1109/CIS.2011.6169134
Filename :
6169134