Title :
Measuring Semantic Similarity between Words Using Wikipedia
Author :
Zhiqiang, Lu ; Werimin, Shao ; Zhenhua, Yu
Author_Institution :
Sch. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
Abstract :
Semantic similarity measures play an important role in the extraction of semantic relations and are widely used in natural language processing (NLP) and information retrieval (IR). This paper presents a new Web-based method for measuring the semantic similarity between words. Unlike other methods, which rely on a taxonomy or on Internet search engines, our method uses snippets from Wikipedia to calculate the semantic similarity between words using cosine similarity and TF-IDF. A stemming algorithm and stop-word removal are applied when preprocessing the Wikipedia snippets. We evaluate our results under different thresholds in order to reduce interference from noise and redundancy. Our method was empirically evaluated on the Rubenstein-Goodenough benchmark dataset. It gives a higher correlation value (0.615) than some existing methods. The evaluation results show that our method improves accuracy and is more robust for measuring semantic similarity between words.
Keywords :
Internet; information filtering; natural language processing; search engines; Rubenstein-Goodenough benchmark dataset; Web-based method; Wikipedia; cosine similarity; information retrieval; semantic similarity; Data mining; Humans; Information systems; Taxonomy; Web pages; TF-IDF;
Conference_Title :
Web Information Systems and Mining, 2009. WISM 2009. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3817-4
DOI :
10.1109/WISM.2009.59
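
Illustration :
The abstract names the building blocks of the method (stop-word removal, stemming, TF-IDF weighting, cosine similarity) without giving details. The following is only a minimal sketch of how those pieces can fit together, not the authors' implementation. The stop-word list, the crude suffix-stripping stemmer (a stand-in for a real stemming algorithm), the smoothed IDF formula, and the example snippets are all assumptions made for illustration; the snippets are invented and are not actual Wikipedia text. The thresholding step mentioned in the abstract is not reproduced because the abstract does not specify how it is applied.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "the", "of", "is", "and", "or", "in", "to", "for", "with", "on"}

# Invented example snippets (hypothetical, not real Wikipedia text).
CAR = "A car is a wheeled motor vehicle used for transportation."
AUTOMOBILE = "An automobile is a motor vehicle with wheels used for transporting passengers."
FRUIT = "A fruit is the seed-bearing structure in flowering plants."


def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            # Smoothed IDF (assumption) so shared terms keep a nonzero weight.
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(snippet_a, snippet_b):
    """Similarity of two words, approximated by the similarity of their snippets."""
    u, v = tfidf_vectors([snippet_a, snippet_b])
    return cosine(u, v)
```

With these invented snippets, `similarity(CAR, AUTOMOBILE)` is positive because the two texts share terms such as "motor" and "vehicle", while `similarity(CAR, FRUIT)` is zero because no terms overlap after preprocessing.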