Title :
Utilizing background corpus and dictionary to calculate similarity between unknown words
Author :
Fan, Xinghua ; Chen, Xianlin ; Hu, Hongge
Author_Institution :
College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, 400065, China
Abstract :
This paper presents a method that uses a background corpus and a dictionary to calculate the similarity between unknown words. In the method, the best concept expression of an unknown word is obtained from the word's background in the corpus, and a context is then constructed for that best concept expression. The connotative meaning of the unknown word is determined by the difference between the context of the best concept expression and the word's own context. The similarity between unknown words is then calculated using a semantic dictionary. The method avoids the problems of mistaken segmentation and abused segmentation that arise in the traditional, segmentation-based approach to calculating similarity between unknown words. Experimental results show that the proposed method is highly effective.
Keywords :
Computational modeling; Computer science; Context; Dictionaries; Semantics; Statistical analysis; Telecommunications; HowNet; segmentation; similarity of words; unknown word
Conference_Title :
Information Science and Engineering (ICISE), 2010 2nd International Conference on
Conference_Location :
Hangzhou, China
Print_ISBN :
978-1-4244-7616-9
DOI :
10.1109/ICISE.2010.5690768