Title :
Computing Word Similarity on Large-Scale Corpus
Author :
Xu, Tao ; Qu, Weiguang ; Tang, Xuri ; Ding, Dexin ; Li, Bin ; Li, Hui
Author_Institution :
Sch. of Comput. Sci., Nanjing Normal Univ., Nanjing, China
Abstract :
This paper proposes a novel approach to word similarity computation based on word sense vectors. Each word sense vector is built using HIT-IR Tongyici Cilin (extended) for concept generalization and is then refined with relative and absolute frequency filters. Experiments show that the approach not only handles similarity computation for unseen words but also yields results closer to human judgment than dictionary-based word similarity approaches.
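The abstract names three ingredients: concept generalization, frequency filtering, and vector-based similarity. The following is a minimal sketch of how they might fit together, not the authors' implementation. The toy thesaurus, the concept codes, the thresholds `min_count` and `min_ratio`, and the choice of cosine similarity are all assumptions made for illustration; the record does not specify any of them.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy thesaurus mapping a context word to a concept code.
# These codes are invented stand-ins, not real Tongyici Cilin entries.
TOY_THESAURUS = {
    "apple": "FRUIT", "pear": "FRUIT",
    "eat": "INGEST", "drink": "INGEST",
    "road": "PATH",
}

def sense_vector(context_words, min_count=2, min_ratio=0.1):
    """Build a concept-level count vector from a target word's context words."""
    # Concept generalization (assumed form): replace each context word by its
    # concept code; words missing from the thesaurus are kept unchanged.
    counts = Counter(TOY_THESAURUS.get(w, w) for w in context_words)
    total = sum(counts.values())
    # Absolute filter (assumed form): drop concepts seen fewer than min_count times.
    # Relative filter (assumed form): drop concepts below min_ratio of all tokens.
    return {c: n for c, n in counts.items()
            if n >= min_count and n / total >= min_ratio}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Invented context words for two hypothetical target words.
ctx_a = ["apple", "pear", "eat", "eat", "road"]
ctx_b = ["pear", "pear", "drink", "eat", "table"]
similarity = cosine(sense_vector(ctx_a), sense_vector(ctx_b))
```

In this sketch the two contexts share no identical word distribution, yet after generalization both reduce to the same concepts, which is the kind of effect that lets a concept-level vector compare words whose surface contexts differ.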
Keywords :
computational linguistics; dictionaries; natural language processing; absolute frequency filters; large-scale corpus; relative filters; word sense vectors; word similarity computation; Computer science; Dictionaries; Frequency; Humans; Information security; Large-scale systems; Multidimensional systems; Natural language processing; Natural languages; Tagging
Conference_Title :
Innovative Computing, Information and Control (ICICIC), 2009 Fourth International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-1-4244-5543-0
DOI :
10.1109/ICICIC.2009.145