Title :
Multistage Chinese collocation extraction
Author :
Xu, Rui-Feng ; Lu, Qin
Author_Institution :
Dept. of Comput., Hong Kong Polytech. Univ., China
Abstract :
A collocation is a recurrent and conventional natural language expression. In this research, Chinese collocations are categorized into four types. Based on a statistical analysis of typical collocations of each type, a multi-stage, window-based collocation extraction system is designed, in which lexical statistics, synonym information, syntactic information, and dependency knowledge are used to extract n-gram collocations and the different types of bi-gram collocations separately. Experimental results show that this system achieves better precision and recall than existing statistical collocation extraction techniques.
Keywords :
computational linguistics; natural languages; statistical analysis; dependency knowledge; lexical statistic; multistage Chinese collocation extraction; natural language expression; natural language processing; statistical collocation extraction techniques; synonyms information; syntactic information; Data mining; Frequency; Information retrieval; Statistical distributions; Statistics; Testing; Collocation extraction; multi-stage extraction
Conference_Title :
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
DOI :
10.1109/ICMLC.2005.1527504