DocumentCode
441984
Title
Multistage Chinese collocation extraction
Author
Xu, Rui-Feng ; Lu, Qin
Author_Institution
Dept. of Comput., Hong Kong Polytech. Univ., China
Volume
5
fYear
2005
fDate
18-21 Aug. 2005
Firstpage
3254
Abstract
A collocation is a recurrent and conventional expression in natural language. In this research, Chinese collocations are categorized into four types. Based on a statistical analysis of typical collocations of each type, a multi-stage, window-based collocation extraction system is designed, in which lexical statistics, synonym information, syntactic information, and dependency knowledge are used to extract n-gram collocations and the different types of bi-gram collocations separately. Experimental results show that this system achieves better precision and recall than existing statistical collocation extraction techniques.
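The abstract does not say which lexical statistic the window-based stage uses, so the following is only an illustrative sketch of the general idea, not the authors' method: count the words that occur within a fixed window around a headword, then rank those candidate collocates with pointwise mutual information (PMI), used here as a stand-in association measure. It assumes the corpus is already word-segmented into token lists, and it does not model the synonym, syntactic, or dependency stages described above. The function names `window_cooccurrences` and `pmi_scores` and the parameters `window` and `min_count` are hypothetical.

```python
from collections import Counter
from math import log2


def window_cooccurrences(sentences, headword, window=5):
    """Count tokens appearing within +/- `window` positions of `headword`.

    `sentences` is assumed to be a list of pre-segmented token lists.
    """
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != headword:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the headword occurrence itself
                    counts[tokens[j]] += 1
    return counts


def pmi_scores(sentences, headword, window=5, min_count=2):
    """Rank candidate collocates of `headword` by PMI (a stand-in statistic).

    Candidates co-occurring fewer than `min_count` times are discarded,
    since PMI is unreliable for very rare pairs.
    """
    word_freq = Counter(tok for tokens in sentences for tok in tokens)
    total = sum(word_freq.values())
    cooc = window_cooccurrences(sentences, headword, window)
    p_h = word_freq[headword] / total
    scores = {}
    for word, count in cooc.items():
        if count < min_count:
            continue
        p_hw = count / total  # joint probability approximated from window counts
        p_w = word_freq[word] / total
        scores[word] = log2(p_hw / (p_h * p_w))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        ["我们", "必须", "提高", "效率"],
        ["提高", "工作", "效率"],
        ["他们", "提高", "生产", "效率"],
        ["我们", "喜欢", "工作"],
    ]
    for word, score in pmi_scores(corpus, "提高", window=2):
        print(word, round(score, 3))
```

On this toy corpus only 效率 co-occurs with 提高 often enough to pass `min_count=2`. A real system would use a large segmented corpus and, as the abstract indicates, filter these statistical candidates further with linguistic knowledge.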
Keywords
computational linguistics; natural languages; statistical analysis; dependency knowledge; lexical statistic; multistage Chinese collocation extraction; natural language expression; natural language processing; statistical collocation extraction techniques; synonyms information; syntactic information; Computational linguistics; Data mining; Frequency; Information retrieval; Natural language processing; Natural languages; Statistical analysis; Statistical distributions; Statistics; Testing; Collocation extraction; multi-stage extraction;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location
Guangzhou, China
Print_ISBN
0-7803-9091-1
Type
conf
DOI
10.1109/ICMLC.2005.1527504
Filename
1527504