DocumentCode :
1909862
Title :
Chinese Typed Collocation Extraction using Corpus-based Syntactic Collocation Patterns
Author :
Li, Wanyin ; Lu, Qin ; Liu, James
Author_Institution :
Hong Kong Polytech. Univ., Kowloon
fYear :
2007
fDate :
Aug. 30 2007-Sept. 1 2007
Firstpage :
248
Lastpage :
255
Abstract :
Collocations play a significant role in many applications, and extracting them automatically is useful in NLP. Syntactic-based phrase patterns used in collocation extraction have brought advantages: the results are well-formed, and the candidates are automatically classified into syntactically congeneric categories. However, due to the language independency, the arbitrary choice of syntactic patterns for target collocations brings drawbacks for evaluation as well as for adaptation to a new language. This work presents a corpus-driven framework that first generates collocation templates for noun and verb phrases and then integrates them with statistical association measures for noun/verb phrase collocation extraction, namely typed collocation extraction. The experimental results show a higher average precision of 84.80% and a so-called local recall value of 55.99%, based on randomly selected noun and verb headwords.
Keywords :
natural language processing; statistical analysis; Chinese typed collocation extraction; collocation templates; corpus-based syntactic collocation patterns; phrase collocation extraction; statistical association measures; syntactic-based phrase patterns; syntactically congeneric categories; target collocations; Data mining; Entropy; Explosions; Extraterrestrial phenomena; Frequency; Pattern analysis; Pattern matching; Statistical analysis; Tagging; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1611-0
Electronic_ISBN :
978-1-4244-1611-0
Type :
conf
DOI :
10.1109/NLPKE.2007.4368039
Filename :
4368039