DocumentCode :
2664946
Title :
Machine learning for collocation identification
Author :
YANG, Shouxun
Author_Institution :
Foreign Language Teaching & Res. Press, Beijing Foreign Studies Univ., China
fYear :
2003
fDate :
26-29 Oct. 2003
Firstpage :
315
Lastpage :
320
Abstract :
Previous work on the automatic identification or extraction of collocations from large-scale corpora generally uses a statistical measure to test for significance of association, yielding n-best collocation candidates for human scrutiny, optionally with linguistic preprocessing and linguistic filtering. The drawback of these approaches is that they can take advantage of only a single statistical test (optionally in association with a simple frequency threshold), even though the values of several statistical tests are often calculated. Manually exploring a scheme to combine two or more different tests is out of the question. We report experiments with machine learning for collocation identification using a variety of statistical association measures. In particular, we develop a new decision tree learning algorithm, based on C4.5, for learning tasks with unbalanced data. The experimental results are presented and briefly discussed.
Keywords :
computational linguistics; decision trees; learning (artificial intelligence); statistical testing; C4.5 algorithm; collocation extraction; collocation identification; decision tree learning algorithm; linguistic filtering; linguistic preprocessing; machine learning; statistical association measurement; Data mining; Decision trees; Education; Frequency; Humans; Large-scale systems; Machine learning; Machine learning algorithms; Statistics; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
Type :
conf
DOI :
10.1109/NLPKE.2003.1275921
Filename :
1275921