DocumentCode :
2896736
Title :
A Divide-Conquer Strategy for English Text Chunking
Author :
Liang, Ying-Hong ; Wang, Ni-Hong ; Su, Jian-min ; Ren, Hong-e
Author_Institution :
Sch. of Inf. & Comput. Eng., North East Forestry Univ., Harbin
fYear :
2006
fDate :
13-16 Aug. 2006
Firstpage :
3370
Lastpage :
3375
Abstract :
The traditional approach to English text chunking identifies all phrases with a single model and the same types of features. Using a single model has two known limitations: the same types of features are not suitable for all phrases, and data sparseness may result. In this paper, a divide-conquer approach is proposed and applied to the identification of English phrases. This strategy divides the chunking task into several sub-tasks according to the sensitive features of each phrase and identifies different phrases in parallel. A two-stage decreasing conflict strategy is then used to synthesize the answers of the sub-tasks. Trained and tested on the public training and test corpus, the divide-conquer strategy achieves an F score of 94.14% for arbitrary phrase identification, compared to the previous best F score of 94.17%.
Keywords :
divide and conquer methods; feature extraction; grammars; learning (artificial intelligence); natural languages; text analysis; English phrase identification; English text chunking approach; data sparseness; divide-conquer strategy; phrase feature identification; shallow parsing method; two-stage decreasing conflict strategy; Cybernetics; Data mining; Electronic mail; Forestry; Information analysis; Laboratories; Learning systems; Machine learning; Natural language processing; Speech processing; Testing; Text processing; Text chunking; divide-conquer strategy; sensitive features;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2006 International Conference on
Conference_Location :
Dalian, China
Print_ISBN :
1-4244-0061-9
Type :
conf
DOI :
10.1109/ICMLC.2006.258477
Filename :
4028650