DocumentCode :
3300972
Title :
Exploiting lexical information for function tag labeling
Author :
Yuan, Caixia ; Wang, Xiaojie ; Ren, Fuji
Author_Institution :
Fac. of Eng., Univ. of Tokushima, Tokushima, Japan
fYear :
2008
fDate :
19-22 Oct. 2008
Firstpage :
1
Lastpage :
8
Abstract :
This paper proposes a novel approach to annotating function tags for unparsed text. What distinguishes our work from other attempts at this task is that we assign function tags directly from lexical information rather than from parsed trees. To demonstrate the effectiveness and versatility of our method, we investigate two statistical models for automatic annotation: a log-linear maximum entropy model and a maximum-margin support vector machine model, which achieve best F-scores of 82.8 and 86.4 respectively when tested on text from the Penn Chinese Treebank. We also quantify the effect of POS tagger accuracy on system performance. Our results indicate that function tag types can be determined via flexible and powerful feature representations built from words, POS tags and word position indicators, and that, as in syntactic parsing, the main difficulty lies in complex constituents with long-distance dependencies.
Keywords :
grammars; maximum entropy methods; natural language processing; statistical analysis; support vector machines; text analysis; POS tagger; automatic annotation; function tag labeling; lexical information; log-linear maximum entropy; margin maximum based support vector machine; statistical model; syntactic parsing; Automatic testing; Data mining; Entropy; Labeling; Natural languages; Power engineering and energy; Power system modeling; Support vector machines; System performance; Tagging; Chinese language; Function tagging; unparsed text;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-4515-8
Electronic_ISBN :
978-1-4244-2780-2
Type :
conf
DOI :
10.1109/NLPKE.2008.4906787
Filename :
4906787