DocumentCode
3300972
Title
Exploiting lexical information for function tag labeling
Author
Yuan, Caixia ; Wang, Xiaojie ; Ren, Fuji
Author_Institution
Fac. of Eng., Univ. of Tokushima, Tokushima, Japan
fYear
2008
fDate
19-22 Oct. 2008
Firstpage
1
Lastpage
8
Abstract
This paper proposes a novel approach to annotating function tags for unparsed text. What distinguishes our work from other attempts at this task is that we assign function tags directly based on lexical information rather than on parsed trees. To demonstrate the effectiveness and versatility of our method, we investigate two statistical models for automatic annotation: a log-linear maximum entropy model and a maximum-margin support vector machine model, which achieve best F-scores of 82.8 and 86.4, respectively, when tested on text from the Penn Chinese Treebank. We also quantify the effect of POS tagger accuracy on system performance. Our results indicate that function tag types can be determined via flexible and powerful feature representations built from words, POS tags, and word position indicators, and that, as in syntactic parsing, the main difficulty lies in complex constituents with long-distance dependencies.
Keywords
grammars; maximum entropy methods; natural language processing; statistical analysis; support vector machines; text analysis; POS tagger; automatic annotation; function tag labeling; lexical information; log-linear maximum entropy; margin maximum based support vector machine; statistical model; syntactic parsing; Automatic testing; Data mining; Entropy; Labeling; Natural languages; Power engineering and energy; Power system modeling; Support vector machines; System performance; Tagging; Chinese language; Function tagging; unparsed text;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-4515-8
Electronic_ISBN
978-1-4244-2780-2
Type
conf
DOI
10.1109/NLPKE.2008.4906787
Filename
4906787