DocumentCode
1909800
Title
Phrase-based Part-of-Speech Tagging
Author
Finch, Andrew; Sumita, Eiichiro
Author_Institution
NICT-ATR, Kyoto, Kyoto
fYear
2007
fDate
Aug. 30, 2007 - Sept. 1, 2007
Firstpage
215
Lastpage
220
Abstract
This paper presents a new approach to part-of-speech (POS) tagging in which the basic unit being tagged is a contiguous sequence of words rather than a single word. We run experiments on two different tagsets: that of the UPENN treebank, and that of a treebank annotated with more ambiguous tags that have a semantic component. We show that the phrase-based system alone is a respectable tagger that exceeds the performance of the maximum entropy (ME) tagger on the more ambiguous tagset. Moreover, when a log-linear model is built using features from both phrase- and word-based techniques, tagging accuracy improves on both of our data sets, yielding the highest reported performance to date on the more ambiguous tagset.
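As a minimal illustration of the log-linear combination mentioned in the abstract (a generic sketch, not the authors' implementation), each candidate tag sequence below is scored as a weighted sum of feature values, sum_i lambda_i * h_i, and normalized into a posterior. The feature names (`phrase_logprob`, `word_logprob`), weights, tags, and probabilities are all hypothetical placeholders; in practice the weights would be tuned on held-out data.

```python
import math
from typing import Dict, Tuple

Tags = Tuple[str, ...]
Features = Dict[str, float]


def log_linear_score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of feature values: sum_i lambda_i * h_i."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def posteriors(candidates: Dict[Tags, Features], weights: Dict[str, float]) -> Dict[Tags, float]:
    """Normalize scores into p(tags | words) = exp(score) / Z (max-shifted for stability)."""
    scores = {tags: log_linear_score(f, weights) for tags, f in candidates.items()}
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {tags: math.exp(s - m) / z for tags, s in scores.items()}


# Hypothetical example: two candidate taggings of "time flies", each with a
# log-probability from a phrase-based model and from a word-based model.
CANDIDATES: Dict[Tags, Features] = {
    ("NN", "VBZ"): {"phrase_logprob": math.log(0.6), "word_logprob": math.log(0.5)},
    ("VB", "NNS"): {"phrase_logprob": math.log(0.3), "word_logprob": math.log(0.4)},
}
WEIGHTS = {"phrase_logprob": 0.7, "word_logprob": 0.3}  # hypothetical lambdas

probs = posteriors(CANDIDATES, WEIGHTS)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Because Z is shared by all candidates, picking the highest posterior is the same as picking the highest raw log-linear score; normalization only matters when calibrated probabilities are needed.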
Keywords
natural languages; text analysis; UPENN treebank; contiguous word sequence; log-linear model; phrase-based part-of-speech tagging; Context modeling; Degradation; Entropy; History; Labeling; Parameter estimation; Predictive models; Radio access networks; System performance; Tagging
fLanguage
English
Publisher
IEEE
Conference_Title
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-1611-0
Electronic_ISBN
978-1-4244-1611-0
Type
conf
DOI
10.1109/NLPKE.2007.4368036
Filename
4368036