  • DocumentCode
    1909800
  • Title
    Phrase-based Part-of-Speech Tagging
  • Author
    Finch, Andrew; Sumita, Eiichiro
  • Author_Institution
    NICT-ATR, Kyoto
  • fYear
    2007
  • fDate
    Aug. 30, 2007 - Sept. 1, 2007
  • Firstpage
    215
  • Lastpage
    220
  • Abstract
    This paper presents a new approach to part-of-speech (POS) tagging in which the basic unit being tagged is a contiguous sequence of words rather than a single word. We run experiments on two tagsets: that of the UPENN treebank, and that of a treebank annotated with more ambiguous tags that include a semantic component. We show that the phrase-based system is a respectable tagger on its own, exceeding the performance of the maximum entropy (ME) tagger on the more ambiguous tagset. Moreover, when a log-linear model is built using features from both phrase- and word-based techniques, tagging accuracy improves on both data sets, yielding the highest performance reported to date on the more ambiguous tagset.
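    The abstract does not give model details, so the following is only a minimal Python sketch, under stated assumptions, of the two ideas it names. The first is tagging contiguous word sequences using a phrase table, found by dynamic programming over segmentations. The second is combining phrase- and word-based scores in a log-linear model, here used to rerank candidate tag sequences. The phrase table, lexicon, smoothing floor `FLOOR`, default tag, feature names, weights, and the reranking setup are all hypothetical illustrations, not the authors' method.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Tags = Tuple[str, ...]
# Hypothetical phrase table: contiguous word sequence -> {tag sequence: P(tags | words)}.
# Each tag sequence is assumed to have the same length as its word sequence.
PhraseTable = Dict[Tuple[str, ...], Dict[Tags, float]]
# Hypothetical word-level lexicon: word -> {tag: P(tag | word)}.
Lexicon = Dict[str, Dict[str, float]]

FLOOR = 1e-6  # assumed smoothing probability for unseen events


def phrase_viterbi(words: Sequence[str], table: PhraseTable, max_len: int = 3,
                   target: Optional[Tags] = None) -> Tuple[float, Optional[Tags]]:
    """Best-scoring segmentation of `words` into contiguous phrases from `table`.

    Dynamic programming over segmentation end points; the score is the sum of
    phrase log-probabilities. If `target` is given, only phrase taggings that
    agree with it are allowed, so the result is the phrase-model feature value
    for that fixed tag sequence. Returns (-inf, None) if no segmentation exists.
    """
    n = len(words)
    best: List[Tuple[float, Optional[Tags]]] = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, ())
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev_score, prev_tags = best[start]
            if prev_tags is None:  # words[:start] cannot be segmented
                continue
            for tags, p in table.get(tuple(words[start:end]), {}).items():
                if target is not None and tuple(target[start:end]) != tags:
                    continue
                score = prev_score + math.log(p)
                if score > best[end][0]:
                    best[end] = (score, prev_tags + tags)
    return best[n]


def word_log_score(words: Sequence[str], tags: Sequence[str], lexicon: Lexicon) -> float:
    """Word-model feature: sum of log P(tag | word), floored for unseen pairs."""
    return sum(math.log(lexicon.get(w, {}).get(t, FLOOR)) for w, t in zip(words, tags))


def word_argmax(words: Sequence[str], lexicon: Lexicon, default: str = "NN") -> Tags:
    """Per-word most probable tag; `default` is an assumed tag for unknown words."""
    return tuple(max(lexicon[w], key=lexicon[w].get) if w in lexicon else default
                 for w in words)


def log_linear_tag(words: Sequence[str], candidates: Sequence[Tags], table: PhraseTable,
                   lexicon: Lexicon, weights: Dict[str, float]) -> Tags:
    """Choose the candidate maximizing sum_k weights[k] * h_k(words, tags)."""
    def score(tags: Tags) -> float:
        h_phrase, _ = phrase_viterbi(words, table, target=tags)
        if h_phrase == -math.inf:  # no phrase segmentation supports these tags
            h_phrase = len(words) * math.log(FLOOR)
        h_word = word_log_score(words, tags, lexicon)
        return weights["phrase"] * h_phrase + weights["word"] * h_word
    return max(candidates, key=score)


# Toy usage with made-up probabilities.
table: PhraseTable = {
    ("the",): {("DT",): 1.0},
    ("can",): {("MD",): 0.5, ("NN",): 0.5},
    ("the", "can"): {("DT", "NN"): 0.9},
    ("rusted",): {("VBD",): 1.0},
}
lexicon: Lexicon = {"the": {"DT": 1.0}, "can": {"MD": 0.6, "NN": 0.4},
                    "rusted": {"VBD": 0.6, "VBN": 0.4}}
words = ["the", "can", "rusted"]
phrase_tags = phrase_viterbi(words, table)[1]  # ('DT', 'NN', 'VBD')
word_tags = word_argmax(words, lexicon)        # ('DT', 'MD', 'VBD')
best = log_linear_tag(words, [phrase_tags, word_tags], table, lexicon,
                      {"phrase": 1.0, "word": 1.0})  # ('DT', 'NN', 'VBD')
```

    In this toy example the two-word phrase "the can" lets the phrase feature prefer the noun reading of "can", while the word-level lexicon alone prefers the modal. With equal weights the combined score picks the phrase reading; setting the phrase weight to zero falls back to the word-level choice.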
  • Keywords
    natural languages; text analysis; UPENN treebank; contiguous word sequence; log-linear model; phrase-based part-of-speech tagging; Context modeling; Degradation; Entropy; History; Labeling; Parameter estimation; Predictive models; Radio access networks; System performance; Tagging;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-1611-0
  • Electronic_ISBN
    978-1-4244-1611-0
  • Type
    conf
  • DOI
    10.1109/NLPKE.2007.4368036
  • Filename
    4368036