Title :
Synther - a new m-gram POS tagger
Author :
Sündermann, David ; Ney, Hermann
Author_Institution :
Comput. Sci. Dept., Univ. of Technol., Aachen, Germany
Abstract :
The part-of-speech (POS) tagger synther, which is based on m-gram statistics, is described. After its basic architecture is explained, three smoothing approaches and the strategy for handling unknown words are presented. Subsequently, synther's performance is evaluated in comparison with four state-of-the-art POS taggers. All of them are trained and tested on three corpora of different languages and domains. In this evaluation, synther achieved the lowest or at least below-average error rates. Finally, it is shown that the linear interpolation smoothing strategy with coverage-dependent weights has better properties than the two other approaches.
Keywords :
interpolation; natural languages; speech synthesis; statistical analysis; coverage-dependent weights; linear interpolation smoothing strategy; m-gram statistics; synther m-gram part-of-speech tagger; Computer science; Error analysis; Frequency estimation; History; Interpolation; Smoothing methods; Statistics; Tagging; Testing; Training data
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
DOI :
10.1109/NLPKE.2003.1275981