DocumentCode
3381484
Title
A comparison of part of speech taggers in the task of changing to a new domain
Author
Boggess, Lois ; Hamaker, Janna S. ; Duncan, Richard ; Klimek, Lee ; Wu, Yufeng ; Zeng, Yu
Author_Institution
Dept. of Comput. Sci., Mississippi State Univ., MS, USA
fYear
1999
fDate
1999
Firstpage
574
Lastpage
578
Abstract
Part-of-speech tagging in real-world applications is performed on text in domains that differ from those of the large, publicly available training data sets. The two most successful part-of-speech taggers are trained on the Wall Street Journal corpus, a corpus of millions of words. We compare their performance on a test set from a different domain (astronomy), drawn from documents that are available on the World Wide Web. The Maximum Entropy Part of Speech Tagger (MXPOST) and the Transformation-Based Learning Tagger are well known and widely used in language research and development systems. The two taggers were tested in several modes: (1) after training on the Wall Street Journal corpus only, (2) after training on only a small body of text from our astronomy domain, (3) with and without an auxiliary lexicon derived from many astronomy-related Web documents, and (4) after incremental training, that is, after training on the Wall Street Journal corpus followed by additional training on text from the specific domain. One conclusion from the experiment is that different taggers exhibit different biases when trained on the same data.
Keywords
astronomy computing; grammars; information resources; learning (artificial intelligence); maximum entropy methods; natural languages; text analysis; MXPOST; Maximum Entropy Part of Speech Tagger; Transformation-Based Learning Tagger; Wall Street Journal corpus; World Wide Web; astronomy-related Web documents; auxiliary lexicon; bias; incremental training; language R&D systems; performance; text domains; Data mining; Electrical capacitance tomography; Entropy; Laboratories; Natural language processing; Natural languages; Research and development; Speech; Tagging; Testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Intelligence and Systems, 1999. Proceedings. 1999 International Conference on
Conference_Location
Bethesda, MD
Print_ISBN
0-7695-0446-9
Type
conf
DOI
10.1109/ICIIS.1999.810350
Filename
810350