DocumentCode
1323458
Title
Part-of-Speech Tagging by Latent Analogy
Author
Bellegarda, Jerome R.
Author_Institution
Apple Inc, Speech & Language Technol., Cupertino, CA, USA
Volume
4
Issue
6
fYear
2010
Firstpage
985
Lastpage
993
Abstract
Part-of-speech tagging is often a critical first step in various speech and language processing tasks. High-accuracy taggers (e.g., based on conditional random fields) rely on well-chosen feature functions to ensure that important characteristics of the empirical training distribution are reflected in the trained model. This makes them vulnerable to any discrepancy between training and tagging corpora, and, in particular, accuracy is adversely affected by the presence of out-of-vocabulary words. This paper explores an alternative tagging strategy based on the principle of latent analogy, which was originally introduced in the context of a speech synthesis application. In this approach, locally optimal tag subsequences emerge automatically from an appropriate representation of global sentence-level information. This solution eliminates the need for feature engineering, while exploiting a broader context more conducive to word sense disambiguation. Empirical evidence suggests that, in practice, tagging by latent analogy is essentially competitive with conventional Markovian techniques, while benefiting from substantially less onerous training costs. This opens up the possibility that integration with such techniques may lead to further improvements in tagging accuracy.
Keywords
Markov processes; natural language processing; speech synthesis; Markovian techniques; empirical training distribution; global sentence-level information representation; language processing; latent analogy; part-of-speech tagging; speech processing; speech synthesis; word sense disambiguation; Hidden Markov models; Natural language processing; Semantics; Speech recognition; Statistical learning; Tagging; Training; Latent semantic mapping (LSM); natural language processing (NLP); part-of-speech (POS) disambiguation; sequence labeling; statistical modeling;
fLanguage
English
Journal_Title
Selected Topics in Signal Processing, IEEE Journal of
Publisher
IEEE
ISSN
1932-4553
Type
jour
DOI
10.1109/JSTSP.2010.2075970
Filename
5570877