DocumentCode
312128
Title
Estimating Markov model structures
Author
Brants, Thorsten
Author_Institution
Comput. Linguistics, Saarlandes Univ., Saarbrücken, Germany
Volume
2
fYear
1996
fDate
3-6 Oct 1996
Firstpage
893
Abstract
The author investigates the derivation of Markov model structures from text corpora. The structure of a Markov model is its number of states plus the set of outputs and transitions with non-zero probability. The domain of the investigated models is part-of-speech tagging. The investigations concern two methods to derive Markov models and their structures. Both are able to form categories and allow words to belong to more than one of them. The first method is model merging, which starts with a large and corpus-specific model and successively merges states to generate smaller and more general models. The second method is model splitting, which is the inverse procedure and starts with a small and general model. States are successively split to generate larger and more specific models. In an experiment, the author shows that the combination of these techniques yields tagging accuracies that are at least equivalent to those of standard approaches.
Keywords
Markov processes; merging; probability; speech recognition; Markov model structure estimation; categories; corpus-specific model; model merging; model splitting; nonzero probability outputs; nonzero probability transitions; part-of-speech tagging; text corpora; words; Clustering algorithms; Computational linguistics; Merging; Natural languages; Parameter estimation; Speech recognition; Tagging
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location
Philadelphia, PA
Print_ISBN
0-7803-3555-4
Type
conf
DOI
10.1109/ICSLP.1996.607745
Filename
607745