Title :
Estimating Markov model structures
Author :
Brants, Thorsten
Author_Institution :
Comput. Linguistics, Saarlandes Univ., Saarbrücken, Germany
Abstract :
The author investigates the derivation of Markov model structures from text corpora. The structure of a Markov model is its number of states together with the sets of outputs and transitions that have non-zero probability. The models are applied to part-of-speech tagging. Two methods for deriving Markov models and their structures are investigated. Both are able to form categories and allow a word to belong to more than one category. The first method, model merging, starts with a large, corpus-specific model and successively merges states to generate smaller, more general models. The second method, model splitting, is the inverse procedure: it starts with a small, general model and successively splits states to generate larger, more specific models. In an experiment, the author shows that combining these techniques yields tagging accuracies that are at least equivalent to those of standard approaches.
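The merging step described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation. It assumes a model stored as raw transition and emission counts, and it merges two states by pooling their counts. The function names `merge_states` and `structure`, the toy corpus, and the hand-picked merge order are all hypothetical. The abstract does not say how merge candidates are selected, so no selection criterion is shown.

```python
from collections import Counter


def merge_states(transitions, emissions, keep, drop):
    """Merge state `drop` into state `keep` by pooling their counts.

    transitions: Counter mapping (source_state, target_state) -> count
    emissions:   Counter mapping (state, word) -> count
    """
    def rename(state):
        return keep if state == drop else state

    merged_transitions = Counter()
    for (src, dst), n in transitions.items():
        merged_transitions[(rename(src), rename(dst))] += n

    merged_emissions = Counter()
    for (state, word), n in emissions.items():
        merged_emissions[(rename(state), word)] += n

    return merged_transitions, merged_emissions


def structure(transitions, emissions):
    """Return (number of states, non-zero transitions, non-zero outputs)."""
    arcs = {pair for pair, n in transitions.items() if n > 0}
    outputs = {pair for pair, n in emissions.items() if n > 0}
    states = {s for pair in arcs for s in pair} | {s for s, _ in outputs}
    return len(states), arcs, outputs


# Corpus-specific starting model (an assumption for illustration):
# one state per word token of "the dog barks" and "the cat barks".
transitions = Counter({(0, 1): 1, (1, 2): 1, (3, 4): 1, (4, 5): 1})
emissions = Counter({
    (0, "the"): 1, (1, "dog"): 1, (2, "barks"): 1,
    (3, "the"): 1, (4, "cat"): 1, (5, "barks"): 1,
})

# Hand-picked merges; a real system would choose pairs by some criterion.
for keep, drop in [(0, 3), (1, 4), (2, 5)]:
    transitions, emissions = merge_states(transitions, emissions, keep, drop)

n_states, arcs, outputs = structure(transitions, emissions)
```

In this toy run the six token-level states collapse into three, and state 1 emits both "dog" and "cat", which is one way a merged state can come to act as a category. Model splitting would run in the opposite direction, dividing the counts of one state between two new states.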
Keywords :
Markov processes; merging; probability; speech recognition; Markov model structure estimation; categories; corpus-specific model; model merging; model splitting; nonzero probability outputs; nonzero probability transitions; part-of-speech tagging; text corpora; words; Clustering algorithms; Computational linguistics; Merging; Natural languages; Parameter estimation; Speech recognition; Tagging;
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607745