• DocumentCode
    312128
  • Title

    Estimating Markov model structures

  • Author

    Brants, Thorsten

  • Author_Institution
    Comput. Linguistics, Saarlandes Univ., Saarbrucken, Germany
  • Volume
    2
  • fYear
    1996
  • fDate
    3-6 Oct 1996
  • Firstpage
    893
  • Abstract
    The author investigates the derivation of Markov model structures from text corpora. The structure of a Markov model is its number of states plus the set of outputs and transitions with non-zero probability. The domain of the investigated models is part-of-speech tagging. The investigations concern two methods to derive Markov models and their structures. Both are able to form categories and allow words to belong to more than one of them. The first method is model merging, which starts with a large and corpus-specific model and successively merges states to generate smaller and more general models. The second method is model splitting, which is the inverse procedure and starts with a small and general model. States are successively split to generate larger and more specific models. In an experiment, the author shows that the combination of these techniques yields tagging accuracies that are at least equivalent to those of standard approaches.
  • Keywords
    Markov processes; merging; probability; speech recognition; Markov model structure estimation; categories; corpus-specific model; model merging; model splitting; nonzero probability outputs; nonzero probability transitions; part-of-speech tagging; text corpora; words; Clustering algorithms; Computational linguistics; Merging; Natural languages; Parameter estimation; Speech recognition; Tagging
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    0-7803-3555-4
  • Type

    conf

  • DOI
    10.1109/ICSLP.1996.607745
  • Filename
    607745