• DocumentCode
    2768960
  • Title
    Investigating linguistic knowledge in a maximum entropy token-based language model
  • Author
    Cui, Jia; Su, Yi; Hall, Keith; Jelinek, Frederick
  • Author_Institution
    Johns Hopkins University, Baltimore, MD
  • fYear
    2007
  • fDate
    9-13 Dec. 2007
  • Firstpage
    171
  • Lastpage
    176
  • Abstract
    We present a novel language model capable of incorporating various types of linguistic information, encoded in the form of tokens: (word, label) tuples. Using tokens as hidden states, our model is effectively a hidden Markov model (HMM) that produces word sequences with trivial output distributions. The transition probabilities, however, are computed with a maximum entropy model so that potentially overlapping features can be exploited. We investigate several types of labels with a wide range of linguistic implications. These models outperform Kneser-Ney smoothed n-gram models both in perplexity on standard datasets and in word error rate for a large-vocabulary speech recognition system. (A minimal illustrative sketch of the token-transition scheme follows this record.)
  • Keywords
    hidden Markov models; linguistics; maximum entropy methods; speech recognition; Kneser-Ney smoothed n-gram models; linguistic knowledge; maximum entropy token-based language model; speech recognition system; token encoding; context modeling; entropy; error analysis; natural languages; predictive models; speech processing; testing; vocabulary
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)
  • Conference_Location
    Kyoto, Japan
  • Print_ISBN
    978-1-4244-1746-9
  • Electronic_ISBN
    978-1-4244-1746-9
  • Type
    conf
  • DOI
    10.1109/ASRU.2007.4430104
  • Filename
    4430104
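  • Sketch
    The abstract describes an HMM whose hidden states are (word, label) tokens, with trivial (deterministic) emissions and transition probabilities given by a maximum entropy model over overlapping features. The Python sketch below is a minimal, hypothetical illustration of that scheme, not the authors' implementation: the vocabulary, label set, feature templates, and weights are invented stand-ins, and in the paper the weights would instead be estimated by maximum entropy training.

```python
import math

# Minimal sketch of a token-based maxent language model.
# Tokens are (word, label) tuples acting as hidden HMM states;
# a token (w, l) emits exactly the word w (trivial emission),
# so scoring a word sequence reduces to summing over label paths.

WORDS = ["the", "dog", "barks"]
LABELS = ["D", "N", "V"]                        # hypothetical POS-like labels
TOKENS = [(w, l) for w in WORDS for l in LABELS]
START = ("<s>", "S")                            # sentence-initial context token

def features(prev_token, token):
    """Overlapping binary features on a token bigram (templates are
    illustrative, not the paper's)."""
    pw, pl = prev_token
    w, l = token
    return [
        f"ww:{pw}->{w}",   # word bigram
        f"ll:{pl}->{l}",   # label bigram
        f"wl:{pw}->{l}",   # previous word -> current label
        f"u:{w}",          # current word unigram
    ]

# Hypothetical weights; features absent from the map score 0.
WEIGHTS = {"ll:D->N": 1.5, "ll:N->V": 1.2, "ww:the->dog": 0.8, "u:the": 0.3}

def transition_prob(prev_token, token):
    """P(token | prev_token) as a softmax over summed feature weights."""
    def score(t):
        return sum(WEIGHTS.get(f, 0.0) for f in features(prev_token, t))
    z = sum(math.exp(score(t)) for t in TOKENS)  # maxent normalizer
    return math.exp(score(token)) / z

def sentence_prob(words):
    """P(words) via the forward algorithm over hidden labels.
    Emissions are deterministic, so only transitions contribute."""
    # alpha[l] = total probability of the prefix ending with label l
    alpha = {l: transition_prob(START, (words[0], l)) for l in LABELS}
    for prev_w, w in zip(words, words[1:]):
        alpha = {l: sum(alpha[pl] * transition_prob((prev_w, pl), (w, l))
                        for pl in LABELS)
                 for l in LABELS}
    return sum(alpha.values())

if __name__ == "__main__":
    print(sentence_prob(["the", "dog", "barks"]))  # small probability
```

    Because the previous word is observed, the forward recursion only needs to sum over the previous label, keeping inference cheap; overlapping features (word bigrams, label bigrams, mixed word-label features) are exactly what the maxent parameterization accommodates and what plain count-based HMM transitions cannot.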