• DocumentCode
    1103789
  • Title
    Estimation of probabilities in the language model of the IBM speech recognition system
  • Author
    Nádas, Arthur
  • Author_Institution
    IBM T.J. Watson Research Center, Yorktown Heights, NY
  • Volume
    32
  • Issue
    4
  • fYear
    1984
  • fDate
    August 1984
  • Firstpage
    859
  • Lastpage
    861
  • Abstract
    The language model probabilities are estimated by an empirical Bayes approach in which a prior distribution for the unknown probabilities is itself estimated through a novel choice of data. The predictive power of the model thus fitted is measured by its experimental perplexity [1] and compared with that of the same model fitted by the Jelinek-Mercer deleted estimator and by the Turing-Good formulas for the probabilities of unseen or rarely seen events.
  • Keywords
    Bayesian methods; Cities and towns; Helium; Natural languages; Power system modeling; Predictive models; Probability; Smoothing methods; Speech recognition; Vocabulary;
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Acoustics, Speech and Signal Processing
  • Publisher
    IEEE
  • ISSN
    0096-3518
  • Type
    Journal article
  • DOI
    10.1109/TASSP.1984.1164378
  • Filename
    1164378
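The abstract compares fitted models by their experimental perplexity and names the Turing-Good formulas as one of the baselines. The Python sketch below illustrates only those two standard quantities. It does not reproduce the paper's empirical Bayes estimator or the Jelinek-Mercer deleted estimator. The unigram setting, the function names, and the rule for spreading the unseen mass equally over unseen words are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def turing_good_unseen_mass(counts):
    """Turing-Good estimate of the total probability of unseen events: n1 / N,
    where n1 is the number of types seen exactly once and N is the sample size."""
    n = sum(counts.values())
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n


def turing_good_adjusted_count(r, counts):
    """Turing-Good adjusted count r* = (r + 1) * n_{r+1} / n_r, where n_r is the
    number of types observed exactly r times."""
    count_of_counts = Counter(counts.values())
    if count_of_counts[r] == 0:
        raise ValueError(f"no types observed exactly {r} times")
    return (r + 1) * count_of_counts[r + 1] / count_of_counts[r]


def smoothed_unigram(counts, vocabulary):
    """Hypothetical illustration (not the paper's model): scale seen relative
    frequencies by (1 - P0) and share the Turing-Good unseen mass P0 equally
    among the vocabulary words that were never observed."""
    n = sum(counts.values())
    p0 = turing_good_unseen_mass(counts)
    unseen = [w for w in vocabulary if counts.get(w, 0) == 0]

    def prob(word):
        if word not in vocabulary:
            raise KeyError(f"{word!r} is outside the vocabulary")
        c = counts.get(word, 0)
        if c > 0:
            return (1.0 - p0) * c / n
        return p0 / len(unseen)

    return prob


def perplexity(prob, test_tokens):
    """Experimental perplexity 2 ** H, with H = -(1/T) * sum(log2 p(w_t))."""
    log_sum = sum(math.log2(prob(w)) for w in test_tokens)
    return 2.0 ** (-log_sum / len(test_tokens))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    training = "the cat sat on the mat the end".split()
    counts = Counter(training)
    vocabulary = set(training) | {"dog", "ran"}
    model = smoothed_unigram(counts, vocabulary)
    print(turing_good_unseen_mass(counts))
    print(perplexity(model, "the dog sat".split()))
```

Lower perplexity on held-out text indicates greater predictive power. A model that assigns zero probability to any test word has infinite perplexity (in this sketch, `math.log2` would raise an error), which is why estimates for unseen or rarely seen events matter in such comparisons.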