• DocumentCode
    2259584
  • Title
    Techniques for approximating a trigram language model
  • Author
    Brugnara, Fabio; Federico, Marcello
  • Author_Institution
    Inst. per la Ricerca Sci. e Tecnol., Trento, Italy
  • Volume
    4
  • fYear
    1996
  • fDate
    3-6 Oct 1996
  • Firstpage
    2075
  • Abstract
    Several methods are proposed for reducing the size of a trigram language model (LM), which is often the largest data structure in a continuous speech recognizer, without significantly affecting its performance. The approaches share a common idea: select only a subset of the available trigrams, trying to identify those that contribute most to the performance of the full trigram LM. The proposed selection criteria apply to trigram contexts of length one or two. They rely on information-theoretic concepts, on the back-off probabilities estimated by the LM, or on a measure of the phonetic/linguistic uncertainty relative to a given context. The performance of the reduced trigram LMs is compared in terms of both perplexity and recognition accuracy. Results show that all the methods considered perform better than the naive frequency shifting method. In particular, a 50% size reduction is obtained on a shift-1 trigram LM at the cost of a 5% increase in word error rate. Moreover, compared with a bigram LM of the same size, the reduced LMs improve the word error rate by around 15%.
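    The abstract describes a general recipe: store only a subset of trigrams, let every dropped trigram fall back on the back-off estimate, and compare the reduced models by perplexity. The Python sketch below illustrates that recipe on a toy corpus under stated assumptions; it is not the authors' implementation. The estimator (trigram counts shifted down by a constant discount D, backing off to an add-one smoothed bigram), the names `BackoffTrigramLM`, `keep_by_count`, `keep_by_loglik_gain` and `perplexity`, and the selection score (count-weighted log ratio between the explicit and the backed-off probability, using back-off weights from the full model) are all hypothetical stand-ins. The count-ranked selection is only an assumed simple baseline and is not claimed to reproduce the paper's frequency shifting method.

```python
import math
from collections import Counter


def pad(sentence):
    """Whitespace-tokenize and add two <s> history tokens and a final </s>."""
    return ["<s>", "<s>"] + sentence.split() + ["</s>"]


class BackoffTrigramLM:
    """Toy back-off trigram LM (hypothetical): trigram counts are shifted down
    by a constant D and the freed mass goes to an add-one smoothed bigram."""

    def __init__(self, sentences, discount=0.5):
        if not 0 < discount < 1:  # D >= 1 would give seen singletons zero mass
            raise ValueError("this sketch needs 0 < discount < 1")
        self.D = discount
        self.tri, self.bi = Counter(), Counter()
        self.hist3, self.hist2 = Counter(), Counter()
        for s in sentences:
            t = pad(s)
            for i in range(2, len(t)):
                self.tri[(t[i - 2], t[i - 1], t[i])] += 1
                self.hist3[(t[i - 2], t[i - 1])] += 1
                self.bi[(t[i - 1], t[i])] += 1
                self.hist2[t[i - 1]] += 1
        # <unk> never occurs in training, so alpha() denominators stay positive.
        self.vocab = {w for (_, _, w) in self.tri} | {"<unk>"}
        self.set_kept(self.tri)  # start from the full (unpruned) model

    def set_kept(self, trigrams):
        """Choose which trigrams are stored explicitly; all others back off."""
        self.kept = set(trigrams)
        self.by_hist = {}
        for u, v, w in self.kept:
            self.by_hist.setdefault((u, v), []).append(w)
        self._alpha = {}  # back-off weights depend on the kept set

    def p_bi(self, v, w):
        return (self.bi[(v, w)] + 1) / (self.hist2[v] + len(self.vocab))

    def p_tri(self, u, v, w):
        return (self.tri[(u, v, w)] - self.D) / self.hist3[(u, v)]

    def alpha(self, u, v):
        """Back-off weight that makes P(. | u, v) sum to one over the vocabulary."""
        if (u, v) not in self._alpha:
            ws = self.by_hist.get((u, v), [])
            num = 1.0 - sum(self.p_tri(u, v, w) for w in ws)
            den = 1.0 - sum(self.p_bi(v, w) for w in ws)
            self._alpha[(u, v)] = num / den
        return self._alpha[(u, v)]

    def prob(self, u, v, w):
        if w not in self.vocab:
            w = "<unk>"
        if (u, v, w) in self.kept:
            return self.p_tri(u, v, w)
        return self.alpha(u, v) * self.p_bi(v, w)


def perplexity(lm, sentences):
    log_sum, n = 0.0, 0
    for s in sentences:
        t = pad(s)
        for i in range(2, len(t)):
            log_sum += math.log(lm.prob(t[i - 2], t[i - 1], t[i]))
            n += 1
    return math.exp(-log_sum / n)


def keep_by_count(lm, n_keep):
    """Assumed baseline: keep the n_keep most frequent trigrams."""
    return sorted(lm.tri, key=lambda g: lm.tri[g], reverse=True)[:n_keep]


def keep_by_loglik_gain(lm, n_keep):
    """Hypothetical criterion: count * log(explicit / backed-off probability).
    Resets lm to the full model and uses its back-off weights (approximation)."""
    lm.set_kept(lm.tri)

    def gain(g):
        u, v, w = g
        backed_off = lm.alpha(u, v) * lm.p_bi(v, w)
        return lm.tri[g] * math.log(lm.p_tri(u, v, w) / backed_off)

    return sorted(lm.tri, key=gain, reverse=True)[:n_keep]


if __name__ == "__main__":
    train = ["the cat sat on the mat", "the dog sat on the log",
             "a cat sat on a mat", "the cat ate the fish"]
    test = ["the dog sat on the mat"]
    lm = BackoffTrigramLM(train)
    print("full model:", round(perplexity(lm, test), 2))
    for select in (keep_by_count, keep_by_loglik_gain):
        lm.set_kept(select(lm, len(lm.tri) // 2))
        print(select.__name__, round(perplexity(lm, test), 2))
```

    Running the script prints the test-set perplexity of the full toy model and of two half-size models. In this sketch the back-off weights are recomputed for every history after pruning, so each reduced model remains a properly normalized distribution and the perplexity comparison stays meaningful.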
  • Keywords
    information theory; natural languages; probability; search problems; speech recognition; stochastic processes; word processing; back-off probabilities; bigram LM; continuous speech recognizer; data structure; information theory concepts; naive frequency shifting method; perplexity; phonetic/linguistic uncertainty; recognition accuracy; reduced trigram LMs; selection criteria; shift-1 trigram LM; trigram contexts; trigram language model approximation; word error rate; Counting circuits; Data structures; Dictionaries; Error analysis; Frequency; Information theory; Natural languages; Probability; Speech; Statistics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP 96), 1996
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    0-7803-3555-4
  • Type
    conf
  • DOI
    10.1109/ICSLP.1996.607210
  • Filename
    607210