• DocumentCode
    294549
  • Title
    Language modeling by variable length sequences: theoretical formulation and evaluation of multigrams
  • Author
    Deligne, Sabine ; Bimbot, Frédéric
  • Author_Institution
    Telecom Paris, France
  • Volume
    1
  • fYear
    1995
  • fDate
    9-12 May 1995
  • Firstpage
    169
  • Abstract
    The multigram model assumes that language can be described as the output of a memoryless source that emits variable-length sequences of words. The estimation of the model parameters can be formulated as a maximum likelihood estimation problem from incomplete data. We show that estimates of the model parameters can be computed through an iterative expectation-maximization algorithm and we describe a forward-backward procedure for its implementation. We report the results of a systematic evaluation of multigrams for language modeling on the ATIS database. The objective performance measure is the test set perplexity. Our results show that multigrams outperform conventional n-grams for this task.
  • Keywords
    estimation theory; grammars; iterative methods; maximum likelihood estimation; natural languages; speech processing; ATIS database; forward-backward procedure; incomplete data; iterative expectation-maximization algorithm; language modeling; maximum likelihood estimation; memoryless source; multigram model; objective performance measure; parameter estimation; test set perplexity; variable length sequences; Bismuth; Databases; Dictionaries; Electronic mail; Expectation-maximization algorithms; Maximum likelihood estimation; Parameter estimation; Probability; Telecommunications; Testing; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on
  • Conference_Location
    Detroit, MI
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-2431-5
  • Type
    conf
  • DOI
    10.1109/ICASSP.1995.479391
  • Filename
    479391