  • DocumentCode
    792045
  • Title
    Variable-length sequence modeling: multigrams
  • Author
    Bimbot, Frédéric ; Pieraccini, Roberto ; Levin, Esther ; Atal, Bishnu
  • Author_Institution
    Dept. Signal, ENST, Paris, France
  • Volume
    2
  • Issue
    6
  • fYear
    1995
  • fDate
    6/1/1995
  • Firstpage
    111
  • Lastpage
    113
  • Abstract
    The conventional n-gram language model exploits dependencies between words and their fixed-length past. This letter presents a model that represents sentences as a concatenation of variable-length sequences of units and describes an algorithm for unsupervised estimation of the model parameters. The approach is illustrated for the segmentation of sequences of letters into subword-like units. It is evaluated as a language model on a corpus of transcribed spoken sentences. Multigrams can provide a significantly lower test set perplexity than n-gram models.
  • Keywords
    estimation theory; natural languages; speech recognition; algorithm; concatenation; conventional n-gram language model; fixed-length past; language model; model parameters; multigrams; sentences; subword-like units; transcribed spoken sentences; unsupervised estimation; variable-length sequence modeling; words; Acoustic testing; Context modeling; Encoding; Finishing; History; Mathematical model; Natural languages; Parameter estimation; Signal processing algorithms; Speech;
  • fLanguage
    English
  • Journal_Title
    Signal Processing Letters, IEEE
  • Publisher
    IEEE
  • ISSN
    1070-9908
  • Type
    jour
  • DOI
    10.1109/97.388911
  • Filename
    388911
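The abstract describes the multigram model only in outline. The following is a minimal sketch of the general idea, under stated assumptions, and is not the letter's exact algorithm. A sentence is treated as a concatenation of independent sequences of at most `max_len` units. Its likelihood is the sum, over every segmentation, of the product of sequence probabilities, computed with a forward recursion. The most likely segmentation is found by Viterbi dynamic programming. Parameters are re-estimated by counting sequences in the best segmentations, a Viterbi-style approximation that may differ from the estimation procedure in the letter. All function names (`init_probs`, `best_segmentation`, `viterbi_reestimate`, `perplexity`, etc.) are hypothetical, and no smoothing is applied.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch only (assumed, not the letter's exact algorithm).
# A multigram "sequence" is a tuple of units (letters or words).
Seq = Tuple[str, ...]
Probs = Dict[Seq, float]


def sentence_log_likelihood(units: Sequence[str], probs: Probs, max_len: int) -> float:
    """Forward recursion: log of the sum over all segmentations into sequences
    of length <= max_len of the product of their probabilities."""
    T = len(units)
    alpha = [-math.inf] * (T + 1)  # alpha[t]: log-likelihood of units[:t]
    alpha[0] = 0.0
    for t in range(1, T + 1):
        terms = []
        for l in range(1, min(max_len, t) + 1):
            p = probs.get(tuple(units[t - l:t]), 0.0)
            if p > 0.0 and alpha[t - l] > -math.inf:
                terms.append(alpha[t - l] + math.log(p))
        if terms:  # log-sum-exp for numerical stability
            m = max(terms)
            alpha[t] = m + math.log(sum(math.exp(x - m) for x in terms))
    return alpha[T]


def best_segmentation(units: Sequence[str], probs: Probs, max_len: int) -> Tuple[List[Seq], float]:
    """Viterbi: the single most likely segmentation and its log-probability."""
    T = len(units)
    best = [-math.inf] * (T + 1)
    back = [0] * (T + 1)  # length of the last sequence on the best path ending at t
    best[0] = 0.0
    for t in range(1, T + 1):
        for l in range(1, min(max_len, t) + 1):
            p = probs.get(tuple(units[t - l:t]), 0.0)
            if p > 0.0 and best[t - l] > -math.inf:
                score = best[t - l] + math.log(p)
                if score > best[t]:
                    best[t], back[t] = score, l
    if best[T] == -math.inf:
        raise ValueError("sentence cannot be segmented with the current inventory")
    segs: List[Seq] = []
    t = T
    while t > 0:
        l = back[t]
        segs.append(tuple(units[t - l:t]))
        t -= l
    return segs[::-1], best[T]


def init_probs(corpus: List[List[str]], max_len: int) -> Probs:
    """Initialise with relative frequencies of every subsequence up to max_len."""
    counts: Counter = Counter()
    for s in corpus:
        for i in range(len(s)):
            for l in range(1, min(max_len, len(s) - i) + 1):
                counts[tuple(s[i:i + l])] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def viterbi_reestimate(corpus: List[List[str]], probs: Probs, max_len: int) -> Probs:
    """One re-estimation step: count sequences used in each best segmentation."""
    counts: Counter = Counter()
    for s in corpus:
        segs, _ = best_segmentation(s, probs, max_len)
        counts.update(segs)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def perplexity(corpus: List[List[str]], probs: Probs, max_len: int) -> float:
    """Per-unit perplexity using the summed (all-segmentation) likelihood."""
    total_ll = sum(sentence_log_likelihood(s, probs, max_len) for s in corpus)
    n_units = sum(len(s) for s in corpus)
    return math.exp(-total_ll / n_units)


if __name__ == "__main__":
    # Toy corpus of letter strings; segmenting them mimics finding subword-like units.
    corpus = [list(w) for w in ["thecat", "thedog", "thecatsat", "adog"]]
    probs = init_probs(corpus, max_len=3)
    for _ in range(5):
        probs = viterbi_reestimate(corpus, probs, max_len=3)
    for sentence in corpus:
        print(best_segmentation(sentence, probs, 3)[0])
    print("training perplexity per letter:", perplexity(corpus, probs, 3))
```

Counting only the best segmentation is a cheaper stand-in for full EM, which would accumulate expected counts over all segmentations using forward and backward passes. Because sequences absent from every best segmentation fall to zero probability, evaluating perplexity on held-out text would need some form of smoothing, which this sketch omits.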