Regional Information Center for Science and Technology - Variable-length sequence modeling: multigrams

DocumentCode :

792045

Title :

Variable-length sequence modeling: multigrams

Author :

Bimbot, Frédéric ; Pieraccini, Roberto ; Levin, Esther ; Atal, Bishnu

Author_Institution :

Dept. Signal, ENST, Paris, France

Volume :

Issue :

fYear :

1995

fDate :

6/1/1995

Firstpage :

111

Lastpage :

113

Abstract :

The conventional n-gram language model exploits dependencies between words and their fixed-length past. This letter presents a model that represents sentences as a concatenation of variable-length sequences of units and describes an algorithm for unsupervised estimation of the model parameters. The approach is illustrated for the segmentation of sequences of letters into subword-like units. It is evaluated as a language model on a corpus of transcribed spoken sentences. Multigrams can provide a significantly lower test set perplexity than n-gram models.
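
The segmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it only finds the most likely split of a letter string into variable-length units, given unit probabilities that are already known. The function name viterbi_segment, the max_len cap, the toy probability table, and the assumption that units are statistically independent are all hypothetical choices made for illustration; the unsupervised estimation of the probabilities themselves is not shown.

```python
import math


def viterbi_segment(text, probs, max_len=3):
    """Return (units, log_likelihood) for the most likely segmentation.

    Assumptions (illustrative, not taken from the record):
    - `probs` maps each allowed unit (substring) to its probability;
    - units are independent, so a segmentation's likelihood is the
      product of its unit probabilities;
    - no unit is longer than `max_len` characters.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-likelihood of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last unit
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            p = probs.get(text[start:end], 0.0)
            if p <= 0.0 or best[start] == -math.inf:
                continue  # unknown unit, or unreachable prefix
            score = best[start] + math.log(p)
            if score > best[end]:
                best[end] = score
                back[end] = start

    if best[n] == -math.inf:
        return None, -math.inf  # no segmentation uses only known units

    # Walk the back-pointers from the end of the string to recover units.
    units = []
    i = n
    while i > 0:
        units.append(text[back[i]:i])
        i = back[i]
    units.reverse()
    return units, best[n]


# Toy probability table, invented for this example.
toy_probs = {"a": 0.1, "b": 0.1, "c": 0.2, "ab": 0.3, "bc": 0.05, "abc": 0.05}
units, log_likelihood = viterbi_segment("abc", toy_probs)
```

With this toy table, the split "ab" + "c" (likelihood 0.3 × 0.2 = 0.06) beats treating "abc" as a single unit (0.05) or splitting into single letters (0.002), which shows how a variable-length unit can win over both shorter and longer alternatives.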

Keywords :

estimation theory; natural languages; speech recognition; algorithm; concatenation; conventional n-gram language model; fixed-length past; language model; model parameters; multigrams; sentences; subword-like units; transcribed spoken sentences; unsupervised estimation; variable-length sequence modeling; words; Acoustic testing; Context modeling; Encoding; Finishing; History; Mathematical model; Natural languages; Parameter estimation; Signal processing algorithms; Speech;

fLanguage :

English

Journal_Title :

Signal Processing Letters, IEEE

Publisher :

IEEE

ISSN :

1070-9908

Type :

jour

DOI :

10.1109/97.388911

Filename :

388911

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=792045