Title :
Maximum-likelihood training of the PLCG-based language model
Author :
Van Uytsel, Dong Hoon ; Van Compernolle, Dirk ; Wambacq, Patrick
Author_Institution :
ESAT/PSI, Katholieke Univ., Leuven, Belgium
Abstract :
In Van Uytsel et al. (2001) a parsing language model based on a probabilistic left-corner grammar (PLCG) was proposed, and encouraging performance on a speech recognition task using the PLCG-based language model was reported. In this paper we show how the PLCG-based language model can be further optimized by iterative parameter reestimation on unannotated training data. The precalculation of forward, inner and outer probabilities of states in the PLCG network provides an elegant shortcut to the computation of transition frequency expectations, which are needed in each iteration of the proposed reestimation procedure. The training algorithm enables model training on very large corpora. In our experiments, test set perplexity is close to saturation after three iterations, 5 to 16% lower than before reestimation. However, we observed no significant improvement of recognition accuracy after reestimation.
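The reestimation loop in the abstract follows the general expectation-maximization pattern: precomputed forward-style and backward/outer-style probabilities give expected transition frequencies, which are then renormalized into new transition probabilities. The sketch below illustrates that pattern with a swapped-in technique, Baum-Welch reestimation on a small discrete hidden Markov model, not the PLCG network itself. All names (forward_backward, reestimate_transitions, perplexity) and the toy data are hypothetical; as a simplification only the transition matrix A is reestimated, with the initial distribution pi and emission matrix B held fixed, and perplexity is measured on the training corpus rather than a test set.

```python
import math

# Hypothetical illustration: Baum-Welch on a discrete HMM, standing in for
# the PLCG network. Unscaled probabilities, so only suitable for short sequences.

def forward_backward(obs, pi, A, B):
    """Return forward (alpha) and backward (beta) tables and P(obs)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[0.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
    for i in range(n):
        beta[T - 1][i] = 1.0
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    likelihood = sum(alpha[T - 1][i] for i in range(n))
    return alpha, beta, likelihood


def reestimate_transitions(corpus, pi, A, B):
    """One EM iteration over A: accumulate expected transition counts, renormalize rows.

    Also returns the corpus log-likelihood under the *old* parameters.
    """
    n = len(pi)
    counts = [[0.0] * n for _ in range(n)]
    total_log_lik = 0.0
    for obs in corpus:
        alpha, beta, lik = forward_backward(obs, pi, A, B)
        total_log_lik += math.log(lik)
        for t in range(len(obs) - 1):
            for i in range(n):
                for j in range(n):
                    # Posterior probability of taking transition i -> j at time t.
                    counts[i][j] += alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / lik
    new_A = []
    for i in range(n):
        row_total = sum(counts[i])
        # Keep the old row if state i was never left (no expected counts).
        new_A.append([c / row_total for c in counts[i]] if row_total > 0 else list(A[i]))
    return new_A, total_log_lik


def perplexity(corpus, pi, A, B):
    """Per-token perplexity of the corpus under the model."""
    tokens = sum(len(obs) for obs in corpus)
    log_lik = sum(math.log(forward_backward(obs, pi, A, B)[2]) for obs in corpus)
    return math.exp(-log_lik / tokens)


if __name__ == "__main__":
    corpus = [[0, 1, 1, 0], [1, 1, 0, 0, 1], [0, 0, 1]]
    pi = [0.5, 0.5]
    A = [[0.6, 0.4], [0.3, 0.7]]
    B = [[0.8, 0.2], [0.3, 0.7]]
    for it in range(4):
        print(f"iteration {it}: perplexity = {perplexity(corpus, pi, A, B):.4f}")
        A, _ = reestimate_transitions(corpus, pi, A, B)
```

Because each iteration is an EM step, training-set likelihood cannot decrease, so the printed perplexity is non-increasing and typically flattens after a few iterations. That flattening is consistent in spirit with the near-saturation the abstract reports, though the toy numbers here say nothing about the paper's results.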
Keywords :
frequency estimation; grammars; iterative methods; maximum likelihood estimation; optimisation; probability; speech recognition; state estimation; PLCG-based language model; iterative parameter reestimation; maximum-likelihood training; model training; optimization; parsing language model; performance; probabilistic left-corner grammar; speech recognition; state probabilities; test set perplexity; transition frequency expectations; unannotated training data; very large corpora; Computer networks; Iterative algorithms; Large-scale systems; Maximum likelihood estimation; Natural languages; Predictive models; Speech recognition; Stochastic processes; Testing; Training data;
Conference_Titel :
Automatic Speech Recognition and Understanding, 2001. ASRU '01. IEEE Workshop on
Print_ISBN :
0-7803-7343-X
DOI :
10.1109/ASRU.2001.1034624