Title :
Good-Turing estimation from word lattices for unsupervised language model adaptation
Author :
Riley, Michael ; Roark, Brian ; Sproat, Richard
Author_Institution :
AT&T Labs-Research, USA
Date :
30 Nov.-3 Dec. 2003
Abstract :
We present a comparison of using the weighted word lattice output of a recognizer versus its one-best transcription for unsupervised language model adaptation. We begin with a general analysis of how to smooth word probabilities when the sample is hidden, as is the case with recognizer lattices. For each smoothing technique for the known sample case, we show there is a natural generalization to the hidden case. In particular, we use this generalization with the well-known Good-Turing estimate on word lattices, and show results using Monte Carlo methods for building Katz backoff models. In our realistic adaptation task, with mismatched acoustic and language models, we find that Katz backoff models trained on word lattice samples provide a small, consistent benefit over those trained on one-best output, most notably when there is a limited amount of adaptation data (less than 100 hours). Thus, while the recognizer one-best transcription can provide an effective approximation for the purpose of language model adaptation under certain circumstances, the word lattice provides information that can be exploited for more robust language modeling.
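The abstract does not give the estimator itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the textbook Good-Turing adjusted count r* = (r + 1) n_{r+1} / n_r, and it stands in for a word lattice with a hypothetical list of (hypothesis, posterior weight) pairs per utterance. The function names and the toy data are illustrative assumptions. With the sample hidden, the count-of-counts n_r are approximated by averaging over transcripts sampled from the weighted hypotheses (a simple Monte Carlo estimate).

```python
import random
from collections import Counter


def count_of_counts(word_counts):
    """Map r -> n_r, the number of distinct words observed exactly r times."""
    return Counter(word_counts.values())


def good_turing_count(r, n):
    """Textbook Good-Turing adjusted count r* = (r + 1) * n[r + 1] / n[r].

    Falls back to the raw count r when n[r] or n[r + 1] is missing or zero
    (an assumption of this sketch; practical systems smooth n_r instead).
    """
    if n.get(r, 0) > 0 and n.get(r + 1, 0) > 0:
        return (r + 1) * n[r + 1] / n[r]
    return float(r)


def sample_transcripts(lattices, rng):
    """Draw one hypothesis per utterance in proportion to its posterior weight.

    Each 'lattice' here is a hypothetical list of (hypothesis, weight) pairs,
    a simplification of a real weighted word lattice.
    """
    return [
        rng.choices([hyp for hyp, _ in lat], weights=[w for _, w in lat])[0]
        for lat in lattices
    ]


def expected_count_of_counts(lattices, num_samples=200, seed=0):
    """Monte Carlo estimate of E[n_r] over transcripts sampled from the lattices."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(num_samples):
        transcripts = sample_transcripts(lattices, rng)
        words = Counter(w for hyp in transcripts for w in hyp.split())
        for r, n_r in count_of_counts(words).items():
            totals[r] += n_r
    return {r: total / num_samples for r, total in totals.items()}


if __name__ == "__main__":
    # Toy data (hypothetical): two utterances, each with two weighted hypotheses.
    toy_lattices = [
        [("call the operator", 0.7), ("call an operator", 0.3)],
        [("the operator is busy", 0.6), ("the operators busy", 0.4)],
    ]
    n = expected_count_of_counts(toy_lattices)
    for r in sorted(n):
        print(r, n[r], good_turing_count(r, n))
```

When every utterance has a single hypothesis with weight 1, the sampled transcripts are always the same and the estimate reduces to the ordinary known-sample count-of-counts, which mirrors the abstract's point that the hidden case generalizes the known case.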
Keywords :
Monte Carlo methods; learning (artificial intelligence); natural languages; parameter estimation; probability; smoothing methods; speech recognition; Good-Turing estimation; Katz backoff models; acoustic models; automatic speech recognition; language models; one-best transcription; smoothing technique; unsupervised language model adaptation; weighted word lattice output; word probabilities; Acoustic applications; Adaptation model; Frequency estimation; Lattices; Probability distribution; Robustness; Sampling methods
Conference_Title :
2003 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '03)
Print_ISBN :
0-7803-7980-2
DOI :
10.1109/ASRU.2003.1318483