DocumentCode :
2019326
Title :
The estimation of powerful language models from small and large corpora
Author :
Placeway, Paul ; Schwartz, Richard ; Fung, Pascale ; Nguyen, Long
Author_Institution :
Bolt Beranek & Newman Inc., Cambridge, MA, USA
Volume :
2
fYear :
1993
fDate :
27-30 April 1993
Firstpage :
33
Abstract :
The authors consider the estimation of powerful statistical language models using a technique that scales from very small to very large amounts of domain-dependent data. They begin with improved modeling of the grammar statistics, based on a combination of the backing-off technique and zero-frequency techniques. These are extended to be more amenable to the particular system considered here. The resulting technique is greatly simplified, more robust, and gives improved recognition performance over either of the previous techniques. The authors also consider the problem of robustness of a model based on a small training corpus by grouping words into obvious semantic classes. This significantly improves the robustness of the resulting statistical grammar. A technique that allows the estimation of a high-order model on modest computation resources is also presented. This makes it possible to run a 4-gram statistical model of a 50-million word corpus on a workstation of only modest capability and cost. Finally, the authors discuss results from applying a 2-gram statistical language model integrated in the HMM (hidden Markov model) search, obtaining a list of the N-best recognition results, and rescoring this list with a higher-order statistical model.
Keywords :
computational linguistics; grammars; hidden Markov models; search problems; speech recognition; backing-off technique; domain-dependent data; hidden Markov model; recognition performance; rescoring; robustness; semantic classes; statistical grammar; statistical language models; zero-frequency techniques
fLanguage :
English
Publisher :
IEEE
Conference_Title :
1993 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-93)
Conference_Location :
Minneapolis, MN, USA
ISSN :
1520-6149
Print_ISBN :
0-7803-7402-9
Type :
conf
DOI :
10.1109/ICASSP.1993.319222
Filename :
319222