DocumentCode
60759
Title
Learning Lexicons From Speech Using a Pronunciation Mixture Model
Author
McGraw, Ian ; Badr, Ibrahim ; Glass, James R.
Author_Institution
Electr. Eng. & Comput. Sci. Dept., Massachusetts Inst. of Technol., Cambridge, MA, USA
Volume
21
Issue
2
fYear
2013
fDate
Feb. 2013
Firstpage
357
Lastpage
366
Abstract
In many ways, the lexicon remains the Achilles heel of modern automatic speech recognizers. Unlike stochastic acoustic and language models that learn the values of their parameters from training data, the baseform pronunciations of words in a recognizer's lexicon are typically specified manually, and do not change, unless they are edited by an expert. Our work presents a novel generative framework that uses speech data to learn stochastic lexicons, thereby taking a step towards alleviating the need for manual intervention and automatically learning high-quality pronunciations for words. We test our model on continuous speech in a weather information domain. In our experiments, we see significant improvements over a manually specified “expert-pronunciation” lexicon. We then analyze variations of the parameter settings used to achieve these gains.
Keywords
learning (artificial intelligence); speech recognition; stochastic processes; high-quality pronunciations; language models; manual intervention; manually specified expert-pronunciation lexicon; modern automatic speech recognizers; parameter settings; pronunciation mixture model; recognizer lexicon; speech data; stochastic acoustic; stochastic lexicon learning; training data; weather information domain; Acoustics; Mathematical model; Speech; Speech processing; Speech recognition; Stochastic processes; Training; Baseform generation; dictionary training with acoustics via EM; pronunciation learning; stochastic lexicon;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2012.2226158
Filename
6338277