Title :
Fix it where it fails: Pronunciation learning by mining error corrections from speech logs
Author :
Zhenzhen Kou ; Daisy Stanton ; Fuchun Peng ; Francoise Beaufays ; Trevor Strohman
Author_Institution :
Google Inc., Mountain View, CA, USA
Abstract :
The pronunciation dictionary, or lexicon, is an essential component of an automatic speech recognition (ASR) system, because incorrect pronunciations cause systematic misrecognitions. It typically consists of a list of word-pronunciation pairs written by linguists and a grapheme-to-phoneme (G2P) engine that generates pronunciations for words not in the list. The hand-generated list can never keep pace with the growing vocabulary of a live speech recognition system, and the G2P engine is usually of limited accuracy. This is especially true for proper names, whose pronunciations may be influenced by various historical or foreign-origin factors. In this paper, we propose a language-independent approach to detect misrecognitions and their corrections from voice search logs. We learn previously unknown pronunciations from this data and demonstrate that they significantly improve the quality of a production-quality speech recognition system.
Keywords :
linguistics; speech recognition; ASR system; G2P engine; automatic speech recognition; foreign-origin factors; grapheme-phoneme engine; hand-generated list; language-independent approach; lexicon; limited accuracy; linguists; live speech recognition system; mining error correction; production-quality speech recognition system; pronunciation dictionary; pronunciation learning; speech logs; systematic misrecognitions; voice search logs; word-pronunciation pairs; Acoustics; Data mining; Engines; Keyboards; Motion pictures; Speech; Speech recognition; data extraction; logistic regression
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178846