DocumentCode
672382
Title
Acoustic data-driven pronunciation lexicon for large vocabulary speech recognition
Author
Lu, Liang ; Ghoshal, Arnab ; Renals, Steve
Author_Institution
Centre for Speech Technol. Res., Univ. of Edinburgh, Edinburgh, UK
fYear
2013
fDate
8-12 Dec. 2013
Firstpage
374
Lastpage
379
Abstract
Speech recognition systems normally use handcrafted pronunciation lexicons designed by linguistic experts. Building and maintaining such a lexicon is expensive and time-consuming. This paper concerns automatically learning a pronunciation lexicon for speech recognition. We assume the availability of a small seed lexicon and then learn the pronunciations of new words directly from speech that is transcribed at the word level. We present two implementations for refining the putative pronunciations of new words based on acoustic evidence: the first is an expectation maximization (EM) algorithm based on weighted finite state transducers (WFSTs), and the second is its Viterbi approximation. We carried out experiments on the Switchboard corpus of conversational telephone speech. The expert lexicon contains more than 30,000 words, from which we randomly selected 5,000 words to form the seed lexicon. Using the proposed lexicon learning method, we significantly improved recognition accuracy compared with a lexicon learned using a grapheme-to-phoneme transformation, and obtained a word error rate that approaches that achieved using a fully handcrafted lexicon.
Keywords
acoustic transducers; approximation theory; expectation-maximisation algorithm; learning (artificial intelligence); linguistics; reliability; speech recognition; vocabulary; EM algorithm; Viterbi approximation; WFST; acoustic data-driven pronunciation lexicon; conversational telephone speech; expectation maximization algorithm; grapheme-to-phoneme transformation; handcrafted pronunciation lexicon; large vocabulary speech recognition system; lexicon learning method; linguistic expert; small seed lexicon availability; switchboard corpus; weighted finite state transducer; Acoustics; Data models; Speech recognition; Training; Training data; Transducers; Viterbi algorithm; Automatic speech recognition; Lexical modelling; Probabilistic pronunciation model;
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location
Olomouc
Type
conf
DOI
10.1109/ASRU.2013.6707759
Filename
6707759