DocumentCode :
2793443
Title :
Discriminatively estimated joint acoustic, duration, and language model for speech recognition
Author :
Lehr, Maider ; Shafran, Izhak
Author_Institution :
Center for Spoken Language Understanding (CSLU), Oregon Health & Science University, Portland, OR, USA
fYear :
2010
fDate :
14-19 March 2010
Firstpage :
5542
Lastpage :
5545
Abstract :
We introduce a discriminative model for speech recognition that integrates acoustic, duration and language components. In the framework of finite state machines, a general model for speech recognition G is a finite state transduction from acoustic state sequences to word sequences (e.g., the search graph in many speech recognizers). The lattices from a baseline recognizer can be viewed as an a posteriori version of G after having observed an utterance. So far, discriminative language models have been proposed to correct the output side of G and are applied to the lattices. The acoustic state sequences on the input side of these lattices can also be exploited to improve the choice of the best hypotheses through the lattice. Taking this view, the model proposed in this paper jointly estimates the parameters for acoustic and language components in a discriminative setting. The resulting model can be factored as corrections for the input and the output sides of the general model G. This formulation allows us to incorporate duration cues seamlessly. Empirical results on a large vocabulary Arabic GALE task demonstrate that the proposed model improves word error rate substantially, with a gain of 1.6% absolute. Through a series of experiments we analyze the contributions from and interactions between acoustic, duration and language components, and find that duration cues play an important role in the Arabic task.
Keywords :
linguistics; speech recognition; acoustic modeling; acoustic state sequences; discriminative language model; duration cues; duration modeling; finite state transduction; language modeling; large vocabulary Arabic GALE task; Automata; Decoding; Error analysis; Lattices; Natural languages; Parameter estimation; Performance gain; Vectors; Vocabulary; discriminative modeling
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
ISSN :
1520-6149
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2010.5495227
Filename :
5495227