Title :
Learning a Discriminative Weighted Finite-State Transducer for Speech Recognition
Author :
Lehr, Maider ; Shafran, Izhak
Author_Institution :
Center for Spoken Language Understanding, Oregon Health & Sci. Univ., Portland, OR, USA
fDate :
7/1/2011 12:00:00 AM
Abstract :
Weighted finite-state transducers (WFSTs) have been widely adopted as efficient representations of a general speech recognition model. The WFST for a speech recognizer is typically assembled, or composed, from several components (the language model, the pronunciation mapping, and the acoustic model) that are estimated separately without any end-to-end optimization. This paper examines how the weights of such transducers can be learned in a manner that captures the interaction between the components. The paths in the transducer are represented as n-grams defined over the input and output sequences, and their linear weights are learned using a discriminative criterion. The resulting linear model factors into two weighted finite-state acceptors (WFSAs), which can be applied as corrections to the input and output sides of the initial WFST. This formulation allows duration cues to be incorporated seamlessly. Empirical results on a large-vocabulary Arabic GALE task demonstrate that the proposed model reduces word error rate substantially, by 1.5%-1.7% absolute. Through a series of experiments, we analyze the contributions from, and interactions between, the acoustic, duration, and language components, and find that duration cues play an important role in a large-vocabulary Arabic speech recognition task. Although this paper focuses on speech recognition, the proposed framework for learning the weights of a finite-state transducer is more general in nature and can be applied to other tasks such as utterance classification.
Keywords :
speech recognition; transducers; acoustic model; discriminative weighted finite-state transducer; end-to-end optimization; language model; learning; linear weights; pronunciation mapping; weighted finite-state acceptors; word error rate; Acoustics; Computational modeling; Data models; Hidden Markov models; Speech; Speech recognition; Transducers; Acoustic modeling; discriminative learning; duration modeling; finite-state transducers; language modeling; learning finite-state transducers
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2010.2090518