DocumentCode :
1020920
Title :
An N-best candidates-based discriminative training for speech recognition applications
Author :
Chen, Jung-kuei ; Soong, Frank K.
Author_Institution :
Telecommun. Lab., Minist. of Commun., Chung-Li, Taiwan
Volume :
2
Issue :
1
fYear :
1994
Firstpage :
206
Lastpage :
216
Abstract :
The authors propose an N-best candidates-based discriminative training procedure for constructing high-performance HMM speech recognizers. The algorithm has two distinct features: N-best hypotheses are used for training discriminative models, and a new frame-level loss function is minimized to improve the separation between the correct and incorrect hypotheses. The N-best candidates are decoded with the authors' recently proposed tree-trellis fast search algorithm. The new frame-level loss function, defined as a halfwave rectified log-likelihood difference between the correct and competing hypotheses, is minimized over all training tokens by adjusting the HMM parameters along a gradient descent direction. Two speech recognition applications have been tested: speaker-independent, small-vocabulary (ten Mandarin Chinese digits) continuous speech recognition, and speaker-trained, large-vocabulary (5000 commonly used Chinese words) isolated word recognition. Both show significant performance improvement over traditional maximum-likelihood-trained HMMs. In the connected Chinese digit recognition experiment, the string error rate is reduced from 17.0% to 10.8% for unknown-length decoding and from 8.2% to 5.2% for known-length decoding. In the large-vocabulary isolated word recognition experiment, the recognition error rate is reduced from 7.2% to 3.8%. Additionally, the authors found that using more relaxed decoding constraints in preparing N-best hypotheses yields better recognition results.
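Illustrative note (not part of the original record): the abstract describes the loss only in words, so the following is a minimal sketch of what a halfwave rectified, frame-level log-likelihood difference might look like. The function names, the sign convention (competing minus correct), the assumption that both hypotheses are scored on the same frames, and the plain sum over frames and over N-best competitors are all assumptions made for illustration; the paper's exact formulation may differ.

```python
from typing import Sequence


def halfwave_rectified_loss(correct_ll: Sequence[float],
                            competing_ll: Sequence[float]) -> float:
    """Sum over frames of max(0, competing - correct).

    Assumption: both sequences hold per-frame log-likelihoods aligned to
    the same frames. A frame contributes only when the competing
    hypothesis scores higher than the correct one (halfwave rectification).
    """
    if len(correct_ll) != len(competing_ll):
        raise ValueError("frame sequences must have equal length")
    return sum(max(0.0, comp - corr)
               for corr, comp in zip(correct_ll, competing_ll))


def token_loss(correct_ll: Sequence[float],
               n_best_ll: Sequence[Sequence[float]]) -> float:
    """Hypothetical per-token loss: add up the rectified loss against
    each competing hypothesis in the N-best list."""
    return sum(halfwave_rectified_loss(correct_ll, hyp) for hyp in n_best_ll)


# Toy per-frame log-likelihoods (made-up numbers, three frames).
correct = [-1.0, -2.0, -3.0]
competitors = [[-2.0, -1.0, -3.0], [-0.5, -2.5, -2.0]]
print(token_loss(correct, competitors))
```

In a training loop under these assumptions, the HMM parameters would be nudged along the negative gradient of this quantity, so only frames where a competitor outscores the correct hypothesis drive the update.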
Keywords :
decoding; hidden Markov models; maximum likelihood estimation; minimisation; speech recognition; HMM parameters; HMM speech recognizers; Mandarin Chinese digits; N-best candidates discriminative training; N-best hypotheses; algorithm; connected Chinese digit recognition; continuous speech recognition; discriminative models; frame-level loss function; gradient descent direction; halfwave rectified log-likelihood difference; isolated word recognition; large vocabulary; minimization; small vocabulary; speaker independent recognition; speaker-trained recognition; speech recognition applications; string error rate; training tokens; tree-trellis fast search algorithm; Decoding; Error analysis; Hidden Markov models; Iterative algorithms; Maximum likelihood estimation; Probability distribution; Speech recognition; Testing; Training data; Vocabulary;
fLanguage :
English
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6676
Type :
jour
DOI :
10.1109/89.260363
Filename :
260363