Title :
Noise robust speech recognition with state duration constraints
Author_Institution :
Speech & Audio Systems Laboratory, Nokia Research Center, Tampere, Finland
Abstract :
In this paper, we present a method for incorporating and re-estimating state duration constraints within the maximum likelihood training of hidden Markov models. In the recognition phase, we find the optimal state sequence that fulfils the state duration constraints obtained in the training phase. Our goal is for speaker-dependent training and recognition to perform well with a very small amount of training data when the training and testing environments are mismatched. We take advantage of the fact that speakers tend to preserve their speaking style in similar situations (e.g. when speaking to a machine); our main means of reaching this goal is to force similar state segmentations in the training and recognition phases. We show that the proposed method substantially improves the robustness of a speech recognizer and decreases error rates by over 93% compared with a standard approach.
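The abstract does not spell out the decoding procedure, so the following is only an illustrative sketch of one way to find a best state segmentation under duration constraints, not the authors' algorithm. It assumes a left-to-right HMM without skips, hard minimum and maximum durations per state, and per-frame log-likelihoods that are already computed; the function name `constrained_segmentation` and the parameters `loglik`, `dmin`, and `dmax` are hypothetical.

```python
import math


def constrained_segmentation(loglik, dmin, dmax):
    """Best left-to-right segmentation under per-state duration bounds.

    loglik[t][j] is the log-likelihood of frame t under state j.
    dmin[j] and dmax[j] bound the number of frames spent in state j
    (assumed here to be hard limits with dmin[j] >= 1).
    Returns (score, durations), or (-inf, None) if no segmentation fits.
    """
    T, N = len(loglik), len(dmin)
    NEG = -math.inf

    # prefix[j][t] = sum of loglik[0..t-1][j], so a segment score is a difference.
    prefix = [[0.0] * (T + 1) for _ in range(N)]
    for j in range(N):
        for t in range(T):
            prefix[j][t + 1] = prefix[j][t] + loglik[t][j]

    # best[j][t]: best score when states 0..j exactly cover frames 0..t-1.
    best = [[NEG] * (T + 1) for _ in range(N)]
    back = [[0] * (T + 1) for _ in range(N)]
    for j in range(N):
        for t in range(1, T + 1):
            for d in range(dmin[j], min(dmax[j], t) + 1):
                if j == 0:
                    # The first state must start at frame 0.
                    prev = 0.0 if d == t else NEG
                else:
                    prev = best[j - 1][t - d]
                if prev == NEG:
                    continue
                score = prev + prefix[j][t] - prefix[j][t - d]
                if score > best[j][t]:
                    best[j][t] = score
                    back[j][t] = d

    if best[N - 1][T] == NEG:
        return NEG, None

    # Trace back the chosen duration of each state, last state first.
    durations = []
    t = T
    for j in range(N - 1, -1, -1):
        d = back[j][t]
        durations.append(d)
        t -= d
    durations.reverse()
    return best[N - 1][T], durations
```

In this sketch, tightening `dmin` and `dmax` around durations observed in training forces the recognition-time segmentation to resemble the training-time one, which is the effect the abstract describes; how the bounds are actually estimated is not stated in this record.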
Keywords :
acoustic noise; error statistics; estimation theory; hidden Markov models; maximum likelihood estimation; sequences; speech recognition; state estimation; error rates; maximum likelihood training; noise robust speech recognition; optimal state sequence; recognition phase; speaker-dependent recognition; speaker-dependent training; speaking style; speech recognizer; state duration constraints; state segmentations; training phase; Audio systems; Error analysis; Noise robustness; Parameter estimation; Target recognition; Viterbi algorithm
Conference_Title :
1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-97)
Conference_Location :
Munich
Print_ISBN :
0-8186-7919-0
DOI :
10.1109/ICASSP.1997.596074