Title :
Robust speech recognition training via duration and spectral-based stress token generation
Author :
Hansen, John H. L.; Bou-Ghazale, Sahar E.
Author_Institution :
Dept. of Electr. Eng., Duke Univ., Durham, NC, USA
fDate :
9/1/1995
Abstract :
It is known that speech recognition performance degrades when systems are not trained and tested under similar speaking conditions. This is particularly true when a speaker is exposed to demanding workload stress or noise. For recognition systems to succeed in applications susceptible to stress, speech recognizers should address the adverse conditions experienced by the user. The authors consider the problem of improving recognizer training for various stressed speaking conditions (e.g., slow, loud, and Lombard effect speaking styles). The main objective is to devise a training procedure that produces a hidden Markov model recognizer that better characterizes a given stressed speaking style, without the need to collect such stressed data directly. The novel approach is to construct a word production model using a previously suggested source generator framework [Hansen 1994], employing knowledge of the statistical nature of duration and spectral variation of speech under stress. This model is used in turn to produce simulated stressed speech training tokens from neutral speech tokens. The token generation training method is shown to improve isolated word recognition by 24% for Lombard speech when compared to a neutral-trained isolated word recognizer. Further results are reported for isolated and keyword recognition scenarios.
Keywords :
hidden Markov models; spectral analysis; speech recognition; Lombard speech; adverse conditions; duration; hidden Markov model recognizer; performance; recognition training; robust speech recognition training; speaking conditions; speaking style; spectral-based stress token generation; stressed speaking condition; word production model; Automatic speech recognition; Data mining; Degradation; Hidden Markov models; Humans; Robustness; Speech enhancement; Speech processing; Speech recognition; Stress
Journal_Title :
Speech and Audio Processing, IEEE Transactions on