DocumentCode :
3530217
Title :
Temporal contrast normalization and edge-preserved smoothing on temporal modulation structure for robust speech recognition
Author :
Lu, X. ; Matsuda, S. ; Unoki, M. ; Shimizu, T. ; Nakamura, S.
Author_Institution :
ATR-SLC
fYear :
2009
fDate :
19-24 April 2009
Firstpage :
4573
Lastpage :
4576
Abstract :
In this paper, we propose a two-step processing algorithm that adaptively normalizes the temporal modulation of speech to extract robust speech features for automatic speech recognition (ASR) systems. The first step normalizes the temporal modulation contrast (TMC) of the cepstral time series for both clean and noisy speech. The second step smooths the normalized temporal modulation structure to reduce noise-induced artifacts while preserving speech modulation events (edges). We evaluated the algorithm in speech recognition experiments under three conditions: additive noise (the AURORA-2J corpus), reverberation (clean AURORA-2J utterances convolved with a smart-room impulse response), and combined reverberation and additive noise (air-conditioner noise in a smart room). The ETSI advanced front-end (AFE) algorithm was used for comparison. The proposed algorithm achieved: (1) under additive noise, a relative word error reduction (RWER) of 57.26% for clean-condition training (59.37% for AFE) and 33.52% for multi-condition training (35.77% for AFE); (2) under reverberation, an RWER of 51.28% (10.17% for AFE); and (3) under combined reverberation and additive noise, an RWER of 71.74% (48.86% for AFE).
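The abstract describes the two processing steps only at a high level, so the following Python sketch is illustrative and is not the authors' implementation. It assumes (1) that TMC normalization can be approximated by per-utterance mean and variance normalization of each cepstral trajectory over time (cepstral mean and variance normalization appears among the keywords, but the paper's actual TMC normalization may differ), and (2) that a centered median filter, a standard edge-preserving smoother, can stand in for the paper's edge-preserved smoothing. The function names, the window width, and the `rwer` helper (relative reduction of word error rate against a baseline) are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import List

# Hypothetical helper. Stand-in for TMC normalization (ASSUMPTION: approximated
# by per-utterance mean/variance normalization of one cepstral trajectory).
def normalize_contrast(traj: List[float], eps: float = 1e-8) -> List[float]:
    mu = mean(traj)
    sigma = pstdev(traj)
    # eps avoids division by zero for a constant trajectory
    return [(x - mu) / (sigma + eps) for x in traj]

# Hypothetical helper. Edge-preserving smoothing along time (ASSUMPTION: a
# centered median filter replaces the paper's smoother). Windows are
# truncated at the utterance boundaries.
def median_smooth(traj: List[float], width: int = 5) -> List[float]:
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    n = len(traj)
    out = []
    for t in range(n):
        lo = max(0, t - half)
        hi = min(n, t + half + 1)
        out.append(median(traj[lo:hi]))
    return out

# Hypothetical two-step pipeline on a T x D cepstral feature matrix:
# step 1 normalizes each dimension's trajectory, step 2 smooths it.
def process_features(frames: List[List[float]], width: int = 5) -> List[List[float]]:
    if not frames:
        return []
    dims = len(frames[0])
    cols = []
    for d in range(dims):
        traj = [frame[d] for frame in frames]
        cols.append(median_smooth(normalize_contrast(traj), width))
    # transpose back to frame-major order (T x D)
    return [[cols[d][t] for d in range(dims)] for t in range(len(frames))]

# Relative word error reduction in percent, relative to a baseline WER.
def rwer(baseline_wer: float, new_wer: float) -> float:
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

A median filter of width `w` removes isolated spikes of up to `w // 2` frames while leaving a step (an "edge") in place, which is the property the second step relies on. Bilateral or anisotropic-diffusion filters would be other edge-preserving choices.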
Keywords :
cepstral analysis; modulation; smoothing methods; speech recognition; time series; transient response; AURORA-2J data corpus; additive noise condition; advanced front-end algorithm; automatic speech recognition systems; cepstral time series; clean conditional training; clean speech utterances; edge-preserved smoothing; impulse response; multi-conditional training; noisy speech; normalized temporal modulation structure; relative word error reduction rate; reverberant condition; reverberant noise condition; robust speech feature; robust speech recognition; speech modulation events; speech recognition experiments; temporal contrast normalization; temporal modulation contrast; Additive noise; Automatic speech recognition; Cepstral analysis; Feature extraction; Noise reduction; Noise robustness; Smoothing methods; Speech enhancement; Speech processing; Speech recognition; cepstral mean and variance normalization; modulation transfer function; temporal modulation contrast normalization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
ISSN :
1520-6149
Print_ISBN :
978-1-4244-2353-8
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2009.4960648
Filename :
4960648