Regional Information Center for Science and Technology - Temporal contrast normalization and edge-preserved smoothing on temporal modulation structure for robust speech recognition

DocumentCode :

3530217

Title :

Temporal contrast normalization and edge-preserved smoothing on temporal modulation structure for robust speech recognition

Author :

Lu, X. ; Matsuda, S. ; Unoki, M. ; Shimizu, T. ; Nakamura, S.

Author_Institution :

ATR-SLC

fYear :

2009

fDate :

19-24 April 2009

Firstpage :

4573

Lastpage :

4576

Abstract :

In this paper, we propose a two-step processing algorithm that adaptively normalizes the temporal modulation of speech to extract robust speech features for automatic speech recognition systems. The first step normalizes the temporal modulation contrast (TMC) of the cepstral time series for both clean and noisy speech. The second step smooths the normalized temporal modulation structure to reduce noise-induced artifacts while preserving speech modulation events (edges). We tested the algorithm in speech recognition experiments under three conditions: additive noise (AURORA-2J data corpus), reverberation (clean speech utterances from AURORA-2J convolved with a smart-room impulse response), and combined reverberant and additive noise (air conditioner noise in a smart room). The ETSI advanced front-end (AFE) algorithm was used for comparison. The algorithm achieved: (1) under additive noise, a 57.26% relative word error reduction (RWER) rate for clean-condition training (59.37% for AFE) and a 33.52% RWER rate for multi-condition training (35.77% for AFE); (2) under reverberation, a 51.28% RWER rate (10.17% for AFE); and (3) under combined reverberant and additive noise, a 71.74% RWER rate (48.86% for AFE).
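The two-step idea in the abstract can be illustrated with a rough sketch. The abstract does not give the formulas, so both steps below are stand-ins rather than the authors' method: step 1 uses plain mean and variance normalization of a single cepstral coefficient trajectory, and step 2 uses a sliding median filter, a common edge-preserving smoother. The function names, the `width` parameter, and the RWER formula (the usual relative-reduction definition) are assumptions made for illustration only.

```python
from statistics import mean, median, pstdev


def normalize_contrast(track):
    """Stand-in for step 1 (assumption): zero-mean, unit-variance
    normalization of one cepstral coefficient over time."""
    mu = mean(track)
    sigma = pstdev(track)
    if sigma == 0:
        # A constant trajectory carries no modulation contrast.
        return [0.0 for _ in track]
    return [(x - mu) / sigma for x in track]


def median_smooth(track, width=5):
    """Stand-in for step 2 (assumption): sliding median over time.
    Isolated spikes are suppressed while step-like edges are kept."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    if not track:
        return []
    half = width // 2
    # Repeat the boundary values so the output has the input length.
    padded = [track[0]] * half + list(track) + [track[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(track))]


def relative_wer_reduction(baseline_wer, system_wer):
    """Relative word error reduction in percent, assuming the usual
    definition: (baseline - system) / baseline * 100."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical trajectory: a step edge at frame 10 plus one noise spike.
track = [0.0] * 10 + [1.0] * 10
track[5] = 10.0
features = median_smooth(normalize_contrast(track))
```

In this toy trajectory the spike at frame 5 stands in for a noise artifact and the step at frame 10 for a speech modulation event; a median filter removes the former and keeps the latter sharp, which a moving average would not do.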

Keywords :

cepstral analysis; modulation; smoothing methods; speech recognition; time series; transient response; AURORA-2J data corpus; additive noise condition; advanced front-end algorithm; automatic speech recognition systems; cepstral time series; clean conditional training; clean speech utterances; edge-preserved smoothing; impulse response; multi-conditional training; noisy speech; normalized temporal modulation structure; relative word error reduction rate; reverberant condition; reverberant noise condition; robust speech feature; robust speech recognition; speech modulation events; speech recognition experiments; temporal contrast normalization; temporal modulation contrast; Additive noise; Automatic speech recognition; Cepstral analysis; Feature extraction; Noise reduction; Noise robustness; Smoothing methods; Speech enhancement; Speech processing; Speech recognition; cepstral mean and variance normalization; modulation transfer function; robust speech recognition; temporal modulation contrast normalization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on

Conference_Location :

Taipei

ISSN :

1520-6149

Print_ISBN :

978-1-4244-2353-8

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2009.4960648

Filename :

4960648

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3530217