Title :
Robust speech recognition using warped DFT-based cepstral features in clean and multistyle training
Author :
Alam, Mohammad Jahangir ; Kenny, P. ; Dumouchel, P. ; O'Shaughnessy, D.
Author_Institution :
CRIM, Montreal, QC, Canada
Abstract :
This paper investigates the robustness of warped discrete Fourier transform (WDFT)-based cepstral features for continuous speech recognition under clean and multistyle training conditions. In the MFCC and PLP front-ends, the speech spectrum is warped using the Mel-scale filterbank, which typically consists of overlapping triangular filters, in order to approximate the nonlinear frequency characteristics of the human auditory system. It is well known that features based on such a nonlinear frequency transformation provide better speech recognition accuracy than features on a linear frequency scale. It has been found that warping the DFT spectrum directly, rather than using filterbank averaging, provides a more precise approximation to the perceptual scales. The WDFT provides filterbanks with non-uniform resolution, whereas the DFT provides filterbanks with uniform resolution. Here, we provide a performance evaluation of the following variants of the warped cepstral features: WDFT-based and WDFT-linear prediction-based MFCC features. Experiments were carried out on the AURORA-4 task. Experimental results demonstrate that the WDFT-based cepstral features outperform the conventional MFCC and PLP features in terms of recognition error rates under both clean and multistyle training conditions.
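For readers unfamiliar with the conventional baseline that the abstract contrasts with the WDFT, the following is a minimal sketch of a Mel-scale filterbank built from overlapping triangular filters. It is not the authors' implementation. The function names, the number of filters, the FFT size, the sampling rate, and the particular Mel formula (2595 log10(1 + f/700), one common convention) are all illustrative assumptions that do not come from the record.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to mels (assumed convention: 2595*log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters=23, n_fft=256, sample_rate=8000.0):
    """Return (weights, edges_hz) for overlapping triangular Mel filters.

    weights[i][k] is the weight of filter i at DFT bin k.
    All default parameter values are hypothetical, chosen only for illustration.
    """
    nyquist = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # Centre frequency of each uniformly spaced DFT bin.
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]

    # n_filters + 2 edge points, equally spaced on the Mel axis.
    mel_max = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(i * mel_max / (n_filters + 1))
                for i in range(n_filters + 2)]

    weights = []
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        row = []
        for f in bin_freqs:
            rising = (f - lo) / (mid - lo)    # slope up from lo to mid
            falling = (hi - f) / (hi - mid)   # slope down from mid to hi
            row.append(max(0.0, min(rising, falling)))
        weights.append(row)
    return weights, edges_hz
```

Each filter averages several uniformly spaced DFT bins, so the perceptual scale is only approximated through the widths of the triangles. This is the filterbank averaging step that, according to the abstract, direct warping of the DFT spectrum avoids.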
Keywords :
channel bank filters; discrete Fourier transforms; speech recognition; AURORA-4 task; MFCC front end; Mel-scale filter bank; PLP front end; clean training; human auditory system nonlinear characteristics; multistyle training; perceptual scale; robust speech recognition; warped DFT based cepstral features; warped discrete Fourier transform; Feature extraction; Mel frequency cepstral coefficient; Speech; Training; Warped DFT; linear prediction; multi-style training; spectrum enhancement
Conference_Title :
Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European
Conference_Location :
Lisbon