Title :
A study on cepstral sub-band normalization for robust ASR
Author :
Syu-Siang Wang ; Jeih-weih Hung ; Yu Tsao
Author_Institution :
Res. Center for Inf. Technol. Innovation, Taipei, Taiwan
Abstract :
In this paper, we propose a cepstral sub-band normalization (CSN) approach for robust speech recognition. CSN first applies the discrete wavelet transform (DWT) to decompose the original cepstral feature sequence into low- and high-frequency band (LFB and HFB) components. It then normalizes the LFB components and zeros out the HFB components. Finally, an inverse DWT is applied to the LFB and HFB components to form the normalized cepstral features. When Haar functions are used as the DWT bases, CSN can be computed efficiently and reduces the number of feature components by 50%. Experimental results on the Aurora-2 task show that CSN outperforms conventional cepstral mean subtraction (CMS), cepstral mean and variance normalization (CMVN), and histogram equalization (HEQ). We also integrate CSN with the advanced front-end (AFE) for feature extraction, and the integrated AFE+CSN achieves notable improvements over the original AFE. Its simple computation, compact form, and effective noise robustness make CSN well suited to mobile applications.
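The following is a minimal NumPy sketch of the three CSN steps summarized in the abstract, not the authors' implementation. It assumes a one-level orthonormal Haar DWT applied along the time axis of each cepstral coefficient, per-utterance mean and variance normalization as the LFB normalization (the abstract does not say which normalization is used), and repetition of the last frame when an utterance has an odd number of frames. The function names `haar_dwt`, `haar_idwt`, and `csn` are hypothetical.

```python
import numpy as np


def haar_dwt(x):
    """One-level orthonormal Haar DWT along axis 0 (time); needs an even frame count."""
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # approximation coefficients: low-frequency band (LFB)
    high = (even - odd) / np.sqrt(2.0)  # detail coefficients: high-frequency band (HFB)
    return low, high


def haar_idwt(low, high):
    """Inverse one-level orthonormal Haar DWT along axis 0."""
    x = np.empty((2 * low.shape[0],) + low.shape[1:], dtype=low.dtype)
    x[0::2] = (low + high) / np.sqrt(2.0)
    x[1::2] = (low - high) / np.sqrt(2.0)
    return x


def csn(features, eps=1e-8):
    """Hypothetical CSN sketch for a (frames, coefficients) cepstral array.

    Assumptions, not taken from the paper: per-utterance mean/variance
    normalization of the LFB, and last-frame padding for odd lengths.
    """
    feats = np.asarray(features, dtype=float)
    n_frames = feats.shape[0]
    if n_frames % 2:  # repeat the last frame so every Haar pair is complete
        feats = np.vstack([feats, feats[-1:]])
    low, high = haar_dwt(feats)
    # Step 2a: normalize the LFB (assumed zero mean, unit variance per coefficient)
    low = (low - low.mean(axis=0)) / (low.std(axis=0) + eps)
    # Step 2b: zero out the HFB
    high = np.zeros_like(high)
    # Step 3: inverse DWT, then drop any padding frame
    return haar_idwt(low, high)[:n_frames]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mfcc = rng.normal(size=(101, 13))  # stand-in for 101 frames of 13 cepstra
    print(csn(mfcc).shape)  # (101, 13)
```

Because the HFB is zeroed, each pair of consecutive output frames is identical (both equal the normalized LFB value divided by √2), so only the LFB rows, about half the original frame count, need to be stored or transmitted. This is one way to read the 50% reduction in feature components mentioned above; the constant 1/√2 scale comes from the orthonormal Haar filters chosen here.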
Keywords :
Haar transforms; cepstral analysis; discrete wavelet transforms; feature extraction; speech recognition; AFE; Aurora-2 task; CMS; CMVN; CSN approach; DWT; HEQ; HFB components; Haar functions; LFB components; advanced front-end; automatic speech recognition; cepstral feature sequence; cepstral mean and variance normalization; cepstral mean subtraction; cepstral subband normalization; discrete wavelet transform; high frequency band; histogram equalization; low frequency band; mobile applications; noise robustness properties; robust ASR; robust speech recognition; Noise; Robustness; Speech; Training; RASTA; noise robust;
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on
Conference_Location :
Kowloon
Print_ISBN :
978-1-4673-2506-6
Electronic_ISBN :
978-1-4673-2505-9
DOI :
10.1109/ISCSLP.2012.6423484