Title :
Leveraging threshold denoising on DCT-based modulation spectrum for noise robust speech recognition
Author :
Yen-chih Cheng ; Jun-Shan Lin ; Jeih-weih Hung
Author_Institution :
Dept. of Electr. Eng., Nat. Chi Nan Univ., Puli, Taiwan
Abstract :
This paper presents a novel noise-robustness algorithm that enhances speech features for noisy speech recognition. In the presented algorithm, the temporal speech feature sequence is first converted to its spectrum via the discrete cosine transform (DCT), and the DCT spectrum is then compensated by a thresholding function that further shrinks its smaller-magnitude portion. Finally, the updated DCT spectrum is converted back to the temporal domain to obtain the new feature sequence. One advantage of the presented method is that the overall compensation process is unsupervised, in the sense that no information about the noise embedded in the speech signals is required. Evaluation on the Aurora-2 connected-digit database and task showed that the presented method significantly improves recognition accuracy for speech features pre-processed by any of the statistics normalization algorithms, including cepstral mean and variance normalization (CMVN), MVN plus ARMA filtering (MVA), and cepstral gain normalization (CGN). We further showed that, with the presented method, compensating only the low-frequency portion achieves performance on a par with compensation over the entire frequency band.
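The three steps described in the abstract (DCT, thresholding, inverse DCT) can be illustrated with a minimal sketch. The abstract does not specify the thresholding function or how the threshold is chosen, so the soft-thresholding rule, the `thr` value, and the optional `max_bin` cutoff for low-frequency-only compensation are all assumptions made for illustration, as are the function names. This is not the authors' implementation.

```python
import math


def dct_ortho(x):
    """Orthonormal DCT-II of a real sequence (direct O(N^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_ortho(c):
    """Inverse of dct_ortho (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(math.sqrt(2.0 / n) * c[k]
                 * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                 for k in range(1, n))
        out.append(s)
    return out


def soft_threshold(c, thr):
    """Assumed rule: reduce |c| by thr and zero out anything below thr."""
    return math.copysign(max(abs(c) - thr, 0.0), c)


def denoise_feature_stream(x, thr, max_bin=None):
    """Threshold the DCT spectrum of one temporal feature sequence.

    x       : values of a single feature dimension over time (one utterance)
    thr     : hypothetical threshold; the abstract does not say how it is set
    max_bin : if given, only DCT bins [0, max_bin) are thresholded, which
              mimics compensating the low-frequency portion only
    """
    spec = dct_ortho(x)
    limit = len(spec) if max_bin is None else min(max_bin, len(spec))
    for k in range(limit):
        spec[k] = soft_threshold(spec[k], thr)
    return idct_ortho(spec)
```

In a recognition front end, a function like `denoise_feature_stream` would be applied separately to each feature dimension of an utterance, after a normalization step such as CMVN. Because the transform is orthonormal and the threshold only shrinks coefficient magnitudes, the output sequence never has more energy than the input.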
Keywords :
autoregressive moving average processes; cepstral analysis; discrete cosine transforms; encoding; filtering theory; modulation spectra; signal denoising; speech recognition; statistical analysis; Aurora-2 connected digit database; CGN; CMVN; DCT-based modulation spectrum compensation; MVA; MVN-plus-ARMA filtering; cepstral gain normalization; cepstral mean-and-variance normalization; discrete cosine transform; frequency band; low-frequency portion compensation; noise robust speech recognition; speech feature enhancement; speech feature preprocessing; speech signals; statistics normalization algorithms; temporal domain; temporal speech feature sequence; threshold denoising leveraging; unsupervised compensation process; Frequency modulation; Mel frequency cepstral coefficient; Noise; Speech; modulation spectrum; noise robustness
Conference_Title :
Control & Automation (ICCA), 11th IEEE International Conference on
Conference_Location :
Taichung
DOI :
10.1109/ICCA.2014.6871133