Title :
Robust noisy speech recognition with adaptive frequency bank selection
Author :
Tian, Ye ; Wu, Ji ; Wang, Zuoying ; Lu, Dajin
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
Abstract :
As automatic speech recognition technology develops, the robustness of speech recognition systems is becoming increasingly important. This paper addresses speech recognition in an additive background noise environment. Because the energy of different types of noise is concentrated in different frequency banks, additive noise affects each frequency bank differently. Seriously obscured frequency banks retain little word signal information and are harmful to subsequent speech processing. Wu and Lin (2000) applied frequency bank selection theory to robust word boundary detection in a noisy environment and obtained good detection results. In this paper, that theory is extended to noisy speech recognition. Unlike standard MFCC, which uses all frequency banks to compute cepstral coefficients, we use only the frequency banks that are slightly corrupted and discard the seriously obscured ones. Cepstral coefficients are calculated only on the selected frequency banks. Moreover, the acoustic model is adapted to match the modified acoustic feature. Experiments on continuous digital speech recognition show that the proposed algorithm performs better than spectral subtraction and cepstral mean normalization at low SNRs.
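The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the per-bank SNR threshold used here is an assumption, as are the names `select_banks`, `cepstra_on_selected`, `min_snr_db`, and `n_ceps`, and the availability of a per-bank noise energy estimate. The sketch does not cover the acoustic model adaptation.

```python
import numpy as np

def select_banks(noisy_energy, noise_energy, min_snr_db=0.0):
    """Return a boolean mask of banks judged only slightly corrupted.

    Assumption: a bank is kept when its estimated SNR (in dB) is at least
    min_snr_db. This criterion is illustrative, not taken from the paper.
    """
    noisy = np.asarray(noisy_energy, dtype=float)
    noise = np.asarray(noise_energy, dtype=float)
    # Estimate clean energy by subtracting the noise estimate, with a floor
    # so the logarithm below is always defined.
    clean = np.maximum(noisy - noise, 1e-10)
    snr_db = 10.0 * np.log10(clean / np.maximum(noise, 1e-10))
    return snr_db >= min_snr_db

def cepstra_on_selected(bank_energy, mask, n_ceps=4):
    """Unnormalized DCT-II of log energies, taken over the selected banks only."""
    energy = np.asarray(bank_energy, dtype=float)[np.asarray(mask, dtype=bool)]
    log_e = np.log(np.maximum(energy, 1e-10))
    m = log_e.size
    if m == 0:
        # No usable bank in this frame: return zeros rather than fail.
        return np.zeros(n_ceps)
    k = np.arange(n_ceps)[:, None]   # cepstral index
    n = np.arange(m)[None, :]        # index within the selected banks
    basis = np.cos(np.pi * k * (n + 0.5) / m)
    return basis @ log_e

# Hypothetical energies for four banks; banks 1 and 3 are dominated by noise.
noisy = [10.0, 2.0, 8.0, 1.1]
noise = [1.0, 1.9, 1.0, 1.0]
mask = select_banks(noisy, noise)
ceps = cepstra_on_selected(noisy, mask, n_ceps=2)
```

Because the number of selected banks changes the DCT basis, features computed this way are not directly comparable with full-band MFCCs, which is consistent with the abstract's remark that the acoustic model must also be adapted.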
Keywords :
cepstral analysis; noise; performance evaluation; speech processing; speech recognition; MFCC; acoustic model; adaptive frequency bank selection; additive background noise environment; cepstral coefficients; cepstral mean normalization; continuous digital speech recognition; experiments; performance; robust noisy speech recognition; robust word boundary detection; spectral subtraction; word signal information; Additive noise; Automatic speech recognition; Background noise; Cepstral analysis; Frequency; Noise robustness; Signal processing; Speech processing; Speech recognition; Working environment noise;
Conference_Title :
Multimodal Interfaces, 2002. Proceedings. Fourth IEEE International Conference on
Print_ISBN :
0-7695-1834-6
DOI :
10.1109/ICMI.2002.1166972