DocumentCode :
2113535
Title :
Robustifying cepstral features by mitigating the outlier effect for noisy speech recognition
Author :
Hao-teng Fan ; Kuan-wei Hsieh ; Chien-Hao Huang ; Jeih-weih Hung
Author_Institution :
Dept. of Electr. Eng., Nat. Chi Nan Univ., Puli, Taiwan
fYear :
2013
fDate :
23-25 July 2013
Firstpage :
935
Lastpage :
939
Abstract :
The performance of automatic speech recognition (ASR) systems is often seriously degraded by noise interference. Among the techniques for reducing the effect of noise, cepstral mean-and-variance normalization (CMVN) is a simple yet quite effective approach for processing MFCC speech features. However, the features processed by CMVN still contain a significant number of outliers, which very likely weaken its effect. This paper proposes two ways of dealing with the outliers left by CMVN. The first applies a sigmoid function transformation, which imposes explicit lower and upper bounds on the outliers; the second uses the well-known median filter to remove impulse-like outliers from the CMVN features. On the Aurora-2 digit recognition database and task, the two presented frameworks yield an absolute accuracy improvement of around 5% over CMVN, and the corresponding word error rate reduction relative to the MFCC baseline is as high as 50%.
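As a rough illustration of the two ideas named in the abstract, the following Python sketch applies per-utterance CMVN to a matrix of cepstral features and then shows each outlier treatment as a separate post-processing option. It is not the authors' implementation: the abstract does not give the exact sigmoid form, its slope, or the median filter length, so the function names, the `slope` parameter, the tanh-shaped sigmoid mapped to (-1, 1), and the default `width=3` are all assumptions made for illustration.

```python
import numpy as np


def cmvn(features):
    """Per-utterance CMVN: zero mean, unit variance for each cepstral dimension.

    `features` has shape (num_frames, num_coefficients).
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / np.maximum(std, 1e-8)  # guard against zero variance


def sigmoid_bound(features, slope=1.0):
    """Squash features into (-1, 1) with a sigmoid-shaped curve.

    Assumed form: 2 / (1 + exp(-slope * x)) - 1, which equals tanh(slope * x / 2).
    The abstract only says a sigmoid is used to bound outliers.
    """
    return np.tanh(0.5 * slope * features)


def median_filter(features, width=3):
    """Median-filter each coefficient trajectory along the time (frame) axis.

    `width` must be odd; the edges are padded by repeating the first/last frame.
    The filter length is a hypothetical choice, not taken from the paper.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    num_frames = features.shape[0]
    # Stack the `width` time-shifted copies, then take the median across them.
    windows = np.stack([padded[i:i + num_frames] for i in range(width)], axis=0)
    return np.median(windows, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mfcc = rng.normal(size=(200, 13))   # stand-in for real MFCC frames
    mfcc[50, 0] = 40.0                  # inject one impulse-like outlier
    normalized = cmvn(mfcc)
    bounded = sigmoid_bound(normalized)     # option 1: bound the outliers
    smoothed = median_filter(normalized)    # option 2: remove impulse-like outliers
    print(normalized[50, 0], bounded[50, 0], smoothed[50, 0])
```

In this sketch the sigmoid option limits how far any single value can stray, while the median option replaces an isolated spike with a neighbouring value; which settings match those evaluated on Aurora-2 cannot be determined from the abstract alone.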
Keywords :
audio databases; cepstral analysis; feature extraction; median filters; speech recognition; ASR systems; Aurora-2 digit recognition database; CMVN; MFCC speech feature processing; automatic speech recognition system; cepstral features robustification; cepstral mean-and-variance normalization; impulse-like outliers; median filter; noise effect; noise interference; noisy speech recognition; outlier effect mitigation; sigmoid function transformation; word error rate reduction; Frequency modulation; Fuzzy systems; Knowledge discovery; Mel frequency cepstral coefficient; Robustness; Signal to noise ratio; cepstral mean and variance normalization; median filter; noise robustness; sigmoid function; speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2013 10th International Conference on
Conference_Location :
Shenyang
Type :
conf
DOI :
10.1109/FSKD.2013.6816329
Filename :
6816329