Title :
Auditory Features Revisited for Robust Speech Recognition
Author :
Kelly, Finnian ; Harte, Naomi
Author_Institution :
Dept. of Electron. & Electr. Eng., Trinity Coll. Dublin, Dublin, Ireland
Abstract :
Auditory-based front-ends for speech recognition have been compared before, but this paper focuses on two of the most promising algorithms for noise robustness in automatic speech recognition (ASR). The feature sets are Zero-Crossings with Peak Amplitudes (ZCPA) and the recently introduced Power-Normalized Cepstral Coefficients (PNCC), which are based on a power-law nonlinearity and power-bias subtraction. Standard Mel-Frequency Cepstral Coefficients (MFCC) are also tested for reference. The performance of all features is reported on the TIMIT database using an HMM-based recogniser. The PNCC features are found to outperform MFCC in clean conditions and to be the most robust to noise. ZCPA performance is shown to vary widely with filter bank configuration and frame length; it is poor in clean conditions but is the least affected by white noise. PNCC is shown to be the most promising new feature set for robust ASR to emerge in recent years.
Keywords :
audio signal processing; cepstral analysis; filtering theory; hidden Markov models; speech recognition; Mel-frequency cepstral coefficients; TIMIT database; auditory based front-ends; automatic speech recognition; filter bank configuration; hidden Markov model-based recogniser; noise robustness; power-bias subtraction; power-law nonlinearity; zero-crossings with peak amplitudes; Accuracy; Feature extraction; Finite impulse response filter; Mel frequency cepstral coefficient; Robustness; Speech; auditory features
Conference_Title :
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location :
Istanbul
Print_ISBN :
978-1-4244-7542-1
DOI :
10.1109/ICPR.2010.1082