A comparison of auditory features for robust speech recognition

Author

Kelly, Finnian ; Harte, Naomi

Author_Institution

Dept. of Electron. & Electr. Eng., Sigmedia Group, Trinity Coll. Dublin, Dublin, Ireland

fYear

2010

fDate

23-27 Aug. 2010

Firstpage

1968

Lastpage

1972

Abstract

This paper presents a detailed comparison of the performance of two auditory-based feature extraction algorithms for automatic speech recognition (ASR). The feature sets are Zero-Crossings with Peak Amplitudes (ZCPA) and the recently introduced Power-Normalized Cepstral Coefficients (PNCC), which use a power-law nonlinearity and power-bias subtraction. Standard Mel-Frequency Cepstral Coefficients (MFCC) are also tested for comparison. Although front-ends have been compared in previous papers, this work focuses on two of the most promising algorithms for noise robustness. The performance of all features is reported on the TIMIT database using an HMM system. It is found that the PNCC features outperform MFCC in clean conditions and are robust to noise. ZCPA performance is shown to vary widely with filterbank configuration and frame length. ZCPA performs poorly in clean conditions but is the least affected by white noise. PNCC is shown to be the most promising new feature set for robust ASR in recent years.

Keywords

channel bank filters; feature extraction; hidden Markov models; speech recognition; ASR; HMM system; MFCC; Mel-frequency cepstral coefficients; PNCC; TIMIT database; auditory based feature extraction; automatic speech recognition; filterbank configuration; peak amplitudes; power bias subtraction; power law nonlinearity; robust speech recognition; zero crossings; Accuracy; Feature extraction; Finite impulse response filters; Histograms; Mel frequency cepstral coefficient; Speech; Speech recognition;

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal Processing Conference, 2010 18th European

Conference_Location

Aalborg

ISSN

2219-5491

Type

conf

Filename

7096363