Automatic speech recognition with an adaptation model motivated by auditory processing

Author

Holmberg, Marcus ; Gelbart, David ; Hemmert, Werner

Author_Institution

Infineon Technol. AG, Munich, Germany

Volume

14

Issue

1

fYear

2006

Firstpage

43

Lastpage

49

Abstract

The mel-frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) feature extraction typically used for automatic speech recognition (ASR) employs several principles which have known counterparts in the cochlea and auditory nerve: frequency decomposition, mel- or bark-warping of the frequency axis, and compression of amplitudes. It seems natural to ask if one can profitably employ a counterpart of the next physiological processing step, synaptic adaptation. We, therefore, incorporated a simplified model of short-term adaptation into MFCC feature extraction. We evaluated the resulting ASR performance on the AURORA 2 and AURORA 3 tasks, in comparison to ordinary MFCCs, MFCCs processed by RASTA, and MFCCs processed by cepstral mean subtraction (CMS), and both in comparison to and in combination with Wiener filtering. The results suggest that our approach offers a simple, causal robustness strategy which is competitive with RASTA, CMS, and Wiener filtering and performs well in combination with Wiener filtering. Compared to the structurally related RASTA, our adaptation model provides superior performance on AURORA 2 and, if Wiener filtering is used prior to both approaches, on AURORA 3 as well.

Keywords

Wiener filters; cepstral analysis; feature extraction; speech recognition; Wiener filtering; adaptation model; amplitude compression; auditory processing; automatic speech recognition; bark-warping; cepstral mean subtraction; frequency decomposition; mel-frequency cepstral coefficient; mel-warping; perceptual linear prediction feature extraction; physiological processing; synaptic adaptation; Collision mitigation; Humans; Psychoacoustic models; Wiener filter; Neural adaptation; noise robustness

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TSA.2005.860349

Filename

1561262