Early auditory processing inspired features for robust automatic speech recognition

Author

Kalinli, Ozlem ; Narayanan, Shrikanth

Author_Institution

Dept. of Electr. Eng.-Syst., Univ. of Southern California, Los Angeles, CA, USA

fYear

2007

fDate

3-7 Sept. 2007

Firstpage

2385

Lastpage

2389

Abstract

In this paper, we derive bio-inspired features for automatic speech recognition based on the early processing stages of the human auditory system. The utility and robustness of the derived features are validated in a speech recognition task under a variety of noise conditions. First, we develop an auditory-based feature by replacing the filterbank analysis stage of Mel-frequency cepstral coefficient (MFCC) feature extraction with an auditory model that consists of cochlear filtering, inner hair cell, and lateral inhibitory network stages. Then, we propose a new feature set that retains only the cochlear channel outputs that are more likely to fire the neurons in the central auditory system. This feature set is extracted by principal component analysis (PCA) of the nonlinearly compressed early auditory spectrum. When evaluated in a connected digit recognition task using the Aurora 2.0 database, the proposed feature set achieves average relative word error rate improvements of 40% and 18% over the MFCC and RelAtive SpecTrAl (RASTA) features, respectively.
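
The last extraction step named in the abstract (PCA applied to a nonlinearly compressed auditory spectrum) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cube-root compression, the 128-channel spectrum, the 13 retained components, and the random stand-in data are all assumptions made for the example, and the helper names `compress`, `fit_pca`, and `project` are hypothetical.

```python
import numpy as np


def compress(spectrum, power=1.0 / 3.0):
    """Nonlinear compression of a non-negative spectrum.

    The cube-root exponent is an assumption; the abstract does not
    say which nonlinearity was used.
    """
    return np.power(np.maximum(spectrum, 0.0), power)


def fit_pca(frames, n_components):
    """Estimate a PCA basis from frames of shape (n_frames, n_channels).

    Returns the per-channel mean and a matrix whose columns are the
    leading principal directions, ordered by decreasing variance.
    """
    mean = frames.mean(axis=0)
    cov = np.cov(frames - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]


def project(frames, mean, basis):
    """Project mean-centered frames onto the PCA basis."""
    return (frames - mean) @ basis


# Random stand-in for an early auditory spectrum (hypothetical sizes):
# 200 time frames by 128 cochlear channels.
rng = np.random.default_rng(0)
auditory_spectrum = rng.random((200, 128))

compressed = compress(auditory_spectrum)
mean, basis = fit_pca(compressed, n_components=13)
features = project(compressed, mean, basis)
```

In a real front end the random matrix would be replaced by the output of the auditory model (cochlear filtering, inner hair cell, and lateral inhibitory network stages), and the PCA basis would be estimated on training data only and then reused for test utterances.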

Keywords

channel bank filters; feature extraction; principal component analysis; speech recognition; Aurora 2.0 database; MFCC; MFCC feature extraction; Mel-frequency cepstral coefficients; PCA; RASTA; auditory processing inspired features; bioinspired features; central auditory system; cochlear channel outputs; cochlear filtering; digit recognition task; filter bank analysis stage; human auditory system; inner hair cell; lateral inhibitory network stages; noise conditions; relative spectral features; robust automatic speech recognition; Auditory system; Feature extraction; Mel frequency cepstral coefficient; Noise; Robustness; Speech; Speech recognition

fLanguage

English

Publisher

IEEE

Conference_Title

Signal Processing Conference, 2007 15th European

Conference_Location

Poznan

Print_ISBN

978-839-2134-04-6

Type

conf

Filename

7099235