DocumentCode
706298
Title
Early auditory processing inspired features for robust automatic speech recognition
Author
Kalinli, Ozlem ; Narayanan, Shrikanth
Author_Institution
Dept. of Electr. Eng.-Syst., Univ. of Southern California, Los Angeles, CA, USA
fYear
2007
fDate
3-7 Sept. 2007
Firstpage
2385
Lastpage
2389
Abstract
In this paper, we derive bio-inspired features for automatic speech recognition based on the early processing stages of the human auditory system. The utility and robustness of the derived features are validated in a speech recognition task under a variety of noise conditions. First, we develop an auditory-based feature by replacing the filterbank analysis stage of Mel-frequency cepstral coefficient (MFCC) feature extraction with an auditory model that consists of cochlear filtering, inner hair cell, and lateral inhibitory network stages. Then, we propose a new feature set that retains only the cochlear channel outputs that are more likely to fire the neurons in the central auditory system. This feature set is extracted by principal component analysis (PCA) of the nonlinearly compressed early auditory spectrum. When evaluated in a connected digit recognition task using the Aurora 2.0 database, the proposed feature set improves the average word error rate by 40% and 18% relative to the MFCC and RelAtive SpecTrAl (RASTA) features, respectively.
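The last extraction step named in the abstract (PCA of a nonlinearly compressed auditory spectrum) can be illustrated with a minimal sketch. This is not the authors' implementation. The log compression, the 128-channel input, the choice of 13 components, and the function name pca_features are all assumptions made for illustration, since the abstract does not specify them. The auditory model itself (cochlear filtering, inner hair cell, lateral inhibition) is not reproduced here, and a random matrix stands in for its output.

```python
import numpy as np

def pca_features(spectrum, n_components=13, eps=1e-8):
    """Project a compressed spectrum onto its leading principal components.

    spectrum: array of shape (frames, channels) with nonnegative values,
              standing in for an early auditory spectrum (assumption).
    Returns (features, basis, mean).
    """
    # Nonlinear compression; log is an assumed choice, not taken from the paper.
    compressed = np.log(spectrum + eps)

    # Center each channel before estimating the principal directions.
    mean = compressed.mean(axis=0)
    centered = compressed - mean

    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components].T  # shape (channels, n_components)

    # Low-dimensional, decorrelated features, one row per frame.
    features = centered @ basis
    return features, basis, mean

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_spectrum = rng.random((200, 128))  # placeholder data, not real speech
    feats, basis, mean = pca_features(fake_spectrum)
    print(feats.shape)
```

In practice the basis and mean would be estimated once on training data and then applied to unseen utterances; the sketch fits and projects the same matrix only to stay short.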
Keywords
channel bank filters; feature extraction; principal component analysis; speech recognition; Aurora 2.0 database; MFCC; MFCC feature extraction; Mel-frequency cepstral coefficients; PCA; RASTA; auditory processing inspired features; bioinspired features; central auditory system; cochlear channel outputs; cochlear filtering; digit recognition task; filter bank analysis stage; human auditory system; inner hair cell; lateral inhibitory network stages; noise conditions; relative spectral features; robust automatic speech recognition; Auditory system; Feature extraction; Mel frequency cepstral coefficient; Noise; Robustness; Speech; Speech recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing Conference, 2007 15th European
Conference_Location
Poznan
Print_ISBN
978-839-2134-04-6
Type
conf
Filename
7099235