• DocumentCode
    706298
  • Title
    Early auditory processing inspired features for robust automatic speech recognition
  • Author
    Kalinli, Ozlem; Narayanan, Shrikanth
  • Author_Institution
    Dept. of Electr. Eng.-Syst., Univ. of Southern California, Los Angeles, CA, USA
  • fYear
    2007
  • fDate
    3-7 Sept. 2007
  • Firstpage
    2385
  • Lastpage
    2389
  • Abstract
    In this paper, we derive bio-inspired features for automatic speech recognition based on the early processing stages of the human auditory system. The utility and robustness of the derived features are validated in a speech recognition task under a variety of noise conditions. First, we develop an auditory-based feature by replacing the filterbank analysis stage of Mel-frequency cepstral coefficient (MFCC) feature extraction with an auditory model that consists of cochlear filtering, inner hair cell, and lateral inhibitory network stages. Then, we propose a new feature set that retains only the cochlear channel outputs that are more likely to fire the neurons in the central auditory system. This feature set is extracted by principal component analysis (PCA) of the nonlinearly compressed early auditory spectrum. When evaluated in a connected digit recognition task using the Aurora 2.0 database, the proposed feature set improves the average word error rate by 40% relative to MFCC features and by 18% relative to RelAtive SpecTrAl (RASTA) features.
  • Keywords
    channel bank filters; feature extraction; principal component analysis; speech recognition; Aurora 2.0 database; MFCC; MFCC feature extraction; Mel-frequency cepstral coefficients; PCA; RASTA; auditory processing inspired features; bioinspired features; central auditory system; cochlear channel outputs; cochlear filtering; digit recognition task; filter bank analysis stage; human auditory system; inner hair cell; lateral inhibitory network stages; noise conditions; relative spectral features; robust automatic speech recognition; Auditory system; Feature extraction; Mel frequency cepstral coefficient; Noise; Robustness; Speech; Speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing Conference, 2007 15th European
  • Conference_Location
    Poznan
  • Print_ISBN
    978-839-2134-04-6
  • Type
    conf
  • Filename
    7099235
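
The abstract describes extracting the proposed feature set by PCA of a nonlinearly compressed early auditory spectrum. The sketch below illustrates only that general idea and is not the authors' implementation. Several things are assumptions: the logarithm stands in for the unspecified nonlinearity, the toy spectrum values are invented, all function names are hypothetical, and power iteration (which recovers only the first principal component) replaces a full eigendecomposition to keep the example dependency-free.

```python
import math


def compress(spectrum, eps=1e-8):
    """Log-compress each channel value (assumed nonlinearity, not from the paper)."""
    return [[math.log(v + eps) for v in frame] for frame in spectrum]


def center(frames):
    """Subtract the per-channel mean from every frame."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    return [[f[j] - mean[j] for j in range(dim)] for f in frames]


def covariance(centered):
    """Sample covariance matrix (channels x channels) of centered frames."""
    n, dim = len(centered), len(centered[0])
    return [
        [sum(f[i] * f[j] for f in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]


def top_component(cov, iters=200):
    """First principal direction by power iteration.

    Starts from a uniform vector, so it assumes the leading eigenvector
    is not orthogonal to that start.
    """
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(centered, direction):
    """Project each centered frame onto the given direction."""
    return [sum(a * b for a, b in zip(f, direction)) for f in centered]


# Invented toy "auditory spectrum": 4 frames x 3 cochlear channels.
toy_spectrum = [
    [1.0, 2.0, 2.0],
    [4.0, 2.1, 2.0],
    [9.0, 1.9, 2.0],
    [16.0, 2.0, 2.0],
]

centered = center(compress(toy_spectrum))
pc1 = top_component(covariance(centered))
features = project(centered, pc1)
print(pc1)
print(features)
```

In this toy input nearly all of the variance sits in the first channel, so the first principal direction points almost entirely along it. A real front end would keep several components and would use the paper's own auditory model and compression in place of these stand-ins.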