• DocumentCode
    763398
  • Title
    Automatic speech recognition with an adaptation model motivated by auditory processing
  • Author
    Holmberg, Marcus; Gelbart, David; Hemmert, Werner
  • Author_Institution
    Infineon Technol. AG, Munich, Germany
  • Volume
    14
  • Issue
    1
  • fYear
    2006
  • Firstpage
    43
  • Lastpage
    49
  • Abstract
    The mel-frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) feature extraction typically used for automatic speech recognition (ASR) employs several principles that have known counterparts in the cochlea and auditory nerve: frequency decomposition, mel- or bark-warping of the frequency axis, and compression of amplitudes. It seems natural to ask whether one can profitably employ a counterpart of the next physiological processing step, synaptic adaptation. We therefore incorporated a simplified model of short-term adaptation into MFCC feature extraction. We evaluated the resulting ASR performance on the AURORA 2 and AURORA 3 tasks in comparison to ordinary MFCCs, MFCCs processed by RASTA, and MFCCs processed by cepstral mean subtraction (CMS), and both in comparison to and in combination with Wiener filtering. The results suggest that our approach offers a simple, causal robustness strategy that is competitive with RASTA, CMS, and Wiener filtering and performs well in combination with Wiener filtering. Compared to the structurally related RASTA, our adaptation model provides superior performance on AURORA 2 and, if Wiener filtering is used prior to both approaches, on AURORA 3 as well.
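The abstract contrasts a causal adaptation stage with cepstral mean subtraction (CMS), which needs the whole utterance before it can normalize anything. The sketch below only illustrates that contrast, assuming NumPy and a (frames x coefficients) feature matrix. `cms` is ordinary per-utterance mean subtraction. `causal_adaptation` is a hypothetical first-order stand-in (an exponentially weighted running mean with an assumed forgetting factor `alpha`). It is NOT the adaptation model described in the paper, whose details are not given in this record.

```python
import numpy as np


def cms(features: np.ndarray) -> np.ndarray:
    """Cepstral mean subtraction: remove the per-utterance mean of each
    coefficient. Non-causal, because the mean uses every frame."""
    return features - features.mean(axis=0, keepdims=True)


def causal_adaptation(features: np.ndarray, alpha: float = 0.98) -> np.ndarray:
    """Hypothetical causal stand-in (NOT the paper's model): subtract an
    exponentially weighted running mean built only from past and current
    frames. `alpha` is an assumed forgetting factor in [0, 1)."""
    out = np.empty_like(features, dtype=float)
    # Initialise the adaptation state with the first frame.
    state = features[0].astype(float)
    for t, frame in enumerate(features):
        # Slowly track the recent average of each coefficient.
        state = alpha * state + (1.0 - alpha) * frame
        # Output only the deviation from that running average.
        out[t] = frame - state
    return out
```

Both functions remove a constant per-coefficient offset (for example, a fixed channel response in the cepstral domain). Because the running mean in `causal_adaptation` depends only on frames up to `t`, each output frame can be emitted as soon as it arrives, whereas `cms` must wait for the end of the utterance.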
  • Keywords
    Wiener filters; cepstral analysis; feature extraction; speech recognition; Wiener filtering; adaptation model; amplitude compression; auditory processing; automatic speech recognition; bark-warping; cepstral mean subtraction; frequency decomposition; mel-frequency cepstral coefficient; mel-warping; perceptual linear prediction feature extraction; physiological processing; synaptic adaptation; Adaptation model; Automatic speech recognition; Cepstral analysis; Collision mitigation; Feature extraction; Humans; Mel frequency cepstral coefficient; Psychoacoustic models; Speech recognition; Wiener filter; Neural adaptation; noise robustness
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TSA.2005.860349
  • Filename
    1561262