• DocumentCode
    626520
  • Title
    Auditory features based on Gammatone filters for robust speech recognition
  • Author
    Jun Qi ; Dong Wang ; Yi Jiang ; Runsheng Liu
  • Author_Institution
    Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
  • fYear
    2013
  • fDate
    19-23 May 2013
  • Firstpage
    305
  • Lastpage
    308
  • Abstract
    A major challenge for automatic speech recognition (ASR) is the significant performance degradation in noisy environments. Recent research has shown that auditory features based on Gammatone filters are promising for improving the robustness of ASR systems against noise, though the research is far from extensive and the generalizability of the new features is unknown. This paper presents our implementation of the Gammatone filter-based feature and experimental results on Mandarin speech data. With careful design, we obtained significant performance gains with the new feature in various noise conditions when compared with the widely used MFCC and PLP features. A particular novelty of our implementation is that the filter design is purely in the time domain: the channel signals are obtained by applying a set of Gammatone filters directly to the speech signals in the time domain. This differs from the commonly adopted frequency-domain design, which first converts signals to spectra and then applies the filter banks to them. The time-domain implementation on the one hand avoids the approximation introduced by short-time spectral analysis and hence is more precise; on the other hand, it avoids the complex spectral computation and hence simplifies hardware realization.
  • Keywords
    audio signal processing; digital filters; feature extraction; spectral analysis; speech recognition; time-domain analysis; ASR; Gammatone filters; Mandarin speech data; automatic speech recognition; complex spectral computation; hardware realization; noisy environments; performance reduction; short-time spectral analysis; speech signals; time domain implementation; Frequency-domain analysis; Mel frequency cepstral coefficient; Noise; Robustness; Speech; Speech recognition; Time-domain analysis; Gammatone filters; feature extraction; robust speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Circuits and Systems (ISCAS), 2013 IEEE International Symposium on
  • Conference_Location
    Beijing
  • ISSN
    0271-4302
  • Print_ISBN
    978-1-4673-5760-9
  • Type
    conf
  • DOI
    10.1109/ISCAS.2013.6571843
  • Filename
    6571843
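
The time-domain idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no filter parameters, so the fourth-order filter, the Glasberg and Moore ERB formula, the 1.019 bandwidth factor, the 25 ms impulse-response length, and all function names below are assumptions chosen for illustration. Each channel is obtained by convolving the waveform directly with a sampled gammatone impulse response, with no spectral transform involved.

```python
import math

def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula, assumed here)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, fs_hz, duration_s=0.025, order=4):
    """Sampled gammatone impulse response t^(n-1) exp(-2*pi*b*t) cos(2*pi*fc*t),
    scaled to unit peak magnitude. Order and duration are illustrative defaults."""
    b = 1.019 * erb(fc_hz)
    n_samples = round(duration_s * fs_hz)
    ir = []
    for k in range(n_samples):
        t = k / fs_hz
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]

def fir_filter(signal, ir):
    """Direct time-domain convolution, truncated to the input length."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(min(i + 1, len(ir))):
            acc += ir[j] * signal[i - j]
        out.append(acc)
    return out

def channel_energies(signal, fs_hz, centre_freqs_hz):
    """Mean output power of each gammatone channel for one block of samples."""
    energies = []
    for fc in centre_freqs_hz:
        y = fir_filter(signal, gammatone_ir(fc, fs_hz))
        energies.append(sum(v * v for v in y) / len(signal))
    return energies

if __name__ == "__main__":
    fs = 8000
    # Hypothetical test input: a 1 kHz tone, 50 ms long.
    tone = [math.sin(2.0 * math.pi * 1000.0 * k / fs) for k in range(400)]
    print(channel_energies(tone, fs, [500.0, 1000.0, 2000.0]))
```

For the 1 kHz tone, the channel centred at 1000 Hz should carry by far the most energy. A practical front end would typically use many more channels, recursive (IIR) filters rather than a truncated FIR response, and per-frame energies followed by compression and decorrelation; those steps are outside what this record describes.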