Auditory features based on Gammatone filters for robust speech recognition

Author

Jun Qi ; Dong Wang ; Yi Jiang ; Runsheng Liu

Author_Institution

Dept. of Electron. Eng., Tsinghua Univ., Beijing, China

fYear

2013

fDate

19-23 May 2013

Firstpage

305

Lastpage

308

Abstract

A major challenge for automatic speech recognition (ASR) is the significant performance reduction in noisy environments. Recent research has shown that auditory features based on Gammatone filters are a promising way to improve the robustness of ASR systems against noise, though the research is far from extensive and the generalizability of the new features is unknown. This paper presents our implementation of the Gammatone filter-based feature and experimental results on Mandarin speech data. With careful design, we obtained significant performance gains with the new feature in various noise conditions when compared with the widely used MFCC and PLP features. A particular novelty of our implementation is that the filter design is purely in the time domain: the channel signals are obtained by applying a set of Gammatone filters directly to the speech signals in the time domain. This differs from the commonly adopted frequency-domain design, which first converts signals to spectra and then applies the filter banks to them. The time-domain implementation avoids the approximation introduced by short-time spectral analysis and is hence more precise; it also avoids the complex spectral computation and hence simplifies hardware realization.
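
The time-domain filtering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the textbook fourth-order gammatone impulse response, the Glasberg and Moore ERB bandwidth approximation, and a truncated FIR convolution. The function names, the 50 ms impulse-response length, and the unit-energy normalization are illustrative choices, not details taken from the paper.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, order=4, duration=0.05, b_scale=1.019):
    """Sampled gammatone impulse response: t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    t = np.arange(int(duration * fs)) / fs
    b = b_scale * erb(fc)  # bandwidth parameter in Hz
    g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc * t)
    # Unit-energy normalization is an assumption made for this sketch.
    return g / np.sqrt(np.sum(g ** 2))

def gammatone_filterbank(signal, fs, center_freqs):
    """Filter the waveform directly in the time domain, one output row per channel.

    No short-time spectrum is computed: each channel is a plain convolution
    of the input samples with that channel's impulse response.
    """
    signal = np.asarray(signal, dtype=float)
    return np.stack([
        np.convolve(signal, gammatone_ir(fc, fs))[: len(signal)]
        for fc in center_freqs
    ])
```

As a usage example, passing a 1 kHz sine sampled at 16 kHz through channels centered at 250 Hz, 1000 Hz, and 4000 Hz should leave most of the output energy in the 1000 Hz channel. A feature extractor would then frame each channel signal and compress the per-frame energies; those later stages are not specified in this record and are left out here.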

Keywords

audio signal processing; digital filters; feature extraction; spectral analysis; speech recognition; time-domain analysis; ASR; Gammatone filters; Mandarin speech data; automatic speech recognition; complex spectral computation; hardware realization; noisy environments; performance reduction; short-time spectral analysis; speech signals; time domain implementation; Frequency-domain analysis; Mel frequency cepstral coefficient; Noise; Robustness; Speech; robust speech recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Circuits and Systems (ISCAS), 2013 IEEE International Symposium on

Conference_Location

Beijing

ISSN

0271-4302

Print_ISBN

978-1-4673-5760-9

Type

conf

DOI

10.1109/ISCAS.2013.6571843

Filename

6571843