Learning-Based Auditory Encoding for Robust Speech Recognition

Author

Chiu, Yu-Hsiang Bosco ; Raj, Bhiksha ; Stern, Richard M.

Author_Institution

Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA

Volume

20

Issue

3

fYear

2012

fDate

3/1/2012 12:00:00 AM

Firstpage

900

Lastpage

914

Abstract

This paper describes an approach to the optimization of the nonlinear component of a physiologically motivated feature extraction system for automatic speech recognition. Most computational models of the peripheral auditory system include a sigmoidal nonlinear function that relates the log of signal intensity to output level, which we represent by a set of frequency-dependent logistic functions. The parameters of these rate-level functions are estimated to maximize the a posteriori probability of the correct class in training data. The performance of this approach was verified in a series of experiments conducted with the CMU Sphinx-III speech recognition system on the DARPA Resource Management and Wall Street Journal databases and on the AURORA 2 database. In general, it was shown that feature extraction that incorporates the learned rate-nonlinearity, combined with a complementary loudness compensation function, results in better recognition accuracy in the presence of background noise than traditional MFCC feature extraction without the optimized nonlinearity when the system is trained on clean speech and tested in noise. We also describe the use of a lattice structure that constrains the training process, enabling training with much more complicated acoustic models.
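
As an illustration of the frequency-dependent logistic rate-level functions mentioned in the abstract, the following is a minimal sketch rather than the authors' implementation. The abstract does not give the parameterization, so the three-parameter form (saturation level, slope, offset), the function names, and the numeric values are all assumptions chosen for illustration.

```python
import math

def rate_level(log_intensity, saturation, slope, offset):
    """Logistic (sigmoidal) map from log intensity to a bounded output level.

    The (saturation, slope, offset) parameterization is an assumption;
    the abstract only states that the functions are logistic.
    """
    return saturation / (1.0 + math.exp(-(slope * log_intensity + offset)))

def apply_nonlinearity(log_energies, channel_params):
    """Apply one logistic function per frequency channel."""
    return [rate_level(x, *p) for x, p in zip(log_energies, channel_params)]

# Hypothetical per-channel parameters: (saturation, slope, offset).
channel_params = [(1.0, 0.5, -2.0), (1.0, 0.8, -3.0), (1.0, 1.2, -4.0)]

# The same log energy in every channel yields different outputs,
# because each channel has its own rate-level curve.
log_energies = [4.0, 4.0, 4.0]
outputs = apply_nonlinearity(log_energies, channel_params)
```

In the approach the abstract describes, parameters of this kind would be learned by maximizing the a posteriori probability of the correct class on training data; here they are simply fixed by hand.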

Keywords

encoding; feature extraction; learning (artificial intelligence); probability; speech coding; speech recognition; AURORA 2 database; CMU Sphinx-III speech recognition system; DARPA resource management; MFCC feature extraction; a posteriori probability; automatic speech recognition; learning-based auditory encoding; peripheral auditory system; physiologically motivated feature extraction system; robust speech recognition; Computational modeling; Feature extraction; Hidden Markov models; Noise; Speech; Speech recognition; Training; Auditory model; discriminative training; feature extraction; robust automatic speech recognition;

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2011.2168209

Filename

6020747