• DocumentCode
    1322001
  • Title
    Learning-Based Auditory Encoding for Robust Speech Recognition
  • Author
    Chiu, Yu-Hsiang Bosco ; Raj, Bhiksha ; Stern, Richard M.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
  • Volume
    20
  • Issue
    3
  • fYear
    2012
  • fDate
    3/1/2012 12:00:00 AM
  • Firstpage
    900
  • Lastpage
    914
  • Abstract
    This paper describes an approach to the optimization of the nonlinear component of a physiologically motivated feature extraction system for automatic speech recognition. Most computational models of the peripheral auditory system include a sigmoidal nonlinear function that relates the log of signal intensity to output level, which we represent by a set of frequency-dependent logistic functions. The parameters of these rate-level functions are estimated to maximize the a posteriori probability of the correct class in training data. The performance of this approach was verified by the results of a series of experiments conducted with the CMU Sphinx-III speech recognition system on the DARPA Resource Management and Wall Street Journal databases and on the AURORA 2 database. In general, it was shown that feature extraction that incorporates the learned rate-nonlinearity, combined with a complementary loudness compensation function, results in better recognition accuracy in the presence of background noise than traditional MFCC feature extraction without the optimized nonlinearity when the system is trained on clean speech and tested in noise. We also describe the use of a lattice structure that constrains the training process, enabling training with much more complicated acoustic models.
  • Keywords
    encoding; feature extraction; learning (artificial intelligence); probability; speech coding; speech recognition; AURORA 2 database; CMU Sphinx-III speech recognition system; DARPA resource management; MFCC feature extraction; a posteriori probability; automatic speech recognition; learning-based auditory encoding; peripheral auditory system; physiologically motivated feature extraction system; robust speech recognition; Computational modeling; Feature extraction; Hidden Markov models; Noise; Speech; Speech recognition; Training; Auditory model; discriminative training; feature extraction; robust automatic speech recognition;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2011.2168209
  • Filename
    6020747