• DocumentCode
    1409306
  • Title

    Auditory Model-Based Design and Optimization of Feature Vectors for Automatic Speech Recognition

  • Author

    Chatterjee, Saikat ; Kleijn, W. Bastiaan

  • Author_Institution
    Communication Theory Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden
  • Volume
    19
  • Issue
    6
  • fYear
    2011
  • Firstpage
    1813
  • Lastpage
    1825
  • Abstract
    Using spectral and spectro-temporal auditory models along with perturbation-based analysis, we develop a new framework to optimize a feature vector such that it emulates the behavior of the human auditory system. The optimization is carried out offline, based on the conjecture that the local geometries of the feature vector domain and the perceptual auditory domain should be similar. Using this principle along with a static spectral auditory model, we modify and optimize the static spectral mel frequency cepstral coefficients (MFCCs) without any feedback from the speech recognition system. We then extend the work by incorporating spectro-temporal auditory properties into the design of a new dynamic spectro-temporal feature vector. Using a spectro-temporal auditory model, we design and optimize the dynamic feature vector to incorporate the behavior of the human auditory response across time and frequency. We show that a significant improvement in automatic speech recognition (ASR) performance is obtained in clean as well as noisy environmental conditions.
  • Keywords
    feature extraction; speech recognition; ASR performance; MFCC; automatic speech recognition; dynamic spectro-temporal feature vector; feature vector domain; feature vector optimization; human auditory system; perturbation-based analysis; spectro-temporal auditory models; static spectral auditory model; Acoustic distortion; Computational modeling; Distortion measurement; Optimization; Sensitivity; Speech; Auditory models; mel frequency cepstral coefficient (MFCC)
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2010.2101597
  • Filename
    5672771