• DocumentCode
    3131550
  • Title
    Robust speech recognition based on the second-order difference cochlear model
  • Author
    Wan, Wunggen ; Au, Oscar C.
  • Author_Institution
    Multimedia Innovation Centre, Hong Kong Polytech., Kowloon, China
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    543
  • Lastpage
    546
  • Abstract
    MFCC (Mel-Frequency Cepstral Coefficients) is a traditional speech feature widely used in speech recognition. The error rate of recognizers that combine MFCC with a CDHMM (continuous-density hidden Markov model) is known to be very low for clean speech, but it increases greatly in noisy environments, especially with white noise. We propose a new speech feature, the auditory spectrum based feature (ASBF), which is based on the second-order difference cochlear model of the human auditory system. ASBF can track the speech formants, and its selection scheme is based on both the second-order difference cochlear model and the primary auditory nerve processing model of the human auditory system. In our experiments, the performance of MFCC and ASBF is compared in both clean and noisy environments using a left-to-right CDHMM with 6 states and 5 Gaussian mixtures. The experimental results show that ASBF is much more robust to noise than MFCC. When only 5 frequency components are used in ASBF, the error rate is approximately 38% lower than that of traditional 39-parameter MFCC at S/N = 10 dB with white noise.
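The headline result is reported at S/N = 10 dB with white noise. The record does not state how the noisy test data were produced; as an assumption, a common convention is to scale white Gaussian noise so that 10·log10(P_signal / P_noise) equals the target value, with power averaged over the whole utterance. The sketch below illustrates that convention with the Python standard library only; the helper names `add_white_noise` and `measured_snr_db` are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import List, Optional


def add_white_noise(signal: List[float], snr_db: float,
                    seed: Optional[int] = None) -> List[float]:
    """Hypothetical helper: add white Gaussian noise at a target global SNR.

    Assumed convention: SNR = 10*log10(P_signal / P_noise), with both powers
    averaged over the whole signal (not per frame).
    """
    rng = random.Random(seed)
    p_signal = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_noise_raw = sum(n * n for n in noise) / len(noise)
    # Noise power required to hit the target: P_signal / 10^(SNR/10).
    target_p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    # Rescale the realized noise so its power matches the target exactly.
    scale = math.sqrt(target_p_noise / p_noise_raw)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean: List[float], noisy: List[float]) -> float:
    """Hypothetical helper: global SNR of `noisy` relative to `clean`, in dB."""
    p_s = sum(x * x for x in clean)
    p_n = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(p_s / p_n)


if __name__ == "__main__":
    # One second of a 1 kHz tone sampled at 8 kHz, standing in for speech.
    clean = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(8000)]
    noisy = add_white_noise(clean, snr_db=10.0, seed=0)
    print(f"{measured_snr_db(clean, noisy):.2f}")  # prints 10.00
```

Because the noise is rescaled using its realized power rather than its nominal variance, the measured SNR matches the target up to floating-point rounding. Studies that use frame-level or speech-only (voice-activity-gated) SNR would give a different effective noise level.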
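The abstract compares 5-component ASBF against "traditional MFCC with 39 parameters" but does not break that count down. Assuming the common convention of 13 static coefficients (for example, 12 cepstra plus log energy) extended with 13 first-order (delta) and 13 second-order (delta-delta) regression coefficients, the 39-dimensional vector can be assembled as sketched here. The regression formula d_t = Σ_{n=1..N} n·(c_{t+n} − c_{t−n}) / (2·Σ_{n=1..N} n²) is the standard one, but the window N = 2, the edge-frame replication, and the function names `deltas` and `stack_39` are illustrative assumptions, not the authors' configuration.

```python
from typing import List

Frame = List[float]


def deltas(frames: List[Frame], N: int = 2) -> List[Frame]:
    """Regression deltas over a +/-N frame window (edge frames replicated).

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    """
    T = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out: List[Frame] = []
    for t in range(T):
        d = [0.0] * dim
        for n in range(1, N + 1):
            fwd = frames[min(t + n, T - 1)]  # replicate the last frame
            bwd = frames[max(t - n, 0)]      # replicate the first frame
            for k in range(dim):
                d[k] += n * (fwd[k] - bwd[k])
        out.append([v / denom for v in d])
    return out


def stack_39(static: List[Frame]) -> List[Frame]:
    """Hypothetical layout: 13 static + 13 delta + 13 delta-delta = 39 values."""
    if any(len(f) != 13 for f in static):
        raise ValueError("expected 13 static coefficients per frame")
    d = deltas(static)
    dd = deltas(d)
    return [s + a + b for s, a, b in zip(static, d, dd)]


if __name__ == "__main__":
    # Toy input: every coefficient rises linearly over 10 frames.
    ramp = [[float(t)] * 13 for t in range(10)]
    feats = stack_39(ramp)
    print(len(feats[0]))  # prints 39
```

For a linear ramp, interior deltas equal the slope (1.0) and interior delta-deltas are 0.0, which is a quick sanity check on the regression weights.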
  • Keywords
    cepstral analysis; hearing; speech recognition; white noise; Gaussian mixtures; Mel-Frequency Cepstral Coefficients; auditory spectrum based feature; error rate; experimental result; human auditory system; noisy environment; primary auditory nerve processing model; robust speech recognition; second-order difference cochlear model; white noise; Auditory system; Cepstral analysis; Error analysis; Humans; Mel frequency cepstral coefficient; Robustness; Speech processing; Speech recognition; White noise; Working environment noise;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Multimedia, Video and Speech Processing, 2001. Proceedings of 2001 International Symposium on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    962-85766-2-3
  • Type
    conf
  • DOI
    10.1109/ISIMP.2001.925453
  • Filename
    925453