• DocumentCode
    1425063
  • Title
    HMM-based speech recognition using state-dependent, discriminatively derived transforms on mel-warped DFT features
  • Author
    Chengalvarayan, Rathinavelu; Deng, Li
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada
  • Volume
    5
  • Issue
    3
  • fYear
    1997
  • fDate
    5/1/1997
  • Firstpage
    243
  • Lastpage
    256
  • Abstract
    In the study reported in this paper, we investigate interactions of front-end feature extraction and back-end classification techniques in hidden Markov model-based (HMM-based) speech recognition. The proposed model focuses on dimensionality reduction of the mel-warped discrete Fourier transform (DFT) feature space subject to maximal preservation of speech classification information, and aims at finding an optimal linear transformation on the mel-warped DFT according to the minimum classification error (MCE) criterion. This linear transformation, along with the HMM parameters, is automatically trained using the gradient descent method to minimize a measure of overall empirical error counts. A further generalization of the model allows integration of the discriminatively derived state-dependent transformation with the construction of dynamic feature parameters. Experimental results show that state-dependent transformation on mel-warped DFT features is superior in performance to the mel-frequency cepstral coefficients (MFCCs). An error rate reduction of 15% is obtained on a standard 39-class TIMIT phone classification task, in comparison with the conventional MCE-trained HMM using MFCCs that have not been subject to optimization during training.
  • Keywords
    discrete Fourier transforms; error analysis; feature extraction; hidden Markov models; minimisation; speech recognition; HMM-based speech recognition; back-end classification techniques; dimensionality reduction; discrete Fourier transform; empirical error counts; error rate reduction; front-end feature extraction; gradient descent method; hidden Markov model-based speech recognition; linear transformation; mel-frequency cepstral coefficients; mel-warped DFT features; minimum classification error; optimal linear transformation; optimization; speech classification information; standard 39-class TIMIT phone classification task; state-dependent discriminatively derived transforms; training; Cepstral analysis; Data mining; Discrete Fourier transforms; Discrete cosine transforms; Feature extraction; Filter bank; Hidden Markov models; Psychoacoustic models; Speech processing; Speech recognition;
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/89.568731
  • Filename
    568731