• DocumentCode
    1066504
  • Title
    Higher Order Cepstral Moment Normalization for Improved Robust Speech Recognition
  • Author
    Hsu, Chang-Wen ; Lee, Lin-shan
  • Author_Institution
    Nat. Taiwan Univ., Taipei
  • Volume
    17
  • Issue
    2
  • fYear
    2009
  • Firstpage
    205
  • Lastpage
    220
  • Abstract
    Cepstral normalization has been widely used as a powerful approach to producing robust features for speech recognition. Good examples of this approach include cepstral mean subtraction (CMS) and cepstral mean and variance normalization (CMVN), in which either the first moment alone or both the first and second moments of the Mel-frequency cepstral coefficients (MFCCs) are normalized. In this paper, we propose the family of higher order cepstral moment normalization, in which the MFCC parameters are normalized with respect to a few moments of orders higher than 1 or 2. The basic idea is that higher order moments are dominated more strongly by samples with larger values, which are very likely the primary sources of the asymmetry and the abnormal flatness or tail size of the parameter distributions. Normalization with respect to these moments therefore puts more emphasis on these signal components and constrains the distributions to be more symmetric, with more reasonable flatness and tail size. The fundamental principles behind this approach are also analyzed and discussed based on the statistical properties of the distributions of the MFCC parameters. Experimental results based on the AURORA 2, AURORA 3, AURORA 4, and Resource Management (RM) testing environments show that, with the proposed approach, recognition accuracy can be significantly and consistently improved for all types of noise and all SNR conditions.
  • Keywords
    cepstral analysis; speech recognition; statistical analysis; AURORA testing environment; MFCC parameter; Mel-frequency cepstral coefficient; cepstral mean subtraction; cepstral mean-variance normalization; higher order cepstral moment normalization; resource management testing environment; robust speech recognition; statistical property; Acoustic distortion; Acoustic testing; Additive noise; Collision mitigation; Noise robustness; Probability distribution; Upper bound; $N$th-order moment; Cepstral mean and variance normalization (CMVN); cepstral normalization
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2008.2006575
  • Filename
    4749448
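
The abstract in this record describes cepstral mean subtraction and cepstral mean and variance normalization, and argues that higher order moments are dominated by samples with larger values. The following is a minimal Python sketch of those three ideas only. It is not the paper's higher order cepstral moment normalization procedure, which this record does not specify. The function names (`cms`, `cmvn`, `central_moment`, `largest_sample_share`) and the toy data are hypothetical and chosen purely for illustration.

```python
import math


def cms(frames):
    """Cepstral mean subtraction: remove the per-coefficient mean (first moment)."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


def cmvn(frames, eps=1e-12):
    """Mean and variance normalization: zero mean, unit variance per coefficient."""
    centered = cms(frames)
    n = len(centered)
    dims = len(centered[0])
    # Population standard deviation of each coefficient over the utterance.
    stds = [math.sqrt(sum(f[d] ** 2 for f in centered) / n) for d in range(dims)]
    # eps guards against division by zero for a constant coefficient (an assumption).
    return [[f[d] / max(stds[d], eps) for d in range(dims)] for f in centered]


def central_moment(values, order):
    """N-th order central moment of one coefficient's trajectory over time."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def largest_sample_share(values, order):
    """Fraction of sum(|v - mean|**order) contributed by the largest deviation."""
    mean = sum(values) / len(values)
    terms = [abs(v - mean) ** order for v in values]
    return max(terms) / sum(terms)


if __name__ == "__main__":
    # Hypothetical toy "utterance": 4 frames, 2 cepstral coefficients each.
    toy_frames = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [10.0, 0.0]]
    normalized = cmvn(toy_frames)
    first_coeff = [f[0] for f in toy_frames]
    print("CMVN output:", normalized)
    print("3rd central moment:", central_moment(first_coeff, 3))
    for order in (2, 3, 4):
        print(order, largest_sample_share(first_coeff, order))
```

On this toy trajectory the largest deviation accounts for a larger share of the fourth-order sum than of the second-order sum, which is the effect the abstract appeals to when it motivates normalizing moments beyond the first and second.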