• DocumentCode
    1522183
  • Title

    Feature Extraction Based on Pitch-Synchronous Averaging for Robust Speech Recognition

  • Author

    Morales-Cordovilla, Juan A. ; Peinado, Antonio M. ; Sánchez, Victoria ; González, José A.

  • Author_Institution
    Dept. de Teoría de la Señal, Telemática y Comunicaciones, Univ. de Granada, Granada, Spain
  • Volume
    19
  • Issue
    3
  • fYear
    2011
  • fDate
    3/1/2011
  • Firstpage
    640
  • Lastpage
    651
  • Abstract
    In this paper, we propose two estimators for the autocorrelation sequence of a periodic signal in additive noise. Both estimators are formulated using tables that contain all possible products of sample pairs in a speech signal frame. The first estimator is based on pitch-synchronous averaging. We analyze this estimator statistically and show that the signal-to-noise ratio (SNR) can be increased by up to a factor equal to the number of available periods. The second estimator is similar to the first, but it avoids the sample products that are more likely to be affected by noise. We prove that, under certain conditions, this estimator can remove the effect of additive noise in a statistical sense. Both estimators are employed to extract mel frequency cepstral coefficients (MFCCs) as features for robust speech recognition. Although these estimators were initially conceived for voiced speech frames, we extend their application to unvoiced sounds in order to obtain a coherent feature extractor. The experimental results show the superiority of the proposed approach over other MFCC-based front-ends such as higher-lag autocorrelation spectrum estimation (HASE), which also employs the idea of avoiding the autocorrelation coefficients that are more likely to be affected by noise.
  • Keywords
    cepstral analysis; correlation methods; feature extraction; speech recognition; autocorrelation coefficient; autocorrelation sequence; higher lag autocorrelation spectrum estimation; mel frequency cepstral coefficient; periodic signal; pitch synchronous averaging; robust speech recognition; signal-to-noise ratio; speech signal frame; voiced speech frames; Acoustic noise; Additive noise; Autocorrelation; Frequency estimation; Noise robustness; Spectral analysis; autocorrelation estimation; autocorrelation-based mel frequency cepstral coefficient (AMFCC); pitch-synchronous analysis
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2010.2053846
  • Filename
    5492175
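
The pitch-synchronous averaging idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' table-based estimator; it only assumes a known integer pitch period, a frame holding several complete periods, and white additive noise, and it shows why averaging periods should reduce the noise power by roughly the number of periods. The function names `pitch_synchronous_average` and `circular_autocorrelation`, and all signal parameters, are hypothetical choices made for illustration.

```python
import numpy as np


def pitch_synchronous_average(frame, period):
    """Average the complete pitch periods in `frame` into a single period.

    Assumes `period` is a known integer number of samples (an assumption
    of this sketch; a real front-end would need a pitch estimator).
    """
    frame = np.asarray(frame, dtype=float)
    n_periods = len(frame) // period
    if n_periods < 1:
        raise ValueError("frame is shorter than one pitch period")
    # Stack the periods as rows and average across them.
    periods = frame[: n_periods * period].reshape(n_periods, period)
    return periods.mean(axis=0), n_periods


def circular_autocorrelation(one_period):
    """Circular autocorrelation of one period, normalized by its length."""
    one_period = np.asarray(one_period, dtype=float)
    n = len(one_period)
    return np.array(
        [np.dot(one_period, np.roll(one_period, -k)) / n for k in range(n)]
    )


# Synthetic example: a periodic signal in additive white noise.
rng = np.random.default_rng(0)
period, n_periods = 80, 8
one_clean_period = np.sin(2 * np.pi * np.arange(period) / period)
clean = np.tile(one_clean_period, n_periods)
noisy = clean + rng.normal(0.0, 1.0, clean.size)

averaged, count = pitch_synchronous_average(noisy, period)

# Noise power before and after averaging (measurable here only because
# the clean signal is known in this synthetic setup).
noise_var_before = np.var(noisy - clean)
noise_var_after = np.var(averaged - one_clean_period)

# Autocorrelation estimate computed from the averaged period.
r_hat = circular_autocorrelation(averaged)
```

In this synthetic setup, the ratio `noise_var_before / noise_var_after` should come out near `count`, which is consistent with the abstract's statement that the SNR gain is bounded by the number of available periods.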