Feature Extraction Based on Pitch-Synchronous Averaging for Robust Speech Recognition

Author

Morales-Cordovilla, Juan A.; Peinado, Antonio M.; Sánchez, Victoria; González, José A.

Author_Institution

Dept. of Teoría de la Señal, Telemática y Comunicaciones, Univ. de Granada, Granada, Spain

Volume

19

Issue

3

Year

2011

Date

3/1/2011

Firstpage

640

Lastpage

651

Abstract

In this paper, we propose two estimators for the autocorrelation sequence of a periodic signal in additive noise. Both estimators are formulated using tables that contain all possible products of sample pairs in a speech signal frame. The first estimator is based on pitch-synchronous averaging. A statistical analysis of this estimator shows that the signal-to-noise ratio (SNR) can be increased by up to a factor equal to the number of available periods. The second estimator is similar to the first, but it avoids the sample products most likely to be affected by noise. We prove that, under certain conditions, this estimator can remove the effect of additive noise in a statistical sense. Both estimators are employed to extract mel frequency cepstral coefficients (MFCCs) as features for robust speech recognition. Although these estimators were initially conceived for voiced speech frames, we extend their application to unvoiced sounds in order to obtain a coherent feature extractor. The experimental results show that the proposed approach outperforms other MFCC-based front-ends such as higher-lag autocorrelation spectrum estimation (HASE), which also avoids the autocorrelation coefficients most likely to be affected by noise.
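The pitch-synchronous averaging idea summarized in the abstract can be illustrated with a short Python sketch. It is not the authors' estimator. It assumes a known integer pitch period T in samples, a frame holding P whole periods of an exactly periodic signal, and zero-mean white noise independent of the signal; the helper names (averaged_period, psa_autocorrelation, circular_autocorrelation, power) are hypothetical. Averaging the P periods sample by sample leaves the periodic component unchanged and divides the white-noise power by P, which is where an SNR gain of up to P comes from. The autocorrelation of the averaged period is then written as an average over entries of the product table frame[i] * frame[j]. As an assumed reading of "avoiding the sample products most likely to be affected by noise", the option drop_same_sample skips products of a sample with itself (i == j); for white noise these are the only products whose expectation contains the noise power.

```python
import math
import random

# Sketch only (assumptions): integer pitch period T known in advance, frame holds
# P >= 1 whole periods, noise is zero-mean, white and independent of the signal.
# Function names are hypothetical, not taken from the paper.


def averaged_period(frame, period):
    """Pitch-synchronous average: sample-wise mean of the P whole periods."""
    num_periods = len(frame) // period
    if num_periods < 1:
        raise ValueError("frame is shorter than one pitch period")
    return [
        sum(frame[n + p * period] for p in range(num_periods)) / num_periods
        for n in range(period)
    ]


def psa_autocorrelation(frame, period, drop_same_sample=False):
    """Estimate r[k], k = 0..period-1, as an average over the product table.

    drop_same_sample=False: equals the circular autocorrelation of
    averaged_period(frame, period).
    drop_same_sample=True (assumed variant): skip products frame[i] * frame[i];
    with circular lags inside one period these occur only at k = 0.
    """
    T = period
    P = len(frame) // T
    if P < 1 or (drop_same_sample and P < 2):
        raise ValueError("need one whole period (two when dropping i == j products)")
    r = []
    for k in range(T):
        total, count = 0.0, 0
        for n in range(T):
            m = (n + k) % T  # lag taken circularly inside one period
            for p in range(P):
                for q in range(P):
                    i, j = n + p * T, m + q * T
                    if drop_same_sample and i == j:
                        continue  # for white noise, only these carry noise power in expectation
                    total += frame[i] * frame[j]
                    count += 1
        r.append(total / count)
    return r


def circular_autocorrelation(x):
    """Reference: r[k] = (1/T) * sum_n x[n] * x[(n + k) % T]."""
    T = len(x)
    return [sum(x[n] * x[(n + k) % T] for n in range(T)) / T for k in range(T)]


def power(x):
    """Mean power of a sequence."""
    return sum(v * v for v in x) / len(x)


# Demo: a sinusoid with period T = 50 samples, P = 16 periods, unit-variance noise.
random.seed(0)
T, P = 50, 16
clean = [math.sin(2 * math.pi * n / T) for n in range(T * P)]
noisy = [s + random.gauss(0.0, 1.0) for s in clean]
avg = averaged_period(noisy, T)
snr_frame = power(clean) / power([y - s for y, s in zip(noisy, clean)])
snr_avg = power(clean[:T]) / power([a - s for a, s in zip(avg, clean[:T])])
gain_db = 10 * math.log10(snr_avg / snr_frame)  # theory: about 10*log10(P) = 12 dB
print(f"SNR gain from averaging {P} periods: {gain_db:.1f} dB")
```

Under these assumptions only r[0] carries a noise bias in expectation (sigma**2 / P after averaging), so skipping the i == j products changes r[0] alone. Taking lags circularly within one period is a simplifying choice of this sketch.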

Keywords

cepstral analysis; correlation methods; feature extraction; speech recognition; autocorrelation coefficient; autocorrelation sequence; higher-lag autocorrelation spectrum estimation; mel frequency cepstral coefficient; periodic signal; pitch-synchronous averaging; robust speech recognition; signal-to-noise ratio; speech signal frame; voiced speech frames; acoustic noise; additive noise; autocorrelation; frequency estimation; noise robustness; spectral analysis; autocorrelation estimation; autocorrelation-based mel frequency cepstral coefficient (AMFCC); pitch-synchronous analysis

Language

English

Journal_Title

IEEE Transactions on Audio, Speech, and Language Processing

Publisher

IEEE

ISSN

1558-7916

Type

Journal article

DOI

10.1109/TASL.2010.2053846

Filename

5492175