DocumentCode :
39264
Title :
Robust Feature Extraction Using Modulation Filtering of Autoregressive Models
Author :
Ganapathy, Shrikanth ; Mallidi, S.H. ; Hermansky, Hynek
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
22
Issue :
8
fYear :
2014
fDate :
Aug. 2014
Firstpage :
1285
Lastpage :
1295
Abstract :
Speaker and language recognition in noisy and degraded channel conditions continue to be a challenging problem mainly due to the mismatch between clean training and noisy test conditions. In the presence of noise, the most reliable portions of the signal are the high energy regions which can be used for robust feature extraction. In this paper, we propose a front end processing scheme based on autoregressive (AR) models that represent the high energy regions with good accuracy followed by a modulation filtering process. The AR model of the spectrogram is derived using two separable time and frequency AR transforms. The first AR model (temporal AR model) of the sub-band Hilbert envelopes is derived using frequency domain linear prediction (FDLP). This is followed by a spectral AR model applied on the FDLP envelopes. The output 2-D AR model represents a low-pass modulation filtered spectrogram of the speech signal. The band-pass modulation filtered spectrograms can further be derived by dividing two AR models with different model orders (cut-off frequencies). The modulation filtered spectrograms are converted to cepstral coefficients and are used for a speaker recognition task in noisy and reverberant conditions. Various speaker recognition experiments are performed with clean and noisy versions of the NIST-2010 speaker recognition evaluation (SRE) database using the state-of-the-art speaker recognition system. In these experiments, the proposed front-end analysis provides substantial improvements (relative improvements of up to 25%) compared to baseline techniques. Furthermore, we also illustrate the generalizability of the proposed methods using language identification (LID) experiments on highly degraded high-frequency (HF) radio channels and speech recognition experiments on noisy data.
Keywords :
Hilbert transforms; autoregressive processes; feature extraction; frequency-domain analysis; low-pass filters; speaker recognition; 2D AR model; AR transforms; FDLP; HF radio channels; LID experiments; NIST-2010 speaker recognition evaluation database; SRE database; autoregressive model; band-pass modulation filtered spectrograms; cepstral coefficients; frequency domain linear prediction; front end processing scheme; degraded channel conditions; high-frequency radio channels; language identification experiments; language recognition; low-pass modulation filtered spectrogram; robust feature extraction; speech signal; subband Hilbert envelopes; Modulation; Noise; Noise measurement; Robustness; Speaker recognition; Spectrogram; Speech; Autoregressive modeling; feature extraction; modulation filtering; speaker and language recognition
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2014.2329190
Filename :
6826560