Noise and speaker compensation in the Log filter bank domain

Author

Joshi, Vikas ; Bilgi, Raghavendra ; Umesh, S. ; Garcia, L. ; Benitez, C.

Author_Institution

Dept. of Electr. Eng., Indian Inst. of Technol., Madras, Chennai, India

fYear

2012

fDate

25-30 March 2012

Firstpage

4709

Lastpage

4712

Abstract

In this paper, we propose a method to compensate for noise and speaker variability directly in the log filter-bank (FB) domain, so that the resulting MFCC features are robust to both noise and speaker variations. For noise compensation, we use the Vector Taylor Series (VTS) approach in the log FB domain; speaker normalization is also performed in the log FB domain, using linear vocal tract length normalization (VTLN) matrices. For VTLN, the optimal warp factor is selected in the log FB domain using a canonical GMM, which avoids the two-pass approach required with an HMM. Further, this selection can be implemented efficiently using sufficient statistics obtained from the GMM together with the FB-VTLN matrices. Warp-factor selection using the GMM can also be done in the cepstral domain by applying DCT matrices, without the usual approximations associated with conventional linear VTLN. The elegance of the proposed approach is that, given the speech data, we directly obtain MFCC features that are robust to noise and speaker variations. The proposed approach shows a significant relative improvement of 31% over the baseline on the Aurora-4 task.
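
The abstract does not state the equations behind the VTS step, so the following is only an illustrative sketch of what first-order VTS compensation in the log FB domain commonly looks like, not the authors' formulation. It assumes the textbook additive-noise mismatch model exp(y) = exp(x) + exp(n) per filter-bank channel, with no channel (convolutional) term and no phase term, and it assumes diagonal covariances for simplicity. The function names (`log_fb_mismatch`, `vts_jacobian_diag`, `vts_compensate_mean_var`) are hypothetical and chosen for this example only.

```python
import math


def log_fb_mismatch(x, n):
    """Noisy log FB energies y from clean speech x and additive noise n.

    Assumption: exp(y) = exp(x) + exp(n) in each channel, which gives
    y = x + log(1 + exp(n - x)).
    """
    return [xi + math.log1p(math.exp(ni - xi)) for xi, ni in zip(x, n)]


def vts_jacobian_diag(mu_x, mu_n):
    """Diagonal of dy/dx evaluated at the expansion point (mu_x, mu_n).

    Under the model above, dy/dx = 1 / (1 + exp(n - x)) and dy/dn = 1 - dy/dx.
    """
    return [1.0 / (1.0 + math.exp(ni - xi)) for xi, ni in zip(mu_x, mu_n)]


def vts_compensate_mean_var(mu_x, var_x, mu_n, var_n):
    """First-order VTS update of one diagonal Gaussian, channel by channel.

    Assumption: diagonal covariances; a real log FB system may need full ones.
    """
    g = vts_jacobian_diag(mu_x, mu_n)
    mu_y = log_fb_mismatch(mu_x, mu_n)
    var_y = [
        gi * gi * vx + (1.0 - gi) ** 2 * vn
        for gi, vx, vn in zip(g, var_x, var_n)
    ]
    return mu_y, var_y
```

In this sketch, a channel where the noise mean is far below the speech mean is left essentially unchanged (Jacobian close to 1), while a channel with equal speech and noise means has its mean raised by log 2 and its variance formed from equal parts of the speech and noise variances.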

Keywords

channel bank filters; hidden Markov models; noise; speaker recognition; Aurora-4 task; FB-VTLN-matrices; HMM model; MFCC features; VTLN matrices; VTS approach; canonical GMM model; linear vocal tract length; log FB domain; log filter bank domain; log filter-bank domain; noise compensation; noise-variations; speaker compensation; speaker-normalization; speaker-variability; speaker-variations; vector Taylor series; Cepstral analysis; Estimation; Hidden Markov models; Histograms; Noise; Noise measurement; Speech; Noise Compensation; Noise and Speaker compensation; Speaker Normalization; TVTLN; VTS;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location

Kyoto

ISSN

1520-6149

Print_ISBN

978-1-4673-0045-2

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2012.6288970

Filename

6288970