  • DocumentCode
    3165142
  • Title
    Noise and speaker compensation in the Log filter bank domain
  • Author
    Joshi, Vikas ; Bilgi, Raghavendra ; Umesh, S. ; Garcia, L. ; Benitez, C.
  • Author_Institution
    Dept. of Electr. Eng., Indian Inst. of Technol., Madras, Chennai, India
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4709
  • Lastpage
    4712
  • Abstract
    In this paper, we propose a method to compensate for noise and speaker variability directly in the log filter-bank (FB) domain, so that the resulting MFCC features are robust to noise and speaker variations. For noise compensation, we use the Vector Taylor Series (VTS) approach in the log FB domain, and speaker normalization is also done in the log FB domain using linear vocal tract length normalization (VTLN) matrices. For VTLN, the optimal warp factor is selected in the log FB domain using a canonical GMM, avoiding the two-pass approach needed with an HMM. Further, this can be implemented efficiently using sufficient statistics obtained from the GMM and the FB-VTLN matrices. Warp-factor selection using the GMM can also be done in the cepstral domain by applying DCT matrices, without the usual approximations associated with conventional linear VTLN. The elegance of the proposed approach is that, given the speech data, we directly obtain MFCC features that are robust to noise and speaker variations. The proposed approach shows a significant relative improvement of 31% over the baseline on the Aurora-4 task.
  • Keywords
    channel bank filters; hidden Markov models; noise; speaker recognition; Aurora-4 task; FB-VTLN-matrices; HMM model; MFCC features; VTLN matrices; VTS approach; canonical GMM model; linear vocal tract length; log FB domain; log filter bank domain; log filter-bank domain; noise compensation; noise-variations; speaker compensation; speaker-normalization; speaker-variability; speaker-variations; vector Taylor series; Cepstral analysis; Estimation; Hidden Markov models; Histograms; Noise; Noise measurement; Speech; Noise Compensation; Noise and Speaker compensation; Speaker Normalization; TVTLN; VTS;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6288970
  • Filename
    6288970