  • DocumentCode
    3422857
  • Title
    Perceptual MVDR-based cepstral coefficients (PMCCs) for speaker recognition
  • Author
    Liang, Chunyan ; Zhang, Xiang ; Yang, Lin ; Zhang, Jianping ; Yan, Yonghong
  • Author_Institution
    Think IT Speech Lab., CAS, Beijing, China
  • fYear
    2010
  • fDate
    24-28 Oct. 2010
  • Firstpage
    1386
  • Lastpage
    1389
  • Abstract
    Acoustic feature extraction from speech is a fundamental part of both automatic speech recognition and automatic speaker recognition. Mel-frequency cepstral coefficients (MFCCs) are widely used in both fields. A newer feature extraction technique, perceptual MVDR-based cepstral coefficients (PMCCs), has been demonstrated to deliver superior performance in automatic speech recognition. Unlike MFCCs, in which a mel-scaled filterbank is applied to the short-term FFT spectrum to obtain a perceptually meaningful smoothed gross spectrum, PMCCs use the Minimum Variance Distortionless Response (MVDR) all-pole model to represent the spectral envelope of the perceptual spectrum. In this study, we extract PMCCs and model them using Gaussian Mixture Models (GMMs) for speaker recognition. To compensate for speaker and channel variability, joint factor analysis (JFA) is used. The experiments are carried out on the core conditions of the NIST 2008 speaker recognition evaluation data. The experimental results indicate that systems based on PMCCs achieve performance comparable to those based on MFCCs. Moreover, fusing the two kinds of systems yields a significant performance improvement over the MFCC system alone.
  • Keywords
    Gaussian processes; acoustic signal processing; cepstral analysis; channel bank filters; feature extraction; speaker recognition; FFT spectrum; GMM; Gaussian mixture model; JFA; MFCC; PMCC; acoustic feature extraction; automatic speaker recognition; automatic speech recognition; channel variability effect; joint factor analysis; mel-frequency cepstral coefficient; mel-scaled filterbank; minimum variance distortionless response; perceptual MVDR-based cepstral coefficient; spectral envelope; Interviews; Loading; Mel frequency cepstral coefficient; NIST; Speaker recognition; Speech; MVDR; PMCC; joint factor analysis; speaker recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing (ICSP), 2010 IEEE 10th International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-5897-4
  • Type
    conf
  • DOI
    10.1109/ICOSP.2010.5656906
  • Filename
    5656906