Perceptual MVDR-based cepstral coefficients (PMCCs) for speaker recognition

Author

Liang, Chunyan ; Zhang, Xiang ; Yang, Lin ; Zhang, Jianping ; Yan, Yonghong

Author_Institution

Think IT Speech Lab., CAS, Beijing, China

fYear

2010

fDate

24-28 Oct. 2010

Firstpage

1386

Lastpage

1389

Abstract

Acoustic feature extraction from speech is a fundamental part of both automatic speech recognition and automatic speaker recognition. Mel-frequency cepstral coefficients (MFCCs) are widely used in both fields. A newer feature extraction technique, perceptual MVDR-based cepstral coefficients (PMCCs), has been shown to deliver superior performance in automatic speech recognition. Unlike MFCCs, in which a mel-scaled filterbank is applied to the short-term FFT spectrum to obtain a perceptually meaningful, smoothed gross spectrum, PMCCs use the Minimum Variance Distortionless Response (MVDR) all-pole model to represent the spectral envelope of the perceptual spectrum. In this study, we extract PMCCs and model them using Gaussian Mixture Models (GMMs) for speaker recognition. Joint factor analysis (JFA) is used to compensate for speaker and channel variability. The experiments are carried out on the core conditions of the NIST 2008 speaker recognition evaluation data. The results indicate that systems based on PMCCs achieve performance comparable to those based on MFCCs. Moreover, fusing the two kinds of systems yields a significant performance improvement over the MFCC system alone.
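The MVDR envelope step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it evaluates the textbook MVDR spectrum 1 / (v^H R^{-1} v) directly from frame autocorrelations and omits the perceptual (mel) warping that PMCCs apply beforehand. The function names, the model order of 20, the diagonal loading, and the synthetic test frame are all assumptions made for illustration.

```python
import numpy as np

def autocorrelation(frame, order):
    """Biased autocorrelation estimates r[0..order] of a (windowed) frame."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) / n for k in range(order + 1)])

def mvdr_spectrum(r, n_freqs=257, loading=1e-9):
    """MVDR power spectrum 1 / (v^H R^{-1} v) on n_freqs points in [0, pi]."""
    order = len(r) - 1
    idx = np.arange(order + 1)
    # Toeplitz autocorrelation matrix: R[i, j] = r[|i - j|]
    R = r[np.abs(idx[:, None] - idx[None, :])]
    # Small diagonal loading for numerical stability (an assumption, not from the paper)
    R = R + loading * r[0] * np.eye(order + 1)
    R_inv = np.linalg.inv(R)
    omega = np.linspace(0.0, np.pi, n_freqs)
    V = np.exp(1j * np.outer(idx, omega))  # steering vectors, one per column
    denom = np.real(np.einsum("if,ij,jf->f", V.conj(), R_inv, V))
    return 1.0 / denom

def cepstrum_from_spectrum(power, n_ceps=13):
    """Cepstral coefficients from a one-sided power spectrum (inverse FFT of its log)."""
    return np.fft.irfft(np.log(power))[:n_ceps]

# Synthetic frame (hypothetical data): a sinusoid at 0.2*pi rad/sample plus weak noise
rng = np.random.default_rng(0)
n = np.arange(400)
frame = np.hamming(400) * (np.sin(0.2 * np.pi * n) + 0.1 * rng.standard_normal(400))

r = autocorrelation(frame, order=20)
spec = mvdr_spectrum(r)              # smooth envelope, peaked near the sinusoid
ceps = cepstrum_from_spectrum(spec)  # cepstral features derived from the envelope
```

In a PMCC-style front end, the autocorrelations would presumably be derived from a perceptually warped power spectrum rather than from the raw frame as done here; the envelope and cepstrum steps would otherwise follow the same pattern.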

Keywords

Gaussian processes; acoustic signal processing; cepstral analysis; channel bank filters; feature extraction; speaker recognition; FFT spectrum; GMM; Gaussian mixture model; JFA; MFCC; PMCC; acoustic feature extraction; automatic speaker recognition; automatic speech recognition; channel variability effect; joint factor analysis; mel-frequency cepstral coefficient; mel-scaled filterbank; minimum variance distortionless response; perceptual MVDR-based cepstral coefficient; spectral envelope; Interviews; Loading; Mel frequency cepstral coefficient; NIST; Speaker recognition; Speech; MVDR

fLanguage

English

Publisher

IEEE

Conference_Title

Signal Processing (ICSP), 2010 IEEE 10th International Conference on

Conference_Location

Beijing

Print_ISBN

978-1-4244-5897-4

Type

conf

DOI

10.1109/ICOSP.2010.5656906

Filename

5656906