Title :
Perceptual MVDR-based cepstral coefficients (PMCCs) for speaker recognition
Author :
Liang, Chunyan ; Zhang, Xiang ; Yang, Lin ; Zhang, Jianping ; Yan, Yonghong
Author_Institution :
Think IT Speech Lab., CAS, Beijing, China
Abstract :
Acoustic feature extraction from speech is a fundamental part of both automatic speech recognition and automatic speaker recognition. Mel-frequency cepstral coefficients (MFCCs) are widely used in both fields. A newer feature extraction technique, perceptual MVDR-based cepstral coefficients (PMCCs), has been demonstrated to deliver superior performance in automatic speech recognition. Unlike MFCCs, in which a mel-scaled filterbank is applied to the short-term FFT spectrum to obtain a perceptually meaningful, smoothed gross spectrum, PMCCs use the Minimum Variance Distortionless Response (MVDR) all-pole model to represent the spectral envelope of the perceptual spectrum. In this study, we extract PMCCs and model them with Gaussian Mixture Models (GMMs) for speaker recognition. Joint factor analysis (JFA) is used to compensate for speaker and channel variability. The experiments are carried out on the core conditions of the NIST 2008 speaker recognition evaluation data. The results indicate that PMCC-based systems achieve performance comparable to that of MFCC-based systems. Moreover, fusing the two kinds of systems yields a significant performance improvement over the MFCC system alone.
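As a rough illustration of the MVDR spectral envelope mentioned in the abstract, the sketch below evaluates the standard MVDR (Capon) power spectrum, P(w) = 1 / (v(w)^H R^-1 v(w)), for a single windowed frame. It is not the authors' implementation: the perceptual frequency warping and the conversion to cepstral coefficients are omitted, and the function name `mvdr_spectrum`, the model order of 12, the frequency grid size, and the small diagonal loading term are all assumptions made for this example.

```python
import numpy as np


def mvdr_spectrum(frame, order=12, n_freqs=257):
    """Estimate the MVDR (Capon) power spectrum of one windowed frame.

    Hypothetical helper for illustration; `order` and `n_freqs` are
    assumed values, not settings taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)

    # Biased autocorrelation estimates r[0], ..., r[order].
    r = np.array([np.dot(frame[:n - k], frame[k:]) / n
                  for k in range(order + 1)])

    # (order+1) x (order+1) Toeplitz autocorrelation matrix, R[i, j] = r[|i - j|].
    lags = np.arange(order + 1)
    R = r[np.abs(lags[:, None] - lags[None, :])]

    # Small diagonal loading for numerical stability (an assumption).
    R = R + 1e-9 * r[0] * np.eye(order + 1)
    R_inv = np.linalg.inv(R)

    # Steering vectors v(w) = [1, e^{jw}, ..., e^{j*order*w}] on a grid over [0, pi].
    omegas = np.linspace(0.0, np.pi, n_freqs)
    V = np.exp(1j * np.outer(omegas, lags))

    # Quadratic form v^H R^{-1} v for every frequency; it is real and positive.
    denom = np.real(np.einsum("fi,ij,fj->f", V.conj(), R_inv, V))
    return omegas, 1.0 / denom
```

A call such as `omegas, power = mvdr_spectrum(frame * np.hamming(len(frame)))` returns a smooth envelope on a uniform frequency grid. A full PMCC front end would apply this kind of estimate to a perceptually warped spectrum before computing cepstra.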
Keywords :
Gaussian processes; acoustic signal processing; cepstral analysis; channel bank filters; feature extraction; speaker recognition; FFT spectrum; GMM; Gaussian mixture model; JFA; MFCC; PMCC; acoustic feature extraction; automatic speaker recognition; automatic speech recognition; channel variability effect; joint factor analysis; mel-frequency cepstral coefficient; mel-scaled filterbank; minimum variance distortionless response; perceptual MVDR-based cepstral coefficient; spectral envelope; Interviews; Loading; Mel frequency cepstral coefficient; NIST; Speaker recognition; Speech; MVDR
Conference_Title :
Signal Processing (ICSP), 2010 IEEE 10th International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-5897-4
DOI :
10.1109/ICOSP.2010.5656906