DocumentCode
3422857
Title
Perceptual MVDR-based cepstral coefficients (PMCCs) for speaker recognition
Author
Liang, Chunyan ; Zhang, Xiang ; Yang, Lin ; Zhang, Jianping ; Yan, Yonghong
Author_Institution
Think IT Speech Lab., CAS, Beijing, China
fYear
2010
fDate
24-28 Oct. 2010
Firstpage
1386
Lastpage
1389
Abstract
Acoustic feature extraction from speech is a fundamental part of both automatic speech recognition and automatic speaker recognition. Mel-frequency cepstral coefficients (MFCCs) are widely used in both fields. A new feature extraction technique, perceptual MVDR-based cepstral coefficients (PMCCs), has been shown to give superior performance in automatic speech recognition. Unlike MFCCs, in which a mel-scaled filterbank is applied to the short-term FFT spectrum to obtain a perceptually meaningful, smoothed gross spectrum, PMCCs use the Minimum Variance Distortionless Response (MVDR) all-pole model to represent the spectral envelope of the perceptual spectrum. In this study, we extract PMCCs and model them using Gaussian Mixture Models (GMMs) for speaker recognition. Joint factor analysis (JFA) is used to compensate for speaker and channel variability effects. The experiments are carried out on the core conditions of the NIST 2008 speaker recognition evaluation data. The experimental results indicate that systems based on PMCCs achieve performance comparable to that of systems based on MFCCs. Moreover, fusing the two kinds of systems yields a significant performance improvement over the MFCC system alone.
Keywords
Gaussian processes; acoustic signal processing; cepstral analysis; channel bank filters; feature extraction; speaker recognition; FFT spectrum; GMM; Gaussian mixture model; JFA; MFCC; PMCC; acoustic feature extraction; automatic speaker recognition; automatic speech recognition; channel variability effect; joint factor analysis; mel-frequency cepstral coefficient; mel-scaled filterbank; minimum variance distortionless response; perceptual MVDR-based cepstral coefficient; spectral envelope; Interviews; Loading; Mel frequency cepstral coefficient; NIST; Speech; MVDR
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing (ICSP), 2010 IEEE 10th International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-5897-4
Type
conf
DOI
10.1109/ICOSP.2010.5656906
Filename
5656906