Regional Information Center for Science and Technology (RICEST) - Perceptual MVDR-based cepstral coefficients (PMCCs) for speaker recognition

DocumentCode :

3422857

Title :

Perceptual MVDR-based cepstral coefficients (PMCCs) for speaker recognition

Author :

Liang, Chunyan ; Zhang, Xiang ; Yang, Lin ; Zhang, Jianping ; Yan, Yonghong

Author_Institution :

Think IT Speech Lab., CAS, Beijing, China

fYear :

2010

fDate :

24-28 Oct. 2010

Firstpage :

1386

Lastpage :

1389

Abstract :

Acoustic feature extraction from speech is a fundamental part of both automatic speech recognition and automatic speaker recognition. Mel-frequency cepstral coefficients (MFCCs) are widely used in both research areas. A newer feature extraction technique, perceptual MVDR-based cepstral coefficients (PMCCs), has been shown to deliver superior performance in automatic speech recognition. Unlike MFCCs, in which a mel-scaled filterbank is applied to the short-term FFT spectrum to obtain a perceptually meaningful, smoothed gross spectrum, PMCCs use the Minimum Variance Distortionless Response (MVDR) all-pole model to represent the spectral envelope of the perceptual spectrum. In this study, we extract PMCCs and model them using Gaussian Mixture Models (GMMs) for speaker recognition. To compensate for speaker and channel variability, joint factor analysis (JFA) is used. The experiments are carried out on the core conditions of the NIST 2008 speaker recognition evaluation data. The results indicate that systems based on PMCCs achieve performance comparable to those based on MFCCs. Moreover, fusing the two kinds of systems yields a significant performance improvement over the MFCC system alone.
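As a rough illustration of the MVDR spectral envelope mentioned in the abstract, the sketch below evaluates the textbook MVDR (Capon) estimate P(w) = 1 / (v(w)^H R^{-1} v(w)) for a single speech frame, where R is the Toeplitz autocorrelation matrix and v(w) is the steering vector. It is not the authors' implementation: the function name `mvdr_spectrum`, the model order of 12, the 257-point frequency grid, and the small diagonal loading are all assumptions made for this example, and the perceptual warping and cepstral steps of the PMCC front end are omitted.

```python
import numpy as np

def mvdr_spectrum(frame, order=12, n_freqs=257):
    """Textbook MVDR (Capon) power spectrum of one non-silent frame.

    Hypothetical helper for illustration only; parameter defaults are assumed.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)

    # Biased autocorrelation estimates r[0..order].
    r = np.array([np.dot(frame[:n - k], frame[k:]) / n
                  for k in range(order + 1)])

    # (order+1) x (order+1) Toeplitz autocorrelation matrix R[i, j] = r[|i-j|],
    # with a small diagonal loading term (an assumption) for numerical stability.
    lags = np.abs(np.arange(order + 1)[:, None] - np.arange(order + 1)[None, :])
    R = r[lags] + 1e-9 * r[0] * np.eye(order + 1)

    # Steering vectors v(w) = [1, e^{jw}, ..., e^{j*order*w}] for w in [0, pi].
    omegas = np.linspace(0.0, np.pi, n_freqs)
    V = np.exp(1j * np.outer(np.arange(order + 1), omegas))

    # P(w) = 1 / (v^H R^{-1} v), computed for all frequencies at once.
    R_inv_V = np.linalg.solve(R, V)
    denom = np.real(np.sum(np.conj(V) * R_inv_V, axis=0))
    return omegas, 1.0 / denom
```

A typical call would pass one windowed frame, for example `omegas, power = mvdr_spectrum(np.hamming(400) * samples[:400])`, where `samples` is a placeholder for the caller's audio. In a PMCC-style front end, cepstral coefficients would then be derived from an envelope of this kind computed on the perceptually warped spectrum.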

Keywords :

Gaussian processes; acoustic signal processing; cepstral analysis; channel bank filters; feature extraction; speaker recognition; FFT spectrum; GMM; Gaussian mixture model; JFA; MFCC; PMCC; acoustic feature extraction; automatic speaker recognition; automatic speech recognition; channel variability effect; joint factor analysis; mel-frequency cepstral coefficient; mel-scaled filterbank; minimum variance distortionless response; perceptual MVDR-based cepstral coefficient; spectral envelope; Interviews; Loading; Mel frequency cepstral coefficient; NIST; Speaker recognition; Speech; MVDR; PMCC; joint factor analysis; speaker recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Signal Processing (ICSP), 2010 IEEE 10th International Conference on

Conference_Location :

Beijing

Print_ISBN :

978-1-4244-5897-4

Type :

conf

DOI :

10.1109/ICOSP.2010.5656906

Filename :

5656906

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3422857