On complementarity of state-of-the-art speaker recognition systems

Author

Machlica, Lukas ; Zajic, Zbynek ; Muller, Lukas

Author_Institution

Dept. of Cybern., Univ. of West Bohemia, Pilsen, Czech Republic

fYear

2012

fDate

12-15 Dec. 2012

Abstract

This paper reviews recent methods for Speaker Recognition (SR) and analyses their complementarity. First, methods based on Supervectors (SVs) derived from Gaussian Mixture Models (GMMs), with Support Vector Machines (SVMs) used as the discriminative model, are described along with Nuisance Attribute Projection (NAP). NAP was proposed to suppress the undesirable influence of high channel variability between sessions of the same speaker. Next, recent methods focusing on the extraction of so-called i-vectors (low-dimensional representations of GMM-based SVs) are discussed. The space in which i-vectors lie is called the Total Variability Space (TVS), since it contains both between-speaker and session/channel variability. Once i-vectors have been extracted, a Probabilistic Linear Discriminant Analysis (PLDA) model is trained in the TVS. In the training phase of PLDA, the TVS is decomposed into a channel subspace and a speaker subspace, so each i-vector is assumed to be composed of a speaker identity component and a channel component. The complementarity of the PLDA-based and SVM-based modelling techniques is examined using linear logistic regression as a fusion tool that combines the verification scores of the individual systems, which leads to significant reductions in the error rates of the SR system. Results are presented on the NIST SRE 2008 and NIST SRE 2010 corpora.
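The score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the linear logistic regression was implemented, so everything below is an assumption made for illustration: the function names `train_fusion` and `fuse` are hypothetical, the weights are fitted with plain gradient descent, and the two "systems" are synthetic Gaussian scores standing in for PLDA and SVM outputs rather than NIST SRE data.

```python
import numpy as np


def train_fusion(scores, labels, lr=0.1, n_iter=2000):
    """Fit weights w and offset b so that fused = scores @ w + b.

    scores: array of shape (n_trials, n_systems), one column per system.
    labels: 1 for target trials, 0 for non-target trials.
    Minimises the mean logistic (cross-entropy) loss by gradient descent.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n_trials, n_systems = scores.shape
    w = np.zeros(n_systems)
    b = 0.0
    for _ in range(n_iter):
        z = scores @ w + b
        p = 1.0 / (1.0 + np.exp(-z))      # posterior of the target class
        grad = p - labels                 # derivative of the loss w.r.t. z
        w -= lr * (scores.T @ grad) / n_trials
        b -= lr * grad.mean()
    return w, b


def fuse(scores, w, b):
    """Apply the learned linear fusion to a matrix of system scores."""
    return np.asarray(scores, dtype=float) @ w + b


# Synthetic development set (assumption): two systems with independent noise.
rng = np.random.default_rng(0)
n_per_class = 1000
dev_labels = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
class_means = np.where(dev_labels == 1, 1.0, -1.0)[:, None]
dev_scores = class_means + rng.normal(size=(2 * n_per_class, 2))

w, b = train_fusion(dev_scores, dev_labels)
fused = fuse(dev_scores, w, b)
accuracy = np.mean((fused > 0) == (dev_labels == 1))
```

In this sketch the fused score is a weighted sum of the individual scores plus an offset. When the systems make partly independent errors, as the synthetic ones do by construction, the fused score separates the classes better than either input, which is the kind of complementarity the paper examines.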

Keywords

Gaussian processes; regression analysis; sensor fusion; speaker recognition; support vector machines; vectors; GMM; Gaussian mixture models; NIST SRE 2008 corpora; NIST SRE 2010 corpora; PLDA model; SR system; SVM; discriminative model; fusion tool; linear logistic regression; nuisance attribute projection; probabilistic linear discriminant analysis; speaker channel component; speaker channel variability; speaker identity component; speaker recognition systems; supervectors; total variability space; verification scores; Switches; NAP; PLDA; fusion; i-vector

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal Processing and Information Technology (ISSPIT), 2012 IEEE International Symposium on

Conference_Location

Ho Chi Minh City

Print_ISBN

978-1-4673-5604-6

Type

conf

DOI

10.1109/ISSPIT.2012.6621280

Filename

6621280