Title :
On complementarity of state-of-the-art speaker recognition systems
Author :
Machlica, Lukas ; Zajic, Zbynek ; Muller, Lukas
Author_Institution :
Dept. of Cybern., Univ. of West Bohemia, Pilsen, Czech Republic
Abstract :
This paper reviews recent methods for Speaker Recognition (SR) and analyses their complementarity. First, methods based on Supervectors (SVs) derived from Gaussian Mixture Models (GMMs), with Support Vector Machines (SVMs) used as the discriminative model, are described along with Nuisance Attribute Projection (NAP). NAP was proposed to suppress the undesirable influence of high channel variability between several sessions of the same speaker. Next, recent methods focusing on the extraction of so-called i-vectors (low-dimensional representations of GMM-based SVs) are discussed. The space in which i-vectors lie is called the Total Variability Space (TVS), since it contains both between-speaker and session/channel variability. Once i-vectors have been extracted, a Probabilistic Linear Discriminant Analysis (PLDA) model is trained in the TVS. In the PLDA training phase, the TVS is decomposed into a channel subspace and a speaker subspace; each i-vector is therefore assumed to be composed of a speaker identity component and a channel component. The complementarity of the PLDA- and SVM-based modelling techniques is examined using linear logistic regression as a fusion tool that combines the verification scores of the individual systems, leading to significant reductions in the error rates of the SR system. Results are presented on the NIST SRE 2008 and NIST SRE 2010 corpora.
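The score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes two hypothetical systems whose verification scores are simulated with synthetic data, and it fits the fusion weights by plain gradient descent on the logistic loss. The function names (`train_fusion`, `fuse`, `error_rate`), the toy data, and the optimiser are all assumptions made for illustration; the abstract does not say which fusion toolkit or training procedure the authors used.

```python
import numpy as np

def train_fusion(scores, labels, lr=0.1, n_iter=2000):
    """Fit fused = scores @ w + b by minimising the mean logistic loss.

    scores: (n_trials, n_systems) array of per-system verification scores.
    labels: (n_trials,) array, 1 for target trials, 0 for non-target trials.
    """
    n_trials, n_systems = scores.shape
    w = np.zeros(n_systems)
    b = 0.0
    for _ in range(n_iter):
        z = scores @ w + b
        p = 1.0 / (1.0 + np.exp(-z))      # posterior of "target" under the model
        grad = p - labels                  # derivative of the loss w.r.t. z
        w -= lr * (scores.T @ grad) / n_trials
        b -= lr * grad.mean()
    return w, b

def fuse(scores, w, b):
    """Combine per-system scores into one fused score per trial."""
    return scores @ w + b

def error_rate(score, labels, threshold=0.0):
    """Fraction of trials misclassified when accepting score > threshold."""
    return float(np.mean((score > threshold) != (labels == 1)))

# Synthetic stand-in for two systems (hypothetical, not the paper's data):
# both see the same underlying evidence but with independent noise, which is
# the situation in which fusion is expected to help.
rng = np.random.default_rng(0)
n = 500
labels = np.concatenate([np.ones(n), np.zeros(n)])
evidence = np.where(labels == 1, 1.0, -1.0)
sys_a = evidence + rng.normal(0.0, 1.5, size=2 * n)
sys_b = evidence + rng.normal(0.0, 1.5, size=2 * n)
scores = np.column_stack([sys_a, sys_b])

w, b = train_fusion(scores, labels)
fused = fuse(scores, w, b)

print("weights:", w, "bias:", b)
print("error A:", error_rate(sys_a, labels))
print("error B:", error_rate(sys_b, labels))
print("error fused:", error_rate(fused, labels))
```

In this toy setting the fused score is a weighted sum of the two system scores plus a bias, so the decision threshold is absorbed into `b`. A real evaluation would train the weights on a held-out development set and report EER or minimum DCF; this sketch reports only a raw classification error on the training trials.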
Keywords :
Gaussian processes; regression analysis; sensor fusion; speaker recognition; support vector machines; vectors; GMM; Gaussian mixture models; NIST SRE 2008 corpora; NIST SRE 2010 corpora; PLDA model; SR system; SVM; discriminative model; fusion tool; linear logistic regression; nuisance attribute projection; probabilistic linear discriminant analysis; speaker channel component; speaker channel variability; speaker identity component; speaker recognition systems; supervectors; total variability space; verification scores; Switches; NAP; PLDA; fusion; i-vector
Conference_Title :
Signal Processing and Information Technology (ISSPIT), 2012 IEEE International Symposium on
Conference_Location :
Ho Chi Minh City
Print_ISBN :
978-1-4673-5604-6
DOI :
10.1109/ISSPIT.2012.6621280