Title :
Variational Bayesian Joint Factor Analysis Models for Speaker Verification
Author :
Zhao, Xianyu ; Dong, Yuan
fDate :
3/1/2012
Abstract :
Joint factor analysis (JFA) is a recently developed method for modeling speaker and session variability in Gaussian mixture models (GMMs). In this paper, both batch and sequential Bayesian analyses of JFA models are evaluated for robust speaker recognition. Various sources of uncertainty in JFA models, from latent speaker and channel factors to Gaussian mixture indicator variables, are examined from a Bayesian perspective. Integrating over all these latent factors accounts for the sources of variability in the speaker enrollment and verification processes better than considering only point estimates; this study also makes it possible to analyze and identify the contribution of each underlying model uncertainty to the final speaker verification performance. However, because all latent variables in a JFA GMM become correlated with each other given the observed data, Bayesian analysis in closed analytic form is practically intractable. Hence, an alternative approach based on variational Bayes is developed in this paper to explore Bayesian JFA models in an approximate yet efficient way. In this method, the fully correlated a posteriori distribution is approximated by a variational distribution of factored form to facilitate inference, and a lower bound on the model likelihood is derived to construct detection scores. Experimental results on the 2008 NIST Speaker Recognition Evaluation (NIST SRE) show that these variational Bayesian JFA models obtain significant performance improvements over JFA using point estimates, especially in cases with limited enrollment and test data. For the 10-s task in the 2008 NIST SRE, the variational Bayesian JFA systems obtained relative reductions of 9.4% in EER and 11.5% in DCF compared to the baseline JFA system. This paper also shows the importance of taking into account the uncertainties in both speaker and channel factors, which is more effective than considering uncertainties in channel factors alone.
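The factored approximation and likelihood lower bound mentioned in the abstract can be illustrated with the generic variational Bayes identity. This is only a sketch under stated assumptions: the symbols $X$ (observed features), $y$ (speaker factors), $x$ (channel factors), $c$ (Gaussian mixture indicators) and the three-way grouping of $q$ are illustrative choices, not notation or a factorization taken from the paper, whose exact form the abstract does not give.

```latex
% Generic variational lower bound (illustrative notation, not the paper's).
% H = {y, x, c} collects the latent variables.
\begin{align}
  \log p(X)
    &= \mathcal{L}(q) + \mathrm{KL}\bigl(q(H)\,\|\,p(H \mid X)\bigr)
     \;\ge\; \mathcal{L}(q), \\
  \mathcal{L}(q)
    &= \mathbb{E}_{q}\bigl[\log p(X, H)\bigr]
     - \mathbb{E}_{q}\bigl[\log q(H)\bigr], \\
  q(H) &= q(y)\, q(x)\, q(c)
  \qquad \text{(assumed factored form).}
\end{align}
```

Because the KL term is non-negative, $\mathcal{L}(q)$ never exceeds the log-likelihood; the bound is tight only when $q$ equals the true posterior, which a factored $q$ generally cannot do when the latent variables are correlated.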
Keywords :
Gaussian distribution; belief networks; inference mechanisms; speaker recognition; variational techniques; 2008 NIST Speaker Recognition Evaluation; GMM; Gaussian mixture indicator variables; Gaussian mixture models; JFA model; NIST SRE; batch Bayesian analysis; channel factors; correlated a posteriori distribution; inference mechanism; lower bound; model likelihood; model uncertainties; point estimates; robust speaker recognition; sequential Bayesian analysis; session variability model; speaker verification; variational Bayesian joint factor analysis models; Analytical models; Approximation methods; Bayesian methods; Joints; NIST; Speech; Uncertainty; Gaussian mixture models (GMMs); joint factor analysis (JFA); session variability; speaker variability; speaker verification (SV); variational Bayes;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2011.2170972