Regional Information Center for Science and Technology - Maximum Likelihood Acoustic Factor Analysis Models for Robust Speaker Verification in Noise

DocumentCode :

107860

Title :

Maximum Likelihood Acoustic Factor Analysis Models for Robust Speaker Verification in Noise

Author :

Hasan, T. ; Hansen, John H. L.

Author_Institution :

Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Richardson, TX, USA

Volume :

Issue :

fYear :

2014

fDate :

Feb. 2014

Firstpage :

381

Lastpage :

391

Abstract :

Recent speaker recognition/verification systems generally utilize an utterance dependent fixed dimensional vector as features to Bayesian classifiers. These vectors, known as i-Vectors, are lower dimensional representations of Gaussian Mixture Model (GMM) mean super-vectors adapted from a Universal Background Model (UBM) using speech utterance features, and extracted utilizing a Factor Analysis (FA) framework. This method is based on the assumption that the speaker dependent information resides in a lower dimensional sub-space. In this study, we utilize a mixture of Acoustic Factor Analyzers (AFA) to model the acoustic features instead of a GMM-UBM. Following our previously proposed AFA technique (“Acoustic factor analysis for robust speaker verification,” by Hasan and Hansen, IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 4, April 2013), this model is based on the assumption that the speaker relevant information lies in a lower dimensional subspace in the multi-dimensional feature space localized by the mixture components. Unlike our previous method, here we train the AFA-UBM model directly from the data using an Expectation-Maximization (EM) algorithm. This method shows improved robustness to noise as the nuisance dimensions are removed in each EM iteration. Two variants of the AFA model are considered utilizing an isotropic and diagonal covariance residual term. The method is integrated within a standard i-Vector system where the hidden variables of the model, termed as acoustic factors, are utilized as the input for total variability modeling. Experimental results obtained on the 2012 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) core-extended trials indicate the effectiveness of the proposed strategy in both clean and noisy conditions.
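The isotropic-residual variant mentioned in the abstract can be illustrated with a simplified sketch. The code below is not the authors' method: it fits a single factor analyzer with an isotropic residual (probabilistic PCA) using the well-known closed-form maximum likelihood solution, whereas the paper trains a mixture of such analyzers with EM. The function names `fit_ppca` and `acoustic_factors`, the synthetic data, and the dimensions are all assumptions made for illustration.

```python
import numpy as np

def fit_ppca(X, q):
    """Closed-form ML fit of one factor analyzer with isotropic residual (PPCA).

    Hypothetical helper: a single-component stand-in for the mixture model
    described in the abstract.
    """
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False, bias=True)   # ML sample covariance
    evals, evecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]    # reorder to descending
    sigma2 = evals[q:].mean()                     # residual variance: mean of discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return mu, W, sigma2

def acoustic_factors(X, mu, W, sigma2):
    """Posterior mean of the hidden factors: M^{-1} W^T (x - mu), M = W^T W + sigma2 * I."""
    q = W.shape[1]
    M = W.T @ W + sigma2 * np.eye(q)
    return np.linalg.solve(M, W.T @ (X - mu).T).T

# Synthetic stand-in for acoustic feature vectors (assumed: 10-dim features, 3 factors).
rng = np.random.default_rng(0)
Z = rng.standard_normal((2000, 3))
A = rng.standard_normal((3, 10))
X = Z @ A + 0.1 * rng.standard_normal((2000, 10))

mu, W, sigma2 = fit_ppca(X, q=3)
factors = acoustic_factors(X, mu, W, sigma2)
```

In this sketch the `factors` array plays the role of the hidden variables that the abstract calls acoustic factors. A diagonal-residual variant would replace the scalar `sigma2` with a per-dimension variance vector and would generally need iterative EM updates rather than a closed form.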

Keywords :

Gaussian processes; expectation-maximisation algorithm; mixture models; speaker recognition; AFA technique; AFA-UBM model; Bayesian classifiers; EM iteration; FA framework; GMM mean super-vectors; GMM-UBM; Gaussian mixture model; NIST SRE; National Institute of Standards and Technology; acoustic factor analyzers; diagonal covariance residual term; dimensional representations; dimensional subspace; expectation-maximization algorithm; isotropic covariance residual term; maximum likelihood acoustic factor analysis model; mixture components; multidimensional feature space; robust speaker verification; speaker recognition evaluation; speaker recognition-verification systems; speaker-dependent information; speech utterance features; standard i-vector system; total variability modeling; universal background model; utterance-dependent fixed dimensional vector; Acoustics; Analytical models; Covariance matrices; Feature extraction; NIST; Noise; Noise measurement; Acoustic factor analysis; mixture of factor analyzers; speaker verification;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher :

IEEE

ISSN :

2329-9290

Type :

jour

DOI :

10.1109/TASLP.2013.2292356

Filename :

6674091

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=107860