DocumentCode :
742122
Title :
Acoustic Factor Analysis for Robust Speaker Verification
Author :
Hasan, T. ; Hansen, John H. L.
Author_Institution :
Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX, USA
Volume :
21
Issue :
4
fYear :
2013
fDate :
4/1/2013
Firstpage :
842
Lastpage :
853
Abstract :
Factor-analysis-based channel mismatch compensation methods for speaker recognition assume that speaker/utterance-dependent Gaussian Mixture Model (GMM) mean super-vectors can be constrained to reside in a lower-dimensional subspace. This approach does not consider the fact that conventional acoustic feature vectors also reside in a lower-dimensional manifold of the feature space when feature covariance matrices contain close-to-zero eigenvalues. In this study, based on observations of the covariance structure of acoustic features, we propose a factor analysis modeling scheme in the acoustic feature space instead of the super-vector space and derive a mixture-dependent feature transformation. We demonstrate how this single linear transformation simultaneously performs feature dimensionality reduction, de-correlation, normalization, and enhancement. The proposed transformation is shown to be closely related to signal-subspace-based speech enhancement schemes. In contrast to traditional front-end mixture-dependent feature transformations, where feature alignment is performed using the highest-scoring mixture, the proposed transformation is integrated within the speaker recognition system using a probabilistic feature alignment technique, which eliminates the need to regenerate the features or retrain the Universal Background Model (UBM). Incorporating the proposed method into a state-of-the-art i-vector and Gaussian Probabilistic Linear Discriminant Analysis (PLDA) framework, we perform evaluations on the National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) 2010 core telephone and microphone tasks. The experimental results demonstrate the superiority of the proposed scheme over both full-covariance and diagonal-covariance UBM-based systems. Simple equal-weight fusion of the baseline and proposed systems also yields significant performance gains.
Keywords :
acoustic correlation; compensation; covariance matrices; eigenvalues and eigenfunctions; feature extraction; principal component analysis; probability; speaker recognition; speech enhancement; GMM; Gaussian mixture model; PLDA; UBM; acoustic factor analysis; acoustic feature space; acoustic feature vector; channel mismatch compensation; close to zero eigenvalue; covariance structure; decorrelation analysis; feature covariance matrix; feature dimensionality reduction; feature transformation; front-end mixture; i-vector; linear transformation; microphone; mixture dependent feature transformation; normalization; probabilistic feature alignment; probabilistic linear discriminant analysis; robust speaker verification; signal subspace; speech enhancement scheme; super vector space; telephone; universal background model; Acoustics; Covariance matrix; Eigenvalues and eigenfunctions; Feature extraction; Gain; Speaker recognition; Vectors; Acoustic feature enhancement; factor analysis; probabilistic principal component analysis; speaker verification;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2012.2226161
Filename :
6338275