DocumentCode :
1448456
Title :
GMM-SVM Kernel With a Bhattacharyya-Based Distance for Speaker Recognition
Author :
You, Chang Huai ; Lee, Kong Aik ; Li, Haizhou
Author_Institution :
Human Language Technol., Agency for Sci., Technol. & Res. (A*STAR), Singapore, Singapore
Volume :
18
Issue :
6
fYear :
2010
Firstpage :
1300
Lastpage :
1312
Abstract :
Among conventional methods for text-independent speaker recognition, the Gaussian mixture model (GMM) is known for its effectiveness and scalability in modeling the spectral distribution of speech. A GMM-supervector characterizes a speaker's voice by GMM parameters such as the mean vectors, covariance matrices, and mixture weights. Besides the first-order statistics, it is generally believed that a speaker's cues are partly conveyed by the second-order statistics. In this paper, we introduce a Bhattacharyya-based GMM-distance to measure the distance between two GMM distributions. Subsequently, the GMM-UBM mean interval (GUMI) concept is introduced to derive a GUMI kernel, which can be used in conjunction with a support vector machine (SVM) for speaker recognition. The GUMI kernel allows us to exploit the speaker's information not only from the mean vectors of the GMM but also from the covariance matrices. Moreover, by analyzing the Bhattacharyya-based GMM-distance measure, we extend the Bhattacharyya-based kernel to involve both the mean and covariance statistical dissimilarities. We demonstrate the effectiveness of the new kernel on the National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) 2006 dataset.
Keywords :
Gaussian processes; covariance matrices; speaker recognition; support vector machines; Bhattacharyya-based GMM-distance; GMM-SVM kernel; GMM-UBM mean interval concept; GMM-supervector; GUMI concept; Gaussian mixture model; mean vector; mixture weight; spectral distribution; support vector machine; Gaussian mixture model (GMM); supervector; support vector machine (SVM);
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2009.2032950
Filename :
5256328