DocumentCode :
1836020
Title :
Vocal Tract Length Normalization factor based speaker-cluster UBM for speaker verification
Author :
Sarkar, A.K. ; Rath, S.P. ; Umesh, S.
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol. Madras, Chennai, India
fYear :
2010
fDate :
29-31 Jan. 2010
Firstpage :
1
Lastpage :
5
Abstract :
A speaker verification system requires a background model in order to make its decision. In most cases, a large speaker-independent Gaussian mixture model Universal Background Model (GMM-UBM) is used. In this paper, we propose to use a Speaker Cluster-wise UBM (SC-UBM) for a group of target speakers. In this method, the target speakers are clustered into groups based on the similarity of their Vocal Tract Length Normalization (VTLN) parameters. The VTLN parameter depends on the physiological structure of the human speech production system. Hence, a group of speakers with the same VTLN factor represents a speaker with a unique characteristic. The SC-UBMs are derived from the GMM-UBM with Maximum Likelihood Linear Regression (MLLR) by pooling data from the specific group of target speakers. The speaker-dependent models are then adapted from their respective SC-UBMs using the Maximum a Posteriori (MAP) method. During verification, the log likelihood ratio for the claimant is calculated with respect to the corresponding group-specific UBM. The comparative study is performed on the NIST 2004 SRE core condition. The SC-UBM system reduced the equal error rate (EER) by 9% over the GMM-UBM system.
Keywords :
Gaussian processes; error statistics; maximum likelihood estimation; regression analysis; speaker recognition; GMM-UBM system; MAP method; VTLN parameter; equal error rate; human speech production system; log likelihood ratio; maximum a posteriori method; maximum likelihood linear regression; physiological structure; speaker independent large Gaussian universal background model; speaker verification; speaker-cluster UBM; vocal tract length normalization factor; Automatic speech recognition; Error analysis; Humans; Loudspeakers; Maximum likelihood linear regression; NIST; Production systems; Spatial databases; Speaker recognition; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Communications (NCC), 2010 National Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4244-6383-1
Type :
conf
DOI :
10.1109/NCC.2010.5430207
Filename :
5430207