Regional Information Center for Science and Technology - Vocal Tract Length Normalization factor based speaker-cluster UBM for speaker verification

DocumentCode :

1836020

Title :

Vocal Tract Length Normalization factor based speaker-cluster UBM for speaker verification

Author :

Sarkar, A.K. ; Rath, S.P. ; Umesh, S.

Author_Institution :

Dept. of Electr. Eng., Indian Inst. of Technol. Madras, Chennai, India

fYear :

2010

fDate :

29-31 Jan. 2010

Firstpage :

Lastpage :

Abstract :

A speaker verification task requires some form of background model for the system to make a decision. In most cases, a large speaker-independent Gaussian Mixture Model Universal Background Model (GMM-UBM) is used. In this paper, we propose using a Speaker Cluster-wise UBM (SC-UBM) for a group of target speakers. In this method, the target speakers are clustered into groups based on the similarity of their Vocal Tract Length Normalization (VTLN) parameters. The VTLN parameter depends on the physiological structure of the human speech production system. Hence, a group of speakers with the same VTLN factor represents a speaker with a unique characteristic. The SC-UBMs are derived from the GMM-UBM with Maximum Likelihood Linear Regression (MLLR) by pooling data from the specific group of target speakers. The speaker-dependent models are then adapted from their respective SC-UBMs using the Maximum a Posteriori (MAP) method. During verification, the log-likelihood ratio for the claimant is calculated with respect to the corresponding group-specific UBM. The comparative study is performed on the NIST 2004 SRE core condition. The SC-UBM system reduced the equal error rate (EER) by 9% compared with the GMM-UBM system.
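The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the record: speakers are grouped by rounding a per-speaker VTLN warp factor to two decimals, each cluster UBM is a diagonal-covariance GMM whose parameters are already available (the MLLR step that derives an SC-UBM from the GMM-UBM is not implemented here), MAP adaptation updates the means only, and the relevance factor is set to 16. All function names are hypothetical.

```python
from collections import defaultdict

import numpy as np


def cluster_by_vtln(warp_factors):
    """Group speaker IDs that share the same (rounded) VTLN warp factor."""
    clusters = defaultdict(list)
    for speaker, alpha in warp_factors.items():
        clusters[round(alpha, 2)].append(speaker)  # rounding is an assumption
    return dict(clusters)


def log_gauss_diag(x, means, variances):
    """Log density of each frame under each diagonal Gaussian, shape (T, M)."""
    diff = x[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    return -0.5 * (log_norm + mahal)


def gmm_avg_loglik(x, weights, means, variances):
    """Average per-frame log-likelihood of frames x under a diagonal GMM."""
    log_comp = np.log(weights)[None, :] + log_gauss_diag(x, means, variances)
    peak = log_comp.max(axis=1, keepdims=True)  # log-sum-exp for stability
    frame_ll = peak[:, 0] + np.log(np.sum(np.exp(log_comp - peak), axis=1))
    return float(np.mean(frame_ll))


def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a (cluster) UBM toward enrollment frames."""
    log_comp = np.log(weights)[None, :] + log_gauss_diag(x, means, variances)
    log_comp = log_comp - log_comp.max(axis=1, keepdims=True)
    post = np.exp(log_comp)
    post /= post.sum(axis=1, keepdims=True)      # responsibilities, (T, M)
    counts = post.sum(axis=0)                    # soft count per component
    data_means = post.T @ x / np.maximum(counts, 1e-10)[:, None]
    coeff = (counts / (counts + relevance))[:, None]
    return coeff * data_means + (1.0 - coeff) * means


def llr_score(x, speaker_means, cluster_ubm):
    """Log-likelihood ratio of test frames: speaker model vs. its cluster UBM."""
    weights, means, variances = cluster_ubm
    return (gmm_avg_loglik(x, weights, speaker_means, variances)
            - gmm_avg_loglik(x, weights, means, variances))
```

In use, a claimant's test frames would be scored with `llr_score` against the UBM of the cluster returned by `cluster_by_vtln` for the claimed speaker, and the claim accepted when the score exceeds a threshold tuned on development data.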

Keywords :

Gaussian processes; error statistics; maximum likelihood estimation; regression analysis; speaker recognition; GMM-UBM system; MAP method; VTLN parameter; equal error rate; human speech production system; log likelihood ratio; maximum a posteriori method; maximum likelihood linear regression; physiological structure; speaker independent large Gaussian universal background model; speaker verification; speaker-cluster UBM; vocal tract length normalization factor; Automatic speech recognition; Error analysis; Humans; Loudspeakers; Maximum likelihood linear regression; NIST; Production systems; Spatial databases; Speaker recognition; Training data;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Communications (NCC), 2010 National Conference on

Conference_Location :

Chennai

Print_ISBN :

978-1-4244-6383-1

Type :

conf

DOI :

10.1109/NCC.2010.5430207

Filename :

5430207

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1836020