Regional Information Center for Science and Technology (RICEST) - Use of VTL-wise models in feature-mapping framework to achieve performance of multiple-background models in speaker verification

DocumentCode :

2174326

Title :

Use of VTL-wise models in feature-mapping framework to achieve performance of multiple-background models in speaker verification

Author :

Sarkar, A.K. ; Umesh, S.

Author_Institution :

Dept. of Electr. Eng., Indian Inst. of Technol. Madras, Chennai, India

fYear :

2011

fDate :

22-27 May 2011

Firstpage :

4552

Lastpage :

4555

Abstract :

Multiple Background Models (M-BMs) [1, 2], formed according to the different Vocal Tract Lengths (VTLs) in the population, have recently been shown to be useful in speaker verification. Each speaker model is adapted from the particular Background Model (BM) corresponding to that speaker's VTL. During testing, the log-likelihood ratio of the test utterance is calculated between the claimant model and the corresponding BM. In this paper, instead of using a different BM for each speaker, we propose using a single gender-, channel- and VTL-independent UBM (root-UBM) together with VTL-dependent mapping functions. The proposed concept is inspired by the Feature Mapping (FM) technique used in speaker verification to overcome channel variability. In our proposed method, VTL-specific, gender-independent Gaussian Mixture Models (GMMs) are derived from the root-UBM using Maximum a Posteriori (MAP) adaptation. The mapping relation between the root-UBM and each VTL-specific GMM is then learned. During both training and testing, feature vectors are mapped into the root-UBM space using the best VTL-specific model. Speaker models are then adapted from the root-UBM using the mapped features. During testing, the log-likelihood ratio is calculated between the target model and the root-UBM. Therefore, unlike the M-BM system, there is no need to switch between BMs depending on the claimant. Another advantage of the proposed method is that additional normalization/compensation techniques can be applied easily, since everything operates within a single-UBM framework. The experiments are performed on the NIST 2004 SRE core condition, and we show that the performance of the proposed method is close to that of the M-BM system both with and without score normalization.
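The mapping and scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the conventional feature-mapping rule from the channel-compensation literature (top-scoring Gaussian, mean shift and variance scaling), diagonal-covariance GMMs stored as `(weights, means, variances)` tuples, and a one-to-one component correspondence between the root-UBM and each VTL-specific GMM, which holds when the latter is MAP-adapted from the former. The function names `map_to_root` and `llr` are hypothetical.

```python
import numpy as np

def component_log_probs(x, weights, means, variances):
    """Return log(w_i * N(x_t | mu_i, diag(var_i))) with shape (T, M).

    x: (T, D) frames; weights: (M,); means, variances: (M, D).
    """
    diff = x[:, None, :] - means[None, :, :]
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)
    log_exp = -0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    return np.log(weights)[None, :] + log_norm[None, :] + log_exp

def avg_log_likelihood(x, gmm):
    """Average per-frame log-likelihood of x under gmm (stable log-sum-exp)."""
    lp = component_log_probs(x, *gmm)
    m = lp.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(lp - m).sum(axis=1))).mean())

def map_to_root(x, vtl_gmms, root_gmm):
    """Map frames into the root-UBM space via the best-scoring VTL model.

    Assumed rule: y = (x - mu_vtl[i]) * sqrt(var_root[i] / var_vtl[i]) + mu_root[i],
    where i is the top-scoring component of the chosen VTL-specific GMM.
    """
    scores = [avg_log_likelihood(x, g) for g in vtl_gmms]
    best = int(np.argmax(scores))
    w_v, mu_v, var_v = vtl_gmms[best]
    _, mu_r, var_r = root_gmm
    top = component_log_probs(x, w_v, mu_v, var_v).argmax(axis=1)
    scale = np.sqrt(var_r[top] / var_v[top])
    return (x - mu_v[top]) * scale + mu_r[top], best

def llr(x_mapped, target_gmm, root_gmm):
    """Log-likelihood ratio between the target model and the root-UBM."""
    return avg_log_likelihood(x_mapped, target_gmm) - avg_log_likelihood(x_mapped, root_gmm)
```

If only the means are MAP-adapted, as is common in GMM-UBM systems, the variance ratio is 1 and the mapping reduces to a per-component mean shift. Whether the paper adapts variances as well is not stated in the abstract.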

Keywords :

Gaussian processes; speaker recognition; FM technique; GMM; Gaussian mixture model; M-BM; MAP; UBM; VTL; VTL-wise model; feature-mapping framework; log likelihood ratio; maximum a posteriori; multiple-background model; speaker verification; vocal tract length; Adaptation models; Computational modeling; Data models; Frequency modulation; NIST; Testing; Training; FM; GMM-UBM; Multiple BM; Speaker Verification; VTL-BM;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on

Conference_Location :

Prague

ISSN :

1520-6149

Print_ISBN :

978-1-4577-0538-0

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2011.5947367

Filename :

5947367

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2174326