DocumentCode :
2174326
Title :
Use of VTL-wise models in feature-mapping framework to achieve performance of multiple-background models in speaker verification
Author :
Sarkar, A.K. ; Umesh, S.
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol. Madras, Chennai, India
fYear :
2011
fDate :
22-27 May 2011
Firstpage :
4552
Lastpage :
4555
Abstract :
Recently, Multiple Background Models (M-BMs) [1, 2] have been shown to be useful in speaker verification, where the M-BMs are formed based on the different Vocal Tract Lengths (VTLs) among the population. Each speaker model is adapted from the particular Background Model (BM) corresponding to the speaker's VTL. During testing, the log likelihood ratio of the test utterance is calculated between the claimant model and the corresponding BM. In this paper, instead of using a different BM for each speaker, we propose the use of a single gender-, channel- and VTL-independent UBM (root-UBM) together with a VTL-dependent mapping function. The proposed concept is inspired by the Feature Mapping (FM) technique used in speaker verification to overcome channel variability. In our proposed method, VTL-specific, gender-independent Gaussian Mixture Models (GMMs) are derived from the root-UBM using Maximum a Posteriori (MAP) adaptation. The mapping relation is then learned between the root-UBM and each VTL-specific GMM. During the training and testing phases, feature vectors are mapped into the root-UBM space using the best VTL-specific model. Speaker models are then adapted from the root-UBM using the mapped features. During testing, the log likelihood ratio is calculated between the target model and the root-UBM. Therefore, unlike the M-BM system, there is no need to switch between different BMs depending on the claimant. Another advantage of the proposed method is that additional normalization/compensation techniques can be easily applied, since it operates within a single-UBM framework. The experiments are performed on the NIST 2004 SRE core condition, and we show that the performance of the proposed method is close to that of the M-BM system, both with and without score normalization.
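The mapping step described in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs stored as plain dictionaries and uses the top-scoring-Gaussian mean/variance mapping commonly associated with channel feature mapping; the abstract does not state the exact mapping function used in the paper, so the formula, the data layout, and all function names here (`select_best_vtl`, `map_to_root`, and so on) are assumptions for illustration only.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def component_scores(x, gmm):
    """Weighted log-likelihood of x under each mixture component."""
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])
    ]

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under the full mixture (log-sum-exp over components)."""
    scores = component_scores(x, gmm)
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))

def select_best_vtl(frames, vtl_gmms):
    """Pick the VTL-specific GMM (by key) that best explains the utterance."""
    return max(
        vtl_gmms,
        key=lambda name: sum(gmm_log_likelihood(x, vtl_gmms[name]) for x in frames),
    )

def map_to_root(x, vtl_gmm, root_gmm):
    """Map one feature vector from the VTL-specific space into the root-UBM space.

    Assumed mapping: find the top-scoring component i in the VTL-specific GMM,
    then shift and scale x using the means and variances of component i in both
    models. This relies on the VTL-specific GMM being MAP-adapted from the
    root-UBM, so component indices correspond one to one.
    """
    scores = component_scores(x, vtl_gmm)
    i = max(range(len(scores)), key=scores.__getitem__)
    return [
        (xi - mv) * math.sqrt(rv / vv) + rm
        for xi, mv, vv, rm, rv in zip(
            x, vtl_gmm["means"][i], vtl_gmm["vars"][i],
            root_gmm["means"][i], root_gmm["vars"][i],
        )
    ]

def average_llr(frames, target_gmm, root_gmm):
    """Average per-frame log likelihood ratio between target model and root-UBM."""
    total = sum(
        gmm_log_likelihood(x, target_gmm) - gmm_log_likelihood(x, root_gmm)
        for x in frames
    )
    return total / len(frames)
```

In this sketch an utterance would first be assigned a VTL-specific model with `select_best_vtl`, each frame would be passed through `map_to_root`, and the mapped frames would then be scored with `average_llr` against a target model adapted from the root-UBM, so a single background model serves every claimant.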
Keywords :
Gaussian processes; speaker recognition; FM technique; GMM; Gaussian mixture model; M-BM; MAP; UBM; VTL; VTL-wise model; feature-mapping framework; log likelihood ratio; maximum a posteriori; multiple-background model; speaker verification; vocal tract length; Adaptation models; Computational modeling; Data models; Frequency modulation; NIST; Testing; Training; FM; GMM-UBM; Multiple BM; Speaker Verification; VTL-BM;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
ISSN :
1520-6149
Print_ISBN :
978-1-4577-0538-0
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2011.5947367
Filename :
5947367