DocumentCode :
179006
Title :
Construction of discriminative kernels from known and unknown non-targets for PLDA-SVM scoring
Author :
Wei Rao ; Man-Wai Mak
Author_Institution :
Dept. of Electron. & Inf. Eng., Hong Kong Polytech. Univ., Hong Kong, China
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
4012
Lastpage :
4016
Abstract :
Conventional PLDA scoring in i-vector speaker verification involves the i-vectors of target speakers and claimants only. We have previously demonstrated that better performance can be achieved by incorporating the information of background speakers in the scoring process via speaker-dependent SVMs. This is achieved by defining a PLDA score space with dimension equal to the number of training i-vectors for each target speaker. The new protocol in NIST 2012 SRE permits systems to use the information of other target speakers (called known non-targets) in each verification trial. In this paper, we exploit this new protocol to enhance the performance of PLDA-SVM scoring by using the score vectors of both known and unknown non-targets as the impostor-class data to train the speaker-dependent SVMs. Because some target speakers have only one enrollment utterance, the speaker- and impostor-class data for SVM training are severely imbalanced. This paper shows that if the enrollment utterance is sufficiently long, a number of target-speaker i-vectors can be generated by an utterance partitioning and resampling technique, resulting in much better scoring SVMs. Results on NIST 2012 SRE demonstrate the advantages of pooling the known and unknown non-targets for training the SVMs, and show that the resampling technique helps the SVM training algorithm find better decision boundaries for speakers with only a small number of enrollment utterances.
Keywords :
signal sampling; speaker recognition; support vector machines; vectors; NIST 2012 SRE; PLDA-SVM scoring; SVM training; background speakers; discriminative kernel construction; enrollment utterances; i-vector speaker verification; impostor-class data; performance enhancement; protocol; resampling technique; score vectors; speaker-class data; speaker-dependent SVMs; target-speaker i-vectors; utterance partitioning; Kernel; NIST; Noise; Speech; Support vector machines; Training; Vectors; I-vectors; NIST 2012 SRE; empirical kernel maps; likelihood ratio kernels; probabilistic linear discriminant analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854355
Filename :
6854355