DocumentCode :
1691743
Title :
PLDA for speaker verification with utterances of arbitrary duration
Author :
Kenny, P. ; Stafylakis, Themos ; Ouellet, Pierre ; Alam, Mohammad Jahangir ; Dumouchel, P.
Author_Institution :
Centre de Recherche Informatique de Montréal (CRIM), Montreal, QC, Canada
fYear :
2013
Firstpage :
7649
Lastpage :
7653
Abstract :
The duration of speech segments has traditionally been controlled in the NIST speaker recognition evaluations, so researchers working in this framework have been relieved of the responsibility of dealing with the duration variability that arises in practical applications. The fixed-dimensional i-vector representation of speech utterances is ideal for working under such controlled conditions, and ignoring the fact that i-vectors extracted from short utterances are less reliable than those extracted from long utterances leads to a very simple formulation of the speaker recognition problem. However, a more realistic approach seems to be needed to handle duration variability properly. In this paper, we show how to quantify the uncertainty associated with the i-vector extraction process and propagate it into a PLDA classifier. We evaluated this approach using test sets derived from the NIST 2010 core and extended core conditions by randomly truncating the utterances in the female, telephone speech trials so that the durations of all enrollment and test utterances lay in the range 3-60 seconds, and we found that it led to substantial improvements in accuracy. Although the likelihood ratio computation for speaker verification is more computationally expensive than in the standard i-vector/PLDA classifier, it is still quite modest, as it reduces to computing the probability density functions of two full-covariance Gaussians (irrespective of the number of utterances used to enroll a speaker).
Keywords :
Gaussian processes; covariance analysis; probability; signal classification; speaker recognition; speech processing; vectors; NIST 2010 core; NIST speaker recognition evaluations; PLDA classifier; arbitrary duration; core conditions; covariance Gaussians; duration variability; fixed dimensional i-vector representation; i-vector extraction process; likelihood ratio computation; probability density functions; speaker verification; speech segments; speech utterances; standard i-vector; telephone speech trials; Covariance matrices; Mathematical model; NIST; Speaker recognition; Speech; Uncertainty; PLDA; i-vectors; speaker recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639151
Filename :
6639151