Title :
Compensation for inter-frame correlations in speaker diarization and recognition
Author :
Stafylakis, Themos ; Kenny, P. ; Gupta, V. ; Dumouchel, P.
Author_Institution :
Centre de Recherche Informatique de Montréal (CRIM), Montreal, QC, Canada
Abstract :
In this paper, we introduce the concept of the effective sample size to speaker diarization and recognition. We show why the nominal sample size is inadequate for feature streams that exhibit inter-frame correlations and how its use adversely affects inference. We then discuss the effective sample size, that is, the size of a set of independent observations that carries an equivalent amount of statistical information about the model parameters, and how the scaling factor can be estimated. Our experiments on speaker diarization show that once the effective sample size is adopted, state-of-the-art results can be attained even with single Gaussians and hierarchical clustering, and even when the scaling factor is set to be common to all utterances. On speaker recognition, encouraging results are reported on NIST-2010 using iVectors and PLDA.
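The abstract does not say how the scaling factor is estimated. As a rough illustration of the general idea only, the sketch below uses a common textbook estimator, N_eff = N / (1 + 2 * sum of positive-lag autocorrelations), on a synthetic one-dimensional AR(1) stream. This is an assumption for illustration and not necessarily the estimator used in the paper; the function names (`autocorr`, `effective_sample_size`, `ar1`), the `max_lag` cutoff, and the truncation rule are all hypothetical.

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation of sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

def effective_sample_size(x, max_lag=50):
    """Textbook estimate N_eff = N / (1 + 2 * sum_k rho_k).

    Assumption: the sum is truncated at the first non-positive
    autocorrelation, or at max_lag, whichever comes first.
    """
    n = len(x)
    total = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorr(x, lag)
        if rho <= 0.0:
            break
        total += rho
    return n / (1.0 + 2.0 * total)

def ar1(n, phi, seed=0):
    """Synthetic AR(1) stream x[t] = phi * x[t-1] + noise (toy stand-in for features)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

correlated = ar1(5000, phi=0.9)   # strong inter-frame correlation
independent = ar1(5000, phi=0.0)  # no inter-frame correlation

n_eff_corr = effective_sample_size(correlated)
n_eff_ind = effective_sample_size(independent)

# Scaling factor in the sense of a ratio N_eff / N (nominal size N = 5000).
scale_corr = n_eff_corr / len(correlated)
scale_ind = n_eff_ind / len(independent)
print(f"correlated:  N_eff ~ {n_eff_corr:.0f}, scale ~ {scale_corr:.3f}")
print(f"independent: N_eff ~ {n_eff_ind:.0f}, scale ~ {scale_ind:.3f}")
```

For the correlated stream the estimate falls far below the nominal 5000 frames, while for the independent stream it stays close to it, which is the gap between nominal and effective sample size that the abstract describes.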
Keywords :
speaker recognition; statistical analysis; hierarchical clustering; interframe correlations compensation; nominal sample size; single Gaussians; speaker diarization; statistical information; Bayes methods; Clustering algorithms; Correlation; Hidden Markov models; Speech; Uncertainty; Bayesian methods; Clustering methods
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639168