DocumentCode :
3528315
Title :
Generative model-based speaker clustering via mixture of von Mises-Fisher distributions
Author :
Tang, Hao ; Chu, Stephen M. ; Huang, Thomas S.
Author_Institution :
Dept. of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL
fYear :
2009
fDate :
19-24 April 2009
Firstpage :
4101
Lastpage :
4104
Abstract :
This paper proposes a generative model-based speaker clustering algorithm in the maximum a posteriori (MAP) adapted Gaussian mixture model (GMM) mean supervector space. The algorithm can be viewed as an extension of the standard expectation maximization (EM) algorithm for fitting a mixture model to the data, which iterates between two steps, a sample re-assignment step (E-step) and a model re-estimation step (M-step), until it converges. The directional scattering patterns of GMM mean supervectors suggest that we employ a mixture of von Mises-Fisher distributions in the model re-estimation step. In the sample re-assignment step, four sample-to-mixture assignment strategies are used: soft, hard, stochastic, and deterministic annealing assignment. Our experiments on the GALE Mandarin dataset show that using a mixture of von Mises-Fisher distributions as the underlying model yields significantly higher speaker clustering accuracy than using a mixture of Gaussian distributions. It is further shown that deterministic annealing assignment outperforms soft assignment, that soft assignment is comparable to stochastic assignment, and that both soft and stochastic assignments outperform hard assignment.
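The E-step/M-step loop and the four assignment strategies named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes equal mixture weights and a single fixed concentration `kappa` shared by all components, so the von Mises-Fisher normalizer cancels and the posterior reduces to a softmax over scaled cosine similarity. The function names, the toy cooling schedule (multiply the temperature by 0.7 until it reaches 1), and all default values are hypothetical choices made for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def posteriors(x, means, kappa, temperature=1.0):
    """P(cluster | x), assuming equal weights and one shared kappa.

    Under those assumptions only kappa * cosine similarity matters.
    A temperature above 1 flattens the posterior.
    """
    scores = [kappa * dot(x, m) / temperature for m in means]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def assign(x, means, kappa, strategy, temperature, rng):
    """E-step for one sample; returns one weight per cluster."""
    if strategy == "annealing":
        return posteriors(x, means, kappa, temperature)
    post = posteriors(x, means, kappa)
    if strategy == "soft":
        return post
    if strategy == "hard":  # pick the most probable cluster
        winner = max(range(len(post)), key=post.__getitem__)
    elif strategy == "stochastic":  # sample a cluster from the posterior
        winner = rng.choices(range(len(post)), weights=post)[0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [1.0 if k == winner else 0.0 for k in range(len(post))]


def update_means(data, weights, old_means):
    """M-step for the mean directions: normalized weighted sum of samples."""
    new_means = []
    for k, old in enumerate(old_means):
        total = [sum(w[k] * x[i] for x, w in zip(data, weights))
                 for i in range(len(old))]
        # Keep the previous direction if the cluster received no weight.
        new_means.append(normalize(total) if any(total) else old)
    return new_means


def cluster(data, init_means, kappa=20.0, strategy="soft",
            iterations=20, start_temperature=8.0, seed=0):
    """Alternate E- and M-steps; returns (labels, mean directions)."""
    rng = random.Random(seed)
    data = [normalize(x) for x in data]
    means = [normalize(m) for m in init_means]
    temperature = start_temperature
    for _ in range(iterations):
        weights = [assign(x, means, kappa, strategy, temperature, rng)
                   for x in data]
        means = update_means(data, weights, means)
        temperature = max(1.0, temperature * 0.7)  # assumed cooling schedule
    labels = [max(range(len(means)), key=lambda k: dot(x, means[k]))
              for x in data]
    return labels, means
```

A full mixture of von Mises-Fisher distributions would also re-estimate the mixture weights and a concentration parameter per component in the M-step, which requires evaluating or approximating modified Bessel functions; the sketch omits this to stay dependency-free.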
Keywords :
Gaussian distribution; natural language processing; speaker recognition; E-step; GALE Mandarin dataset; Gaussian distributions; Gaussian mixture model; M-step; generative model-based speaker clustering; mean supervector space; model reestimation step; reassignment step; von Mises-Fisher distributions; Acoustic scattering; Annealing; Automatic speech recognition; Cepstral analysis; Clustering algorithms; Loudspeakers; Speech recognition; Stochastic processes; EM algorithm; GMM mean supervectors; Model-based clustering; mixture of von Mises-Fisher distributions
fLanguage :
English
Publisher :
IEEE
Conference_Title :
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009)
Conference_Location :
Taipei
ISSN :
1520-6149
Print_ISBN :
978-1-4244-2353-8
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2009.4960530
Filename :
4960530