Title :
Variational Bayesian PLDA for speaker diarization in the MGB challenge
Author :
Jesús Villalba;Alfonso Ortega;Antonio Miguel;Eduardo Lleida
Author_Institution :
ViVoLab, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain
Abstract :
This paper describes the ViVoLab speaker diarization system for the Multi-Genre Broadcast (MGB) Challenge at ASRU 2015. The challenge data consisted of BBC TV programmes of different genres. Diarization followed a longitudinal setup, i.e., the speakers of the current episode had to be linked to the speakers in previous episodes of the same show. We propose a system based on the i-vector paradigm. After an initial segmentation step, we compute one i-vector per speech segment. Then, a generative model based on Bayesian PLDA clusters the speakers. In this model, the speaker labels are latent variables that we optimize by variational Bayes iterations. The number of speakers in each episode is chosen by maximizing the variational lower bound. The system includes several phases of segment merging and re-clustering. We re-compute the i-vectors after each merging step, which reduces the i-vector uncertainty. This approach attained a diarization error rate (DER) of around 30% on the development set.
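For readers unfamiliar with the metric, the DER quoted in the abstract is conventionally the fraction of reference speech time that is attributed incorrectly. The following is a minimal sketch, assuming the standard definition (missed speech plus false alarm plus speaker confusion, divided by total reference speech time). The function name and the example durations are hypothetical and are not taken from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction, given durations in seconds.

    Assumes the conventional definition:
    (missed speech + false alarm + speaker confusion) / total reference speech.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations (seconds), chosen only to illustrate the arithmetic;
# they are not results reported by the authors.
der = diarization_error_rate(
    missed=50.0, false_alarm=25.0, confusion=125.0, total_speech=1000.0
)
print(f"DER = {der:.1%}")
```

Because false alarms count as errors while the denominator includes only reference speech, a DER computed this way can exceed 100% on heavily over-segmented output.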
Keywords :
"Bayes methods","Computational modeling","Speech","Mathematical model","TV","Merging","Annealing"
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
DOI :
10.1109/ASRU.2015.7404860