Regional Information Center for Science and Technology - Variational Bayesian PLDA for speaker diarization in the MGB challenge

DocumentCode :

3744910

Title :

Variational Bayesian PLDA for speaker diarization in the MGB challenge

Author :

Jesús Villalba;Alfonso Ortega;Antonio Miguel;Eduardo Lleida

Author_Institution :

ViVoLab, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain

fYear :

2015

Firstpage :

667

Lastpage :

674

Abstract :

This paper describes the ViVoLab speaker diarization system for the Multi-Genre Broadcast (MGB) Challenge at ASRU 2015. The challenge data consisted of BBC TV programmes of different genres. Diarization followed a longitudinal setup, i.e., the speakers of the current episode had to be linked to the speakers in previous episodes of the same show. We propose a system based on the i-vector paradigm. After an initial segmentation step, we compute one i-vector per speech segment. Then, a generative model based on Bayesian PLDA clusters the speakers. In this model, the speaker labels are latent variables that we optimize by variational Bayes iterations. The number of speakers in each episode is chosen by maximizing the variational lower bound. The system includes several phases of segment merging and re-clustering. We re-compute the i-vectors after each merging step, which reduces the i-vector uncertainty. This approach attained a diarization error rate (DER) of around 30% on the development set.

Keywords :

"Bayes methods","Computational modeling","Speech","Mathematical model","TV","Merging","Annealing"

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on

Type :

conf

DOI :

10.1109/ASRU.2015.7404860

Filename :

7404860

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3744910