Regional Information Center for Science and Technology - A multimodal approach to initialisation for top-down speaker diarization of television shows

DocumentCode :

705196

Title :

A multimodal approach to initialisation for top-down speaker diarization of television shows

Author :

Bozonnet, Simon ; Vallet, Felicien ; Evans, Nicholas ; Essid, Slim ; Richard, Gael ; Carrive, Jean

Author_Institution :

EURECOM, Sophia Antipolis, France

fYear :

2010

fDate :

23-27 Aug. 2010

Firstpage :

581

Lastpage :

585

Abstract :

This paper presents a new multimodal approach to speaker diarization of TV show data. We hypothesize that the intraspeaker variation in visual information might be less than that in the corresponding acoustic information and therefore might be better suited to the task of speaker model initialisation. This is an acknowledged weakness of the computationally efficient top-down approach to speaker diarization that is used here. Experimental results show that a recently proposed approach to purification and the new multimodal approach to initialisation together deliver 22% and 17% relative improvements in diarization performance over the baseline system on independent development and evaluation datasets respectively.

Keywords :

speaker recognition; acoustic information; multimodal approach; speaker model initialisation; television shows; top-down speaker diarization; Adaptation models; Density estimation robust algorithm; Hidden Markov models; NIST; Speech; TV; Visualization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2010 18th European Signal Processing Conference

Conference_Location :

Aalborg

ISSN :

2219-5491

Type :

conf

Filename :

7096469

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=705196