Regional Information Center for Science and Technology - Improving speaker diarization using social role information

DocumentCode :

177427

Title :

Improving speaker diarization using social role information

Author :

Sapru, Ashtosh ; Yella, Sree Harsha ; Bourlard, Herve

Author_Institution :

Idiap Res. Inst., Martigny, Switzerland

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

101

Lastpage :

105

Abstract :

Speaker diarization systems for meetings commonly model acoustic and spatial information, ignoring that meetings are instances of human interactions. Recent studies have shown that social roles influence the interaction patterns of speakers. This paper proposes a novel method to integrate social roles information in the speaker diarization framework. First, we modify the minimum duration constraint in baseline diarization system by using role information to model the expected duration of speaker's turn. Furthermore, we also propose a social role n-gram model as prior information on speaker interaction patterns. The proposed method is integrated in the state-of-the-art diarization system to reduce the speaker error. Experiments are performed on AMI corpus which is annotated in terms of social roles. The proposed method reduces the speaker error by 16% relative to baseline HMM-GMM system. Furthermore, the paper also investigates the performance of the proposed method on other meeting scenarios like those from NIST Rich Transcription campaigns. Experiments on Rich Transcription meetings reveal that speaker error can be reduced by 13% relative to the baseline system, thus demonstrating the potential of the proposed method.

Keywords :

Gaussian processes; hidden Markov models; mixture models; speech processing; AMI corpus; Gaussian mixture modeling; NIST Rich Transcription campaigns; Rich Transcription meetings; acoustic information; baseline HMM-GMM system; baseline diarization system; hidden Markov model; minimum duration constraint; social role information; social role n-gram model; spatial information; speaker diarization; speaker interaction patterns; speaker turn expected duration; Acoustics; Feature extraction; Hidden Markov models; Histograms; NIST; Speech; Speech processing; HMM-GMM; Social Roles; Speaker diarization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6853566

Filename :

6853566

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=177427