DocumentCode
2173446
Title
Speaker diarization of meetings based on speaker role n-gram models
Author
Valente, Fabio ; Vijayasenan, Deepu ; Motlicek, Petr
Author_Institution
Idiap Res. Inst., Martigny, Switzerland
fYear
2011
fDate
22-27 May 2011
Firstpage
4416
Lastpage
4419
Abstract
Speaker diarization of meeting recordings is generally based on acoustic information, ignoring that meetings are instances of conversations. Several recent works have shown that the sequence of speakers in a conversation and their roles are related and statistically predictable. This paper proposes using a speaker-role n-gram model to capture the probability of conversation patterns and investigates its use as prior information in a state-of-the-art diarization system. Experiments are run on the AMI corpus, which is annotated in terms of roles. The proposed technique reduces the diarization speaker error by 19% when the roles are known and by 17% when they are estimated. Furthermore, the paper investigates how the n-gram models generalize to different settings, such as those of the Rich Transcription campaigns. Experiments on 17 meetings reveal that the speaker error can be reduced by 12% in this case as well, indicating that the n-gram models can generalize across corpora.
Keywords
speaker recognition; rich transcription campaigns; speaker diarization; speaker error; speaker role n-gram models; state-of-the-art diarization system; Data models; Decoding; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Viterbi algorithm; Speaker Roles; Speaker diarization; Viterbi decoding; meeting recordings; multi-party conversations
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location
Prague
ISSN
1520-6149
Print_ISBN
978-1-4577-0538-0
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2011.5947333
Filename
5947333