Multi-stream segmentation of meetings

Author

Dielmann, Alfred ; Renals, Steve

Author_Institution

Centre for Speech Technology Research, University of Edinburgh, UK

fYear

2004

fDate

29 Sept.-1 Oct. 2004

Firstpage

167

Lastpage

170

Abstract

This paper investigates the automatic segmentation of meetings into a sequence of group actions or phases. Our work is based on a corpus of multiparty meetings collected in a meeting room instrumented with video cameras, lapel microphones and a microphone array. We extracted a set of feature streams, in this case from the audio data, based on speaker turns, prosody and a transcript of what was spoken. We related these signals to higher-level semantic categories via a multistream statistical model based on dynamic Bayesian networks (DBNs). We report a set of experiments comparing different DBN architectures and different feature streams. The resulting system has an action error rate of 9%.

Keywords

audio signal processing; belief networks; feature extraction; image segmentation; image sequences; microphone arrays; multimedia communication; statistical analysis; video cameras; video signal processing; audio extraction; dynamic Bayesian network; meeting segmentation; microphone array; multistream segmentation; video camera; Bayesian methods; Cameras; Data mining; Dictionaries; Error analysis; Instruments; Microphone arrays; Paper technology; Speech; Streaming media;

fLanguage

English

Publisher

IEEE

Conference_Titel

Multimedia Signal Processing, 2004 IEEE 6th Workshop on

Print_ISBN

0-7803-8578-0

Type

conf

DOI

10.1109/MMSP.2004.1436458

Filename

1436458