Multimodal Speaker Segmentation in Presence of Overlapped Speech Segments

Author

Viktor Rozgic;Kyu Jeong Han;Panayiotis G. Georgiou;Shrikanth Narayanan

Author_Institution

Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA

fYear

2008

Firstpage

679

Lastpage

684

Abstract

We propose a multimodal speaker segmentation algorithm with two main contributions. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for dealing with overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function in the Joint Probabilistic Data Association (JPDA) framework. Results show that, on datasets with a significant portion (27.4%) of overlapped speech, the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing, and scores as high as 94.4% on the F-measure scale.

Keywords

"Microphone arrays","Hidden Markov models","Phased arrays","Loudspeakers","Cameras","Signal processing algorithms","Speech analysis","Viterbi algorithm","Array signal processing","Decoding"

Publisher

IEEE

Conference_Titel

Multimedia, 2008. ISM 2008. Tenth IEEE International Symposium on

Type

conf

DOI

10.1109/ISM.2008.103

Filename

4741247