Title :
Multimodal Speaker Segmentation in Presence of Overlapped Speech Segments
Author :
Viktor Rozgic;Kyu Jeong Han;Panayiotis G. Georgiou;Shrikanth Narayanan
Author_Institution :
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA
Abstract :
We propose a multimodal speaker segmentation algorithm with two main contributions. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for dealing with overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function in the Joint Probabilistic Data Association (JPDA) framework. Results show that, on datasets with a significant portion (27.4%) of overlapped speech, the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing, and achieves an F-measure as high as 94.4%.
Keywords :
"Microphone arrays","Hidden Markov models","Phased arrays","Loudspeakers","Cameras","Signal processing algorithms","Speech analysis","Viterbi algorithm","Array signal processing","Decoding"
Conference_Titel :
Multimedia, 2008. ISM 2008. Tenth IEEE International Symposium on
DOI :
10.1109/ISM.2008.103