DocumentCode
3630210
Title
Multimodal Speaker Segmentation in Presence of Overlapped Speech Segments
Author
Viktor Rozgic;Kyu Jeong Han;Panayiotis G. Georgiou;Shrikanth Narayanan
Author_Institution
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA
fYear
2008
Firstpage
679
Lastpage
684
Abstract
We propose a multimodal speaker segmentation algorithm with two main contributions. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for handling overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function in the Joint Probabilistic Data Association (JPDA) framework. Results show that, on datasets with a significant portion (27.4%) of overlapped speech, the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing, and scores as high as 94.4% on the F-measure scale.
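The abstract does not specify how the hidden Markov model combines its three observation streams, so the following is only an illustrative sketch, not the authors' method. It assumes the modalities are conditionally independent given the state, so their per-frame log-likelihoods can be summed, and it decodes the state sequence with standard Viterbi decoding. The function name `viterbi_fused`, the three-state layout (speaker A, speaker B, overlap), and all numeric values are hypothetical.

```python
import numpy as np


def viterbi_fused(log_init, log_trans, modality_loglik):
    """Most likely state path for an HMM with several observation streams.

    ASSUMPTION (not from the abstract): modalities are conditionally
    independent given the state, so their log-likelihoods add.

    log_init:        shape (S,)      log initial state probabilities
    log_trans:       shape (S, S)    log_trans[i, j] = log P(j | i)
    modality_loglik: shape (M, T, S) per-modality, per-frame log-likelihoods
    """
    # Fuse modalities by summing log-likelihoods -> shape (T, S).
    log_obs = np.sum(np.asarray(modality_loglik), axis=0)
    n_frames, n_states = log_obs.shape

    delta = log_init + log_obs[0]
    backptr = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        # scores[i, j]: best score ending in state i, then moving to j.
        scores = delta[:, None] + log_trans
        backptr[t] = np.argmax(scores, axis=0)
        delta = scores[backptr[t], np.arange(n_states)] + log_obs[t]

    # Trace the best path backwards from the best final state.
    path = np.empty(n_frames, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path.tolist()


# Hypothetical toy setup: states 0 = speaker A, 1 = speaker B, 2 = overlap.
log_init = np.log(np.full(3, 1.0 / 3.0))
log_trans = np.log(np.array([[0.8, 0.1, 0.1],
                             [0.1, 0.8, 0.1],
                             [0.1, 0.1, 0.8]]))

# Three made-up modalities, four frames; each favors the same state per frame.
favored = [0, 0, 2, 2]
lik = np.full((3, 4, 3), 0.05)
for t, s in enumerate(favored):
    lik[:, t, s] = 0.9

decoded = viterbi_fused(log_init, log_trans, np.log(lik))
```

Treating overlap as its own state is one simple way to let a decoder label overlapped frames. The JPDA-based likelihood over multiple SPR-GCC-PHAT maxima described in the abstract is not modeled here.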
Keywords
"Microphone arrays","Hidden Markov models","Phased arrays","Loudspeakers","Cameras","Signal processing algorithms","Speech analysis","Viterbi algorithm","Array signal processing","Decoding"
Publisher
ieee
Conference_Titel
Multimedia, 2008. ISM 2008. Tenth IEEE International Symposium on
Type
conf
DOI
10.1109/ISM.2008.103
Filename
4741247