• DocumentCode
    3630210
  • Title
    Multimodal Speaker Segmentation in Presence of Overlapped Speech Segments
  • Author
    Viktor Rozgic; Kyu Jeong Han; Panayiotis G. Georgiou; Shrikanth Narayanan
  • Author_Institution
    Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA
  • fYear
    2008
  • Firstpage
    679
  • Lastpage
    684
  • Abstract
    We propose a multimodal speaker segmentation algorithm with two main contributions. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for dealing with overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function in the Joint Probabilistic Data Association (JPDA) framework. Results show that, on datasets with a significant portion (27.4%) of overlapped speech, the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing, and scores as high as 94.4% on the F-measure scale.
  • Keywords
    "Microphone arrays","Hidden Markov models","Phased arrays","Loudspeakers","Cameras","Signal processing algorithms","Speech analysis","Viterbi algorithm","Array signal processing","Decoding"
  • Publisher
    IEEE
  • Conference_Title
    Multimedia, 2008. ISM 2008. Tenth IEEE International Symposium on
  • Type
    conf
  • DOI
    10.1109/ISM.2008.103
  • Filename
    4741247