  • DocumentCode
    1468596
  • Title
    Multimodal Speaker Diarization
  • Author
    Noulas, Athanasios; Englebienne, Gwenn; Kröse, Ben J. A.
  • Author_Institution
    Univ. of Amsterdam, Amsterdam, Netherlands
  • Volume
    34
  • Issue
    1
  • fYear
    2012
  • Firstpage
    79
  • Lastpage
    93
  • Abstract
    We present a novel probabilistic framework that fuses information from the audio and video modalities to perform speaker diarization. The proposed framework is a Dynamic Bayesian Network (DBN) that extends a factorial Hidden Markov Model (fHMM) and models the people appearing in an audiovisual recording as multimodal entities that generate observations in the audio stream, the video stream, and the joint audiovisual space. The framework is robust to different contexts, makes no assumptions about the location of the recording equipment, and does not require labeled training data, as it learns the model parameters using the Expectation Maximization (EM) algorithm. We apply the proposed model to two meeting videos and a news broadcast video, all of which come from publicly available data sets. The speaker diarization results favor the proposed multimodal framework, which outperforms single-modality analysis and improves on state-of-the-art audio-based speaker diarization.
  • Keywords
    belief networks; expectation-maximisation algorithm; hidden Markov models; probability; speech recognition; video signal processing; audio modality; audio stream; audio-based speaker diarization; audiovisual recording; dynamic Bayesian network; expectation maximization algorithm; factorial hidden Markov model; meeting videos; multimodal speaker diarization; news broadcast video; probabilistic framework; video modality; video stream; Bayesian methods; Data models; Feature extraction; Hidden Markov models; Streaming media; Speaker diarization; audiovisual fusion; dynamic Bayesian networks
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2011.47
  • Filename
    5728824