• DocumentCode
    3745944
  • Title
    Tracking the Active Speaker Based on a Joint Audio-Visual Observation Model
  • Author
    Israel D. Gebru;Silèye Ba;Georgios Evangelidis;Radu Horaud
  • Author_Institution
    INRIA Grenoble Rhône-Alpes, Montbonnot St. Martin, France
  • fYear
    2015
  • Firstpage
    702
  • Lastpage
    708
  • Abstract
    Any multi-party conversation system benefits from speaker diarization, that is, the assignment of speech signals among the participants. We here cast the diarization problem into a tracking formulation whereby the active speaker is detected and tracked over time. A probabilistic tracker exploits the on-image (spatial) coincidence of visual and auditory observations and infers a single latent variable which represents the identity of the active speaker. Both visual and auditory observations are explained by a recently proposed weighted-data mixture model, while several options for the speaking turns dynamics are fulfilled by a multi-case transition model. The modules that translate raw audio and visual data into on-image observations are also described in detail. The performance of the proposed tracker is tested on challenging data-sets that are available from recent contributions which are used as baselines for comparison.
  • Keywords
    "Visualization","Cameras","Speech","Mathematical model","Acoustics","Human computer interaction","Natural language processing"
  • Publisher
    ieee
  • Conference_Titel
    Computer Vision Workshop (ICCVW), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/ICCVW.2015.96
  • Filename
    7406445