• DocumentCode
    1017579
  • Title
    Boosting-Based Multimodal Speaker Detection for Distributed Meeting Videos
  • Author
    Zhang, Cha ; Yin, Pei ; Rui, Yong ; Cutler, Ross ; Viola, Paul ; Sun, Xinding ; Pinto, Nelson ; Zhang, Zhengyou
  • Author_Institution
    Microsoft Res., Redmond, WA
  • Volume
    10
  • Issue
    8
  • fYear
    2008
  • Firstpage
    1541
  • Lastpage
    1552
  • Abstract
    Identifying the active speaker in a video of a distributed meeting can be very helpful for remote participants to understand the dynamics of the meeting. A straightforward application of such analysis is to stream a high-resolution video of the speaker to the remote participants. In this paper, we present the challenges we encountered while designing a speaker detector for the Microsoft RoundTable distributed meeting device, and propose a novel boosting-based multimodal speaker detection (BMSD) algorithm. Instead of separately performing sound source localization (SSL) and multiperson detection (MPD) and subsequently fusing their individual results, the proposed algorithm fuses audio and visual information at the feature level, using boosting to select features simultaneously from a combined pool of audio and visual features. The result is a very accurate speaker detector with extremely high efficiency. In experiments that include hundreds of real-world meetings, the proposed BMSD algorithm reduces the error rate of the SSL-only approach by 24.6% and that of the SSL and MPD fusion approach by 20.9%. To the best of our knowledge, this is the first real-time multimodal speaker detection algorithm to be deployed in commercial products.
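    The feature-level fusion described in the abstract (boosting choosing, round by round, from a single pool that mixes audio and visual features) can be illustrated with generic discrete AdaBoost over decision stumps. This is a minimal sketch under stated assumptions, not the BMSD implementation: the paper's actual boosting variant and features are not reproduced here, and FEATURE_NAMES, Stump, best_stump, adaboost, classify and the toy data are hypothetical illustrations.

```python
import math
from dataclasses import dataclass

# Hypothetical feature layout (not the paper's features): each candidate region
# is described by one audio-derived score and one video-derived score, placed in
# a single pool so boosting can pick from either modality in every round.
FEATURE_NAMES = ["audio_ssl_score", "visual_face_score"]


@dataclass
class Stump:
    feature: int      # index into the combined audio+visual feature pool
    threshold: float
    polarity: int     # +1: "speaker" when value > threshold; -1: the reverse
    alpha: float      # vote weight assigned by AdaBoost

    def predict(self, x):
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def best_stump(X, y, w):
    """Search every feature, threshold and polarity for the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # One threshold below all values, then midpoints between neighbours.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for p in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (p if row[j] > t else -p) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, p)
    return best


def adaboost(X, y, rounds):
    """Generic discrete AdaBoost; each round selects one feature from the joint pool."""
    w = [1.0 / len(X)] * len(X)
    stumps = []
    for _ in range(rounds):
        err, j, t, p = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        stump = Stump(j, t, p, 0.5 * math.log((1 - err) / err))
        stumps.append(stump)
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-stump.alpha * yi * stump.predict(row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return stumps


def classify(stumps, x):
    """Sign of the weighted vote: +1 = active speaker, -1 = not speaking."""
    score = sum(s.alpha * s.predict(x) for s in stumps)
    return 1 if score > 0 else -1


# Invented toy data: (audio score, visual score) per candidate region.
X = [
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.85),  # sound and a face agree: speaker
    (0.90, 0.10),  # strong sound but no face (e.g. a wall reflection)
    (0.10, 0.90),  # face present but silent
    (0.20, 0.20),  # background
]
y = [1, 1, 1, -1, -1, -1]

stumps = adaboost(X, y, rounds=3)
for s in stumps:
    print(FEATURE_NAMES[s.feature], round(s.threshold, 3), s.polarity, round(s.alpha, 3))
```

    On this invented data neither modality alone separates speakers from non-speakers, since a loud region with no face and a silent face are both negatives. Over three rounds, the first stump thresholds the audio score, the second thresholds the visual score, and the third (threshold below every value) acts as a constant bias term; together they classify all six samples correctly, which is the benefit of selecting from the joint pool rather than fusing two finished detectors.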
  • Keywords
    speaker recognition; teleconferencing; Microsoft RoundTable; boosting-based multimodal speaker detection; distributed meeting device; distributed meeting videos; multiperson detection; sound source localization; Algorithm design and analysis; Boosting; Cameras; Detectors; Face detection; Loudspeakers; Microphone arrays; Streaming media; Sun; Videos; Audiovisual fusion; boosting; speaker detection
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2008.2007344
  • Filename
    4694847