• DocumentCode
    2590744
  • Title

    Audio-Visual Speaker Localization Using Graphical Models

  • Author

    Kushal, Akash ; Rahurkar, Mandar ; Fei-Fei, Li ; Ponce, Jean ; Huang, Thomas

  • Author_Institution
    Dept. of Comput. Sci., Illinois Univ., Urbana, IL
  • Volume
    1
  • fYear
    0
  • fDate
    0-0 0
  • Firstpage
    291
  • Lastpage
    294
  • Abstract
    In this work, we propose an approach to combine audio and video modalities for person tracking using graphical models. We demonstrate a principled and intuitive framework for combining these modalities to obtain robustness against occlusion and change in appearance. We further exploit the temporal correlations that exist for a moving object between adjacent frames to account for the cases where having both modalities might still not be enough, e.g., when the person being tracked is occluded and not speaking. Improvement in tracking results is shown at each step and compared with manually annotated ground truth.
  • Keywords
    computer vision; object detection; sensor fusion; speaker recognition; target tracking; audio-visual speaker localization; graphical models; moving object; occlusion; person tracking; temporal correlation; Cameras; Computer science; Covariance matrix; Focusing; Fusion power generation; Gaussian noise; Graphical models; Microphones; Robustness; Sampling methods;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2006. ICPR 2006. 18th International Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-2521-0
  • Type

    conf

  • DOI
    10.1109/ICPR.2006.284
  • Filename
    1698890