• DocumentCode
    2960927
  • Title

    Hierarchical audio-visual cue integration framework for activity analysis in intelligent meeting rooms

  • Author

    Shivappa, Shankar T ; Trivedi, Mohan Manubhai ; Rao, Bhaskar

  • Author_Institution
    Univ. of California, La Jolla, CA, USA
  • fYear
    2009
  • fDate
    20-25 June 2009
  • Firstpage
    107
  • Lastpage
    114
  • Abstract
    Scene understanding in the context of a smart meeting room involves the extraction of various kinds of cues at different levels of semantic abstraction. Human activity in such a scene is usually monitored using arrays of audio and visual sensors, and the relevant analysis tasks include person localization and tracking, speaker ID, focus-of-attention detection, speech recognition, and affective state recognition. In this paper, we demonstrate a system that extracts such information by combining the outputs of the various tasks so that they support one another. We exploit the fact that the output of one kind of human activity analysis task contains valuable information for another such task; interconnecting these blocks yields a robust system. We demonstrate this in a smart meeting room equipped with 3 cameras and 16 microphones. The system performs person tracking, head pose estimation, beamforming, speaker ID, and speech recognition using audio and visual cues. The novelty lies in putting the tasks together such that they can provide relevant information to one another. We evaluate the performance of our system and present results for tasks such as keyword spotting and tracking re-identification on real-world meeting scenes collected in our audio-visual testbed.
  • Keywords
    business data processing; image sensors; speech recognition; activity analysis; affective state recognition; attention detection; audio sensor arrays; hierarchical audio-visual cue integration; intelligent meeting rooms; keyword spotting; semantic abstraction; smart meeting room; speaker ID; speech recognition; visual sensor arrays; Data mining; Humans; Information analysis; Intelligent sensors; Layout; Monitoring; Robustness; Sensor arrays; Smart cameras; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition Workshops, 2009. CVPR Workshops 2009. IEEE Computer Society Conference on
  • Conference_Location
    Miami, FL
  • ISSN
    2160-7508
  • Print_ISBN
    978-1-4244-3994-2
  • Type

    conf

  • DOI
    10.1109/CVPRW.2009.5204224
  • Filename
    5204224