DocumentCode :
2960927
Title :
Hierarchical audio-visual cue integration framework for activity analysis in intelligent meeting rooms
Author :
Shivappa, Shankar T ; Trivedi, Mohan Manubhai ; Rao, Bhaskar
Author_Institution :
Univ. of California, La Jolla, CA, USA
fYear :
2009
fDate :
20-25 June 2009
Firstpage :
107
Lastpage :
114
Abstract :
Scene understanding in a smart meeting room involves extracting various kinds of cues at different levels of semantic abstraction. Human activity in such a scene is usually monitored using arrays of audio and visual sensors, and the relevant tasks include person localization and tracking, speaker ID, focus-of-attention detection, speech recognition and affective state recognition. In this paper we demonstrate a system that extracts such information by synergistically combining the outputs of these tasks so that they support one another. We exploit the fact that the output of one human activity analysis task contains valuable information for another such task; interconnecting these blocks results in a robust system. We demonstrate this in a smart meeting room equipped with 3 cameras and 16 microphones. The system performs person tracking, head pose estimation, beamforming, speaker ID and speech recognition using audio and visual cues. The novelty lies in combining the tasks so that each can provide relevant information to the others. We evaluate the performance of our system and present results for tasks such as keyword spotting and tracking re-identification on real-world meeting scenes collected in our audio-visual testbed.
Keywords :
business data processing; image sensors; speech recognition; activity analysis; affective state recognition; attention detection; audio sensor arrays; hierarchical audio-visual cue integration; intelligent meeting rooms; keyword spotting; semantic abstraction; smart meeting room; speaker ID; speech recognition; visual sensor arrays; Data mining; Humans; Information analysis; Intelligent sensors; Layout; Monitoring; Robustness; Sensor arrays; Smart cameras; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Vision and Pattern Recognition Workshops, 2009. CVPR Workshops 2009. IEEE Computer Society Conference on
Conference_Location :
Miami, FL
ISSN :
2160-7508
Print_ISBN :
978-1-4244-3994-2
Type :
conf
DOI :
10.1109/CVPRW.2009.5204224
Filename :
5204224