DocumentCode :
1299800
Title :
Audiovisual Information Fusion in Human–Computer Interfaces and Intelligent Environments: A Survey
Author :
Shivappa, Shankar T. ; Trivedi, Mohan Manubhai ; Rao, Bhaskar D.
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of California at San Diego, La Jolla, CA, USA
Volume :
98
Issue :
10
fYear :
2010
Firstpage :
1692
Lastpage :
1715
Abstract :
Microphones and cameras have been extensively used to observe and detect human activity and to facilitate natural modes of interaction between humans and intelligent systems. The human brain processes the audio and video modalities, extracting complementary and robust information from them. Intelligent systems with audiovisual sensors should be capable of achieving similar goals. The audiovisual information fusion strategy is a key component in designing such systems. In this paper, we exclusively survey the fusion techniques used in various audiovisual information fusion tasks. The fusion strategy used tends to depend mainly on the model, probabilistic or otherwise, used in the particular task to process sensory information to obtain higher level semantic information. The models themselves are task oriented. In this paper, we describe the fusion strategies and the corresponding models used in audiovisual tasks such as speech recognition, tracking, biometrics, affective state recognition, and meeting scene analysis. We also review the challenges, the existing solutions, and the unresolved or partially resolved issues in these fields. Specifically, we discuss established and upcoming work in hierarchical fusion strategies and cross-modal learning techniques, identifying these as critical areas of research in the future development of intelligent systems.
Keywords :
audio-visual systems; human computer interaction; image sensors; microphones; audiovisual information fusion; audiovisual sensor; cameras; human computer interface; intelligent system; microphone; semantic information; Humans; Intelligent systems; Sensors; Speech recognition; Audiovisual fusion; dynamic Bayesian networks (DBNs); hidden Markov models; human activity analysis; human activity modeling; information fusion; machine learning; multimodal systems
fLanguage :
English
Journal_Title :
Proceedings of the IEEE
Publisher :
IEEE
ISSN :
0018-9219
Type :
jour
DOI :
10.1109/JPROC.2010.2057231
Filename :
5551170