DocumentCode
138077
Title
Audio-visual classification and detection of human manipulation actions
Author
Pieropan, Alessandro ; Salvi, Giampiero ; Pauwels, Karl ; Kjellstrom, Hedvig
Author_Institution
CVAP/CAS, KTH, Stockholm, Sweden
fYear
2014
fDate
14-18 Sept. 2014
Firstpage
3045
Lastpage
3052
Abstract
Humans are able to merge information from multiple perceptual modalities and formulate a coherent representation of the world. Our thesis is that robots need to do the same in order to operate robustly and autonomously in an unstructured environment. It has also been shown in several fields that multiple sources of information can complement each other, overcoming the limitations of a single perceptual modality. Hence, in this paper we introduce a data set of actions that includes both visual data (RGB-D video and 6DOF object pose estimation) and acoustic data. We also propose a method for recognizing and segmenting actions from continuous audio-visual data. The proposed method is employed for extensive evaluation of the descriptive power of the two modalities, and we discuss how they can be used jointly to infer a coherent interpretation of the recorded action.
Keywords
image classification; image segmentation; motion estimation; object detection; pose estimation; robot vision; video signal processing; 6DOF object pose estimation; RGB-D video; acoustic data; action recognition; action segmentation; audio-visual human manipulation actions classification; human manipulation action detection; perceptional modalities; unstructured environment; visual data; Dairy products; Data models; Feature extraction; Hidden Markov models; Robots; Three-dimensional displays; Visualization
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Robots and Systems (IROS 2014), 2014 IEEE/RSJ International Conference on
Conference_Location
Chicago, IL
Type
conf
DOI
10.1109/IROS.2014.6942983
Filename
6942983