Title :
Audio-visual classification and detection of human manipulation actions
Author :
Pieropan, Alessandro ; Salvi, Giampiero ; Pauwels, Karl ; Kjellström, Hedvig
Author_Institution :
CVAP/CAS, KTH, Stockholm, Sweden
Abstract :
Humans are able to merge information from multiple perceptual modalities and formulate a coherent representation of the world. Our thesis is that robots need to do the same in order to operate robustly and autonomously in unstructured environments. It has also been shown in several fields that multiple sources of information can complement each other, overcoming the limitations of a single perceptual modality. Hence, in this paper we introduce a data set of actions that includes both visual data (RGB-D video and 6DOF object pose estimation) and acoustic data. We also propose a method for recognizing and segmenting actions from continuous audio-visual data. We use the proposed method to extensively evaluate the descriptive power of the two modalities, and we discuss how they can be used jointly to infer a coherent interpretation of the recorded action.
Keywords :
image classification; image segmentation; motion estimation; object detection; pose estimation; robot vision; video signal processing; 6DOF object pose estimation; RGB-D video; acoustic data; action recognition; action segmentation; audio-visual human manipulation actions classification; human manipulation action detection; perceptional modalities; unstructured environment; visual data; Dairy products; Data models; Feature extraction; Hidden Markov models; Robots; Three-dimensional displays; Visualization;
Conference_Title :
Intelligent Robots and Systems (IROS 2014), 2014 IEEE/RSJ International Conference on
Conference_Location :
Chicago, IL
DOI :
10.1109/IROS.2014.6942983