DocumentCode :
639552
Title :
3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
Author :
Chakraborty, Imon ; Cheng, Hui ; Javed, Omar
Author_Institution :
SRI Int., Princeton, NJ, USA
fYear :
2013
fDate :
23-28 June 2013
Firstpage :
3406
Lastpage :
3413
Abstract :
We present a unified framework for detecting and classifying people interactions in unconstrained user-generated images. Unlike previous approaches that directly map people/face locations in 2D image space into features for classification, we first estimate camera viewpoint and people positions in 3D space and then extract spatial configuration features from explicit 3D people positions. This approach has several advantages. First, it can accurately estimate relative distances and orientations between people in 3D. Second, it encodes spatial arrangements of people into a richer set of shape descriptors than afforded in 2D. Our 3D shape descriptors are invariant to camera pose variations often seen in web images and videos. The proposed approach also estimates camera pose and uses it to capture the intent of the photo. To achieve accurate 3D people layout estimation, we develop an algorithm that robustly fuses semantic constraints about human interpositions into a linear camera model. This enables our model to handle large variations in people's sizes, heights (e.g., due to age), and poses. An accurate 3D layout also allows us to construct features informed by proxemics that improve our semantic classification. To characterize the human interaction space, we introduce visual proxemes, a set of prototypical patterns that represent commonly occurring social interactions in events. We train a discriminative classifier that classifies 3D arrangements of people into visual proxemes and quantitatively evaluate its performance on a large, challenging dataset.
Keywords :
feature extraction; human computer interaction; image classification; image sensors; object recognition; pose estimation; shape recognition; 3D people layout estimation; 3D shape descriptors; 3D space; 3D visual proxemics; camera pose variations; camera viewpoint estimation; discriminative classifier; human interaction recognition; people interaction classification; people interaction detection; people position estimation; relative distance estimation; relative orientation estimation; semantic classification; social interactions; spatial configuration feature extraction; unconstrained user generated images; Cameras; Face; Robustness; Shape; Three-dimensional displays; Videos; Visualization; 3D people layout; RANSAC; Visual Proxemics; semantic constraints;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on
Conference_Location :
Portland, OR
ISSN :
1063-6919
Type :
conf
DOI :
10.1109/CVPR.2013.437
Filename :
6619281