Regional Information Center for Science and Technology - From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding

DocumentCode :

3426180

Title :

From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding

Author :

Weiyu Zhang; Menglong Zhu; Konstantinos G. Derpanis

Author_Institution :

GRASP Lab., Univ. of Pennsylvania, Philadelphia, PA, USA

fYear :

2013

fDate :

1-8 Dec. 2013

Firstpage :

2248

Lastpage :

2255

Abstract :

This paper presents a novel approach for analyzing human actions in non-scripted, unconstrained video settings based on volumetric (x-y-t) patch classifiers, termed actemes. Unlike previous action-related work, the discovery of patch classifiers is posed as a strongly-supervised process. Specifically, key-point labels (e.g., position) across space-time are used in a data-driven training process to discover patches that are highly clustered in the space-time key-point configuration space. To support this process, a new human action dataset consisting of challenging consumer videos is introduced; notably, the action label, the 2D positions of a set of key points, and their visibilities are provided for each video frame. Given a novel input video, each acteme is applied in a sliding-volume scheme to yield a set of sparse, non-overlapping detections. These detections provide the intermediate substrate for segmenting out the action. For action classification, the proposed representation shows significant improvement over state-of-the-art low-level features, while also providing spatiotemporal localization as additional output, which sheds further light on detailed action understanding.
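The abstract states only that each acteme is applied in a sliding-volume scheme and that the output is a sparse set of non-overlapping detections; it does not say how overlapping responses are resolved. The Python sketch below illustrates one generic way to obtain such output, assuming that every x-y-t window is scored by a stand-in classifier and that greedy non-maximum suppression in space-time is used to keep the strongest disjoint windows. The names `Volume`, `overlap`, `sliding_volumes`, `detect`, and `toy_score`, together with the window size, stride, and threshold, are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple


@dataclass(frozen=True)
class Volume:
    """Axis-aligned x-y-t box covering [x0, x1) x [y0, y1) x [t0, t1)."""
    x0: int
    y0: int
    t0: int
    x1: int
    y1: int
    t1: int

    def size(self) -> int:
        return (self.x1 - self.x0) * (self.y1 - self.y0) * (self.t1 - self.t0)


def overlap(a: Volume, b: Volume) -> float:
    """Space-time intersection-over-union of two volumes (0.0 if disjoint)."""
    dx = min(a.x1, b.x1) - max(a.x0, b.x0)
    dy = min(a.y1, b.y1) - max(a.y0, b.y0)
    dt = min(a.t1, b.t1) - max(a.t0, b.t0)
    if dx <= 0 or dy <= 0 or dt <= 0:
        return 0.0
    inter = dx * dy * dt
    return inter / (a.size() + b.size() - inter)


def sliding_volumes(shape: Tuple[int, int, int], win: Tuple[int, int, int],
                    stride: Tuple[int, int, int]) -> Iterator[Volume]:
    """Enumerate every window of size win=(w, h, l) over a video of shape
    (width, height, frames), stepping by stride=(sx, sy, st)."""
    (W, H, T), (w, h, l), (sx, sy, st) = shape, win, stride
    for t in range(0, T - l + 1, st):
        for y in range(0, H - h + 1, sy):
            for x in range(0, W - w + 1, sx):
                yield Volume(x, y, t, x + w, y + h, t + l)


def detect(score_fn: Callable[[Volume], float],
           shape: Tuple[int, int, int], win: Tuple[int, int, int],
           stride: Tuple[int, int, int], threshold: float = 0.0,
           max_iou: float = 0.0) -> List[Tuple[float, Volume]]:
    """Score all windows with one (hypothetical) acteme classifier, then keep a
    sparse set via greedy non-maximum suppression. With max_iou=0.0, no two
    kept volumes share any voxel."""
    scored = [(score_fn(v), v) for v in sliding_volumes(shape, win, stride)]
    scored = [c for c in scored if c[0] >= threshold]
    scored.sort(key=lambda c: c[0], reverse=True)  # strongest responses first
    kept: List[Tuple[float, Volume]] = []
    for score, vol in scored:
        # Keep a window only if it does not overlap any stronger kept window.
        if all(overlap(vol, k) <= max_iou for _, k in kept):
            kept.append((score, vol))
    return kept


# Toy stand-in for an acteme classifier (hypothetical): it responds most
# strongly to the window centred at (x, y, t) = (12, 12, 6).
def toy_score(v: Volume) -> float:
    cx, cy, ct = (v.x0 + v.x1) / 2, (v.y0 + v.y1) / 2, (v.t0 + v.t1) / 2
    return 1.0 / (1.0 + abs(cx - 12) + abs(cy - 12) + abs(ct - 6))


detections = detect(toy_score, shape=(32, 32, 16), win=(8, 8, 4), stride=(4, 4, 2))
print(detections[0])
```

Greedy suppression is used here only because it is a simple, common way to turn dense window scores into sparse, mutually disjoint detections; the paper's own procedure may differ.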

Keywords :

gesture recognition; image classification; image representation; video signal processing; actemes; action classification; action label; action understanding; action-related work; consumer videos; data-driven training process; human action dataset; human actions analyzing; patch classifiers discovery; sliding volume scheme; space time; spacetime key-point configuration space; sparse nonoverlapping detections; spatiotemporal localization; supervised representation; unconstrained video settings; Cameras; Context; Semantics; Spatiotemporal phenomena; Training; Trajectory; Visualization; action detection

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2013 IEEE International Conference on Computer Vision (ICCV)

Conference_Location :

Sydney, NSW

ISSN :

1550-5499

Type :

conf

DOI :

10.1109/ICCV.2013.280

Filename :

6751390

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3426180