Active Learning of an Action Detector from Untrimmed Videos

Author

Bandla, Sunil ; Grauman, Kristen

Author_Institution

Univ. of Texas at Austin, Austin, TX, USA

fYear

2013

fDate

1-8 Dec. 2013

Firstpage

1833

Lastpage

1840

Abstract

Collecting and annotating videos of realistic human actions is tedious, yet critical for training action recognition systems. We propose a method to actively request the most useful video annotations among a large set of unlabeled videos. Predicting the utility of annotating unlabeled video is not trivial, since any given clip may contain multiple actions of interest, and it need not be trimmed to temporal regions of interest. To deal with this problem, we propose a detection-based active learner to train action category models. We develop a voting-based framework to localize likely intervals of interest in an unlabeled clip, and use them to estimate the total reduction in uncertainty that annotating that clip would yield. On three datasets, we show our approach can learn accurate action detectors more efficiently than alternative active learning strategies that fail to accommodate the "untrimmed" nature of real video data.
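The selection idea in the abstract (score each unlabeled clip by how much uncertainty its likely intervals carry, then request annotation for the best one) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the voting procedure or the exact uncertainty estimate, so the code assumes each clip already comes with detector probabilities for its candidate intervals, and it uses the summed binary entropy of those intervals as a simple stand-in for the "total reduction in uncertainty". The names `binary_entropy`, `clip_utility`, `select_clip`, and the toy pool are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a yes/no prediction with positive probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain prediction carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def clip_utility(interval_probs):
    """Assumed proxy: total uncertainty over a clip's candidate intervals.

    interval_probs holds one detector probability per candidate interval
    (hypothetical input; the paper localizes intervals by voting).
    """
    return sum(binary_entropy(p) for p in interval_probs)


def select_clip(pool):
    """Return the name of the clip whose annotation looks most useful."""
    return max(pool, key=lambda name: clip_utility(pool[name]))


# Toy pool of untrimmed clips: each may contain several candidate intervals.
pool = {
    "clip_a": [0.95, 0.02],        # two confident intervals
    "clip_b": [0.55, 0.40, 0.60],  # three uncertain intervals
    "clip_c": [0.50],              # one maximally uncertain interval
}
chosen = select_clip(pool)
print(chosen)
```

Scoring a whole clip by its single most uncertain interval would prefer neither clip over the other for the right reason; summing over intervals reflects the abstract's point that an untrimmed clip may contain multiple actions of interest.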

Keywords

gesture recognition; learning (artificial intelligence); video signal processing; action category model; action detector; action recognition systems; active learning strategy; detection-based active learner; human actions; interval-of-interest localization; temporal regions; unlabeled clip; unlabeled video annotation; untrimmed videos; video collection; voting-based framework; Detectors; Entropy; Three-dimensional displays; Training; Uncertainty; Videos; Visualization; action detection; action localization; active learning; entropy; Hollywood; Hough; human annotation; VATIC; voting-based

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Vision (ICCV), 2013 IEEE International Conference on

Conference_Location

Sydney, NSW

ISSN

1550-5499

Type

conf

DOI

10.1109/ICCV.2013.230

Filename

6751338