Recognizing manipulation actions in arts and crafts shows using domain-specific visual and textual cues

Author

Sapp, Brian ; Chaudhry, Rizwan ; Yu, Xiaodong ; Singh, Gagan ; Perera, Indika ; Ferraro, F. ; Tzoukermann, E. ; Kosecka, Jana ; Neumann, Jorg

Author_Institution

Univ. of Pennsylvania, Philadelphia, PA, USA

fYear

2011

fDate

6-13 Nov. 2011

Firstpage

1554

Lastpage

1561

Abstract

We present an approach for automatic annotation of commercial videos from an arts-and-crafts domain with the aid of textual descriptions. The main focus is on recognizing both manipulation actions (e.g., cut, draw, glue) and the tools used to perform these actions (e.g., markers, brushes, glue bottles). We demonstrate how multiple visual cues, such as motion descriptors, object presence, and hand poses, can be combined with the help of contextual priors that are automatically extracted from associated transcripts or online instructions. Using these diverse features and linguistic information, we propose several increasingly complex computational models for recognizing elementary manipulation actions and composite activities, as well as their temporal order. The approach is evaluated on a novel dataset comprising 27 episodes of PBS Sprout TV, each containing on average 8 manipulation actions.

Keywords

art; feature extraction; image motion analysis; video retrieval; video signal processing; automatic annotation; commercial video; craft; domain-specific textual cue; domain-specific visual cue; hand pose; linguistic information; manipulation action; motion descriptor; multiple visual cues; object presence; Computational modeling; Educational institutions; Feature extraction; Humans; Internet; USA Councils; Videos

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on

Conference_Location

Barcelona

Print_ISBN

978-1-4673-0062-9

Type

conf

DOI

10.1109/ICCVW.2011.6130435

Filename

6130435