DocumentCode
3748656
Title
Flowing ConvNets for Human Pose Estimation in Videos
Author
Tomas Pfister;James Charles;Andrew Zisserman
Author_Institution
Dept. of Eng. Sci., Univ. of Oxford, Oxford, UK
fYear
2015
Firstpage
1913
Lastpage
1921
Abstract
The objective of this work is human pose estimation in videos, where multiple frames are available. We investigate a ConvNet architecture that is able to benefit from temporal context by combining information across the multiple frames using optical flow. To this end we propose a network architecture with the following novelties: (i) a deeper network than previously investigated for regressing heatmaps, (ii) spatial fusion layers that learn an implicit spatial model, (iii) optical flow is used to align heatmap predictions from neighbouring frames, and (iv) a final parametric pooling layer which learns to combine the aligned heatmaps into a pooled confidence map. We show that this architecture outperforms a number of others, including one that uses optical flow solely at the input layers, one that regresses joint coordinates directly, and one that predicts heatmaps without spatial fusion. The new architecture outperforms the state of the art by a large margin on three video pose estimation datasets, including the very challenging Poses in the Wild dataset, and outperforms other deep methods that don't use a graphical model on the single-image FLIC benchmark (and also [5, 35] in the high precision region).
Keywords
"Heating","Optical imaging","Videos","Training","Adaptive optics","Computer architecture"
Publisher
ieee
Conference_Titel
Computer Vision (ICCV), 2015 IEEE International Conference on
Electronic_ISBN
2380-7504
Type
conf
DOI
10.1109/ICCV.2015.222
Filename
7410579