DocumentCode
2783049
Title
3D Human Motion Analysis in Monocular Video: Techniques and Challenges
Author
Sminchisescu, Cristian
Author_Institution
TTI-C, USA
fYear
2006
fDate
Nov. 2006
Firstpage
76
Lastpage
76
Abstract
Human motion and action analysis in video is an actively growing field with a broad spectrum of applications, including video browsing and indexing, entertainment, virtual reality, human-computer interaction and surveillance. However, human motion analysis systems face important scientific and computational challenges. The proportions of the human body vary across individuals due to gender, weight, age or race. Aside from this variability, any single human body has many degrees of freedom due to articulation, and the individual limbs are deformable due to muscle and clothing. Finally, many real-world scenes involve multiple interacting humans occluded by each other or by other objects, and the scene conditions may also vary due to camera motion or lighting changes. These factors make accurate 3D human models difficult to build and difficult to reconstruct reliably from "flat" 2D images. During this talk I will discuss learning and inference algorithms for estimating 3D human motion in monocular video. While the problem has traditionally been approached using the powerful machinery of generative models, operating in an analysis-by-synthesis loop, the main emphasis of this talk will be on an emerging class of complementary discriminative temporal estimation models. These can be viewed as upside-down, bottom-up versions of the classical temporal models used with Kalman filtering or particle filtering. But instead of inverting a generative imaging model, we will learn to cooperatively predict complex, feedforward 2D-to-3D mappings, using Conditional Bayesian Mixtures of Experts. These are embedded in a probabilistic temporal framework in order to enforce dynamic constraints and allow a principled propagation of uncertainty. We call the resulting model BM3E (a Conditional Bayesian Mixture of Experts Markov Model).
During the talk, I will discuss how inference can be restricted, for efficiency, to low-dimensional non-linear state spaces, and how the framework can be extended to deal with clutter and occlusion. I will also discuss the relative advantages of generative and discriminative models […] initialize and recover from failure.
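The abstract names two ingredients, a conditional mixture of experts that maps 2D observations to 3D state and a temporal framework that propagates uncertainty, but gives no equations. The following is a minimal NumPy sketch under our own assumptions, not the author's BM3E: softmax gates with linear-Gaussian experts, and a filtering step that approximates the integral over the previous state by feeding each previous component mean, together with the current 2D descriptor, into the mixture. The names `ConditionalMixtureOfExperts` and `filter_step`, the dimensions, the point approximation and the top-k pruning are all hypothetical illustration choices, and the parameters are random rather than learned.

```python
import numpy as np


class ConditionalMixtureOfExperts:
    """Sketch of p(x | u) = sum_i g_i(u) N(x; W_i u + b_i, diag(var_i)).

    Gates g(u) are a softmax over linear scores; each expert is a linear
    regressor with diagonal Gaussian noise. Parameters would normally be
    learned; here they are simply passed in (hypothetical setup).
    """

    def __init__(self, W, b, var, V, c):
        self.W = W      # (M, dx, du) expert regression matrices
        self.b = b      # (M, dx) expert offsets
        self.var = var  # (M, dx) diagonal expert variances
        self.V = V      # (M, du) gate weights
        self.c = c      # (M,) gate biases

    def gates(self, u):
        """Softmax gate probabilities g_i(u), shape (M,)."""
        a = self.V @ u + self.c
        e = np.exp(a - a.max())  # subtract max for numerical stability
        return e / e.sum()

    def predict(self, u):
        """Mixture over x given u: weights (M,), means (M, dx), variances (M, dx)."""
        means = np.einsum("mij,j->mi", self.W, u) + self.b
        return self.gates(u), means, self.var.copy()


def filter_step(moe, prev_w, prev_mu, r_t, max_components=4):
    """One hypothetical discriminative filtering step.

    Approximates p(x_t | r_1..t) = integral of p(x_t | x_{t-1}, r_t) p(x_{t-1} | r_1..t-1)
    by plugging each previous component mean into the conditional mixture
    (previous covariances are ignored), then keeps the heaviest components.
    """
    weights, means, variances = [], [], []
    for w_prev, mu_prev in zip(prev_w, prev_mu):
        u = np.concatenate([mu_prev, r_t])  # input = previous state + current descriptor
        g, m, v = moe.predict(u)
        weights.append(w_prev * g)
        means.append(m)
        variances.append(v)
    w = np.concatenate(weights)
    mu = np.concatenate(means)
    var = np.concatenate(variances)
    keep = np.argsort(w)[::-1][:max_components]  # prune to the top components
    return w[keep] / w[keep].sum(), mu[keep], var[keep]


# Toy usage with random (untrained) parameters: 3-D state, 2-D descriptors, 2 experts.
rng = np.random.default_rng(0)
dx, dr, M = 3, 2, 2
du = dx + dr
moe = ConditionalMixtureOfExperts(
    W=0.1 * rng.normal(size=(M, dx, du)),
    b=rng.normal(size=(M, dx)),
    var=np.full((M, dx), 0.05),
    V=rng.normal(size=(M, du)),
    c=np.zeros(M),
)
w, mu = np.array([1.0]), np.zeros((1, dx))  # start from a single hypothesis at the origin
for t in range(5):
    r_t = rng.normal(size=dr)  # stand-in for per-frame 2D image descriptors
    w, mu, var = filter_step(moe, w, mu, r_t)
print(w.shape, mu.shape)
```

With two experts the number of hypotheses doubles every frame, so the sketch prunes to `max_components` to keep the cost bounded; this pruning is our simplification, and how the talk's model controls mixture growth is not stated in this record.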
Keywords
Bayesian methods; Face; Filtering; Humans; Indexing; Layout; Motion analysis; Muscles; Surveillance; Virtual reality;
fLanguage
English
Publisher
IEEE
Conference_Titel
Video and Signal Based Surveillance, 2006. AVSS '06. IEEE International Conference on
Conference_Location
Sydney, Australia
Print_ISBN
0-7695-2688-8
Type
conf
DOI
10.1109/AVSS.2006.3
Filename
4020735