3D Human Motion Analysis in Monocular Video: Techniques and Challenges

Author

Sminchisescu, Cristian

Author_Institution

TTI-C, USA

fYear

2006

fDate

Nov. 2006

Firstpage

76

Lastpage

76

Abstract

Human motion and action analysis in video is an actively growing field with a broad spectrum of applications, including video browsing and indexing, entertainment, virtual reality, human-computer interaction and surveillance. However, human motion analysis systems face important scientific and computational challenges. The proportions of the human body vary across individuals due to gender, weight, age or race. Aside from this variability, any single human body has many degrees of freedom due to articulation, and the individual limbs are deformable due to muscle and clothing. Finally, many real-world scenes involve multiple interacting humans occluded by each other or by other objects, and the scene conditions may also vary due to camera motion or lighting changes. These factors make accurate 3D human models difficult to build and difficult to reconstruct reliably from "flat" 2D images. During this talk I will discuss learning and inference algorithms for estimating 3D human motion in monocular video. While the problem has traditionally been approached using the powerful machinery of generative models, operating in an analysis-by-synthesis loop, the main emphasis of this talk will be on an emerging class of complementary discriminative temporal estimation models. These can be viewed as upside-down, bottom-up versions of the classical temporal models used with Kalman filtering or particle filtering. But instead of inverting a generative imaging model, we will learn to cooperatively predict complex, feedforward 2D-to-3D mappings, using Conditional Bayesian Mixtures of Experts. These are embedded in a probabilistic temporal framework in order to enforce dynamic constraints and allow a principled propagation of uncertainty. We call the resulting model BM3E (a Conditional Bayesian Mixture of Experts Markov Model).
During the talk, I will discuss how inference can be restricted, for efficiency, to low-dimensional non-linear state spaces, and how the framework can be extended to deal with clutter and occlusion. I will also discuss the relative advantages of generative and discriminative models [...] initialize and recover from failure.
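
As an illustration of the feedforward 2D-to-3D prediction step mentioned in the abstract, the following is a minimal sketch of a conditional mixture of experts for a single frame. It is not the BM3E implementation: the function names, the softmax gates, the linear experts, the scalar state, and all parameter values are assumptions chosen for illustration, and the Bayesian learning and the temporal propagation are left out. Each expert proposes a state estimate from an image descriptor r, and input-dependent gates weight the proposals, so that p(x | r) = sum_i g_i(r) N(x; mean_i(r), sigma_i^2) can remain multimodal when the 2D evidence is ambiguous.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(w, v))

def moe_predict(r, gate_weights, expert_weights):
    """Return (gates, means) for descriptor r.

    gates[i] is the gate probability g_i(r); means[i] is the i-th
    linear expert's predicted (scalar) state.
    """
    gates = softmax([dot(w, r) for w in gate_weights])
    means = [dot(w, r) for w in expert_weights]
    return gates, means

def mixture_mean(gates, means):
    """Mean of the predicted mixture (can hide multimodality)."""
    return sum(g * m for g, m in zip(gates, means))

# Hypothetical toy parameters (not from the talk): two experts and a
# 2-D descriptor whose last component is a constant bias term.
GATE_W = [[4.0, 0.0], [-4.0, 0.0]]
EXPERT_W = [[1.0, 0.5], [-1.0, -0.5]]

# Informative descriptor: one expert dominates.
gates_a, means_a = moe_predict([1.0, 1.0], GATE_W, EXPERT_W)

# Ambiguous descriptor: both experts stay equally plausible.
gates_b, means_b = moe_predict([0.0, 1.0], GATE_W, EXPERT_W)
```

In the ambiguous case the two experts keep distinct hypotheses with equal weight, which is the kind of uncertainty a temporal model would then need to propagate rather than collapse to a single estimate.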

Keywords

Bayesian methods; Face; Filtering; Humans; Indexing; Layout; Motion analysis; Muscles; Surveillance; Virtual reality;

fLanguage

English

Publisher

IEEE

Conference_Titel

Video and Signal Based Surveillance, 2006. AVSS '06. IEEE International Conference on

Conference_Location

Sydney, Australia

Print_ISBN

0-7695-2688-8

Type

conf

DOI

10.1109/AVSS.2006.3

Filename

4020735