Iterative local dynamic programming

Author

Todorov, Emanuel ; Tassa, Yuval

Author_Institution

Dept. of Cognitive Sci., Univ. of California San Diego, San Diego, CA

fYear

2009

fDate

March 30 2009-April 2 2009

Firstpage

90

Lastpage

95

Abstract

We develop an iterative local dynamic programming method (iLDP) applicable to stochastic optimal control problems in continuous high-dimensional state and action spaces. Such problems are common in the control of biological movement, but cannot be handled by existing methods. iLDP can be considered a generalization of differential dynamic programming, inasmuch as: (a) we use general basis functions rather than quadratics to approximate the optimal value function; (b) we introduce a collocation method that dispenses with explicit differentiation of the cost and dynamics and ties iLDP to the unscented Kalman filter; (c) we adapt the local function approximator to the propagated state covariance, thus increasing accuracy at more likely states. Convergence is similar to that of quasi-Newton methods. We illustrate iLDP on several problems, including the "swimmer" dynamical system, which has 14 state variables and 4 control variables.
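As a rough illustration of points (b) and (c), the sketch below fits a local function approximator by collocation at unscented-transform-style sigma points, so no derivatives of the target function are needed and the sample spread follows the state covariance. This is not the authors' algorithm: the function names, the constant/linear/pure-quadratic basis, and the sigma-point scaling are all assumptions chosen for brevity.

```python
import numpy as np

def sigma_points(mean, cov):
    """Return 2n+1 points: the mean plus symmetric offsets along the
    scaled columns of a Cholesky factor of cov (assumed scaling: sqrt(n))."""
    n = mean.shape[0]
    L = np.linalg.cholesky(cov)          # cov = L @ L.T, L lower triangular
    offsets = np.sqrt(n) * L.T           # row i is column i of L, scaled
    return np.vstack([mean, mean + offsets, mean - offsets])

def features(X, mean):
    """Hypothetical basis: constant, linear, and pure-quadratic terms
    of the deviation (x - mean). Any other basis could be substituted."""
    D = X - mean
    return np.hstack([np.ones((X.shape[0], 1)), D, D ** 2])

def fit_local_value(v, mean, cov):
    """Fit basis weights by least squares on function values sampled at
    the sigma points (collocation); v is only evaluated, never differentiated."""
    X = sigma_points(mean, cov)
    Phi = features(X, mean)
    targets = np.array([v(x) for x in X])
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return w

# Usage: recover the coefficients of a function that lies in the basis span.
v = lambda x: 1 + 2 * x[0] - x[1] + 3 * x[0] ** 2 + 0.5 * x[1] ** 2
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
weights = fit_local_value(v, np.zeros(2), cov)
```

Because the sigma points scale with the covariance, a wider state distribution spreads the collocation points further apart, which is the sense in which the fit concentrates accuracy on the more likely states.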

Keywords

Kalman filters; Newton method; covariance analysis; dynamic programming; optimal control; stochastic systems; action spaces; collocation method; continuous high-dimensional state; differential dynamic programming; explicit differentiation; iterative local dynamic programming; local function approximator; optimal value function; quasi-Newton methods; state covariance; stochastic optimal control problems; swimmer dynamical system; unscented Kalman filter; Control systems; Costs; Dynamic programming; Function approximation; Iterative methods; Learning; Open loop systems; Optimal control; Stochastic processes; Stochastic resonance;

fLanguage

English

Publisher

IEEE

Conference_Titel

Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL '09. IEEE Symposium on

Conference_Location

Nashville, TN

Print_ISBN

978-1-4244-2761-1

Type

conf

DOI

10.1109/ADPRL.2009.4927530

Filename

4927530