Regional Information Center for Science and Technology - Approximate Dynamic Programming with (min; +) linear function approximation for Markov decision processes

DocumentCode :

114624

Title :

Approximate Dynamic Programming with (min; +) linear function approximation for Markov decision processes

Author :

Chandrashekar, L.; Bhatnagar, Shalabh

Author_Institution :

Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India

fYear :

2014

fDate :

15-17 Dec. 2014

Firstpage :

1588

Lastpage :

1593

Abstract :

A Markov decision process (MDP) is a useful framework for studying problems of optimal sequential decision making under uncertainty. Given an MDP, the aim is to find the optimal action-selection mechanism, i.e., the optimal policy. Typically, the optimal policy (u*) is obtained by substituting the optimal value function (J*) into the Bellman equation. Alternatively, u* can be obtained by learning the optimal state-action value function Q*, known as the Q-value function. However, it is difficult to compute the exact values of J* or Q* for MDPs with a large number of states. Approximate dynamic programming (ADP) methods address this difficulty by computing lower-dimensional approximations of J*/Q*. Most ADP methods employ linear function approximation (LFA), i.e., the approximate solution lies in a subspace spanned by a family of pre-selected basis functions. The approximation is obtained via a linear least-squares projection of higher-dimensional quantities, and the L₂ norm plays an important role in the convergence and error analysis. In this paper, we discuss ADP methods for MDPs based on LFAs in the (min, +) algebra. Here, the approximate solution is a (min, +) linear combination of a set of basis functions whose span constitutes a subsemimodule. The approximation is obtained via a projection operator onto the subsemimodule, which differs from the linear least-squares projection used in ADP methods based on conventional LFAs. MDPs are not (min, +) linear systems; nevertheless, we show that the monotonicity property of the projection operator helps establish the convergence of our ADP schemes. We also discuss future directions in ADP methods for MDPs based on (min, +) LFAs.
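The abstract contrasts the least-squares projection of conventional LFA with a projection onto the (min, +) span of a set of basis functions, and attributes convergence of the ADP scheme to the monotonicity of that projection. The Python/NumPy sketch below illustrates the idea under stated assumptions; it is not the authors' algorithm. It assumes a finite, discounted-cost MDP (discount factor `alpha`, transition array `P[a, s, s']`, cost array `g[s, a]`) and uses the standard residuation-based projection from idempotent (min, +) analysis, with coefficients r_j = max_s (J(s) - phi_j(s)); the paper's exact projection may differ in detail. All function names (`minplus_combine`, `minplus_project`, `q_values`, `bellman`, `greedy_policy`, `projected_value_iteration`) and the random demo data are hypothetical.

```python
import numpy as np


def minplus_combine(phi, r):
    """(min, +) linear combination of basis columns: v(s) = min_j (phi[s, j] + r[j])."""
    return np.min(phi + r[np.newaxis, :], axis=1)


def minplus_project(phi, J):
    """Project J onto the (min, +) span of the columns of phi (assumed residuation form).

    r[j] = max_s (J(s) - phi[s, j]) is the smallest coefficient with phi[:, j] + r[j] >= J,
    so the result is the smallest element of the span that dominates J componentwise.
    """
    r = np.max(J[:, np.newaxis] - phi, axis=0)
    return minplus_combine(phi, r), r


def q_values(J, P, g, alpha):
    """Q[s, a] = g[s, a] + alpha * sum_s' P[a, s, s'] * J[s'] for a discounted-cost MDP."""
    return g + alpha * (P @ J).T


def bellman(J, P, g, alpha):
    """Bellman optimality operator (cost minimisation): (TJ)(s) = min_a Q[s, a]."""
    return q_values(J, P, g, alpha).min(axis=1)


def greedy_policy(J, P, g, alpha):
    """Policy obtained by substituting J into the Bellman equation: u(s) = argmin_a Q[s, a]."""
    return q_values(J, P, g, alpha).argmin(axis=1)


def projected_value_iteration(P, g, alpha, phi, max_iters=2000, tol=1e-10):
    """Iterate J <- Pi(T J); returns the approximation and its (min, +) coefficients r."""
    J = np.zeros(phi.shape[0])
    r = None
    for _ in range(max_iters):
        J_new, r = minplus_project(phi, bellman(J, P, g, alpha))
        converged = np.max(np.abs(J_new - J)) < tol
        J = J_new
        if converged:
            break
    return J, r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions, n_basis = 6, 2, 3
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)  # make each row a probability distribution
    g = rng.random((n_states, n_actions))  # one-step costs
    phi = 5.0 * rng.random((n_states, n_basis))  # hypothetical basis functions
    J_tilde, r = projected_value_iteration(P, g, 0.9, phi)
    print("approximate J*:", J_tilde)
    print("(min, +) coefficients r:", r)
    print("greedy policy:", greedy_policy(J_tilde, P, g, 0.9))
```

Under these assumptions the projection is monotone (J <= J' implies Pi J <= Pi J') and commutes with adding a constant, so it does not expand sup-norm distances; composed with the Bellman operator, which is an `alpha`-contraction in that norm, the projected iteration converges even though the MDP is not (min, +) linear. Because Pi J >= J, its fixed point is also an upper bound on J*. With a "full" basis of (min, +) unit vectors (0 on the diagonal, +inf elsewhere) the projection is the identity and the scheme reduces to ordinary value iteration.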

Keywords :

Markov processes; approximation theory; computational complexity; decision making; dynamic programming; least squares approximations; ADP methods; Bellman equation; LFA; MDP; Markov decision processes; approximate dynamic programming; linear function approximation; linear least squares projection; optimal action selection mechanism; optimal policy; optimal sequential decision making; optimal state-action value function; optimal value function; Convergence; Equations; Function approximation; Vectors

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2014 IEEE 53rd Annual Conference on Decision and Control (CDC)

Conference_Location :

Los Angeles, CA

Print_ISBN :

978-1-4799-7746-8

Type :

Conference paper

DOI :

10.1109/CDC.2014.7039626

Filename :

7039626

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=114624