Title :
An analysis of gradient-based policy iteration
Author :
Dankert, James ; Yang, Lei ; Si, Jennie
Author_Institution :
Dept. of Electr. Eng., Arizona State Univ., Tempe, AZ, USA
Date :
31 July-4 Aug. 2005
Abstract :
Recently, a system theoretic framework for learning and optimization has been developed that shows how many approximate dynamic programming paradigms, such as perturbation analysis, Markov decision processes, and reinforcement learning, are closely related. Using this framework, a new optimization technique called gradient-based policy iteration (GBPI) has been developed. In this paper, we show how GBPI can be extended to partially observable Markov decision processes (POMDPs). We also develop the value iteration analogue of GBPI and show that this new algorithm, extended to POMDPs, behaves like value iteration both theoretically and numerically.
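To make the comparison in the abstract concrete, the sketch below shows standard value iteration on a small, fully observable MDP, i.e. the baseline behavior that the paper's value iteration analogue of GBPI is shown to reproduce. The transition model, rewards, discount factor, and all variable names are illustrative assumptions, not taken from the paper, and the sketch does not implement GBPI itself.

```python
# Minimal sketch of standard value iteration on a small, fully observable MDP.
# The transition probabilities, rewards, and discount factor are illustrative
# assumptions only; they are not taken from the paper.
import numpy as np

n_states, n_actions = 3, 2
gamma = 0.95  # discount factor (assumed)

# P[a, s, s'] = probability of moving from s to s' under action a (assumed model)
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],
    [[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]],
])
# R[a, s] = expected immediate reward for taking action a in state s (assumed)
R = np.array([
    [1.0, 0.0, -1.0],
    [0.5, 0.5, 0.5],
])

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman optimality backup:
    # V(s) <- max_a [ R(a, s) + gamma * sum_s' P(a, s, s') V(s') ]
    Q = R + gamma * P @ V          # shape (n_actions, n_states)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)  # greedy policy with respect to the converged values
print("Optimal values:", V)
print("Greedy policy:", policy)
```

In the POMDP setting addressed by the paper, the backup would operate on belief states rather than directly observed states; the fully observable case above is shown only as the reference point for the "acts like value iteration" claim.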
Keywords :
Markov processes; gradient methods; optimisation; system theory; gradient-based policy iteration; partially observable Markov decision process; value iteration analogue; Algorithm design and analysis; Artificial intelligence; Control systems; Dynamic programming; Electronic mail; Learning; Operations research; Poisson equations; System performance; Terminology;
Conference_Title :
Proceedings of the 2005 IEEE International Joint Conference on Neural Networks (IJCNN '05)
Print_ISBN :
0-7803-9048-2
DOI :
10.1109/IJCNN.2005.1556399