Title of article :
Policy set iteration for Markov decision processes
Author/Authors :
Chang, Hyeong Soo
Issue Information :
Journal serial issue, year 2013
Abstract :
This communique presents an algorithm called “policy set iteration” (PSI) for solving infinite-horizon discounted Markov decision processes with finite state and action spaces, as a simple generalization of policy iteration (PI). PSI generates a monotonically improving sequence of stationary Markovian policies {π_k^*} based on a set manipulation, as opposed to PI’s single-policy manipulation, at each iteration k. When the set involved with PSI at iteration k contains N independently generated sample-policies from a given distribution d, the probability that the expected value of any sampled policy from d with respect to an initial state distribution is greater than that of π_k^* converges to zero at an O(N^{-k}) rate. Moreover, PSI converges to an optimal policy no slower than PI in terms of the number of iterations, for any d.
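To make the set-based improvement concrete, below is a minimal Python sketch of one plausible reading of such a step for a small finite MDP: evaluate the current policy together with N policies sampled from a distribution d, take the state-wise maximum of their value functions, and apply a one-step greedy improvement against that envelope. The MDP data (P, R, gamma), all function names, and the choice of improvement operator are illustrative assumptions, not details taken from the paper.
```python
# Hypothetical sketch of a set-based policy-improvement step for a finite,
# discounted MDP; names and structure are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# A small random MDP: S states, A actions, transition tensor P[s, a, s'],
# reward matrix R[s, a], and discount factor gamma in (0, 1).
S, A, gamma = 5, 3, 0.95
P = rng.dirichlet(np.ones(S), size=(S, A))  # each P[s, a, :] is a probability distribution
R = rng.uniform(size=(S, A))

def evaluate(policy):
    """Exact evaluation of a deterministic policy: solve (I - gamma * P_pi) v = r_pi."""
    P_pi = P[np.arange(S), policy]          # S x S transition matrix under the policy
    r_pi = R[np.arange(S), policy]          # reward vector under the policy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def greedy(v):
    """One-step greedy improvement against a value vector v."""
    q = R + gamma * P @ v                   # Q[s, a] = R[s, a] + gamma * sum_{s'} P[s, a, s'] v[s']
    return q.argmax(axis=1)

def set_improvement_step(current, N):
    """Sample N deterministic policies (a stand-in for the distribution d),
    take the state-wise maximum of their values together with the current
    policy's value, and improve greedily against that envelope."""
    sampled = [rng.integers(A, size=S) for _ in range(N)]
    values = np.array([evaluate(p) for p in [current] + sampled])
    v_max = values.max(axis=0)              # point-wise best value over the policy set
    return greedy(v_max)

policy = rng.integers(A, size=S)            # arbitrary initial deterministic policy
for k in range(10):
    policy = set_improvement_step(policy, N=8)
print("policy after 10 iterations:", policy)
```
With N = 0 the step collapses to an ordinary policy-iteration update, consistent with the abstract’s description of PSI as a generalization of PI; the greedy step against the value envelope is an assumed stand-in for the paper’s set manipulation.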
Keywords :
Markov decision processes , Randomization , Dynamic programming , Policy iteration
Journal title :
Automatica