Title of article :
On functional equations for Kth best policies in Markov decision processes
Author/Authors :
Chang, Hyeong Soo
Issue Information :
Journal, serial year 2013
Pages :
4
From page :
297
To page :
300
Abstract :
This paper revisits the problem of finding the values of Kth best policies for finite-horizon finite Markov decision processes. The recursive dynamic-programming (DP) equations established by Bellman and Kalaba for non-deterministic MDPs with zero-cost function in [Bellman, R., & Kalaba, R. (1960). On kth best policies. Journal of SIAM, 8, 582–588] are incomplete because expectation and selection of the Kth minimum do not interchange in general. Based on the DP equations by Dreyfus for the Kth shortest path problem, some non-DP equations generally satisfied by the values of the Kth best policies are identified, from which corrected versions of Bellman and Kalaba's DP equations are derived under an appropriate sufficient condition.
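Illustrative example :
A minimal numerical sketch (hypothetical policies, states, and costs, none taken from the paper) of the non-interchange noted in the abstract: the Kth minimum of expected costs need not equal the expectation of the per-state Kth minimum. All names and values below are assumptions for illustration only.

# Two hypothetical policies incur costs in two equally likely states.
costs = {
    "policy_1": {"s1": 0.0, "s2": 10.0},
    "policy_2": {"s1": 10.0, "s2": 0.0},
}
prob = {"s1": 0.5, "s2": 0.5}

def kth_min(values, k):
    # Return the k-th smallest value (k = 1 is the ordinary minimum).
    return sorted(values)[k - 1]

# K-th minimum (K = 2, the second-best value) of the expected costs.
expected = [sum(prob[s] * c[s] for s in prob) for c in costs.values()]
kth_of_expectation = kth_min(expected, 2)            # 5.0

# Expectation of the per-state K-th minimum.
expectation_of_kth = sum(
    prob[s] * kth_min([c[s] for c in costs.values()], 2) for s in prob
)                                                    # 10.0

print(kth_of_expectation, expectation_of_kth)        # 5.0 != 10.0

The two quantities differ (5.0 versus 10.0), so an equation that swaps the order of expectation and Kth-minimum selection cannot hold in general, which is the gap the abstract attributes to Bellman and Kalaba's original formulation.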
Keywords :
Dynamic programming , Ranks , Markov decision processes
Journal title :
Automatica
Serial Year :
2013
Record number :
1448994