DocumentCode :
31440
Title :
Optimization of Average Rewards of Time Nonhomogeneous Markov Chains
Author :
Xi-Ren Cao
Author_Institution :
Dept. of Finance, Shanghai Jiao Tong Univ., Shanghai, China
Volume :
60
Issue :
7
fYear :
2015
fDate :
Jul-15
Firstpage :
1841
Lastpage :
1856
Abstract :
We study the optimization of average rewards of discrete-time nonhomogeneous Markov chains, in which the state spaces, transition probabilities, and reward functions depend on time. The analysis encounters several major difficulties: 1) notions crucial to homogeneous Markov chains, such as ergodicity, stationarity, periodicity, and connectivity, no longer apply; 2) the average reward criterion is under-selective, i.e., it does not depend on the decisions in any finite period, so the problem is not amenable to dynamic programming; and 3) because of the under-selectivity, an optimal average-reward policy may not be the best in any finite period. These issues are resolved as follows: 1) we discover that a new notion, called "confluencity", is the basis for the optimization of average rewards of Markov chains; confluencity refers to the property that two independent sample paths of a Markov chain starting from any two different initial states will eventually meet; 2) we apply the direct-comparison based approach [3] to average reward optimization and obtain necessary and sufficient conditions for optimal policies; and 3) we study bias optimality, with the bias measuring the transient reward, and show that for the transient reward to be optimal, one additional condition based on bias potentials is required.
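The following Python sketch is not taken from the paper. It is an illustration under stated assumptions: a hypothetical two-state chain with a fixed state space (the paper allows time-varying state spaces) and made-up time-varying transition matrices and rewards (`P_A`, `f_A`, `P_B`, `f_B`, all hypothetical). It shows two points from the abstract. First, under-selectivity: two "policies" that differ only in the first K = 10 steps have finite-horizon average rewards whose gap shrinks roughly like 1/T. Second, confluencity, checked only empirically by simulating two independent sample paths from different initial states and recording whether they meet; a Monte Carlo estimate like this is evidence, not a proof.

```python
"""Illustrative sketch only: a hypothetical two-state, time-nonhomogeneous chain."""
import math
import random
from typing import Callable, List, Optional

Matrix = List[List[float]]
Vector = List[float]


def step_distribution(pi: Vector, P: Matrix) -> Vector:
    """Propagate a row distribution one step: pi_{t+1} = pi_t P_t."""
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def finite_horizon_average(pi0: Vector, P: Callable[[int], Matrix],
                           f: Callable[[int], Vector], T: int) -> float:
    """(1/T) * sum_{t=0}^{T-1} E[f_t(X_t)], computed exactly from distributions."""
    pi, total = list(pi0), 0.0
    for t in range(T):
        total += sum(p * r for p, r in zip(pi, f(t)))
        pi = step_distribution(pi, P(t))
    return total / T


# Hypothetical policy A: transitions oscillate in time, but every entry stays
# in [0.2, 0.8], so the two rows always overlap and the chain mixes each step.
def P_A(t: int) -> Matrix:
    a = 0.5 + 0.3 * math.sin(t)   # P_t(0 -> 0)
    b = 0.5 - 0.3 * math.cos(t)   # P_t(1 -> 0)
    return [[a, 1 - a], [b, 1 - b]]


def f_A(t: int) -> Vector:
    return [1.0, 0.0 if t % 2 else 0.5]


K = 10  # hypothetical length of the initial period in which policy B deviates


# Hypothetical policy B: different decisions (transitions and rewards) for t < K only.
def P_B(t: int) -> Matrix:
    return [[0.9, 0.1], [0.9, 0.1]] if t < K else P_A(t)


def f_B(t: int) -> Vector:
    return [0.0, 1.0] if t < K else f_A(t)


def first_meeting_time(x: int, y: int, P: Callable[[int], Matrix],
                       horizon: int, rng: random.Random) -> Optional[int]:
    """Run two independent sample paths from x and y; return the first t with
    X_t == Y_t, or None if they have not met by `horizon`."""
    t = 0
    while x != y:
        if t == horizon:
            return None
        Pt = P(t)
        states = range(len(Pt))
        x = rng.choices(states, weights=Pt[x])[0]  # independent draws:
        y = rng.choices(states, weights=Pt[y])[0]  # no coupling of the paths
        t += 1
    return t


def meeting_fraction(P: Callable[[int], Matrix], horizon: int = 50,
                     trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of P(paths from states 0 and 1 meet by `horizon`)."""
    rng = random.Random(seed)
    hits = sum(first_meeting_time(0, 1, P, horizon, rng) is not None
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    pi0 = [1.0, 0.0]
    for T in (100, 10_000):
        gap = abs(finite_horizon_average(pi0, P_A, f_A, T)
                  - finite_horizon_average(pi0, P_B, f_B, T))
        print(f"T={T}: |average(A) - average(B)| = {gap:.6f}")
    print("estimated meeting fraction:", meeting_fraction(P_A))
```

Because policy B changes only the first K steps, the difference in total reward stays bounded while T grows, so the average-reward gap vanishes as T grows; this is the sense in which the criterion ignores any finite period. The per-step meeting probability of the two independent paths under `P_A` is at least 0.32 here, which is why they meet quickly in this toy case.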
Keywords :
Markov processes; discrete time systems; dynamic programming; probability; state-space methods; average reward optimization; bias optimality; bias potential; confluencity; discrete time nonhomogeneous Markov chain; necessary and sufficient condition; optimal average-reward policy; optimal policy; reward function; state space; transient reward; transition probability; Couplings; Equations; IEEE Potentials; Optimization; Transient analysis; HJB equation; direct-comparison based optimization; performance potential; weak ergodicity; weak recurrence;
fLanguage :
English
Journal_Title :
Automatic Control, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9286
Type :
jour
DOI :
10.1109/TAC.2015.2394951
Filename :
7017536