Title :
Bayesian Sequential Detection With Phase-Distributed Change Time and Nonlinear Penalty—A POMDP Lattice Programming Approach
Author :
Krishnamurthy, Vikram
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of British Columbia, Vancouver, BC, Canada
Abstract :
We show that the optimal decision policy for several types of Bayesian sequential detection problems has a threshold switching curve structure on the space of posterior distributions. This is established by using lattice programming and stochastic orders in a partially observed Markov decision process (POMDP) framework. A stochastic gradient algorithm is presented to estimate the optimal linear approximation to this threshold curve. We illustrate these results by first considering quickest time detection with phase-type distributed change time and a variance stopping penalty. Then it is proved that the threshold switching curve also arises in several other Bayesian decision problems such as quickest transient detection, exponential delay (risk-sensitive) penalties, stopping time problems in social learning, and multi-agent scheduling in a changing world. Using Blackwell dominance, it is shown that for dynamic decision making problems, the optimal decision policy is lower bounded by a myopic policy. Finally, it is shown how the achievable cost of the optimal decision policy varies with change time distribution by imposing a partial order on transition matrices.
Keywords :
Bayes methods; Markov processes; approximation theory; decision making; signal detection; stochastic programming; Bayesian sequential detection; POMDP lattice programming approach; Blackwell dominance; dynamic decision making problem; exponential delay penalties; multiagent scheduling; myopic policy; nonlinear penalty; optimal linear approximation; partially observed Markov decision process; phase-distributed change time; posterior distribution; social learning; stochastic gradient algorithm; threshold switching curve structure; time detection; time distribution; transient detection; transition matrices; Bayesian methods; Delay; Lattices; Programming; Transient analysis; POMDP; exponential delay penalty; lattice programming; monotone likelihood ratio ordering; multi-agent decision making; quickest time change detection; stochastic dominance; variance penalty
Journal_Title :
Information Theory, IEEE Transactions on
DOI :
10.1109/TIT.2011.2165152