Title :
Tracking of real-valued Markovian random processes with asymmetric cost and observation
Author :
Mansourifard, Parisa ; Krishnamachari, Bhaskar ; Javidi, Tara
Author_Institution :
Ming Hsieh Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
Abstract :
We study a state-tracking problem in which the background random process is Markovian with unknown real-valued states and known transition probability densities. At each time step, the decision-maker chooses a state as an action and accumulates a reward based on the selected state and the actual state. If the selected state is higher than the actual state, the actual state is fully observed at the expense of an overutilization cost. Otherwise, the decision-maker pays an underutilization cost and observes the actual state only partially (learning only that it is higher than the selected state). Thus, the decision-maker faces asymmetries in both cost and observation. The goal is to select actions that maximize the total expected discounted reward over an infinite horizon. We model this problem as a Partially Observable Markov Decision Process and formulate it in two different ways: (i) belief-based and (ii) sequence-based. In the sequence-based formulation, only two parameters are needed to define the sequence of actions: the last fully observed state and the time elapsed since that observation. We prove key structural properties of the optimal policy, including a lower bound on the optimal sequence. Further, for a specific form of processes, we present an upper bound on the optimal sequence. Both the lower-bound and upper-bound sequences have a percentile threshold structure and are monotonically increasing with respect to the last fully observed state.
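The asymmetric cost and observation described in the abstract can be illustrated with a minimal sketch of a single time step. The abstract does not give the cost functions, so the linear costs, the reward defined as negative cost, the treatment of a tie as a full observation, and the names `track_step`, `StepResult`, `over_cost`, and `under_cost` are all assumptions made here for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepResult:
    reward: float
    fully_observed: bool
    observed_state: Optional[float]  # actual state if fully observed, else None
    lower_bound: Optional[float]     # strict lower bound on the state if only partially observed


def track_step(action: float, actual: float,
               over_cost: float = 1.0, under_cost: float = 2.0) -> StepResult:
    """One time step with asymmetric cost and observation.

    Assumptions (not from the abstract): costs are linear in the gap, the
    reward is the negative cost, and a tie (action == actual) counts as a
    full observation.
    """
    if action >= actual:
        # Overutilization: pay for the excess, but see the actual state exactly.
        return StepResult(-over_cost * (action - actual), True, actual, None)
    # Underutilization: pay for the shortfall and learn only that actual > action.
    return StepResult(-under_cost * (actual - action), False, None, action)
```

Under these assumed costs, choosing a higher action buys information (a full observation) at the price of overutilization, which is the trade-off the abstract refers to.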
Keywords :
Markov processes; decision making; decision theory; probability; random processes; asymmetric cost; background random process; belief-based; infinite horizon; optimal policy; overutilization cost; partially observable Markov decision process; real-valued Markovian random processes; sequence-based formulation; state-tracking problem; structural properties; total expected discounted reward; transition probability densities; underutilization cost; Bandwidth; Optimized production technology; Probability density function; Protocols; Upper bound
Conference_Title :
American Control Conference (ACC), 2015
Conference_Location :
Chicago, IL
Print_ISBN :
978-1-4799-8685-9
DOI :
10.1109/ACC.2015.7171888