Title :
Partially observable Markov decision processes with reward information
Author :
Cao, Xi-Ren ; Guo, Xianping
Author_Institution :
Dept. of Electr. & Electron. Eng., Hong Kong Univ. of Sci. & Technol., China
Abstract :
In a partially observable Markov decision process (POMDP), if the reward can be observed at each step, then the observed reward history carries information about the unknown state. This information, in addition to the information contained in the observation history, can be used to update the state probability distribution. A policy obtained in this way is called a reward-information policy (RI-policy); an optimal RI-policy performs no worse than any standard optimal policy that depends only on the observation history. This observation leads to four different problem formulations for POMDPs, depending on whether the reward function is known and whether the reward at each step is observable.
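The following is a minimal sketch, not taken from the paper, of the belief update the abstract describes, assuming a finite POMDP with a known, deterministic reward function: the observed reward first rules out states inconsistent with it, and the usual transition-and-observation Bayes step follows. All names (belief_update, P, O, R) and the deterministic-reward assumption are illustrative.

```python
import numpy as np

def belief_update(b, a, o, r, P, O, R, tol=1e-9):
    """One-step POMDP belief update that conditions on the observed
    reward r in addition to the observation o (illustrative sketch).

    b : (S,)      prior belief over states
    a : int       action taken
    o : int       observation received after the transition
    r : float     reward observed at this step
    P : (A, S, S) transitions, P[a, s, s2] = Pr(s2 | s, a)
    O : (A, S, K) observations, O[a, s2, o] = Pr(o | s2, a)
    R : (S, A)    known deterministic reward, r = R[s, a]
    """
    # Reward likelihood: 1 for states whose reward under action a
    # matches r, 0 otherwise (deterministic-reward assumption).
    lik = (np.abs(R[:, a] - r) < tol).astype(float)
    # Condition the prior on the observed reward, then propagate
    # through the transition kernel.
    pred = (b * lik) @ P[a]
    # Condition on the observation and renormalize.
    post = pred * O[a, :, o]
    z = post.sum()
    if z <= 0.0:
        raise ValueError("zero-likelihood (observation, reward) pair")
    return post / z
```

Under a stochastic reward model, the indicator vector lik would be replaced by the likelihood Pr(r | s, a); the rest of the update is unchanged.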
Keywords :
Markov processes; decision theory; probability; observation history; partially observable Markov decision process; reward history; reward-information policy; state probability distribution; Cost function; History; Mathematics; Probability distribution; State estimation; State-space methods; Uncertainty
Conference_Title :
2004 43rd IEEE Conference on Decision and Control (CDC)
Print_ISBN :
0-7803-8682-5
DOI :
10.1109/CDC.2004.1429442