Title of article

Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs (Original Research Article)

Author/Authors

Finale Doshi-Velez, Joelle Pineau, Nicholas Roy

Issue Information

Journal with serial number, year 2012

Pages

18

From page

115

To page

132

Abstract

Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agent's sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades off actions that contribute to the agent's knowledge against actions that increase the agent's immediate reward. However, the task of specifying the POMDP's parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often not intuitive. In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space; we show it performs well in a variety of problems. We use policy queries (in which we ask an expert for the correct action) to infer the consequences of a potential pitfall without experiencing its effects. More importantly for human–robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified.

Keywords

Partially observable Markov decision process , Bayesian methods , Reinforcement learning

Journal title

Artificial Intelligence

Serial Year

2012

Record number

1207912