PAC learning for Markov decision processes and dynamic games

Author

Jain, Rahul ; Varaiya, Pravin P.

Author_Institution

Dept. of EECS, Univ. of California, Berkeley, CA, USA

fYear

2004

fDate

27 June-2 July 2004

Firstpage

468

Abstract

We extend the probably approximately correct (PAC) model of learning to Markov decision processes (MDPs) and dynamic games. We obtain simulation-based uniform sample complexity bounds for value function estimates of discounted-reward MDPs. We also obtain uniform sample complexity results for Markov games with a finite number of players.

Keywords

Markov processes; decision theory; game theory; Markov decision process; Markov game; dynamic game; function estimation; probably approximately correct learning; sample complexity bound; Contracts; Convergence; Noise generators; Space stations; State-space methods; Stochastic processes

fLanguage

English

Publisher

IEEE

Conference_Title

Information Theory, 2004. ISIT 2004. Proceedings. International Symposium on

Print_ISBN

0-7803-8280-3

Type

conf

DOI

10.1109/ISIT.2004.1365505

Filename

1365505

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2060865