DocumentCode
493372
Title
Inferring bounds on the performance of a control policy from a sample of trajectories
Author
Fonteneau, Raphael ; Murphy, Susan ; Wehenkel, Louis ; Ernst, Damien
Author_Institution
Dept. of Electr. Eng. & Comput. Sci., Univ. of Liege, Liege
fYear
2009
fDate
March 30 2009-April 2 2009
Firstpage
117
Lastpage
123
Abstract
We propose an approach for inferring bounds on the finite-horizon return of a control policy from an off-policy sample of trajectories, i.e., a collection of observed state transitions, rewards, and control actions. In this paper, the dynamics, control policy, and reward function are assumed to be deterministic and Lipschitz continuous. Under these assumptions, we derive an algorithm, polynomial in the sample size and the length of the optimization horizon, that computes these bounds, and we characterize their tightness in terms of the sample density.
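The abstract's bound-computation idea can be illustrated with a short sketch. Under Lipschitz assumptions, a lower bound on the T-step return is obtained by chaining sampled transitions and penalizing each "jump" between the end of one transition and the start of the next by a Lipschitz constant times the jump distance; a Viterbi-style dynamic program finds the best chain in time polynomial in the sample size and horizon. The code below is an illustrative simplification, not the paper's exact algorithm: the sample is reduced to (state, reward, next-state) triples (actions are assumed consistent with the policy), and the Lipschitz constants `L_f`, `L_r`, `L_h` and the value-function constant `L_Q` follow a commonly used recursive form that is an assumption here.

```python
import numpy as np

def lipschitz_lower_bound(x0, sample, T, L_f, L_r, L_h):
    """Sketch of a Viterbi-style DP computing a lower bound on the T-step
    return of a deterministic policy from a sample of one-step transitions,
    under assumed Lipschitz continuity of dynamics, reward, and policy.

    x0:     initial state (numpy array)
    sample: list of (x, r, y) triples -- state, reward, next state
    T:      optimization horizon
    L_f, L_r, L_h: assumed Lipschitz constants (dynamics, reward, policy)
    """
    n = len(sample)
    # L_Q[k]: assumed Lipschitz constant of the k-step value function,
    # L_Q[k] = L_r * sum_{i=0}^{k-1} (L_f * (1 + L_h))**i
    a = L_f * (1.0 + L_h)
    L_Q = [L_r * sum(a**i for i in range(k)) for k in range(T + 1)]

    # dp[l]: best lower bound on the return over steps t..T-1 when the
    # transition used at step t is sample[l]; filled backwards in t.
    dp = np.zeros(n)
    for t in range(T - 1, -1, -1):
        new_dp = np.empty(n)
        for l, (x_l, r_l, y_l) in enumerate(sample):
            if t == T - 1:
                cont = 0.0  # no continuation after the last step
            else:
                # chaining sample[l] with a next transition m costs a
                # Lipschitz penalty proportional to the jump ||y_l - x_m||
                cont = max(
                    dp[m] - L_Q[T - t - 1] * np.linalg.norm(y_l - sample[m][0])
                    for m in range(n)
                )
            new_dp[l] = r_l + cont
        dp = new_dp
    # the first transition is penalized by its distance to the query state x0
    return max(dp[l] - L_Q[T] * np.linalg.norm(x0 - sample[l][0])
               for l in range(n))
```

The double loop over transitions at each step gives O(T n^2) complexity, i.e., polynomial in both the sample size and the horizon, matching the complexity class claimed in the abstract; an upper bound is obtained symmetrically by adding, rather than subtracting, the Lipschitz penalties.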
Keywords
continuous systems; optimal control; optimisation; polynomials; Lipschitz continuous; control policy; optimization horizon; polynomial algorithm; reward function; trajectories sample; Artificial intelligence; Biomedical engineering; Computational modeling; Control systems; Dynamic programming; Fingers; Optimal control; Polynomials; Predictive models; Upper bound
fLanguage
English
Publisher
ieee
Conference_Titel
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL '09), 2009
Conference_Location
Nashville, TN
Print_ISBN
978-1-4244-2761-1
Type
conf
DOI
10.1109/ADPRL.2009.4927534
Filename
4927534