• DocumentCode
    493372
  • Title

    Inferring bounds on the performance of a control policy from a sample of trajectories

  • Author

    Fonteneau, Raphael ; Murphy, Susan ; Wehenkel, Louis ; Ernst, Damien

  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Univ. of Liege, Liege
  • fYear
    2009
  • fDate
    March 30 2009-April 2 2009
  • Firstpage
    117
  • Lastpage
    123
  • Abstract
    We propose an approach for inferring bounds on the finite-horizon return of a control policy from an off-policy sample of trajectories collecting state transitions, rewards, and control actions. In this paper, the dynamics, control policy, and reward function are assumed to be deterministic and Lipschitz continuous. Under these assumptions, we derive an algorithm that computes these bounds in time polynomial in the sample size and the length of the optimization horizon, and we characterize their tightness in terms of the sample density.
  • Keywords
    continuous systems; optimal control; optimisation; polynomials; Lipschitz continuous; control policy; optimization horizon; polynomial algorithm; reward function; trajectories sample; Artificial intelligence; Biomedical engineering; Computational modeling; Control systems; Dynamic programming; Fingers; Optimal control; Polynomials; Predictive models; Upper bound
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL '09. IEEE Symposium on
  • Conference_Location
    Nashville, TN
  • Print_ISBN
    978-1-4244-2761-1
  • Type

    conf

  • DOI
    10.1109/ADPRL.2009.4927534
  • Filename
    4927534