• DocumentCode
    1724000
  • Title
    Three connectionist implementations of dynamic programming for optimal control: a preliminary comparative analysis
  • Author
    Bersini, Hugues ; Gorrini, Vittorio
  • Author_Institution
    IRIDIA, Univ. Libre de Bruxelles, Belgium
  • fYear
    1996
  • Firstpage
    428
  • Lastpage
    437
  • Abstract
    Three optimal control methodologies are presented in this paper and applied to the same elementary rendezvous problem. All three rely on neural networks for their universal approximation capabilities and on dynamic programming for replacing the time-integral optimization with a succession of time-local optimizations. First, a simplified version of the backpropagation-through-time algorithm is presented as the most faithful implementation of dynamic programming when the optimal controller is approximated by a neural network (learning by gradient descent) and the process model is available. Relaxing the need for an explicit prior model of the process, reinforcement learning (RL) approaches, for both continuous and discrete controllers, are described and tested on the rendezvous problem. The results and the numerous methodological difficulties we met are discussed. The most successful reinforcement learning approach is the connectionist implementation of Q-learning with all Q-values approximated by radial-basis-function networks. However, when searching for a continuous optimal controller, the price RL has to pay for the absence of a model turns out to be far from negligible in terms of methodological difficulties, lack of robustness, convergence time and quality of the discovered solution.
  • Keywords
    dynamic programming; learning (artificial intelligence); optimal control; robust control; Q-learning; backpropagation-through-time algorithm; connectionist implementations; dynamic programming; elementary rendezvous problem; gradient descent; optimal control; radial-basis-function networks; reinforcement learning; time-local optimizations; universal approximation capabilities; Backpropagation algorithms; Cost function; Delay effects; Dynamic programming; Jacobian matrices; Lagrangian functions; Learning; Neural networks; Optimal control; Optimization methods
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks for Identification, Control, Robotics, and Signal/Image Processing, 1996. Proceedings., International Workshop on
  • Conference_Location
    Venice
  • Print_ISBN
    0-8186-7456-3
  • Type
    conf
  • DOI
    10.1109/NICRSP.1996.542787
  • Filename
    542787