Regional Information Center for Science and Technology - Three connectionist implementations of dynamic programming for optimal control: a preliminary comparative analysis

DocumentCode :

1724000

Title :

Three connectionist implementations of dynamic programming for optimal control: a preliminary comparative analysis

Author :

Bersini, Hugues ; Gorrini, Vittorio

Author_Institution :

IRIDIA, Univ. Libre de Bruxelles, Belgium

fYear :

1996

Firstpage :

428

Lastpage :

437

Abstract :

Three optimal control methodologies are presented in this paper and applied to the same elementary rendezvous problem. All three rely on neural networks for their universal approximation capabilities and on dynamic programming to replace the time-integral optimization with a succession of time-local optimizations. First, a simplified version of the backpropagation-through-time algorithm is presented as the most faithful implementation of dynamic programming when the optimal controller is approximated by a neural network (learning by gradient descent) and the process model is available. Relaxing the need for an explicit prior model of the process, reinforcement learning (RL) approaches, for both continuous and discrete controllers, are then described and tested on the rendezvous problem. The results and the numerous methodological difficulties we met are discussed. The most successful reinforcement learning approach is the connectionist implementation of Q-learning with all Q-values approximated by radial-basis-function networks. However, when searching for a continuous optimal controller, the price RL has to pay for the absence of a model turns out to be far from negligible in terms of methodological difficulties, lack of robustness, convergence time, and quality of the discovered solution.

Keywords :

dynamic programming; learning (artificial intelligence); optimal control; robust control; Q-learning; backpropagation-through-time algorithm; connectionist implementations; dynamic programming; elementary rendezvous problem; gradient descent; optimal control; radial-basis-function networks; reinforcement learning; time-local optimizations; universal approximation capabilities; Backpropagation algorithms; Cost function; Delay effects; Dynamic programming; Jacobian matrices; Lagrangian functions; Learning; Neural networks; Optimal control; Optimization methods;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Neural Networks for Identification, Control, Robotics, and Signal/Image Processing, 1996. Proceedings., International Workshop on

Conference_Location :

Venice

Print_ISBN :

0-8186-7456-3

Type :

conf

DOI :

10.1109/NICRSP.1996.542787

Filename :

542787

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1724000