DocumentCode
840377
Title
Continuous-Time Adaptive Critics
Author
Hanselmann, T. ; Noakes, L. ; Zaknich, A.
Author_Institution
Dept. of Electr. & Electron. Eng., Melbourne Univ., Parkville, Vic.
Volume
18
Issue
3
fYear
2007
fDate
5/1/2007 12:00:00 AM
Firstpage
631
Lastpage
647
Abstract
A continuous-time formulation of an adaptive critic design (ACD) is investigated. Connections to the discrete case are made, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are prevalent. Practical benefits are that this framework fits in well with plant descriptions given by differential equations and that any standard integration routine with adaptive step-size does an adaptive sampling for free. A second-order actor adaptation using Newton's method is established for fast actor convergence for a general plant and critic. Also, a fast critic update for concurrent actor-critic training is introduced to immediately apply necessary adjustments of critic parameters induced by actor updates to keep the Bellman optimality correct to first-order approximation after actor changes. Thus, critic and actor updates may be performed at the same time until some substantial error builds up in the Bellman optimality or temporal difference equation, at which point a traditional critic training needs to be performed, after which another interval of concurrent actor-critic training may resume.
Keywords
Newton method; backpropagation; difference equations; dynamic programming; sampling methods; Bellman optimality; Newton method; adaptive sampling; adaptive step-size; backpropagation through time; concurrent actor-critic training; continuous-time adaptive critic design; differential equations; fast actor convergence; fast critic update; first-order approximation; real-time recurrent learning; second-order actor adaptation; temporal difference equation; Australia; Backpropagation; Control systems; Convergence; Costs; Differential equations; Dynamic programming; Error correction; Learning; Sampling methods; Actor–critic adaptation; adaptive critic design (ACD); approximate dynamic programming; backpropagation through time (BPTT); continuous adaptive critic designs; real-time recurrent learning (RTRL); reinforcement learning; second-order actor adaptation; Algorithms; Computer Simulation; Decision Support Techniques; Expert Systems; Information Storage and Retrieval; Models, Theoretical; Neural Networks (Computer); Pattern Recognition, Automated; Task Performance and Analysis
fLanguage
English
Journal_Title
Neural Networks, IEEE Transactions on
Publisher
IEEE
ISSN
1045-9227
Type
jour
DOI
10.1109/TNN.2006.889499
Filename
4182383