DocumentCode :
3709780
Title :
Correct-by-synthesis reinforcement learning with temporal logic constraints
Author :
Min Wen;Rüdiger Ehlers;Ufuk Topcu
Author_Institution :
Dept. of Electr. &
fYear :
2015
Firstpage :
4983
Lastpage :
4990
Abstract :
We consider the problem of synthesizing optimal reactive controllers for an a priori unknown performance criterion while satisfying a given temporal logic specification through interaction with an uncontrolled environment. We decouple the problem into two sub-problems. First, we extract a (maximally) permissive strategy for the system, which encodes multiple (possibly all) ways in which the system can react to the adversarial environment and satisfy the specifications. Then, we quantify the a priori unknown performance criterion as a (still unknown) reward function and compute, using the so-called maximin-Q learning algorithm, an optimal strategy for the system within the operating envelope allowed by the permissive strategy. We establish both correctness (with respect to the temporal logic specifications) and optimality (with respect to the a priori unknown performance criterion) of this two-step technique for a fragment of temporal logic specifications. For specifications beyond this fragment, correctness is still preserved, but the learned strategy may be sub-optimal. We present an algorithm for the overall problem and demonstrate its use and computational requirements on a set of robot motion planning examples.
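For intuition, the following is a minimal sketch of a maximin-Q style update restricted to the actions allowed by a precomputed permissive strategy, as the abstract describes. It is not the authors' implementation; the interface names (`step`, `permitted`, `env_actions`) and the hyperparameters are illustrative assumptions.

```python
# Sketch: Q-learning on a turn-based zero-sum game, where the system's
# action choices at every state are restricted to those allowed by a
# permissive strategy, so all explored behaviors satisfy the specification.
# All interface names below are hypothetical placeholders.
import random
from collections import defaultdict

def maximin_q(episodes, step, permitted, env_actions, start_state,
              alpha=0.1, gamma=0.95, epsilon=0.1, horizon=100):
    """step(s, a_sys, a_env) -> (next_state, reward)
    permitted(s) -> system actions allowed by the permissive strategy
    env_actions(s) -> possible environment moves at state s"""
    Q = defaultdict(float)  # Q[(state, a_sys, a_env)]

    def maximin_value(s):
        # System maximizes over permitted actions while assuming an
        # adversarial (minimizing) environment response.
        return max(min(Q[(s, a, b)] for b in env_actions(s))
                   for a in permitted(s))

    for _ in range(episodes):
        s = start_state
        for _ in range(horizon):
            # epsilon-greedy exploration among permitted actions only
            if random.random() < epsilon:
                a = random.choice(list(permitted(s)))
            else:
                a = max(permitted(s),
                        key=lambda x: min(Q[(s, x, b)] for b in env_actions(s)))
            b = random.choice(list(env_actions(s)))  # observed environment move
            s2, r = step(s, a, b)
            # maximin Bellman backup
            Q[(s, a, b)] += alpha * (r + gamma * maximin_value(s2) - Q[(s, a, b)])
            s = s2
    return Q
```

In this sketch, restricting both exploration and the backup to `permitted(s)` is what keeps learned behavior inside the operating envelope of the permissive strategy, while the max-min backup accounts for the adversarial environment.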
Keywords :
"Games","Safety","Learning (artificial intelligence)","Planning","Mobile robots","Collision avoidance"
Publisher :
ieee
Conference_Titel :
2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Type :
conf
DOI :
10.1109/IROS.2015.7354078
Filename :
7354078