Authors:
Fu, Jie; Tanner, Herbert G.; Heinz, Jeffrey N.; Karydis, Konstantinos; Chandlee, Jane; Koirala, Cesar
Abstract:
A system can accomplish an objective specified in temporal logic while interacting with an unknown, dynamic, but rule-governed environment by employing grammatical inference and adapting its plan of action online. The purposeful interaction of the system with its unknown environment can be described by a deterministic two-player zero-sum game. Using newly defined product operations, the whole game can be expressed with a factored, modular representation. This representation not only offers computational benefits but also isolates the unknown behavior of the dynamic environment in a particular subsystem, which then becomes the target of learning. As the fidelity of the identified environment model increases, the strategy synthesized from the learned hypothesis converges in finite time to the one that satisfies the task specification.
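The strategy-synthesis step mentioned in the abstract can be illustrated on a small, explicit game graph. What follows is a minimal sketch, assuming a standard attractor (backward fixed-point) construction for a reachability objective. This is a common way to solve deterministic two-player zero-sum games, but this record does not confirm it as the authors' method. The function `attractor` and all state names are hypothetical. The sketch models neither the factored product representation nor the grammatical-inference step: the game graph is simply given in full.

```python
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable


def attractor(
    sys_states: Set[State],
    env_states: Set[State],
    edges: Dict[State, List[State]],
    targets: Set[State],
) -> Tuple[Set[State], Dict[State, State]]:
    """Hypothetical helper: compute the system player's winning region for
    reaching `targets`, plus a memoryless strategy (state -> chosen successor)."""
    win: Set[State] = set(targets)
    strategy: Dict[State, State] = {}
    changed = True
    while changed:
        changed = False
        # System states: winning if SOME move leads into the current winning set.
        for s in sys_states - win:
            for t in edges.get(s, []):
                if t in win:
                    win.add(s)
                    strategy[s] = t
                    changed = True
                    break
        # Environment states: winning if ALL moves lead into the winning set
        # (an environment state with no moves is treated as not forced).
        for s in env_states - win:
            succ = edges.get(s, [])
            if succ and all(t in win for t in succ):
                win.add(s)
                changed = True
    return win, strategy


# Usage on a toy game: from "e0" the environment can divert play to "bad".
sys_states = {"s0", "s1", "bad"}
env_states = {"e0", "e1"}
edges = {
    "s0": ["e0", "e1"],
    "e0": ["s1", "bad"],
    "e1": ["s1"],
    "s1": ["goal"],
    "bad": ["bad"],
}
win, strat = attractor(sys_states, env_states, edges, {"goal"})
print(sorted(win), strat)
```

In this toy game the system wins from `s0` by choosing `e1`, because the environment at `e0` could steer play into `bad`. In the setting the abstract describes, the environment's transitions are initially unknown, so a winning region computed in this way would only be as accurate as the current learned hypothesis.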