Regional Information Center for Science and Technology - Reinforcement learning for penalty avoiding policy making

DocumentCode :

1737881

Title :

Reinforcement learning for penalty avoiding policy making

Author :

Miyazaki, Kazuteru ; Kobayashi, Shigenobu

Author_Institution :

Nat. Instn. for Acad. Degrees, Tokyo, Japan

Volume :

fYear :

2000

fDate :

2000

Firstpage :

206

Abstract :

Reinforcement learning is a kind of machine learning. It aims to adapt an agent to a given environment using a reward as a cue. In general, the purpose of a reinforcement learning system is to acquire an optimal policy that maximizes the expected reward per action. However, this objective is not the most important one in every environment. In particular, when reinforcement learning is applied to engineering problems, we expect the agent to avoid all penalties. In Markov decision processes, we call a rule a penalty rule if and only if it receives a penalty or it can transit to a penalty state, in which it does not contribute to obtaining any reward. After suppressing all penalty rules, we aim to make a rational policy whose expected reward per action is larger than zero. We propose the penalty avoiding rational policy making algorithm, which suppresses any penalty as stably as possible and obtains a reward constantly. Its effectiveness is shown by applying the algorithm to tic-tac-toe.
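The abstract defines penalty rules only informally. As an illustration, the following Python sketch marks penalty rules in a small, fully known MDP by fixed-point iteration and then suppresses them. It is not the authors' algorithm, which learns from experience rather than from a given model. The model representation, the hypothetical function names `find_penalty_rules` and `allowed_actions`, and the definition of a penalty state as a state whose available rules are all penalty rules are assumptions made for illustration. The sketch also omits the reward-contribution condition in the abstract's definition.

```python
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
Rule = Tuple[State, str]  # a rule is a (state, action) pair


def find_penalty_rules(
    transitions: Dict[Rule, Set[State]],
    penalized: Set[Rule],
) -> Tuple[Set[Rule], Set[State]]:
    """Return (penalty_rules, penalty_states) as a fixed point over a known model.

    transitions: maps each rule to the set of states it may lead to.
    penalized:   rules that receive a penalty directly.
    Assumption: a state is a penalty state when every rule available in it
    is a penalty rule; states with no rules (terminal states) are never marked.
    """
    rules_by_state: Dict[State, Set[Rule]] = {}
    for rule in transitions:
        rules_by_state.setdefault(rule[0], set()).add(rule)

    penalty_rules: Set[Rule] = set(penalized)
    penalty_states: Set[State] = set()
    changed = True
    while changed:
        changed = False
        # Mark states in which every available rule is already a penalty rule.
        for state, rules in rules_by_state.items():
            if state not in penalty_states and rules <= penalty_rules:
                penalty_states.add(state)
                changed = True
        # Mark rules that can transit to a penalty state.
        for rule, successors in transitions.items():
            if rule not in penalty_rules and successors & penalty_states:
                penalty_rules.add(rule)
                changed = True
    return penalty_rules, penalty_states


def allowed_actions(
    transitions: Dict[Rule, Set[State]],
    penalty_rules: Set[Rule],
) -> Dict[State, List[str]]:
    """Suppress penalty rules: keep only the non-penalty actions in each state."""
    allowed: Dict[State, List[str]] = {}
    for state, action in transitions:
        allowed.setdefault(state, [])
        if (state, action) not in penalty_rules:
            allowed[state].append(action)
    return {s: sorted(acts) for s, acts in allowed.items()}


# Toy model (hypothetical): "Goal" gives reward, the rule ("S2", "z") is penalized.
transitions = {
    ("S0", "a"): {"S1"},
    ("S0", "b"): {"S2"},
    ("S1", "x"): {"Goal"},
    ("S1", "y"): {"S0"},
    ("S2", "z"): {"Dead"},
}
penalized = {("S2", "z")}
rules, states = find_penalty_rules(transitions, penalized)
print(sorted(rules))  # [('S0', 'b'), ('S2', 'z')]
print(states)         # {'S2'}
print(allowed_actions(transitions, rules))
# {'S0': ['a'], 'S1': ['x', 'y'], 'S2': []}
```

Because marking a state can create new penalty rules and vice versa, the loop repeats until nothing changes. In the toy model, suppressing action `b` in `S0` is enough to keep the agent away from the penalized rule in `S2`, while the reward at `Goal` stays reachable.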

Keywords :

Markov processes; decision theory; game theory; learning (artificial intelligence); Markov decision processes; expected reward per action; machine learning; optimum policy; penalty avoiding policy making; penalty rules; rational policy; reinforcement learning; tick-tack-toe; Learning;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Systems, Man, and Cybernetics, 2000 IEEE International Conference on

Conference_Location :

Nashville, TN

ISSN :

1062-922X

Print_ISBN :

0-7803-6583-6

Type :

conf

DOI :

10.1109/ICSMC.2000.884990

Filename :

884990

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1737881