DocumentCode :
2800991
Title :
Multi-optima exploration with adaptive Gaussian mixture model
Author :
Calinon, Sylvain ; Pervez, Anjum ; Caldwell, D.G.
Author_Institution :
Dept. of Adv. Robot., Ist. Italiano di Tecnol. (IIT), Genoa, Italy
fYear :
2012
fDate :
7-9 Nov. 2012
Firstpage :
1
Lastpage :
6
Abstract :
In learning-by-exploration problems such as reinforcement learning (RL), direct policy search, stochastic optimization or evolutionary computation, the goal of an agent is to maximize some form of reward function (or minimize a cost function). Often, these algorithms are designed to find a single policy solution. We address the problem of representing the space of control policy solutions by considering exploration as a density estimation problem. Such a representation provides additional information, such as the shape and curvature of local peaks, that can be exploited to analyze the discovered solutions and guide the exploration. We show that the search process can easily be generalized to multi-peaked distributions by employing a Gaussian mixture model (GMM) with an adaptive number of components. The GMM has a dual role: representing the space of possible control policies, and guiding the exploration of new policies. A variation of expectation-maximization (EM) applied to reward-weighted policy parameters is presented to model the space of possible solutions, as if this space were a probability distribution. The approach is tested in a dart game experiment formulated as a black-box optimization problem, where the agent's throwing capability increases while it searches for the best strategy to play the game. This experiment is used to study how the proposed approach can exploit new promising solution alternatives in the search process when the optimality criterion slowly drifts over time. The results show that the proposed multi-optima search approach can anticipate such changes by exploiting promising candidates to smoothly adapt to the change of global optimum.
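The abstract describes exploration as reward-weighted density estimation with a GMM over policy parameters. The following is a minimal sketch of that idea, not the paper's implementation: it uses a fixed number of components (the paper adapts the number of components), a hypothetical two-peaked reward function, and a reward-weighted EM-style update in which samples drawn from the current GMM are weighted by their normalized rewards before re-estimating priors, means, and covariances.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(theta):
    # hypothetical multi-peaked reward: two Gaussian bumps at (2,2) and (-2,-2)
    return (np.exp(-np.sum((theta - 2.0) ** 2) / 0.5)
            + 0.8 * np.exp(-np.sum((theta + 2.0) ** 2) / 0.5))

# GMM with K components over d-dimensional policy parameters
K, d, n = 2, 2, 200
pri = np.full(K, 1.0 / K)                       # mixture priors
mu = rng.normal(0.0, 3.0, size=(K, d))          # component means
cov = np.array([np.eye(d) * 4.0 for _ in range(K)])  # component covariances

for _ in range(30):
    # sample candidate policies from the current GMM (exploration)
    comp = rng.choice(K, size=n, p=pri)
    theta = np.array([rng.multivariate_normal(mu[k], cov[k]) for k in comp])

    # normalized reward-derived importance weights
    w = np.array([reward(t) for t in theta])
    w = w / w.sum()

    # responsibilities of each component for each sample (E-like step)
    resp = np.zeros((n, K))
    for k in range(K):
        diff = theta - mu[k]
        inv = np.linalg.inv(cov[k])
        det = np.linalg.det(cov[k])
        resp[:, k] = (pri[k]
                      * np.exp(-0.5 * np.sum(diff @ inv * diff, axis=1))
                      / np.sqrt((2 * np.pi) ** d * det))
    resp /= resp.sum(axis=1, keepdims=True)

    # reward-weighted responsibilities drive the update (M-like step)
    rw = resp * w[:, None]
    for k in range(K):
        wk = rw[:, k].sum()
        pri[k] = wk
        mu[k] = (rw[:, k, None] * theta).sum(axis=0) / wk
        diff = theta - mu[k]
        cov[k] = ((rw[:, k, None, None]
                   * np.einsum('ni,nj->nij', diff, diff)).sum(axis=0) / wk
                  + 1e-3 * np.eye(d))  # regularize to avoid collapse
    pri /= pri.sum()
```

With a slowly drifting reward (e.g. moving the bump centers over iterations), the lower-weight component keeps tracking the secondary peak, which is what allows the search to switch smoothly when the global optimum changes.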
Keywords :
Gaussian processes; evolutionary computation; expectation-maximisation algorithm; learning (artificial intelligence); optimisation; search problems; statistical distributions; GMM; adaptive Gaussian mixture model; agent throwing capability; black-box optimization problem; control policy solution; cost function minimization; dart game experiment; density estimation problem; direct policy search; evolutionary computation; expectation-maximization variation; game strategy; learning by exploration problem; multioptima exploration; multioptima search approach; multipeaked distribution; optimality criterion; probability distribution; reinforcement learning; reward function maximization; reward-weighted policy parameter; search process; space representation; stochastic optimization; Aerospace electronics; Equations; Games; Gaussian distribution; Noise; Optimization; Robots;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL)
Conference_Location :
San Diego, CA
Print_ISBN :
978-1-4673-4964-2
Electronic_ISBN :
978-1-4673-4963-5
Type :
conf
DOI :
10.1109/DevLrn.2012.6400808
Filename :
6400808