Regional Information Center for Science and Technology - Pareto Upper Confidence Bounds algorithms: An empirical study

DocumentCode :

1799312

Title :

Pareto Upper Confidence Bounds algorithms: An empirical study

Author :

Drugan, Madalina M. ; Nowe, Ann ; Manderick, Bernard

Author_Institution :

Artificial Intell. Lab., Vrije Univ. Brussel, Brussels, Belgium

fYear :

2014

fDate :

9-12 Dec. 2014

Firstpage :

Lastpage :

Abstract :

Many real-world stochastic environments are inherently multi-objective, with conflicting objectives. Multi-objective multi-armed bandits (MOMAB) extend the classical, i.e. single-objective, multi-armed bandits to reward vectors, and multi-objective optimisation techniques are often required to design mechanisms with an efficient exploration/exploitation trade-off. In this paper, we propose the improved Pareto Upper Confidence Bound (iPUCB) algorithm, which straightforwardly extends the single-objective improved UCB algorithm to reward vectors by deleting the suboptimal arms. The goal of iPUCB is to identify the set of best arms, or the Pareto front, within a fixed budget of arm pulls. We experimentally compare the performance of iPUCB with the Pareto UCB1 algorithm and the Hoeffding race on a bi-objective example coming from an industrial control application, i.e. the engagement of wet clutches. We propose a new regret metric based on the Kullback-Leibler divergence to measure the performance of a multi-objective multi-armed bandit algorithm. We show that iPUCB outperforms the other two tested algorithms on the given multi-objective environment.
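
The "set of best arms, or the Pareto front" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' iPUCB implementation. It only shows the standard Pareto dominance relation applied to mean reward vectors, assuming every objective is maximised. The function names `dominates` and `pareto_front` and the numeric values in `example_means` are hypothetical, chosen for illustration only.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u Pareto-dominates v (maximisation in every objective).

    u dominates v when u is at least as good in all objectives
    and strictly better in at least one.
    """
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_front(means: List[Sequence[float]]) -> List[int]:
    """Return the indices of arms whose mean vector no other arm dominates."""
    return [
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


# Hypothetical bi-objective mean rewards for five arms (illustrative values only,
# not taken from the paper's wet clutch experiment).
example_means = [(0.55, 0.50), (0.53, 0.51), (0.52, 0.54), (0.50, 0.50), (0.40, 0.45)]
front = pareto_front(example_means)
print(front)
```

In a bandit setting the true means are unknown, so an algorithm would apply a test of this kind to estimates or confidence bounds rather than to exact values.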

Keywords :

Pareto optimisation; learning (artificial intelligence); stochastic processes; Hoeffding race; Kullback-Leibler divergence; MOMAB; Pareto UCB1 algorithm; Pareto upper confidence bounds algorithms; UCB algorithm; bi-objective example; industrial control applications; multiobjective environments; multiobjective multiarmed bandit algorithm; multiobjective multiarmed bandits; multiobjective optimisation techniques; real-world stochastic environments; wet clutches; Algorithm design and analysis; Electronic mail; Hypercubes; Measurement; Pareto optimization; Upper bound; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2014 IEEE Symposium on

Conference_Location :

Orlando, FL

Type :

conf

DOI :

10.1109/ADPRL.2014.7010620

Filename :

7010620

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1799312