Adaptive Optimization of Markov Reward Processes

Author

Campos-Nánez, Enrique ; Patek, Stephen D.

Author_Institution

Department of Engineering Management and Systems Engineering, The George Washington University, 1776 G Street, Washington, DC 20052, USA; ecamposn@gwu.edu

fYear

2005

fDate

12-15 Dec. 2005

Firstpage

8034

Lastpage

8041

Abstract

We consider the problem of optimizing the average reward of Markov chains controlled by two sets of parameters: (1) a set of tunable parameters and (2) a set of fixed but unknown parameters. We study the convergence characteristics of recursive estimation procedures based on the observation of regenerative cycles. We also provide sufficient conditions for the convergence to local optima of existing simulation-based optimization procedures under parameter certainty, in order to achieve simultaneous optimal selection of the tunable parameters and identification of the unknown parameters. To illustrate our approach, we discuss an algorithm that exploits the gradient of the likelihood of an observed regenerative cycle, and its application to a regenerative simulation-based algorithm introduced in [1]. Our results are illustrated numerically in a problem of optimal pricing of services in a multi-class loss network.
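
The idea of recursive estimation from regenerative cycles can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: the two-state chain, the function names, the step-size constant `c`, and the clipping bounds are all assumptions chosen for illustration. State 0 is the regeneration state and always moves to state 1; state 1 returns to state 0 with unknown probability `theta` and otherwise loops on itself. After each observed cycle, the estimate takes one stochastic-gradient step on the log-likelihood of that cycle.

```python
import random

def sample_cycle(theta, rng):
    """Simulate one regenerative cycle of the toy chain.

    Returns k, the number of self-loops in state 1 before the
    chain returns to the regeneration state 0.
    """
    k = 0
    while rng.random() >= theta:  # stay in state 1 with probability 1 - theta
        k += 1
    return k

def grad_log_likelihood(k, theta):
    """Gradient in theta of log L = k*log(1 - theta) + log(theta)."""
    return 1.0 / theta - k / (1.0 - theta)

def estimate_theta(true_theta, n_cycles=20000, theta0=0.5, c=0.05, seed=0):
    """Recursive estimate of theta, updated once per observed cycle.

    The step size c/n and the projection onto [0.05, 0.95] are
    illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_cycles + 1):
        k = sample_cycle(true_theta, rng)           # observe one cycle
        theta += (c / n) * grad_log_likelihood(k, theta)
        theta = min(0.95, max(0.05, theta))         # keep the estimate in range
    return theta

theta_hat = estimate_theta(0.3)
print(f"estimated theta: {theta_hat:.3f}")
```

In the setting the abstract describes, an update of this kind would run alongside a simulation-based optimizer that adjusts the tunable parameters; the sketch covers only the identification step.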

Keywords

Convergence; Dynamic programming; Modeling; Pricing; Q factor; Recursive estimation; State estimation; State-space methods; Sufficient conditions; Systems engineering and theory

fLanguage

English

Publisher

IEEE

Conference_Titel

Decision and Control, 2005 and 2005 European Control Conference. CDC-ECC '05. 44th IEEE Conference on

Print_ISBN

0-7803-9567-0

Type

conf

DOI

10.1109/CDC.2005.1583462

Filename

1583462