DocumentCode
2825307
Title
Distributed optimization of Markov reward processes
Author
Campos-Náñez, Enrique
Author_Institution
George Washington Univ., Washington, DC
fYear
2007
fDate
12-14 Dec. 2007
Firstpage
6166
Lastpage
6171
Abstract
Dynamic programming provides perhaps the most natural way to model many control problems, but suffers from the fact that existing solution procedures do not scale gracefully with the size of the problem. In this work, we present a gradient-based policy search technique that exploits the fact that in many applications the state space and control actions are naturally distributed. After presenting our modeling assumptions, we introduce a technique in which distributed agents compute estimates of the partial derivatives of a system-wide objective with respect to the parameters under their control and use them in a gradient-based policy search procedure. We illustrate the algorithm with an application to energy-efficient coverage in energy-harvesting sensor networks. The resulting algorithm can be implemented using only local information available to the sensors and is therefore fully scalable. Our numerical results are encouraging and suggest the usefulness of our approach.
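A minimal sketch of the kind of procedure the abstract describes: each agent holds one parameter of a randomized policy, forms a score-function (GPOMDP-style) estimate of the partial derivative of the system-wide reward with respect to that parameter, and takes a local gradient-ascent step. This is an illustration only, not the paper's algorithm; the toy coverage reward, the Bernoulli activation policy, and all constants below are invented for the example, and it assumes every agent can observe the scalar system-wide reward (the paper goes further for its sensor-network application, using only local information).

import numpy as np

# Illustrative sketch only: a score-function (GPOMDP-style) distributed
# policy-gradient loop, not the paper's actual algorithm. The toy reward,
# policy parameterization, and hyperparameters are invented for this example.

rng = np.random.default_rng(0)

N = 5          # distributed agents, each controlling one policy parameter
GAMMA = 0.95   # discount factor for the eligibility traces
STEP = 0.5     # gradient-ascent step size
T = 2000       # simulation steps per gradient estimate
EPOCHS = 40

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def system_reward(actions):
    # Toy system-wide objective: diminishing coverage returns from each
    # additional active sensor, minus a per-sensor energy cost.
    k = actions.sum()
    return 4.0 * (1.0 - 0.6 ** k) - 0.5 * k

theta = np.zeros(N)  # agent i holds and updates theta[i] locally

for epoch in range(EPOCHS):
    z = np.zeros(N)   # eligibility traces, one per agent
    g = np.zeros(N)   # running gradient estimates, one per agent
    avg_r = 0.0
    for t in range(T):
        p = sigmoid(theta)                     # activation probabilities
        a = (rng.random(N) < p).astype(float)  # each agent acts locally
        r = system_reward(a)                   # scalar system-wide reward
        # Score function of a Bernoulli(sigmoid(theta_i)) policy is
        # (a_i - p_i); agent i needs only its own action and the reward r.
        z = GAMMA * z + (a - p)
        g += (r * z - g) / (t + 1)             # running average of r_t * z_t
        avg_r += (r - avg_r) / (t + 1)
    theta += STEP * g                          # local gradient-ascent step
    print(f"epoch {epoch:2d}  avg reward {avg_r:6.3f}  "
          f"activation probs {np.round(sigmoid(theta), 2)}")

Running the sketch, the activation probabilities drift toward an interior operating point where the marginal coverage of one more active sensor balances its energy cost. The numpy vectorization is notational only: each update to z[i], g[i], and theta[i] depends solely on agent i's own action and the shared scalar reward, which is what makes the procedure distributable.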
Keywords
decentralised control; dynamic programming; gradient methods; state-space methods; stochastic systems; Markov reward process; distributed optimization; gradient-based policy search technique; state-space method; Control systems; Decision making; Distributed computing; Energy efficiency; Modeling; Resource management; Size control
fLanguage
English
Publisher
ieee
Conference_Title
2007 46th IEEE Conference on Decision and Control
Conference_Location
New Orleans, LA
ISSN
0191-2216
Print_ISBN
978-1-4244-1497-0
Type
conf
DOI
10.1109/CDC.2007.4434649
Filename
4434649
Link To Document