Title :
Optimal learning of transition probabilities in the two-agent newsvendor problem
Author :
Ryzhov, Ilya O. ; Valdez-Vivas, Martin R. ; Powell, Warren B.
Author_Institution :
Oper. Res. & Financial Eng., Princeton Univ., Princeton, NJ, USA
Abstract :
We examine a newsvendor problem with two agents: a requesting agent that observes private demand information, and an oversight agent that must determine how to allocate resources upon receiving a bid from the requesting agent. Because the two agents have different cost structures, the requesting agent tends to bid higher than the amount that is actually needed. As a result, the allocating agent needs to adaptively learn how to interpret the bids and estimate the requesting agent's biases. Learning must occur as quickly as possible, because each suboptimal resource allocation incurs an economic cost. We present a mathematical model that casts the problem as a Markov decision process with unknown transition probabilities. We then perform a simulation study comparing four different techniques for optimal learning of transition probabilities. The best technique is shown to be a knowledge gradient algorithm, based on a one-period look-ahead approach.
Keywords :
Markov processes; decision theory; learning (artificial intelligence); multi-agent systems; operations research; probability; resource allocation; Markov decision process; knowledge gradient algorithm; mathematical model; one period look ahead approach; optimal learning; oversight agent; private demand information; suboptimal resource allocation; transition probability; two agent newsvendor problem; unknown transition probabilities; Approximation methods; Bayesian methods; Games; History; Mathematical model; Resource management
Conference_Title :
Simulation Conference (WSC), Proceedings of the 2010 Winter
Conference_Location :
Baltimore, MD
Print_ISBN :
978-1-4244-9866-6
DOI :
10.1109/WSC.2010.5679081