  • DocumentCode
    853002
  • Title
    Decentralized learning in finite Markov chains
  • Author
    Wheeler, Richard M., Jr.; Narendra, Kumpati S.
  • Author_Institution
    Sandia National Laboratories, Livermore, CA, USA
  • Volume
    31
  • Issue
    6
  • fYear
    1986
  • fDate
    6/1/1986
  • Firstpage
    519
  • Lastpage
    526
  • Abstract
    The principal contribution of this paper is a new result on the decentralized control of finite Markov chains with unknown transition probabilities and rewards. One decentralized decision maker is associated with each state in which two or more actions (decisions) are available. Each decision maker uses a simple learning scheme, requiring minimal information, to update its action choice. It is shown that, if updating is done in sufficiently small steps, the group will converge to the policy that maximizes the long-term expected reward per step. The analysis is based on learning in sequential stochastic games and on certain properties, derived in this paper, of ergodic Markov chains. A new result on convergence in identical payoff games with a unique equilibrium point is also presented.
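    The kind of scheme the abstract describes can be sketched as one learning automaton per state, each running a linear reward-inaction (L_{R-I}) update. The two-state chain, reward values, and step size below are illustrative assumptions, not taken from the paper; the learners never read the transition or reward tables directly, they only observe the stochastic outcome of each step.

    ```python
    import random

    # Hypothetical 2-state ergodic chain for illustration. REWARD and TRANS
    # are hidden from the learners, who see only per-step rewards in [0, 1].
    REWARD = {0: [1.0, 0.2], 1: [0.1, 0.9]}   # reward per (state, action)
    TRANS = {0: [[0.5, 0.5], [0.5, 0.5]],     # P(next state | state, action)
             1: [[0.5, 0.5], [0.5, 0.5]]}

    def choose(p):
        """Sample an index from the 2-element probability vector p."""
        return 0 if random.random() < p[0] else 1

    def lri_update(p, action, reward, step=0.01):
        """L_{R-I} update: on reward r, move probability mass toward the
        chosen action in proportion to r; on r = 0, leave p unchanged."""
        for a in range(len(p)):
            if a == action:
                p[a] += step * reward * (1.0 - p[a])
            else:
                p[a] -= step * reward * p[a]

    random.seed(0)
    prob = {s: [0.5, 0.5] for s in REWARD}    # one automaton per state
    state = 0
    for _ in range(200_000):
        action = choose(prob[state])
        lri_update(prob[state], action, REWARD[state][action])
        state = choose(TRANS[state][action])

    # With a small step, each automaton concentrates on its state's
    # better action (action 0 in state 0, action 1 in state 1).
    print(prob[0][0], prob[1][1])
    ```

    In this toy chain the states do not interact, so each automaton faces what is effectively a two-armed bandit; the paper's result is stronger, covering chains where one state's action choice changes the rewards and visit frequencies seen at other states, with convergence to the policy maximizing long-run average reward.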
  • Keywords
    Distributed control; Distributed decision-making; Learning control systems; Markov processes; Optimal stochastic control; Stochastic optimal control; Control systems; Convergence; Costs; Decision making; Game theory; Parameter estimation; Process control; Stochastic processes; Stochastic systems
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Automatic Control
  • Publisher
    IEEE
  • ISSN
    0018-9286
  • Type
    jour
  • DOI
    10.1109/TAC.1986.1104342
  • Filename
    1104342