Consider a controlled Markov chain whose transition probabilities depend upon an unknown parameter $\alpha$ taking values in a finite set $A$. To each $\alpha$ is associated a prespecified stationary control law $\phi(\alpha)$. The adaptive control law selects at each time $t$ the control action indicated by $\phi(\alpha_t)$, where $\alpha_t$ is the maximum likelihood estimate of $\alpha$. It is shown that $\alpha_t$ converges to a parameter $\alpha^*$ such that the "closed-loop" transition probabilities corresponding to $\alpha^*$ and $\phi(\alpha^*)$ are the same as those corresponding to $\alpha_0$ and $\phi(\alpha^*)$, where $\alpha_0$ is the true parameter. The situation when $\alpha_0$ does not belong to the model set $A$ is briefly discussed.
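
The scheme described above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's construction: the three candidate parameters, the two-state, two-action transition table `P`, the control-law table `PHI`, and the helper names `closed_loop` and `run_adaptive_control` are all hypothetical choices made for illustration. At each step the sketch takes the maximum likelihood estimate over the finite parameter set, applies the action prescribed by the control law associated with that estimate, and updates the log-likelihood of every candidate from the observed transition.

```python
import numpy as np

# Hypothetical model: 3 candidate parameters, 2 states, 2 actions.
# P[alpha, u, x, y] = probability of moving x -> y under action u and parameter alpha.
P = np.array([
    # alpha = 0 (used as the true parameter below)
    [[[0.9, 0.1], [0.2, 0.8]],    # action 0
     [[0.5, 0.5], [0.5, 0.5]]],   # action 1
    # alpha = 1: agrees with alpha = 0 under action 1 only
    [[[0.3, 0.7], [0.7, 0.3]],
     [[0.5, 0.5], [0.5, 0.5]]],
    # alpha = 2: differs from alpha = 0 under both actions
    [[[0.1, 0.9], [0.9, 0.1]],
     [[0.1, 0.9], [0.9, 0.1]]],
])

# PHI[alpha, x] = action prescribed in state x by the law associated with alpha
# (an arbitrary illustrative choice, not derived from any cost criterion).
PHI = np.array([[0, 0], [1, 1], [0, 1]])


def closed_loop(alpha, law):
    """Transition matrix under parameter `alpha` when the law of `law` is applied."""
    n_states = P.shape[2]
    return np.array([P[alpha, PHI[law, x], x] for x in range(n_states)])


def run_adaptive_control(true_alpha=0, steps=3000, seed=0):
    """Simulate the adaptive law; return the final estimate and the estimate history."""
    rng = np.random.default_rng(seed)
    n_params, n_states = P.shape[0], P.shape[2]
    loglik = np.zeros(n_params)  # log-likelihood of each candidate parameter
    x = 0
    estimates = []
    for _ in range(steps):
        alpha_t = int(np.argmax(loglik))   # ML estimate (ties go to the lowest index)
        u = PHI[alpha_t, x]                # action of the law associated with alpha_t
        y = rng.choice(n_states, p=P[true_alpha, u, x])  # the true chain moves
        loglik += np.log(P[:, u, x, y])    # score the transition under every candidate
        estimates.append(alpha_t)
        x = y
    return int(np.argmax(loglik)), estimates


if __name__ == "__main__":
    alpha_star, _ = run_adaptive_control()
    print("final estimate:", alpha_star)
    print("closed loops agree:",
          np.allclose(closed_loop(alpha_star, alpha_star),
                      closed_loop(0, alpha_star)))
```

In this toy model, parameter 1 cannot be told apart from the true parameter 0 while action 1 is being applied, so the estimate may settle at either value. In both cases the closed-loop transition matrix under the final estimate and its control law coincides with the one generated by the true parameter under that same law, which is the kind of limit the abstract describes.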