A class of stochastic automata models is proposed for the synthesis of a parameter-optimizing controller. The automaton can operate in environments characterized by reward strengths (S-models) or reward probabilities (P-models). In the P-model case the proposed algorithm is equivalent to the ε-optimal algorithm reported by Shapiro and Narendra. The algorithm discussed here was originally reported by Mason with emphasis on the P-model case. In this paper, emphasis is placed on the S-model case. Recently, an equivalent ε-optimal algorithm has been reported by Viswanathan and Narendra. It is shown herein that only the optimal solution is stable and that the expected performance converges monotonically. Simulation results are presented that corroborate the analytical results. It is demonstrated that the proposed algorithm is superior to McLaren's linear reinforcement scheme in regard to expediency.
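To illustrate the S-model setting the abstract refers to (continuous reward strengths in [0, 1], as opposed to the binary outcomes of a P-model), the following sketch runs a standard linear reward-inaction style update scaled by the observed reward strength. The environment, its mean reward strengths, the step size, and all function names are invented for this example; this is not the paper's algorithm.

```python
import random

def s_model_environment(action):
    """Hypothetical S-model environment: returns a reward strength in
    [0, 1] for the chosen action; action 1 is the best on average."""
    means = [0.2, 0.9, 0.3]                     # assumed mean reward strengths
    return min(1.0, max(0.0, random.gauss(means[action], 0.1)))

def update(p, action, strength, a=0.02):
    """Reward-inaction update scaled by reward strength: the chosen
    action's probability grows in proportion to the reward received,
    the others shrink; the probabilities still sum to one."""
    for j in range(len(p)):
        if j == action:
            p[j] += a * strength * (1.0 - p[j])
        else:
            p[j] -= a * strength * p[j]
    return p

def run(steps=20000, seed=0):
    random.seed(seed)
    p = [1.0 / 3] * 3                           # uniform initial probabilities
    for _ in range(steps):
        action = random.choices(range(3), weights=p)[0]
        p = update(p, action, s_model_environment(action))
    return p

probs = run()
```

After the run, the probability of the action with the highest mean reward strength dominates, which is the kind of expedient behavior the abstract compares against McLaren's linear scheme.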