Title :
An MDP Model-Based Reinforcement Learning Approach for Production Station Ramp-Up Optimization: Q-Learning Analysis
Author :
Doltsinis, Stefanos ; Ferreira, Paulo ; Lohse, Niels
Author_Institution :
Manuf. Div., Univ. of Nottingham, Nottingham, UK
Abstract :
Ramp-up is a significant bottleneck in the introduction of new or adapted manufacturing systems. The effort and time required to ramp up a system depend largely on the effectiveness of the human decision-making process in selecting the most promising sequence of actions to bring the system to the required level of performance. Although existing work has identified significant factors influencing the effectiveness of ramp-up, little has been done to support decision making during the process. This paper approaches ramp-up as a sequential adjustment and tuning process that aims to bring a manufacturing system to a desired level of performance in the shortest possible time. Production stations and machines are the key resources in a manufacturing system. They are often functionally decoupled and can be treated, in the first instance, as independent ramp-up problems. Hence, this paper focuses on developing a Markov decision process (MDP) model to formalize the ramp-up of production stations and enable their formal analysis. The aim is to capture the cause-and-effect relationships between an operator's adaptation or adjustment of a station and the station's response, in order to improve the effectiveness of the process. Reinforcement learning has been identified as a promising approach to learn from ramp-up experience and discover more successful decision-making policies. Batch learning in particular can perform well with little data. This paper investigates the application of a Q-batch learning algorithm combined with an MDP model of the ramp-up process. The approach has been applied to a highly automated production station where several ramp-up processes are carried out. The convergence of the Q-learning algorithm has been analyzed along with the variation of its parameters. Finally, the learned policy has been applied and compared against previous ramp-up cases.
Keywords :
Markov processes; decision making; learning (artificial intelligence); manufacturing systems; production engineering computing; MDP model-based reinforcement learning approach; Markov decision process model; Q-batch learning algorithm; automated production station; cause-and-effect relationships; convergence analysis; formal analysis; human decision making process; manufacturing systems; operator adaptation; performance level improvement; process effectiveness improvement; production machines; production station ramp-up optimization; sequential adjustment; station response adjustment; system improvement; tuning process; Algorithm design and analysis; Decision making; Learning (artificial intelligence); Manufacturing systems; Personnel; Decision-making; Markov processes; learning systems; manufacturing automation;
Journal_Title :
Systems, Man, and Cybernetics: Systems, IEEE Transactions on
DOI :
10.1109/TSMC.2013.2294155