This project was developed as part of my bachelor's thesis at the Technical University of Chemnitz.
Problem
In production, disturbances during manufacturing and assembly force foremen and workers to make decisions based on experience alone. These decisions are often suboptimal: they disrupt production plans, reduce efficiency, increase inventory costs, and can ultimately impact customer satisfaction. Reinforcement learning (RL) algorithms have achieved superhuman results on optimization problems with long time horizons. However, they struggle to generalize to unknown problems, which limits their real-world applicability, particularly in robotics.
To address this, two training methods are available: Curriculum Learning (CL) and Reference State Initialization (RSI). CL teaches RL algorithms complex tasks gradually through a sequence of progressively more difficult tasks, akin to the education of children and adolescents. RSI exposes RL algorithms to a variety of intermediate states during training. Both methods have the potential to improve generalization. However, current research focuses primarily on applications in robotics and language processing, so whether these methods can enhance the generalization capability of RL algorithms in the production domain remains an open question.
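To make the difference between the two methods concrete, here is a minimal Python sketch of how each could select training tasks; the `SchedulingEnv` class, its parameters, and the list of intermediate states are hypothetical placeholders, not the thesis implementation.

```python
import random

class SchedulingEnv:
    """Toy stand-in for an order-scheduling environment (assumed interface)."""
    def __init__(self, difficulty=0, start_state=None):
        self.difficulty = difficulty    # e.g. number or severity of disruptions
        self.start_state = start_state  # intermediate state of the production process

def curriculum_env(progress, max_difficulty=5):
    """Curriculum Learning: task difficulty grows with training progress in [0, 1]."""
    return SchedulingEnv(difficulty=int(progress * max_difficulty))

def rsi_env(intermediate_states):
    """Reference State Initialization: each episode starts from a randomly
    drawn intermediate state instead of the initial state."""
    return SchedulingEnv(start_state=random.choice(intermediate_states))
```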
Solution
To address uncertainties in manufacturing, an RL algorithm must be able to solve a wide range of problems or disruptions, covering diverse combinations of disruption location, duration, and timing. During training, the algorithm should not merely memorize the presented problems but learn the skills needed to solve them. The resulting improvement in generalization capability should then enable RL algorithms to be used in real manufacturing environments.
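As an illustration of this problem space, a disruption scenario can be parameterized by its location, timing, and duration, and training or evaluation scenarios can then be sampled at random. The following sketch is one possible parameterization; the `Disruption` fields and sampling ranges are assumptions, not the scenario model used in the thesis.

```python
import random
from dataclasses import dataclass

@dataclass
class Disruption:
    """Illustrative disruption scenario; field names are assumptions."""
    machine: int     # location: which machine or station is affected
    start_step: int  # timing: time step at which the disruption begins
    duration: int    # how many time steps the machine is unavailable

def sample_disruption(n_machines, horizon, max_duration, rng=random):
    """Draw a random combination of disruption location, timing, and duration."""
    return Disruption(
        machine=rng.randrange(n_machines),
        start_step=rng.randrange(horizon),
        duration=rng.randint(1, max_duration),
    )
```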
To achieve these goals, the generalization capability of RL algorithms for order scheduling is to be enhanced through a training method. Both candidate methods, Curriculum Learning and Reference State Initialization, are implemented, and their respective advantages are evaluated. These training methods should enable RL algorithms to generalize from a small number of training problems to as many unknown problems as possible.
Results
In this thesis, the CL and RSI training methods are applied to order scheduling in manufacturing, and a variety of training environments is created from manufacturing requirements. Agents trained with CL follow a predefined curriculum, while under RSI a randomly selected training environment is initialized at the start of each iteration. After training with either method, agents successfully solve both known and unknown manufacturing environments. Validation shows that both training methods improve generalization: training across multiple environments makes agents less dependent on specific conditions, allowing them to handle disruptions at different times and locations. RSI outperforms CL in enhancing the generalization capability of RL algorithms for order scheduling, delivering better results while being simpler to handle.
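The per-iteration environment randomization of RSI can be pictured with the following training-loop sketch; the `agent` and `make_env` interfaces are assumed, Gym-like placeholders rather than the code used in this thesis.

```python
import random

def train_with_rsi(agent, make_env, intermediate_states, iterations=1000):
    """RSI-style training: every iteration starts in a freshly sampled environment."""
    for _ in range(iterations):
        # Draw a random reference state so the agent encounters many
        # different intermediate production situations during training.
        env = make_env(start_state=random.choice(intermediate_states))
        obs = env.reset()
        done = False
        while not done:
            action = agent.act(obs)                  # policy decision
            obs, reward, done, info = env.step(action)
            agent.record(obs, reward, done)          # store the transition
        agent.update()                               # policy update after each episode
```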
In summary, this research explores training methods that improve the generalization of RL algorithms for order scheduling in manufacturing, with RSI emerging as the more effective and easier-to-use approach compared to CL. This strengthens the ability of RL algorithms to complement human decision-makers and production systems in the manufacturing domain.