Modular Transfer Reinforcement Learning in Industrial Robotics
The PhD thesis of Dr.-Ing. Christian Bitter focuses on using self-learning agents to learn robot control. In this context, he studied how decision-making processes can be modularized so that building blocks for perception, planning, and execution can be created in a data-efficient and interpretable way. He also demonstrated how these building blocks can be recombined and then transferred to new scenarios.
We asked Christian about his dissertation:
In what context was your dissertation written? Which projects or other factors particularly influenced it?
I first encountered reinforcement learning for robot control while working on my master's thesis, in which I used a self-learning agent to teach an industrial robot to play the game "hot wire." By the end of my thesis, I was convinced of, and excited about, the potential of self-learning agents, but training them was very time-consuming and nerve-wracking. Because the agents were trained directly on the real robot, experiments were slow and the test setup could break. In addition, an agent's learning progress was very opaque: whether a training run still had a chance of success or should be aborted was difficult, if not impossible, to tell.
Based on these experiences, during my subsequent doctoral studies I asked myself how reinforcement learning could be applied more efficiently and transparently. I found answers in transfer learning, starting with the use of simulation for pre-training and the transfer of agents between task variations. I also had the opportunity to investigate reinforcement learning in an industrial application, automated aircraft shell assembly, in the AGR33D project. To answer the project's central question, namely how agents can be trained for precise joining tasks, we set up a demonstrator in our chair's robotics lab. While we could reproduce the scenario realistically enough using real components and 3D printing, there was one major difference from the real process: the robot. The question of how agents can be transferred between different robot models stayed with me even after the project ended and became an important part of my dissertation.
What contribution does your work make to the field of research?
In my work, I demonstrated how reinforcement learning agents can be modularized into context perception, task planning, and robot control, and I contributed to transferring these modules from simulation, between tasks, and across robot models. Specifically, I combined asynchronous reinforcement learning with generative methods to compensate for latencies in robot control. Next, I used AI models for compression to extract process-relevant information from image data, which allowed me to demonstrate transfer from simulation. I then used hierarchical reinforcement learning to separate the task-specific strategy from the cross-task tactics, so that the tactics could be reused for new tasks. Finally, I developed a method for comparing movements between different robot models and transferring them from one model to another.
Modularization for reducing complexity and increasing transparency is a well-established concept in engineering and computer science. The main thesis of my work, that modularizing an AI agent is worthwhile, is therefore not radically new. Nevertheless, when it comes to applying reinforcement learning in industrial environments, my work provides an important counterpoint to the exciting and relevant research on increasingly powerful AI agents. It shows that success is determined not only by size and complexity but also by the structured decomposition of capabilities into transferable, interpretable, and maintainable modules. This perspective opens up new ways to steer learning processes in a more targeted manner, to transfer knowledge between tasks, and ultimately to build more practical, reliable, and comprehensible AI systems for industrial applications.
What's next for you and the topic?
After a very enjoyable time in application-oriented research, I have now moved into research-oriented application. At the startup enabl in Karlsruhe, I am currently working on automating remote-controlled forklifts. Here, too, the focus is on modularizing decision pipelines in general, as well as on self-learning AI agents that learn from human demonstrations.