Dr.-Ing. Constantin Waubert de Puiseau

Advancing Training and Inference Methods for Deep Reinforcement Learning-Based Job Shop Scheduling

The dissertation deals with the use of self-learning agents for production planning tasks. In particular, it develops new methods for training agents more effectively and for using trained agents more efficiently, so that shorter production plans can be found in less time than before. Combined with a focus on reliability criteria, this work helps bring agent-based production planning into application.

We asked Constantin about his dissertation:

What was the context of your dissertation? What projects or other factors particularly influenced your dissertation?

The topic of AI-based production planning was brand new when I started my PhD in 2019. Deep reinforcement learning (DRL) had just made a name for itself, with sensational results and wide media coverage, as a promising approach in strategic computer games, and more recently in chess and Go. However, the transfer of this methodology to industrial problems had not really taken off yet. A joint research project with AIRBUS was set to change that.
My fascination with the idea that a computer or an artificial neural network can develop its own strategy has not waned to this day. But it was the combination with an application such as production planning, which industrial partners have repeatedly confirmed to be omnipresent and highly relevant, that motivated me to continue my research.

How does your work contribute to the field of research?

DRL for production planning is an interdisciplinary field of research, and my work makes contributions inspired by different perspectives. On the one hand, I investigated ways of integrating domain knowledge from operations research, which has been developed over decades, into the methodology. On the other hand, I followed trends in AI research and successfully transferred advances such as curriculum learning and transformer architectures to this use case.

I hope that I have not only contributed individual methods, but perhaps also shown that such changes in perspective are worthwhile. For this reason, in my final chapter, I brought together different perspectives on reliability and developed a uniform nomenclature and uniform evaluation metrics as a basis for future interdisciplinary work that will bring the methodology into application.

What does the future hold for you and the topic?

The topic will be continued at TMDT. As is often the case, my dissertation has raised new questions that need to be answered. I am very happy about that! I would like to continue to actively follow and shape these developments. For me personally, however, the first step is to enter industry, where I will be working on production IT at SCHOTT AG in Mainz. Perhaps this will serve as preparation for agent-based production planning.