Institute for Technologies and Management of Digital Transformation

Utilizing Embeddings to Learn a Universal Customer Behavior Representation in E-Commerce

Miguel Alves Gomes' dissertation examines how embeddings can be used to create general customer representations in e-commerce.
In online retail, good personalization determines success or failure. At the same time, data protection and data minimization are becoming increasingly important. The approach developed by Miguel Alves Gomes uses embeddings that are learned from observed customer behavior. These latent representations can then be used in a variety of ways for different personalization tasks, such as predicting purchase or click intentions.
To reflect the highly dynamic nature of e-commerce, the embeddings were extended with lifelong learning mechanisms. Through incremental and continuous learning, the models adapt to new trends and changing user behavior without immediately losing what they have already learned.

We asked Miguel about his dissertation:

What was the context of your dissertation? What projects or other factors particularly influenced your dissertation?

When I started at TMDT, I had the opportunity to get a taste of various projects, one of which was the industry project with Breinify. The work was challenging, but also very inspiring. I found it particularly fascinating how complex human behavior is compared to other types of data.
While images or language often follow clear patterns, human behavior is highly individual. Even so, patterns can be identified and represented mathematically in a latent vector space. This insight motivated me to devote my dissertation to precisely this challenge: how behavior can be generalized using machine learning methods without losing its individuality.

How does your work contribute to the field of research?

Personalization in e-commerce is not a new topic. Large companies such as Amazon and Alibaba have been shaping this field for years. They have enormous amounts of data at their disposal to train huge models and continuously optimize them.
But what happens when there is not enough data available, for example in smaller online shops where users do not have to register and interactions are correspondingly sparse? In such scenarios, traditional expert systems are usually used because data-hungry AI models are simply not applicable here. However, these systems have two key disadvantages:
1. They generate rigid, application-specific customer representations that have to be manually adjusted every time the use case changes.
2. They are not real-time capable, which limits their use to campaigns or batch analyses.
This is where my approach comes in: Through self-supervised learning, embeddings can be learned directly from existing interaction data, even with small amounts of data, without labels and without explicit user information. This enables flexible, scalable, and privacy-friendly personalization that can be used in real time, even for unknown users.

What does the future hold for you and the topic?

Predicting behavior continues to fascinate me. I want to think about the topic beyond e-commerce, because behavior plays a role in many areas:

In education, for example, it can help to better understand learning processes and support them individually. In medicine, it could be used to predict the course of diseases or the success of therapies. Even in animal welfare, understanding behavioral patterns can open up new perspectives.

I therefore see behavior as a universal category of data and want to continue researching how we can better understand, model, and responsibly use it with the help of AI.

I am pleased that I will also be staying at TMDT to drive exactly this topic of behavior forward and to open up new applications for data-driven behavioral models.