How robots learn to understand humans


Rudolf Lioutikov revolutionizes human-robot interaction with his Intuitive Robots Lab at KIT – and successfully competes with US tech giants

This article first appeared on the website of the Karlsruhe Institute of Technology (KIT). We are publishing it here with kind permission. Additional perspectives can be found in LookKIT (2025 edition, p. 49).

When Rudolf Lioutikov talks about intelligent robots, it doesn’t sound like science fiction. The Professor of Machine Learning and Robotics at KIT talks about everyday things: smart machines that hand you a cup or put a glass on the shelf. His vision is nonetheless revolutionary: Lioutikov wants to develop robots that really understand people. They should not only perform tasks, but also communicate and cooperate with people in a natural way – even with people who have no prior technical knowledge.

To achieve this, he relies on a new generation of AI models that can recognize speech and images and derive meaningful behavior from them. His goal: robots should communicate with humans as intuitively as we do with each other – not through complicated commands, but through eye contact, tone of voice, or facial expressions. His team puts particular effort into improving and further developing these models itself – an approach that makes it one of the pioneers in Europe. “Robots must not only be able to understand human intentions, but also make themselves understood,” says Lioutikov. With his Intuitive Robots Lab, the 38-year-old is competing with the US tech giants – and receiving worldwide recognition for his work.

Technology that understands people – and vice versa

There is a great need in society: areas such as care, the household, and industry require intelligent machines that adapt flexibly to new situations – without users having to provide large amounts of data or understand complex systems. This is precisely where Lioutikov’s research comes in: “We want to make technology directly accessible and usable for people.”

But how do the researchers intend to achieve this goal of more humanity in technology? Behind the spectacular videos of robots running across fields, climbing steep stairs, or performing somersaults, there is sometimes a great deal of programming. “Current machine learning methods are often not focused enough on the user,” says Lioutikov. “We are developing learning methods that enable robots to learn from interaction with humans – and to cope with incomplete or incorrect information.” This would make robotics more accessible in everyday life.

Pankhuri Vanjani, PhD student at the Intuitive Robots Lab (IRL), and Rudolf Lioutikov, head of the lab, discuss how to advance their research despite unforeseen limitations of the hardware used. Photo: Magali Hauser, KIT

The search for the “ChatGPT moment”

Large US corporations such as Google and Meta are investing billions in so-called Large Behavior Models (LBMs). These AI models are intended to equip robots with general, versatile behavioral capabilities – similar to what large language models such as ChatGPT can do for language. They are not limited to one specific task, but can flexibly perform many different tasks without being reprogrammed or retrained for each one. A robot with an LBM could, for example, set a table, fetch a tool, show a person the way, or open a door – all based on a general understanding of environment, language, and action.

The problem: robotics is still waiting for its “ChatGPT moment” – a breakthrough that would make robots as powerful and flexible as today’s large language models. LBMs are considered the key technology for this, but the models require huge amounts of data and are very complex. They learn how people behave in certain situations from millions of demonstrations, videos, sensor recordings, and voice inputs, and transfer this knowledge to the robot.

Small models, big impact

Rudolf Lioutikov, by contrast, focuses on efficiency. His vision: smaller, more efficient, and explainable LBMs that get by with little data and are suitable for on-premise use – that is, running locally, without cloud dependency. With a small team, he develops vision-language-action models: AI systems that can see, understand, and act. And with considerable success: the Intuitive Robots Lab at KIT is one of the few research labs in Europe actively working on such models – and competing with US start-ups worth billions.

“Our models are smaller, faster and require relatively little data,” says Lioutikov. Nevertheless, they achieve comparable – or even better – results. The team deliberately relies on local systems, which means more independence and better data protection for users.

With FLOWER, the team has developed the first European vision-language-action model that runs on commercially available hardware and can be trained in just a few hours – a milestone for resource-efficient robotics. BEAST, in turn, represents movements in a particularly compact and fluid way, much like a navigation system smoothing out a route. “FLOWER and BEAST have enormous potential, especially in the care sector or in the home, where intuitive and reliable interaction is required,” says Lioutikov.
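To make the navigation-system analogy concrete, here is a minimal sketch in Python of the general idea behind compact movement representations: fitting a B-spline to a dense, noisy trajectory so that a handful of control points stand in for hundreds of raw samples. This is ordinary spline smoothing with SciPy, not the actual FLOWER or BEAST implementation; the trajectory data and smoothing parameters are invented for the example.

import numpy as np
from scipy.interpolate import splprep, splev

# Synthetic stand-in for a recorded robot end-effector path:
# 200 noisy 2D samples along a curved movement (made up for this sketch).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
x = np.cos(t) + 0.02 * rng.standard_normal(200)
y = np.sin(2 * t) + 0.02 * rng.standard_normal(200)

# Fit a cubic B-spline; the smoothing factor s trades fidelity for
# compactness, much as a navigation system smooths out a route.
tck, u = splprep([x, y], s=0.05, k=3)

# The fitted spline is defined by far fewer control points than raw samples.
knots, coeffs, degree = tck
print(f"{len(x)} raw samples -> {len(coeffs[0])} control points per axis")

# A smooth, dense path can be reconstructed from the compact representation.
x_smooth, y_smooth = splev(np.linspace(0, 1, 200), tck)

The point of the illustration: once a movement lives in such a compact form, it can be stored, compared, and reproduced smoothly – the property the article attributes to BEAST’s movement representation.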

Cover picture: Experimental setup of the research team at the Intuitive Robots Lab (IRL) led by Rudolf Lioutikov. The team is researching how robots can communicate and cooperate with humans in a natural way using modern AI models. Photo: Magali Hauser, Karlsruhe Institute of Technology