Studying the use of simulation-based DRL for industrial automation.
In 1997, chess grandmaster Garry Kasparov lost in a historic 3½ – 2½ series against a computer named Deep Blue. It marked the end of human superiority in chess.
That day hasn’t yet come to pass for the classic pub game foosball, but it is coming, and soon. A team of researchers from Bosch Rexroth and DXC Technology has developed an automated foosball system called KIcker, and it’s on the verge of going pro.
Consisting of cameras, industrial PCs, controllers, servomotors and a neural network to tie it all together, KIcker can spin and slide its four player rods in real-time to shoot and pass a foosball on the miniature pitch of a standard table.
KIcker was developed not to crush the spirits of up-and-coming human foosballers, but rather to study the use of simulation-based deep reinforcement learning (DRL) for automating complex manufacturing processes.
“We initiated our foosball study because we wanted to know how to apply artificial intelligence, especially machine learning, in industrial automation,” explained Bosch Rexroth’s Hans Michael Krause.
KI: Künstliche Intelligenz
KIcker derives its stylized name from the German term for artificial intelligence, Künstliche Intelligenz. From the beginning, KIcker was designed to be smart.
“You cannot program every combination of if this, then that in foosball,” Krause explained. “Rather, we wanted to create a neural network to make decisions on how to move the players. That was our mission: no programming, just neural networks controlling the foosball game.”
Building up that neural network requires a process called machine learning. The KIcker team used a type of machine learning called deep reinforcement learning, or DRL. Deep means there are many hidden layers in the network, and reinforcement learning is a way of training a network in sequential decision-making through trial-and-error. This requires experience—lots of experience.
Illustration of a neural network with hidden layers between the input and output. (Image courtesy of DXC Technology.)
To learn how to play foosball, for example, a neural network is trained in such a way that desired behavior (e.g. scoring) is reinforced and undesired behavior (e.g. getting scored on) is subdued. The network must play foosball enough times that this reinforcement can take effect and the network can incrementally improve. Even for machines, practice makes perfect.
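The reinforcement idea can be sketched in a few lines of Python. This is a toy illustration, not the KIcker team's code: it uses a simple lookup table instead of a deep neural network, and the shot names, scoring probabilities, and reward values are all made up for the example. The agent tries shots, receives +1 when a shot scores and -1 when it does not, and gradually comes to prefer the shot that pays off most often.

```python
import random

# Hypothetical shot types and hidden scoring probabilities (illustrative only;
# none of these values come from the KIcker project).
SCORING_PROBABILITY = {"straight": 0.2, "bank": 0.5, "diagonal": 0.8}


def train(episodes=5000, epsilon=0.1, seed=0):
    """Learn a value estimate for each shot type by trial and error."""
    rng = random.Random(seed)
    value = {action: 0.0 for action in SCORING_PROBABILITY}
    count = {action: 0 for action in SCORING_PROBABILITY}
    for _ in range(episodes):
        # Explore a random shot now and then; otherwise use the best-known one.
        if rng.random() < epsilon:
            action = rng.choice(list(value))
        else:
            action = max(value, key=value.get)
        # Desired behavior (a goal) earns +1; a failed shot earns -1.
        scored = rng.random() < SCORING_PROBABILITY[action]
        reward = 1.0 if scored else -1.0
        # Incremental running mean of the rewards seen for this shot.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value


values = train()
best_action = max(values, key=values.get)
print(best_action, values)
```

The agent is never told which shot is best. The preference emerges only after many attempts, which is why the volume of practice matters so much.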
But this practice isn’t always easy to come by. A neural network like KIcker needs to be exposed to thousands of foosball games to learn how to play properly. Who’s going to volunteer as its opponent? It would be like playing foosball against a toddler for weeks on end, and after all that, it could still only manage to tap the ball one time in ten. Oh, and this toddler’s arm is an industrial servomotor that at any time could slam the ball with breakneck speed at your face (or worse).
Fortunately, there’s another way.
Using Simulation for Deep Reinforcement Learning
Bosch Rexroth began the KIcker project in 2017, just in time for the 2018 World Cup in Russia. They originally intended to use human players to train the neural network (“we put the system in our lab and arranged for everybody to play on it”) but realized pretty quickly that wouldn’t be enough.
The Bosch Rexroth team playing a game against KIcker. (Image courtesy of Bosch Rexroth.)
“This was a little bit of naïve thinking,” Krause admitted. Fortunately, in September 2018, IT services provider DXC Technology took notice of KIcker and had a way to help.
“By accident we came to know about the KIcker project,” recalled DXC’s Sebastian Klöser. “We were already looking for applications of deep reinforcement learning in the manufacturing industry, but having the chance to collaborate with Bosch Rexroth was a perfect fit.”
Klöser and his team well understood the challenges of deep reinforcement learning. More importantly, they knew how to get around them. Rather than the inefficient and often impractical task of real-time, real-world reinforcement, DXC Technology uses simulation for DRL.
Imagine: instead of playing a real game of foosball with KIcker, you can simulate KIcker and have it play 1,000 virtual games all at once. With computational acceleration and parallelization, simulation makes DRL dramatically quicker. It’s safer, too—stray balls that are simulated hurt a whole lot less than stray balls that aren’t.
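As a rough sketch of what "all at once" means, the Python below steps 1,000 toy games in lockstep, so a single pass of the outer loop advances every table by one rally. Everything here is an assumption made for illustration: the ToyGame class, the first-to-five rule, and the coin-flip rallies stand in for a real physics simulation and are not taken from the KIcker project or from Unity.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyGame:
    """Stand-in for one simulated table: first side to 5 goals wins."""
    rng: random.Random
    agent_goals: int = 0
    opponent_goals: int = 0

    @property
    def done(self):
        return self.agent_goals >= 5 or self.opponent_goals >= 5

    def step(self):
        # One simulated rally; each side is assumed equally likely to score.
        if self.rng.random() < 0.5:
            self.agent_goals += 1
        else:
            self.opponent_goals += 1


def run_batch(num_games=1000, seed=0):
    """Advance all games together until every one of them has finished."""
    games = [ToyGame(random.Random(seed + i)) for i in range(num_games)]
    ticks = 0
    while not all(game.done for game in games):
        for game in games:
            if not game.done:
                game.step()
        ticks += 1
    return games, ticks


games, ticks = run_batch()
print(f"{len(games)} games finished in {ticks} lockstep ticks")
```

Because the games are independent, the number of ticks depends on the longest single game rather than on the total number of games. That is the property that lets a simulator scale out experience in a way a physical table cannot.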
DXC Technology used Unity for their previous DRL simulations, and KIcker presented an opportunity to test the limits of the game engine.
“It was very interesting for us to see whether Unity's simulation capabilities were enough for such a complex situation that is faced on KIcker,” Klöser said. “The main concern was the physical precision in this highly precise and very fast system.”
Top: A simulation of KIcker created in Unity. Bottom: The real KIcker system as seen from its bird’s-eye-view camera. (Image courtesy of DXC Technology.)
The Reality Gap
Though simulation can make for much faster DRL training, it has its own set of challenges. The primary one is that the simulation must accurately reflect real-world conditions. Remember, the goal is not to train a neural network to be really good at a simulation; the goal is to train it to be really good at the actual thing you’re simulating. Virtual foosball does not equal actual foosball.
“That's the biggest challenge here, the difference between reality and nice toy systems that are usually used in the reinforcement learning community,” Klöser explained.
That difference is called the reality gap, and the KIcker team had to find a way to bridge it. The first and most obvious bridge is to make the simulation as close as possible to the physical system—a task easier said than done. The biggest challenge of the whole project, according to Klöser, was figuring out how to accurately model the complex trajectories of foosball rods in Unity. Fortunately for the team, they found an open-source MATLAB library that could do the trick.
The simulation control logic used for KIcker. The trajectories of the player rods were calculated with a MATLAB library called opt_control and plugged into the Unity simulation. (Image courtesy of DXC Technology.)
Even if you create what you believe is a realistic simulation, there is still the risk of it not being quite realistic enough. If so, you could wind up training a neural network that’s useless for your intended purpose.
That’s why the KIcker team used another technique to bridge the reality gap, called domain randomization. Put simply, the idea is to change your simulation parameters by small amounts on every run so that the neural network isn’t hyper-tuned to one specific scenario.
For example, the foosball used in the actual KIcker system is 34.5 mm in diameter. The simulated foosball was 34.5 mm in diameter on average, but it varied randomly in each simulation. It might have been 34.1 mm in one simulation and 35.3 mm in another, with values taken from a normal distribution. The KIcker team used the same technique to randomize the ball’s mass, friction, bounciness, trajectories and many other physical parameters.
“We randomized everything you can think of, but only in small amounts. Essentially, we forced our system to learn a behavior that does not require it to be very precise,” said Klöser.
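Here is a minimal Python sketch of domain randomization along those lines. Only the 34.5 mm nominal diameter and the use of a normal distribution come from the team's description. The other nominal values, the 1 percent spread, and the BallParams and sample_ball names are assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BallParams:
    diameter_mm: float
    mass_g: float
    friction: float
    bounciness: float


# The diameter is the value quoted for the real ball; the other nominal
# values are placeholders, not measurements from the KIcker system.
NOMINAL = BallParams(diameter_mm=34.5, mass_g=20.0, friction=0.3, bounciness=0.6)

# Assumed spread: standard deviation of 1% of each nominal value.
RELATIVE_STD = 0.01


def sample_ball(rng):
    """Draw one slightly perturbed ball for a single simulated episode."""
    def jitter(nominal):
        return rng.gauss(nominal, RELATIVE_STD * nominal)

    return BallParams(
        diameter_mm=jitter(NOMINAL.diameter_mm),
        mass_g=jitter(NOMINAL.mass_g),
        friction=jitter(NOMINAL.friction),
        bounciness=jitter(NOMINAL.bounciness),
    )


rng = random.Random(42)
episodes = [sample_ball(rng) for _ in range(1000)]
mean_diameter = sum(ball.diameter_mm for ball in episodes) / len(episodes)
print(f"mean diameter over {len(episodes)} episodes: {mean_diameter:.2f} mm")
```

Each episode sees a slightly different ball, while the average stays close to the real one. A policy that performs well across the whole spread is less likely to depend on any single value being exactly right.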
Deep Blue, the chess computer that beat Garry Kasparov in 1997, was a 1.4-ton supercomputer crammed with dedicated chess processors. KIcker learned how to play foosball on a standard business laptop with an Intel i7 processor and 28 GB of RAM.
Leveraging Unity Machine Learning Agents (ML-Agents), the Unity simulation ran 100 virtual KIckers in parallel, each playing back-to-back games for 48 hours of real time. The time frame was intentionally short, since industrial applications of sim-based DRL would want maximum flexibility and minimum downtime. The neural network trained on those thousands of simulated games was transferred directly to the physical KIcker system, where it ran on an industrial PC communicating with the KIcker controller via OPC UA. (Bosch Rexroth's newest automation platform, ctrlX AUTOMATION, can run it in the form of Python code directly on the controller itself.)
The double whammy of a realistic simulation coupled with domain randomization proved effective, and the real-world KIcker behaved as expected. KIcker wasn’t yet a full foosball machine; the simulations encompassed a reduced scope, training only the striker rod to hit a static ball on net. But it was a successful proof of concept, and the team enthusiastically published a paper on their results.
“It was definitely a success,” reflected Klöser, who co-authored the paper. “In the beginning, we had all these questions, whether this is possible at all with a technology like Unity. We only tackled a reduced scope, but even there, from an engineering perspective, something like a diagonal shot is not at all straightforward, and highly dependent on the precision of the system. This being learned in a simulation and then transferred without any further adaption to the real KIcker, and seeing it there with the high velocities that are present on the KIcker system, that's a success both for our approach and also for Unity technology.”
Will KIcker be the Next Foosball Champ?
Today, KIcker plays full games against human opponents. It’s good, but not amazing. Bosch Rexroth’s Hans Michael Krause has trouble beating the system, though he admits to not being a great foosball player himself.
KIcker’s neural network, however, is no longer being actively trained. Since 2019, KIcker has been used as an educational tool for students at universities in Germany.
Bosch Rexroth’s Hans Michael Krause poses with the KIcker system. (Image courtesy of Bosch Rexroth.)
“At the moment we’ve stopped development,” explained Krause. “We use it for student work because we want to make them interested in automation. But if we could get it back and dedicate more time together with DXC, we would be able to make it compete against the pros.”
With KIcker, Bosch Rexroth and DXC Technology succeeded in their goal of demonstrating that simulation-based DRL can be effective for industrial automation. Their goal was never to create the world’s best foosball player—but it sure would be fun to take that next step.
“I think everybody would love to see that,” Klöser said. “Many of the fundamental questions have been answered, so it's pretty straightforward what to do next. Of course, we cannot guarantee KIcker would beat the top world player. But on the other hand, we've been surprised so many times seeing the performance of this machine.”