How to train your robot (with Deep Reinforcement Learning)

Technical talk | English

Theatre 25: Track 1

Wednesday - 15.35 to 16.15 - Technical


Artificial Intelligence (AI) is transforming automated systems, from voice assistants and chatbots to self-driving cars and robots. AI systems can learn and adapt as they incorporate experience, enhancing their predictive abilities. Deep Learning (DL) is a subset of Machine Learning (ML) in which artificial neural networks, algorithms inspired by the human brain, learn from large amounts of data. DL has disrupted the world of ML, enabling deep neural networks to match or exceed human accuracy on a variety of tasks: image classification, speech and handwriting recognition, and autonomous driving.

Reinforcement Learning (RL) is revolutionizing the applications of DL, from playing and beating the best human players at video games to training robots to accomplish complex technical tasks. RL is learning what to do (mapping situations to actions) so as to maximize a numerical reward signal. It has successfully trained computer programs to play games (Go, StarCraft II, etc.) better than the world's best human players. These programs find the best action to take in games with large state and action spaces, imperfect information about the world, and uncertainty about how short-term actions pay off in the long run. Engineers and scientists face the same types of challenges when designing real systems (e.g. controllers). Can RL also help solve complex control problems like making a robot walk or driving an autonomous car? In this talk, we aim to answer this question by explaining what RL is in the context of traditional control problems, showing how to generate simulation data, and setting up and solving an RL problem that lets a virtual robot learn a complex task, such as walking, using Deep Reinforcement Learning.

RL works in a dynamic environment, and the goal is to find the sequence of actions that generates the optimal outcome: collecting the most reward. The agent is the "brain" of the robot, a piece of software that explores, interacts with, and learns from the environment, which is everything that exists outside the agent. The environment is where the agent sends actions, and it is what generates rewards and observations. The reward function encodes what you want the agent to do and how you will reward it for doing it; it represents the "goodness" of the agent being in a particular state and taking a particular action. The agent takes in the state observations (the inputs) and maps them to actions (the outputs). This mapping is called the policy. Given a set of observations, the policy decides which action to take, and the learning algorithm is the optimization method used to find the optimal policy. In Deep Reinforcement Learning, the policy can be a deep neural network, allowing a more complex policy that can take in thousands of state observations at once and still produce meaningful actions.
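The agent-environment loop described above can be sketched in a few lines of code. This is a toy illustration only (the talk itself works in MATLAB/Simulink); the environment, agent, and reward here are invented for demonstration, assuming a one-dimensional corridor where the agent must reach a goal position.

```python
import random

random.seed(0)  # make the toy run reproducible

class GridEnvironment:
    """Hypothetical environment: a 1-D corridor with the goal at position 5."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # The environment receives an action (+1 right, -1 left), updates its
        # state, and generates an observation and a reward for the agent.
        self.state = max(0, self.state + action)
        reward = 1.0 if self.state == 5 else -0.1  # reward function: reach the goal
        done = self.state == 5
        return self.state, reward, done

class RandomAgent:
    """Hypothetical agent with a trivial policy: any observation -> random action."""
    def policy(self, observation):
        return random.choice([-1, 1])

env = GridEnvironment()
agent = RandomAgent()
state, total_reward, done = 0, 0.0, False
while not done:
    action = agent.policy(state)            # policy maps observation to action
    state, reward, done = env.step(action)  # environment returns observation and reward
    total_reward += reward
```

In a real RL setup, the random policy would be replaced by one tuned by a learning algorithm to maximize the accumulated reward, which is exactly the role the deep neural network plays in Deep Reinforcement Learning.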

The talk addresses the full workflow for Deep Reinforcement Learning: choosing an adequate environment, crafting a reward function, choosing a policy representation, and training and deployment. Using Model-Based Design, the talk demonstrates how to build and control a virtual biped humanoid robot in Simulink and leverages Deep Reinforcement Learning in MATLAB, specifically the Deep Deterministic Policy Gradient (DDPG) algorithm, to train the agent. Finally, we discuss how to deploy the optimal policy to target hardware, using generated C/C++ or CUDA code.
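To give a flavor of the DDPG idea, here is a minimal numpy sketch of its core update: a deterministic actor maps states to continuous actions, a critic scores state-action pairs, and the actor's parameters are pushed in the direction that increases the critic's value. The talk uses MATLAB's tooling; the linear actor and critic below are illustrative stand-ins for deep networks, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 2

# Actor: deterministic policy mu(s) = W_a @ s (linear stand-in for a deep net).
W_a = rng.normal(scale=0.1, size=(action_dim, state_dim))
# Critic: Q(s, a) = w_s @ s + w_q @ a (likewise a linear stand-in).
w_s = rng.normal(scale=0.1, size=state_dim)
w_q = rng.normal(scale=0.1, size=action_dim)

def actor(s):
    return W_a @ s

def critic(s, a):
    return w_s @ s + w_q @ a

# Deterministic policy gradient step: chain rule dQ/dW_a = dQ/da * da/dW_a,
# then gradient ascent on the critic's estimate of the action's value.
s = rng.normal(size=state_dim)
dq_da = w_q                    # gradient of the linear critic w.r.t. the action
grad_W_a = np.outer(dq_da, s)  # gradient through mu(s) = W_a @ s
lr = 0.01
q_before = critic(s, actor(s))
W_a += lr * grad_W_a           # update the actor to increase the critic's score
q_after = critic(s, actor(s))
```

Full DDPG adds the machinery that makes this stable at scale, including a replay buffer, target networks, and exploration noise, which is where a framework-level implementation like the one shown in the talk earns its keep.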