
Bayesian Voice Emotion Detection Applied to Robotics: Adding Uncertainty

Technical talk | Spanish

Theatre 21: Track 5

Thursday - 14.40 to 15.20 - Technical



The Artificial Intelligence techniques with the greatest impact today depend heavily on data. Deep Learning algorithms learn to detect nonlinear correlations in data. Internet of Things technologies make it possible to collect large amounts of data, and distributed frameworks make it possible to store and process that Big Data. Together, these developments have allowed deep neural networks to achieve strong results in areas such as Computer Vision, NLP, Tabular Data Classification and Regression, Anomaly Detection and Recommendation Systems.

Deep Learning techniques give very good results when the environment to be modeled is well defined and a large, varied, high-quality dataset is available for predicting the target. This is where the first weak points of these algorithms appear: many companies do not have data with that volume, variety and quality. To promote the adoption of this technology in both research and business, transfer learning techniques are used: deep networks are trained on large datasets, and these backbone networks then serve as feature extractors for problems in other domains related to the original datasets.

In addition, standard Deep Learning techniques do not include measures of how uncertain the trained model is about its predictions. As a result, when systems built on these models are served in production, people may place excessive trust in the predictions.

The talk addresses this problem. It begins by presenting a task to be solved with Deep Learning algorithms: detecting emotions from the tone of voice messages. This base model is part of a research project called DIA4RA (in Spanish, Desarrollo de Inteligencia Artificial para Robótica Asistencia), whose objective is to provide predictive models so that a Pepper humanoid robot can help care for people with Alzheimer’s at the Reference Center of Alzheimer of Salamanca. The robot collects the output of these models in real time and uses it to propose games and tests to patients, which allow doctors to follow the evolution of the disease.

After showing the main challenges of applying Deep Learning models in open and especially sensitive environments, such as the decision making of a robot that must interact with patients, the talk develops Bayesian techniques that make it possible to estimate the aleatoric and epistemic uncertainty of a model's predictions. It focuses on the practical case of emotion detection from tone of voice, developed with TensorFlow.
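As an illustration only, the two kinds of uncertainty can be separated with a common entropy-based decomposition. Aleatoric uncertainty reflects noise in the data itself, while epistemic uncertainty reflects what the model does not know. The sketch below assumes the classifier can be run several times with stochastic weights (for example, Monte Carlo dropout or samples from a weight posterior) and that each run returns a probability per emotion class. The abstract does not say which method the talk uses, so the function names, the three-class setup and the sample values are all hypothetical, and plain Python stands in for the TensorFlow model.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(samples):
    """Split the predictive uncertainty for a single input.

    samples: list of T class-probability vectors, one per stochastic
    forward pass of the (hypothetical) emotion classifier.
    Returns (mean_probs, total, aleatoric, epistemic), where
      total     = entropy of the averaged prediction,
      aleatoric = average entropy of the individual predictions,
      epistemic = total - aleatoric (a mutual-information estimate).
    """
    n_passes = len(samples)
    n_classes = len(samples[0])
    mean_probs = [
        sum(s[c] for s in samples) / n_passes for c in range(n_classes)
    ]
    total = entropy(mean_probs)
    aleatoric = sum(entropy(s) for s in samples) / n_passes
    epistemic = total - aleatoric
    return mean_probs, total, aleatoric, epistemic


# Hypothetical outputs for classes (happy, sad, neutral).
# Case 1: every pass gives the same ambiguous answer -> noise in the data.
ambiguous = [[0.4, 0.3, 0.3]] * 4
# Case 2: each pass is confident, but the passes disagree -> model doubt.
disagreeing = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

for name, samples in [("ambiguous", ambiguous), ("disagreeing", disagreeing)]:
    _, total, aleatoric, epistemic = decompose_uncertainty(samples)
    print(f"{name}: total={total:.3f} "
          f"aleatoric={aleatoric:.3f} epistemic={epistemic:.3f}")
```

In the first case all of the uncertainty is aleatoric, since every pass agrees that the input is ambiguous. In the second case all of it is epistemic, since the passes are individually confident but contradict each other. A robot could use a high epistemic value as a signal to defer a decision rather than act on the prediction.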

In summary, the first objective of this talk is to show recipes for the challenges we have found when combining Deep Learning with Robotics in open environments. Next, we contrast the theory of constructed emotion, which describes how people detect emotions from a neuroscience point of view, with how these mechanisms are being implemented with Deep Learning, focusing on the development of a model that detects emotion from tone of voice. Finally, we will see how to evolve this model by adding measures of its aleatoric and epistemic uncertainty using Bayesian techniques.