Talk | Technical | English

In this talk, Newtral will show how we are leveraging the Transformer architecture to build deep learning systems that automate fact-checking.


Today, human fact-checkers are overwhelmed by the massive amount of disinformation on the Internet. Fake news is easy to generate, but data verification is a slow, human-intensive task. Without technological support, humans cannot win this fight. Newtral is combining its expertise in fact-checking (we provide official fact-checking services for Facebook, TikTok and WhatsApp) with deep learning architectures to create the next generation of fully automated fact-checking systems. Our goal is to develop AI assistants that increase fact-checkers' productivity up to 30-fold and save up to 90% of the time and cost of fact-checking operations.


We will start with a brief introduction to our previous work, based on traditional machine learning models (SVMs, decision trees…) and feature extraction through well-known NLP frameworks. Next, we will briefly introduce the Transformer architecture and its rapid evolution over the last three years, highlighting how this technology has enabled novel industry use cases, including speech-to-text engines, text generation models and improved question-answering systems. After giving the audience enough context to understand our technology stack, we will detail the design-training-testing process we followed to fine-tune BERT-like models for two specific use cases: 1) the automated detection of verifiable sentences, and 2) measuring semantic similarity between sentences in different languages.
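The second use case can be sketched in a few lines: once a multilingual encoder has mapped each sentence to a vector, semantic similarity is typically scored with cosine similarity. The toy 3-dimensional vectors below stand in for real encoder embeddings (which would have hundreds of dimensions); they are illustrative, not output from Newtral's models.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score in [-1, 1]; higher means the sentences are semantically closer."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for multilingual encoder embeddings.
en = np.array([0.90, 0.10, 0.30])           # English claim
es = np.array([0.85, 0.15, 0.35])           # Spanish paraphrase -> nearby vector
unrelated = np.array([-0.20, 0.90, -0.40])  # unrelated sentence

paraphrase_score = cosine_similarity(en, es)
unrelated_score = cosine_similarity(en, unrelated)
```

Because the encoder places paraphrases close together regardless of language, the same threshold on this score can match a new claim against fact-checks written in any of the supported languages.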


We will explain the main challenges found along the way, including the limitations of fixed-size vocabularies, multilingual approaches and data quality issues. We will show the final results of our current AI system, tested on 21 EU languages.


Having described our iterative design process, we will briefly introduce our production deployment using AWS Neuron. To reduce internal costs we consolidated our models (from three to one) and compiled the architecture for high-performance, low-latency inference on AWS Inferentia-based Amazon EC2 instances.
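Neuron compilation starts from the same tracing workflow as standard TorchScript. The sketch below traces a stand-in model with `torch.jit.trace`; with the AWS Neuron SDK installed, the analogous call is `torch.neuron.trace`, which additionally compiles the traced graph for Inferentia chips. The tiny model is purely illustrative.

```python
import torch

# Stand-in for the fine-tuned classifier; the real model is a BERT-like network.
class TinyClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 2)

    def forward(self, x):
        return self.linear(x)

model = TinyClassifier().eval()
example = torch.randn(1, 8)

# Standard TorchScript tracing. With the Neuron SDK the equivalent step is
# torch.neuron.trace(model, example_inputs=[example]), producing an artifact
# that runs on Inferentia hardware.
traced = torch.jit.trace(model, example)
logits = traced(example)
```

Tracing fixes the input shape, which is why batch size and sequence length must be chosen up front when compiling Transformer models for Inferentia.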


Next, we will present our prototype solutions, currently used by Newtral and five other fact-checking organizations, showing real-life scenarios of how this technology works in our day-to-day operations. Two different prototypes will be explained: A) a video fact-checking tool that automatically transcribes video/audio and spots relevant claims to check; B) a Twitter monitoring tool (ClaimHunter) that automatically follows a set of political accounts and notifies fact-checkers when factual tweets are detected. We will describe the main functional components of these technical solutions.
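The ClaimHunter loop can be sketched as a filter over a monitored timeline. Everything below is hypothetical: the keyword scorer stands in for the fine-tuned claim-detection model, and the function and field names are illustrative, not Newtral's actual API.

```python
# Stub scorer: stands in for the real Transformer-based claim classifier.
def stub_claim_score(text: str) -> float:
    factual_markers = ("%", "million", "billion", "according to")
    return 1.0 if any(m in text.lower() for m in factual_markers) else 0.0

def monitor(tweets, score=stub_claim_score, threshold=0.5):
    """Yield tweets whose text looks like a checkable factual claim."""
    for tweet in tweets:
        if score(tweet["text"]) >= threshold:
            yield tweet  # in production: notify the fact-checking team

timeline = [
    {"user": "@politician_a", "text": "GDP grew 2.4% in 2021."},
    {"user": "@politician_b", "text": "What a beautiful morning!"},
]
flagged = list(monitor(timeline))
```

The value of the tool lies in this triage step: fact-checkers review only the flagged tweets instead of the full firehose of monitored accounts.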


Finally, we will introduce the main challenges ahead of us on the road to fully automated solutions. We will briefly explain how the Unified Text-to-Text Transformer (the T5 architecture) works and discuss how multi-task training could help us build a new generation of expert fact-checking systems. We will also explain which major challenges remain unsolved in the current state of the art in automated data verification. Model explainability in neural networks is one of the most important open research challenges, because the AI must not only say whether something is true, but also explain the reasoning behind its verdict. In addition, we will briefly discuss the AI ethics involved in developing such a system and which design principles to apply to limit potential bias.
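The appeal of the text-to-text framing is that both of our tasks can share one model: each example becomes an "input text, target text" pair distinguished by a task prefix, in the style T5 popularized. The prefixes and field names below are illustrative, not the prompts used in any production system.

```python
# Sketch: casting both fact-checking tasks into a T5-style text-to-text format.
def to_text_to_text(task, **fields):
    if task == "claim_detection":
        # Classification becomes generation of a label word.
        return f"detect claim: {fields['sentence']}", fields["label"]
    if task == "similarity":
        # Regression becomes generation of a score string.
        inp = f"similarity: sentence1: {fields['s1']} sentence2: {fields['s2']}"
        return inp, fields["score"]
    raise ValueError(f"unknown task: {task}")

inp, target = to_text_to_text(
    "claim_detection",
    sentence="Unemployment fell by 3% last quarter.",
    label="verifiable",
)
```

Mixing examples from several such tasks into one training stream is what lets a single T5-style model replace a collection of task-specific classifiers.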