Machine Learning for federated privacy-preserving scenarios
Technical talk | English
Theatre 20: Track 3
Thursday - 15.25 to 16.05 - Technical
In this presentation we will describe the Musketeer project (http://musketeer.eu), funded by Horizon 2020, the EU's largest Research and Innovation programme. The project aims to provide:
1. Machine Learning over a high variety of different privacy-preserving scenarios.
2. Robustness against external and internal threats.
3. Enhancement of the Data Economy.
4. Standardized and extensible architecture.
5. Industrial demonstration of the technology advances in operational environments.
The massive increase in data collected and stored worldwide calls for new ways to preserve privacy while still allowing data sharing among multiple data owners. Today, the lack of trusted and secure environments for data sharing inhibits the data economy while legality, privacy, trustworthiness, data value and confidentiality hamper the free flow of data.
This talk will first explain the benefits of sharing data. Since creating complex models requires large datasets, one common barrier to using advanced machine learning techniques is the amount of data needed to train a model. In practice, however, compiling large datasets is labour-intensive and time-consuming. To overcome this problem, different partners in a data economy can benefit from sharing datasets in order to improve the accuracy of their respective machine learning models.
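To make this concrete, the idea of collaborative training can be sketched in the federated style: each partner computes a model update on its own private data, and only the updates are averaged. This is a minimal illustrative sketch (plain linear regression with NumPy), not the project's actual implementation; all names and parameters here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_gradient_step(weights, X, y, lr=0.1):
    """One gradient step of linear regression on a partner's private data."""
    preds = X @ weights
    grad = X.T @ (preds - y) / len(y)
    return weights - lr * grad

# Three partners, each holding private data drawn from the same true model.
true_w = np.array([2.0, -1.0])
partners = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w
    partners.append((X, y))

weights = np.zeros(2)
for _round in range(100):
    # Each partner computes an update locally; raw data never leaves its site.
    local = [local_gradient_step(weights, X, y) for X, y in partners]
    # A coordinator averages the updates (equal weighting for simplicity).
    weights = np.mean(local, axis=0)

print(weights)  # converges toward [2.0, -1.0]
```

No partner could have recovered the true model from its 50 samples alone as accurately as the jointly trained one, which is the accuracy benefit the talk refers to.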
After that, we will analyze the main privacy barriers that limit data sharing: legal, ownership, confidentiality, information leakage, privacy, and trustworthiness.
Then, we will describe the Musketeer project itself, which aims to develop an open-source Industrial Data Platform (IDP) built on an interoperable, highly scalable, standardized and extensible architecture, efficient enough to be deployed in real use cases. The platform incorporates an initial set of machine learning techniques for privacy-preserving distributed model learning, ensuring that the use of every user's data fully complies with current legislation and with any other industrial or legal restrictions. Musketeer does not rely on a single technology: different Privacy Operation Modes (POMs) will be implemented, with the machine learning algorithms developed on top of them. Each POM is designed to remove specific privacy barriers and describes a potential scenario with its own privacy-preserving demands, as well as different computational, communication, storage and accountability characteristics. To build the POMs, a wide variety of standard Privacy-Preserving Technologies (PPTs) will be used, such as Federated Machine Learning (FML), homomorphic encryption, differential privacy and secure multiparty computation; the project also aims to develop new PPTs and to incorporate third-party ones in the future.
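As a flavour of one of the listed PPTs, here is an illustrative sketch (not Musketeer's actual implementation) of additive secret sharing, a basic secure multiparty computation primitive a POM could build on: parties jointly compute a sum without any party revealing its input. The modulus and party count are arbitrary choices for the sketch.

```python
import random

P = 2**61 - 1  # a large prime modulus (arbitrary choice for this sketch)

def share(value, n_parties):
    """Split a value into n random shares that sum to it mod P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

# Each of three parties holds one private value.
secrets = [12, 30, 7]
n = len(secrets)

# Every party splits its value and sends one share to each peer.
all_shares = [share(s, n) for s in secrets]

# Each party locally sums the shares it received...
partial_sums = [sum(all_shares[i][j] for i in range(n)) % P for j in range(n)]

# ...and only these partial sums are published; their total is the joint sum.
total = sum(partial_sums) % P
print(total)  # 49, yet no party ever saw another's raw value
```

Each individual share is uniformly random, so a single share (or partial sum) leaks nothing about any party's input; this is the kind of privacy guarantee the different POMs trade off against computation and communication cost.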
To further foster the development of a user-centered data economy based on data value (ultimately enabling the data- and AI-driven digital transformation in Europe), we will explore reward models capable of estimating the real contribution of each user's data to the improvement of a given task, so that a fair monetization scheme becomes possible.
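One simple way such a contribution estimate could work is leave-one-out valuation: a partner's reward is proportional to the utility lost when its dataset is withheld. The sketch below is purely hypothetical (Musketeer's actual reward models may differ); the `utility` function is a stand-in for "train a model on the pooled data and score it".

```python
# Stand-in utility: more distinct samples -> higher, diminishing utility.
def utility(datasets):
    pooled = set().union(*datasets) if datasets else set()
    return len(pooled) ** 0.5

partners = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},            # overlaps heavily with A, so it adds little
    "C": {6, 7, 8, 9, 10},     # all-new samples, so it adds the most
}

full = utility(list(partners.values()))
marginals = {}
for name in partners:
    others = [d for p, d in partners.items() if p != name]
    marginals[name] = full - utility(others)
    print(f"{name}: marginal contribution {marginals[name]:.3f}")
```

Note that redundant data earns a smaller reward than novel data, which is exactly the fairness property a monetization scheme based on data value needs.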
Finally, we will describe new potential dangers. Machine learning systems are vulnerable to data poisoning: a coordinated attack in which an attacker manipulates a fraction of the training dataset to subvert learning. Since we are dealing with collaborative scenarios, security and robustness against such attacks must be ensured, not only against threats external to the IDP but also against internal ones, through early detection and mitigation of potential misbehaviour by IDP members (adversarial attacks).
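To illustrate one simple defence in this collaborative setting, the sketch below filters out client updates whose norm deviates strongly from the median before aggregating. This is only an example of the kind of detection mechanism meant, with arbitrary parameters; a real IDP would combine several, more sophisticated defences.

```python
import numpy as np

rng = np.random.default_rng(1)

# Nine honest partners send small, well-behaved model updates...
honest = [rng.normal(0, 0.1, size=4) for _ in range(9)]
# ...while one attacker sends a huge update to subvert learning.
poisoned = [np.full(4, 50.0)]
updates = honest + poisoned

norms = np.array([np.linalg.norm(u) for u in updates])
median = np.median(norms)

# Keep only updates whose norm is within a factor of the median norm.
kept = [u for u, n in zip(updates, norms) if n <= 3 * median]

aggregate = np.mean(kept, axis=0)
print(len(kept))  # 9: the poisoned update was rejected
```

Because the median is robust to a minority of outliers, the attacker cannot shift the threshold, and the aggregate stays close to the honest consensus.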