18 November. 17.30 - 18.10 | Garage

Federated learning promises to improve data security, model accuracy and system resilience. Yet operational challenges dominate the time required to bring these promises to production: obtaining training data, comparing learning strategies and maintaining model integrity despite network unreliability. Techniques to address each of these problems are well known: generating training data from physically accurate models, for example. But addressing each issue with a separate application creates inefficiencies: data scientists and architects must navigate a complex collection of parts rather than a seamless, integrated solution. An efficient system must integrate best-in-class services and minimize or eliminate the boundaries between them.

We develop, tune and deploy an anomaly-detecting machine learning model to demonstrate the enterprise benefits of streaming data from physical models into a federated learning architecture. Accurate physical models produce multiple training data sets, each of which trains a single machine learning model to recognize a specific anomaly. Federated learning then combines these individual models into a robust, production-ready classifier. Integrating streaming data into the development process mimics the production environment, enabling data scientists to validate their solutions under real-world conditions. A single platform for developing training, federated learning and classification algorithms enables a rapid feedback loop for model evolution. Sharing a networked datastore between the development environment and the production system provides a mechanism for continuous training and redeployment of improved models.

Our system uses MATLAB for algorithm development and validation; Kafka for streaming data management; MATLAB Production Server to host classification algorithms; Redis for machine learning model deployment; and Grafana to monitor the production system and display alerts for detected anomalies.
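The system implements its model-combination step in MATLAB; as a language-agnostic sketch of the idea, the following Python illustrates one common way to merge per-anomaly models into a single classifier: a FedAvg-style weighted average of model parameters. The function name, the toy weight vectors and the FedAvg-style weighting are illustrative assumptions, not the talk's actual algorithm.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Combine per-client model weights into one global model by
    averaging each layer, weighted by training-set size (FedAvg-style).
    client_weights: list of models, each a list of per-layer arrays.
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    global_weights = []
    for layer in range(n_layers):
        acc = np.zeros_like(client_weights[0][layer], dtype=float)
        for weights, size in zip(client_weights, client_sizes):
            acc += (size / total) * weights[layer]
        global_weights.append(acc)
    return global_weights

# Two toy single-layer "anomaly detectors", trained on data sets of
# different sizes; the larger set dominates the merged weights.
w_spike = [np.array([1.0, 0.0])]
w_drift = [np.array([0.0, 1.0])]
merged = federated_average([w_spike, w_drift], client_sizes=[3, 1])
# merged[0] is the size-weighted average of the two weight vectors
```

In the talk's architecture, the merged model is the artifact pushed to Redis for deployment, so the combination step is the natural boundary between per-anomaly training and the production classifier.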
Simulink models provide physically accurate synthetic data for the training data sets. We show how on-premises hosting speeds development, and then scale the solution horizontally via integration with cloud-based platforms. We present both our architecture and a demonstration of the system in development and production. We walk through the end-to-end workflow, with particular emphasis on the integration of streaming data into the development environment and the benefits to data scientists of a simulated production environment. We show how physical models accelerate bootstrapping the system by providing training data without requiring access to real-world assets, and how model parameterization allows injection of behavioral anomalies into the data stream without damaging or destroying those assets.

We discuss the system in the context of MLOps, highlighting operational successes and areas for future growth. In particular, design principles such as dependency inversion allowed us to create a production-quality architecture focused on system integration and cooperation. Throughout, we emphasize the importance of knowing your core competencies and competitive advantages, and of using that understanding to choose between software development and component integration. We identify the strengths of our platform (algorithm development and physical model-based design) and show how that knowledge shaped the architecture of a federated machine learning system. Separating configuration from code, for example, was particularly important: provisioning strategies such as infrastructure as code require architectures to be externally configurable. But designing an externally configurable system requires additional effort to choose, name and scope configuration parameters. We conclude with a summary of how such architectural tradeoffs play out in an operational system and inform its evolution.
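Two of the ideas above, parameterized anomaly injection and separating configuration from code, can be combined in a small sketch. In the talk's system the physical models are Simulink models; here a plain sine wave stands in for the model output, and all parameter names and the JSON layout are hypothetical illustrations, not the system's actual configuration schema.

```python
import json
import math

# External configuration (in a real deployment this would be loaded
# from a file or provisioned via infrastructure-as-code tooling, not
# embedded in the source). All names here are illustrative.
config = json.loads("""
{
  "signal":  {"amplitude": 1.0, "frequency_hz": 5.0,
              "sample_rate_hz": 100, "duration_s": 1.0},
  "anomaly": {"type": "spike", "start_s": 0.5, "magnitude": 4.0}
}
""")

def synth_signal(cfg):
    """Generate a clean synthetic signal from configuration alone,
    standing in for physically accurate model output."""
    s = cfg["signal"]
    n = int(s["sample_rate_hz"] * s["duration_s"])
    dt = 1.0 / s["sample_rate_hz"]
    return [s["amplitude"] * math.sin(2 * math.pi * s["frequency_hz"] * k * dt)
            for k in range(n)]

def inject_anomaly(samples, cfg):
    """Inject a configured behavioral anomaly into the stream -- no
    real-world asset is disturbed, only the synthetic data."""
    a = cfg["anomaly"]
    idx = int(a["start_s"] * cfg["signal"]["sample_rate_hz"])
    out = list(samples)
    if a["type"] == "spike":
        out[idx] += a["magnitude"]
    return out

clean = synth_signal(config)
labeled = inject_anomaly(clean, config)
```

Because every behavior is driven by the configuration object rather than hard-coded constants, the same code path serves both clean-data generation and anomaly injection, which is the property that makes the development environment reproducible in production.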