
Operationalizing Data Science using the Azure stack

Technical talk | English

Theatre 16: Track 4

Wednesday - 17.05 to 17.45 - Technical


You are involved in a data science project. You start by cleaning the data and building the first experimental models, which look really promising, but when you refactor the pipeline and try to create the final model, the results are not as good as they were before. Oh no! What were the parameters you tried in your previous experiments? And how was the dataset configured? Did you store that super nice old model anywhere on disk, or is it just lost for good?

You finally get a nice model, and it’s time to put it into production. You save all the relevant structures in files, build the web service from scratch, deploy it… but wait: the code worked when you ran it on your machine, so why are you seeing this weird error in the web service logs? Are the package versions the same in both environments? And once your prediction API is up and running, it’s time to think about the future: what will happen when you need to upgrade software on the server? Should you build a replica on a different machine and a rerouting system to ensure availability? You’d better get started; that’s going to take time.

Good job! Everything is finally ready! But then you need to retrain the model, and the nightmare starts all over again. And don’t even think about automatically retraining every month, monitoring the different models’ performance, and replacing the current model in production if it’s outperformed by any of them. Building such a system will take ages!

Does any of this sound familiar? Azure has a few tools that can help you in each of these scenarios, and many others.

On the one hand, Azure Machine Learning service helps with both experiment traceability and model deployment, accompanying the data scientist through all the phases of the project: from transforming data, training models remotely, and storing the results in the cloud, to monitoring performance and deploying a prediction API on Kubernetes.
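To make experiment traceability concrete, here is a minimal sketch of what such a tracker records for every run so that old parameters, dataset configurations, and models are never lost. The function and directory names are illustrative assumptions, not the Azure ML SDK; Azure Machine Learning service stores this same kind of metadata in the cloud for you.

```python
import json
import pickle
import time
from pathlib import Path

def track_run(run_dir: Path, params: dict, metrics: dict, model) -> Path:
    """Persist everything needed to reproduce or recover a run:
    the parameters tried, the metrics observed, and the model itself."""
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "params.json").write_text(json.dumps(params))
    (run_dir / "metrics.json").write_text(json.dumps(metrics))
    with open(run_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)  # the "super nice old model", kept for good
    return run_dir

# One run, archived under a timestamped folder.
track_run(Path("runs") / str(int(time.time())),
          params={"learning_rate": 0.01, "max_depth": 4},
          metrics={"accuracy": 0.92},
          model={"weights": [0.1, 0.2]})  # stand-in for a real model object
```

A managed service adds remote execution, cloud storage, and a model registry on top of this pattern, so you don’t have to maintain the folders yourself.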

On the other hand, we can use Azure DevOps to build Continuous Integration and Continuous Delivery pipelines for Machine Learning that ensure our models are always up to date with minimal deployment effort.
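The heart of such a retraining pipeline is a simple decision: after a scheduled retrain, compare the new challenger model’s metric against the one in production and promote it only if it is better. The sketch below shows that decision in isolation; the function name and threshold are assumptions for illustration, not part of Azure DevOps or the Azure ML SDK.

```python
def promote_if_better(production_metric: float,
                      challenger_metric: float,
                      min_improvement: float = 0.0) -> bool:
    """Return True when the freshly retrained model should replace
    the one currently in production."""
    return challenger_metric > production_metric + min_improvement

# Example: a monthly retrain that only redeploys on a real improvement.
if promote_if_better(production_metric=0.90, challenger_metric=0.93,
                     min_improvement=0.01):
    print("deploying challenger model")  # in a real pipeline: trigger a release
```

In a CI/CD setup this check runs automatically on a schedule, so the manual “nightmare” of retraining and redeploying disappears.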

In this talk we will go through the details of this MLOps approach, which is applicable to any data science project and is independent of the technology used. We will illustrate how to build such an architecture using capabilities of the Azure stack, and we will also review open source alternatives that can help you implement a robust Machine Learning system.