
Fairing: Bringing Kubernetes for Data Scientists

Technical talk | English

Theatre 16: Track 4

Wednesday - 16.20 to 17.00 - Technical


This talk shows how a data scientist can seamlessly build, train, and deploy machine learning models on Kubernetes using the Kubeflow Fairing Python SDK. The goal of the open source Fairing SDK is to lower the barrier to entry to Kubernetes for the data science community. Fairing is part of Kubeflow, an open source effort to enable production-grade machine learning on Kubernetes.

Traditionally, packaging and containerization are the two steps that add the most friction to training and deploying machine learning models across multiple environments. The problem is worse when users lack Kubernetes experience, as is the case for many data scientists. The talk covers two topics that matter to data scientists: the machine learning workflow on Kubernetes using Kubeflow Fairing, and how Fairing makes that workflow portable from on-prem clusters to the cloud. It highlights the challenges of the conventional approach to machine learning on Kubernetes and shows how users with limited Kubernetes knowledge can still benefit from the tooling the DevOps community has built over the years. It also compares ML workflows with and without Fairing to show which steps are eliminated and how the user experience is simplified.

The talk also covers machine learning best practices for hybrid cloud environments: minimizing development time by training locally on sampled data, and minimizing infrastructure cost by using GPUs or TPUs only for the duration of model training.
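The practice of training locally on a sample and then bursting to the cluster can be sketched in plain Python. This sketch does not use Fairing; the TRAINING_MODE environment variable, the function name, and the 1% default sample fraction are hypothetical choices for illustration. The same training code reads a small, reproducible sample on a laptop and the full dataset on the cluster.

```python
import os
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def load_training_rows(rows: Sequence[T], sample_fraction: float = 0.01,
                       seed: int = 42) -> List[T]:
    """Return every row on the cluster, or a reproducible sample when local.

    Hypothetical convention: TRAINING_MODE is set to "remote" on the
    Kubernetes cluster; any other value (or no value) means a local run.
    """
    if os.environ.get("TRAINING_MODE", "local") == "remote":
        # Remote run: train on the full dataset.
        return list(rows)
    # Local run: a fixed seed keeps the sample identical across debug runs.
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < sample_fraction]
```

Fixing the seed keeps local debugging runs comparable with each other, while on the cluster the same code path simply sees every row.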

Finally, the talk covers how to take a trained model into production by creating an online prediction endpoint on a Kubernetes cluster with Knative, a serverless toolkit built on Kubernetes. Fairing makes it easy to add custom Python preprocessing logic, which complex production models often require, to online prediction models. Users also benefit from Kubernetes strengths such as autoscaling and node auto-provisioning.
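The idea of bundling custom Python preprocessing with a served model can be sketched as a small wrapper class. This is not Fairing's serving interface; the class name, the preprocess/predict method names, and the standardization step are hypothetical, chosen only to show preprocessing running inside the prediction path.

```python
from typing import Callable, List, Sequence


class PreprocessingPredictor:
    """Hypothetical wrapper: apply custom preprocessing, then call the model."""

    def __init__(self, model: Callable[[List[List[float]]], List[float]],
                 means: Sequence[float], stds: Sequence[float]) -> None:
        self.model = model
        # Feature statistics captured at training time (assumed nonzero stds).
        self.means = list(means)
        self.stds = list(stds)

    def preprocess(self, raw_rows: Sequence[Sequence[float]]) -> List[List[float]]:
        # Standardize each feature: (x - mean) / std.
        return [[(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
                for row in raw_rows]

    def predict(self, raw_rows: Sequence[Sequence[float]]) -> List[float]:
        # Clients send raw features; the model only ever sees standardized ones.
        return self.model(self.preprocess(raw_rows))
```

Keeping the training-time statistics inside the predictor means clients can send raw features, and the endpoint applies the same transformation the model was trained with.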

The three main takeaways of the talk are:
The key simplifications Fairing brings to make Kubernetes accessible to a wider data science community
Best practices for debugging machine learning models locally on a sampled dataset and bursting to a remote Kubernetes cluster to train on the full dataset
Creating an online prediction endpoint with custom preprocessing