TALK

Big Data w/Python on Kubernetes (PySpark on K8s)

Thursday 15th

16:10 | 16:50

Theatre 25

Technology

One-liner summary:

Big Data applications are increasingly run on Kubernetes. Data scientists commonly use Python-based workflows, with tools like PySpark and Jupyter, to wrangle large amounts of data. Over the past year, the Kubernetes community has been actively investing in tools and support for frameworks such as Apache Spark, Jupyter and Apache Airflow. Attendees will learn how these tools can be used together to build a scalable self-service platform for data science on Kubernetes, and what benefits Kubernetes offers over traditional options.

Description:

There is growing interest in consolidating big data applications on a single infrastructure such as Kubernetes. The growing number of data scientists using Python-based tooling for their machine learning workflows has prompted much development in containerization and increased investment in evaluating model management solutions. Over the past year, the Kubernetes community has been actively investing in tools and support for frameworks such as Apache Spark, Jupyter and Apache Airflow. We will explain the design idioms, architecture, and internal mechanics of Spark orchestration on Kubernetes, along with the ongoing work of the community. Attendees will learn how these tools can be used together to build a scalable self-service platform for data science on Kubernetes, and will come away with an appreciation of the benefits of native Kubernetes support.

MEDIA

Keynote