
Building a Spark Streaming application

Thursday 17th

from 16:30 to 17:10

Theatre 20

Workshop

This tutorial walks through a practical example of building a streaming application with Apache Spark Streaming. It is lighter on theory and heavier on coding, but still gentle on newcomers.

In the first part of the tutorial, I will give a quick overview of Apache Spark and its architecture. You will learn about the different Spark cluster types (local, Spark Standalone, YARN and Mesos); the main processes in a Spark cluster (driver and executors); job and resource scheduling; Spark's Web UI; RDDs (resilient distributed datasets), DataFrames and Datasets, the three main abstractions in Spark; and the components that make up Spark's rich API (Core, SQL, Streaming, GraphX, MLlib and ML). Then you will hear about Spark Streaming in more detail.

In the second part of the tutorial, I will show you the steps required to build a Spark Streaming application that reads event data from a Kafka topic, parses and counts the events to produce event metrics, and pushes those metrics to a different Kafka topic. Then you will see the source code of an example Java web application that reads the aggregated metrics and pushes them to client browsers using WebSockets. The clients will display the metrics on a live dashboard.
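As a rough illustration of the parse-and-count step in that pipeline, here is a minimal sketch in plain Java with no Spark or Kafka dependencies. It only mimics what might happen to the records of a single micro-batch. The job shown in the tutorial is written in Scala on Spark Streaming, so the class name EventMetrics, its methods, the event line format and the "type:count" output format are all assumptions made up for this example, not the tutorial's code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for the parse-and-count logic applied to one micro-batch.
final class EventMetrics {

    // Assumed line format (not from the tutorial): "<epochSeconds> <eventType> <userId>".
    // Returns the event type, or empty if the line is malformed.
    static Optional<String> parseEventType(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3) {
            return Optional.empty();
        }
        try {
            Long.parseLong(fields[0]); // the timestamp must be numeric
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
        return Optional.of(fields[1]);
    }

    // Counts well-formed events per type; malformed lines are dropped.
    // A TreeMap keeps the event types in sorted order.
    static Map<String, Long> countByType(List<String> batch) {
        return batch.stream()
                .map(EventMetrics::parseEventType)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // Formats each count as a "type:count" message, the kind of record
    // that could be written to an output topic.
    static List<String> toMetricMessages(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList(
                "1466087400 click user1",
                "1466087401 view user2",
                "not-an-event",
                "1466087402 click user3");
        // prints [click:2, view:1]
        System.out.println(toMetricMessages(countByType(batch)));
    }
}
```

In a real Spark Streaming job the same logic would run on each batch of records read from the input topic, and the resulting messages would be published to the output topic rather than printed.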

The Spark Streaming job will be written in Scala, the web application in Java, and the client code in JavaScript with the D3.js library.

Petar Zecevic

CTO, SV Group