from 16:30 to 17:10
This tutorial will show you a practical example of building a streaming application with Apache Spark Streaming. It is lighter on theory and heavier on coding, but gentle on newcomers.
In the first part of the tutorial, I will give a quick overview of Apache Spark and its architecture. You will learn about the different Spark cluster types (local, Spark Standalone, YARN, and Mesos); the main processes in a Spark cluster (driver and executors); job and resource scheduling; Spark's Web UI; RDDs (resilient distributed datasets), DataFrames, and Datasets, the three main abstractions in Spark; and the different components that comprise Spark's rich API (Core, SQL, Streaming, GraphX, MLlib, and ML). Then you will hear about Spark Streaming in more detail.
In the second part of the tutorial, I will show you the steps required for building a Spark Streaming application that reads event data from a Kafka topic, parses and counts the events to generate event metrics, and pushes the metrics to a different Kafka topic. Then you will see the source code of an example Java web application that reads those aggregated metrics and pushes them to client browsers using WebSockets. The clients will display the metrics on a live dashboard.
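To give a flavour of the counting step the streaming job performs, here is a minimal sketch in plain Java. It uses ordinary collections rather than Spark's streaming API, and the event names and the class and method names are illustrative assumptions, not code from the tutorial:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: count how often each event type occurs in one
// micro-batch of parsed events, producing per-type metrics. In the real
// application, Spark Streaming would apply an equivalent aggregation to
// each batch read from the Kafka topic.
public class EventMetrics {

    // Group raw event names and count occurrences of each type.
    public static Map<String, Long> countEvents(List<String> events) {
        return events.stream()
                .collect(Collectors.groupingBy(e -> e, Collectors.counting()));
    }

    public static void main(String[] args) {
        // "click", "view", and "purchase" are made-up event types.
        List<String> batch = List.of("click", "view", "click", "purchase", "click");
        System.out.println(countEvents(batch));
    }
}
```

The resulting map of event type to count is the kind of aggregate that would then be serialized and written to the output Kafka topic for the dashboard to consume.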