A unified (future-proof) model for Big Data processing. An introduction to Apache Beam

Friday 18th

from 13:40 to 14:20

Theatre 18

Keynote

Starting a new Big Data project means facing an overwhelming number of decisions: how to develop it, which data analysis framework to choose, and so on. This talk introduces Apache Beam, a unified model for building Big Data systems. Beam represents batch and streaming processes (pipelines) with a single model, and it includes abstractions for dealing with out-of-order and late data. Beam is also runtime-agnostic, which makes it flexible not only for development but also for deployment: it can run on-premises with Apache Spark or Apache Flink, or in the cloud with Google Cloud Dataflow.

What makes the proposal unique:
This talk introduces Apache Beam, a unified model for representing batch and streaming systems independently of the execution environment. Beam removes the need to rewrite all the data analysis code when switching to a new data processing framework.

Ismaël Mejía

Talend, Software Engineer