Big Data Spain

15th ~ 16th OCT 2015 MADRID, SPAIN #BDS15


THANK YOU FOR AN AMAZING CONFERENCE!


THE 4th EDITION OF BIG DATA SPAIN IN OCT 2015 WAS A RESOUNDING SUCCESS.

BUILDING A REAL-TIME STOCK PREDICTION ENGINE POWERED BY SPRING XD, APACHE GEODE AND SPARK ML

Thursday 15th

from 17:45 to 18:30

Room 19


Workshop

Financial market prediction has always been one of the hottest topics in Data Science and Machine Learning. However, the prediction algorithm is only a small piece of the puzzle. Building a streaming data pipeline that continuously combines the latest price information with high-volume historical data is extremely challenging on traditional platforms: it requires a lot of code and careful thought about how to scale or move to the cloud.

This session walks through the architecture and implementation details of an application built on open-source tools, showing how to build a stock prediction solution with almost no source code: only a few lines of R, plus the web interface that consumes the data in real time through a RESTful endpoint. The solution uses in-memory data grid technology for high-speed ingestion, combining real-time data streaming with distributed processing for stock indicator algorithms.

These components will be explained and demonstrated during the session:

  • The data sources available for stock prices and related information.
  • Spring XD, the tool that ingests and transforms the data and sinks it into the other components of the architecture.
  • Spark, the distributed computing layer that processes the transformed data using a machine learning model built in R.
  • Geode, the in-memory data grid that stores the resulting data so it can be compared with historical series of past data.
  • Frontend interfaces built with JavaFX and the D3.js framework for web visualization.
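To make the division of labour among these components more concrete, here is a minimal Python sketch of the ingest, transform and sink flow. Spring XD describes streams with a Unix-pipe-like `source | processor | sink` notation, and the sketch chains plain generator functions in the same spirit. It is not the session's implementation and uses neither the Spring XD nor the Geode API: the stage functions, the sample records and the plain dict standing in for a Geode region are all hypothetical, and the "analysis" is only a comparison of the newest stored price with the oldest one.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical (symbol, timestamp, price) records in CSV form; not real market data.
RAW_QUOTES = ["XYZ,t0,100.0", "XYZ,t1,101.5", "ABC,t0,50.0"]

Quote = Tuple[str, str, float]
History = Dict[str, List[Tuple[str, float]]]


def source(lines: Iterable[str]) -> Iterator[str]:
    """Ingest stage: emit raw records one at a time, as a stream source would."""
    for line in lines:
        yield line


def transform(records: Iterable[str]) -> Iterator[Quote]:
    """Transform stage: parse each CSV record into a (symbol, timestamp, price) tuple."""
    for record in records:
        symbol, timestamp, price = record.split(",")
        yield symbol, timestamp, float(price)


def sink(quotes: Iterable[Quote], store: History) -> None:
    """Sink stage: append each quote to a per-symbol history.

    The dict is only a stand-in for an in-memory data grid region such as
    one in Apache Geode; this is not the Geode API.
    """
    for symbol, timestamp, price in quotes:
        store.setdefault(symbol, []).append((timestamp, price))


def change_vs_history(store: History, symbol: str) -> float:
    """Compare the newest stored price for a symbol with the oldest one."""
    history = store[symbol]
    return history[-1][1] - history[0][1]


store: History = {}
# Chain the stages, mirroring a "source | transform | sink" stream definition.
sink(transform(source(RAW_QUOTES)), store)
print(change_vs_history(store, "XYZ"))  # → 1.5
```

Because each stage only consumes an iterable and yields results, a stage (for example a model-scoring step between `transform` and `sink`) can be added without touching the others, which is the property the stream-based architecture relies on.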

A core element of this architecture is the streaming engine, Spring XD. Streams process data in motion. Current trends in the Internet of Things (IoT), mobile applications, digital business, big data, data science, and machine learning are driving streaming requirements, typically referred to as real-time streaming, stream computing, or stream processing. Importantly, both the business use cases and the technical implementations are characterized by the need to act in real time on large volumes of data in motion, as the data is created and ingested.

Streaming workflows typically consist of continuous ingestion, various kinds of data wrangling or transformation, advanced analytical processing, and export steps. With timeframes measured in seconds or even milliseconds, companies process streaming information to act on opportunities before they are lost, applying data science and machine learning algorithms to gain predictive and prescriptive insight wherever possible, as long as doing so does not compromise the real-time requirements of the application in question.
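The session abstract mentions stock indicator algorithms without naming them, so as a hypothetical example, here is a minimal Python sketch of one common indicator, the simple moving average, computed incrementally over a stream of prices. It illustrates why streaming analytics can meet tight latency budgets: each new tick updates the result in constant time instead of re-scanning history. The function name, window size and prices are assumptions for illustration only.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional


def moving_average(prices: Iterable[float], window: int) -> Iterator[Optional[float]]:
    """Yield a simple moving average for each incoming price.

    Yields None until `window` prices have arrived. Each update is O(1):
    the running sum adds the newest price and subtracts the one that
    drops out of the window, so no history is re-scanned per tick.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buffer: Deque[float] = deque()
    running_sum = 0.0
    for price in prices:
        buffer.append(price)
        running_sum += price
        if len(buffer) > window:
            # Evict the oldest price so the sum covers exactly `window` ticks.
            running_sum -= buffer.popleft()
        yield running_sum / window if len(buffer) == window else None


# Hypothetical price ticks, not real market data.
ticks = [10.0, 11.0, 12.0, 13.0, 14.0]
print(list(moving_average(ticks, window=3)))
# → [None, None, 11.0, 12.0, 13.0]
```

For long-running streams, a production version might periodically recompute the sum from the buffer to limit floating-point drift from repeated additions and subtractions.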

From a business perspective, streaming also shows up in scenarios where compliance, loss prevention, cost reduction, or revenue-generating opportunities can only be acted upon within a small window of time. In B2B, streaming is being applied wherever systems, servers, machines and sensors are managed: manufacturing, energy, telecommunications, utilities, surveillance, network security, logistics, oil rigs, and healthcare. Use cases include alert monitoring, preventive maintenance, re-routing, optimization, availability, and utilization.

Companies are also innovating with the Internet of Things (IoT), making machines and sensors do new things, such as using tractors as soil sensors. In B2C-oriented industries such as media, advertising, consumer packaged goods, and retail, there are also use cases based on data streaming from mobile applications, social networks, and clickstreams. Here, streaming helps companies engage, react to, and support consumers in real time. Of course, financial services companies have long managed risk, fraud, transactions, markets, and trades with systems that were the forerunners of today's streaming systems. In this session, we will look at one of these use cases.


Antonio Gallego

Pivotal, Sr. Field Engineer