bigdataspain.org

THANK YOU FOR AN AMAZING CONFERENCE!

The 3rd edition of Big Data Spain in Nov 2014 was a resounding success.
Watch the video below and find out why our attendees, speakers, partners and friends turned Big Data Spain into one of the largest events in Europe about Hadoop, Spark, NoSQL and cloud technologies.

Real time analytics with MapReduce and in-memory

AnalyticsEnglish
Operational systems manage our finances, shopping, devices and much more. Adding real-time analytics to these systems enables them to instantly respond to changing conditions and provide immediate, targeted feedback. This use of analytics is called “operational intelligence,” and the need for it is widespread. Financial trading applications must rapidly respond to fluctuating market conditions as market data flows through trading systems. E-commerce systems must reconcile orders with inventory changes on a second by second basis and need to quickly respond to shopping behavior to offer personalized recommendations. Smart grid monitoring systems need to continuously analyze telemetry from many sources to anticipate and respond to unexpected changes in power grids.

In all of these examples, live, fast-changing data sets churn in active, ongoing operations. The combination of in-memory computing and data-parallel analysis (such as MapReduce) running on a cluster of commodity servers allows these systems to continuously track and analyze this data, extract important patterns, and generate immediate feedback that steers the system’s behavior. These technologies enable the creation of an in-memory model for active entities within an operational system which continuously tracks changes to corresponding real-world entities and analyzes them in parallel. In-memory models provide a natural basis for correlating incoming events, enriching them with relevant historical information, and structuring a parallel analysis of aggregate behavior.

For example, clickstream data from a population of online shoppers can update an in-memory model of individual shoppers. Using an object-oriented approach, each shopper is represented by a memory-based object which holds a dynamic collection of time-ordered clickstream events as well as preferences and historical shopping patterns obtained from secondary storage. This object-oriented view enables incoming events to be easily correlated, and it provides the basis for a data-parallel analysis of shopping activity both to generate immediate, personalized recommendations for active shoppers and to look for emerging trends (such as determining the most popular items or the response to a sale).

This talk will describe the use of in-memory models to obtain operational intelligence in several scenarios, including financial services, ecommerce, and cable-based media. It will show both how the model is constructed and how a data-parallel analysis can be implemented to provide immediate feedback. Performance results from a simulation of 10M live cable-TV set-top boxes will illustrate how an in-memory model was used to correlate and enrich 25K events per second and complete a parallel analysis every 10 seconds on a cluster of commodity servers.

The talk also will examine the simplifications offered by this approach over directly analyzing incoming event streams from an operational system using complex event processing or Storm. Lastly, it will explain key requirements of the computing platform for an in-memory model, in particular real-time updating of individual objects and high availability, and compare these requirements to the design goals for stream processing in Spark.


Join our Newsletter