While we will do our best to keep to this program, please be aware that changes, reschedules, and cancellations may happen for reasons beyond our control.
Businesses have been putting data to work for centuries, but big data has only become widely accessible in the last decade, as computing power has become cheaper and open-source software has dramatically expanded the number of businesses that can afford big-data tools. I'll offer a history of big data, tracing its usage in business and government. I'll also explore its recent embrace by commercial concerns, try to quantify its impact, and look at what it might do for us in the future.
One of the most important principles of NoSQL is that one size does not fit all. But the proliferation of options makes it difficult to know which solution is most appropriate for your problem. Jonathan will cover five important questions with which to evaluate potential solutions, and give examples of how Apache Cassandra answers those questions.
The big data revolution is more than just terabytes or petabytes of data. It is also the application of new paradigms, languages, and tools to these data sets. This is a great strength of big data, but also a liability. These tools have different data models, different utilities for reading and writing data, and different frameworks for including user code.
The massive computing and storage resources needed to support big data applications make cloud environments an ideal fit. There is now a growing number of cloud infrastructure providers to choose from: Amazon AWS; OpenStack-based public clouds from the likes of HP, Rackspace, and soon even Dell; VMware vCloud; and private cloud offerings based on OpenStack, CloudStack, vCloud, and more. There is also a new class of bare-metal clouds from SoftLayer and PistonCloud that provide high-performance resources designed for I/O- and CPU-intensive applications that don't run as well on virtualized resources. The recent announcements by Google and Microsoft of their new infrastructure-as-a-service offerings add further significant players to this growing marketplace.
Given the diverse options and the dynamic environments involved, it becomes ever more important to maintain the flexibility to choose the right cloud for the job.
In this session, you'll learn how to deploy and manage your Hadoop cluster on any cloud, and how to manage the rest of your big data application stack, using a new open-source framework called Cloudify.
Google knows Big Data: our web applications scale to hundreds of millions of users and petabytes of data. Google has developed custom technologies to analyze this data and make intelligent product decisions. We've started to open up some of these technologies as APIs, which allow developers to concentrate on their business problems while Google handles the underlying infrastructure.
Google BigQuery is a Big Data analysis tool born from an internal technology known as Dremel. BigQuery enables developers to analyze terabyte-sized data sets in seconds using a RESTful API and a SQL-like query language. We'll demonstrate how you can incorporate Google BigQuery into your own applications and how queries are processed under the covers. We'll also show examples mining the Freebase knowledge graph, along with demos built by other developers like you.
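To give a flavor of the API, here is a minimal sketch of issuing a BigQuery query from Python using the google-cloud-bigquery client library; the public Shakespeare sample table is real, but the surrounding setup (installed client library, application-default credentials) is an assumption rather than part of the talk.

```python
# Minimal sketch: run a SQL-like BigQuery query from Python.
# Assumes the google-cloud-bigquery package is installed and
# application-default credentials are configured.
from google.cloud import bigquery

client = bigquery.Client()

# Aggregate word counts across the public Shakespeare sample
# table and keep the ten most frequent words.
query = """
    SELECT word, SUM(word_count) AS total
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY word
    ORDER BY total DESC
    LIMIT 10
"""

for row in client.query(query).result():  # blocks until the job finishes
    print(row.word, row.total)
```

The same query can also be issued directly over the RESTful API; the client library simply wraps the job-submission and result-fetch calls.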
There is a continued need for higher compute performance: scientific grand challenges, engineering, geophysics, bioinformatics, and so on. However, energy is increasingly becoming one of the most expensive resources and the dominant cost of running a large computing facility; in fact, the total energy cost of a few years of operation can almost equal the cost of the hardware infrastructure itself. Energy efficiency is already a primary concern in the design of any computer system, and it is widely recognized that Big Data systems will be strongly constrained by power.
One important way to make Big Data small is to use tools like open-source MongoDB to enable real-time analytics on huge data sets. Even technologies like Hadoop tend to be too slow and too cumbersome for most developers.
In this presentation, Brendan McAdams will argue that making Big Data powerful means making it useful to the business, and not merely to data scientists. While pointing to the benefits of MongoDB for a variety of use cases, he will focus on the broader implications of technologies and approaches that can make Big Data small.
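As a taste of what real-time analytics on MongoDB can look like in practice, here is a hedged sketch using the aggregation framework through pymongo; the events collection and its fields are hypothetical, not taken from the talk.

```python
# Hedged sketch of real-time analytics with MongoDB's aggregation
# framework via pymongo. The 'events' collection and its fields
# (status, region, latency_ms) are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client.analytics.events

pipeline = [
    {"$match": {"status": "ok"}},                        # keep valid events
    {"$group": {"_id": "$region",                        # roll up per region
                "count": {"$sum": 1},
                "avg_latency": {"$avg": "$latency_ms"}}},
    {"$sort": {"count": -1}},                            # busiest regions first
]

for doc in events.aggregate(pipeline):
    print(doc["_id"], doc["count"], doc["avg_latency"])
```

Because the pipeline runs inside the database, summaries like this come back in near real time without exporting the data set to a batch system first.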
In the summer of 2012, Plain Concepts had the chance to work with a long-time partner on SHARP (SCADA Historical Analysis and Reporting Platform), their new data-gathering and analysis system, built on top of Microsoft's Hadoop distribution and SQL Server 2012.
SHARP loads data from wind power farms across the globe into a Hadoop cluster that serves several purposes: historical storage, performance-model generation, and acting as the source of aggregated data for a relational data warehouse, on top of which SQL Server 2012 OLAP tabular models are built for high-performance, real-time analytics leveraging PowerView and other tools.
CPU performance keeps increasing at a faster rate than memory bus bandwidth, which is creating a serious data-starvation problem in current CPUs. This has many implications for how applications should handle data in order to achieve optimal performance. In my talk I'll describe several techniques for coping with this problem. We'll also see how, curiously enough, compressing data can help computations run faster. Finally, building on the above, I'll show some implementations of efficient data containers for hosting Big Data.
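As a quick illustration of the compression idea, here is a small sketch using the python-blosc bindings (one possible library choice, an assumption rather than the talk's prescribed tool): a compressible array crosses the memory bus in far fewer bytes, which can more than repay the CPU cost of compressing and decompressing it.

```python
# Sketch: shrink an array with Blosc and round-trip it losslessly.
# python-blosc is an assumed library choice for the illustration.
import numpy as np
import blosc

a = np.linspace(0, 1, 10_000_000)   # smooth data compresses very well

# Shuffle + a fast codec (LZ4) trade a little CPU for far less
# traffic over the memory bus.
packed = blosc.compress(a.tobytes(), typesize=a.itemsize, cname="lz4")
print("compression ratio: %.1fx" % (a.nbytes / len(packed)))

restored = np.frombuffer(blosc.decompress(packed), dtype=a.dtype)
assert np.array_equal(a, restored)  # lossless round trip
```

Blosc's blocked, byte-shuffled design is what makes this fast enough that, on compressible data, moving compressed blocks and decompressing them can outrun a plain memory copy.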
* Subject to change and adjustment