November 7th and 8th 2013

Kinépolis Madrid, Spain

Speakers

We are lining up some of the most relevant industry leaders in Big Data for keynote sessions. The deadline for submitting proposals has passed, so we will soon reveal the definitive list of speakers.

Platform Hive, batch and interactive SQL on Hadoop by Alan Gates

Hive is the most used SQL platform on Hadoop. It originally focussed on large batch ETL processing, where it scales to handle multiple terabytes of data. But in the last year Hive has been changing quickly on two fronts: performance has improved drastically and its SQL has been significantly enhanced.

Work going on in the Hive and Tez communities has been driving run times down to under ten seconds for some queries, making Hive viable for use in reporting and ad hoc querying. This has been accomplished by a combination of Tez, a new execution engine for Hive and Pig; ORC, a new file format for Hive; and improvements in Hive's optimizer and executor. Hive's SQL has also been expanded to include more standard SQL data types and to add windowing and analytic functions.

More SQL improvements are coming, including subqueries and full ACID inserts, updates, and deletes. Work is also ongoing to continue to improve performance. This talk will cover the progress Hive has made in the last year and current work that is going on in the community to drive Hive forward.

Takeaway points:
* Hive's performance is improving by orders of magnitude.
* Hive is improving its SQL support to include data types and features needed for OLAP queries and integration with BI tools.

Keywords: Tez, Hive, Hadoop, SQL