The first international conference in Spain on Big Data, with leading experts in data mining, data cleansing, distributed storage, cloud computing, sharing, data analysis and visualisation. Big Data is a technological challenge and a business opportunity. The Big Data Spain 2013 conference will introduce Big Data to developers and business managers in Madrid.
November 7th, 9:00AM
November 8th, 5:00PM
Platform Hive, batch and interactive SQL on Hadoop by Alan Gates
Hive is the most used SQL platform on Hadoop. It originally focussed on large batch, ETL processing where it scales to handle multiple terabytes of data. But in the last year Hive has been changing quickly on two fronts: performance has been improved drastically and its SQL has been significantly enhanced.
Work in the Hive and Tez communities has been driving run times down to under ten seconds for some queries, making Hive viable for reporting and ad hoc querying. This has been accomplished by a combination of Tez (a new execution engine for Hive and Pig), ORC (a new file format for Hive), and improvements in Hive's optimizer and executor. Hive's SQL has also been expanded to include more standard SQL data types and to add windowing and analytic functions.
More SQL improvements are coming, including subqueries and full ACID inserts, updates, and deletes. Work is also ongoing to continue to improve performance. This talk will cover the progress Hive has made in the last year and current work that is going on in the community to drive Hive forward.
Take away points:
* Hive's performance is improving by orders of magnitude.
* Hive is improving its SQL support to include datatypes and features needed for OLAP queries and integration with BI tools.