from 10:45 to 11:25
Apache Hadoop was originally designed for batch processing of large, stored data sets. Many tools, such as Apache Hive and Apache Spark, have joined the ecosystem to enable everything from interactive SQL queries to machine learning on data stored in Hadoop. Projects such as Apache Ranger and Apache Atlas have added the security and governance that enterprises using Hadoop require. At the same time, users are no longer willing to wait until data lands in Hadoop to begin processing and querying it: with tools like Apache Kafka, Spark, Apache NiFi, and Apache Storm, data can be processed and queried while it is still in motion, on its way to a Hadoop cluster. Finally, many enterprises are moving their data into the cloud, pushing Hadoop and related projects to optimize for cloud environments. Responding to these trends, vendors such as Hortonworks are combining a number of Apache projects into enterprise-grade connected data platforms. This talk will explore these trends and consider the changes they are driving in the Hadoop ecosystem.