
The present and the future of Cloudera

Technical talk | English

Theatre 20: Track 3

Wednesday - 12.35 to 13.15 - Technical

- - -

Cloudera is today the most complete all-in-one, Hadoop-based Big Data platform on the market. It is used by most of the world's top companies across many verticals, including banking, telco, automotive, healthcare, government and energy.

Cloudera is evolving rapidly, so keeping up to date with the best practices for successfully implementing solutions on the platform is a challenge. As Cloudera partners, we at ClearPeaks need to stay on top of the wave so that our customers can benefit as soon as possible from the best the technology has to offer. In this talk we will discuss the present and the future of Cloudera.

Regarding the present, we will review the workloads we usually find in Cloudera clusters: data lakes, data warehouses, operational databases, search engines, data engineering, real-time & streaming and, last but not least, data science. For each type of workload, we will share the lessons learned from our recent implementations. We will review how to size a large cluster for multiple workloads, and how to organize data lakes and data warehouses optimally to ease data engineering operations and achieve the best analytical query performance. We will also show how to estimate query response times when designing and sizing a cluster. This lever is usually overlooked during sizing (typically only capacity is considered), but it can be critical for avoiding unexpected expenses when the cluster turns out to need to be larger than originally planned in order to meet query performance SLAs. We will discuss how to leverage search engines in Cloudera and the key advantages of combining them with other Cloudera services. We will also review best practices for developing data engineering and real-time & streaming applications on Cloudera, taking into account recent additions to the stack such as NiFi. Finally, we will review the options available in Cloudera for developing ML & AI solutions.

Regarding the future, exciting times lie ahead after recent events: the merger of Cloudera with Hortonworks, the announcement of the new Cloudera Data Platform (CDP) resulting from the merger, the new Cloudera's embrace of a fully open-source approach, and the shift of its main focus towards becoming (i) an end-to-end platform (from edge to AI) and (ii) the most popular and widely used Big Data platform in the cloud. As a result, important changes to Cloudera clusters as we know them are coming soon: a shift from monolithic clusters to platforms made up of multiple coexisting cluster instances that are very simple to spin up, moving towards a more PaaS-like approach with simpler maintenance and administration. We will also discuss what will happen to the services we used to find in Cloudera Distribution of Hadoop (CDH) and Hortonworks Data Platform (HDP). In the new CDP, some of these services will become obsolete while others will get a boost, since CDP keeps and evolves the best of CDH and HDP.

This talk is for managers, data engineers, data scientists and anyone interested in Big Data & AI technologies who wants to know where we are and where we are heading with one of the most popular and widely used sets of technologies.