This talk examines business perspectives on the Ray Project from RISELab, hailed as a successor to Apache Spark. Ray is a simple-to-use open source library for Python and Java that provides multiple patterns for distributed systems: mix and match as needed for a given business use case – without tightly coupling applications to underlying frameworks. Warning: this talk may change the way your organization approaches AI.
Nearly 15 years ago, the speaker served as a “sounding board” for ideas about a new kind of computing service model, subsequently known as cloud computing. What has changed in a decade and a half? To paraphrase the UC Berkeley RISELab, one of the fundamental changes underway circa 2020 is to: “Pay to execute a block of code, rather than pay for allocating resources on which to execute code.” That may sound simple, but as the commercial success of services such as Snowflake shows, the business implications can be staggering.
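The difference between the two pricing models can be sketched with simple arithmetic. Every number below is a hypothetical placeholder, not a real price from any provider, and the function names are invented for this illustration.

```python
def cost_allocated(hours_reserved, hourly_rate):
    """Pay for allocated resources: billed for every reserved hour, busy or idle."""
    return hours_reserved * hourly_rate


def cost_per_execution(invocations, seconds_each, rate_per_second):
    """Pay to execute code: billed only for the seconds the code actually runs."""
    return invocations * seconds_each * rate_per_second


# Hypothetical workload: a server reserved for a full day, versus the same
# work expressed as 10,000 short function calls of half a second each.
allocated = cost_allocated(hours_reserved=24, hourly_rate=0.50)
executed = cost_per_execution(invocations=10_000, seconds_each=0.5,
                              rate_per_second=0.0002)

print(f"allocated: ${allocated:.2f}, per-execution: ${executed:.2f}")
```

Under these made-up rates the per-execution model is cheaper because the reserved machine sits idle most of the day; a workload that keeps the machine busy around the clock could reverse the comparison.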
An important observation we’ll explore is that hardware is now evolving more rapidly than software, which in turn is evolving more rapidly than process. The economics of data analytics circa 2005 and the hardware used then – which shaped Big Data frameworks such as Hadoop, Spark, etc. – addressed the needs of ecommerce workloads such as log file aggregation at scale. Today, in a time of AI adoption, the most valuable IT workloads must address a new set of needs: differentiating gradients within the context of networked data. Hardware and software have both changed dramatically to address these new workloads, as seen by TensorFlow, PyTorch, etc. Process … not so much.
Ray supplies a much-needed control layer for distributing and optimizing workloads across hybrid architectures, while being mindful of the economics of computing. This concept – dubbed “infinite laptop” – is especially important for computing needs such as deep learning. It becomes even more crucial for the growing segments of AI technologies that require more sophisticated computing, such as reinforcement learning, AutoML, and knowledge graphs. Moreover, Ray fits into existing software engineering processes more seamlessly than prior generations of distributed systems. We’ll draw from primary material by industry thought leaders such as Ion Stoica, David Patterson, Jeff Dean, Ben Lorica, and others to look at the architectural implications, as well as consider large use cases in business.