Unifying Real-time and Batch ML Inference using BentoML and Apache Spark

Bo Jiang

Chinese Session 2023-08-18 16:45 GMT+8 #ai

BentoML provides tooling for packaging, deploying, and serving machine learning models at scale. Apache Spark is an open-source cluster computing framework for large-scale data processing. This talk will highlight how BentoML can unify real-time and batch inference workloads by integrating with Apache Spark.

BentoML has rapidly gained popularity among its user base owing to its seamless open standards for constructing online AI applications as distributed services through simple Python code. In this regard, we present the novel integration of BentoML with Spark, which allows users to employ the Bento service, originally designed for real-time inference, within a Spark cluster for offline batch inference without altering any code. This functionality is enabled by the run_in_spark API, which automatically propagates the models and inference logic across all Spark worker nodes during batch inference.

This integration offers an optimal solution for teams to manage both their real-time and batch inference logic under the same standards, facilitated with version control, and ensuring consistent library dependencies. As a result, this eliminates the concerns regarding divergence in the inference logic over time between real-time and batch inferences. The unified approach ensures consistent model application, fostering efficient AI service development and deployment.

Attendees will learn how to: • Package models with BentoML • Deploy BentoServices to production • Invoke BentoServices from Spark for batch inference at scale • Leverage the same models for both real-time and batch predictions

Speakers:

Bo Jiang : BentoML, Product Engineer, Product Engineer at BentoML, previously Product Engineer at Douban. Working on platforms industrializing AI Applications.