Java-based machine learning solutions for big data

Qing Lan

Chinese Session 2021-08-06 16:10 GMT+8  (ROOM : A) #bigdata

The success of a Machine Learning (ML) application depends on making good use of big data. Most big data is available only in unstructured formats, and it may arrive offline or online. Although there are many options for ML tasks in Python, integrating a Python application into an existing Java/Scala-based big data pipeline is quite challenging. In addition, there are very few options in Java/Scala that bridge the gap between processing big data and running ML workloads with the same library.

To address these issues, we will demo a solution for big data ML in Java with DJL (Deep Java Library), a Machine Learning framework in Java. DJL offers a variety of ML engines, including TensorFlow, PyTorch, Apache MXNet (incubating), PaddlePaddle, ONNX Runtime, and more. Using Apache Flink and Apache Spark, users can easily build their online and offline ML pipelines. By the end of the session, the audience will be able to build an easy-to-use, high-performance ML pipeline for different scenarios.


Qing Lan: Qing is an SDE at AWS Machine Learning Platform. He is one of the co-authors of DJL and a PPMC member of Apache MXNet. He graduated from Columbia University in 2017 with an MS degree in Computer Engineering. His expertise is in model training and inference.