Optimizing the scheduling and communication of distributed data processing for resource and data characteristics is crucial for achieving high performance. In this talk, we introduce Apache Nemo, an optimization framework for distributed dataflow processing that provides fine-grained control for high performance while ensuring correctness for ease of use. We give a demo showing the execution flow of an Apache Beam program running on our distributed data processing system. We demonstrate how the program is wrapped in the Nemo intermediate representation, which enables compiler optimization passes and runtime extensions, and show how optimizations can be applied easily and flexibly to dataflow applications. We also compare the evaluation results obtained with the different optimizations.
Won Wook SONG: Won Wook (pronounced won-ook) is a PhD candidate at Seoul National University, advised by Prof. Byung-Gon Chun. His main research interests lie in big data and distributed systems, and he is also interested in machine learning systems. He is currently doing an internship at Microsoft Research Asia. He has been contributing to the Apache Nemo project since 2017 and is one of the initial members who built the system. He has given talks at the Apache Beam Summit and at several domestic conferences hosted by tech companies in South Korea.