Practice and Deployment of a Real-Time Computing Data Flow Framework Based on Apache Flink at Jingdong Retail


Chinese Session 2022-07-29 15:30 GMT+8  #streaming

The JD Retail Data and Intelligence Department has built a Flink-based real-time computing framework tailored to JD’s business characteristics, with the goal of improving the development efficiency of R&D engineers in specific scenarios. The framework targets Flink-based data flow scenarios, including but not limited to:

  1. List scenario: addresses multi-stream join and TopN computations.

  2. Query moving line analysis scenario: Flink Gelly is used to write graph analysis results into OLAP for multidimensional analysis, and streaming analysis then queries OLAP so that data analysts can perform A/B causality analysis for the QP scenario, among other technical points. This part mainly shares the machine learning engineering and related data analysis scenarios, and the corresponding solutions, accumulated by the JD Retail Data and Intelligence Department.

  3. Machine learning scenario: dedicated to building a Flink-based machine learning workflow that closes the internal machine learning loop, including but not limited to real-time feature generation, sample splicing, feature engineering, model training, and model estimation, with batch integration across the full pipeline and reusable operators.


Ying Zhang: JingDong, Algorithm Engineer, experienced in data processing, lightweight training systems, and online learning data pipeline construction, including feature dump pipelines, label pipelines, sample splicing, sample pre-processing, feature engineering, and distributed training of small models. Ying Zhang is now in charge of the real-time computing business of the Data Analysis and Optimization Department. A contributor to many open source projects such as Alink and DL on Flink, Ying Zhang has spoken at Flink Forward Asia 2021 and InfoQ QCon+ Case Study Society Global Software Conference.

Ligang Yan: JingDong, Staff Technical Expert, has many years of experience in low-code implementation of data flows and has led the construction of JingDong’s big data real-time platform, data flow framework, and component tools.