Apache Paimon Streaming Data Lake: CDC Ingestion into the Lake and Streaming Reads

Li Jinsong (李劲松)

Chinese Session 2023-08-19 14:00 GMT+8  #datalake

Apache Paimon (incubating) is a streaming data lake storage technology that provides high-throughput, low-latency data ingestion, streaming subscriptions, and real-time queries. Paimon adopts open data formats and an open technical design, so it integrates with many of the industry’s leading compute engines, such as Apache Flink, Apache Spark, and Trino.

This talk mainly covers the following Paimon topics:

  • CDC ingestion into the lake with schema evolution
  • Whole-database CDC ingestion into the lake
  • Partial-column updates during CDC ingestion
  • Real-time streaming reads of the changelog

Speaker:


Li Jinsong: Senior Technical Specialist at Alibaba, PPMC member of Apache Paimon, PMC member of Apache Flink, and committer on Apache Iceberg and Apache Beam. He has worked successively on distributed stream computing, distributed batch computing, and lake storage, and now focuses on streaming lake storage technology.