Apache Paimon Streaming Data Lake: CDC Ingestion into the Lake and Streaming Reads
Li Jinsong (李劲松)
Chinese Session 2023-08-19 14:00 GMT+8 #datalake
Apache Paimon (incubating) is a streaming data lake storage technology that provides users with high-throughput, low-latency data ingestion, streaming subscriptions, and real-time query capabilities. Paimon adopts open data formats and an open design philosophy, and integrates with many of the industry’s leading compute engines, such as Apache Flink, Spark, and Trino.
This talk mainly covers the following Paimon topics:
- CDC ingestion into the lake with schema evolution
- CDC ingestion of an entire database into the lake
- Partial-column updates during CDC ingestion into the lake
- Real-time streaming reads of the changelog
Speakers:
Li Jinsong: Senior technical specialist at Alibaba, PPMC member of Apache Paimon, PMC member of Apache Flink, and committer of Apache Iceberg and Apache Beam. He has worked successively on distributed stream computing, distributed batch computing, and lake storage, and now focuses on streaming lake storage technology.