Challenges and Solutions in Building Real-Time Data Warehouses with Apache Flink, Apache Hive, and Apache Iceberg

Yan Liu 刘岩

Chinese Session 2023-08-18 13:30 GMT+8  #datalake

Many technologies can be used to build an enterprise-level real-time data warehouse. Fully migrating the batch ETL processing of an enterprise data warehouse (EDW) to real-time ETL raises challenges that need extra attention, such as late-arriving events and dirty-data routing. This talk presents recent community work on Apache Flink, Apache Hive, and Apache Iceberg, along with architecture designs for migrating a batch-processing EDW to a real-time-processing EDW.

Speakers:


Yan Liu: Cloudera Solution Engineering; Apache Hive and Apache Flink contributor. Over 10 years of practical experience in big data. My current focus is real-time data warehousing using Apache Flink, Apache Hive, and Apache Iceberg.