Interactive Realtime Dashboards on Data Streams using Apache Kafka, Druid and Superset

Nishant Bangarwa

English Session 2021-08-08 16:10 GMT+8 #streaming

A smooth user experience with analytics dashboards has two key requirements: quick response times and fresh data. Organizations that want to build fast, interactive BI dashboards over streaming data often struggle to select a suitable stack.

This talk presents an open-source real-time data analytics stack built on Apache Kafka, Apache Druid, and Apache Superset. The stack combines the low-latency streaming and processing capabilities of Kafka with Druid, which makes ingested data streams available for immediate exploration and serves low-latency queries over them. Superset provides visualization and dashboarding and integrates well with Druid. In this talk, we will discuss why this architecture is well suited to interactive applications over streaming data, present an end-to-end demo of the complete stack, and cover its key features and its performance characteristics in real-world use cases.
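
As a rough illustration of the Kafka-to-Druid link in this stack, the sketch below builds a Druid Kafka ingestion (supervisor) spec as a Python dictionary. It is not taken from the talk: the topic name `clickstream`, the datasource name, the dimension names, the `timestamp` column, and the broker address `localhost:9092` are all assumptions chosen for the example, and a real deployment would likely need more tuning.

```python
import json


def build_kafka_supervisor_spec(topic, datasource, bootstrap_servers, dimensions):
    """Return a minimal Druid Kafka supervisor spec as a plain dict.

    All argument values are supplied by the caller; nothing here is
    specific to the demo shown in the talk.
    """
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumes each event carries an ISO-8601 "timestamp" field.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                # Assumes events are published to Kafka as JSON.
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    spec = build_kafka_supervisor_spec(
        topic="clickstream",
        datasource="clickstream_events",
        bootstrap_servers="localhost:9092",
        dimensions=["user_id", "page", "country"],
    )
    print(json.dumps(spec, indent=2))
```

A spec like this is typically submitted to Druid's supervisor API (`POST /druid/indexer/v1/supervisor`), after which Druid consumes the topic continuously. Superset can then query the resulting datasource through its Druid database connection.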

Agenda of the talk:

Introduction and Ideal Use Cases
Architecture
Demo
Apache Kafka and Kafka Streams as Streaming Platform
Druid as Serving Layer
Superset as the Visualization layer
Key features of Analytics Stack
Performance benchmarks

Speakers:

Nishant Bangarwa: Nishant is Co-founder and Head of Engineering at Rilldata. He is an active open-source contributor and a PMC member of Apache Druid and Apache Superset. He is also a committer on Apache Calcite and Apache Hive. Prior to starting Rilldata, he was part of Cloudera’s Data Warehouse team and the Metamarkets Druid team, where he was responsible for managing large-scale Apache Druid deployments. He holds a B.Tech in Computer Science from the National Institute of Technology, Kurukshetra, India.