Challenges of Building a Distributed Fault-Tolerant Scalable Analytics Stack

Nishant Bangarwa

English Session 2021-08-06 14:10 GMT+8  (ROOM : B) #bigdata

As of today, the largest Apache Druid cluster holds more than 50 trillion events amounting to over 500 petabytes of raw data, and it continuously ingests new data streams growing at an unprecedented rate. On our journey to evolve Druid to handle petabytes of data without sacrificing performance, we faced many technical challenges, design decisions, cost-versus-performance tradeoffs, and limiting factors.

In this talk, we will discuss the general requirements and key challenges that anyone designing a production-ready scalable analytics stack is likely to encounter due to various constraints. We will also discuss the lessons and strategies we have developed over time, and our path in evolving Apache Druid into a powerful distributed, fault-tolerant, scalable analytics data store.

We hope the strategies discussed in this session will help anyone struggling to keep up with the growing demands of data analytics.


Nishant Bangarwa: Nishant is Co-founder and Head of Engineering at Rilldata. He is an active open-source contributor and a PMC member of Apache Druid and Apache Superset. He is also a committer on Apache Calcite and Apache Hive. Before starting Rilldata, he was part of Cloudera's Data Warehouse team and Metamarkets' Druid team, where he was responsible for managing large-scale Apache Druid deployments. He holds a B.Tech in Computer Science from the National Institute of Technology, Kurukshetra, India.