Apache Ozone is a scalable, redundant, distributed object store for Hadoop that became an Apache top-level project in 2020. As an early adopter of Apache Ozone, the Tencent Big Data platform has deployed an Ozone cluster with over 1000 nodes as backend storage for big data applications. Tencent also uses Ozone as the primary storage solution for some private data warehouse projects. Because Ozone is so heavily used, high availability is a top priority for production support. In 2020, Tencent engineers collaborated with Cloudera engineers to implement Ozone HA for SCM (Storage Container Manager) with the help of Apache Ratis (a Raft implementation). In this talk, the audience will learn:
- What complexity Ozone SCM must handle in order to maintain high availability.
- Why Ozone chose the Raft protocol to implement SCM HA.
- How the Ozone team uses Raft and Java reflection to replicate data across the SCM group.
- How the Ozone team optimized SCM performance after HA was enabled.

This talk is co-authored by Li Cheng from Tencent Cloud and by Nanda and Shashikant from Cloudera. All of them are Apache Ozone PMC members and active contributors to the Apache Ozone community.

Li Cheng: Senior Engineer who owns Big Data Storage at Tencent Cloud COS. Formerly worked for AWS S3 and the Huawei Storage team. Also active in the open source community; currently an Apache Ozone PMC member and Hadoop committer.

Shashikant Banerjee: Software engineering professional with 8+ years of experience in designing and building scalable, high-performance distributed storage systems. Currently a committer and PMC member of the Apache Hadoop, Apache Ozone, and Apache Ratis communities.

Nanda Kumar: Software engineering professional with 9+ years of experience in designing and building scalable distributed storage systems. Currently a committer and PMC member of the Apache Hadoop and Apache Ozone communities.