Hadoop Cloud storage connectors - past, present & future!

Mukund

English Session 2022-07-29 15:30 GMT+8  (ROOM : B) #bigdata

The S3A and ABFS cloud connectors are widely used by systems like Hive and Spark when running workloads in public clouds such as AWS and Azure. In this talk we cover several enhancements to the cloud storage (AWS and Azure) modules in Apache Hadoop: how AWS's announcement of consistent S3 brought the S3Guard era to an end, the listing optimisations we made to speed up S3 listing, and the introduction of new APIs in Hadoop such as open file and IO statistics, along with their support in the S3A connector. We will also talk about the lazy seek and read-ahead optimisations done in the Azure connector.

Speakers:


Mukund: Cloudera, Staff Software Engineer. I am an active committer on the Apache Hadoop project, currently working at Cloudera with a focus on cloud storage connectors (AWS, Azure and GCS) and Ranger authorization. I have 8 years of experience designing and developing large-scale distributed systems. Apart from software development, I love doing yoga and hiking in the Himalayas.