Lambda Architecture for Big Data with Apache C*, Spark and Pulsar

Yabin Meng (孟亚斌)

Chinese Session 2021-08-08 14:10 GMT+8 (Room: B) #bigdata

The Lambda architecture is a general-purpose data processing architecture for Big Data. In this talk, we first give a brief introduction to the Lambda architecture. We then focus on how three top-level projects of the Apache Software Foundation (the Cassandra database, the Spark data processing engine, and the Pulsar streaming platform) can be integrated to build a distributed, highly available, linearly scalable Lambda architecture. Finally, we demonstrate an implementation of this architecture with a sample IoT sensor application.

Speakers:

Yabin Meng: Yabin Meng is a leading architect at DataStax. In recent years, he has focused primarily on designing and consulting on solutions for large distributed databases and stream processing systems. Before joining DataStax, he spent most of his career designing, implementing, and consulting on systems in the areas of relational databases, data warehousing, business intelligence, NoSQL databases, and big data.