Structured Data Streaming

Shivji Kumar Jha

English Session 2021-08-08 14:10 GMT+8  #streaming

Type safety is extremely important in any application built around a stream / queue. Type definition and evolution can either be built in the application or relied upon the data layer to support it out of the box allowing the application to only concentrate on business logic, not how of data store and evolution. It is this property of the good old relational databases (among others) that make them a favourite among all the modern NoSQL databases. Modern software architectures requires asynchronous communication (via stream / queue). While the data store and query design changes with asynchronous communication, type safety is still equally important.

In this talk we will go over ways in which one can force structure (schema) over the streaming data. As an example, we will talk about Apache Pulsar. Apache pulsar offers server as well as client side support for the structured streaming. We have been using pulsar for asynchronous communication among microservices in our nutanix beam and flow security central apps for over 1.5 years in production. This talk presents the technical details on what is schema, how to represent schema, what is available in the apache pulsar server and client side, how we have used pulsar’s schema support to build our use cases and our learnings from them.


Shivji Kumar Jha: Shiv is a senior software developer at Nutanix and works for the beam team helping Nutanix customers minimise cloud costs and security risks for hybrid cloud usage. Shiv loves spending time on data stores (databases, streams, analytics etc) and has contributed to MySQL and pulsar codebases. Shiv is an avid reader (tech, fiction, economics etc) and is always looking at ways to simplify software architectures.