Analyzing transactional data in Apache Druid

Vijay Narayanan

English Session 2021-08-07 13:30 GMT+8 (Room: A) #bigdata

Apache Druid is a platform for real-time analytics. Handling transactional data, where the values attached to a constant primary key change over time, is challenging in Druid because Druid does not support row-level updates. This session will focus on handling such data in Druid and on using Druid for both traditional OLAP analytics and user behaviour/funnel analytics on it.

  1. OLAP analytics on transactional data - This usually focusses on queries like “How many transactions that were created last month have completed?”. Two approaches will be covered. The first is to use Druid functions to retrieve the latest/earliest information in time from such data. The second is to use Kafka to hold the current information for each transaction and use that as a lookup in Druid.
  2. Funnel analytics - This is used for user behaviour analytics and focusses on answering questions like “How many users completed all 4 steps in the sign-up process?”. Storing every event in time for transactional data allows such analytics to be done in Druid. Some approaches will be laid out both for funnels where the sequence of steps is not important and for those where it is.
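
The logic behind both kinds of question can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Druid code and not necessarily what the session presents: the event rows, the step names, and the helpers `latest_status` and `users_completing_funnel` are all hypothetical. It assumes every change is appended as a new timestamped row, since existing rows cannot be updated in place.

```python
from collections import defaultdict

# Hypothetical append-only event rows: (timestamp, key, value).
# A change to a transaction is stored as a new row, never as an update.
txn_events = [
    (1, "t1", "created"),
    (2, "t2", "created"),
    (3, "t1", "completed"),
    (4, "t3", "created"),
    (5, "t3", "cancelled"),
]

signup_events = [
    (1, "u1", "email"),
    (2, "u1", "verify"),
    (3, "u2", "verify"),
    (4, "u2", "email"),
    (5, "u3", "email"),
]


def latest_status(events):
    """Return the most recent status recorded for each transaction id."""
    latest = {}
    for _ts, txn_id, status in sorted(events):
        latest[txn_id] = status  # rows with later timestamps overwrite earlier ones
    return latest


def users_completing_funnel(events, steps, ordered):
    """Return the users who performed every step in `steps`.

    With ordered=True the steps must also occur in the given sequence
    (other events may appear in between).
    """
    per_user = defaultdict(list)
    for _ts, user, step in sorted(events):
        per_user[user].append(step)

    done = set()
    for user, seen in per_user.items():
        if ordered:
            remaining = iter(seen)
            # Each membership test consumes the iterator up to the match,
            # so this checks that `steps` is a subsequence of `seen`.
            if all(step in remaining for step in steps):
                done.add(user)
        elif set(steps) <= set(seen):
            done.add(user)
    return done


completed = [t for t, s in latest_status(txn_events).items() if s == "completed"]
unordered = users_completing_funnel(signup_events, ["email", "verify"], ordered=False)
in_order = users_completing_funnel(signup_events, ["email", "verify"], ordered=True)
```

In this toy data only `t1` ends up completed, users `u1` and `u2` both performed the two steps, and only `u1` performed them in the stated order. The same distinction (latest state per key, set membership versus sequence) is what the Druid queries have to express over the stored events.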

Speakers:

Vijay Narayanan: Vijay has more than 15 years of experience in the data world. He is currently a field engineer with Imply (Imply offers commercial enterprise support for open source Druid). In this role Vijay is focussed on helping customers in APAC use the Imply platform (based on Apache Druid). Before Imply, Vijay was with Cloudera for two years, helping Cloudera partners position and use the Cloudera platform. Before Cloudera, Vijay spent 10 years with Informatica, where he was part of the team that put together connectivity for Informatica Cloud.