© 2020 Strange Loop
Working with data often means trying to locate data that fits patterns, akin to finding a needle in a haystack. When we add big data from non-homogeneous sources to the mix, the problem becomes far more complex. One use case at Netflix is improving the sign-up experience through experimentation. Being able to find user journeys that follow certain patterns across billions of events yields key insights for simplifying the sign-up process.
This gave us the idea of building a framework for expressing these user journey patterns in a form that can be translated into a nondeterministic finite state machine. One idea we adapted from Ken Thompson's 1968 CACM paper was to create an NDFA from patterns defined using regex, one that could support backtracking. The next step was applying the state machine across billions of events at scale using Spark. The final piece of the puzzle was making it easily usable by data engineers, scientists, and analysts alike.
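As a rough illustration of the pattern-to-state-machine idea, the sketch below matches a regex-like journey pattern against one user's ordered events. It is not Conduit's implementation: the event names, the `(name, quantifier)` pattern format, the `"."` wildcard, and the function names are all hypothetical, and it uses Thompson-style state-set simulation (tracking every reachable state at once) rather than the backtracking variant mentioned above.

```python
from typing import List, Set, Tuple

# Hypothetical pattern format: (event name, quantifier).
# Quantifiers: "1" exactly once, "?" optional, "*" zero or more, "+" one or more.
# The event name "." is a wildcard that matches any event.
Step = Tuple[str, str]


def desugar(pattern: List[Step]) -> List[Step]:
    """Rewrite x+ as x followed by x*, so only "1", "?" and "*" remain."""
    steps: List[Step] = []
    for name, quantifier in pattern:
        if quantifier == "+":
            steps.append((name, "1"))
            steps.append((name, "*"))
        else:
            steps.append((name, quantifier))
    return steps


def closure(states: Set[int], steps: List[Step]) -> Set[int]:
    """Add states reachable without consuming an event (skipping ? and * steps)."""
    seen = set(states)
    stack = list(states)
    while stack:
        i = stack.pop()
        if i < len(steps) and steps[i][1] in ("?", "*") and i + 1 not in seen:
            seen.add(i + 1)
            stack.append(i + 1)
    return seen


def matches(pattern: List[Step], events: List[str]) -> bool:
    """Return True if the whole event sequence fits the pattern."""
    steps = desugar(pattern)
    accepting = len(steps)          # state i means "about to match step i"
    current = closure({0}, steps)
    for event in events:
        following: Set[int] = set()
        for i in current:
            if i == accepting:
                continue            # no transitions leave the accepting state
            name, quantifier = steps[i]
            if name == event or name == ".":
                following.add(i + 1)
                if quantifier == "*":
                    following.add(i)  # a starred step may repeat
        current = closure(following, steps)
        if not current:
            return False            # no live states: the journey cannot match
    return accepting in current


# Hypothetical journey: land, do anything, pay, then complete sign-up.
journey = [("landing", "1"), (".", "*"), ("payment", "1"), ("signup_complete", "1")]
completed = matches(journey, ["landing", "plan_select", "payment", "signup_complete"])
abandoned = matches(journey, ["landing", "payment"])
```

A per-user function of this kind is the sort of thing one might apply to each user's events after grouping them in Spark; how Conduit actually does this is not described here.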
In this talk, we will cover how we built this framework (dubbed "Conduit") and the design decisions that resulted from challenges along the way. We will also talk about how it can be adapted to real-time applications in the future.
Ajit Koti is a Senior Engineer on the Growth Data Engineering team at Netflix, building products that enable Non Member Acquisition & Experimentation. He has over 14 years of experience building and architecting large-scale distributed systems and services. Ajit has previously built Big Data Solutions for Fanatics and IBM Labs.
Passionate about all things data, Rashmi Shamprasad is a Senior Data Engineer on the Growth Data Engineering team at Netflix, building data products that enable Non Member Acquisition & Experimentation. With over 9 years of experience working in Big Data, her previous stints include building Big Data solutions at PayPal and eBay. Rashmi holds a Master's in Computer Applications and a Bachelor's in Commerce.