Strange Loop

2009 - 2023 / St. Louis, MO

Scalding for Data Analysis in Hadoop Systems

Over the past few years, Twitter's Hadoop-based infrastructure has grown explosively along several dimensions: terabytes stored, jobs processed, active nodes, and engineers producing and consuming data. Along the way we ran into a number of challenges, even though Hadoop-based technologies scale well with data volume. In this talk, I will describe these problems and the solutions we developed as we expanded from 30 nodes and one team of 3 people to many hundreds of nodes, multiple teams, dozens of people, and thousands of daily jobs.

Dean Wampler

Think Big Analytics