Strange Loop

Why do tree ensembles work?

Ensembles of decision trees (e.g., the random forest and AdaBoost algorithms) are powerful and well-known methods for classification and regression. This talk will survey work aimed at understanding the statistical properties of decision tree ensembles, with the goal of explaining why they work. After sketching the algorithms, we will give an initial explanation for their effectiveness via generic arguments (bias-variance decomposition, Hoeffding's inequality), then proceed to more detailed topics (the interpretation of random forests as kernel machines, the role of the margin, interpolation). The audience is expected to have some experience with supervised learning and statistical arguments.
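
As a rough illustration of the kind of generic argument mentioned above, the sketch below applies Hoeffding's inequality to a majority vote. It is not taken from the talk. It assumes an idealized ensemble in which each of n voters is correct independently with the same probability p > 1/2, which real trees trained on overlapping data do not satisfy. The function names `majority_error` and `hoeffding_bound` are hypothetical, chosen only for this example.

```python
from math import comb, exp


def majority_error(n_voters: int, p_correct: float) -> float:
    """Exact probability that a majority vote of independent voters is wrong.

    Assumption for illustration: each voter is correct independently with
    probability p_correct. Ties (possible only for even n_voters) count as errors.
    """
    # The vote fails when at most half of the voters are correct.
    max_correct_for_error = n_voters // 2
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(max_correct_for_error + 1)
    )


def hoeffding_bound(n_voters: int, p_correct: float) -> float:
    """Hoeffding upper bound exp(-2 n (p - 1/2)^2) on the same failure probability."""
    return exp(-2 * n_voters * (p_correct - 0.5) ** 2)


if __name__ == "__main__":
    p = 0.6  # each idealized voter is only slightly better than chance
    for n in (1, 11, 101, 1001):
        print(f"n={n:5d}  exact={majority_error(n, p):.6f}  bound={hoeffding_bound(n, p):.6f}")
```

Under these assumptions, both the exact error and the bound shrink as voters are added, and the bound decays exponentially in n. The independence assumption is the weak point of this argument, which is presumably part of why more detailed analyses of tree ensembles are needed.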

Joe Ross

Joe Ross holds a PhD in mathematics from Columbia University and was a researcher and instructor in pure mathematics, most recently at the University of Southern California. He has given more than 20 talks about his research at conferences and universities throughout the world (Germany, Japan, Turkey, USA). He has also been the primary lecturer for many undergraduate and graduate math courses, and has given countless informal seminars. He has 9 publications in peer-reviewed mathematics journals. Joe has worked as a data scientist at machine learning/analytics startups for five years; in his current role, he focuses on a variety of time series (anomaly detection, forecasting, correlation) and sampling problems that arise in monitoring.