Strange Loop

2009 - 2023

St. Louis, MO

Distributed Systems: The Stuff Nobody Told You

We at "Greplin":https://greplin.com have had an exciting time learning how to scale our service. A year ago, we were storing a few million documents on just two servers; today we store billions of documents on hundreds of servers (and add up to 5,000 documents a second).

When we started designing our architecture, we gave a lot of thought to distributed data stores ("Riak":http://wiki.basho.com/ and "HBase":http://hbase.apache.org/) and message buses ("RabbitMQ":http://www.rabbitmq.com/ and "Redis":http://redis.io/). But we drastically underestimated the importance of the tools and support systems that actually make a large and growing cluster manageable, such as:

* Monitoring
* Analytics
* Scripts (deployment, debugging, etc.)
* Logging
* Security

Some of these topics aren't as sexy as the CAP theorem, but they are just as important to running distributed systems in the real world. I'd like to talk about some of the best practices and tools we've discovered or created (and open-sourced!) at Greplin to keep a growing cluster manageable. I'll also share some examples of problems that our tools have allowed us to discover and fix quickly.


Shaneal Manek

Greplin

@smanek

Shaneal Manek is the lead search engineer at Greplin. He was previously the co-founder and CTO of Signpost.com, which built a geospatial search and recommendation engine. He has also helped build a service-oriented satellite defense system for the Joint Space Operations Center.