© 2018 Strange Loop
We at "Greplin":https://greplin.com have had an exciting time learning how to scale our service. A year ago, we were storing a few million documents on just two servers; we now store billions of documents on hundreds of servers (and add up to 5,000 documents a second).
When we started designing our architecture, we gave a lot of thought to distributed data stores ("Riak":http://wiki.basho.com/ and "HBase":http://hbase.apache.org/) and message buses ("RabbitMQ":http://www.rabbitmq.com/ and "Redis":http://redis.io/). But we drastically underestimated the importance of the tools and support systems that make a large and growing cluster manageable, such as:
Some of these topics aren't as sexy as the CAP theorem, but they are just as important to running distributed systems in the real world. I'd like to talk about some of the best practices and tools we've discovered or created (and open-sourced!) at Greplin to keep a growing cluster manageable. I'll also share some examples of problems that our tools have allowed us to discover and fix quickly.
Shaneal Manek is the lead search engineer at Greplin. He was previously the co-founder and CTO of Signpost.com, which built a geospatial search and recommendation engine. He has also helped build a service-oriented satellite defense system for the Joint Space Operations Center.