© 2018 Strange Loop
Distributed databases, queues, and lock services vary in their durability, availability, and consistency guarantees under partition. Those guarantees often rest on fragile assumptions: designers and developers assume that system clocks are monotonic, advance at the same rate, or are synchronized between nodes; or that network partitions are impossible, separate the system cleanly into disjoint components, or are stable over short timescales.
How a system tracks causality and reconciles divergent state determines how well it behaves under unstable conditions. I've spent the last six months subjecting popular distributed systems to different kinds of network partitions while under load. Discovering their design limits and bugs illustrates how difficult it is to build reliable distributed services in practice.
Kyle is the author of Riemann, Meangirls, Timelike, Jepsen, and a bunch of other open-source stuff. He writes Clojure and helps monitor distributed systems at Factual.