Presentation: Fault Tolerance at Speed

Track: Bare Knuckle Performance

Location: Pacific DEKJ

Duration: 1:40pm - 2:30pm





Distributed systems that provide fault tolerance often sacrifice performance. That sacrifice is frequently made late in development, when a systems engineering approach has not been taken. Performance is an inherent aspect of distributed design and should be considered holistically as part of the systems engineering process. A well-designed distributed system can be both fault-tolerant and fast.

In this session, we discuss the techniques used and the lessons learned in implementing Aeron Cluster. The focus will be on how Raft can be implemented on Aeron, how to minimize network round-trip overhead, and how a single process compares to a fully distributed cluster. Come to this session if you are interested in how performance can be a first-class design concern and in the results that approach can deliver.

Speaker: Todd Montgomery

Ex-NASA Researcher and High Performance Distributed Systems Whisperer

Todd Montgomery is a networking hacker who has researched, designed, and built numerous protocols, messaging-oriented middleware systems, and real-time data systems. He has also done research for NASA, contributed to the IETF and IEEE, and co-founded two startups. He currently works as an independent consultant and is active in several open source projects.

