QCon San Francisco 2021, November 1-5, 2021

Abstract

Slack is a communication and collaboration platform for teams. Our millions of users spend 10+ hours connected to the service on a typical working day. They expect reliability, low latency, and extraordinarily rich client experiences across a wide variety of devices and network conditions. In this talk, we'll examine the limitations that Slack's backend ran into and how we overcame them to scale from supporting small teams to serving gigantic organizations of hundreds of thousands of users. We'll hear stories about the edge cache service and the real-time messaging system, and how they evolved for major product efforts including Grid and Shared Channels.

Question:

What is the focus of your work today?

Answer:

I work on the edge cache tier for Slack. The focus is to make the service more performant with our growing user base and more resilient to failures. The other important aspect is to support new product efforts at Slack. And we are always product first.

Question:

What’s the motivation for this talk?

Answer:

Developers are generally interested in how other systems work. I’ll give a high-level introduction to how Slack works, and then focus on our two-year journey of scaling it. There were mistakes made and lessons learned. Other companies with similar rapid growth may learn a thing or two from our experience.

Question:

How would you describe the persona and level of the target audience?

Answer:

Our ability to scale a service excites me day-to-day. As such, I think the problems that we deal with are highly relevant to architects, system engineers, full-stack engineers and site reliability engineers.

Question:

What do you want “that” persona to walk away from your talk knowing that they might not have known 50 minutes before?

Answer:

Building Slack is not as easy as it may appear to be. Users expect low latency, high performance and an extremely rich user experience. Slack contains a large, rapidly changing dataset. Individual components of the data (users, channels, files, etc.) reference each other, and changes to them need to be consistent across all clients. With the rapid growth of our user base and request volume, incremental steps are not always enough: at times we have to make fundamental changes to our architecture to accommodate the growth.

Speaker: Bing Wei

Software Engineer @Slack

Bing Wei is a software engineer on the infrastructure team at Slack, working on its edge cache service. Before Slack, she was at Twitter, where she contributed to the open source RPC library Finagle, worked on core services for Tweets and Timelines, and led the migration of Tweet writes from the monolithic Rails application to JVM-based microservices.