Track: DevOps: You Build It, You Run It

Location: Ballroom BC

Pushing DevOps beyond adoption into cultural change. Hear about designing resilience, managing alerting, CI/CD lessons, & security. Features lessons from open source, LinkedIn, Netflix, Financial Times, & more.

Track Host: Justin Lambert

Principal Platform Engineer @StitchFix

Justin Lambert is a Principal Platform Engineer at Stitch Fix, leading a team of engineers focused on empowering developers and streamlining their workflows. Prior to Stitch Fix, he worked at numerous startups, including FTEN and eCollege, architecting and running systems and networks. At FTEN, he was responsible for the infrastructure in 4 countries, processing over half of the US equities per day for high frequency traders; at eCollege, he advanced their web and storage infrastructures. In his spare time, he enjoys being outdoors: camping, hiking, biking, and snowboarding.

You Build It, You Secure It

Early on in the "cloud" era, Werner Vogels offered his famous quote "You Build It, You Run It". With DevOps this has become a mantra for shared responsibility between developers and operations. Operations learned how to process infrastructure as code and participate early in the supply chain of a service's life cycle. Developers learned that they had responsibilities to enable and in many cases operationalize their service. Now there is a new movement to include and collaborate in a similar way with Security. This is all part of the ideal approach where we "shift everything left" in the delivery pipeline.

In this session, we will talk about how developers and operators can include security in all parts of the delivery pipeline, and implement security gates in the same way as they implement code test gates.

John Willis, Founder @botchagalupe

Testing in Production - Quality Software Faster

A major part of our developer lives depends on working safely with production - yet few organizations today are designing their production environment to enable high quality, end-to-end verification of the code we write and deploy. In this talk, we build on the foundation of great microservice architectures to include first class design for testability as a critical technique for high velocity, high quality teams. In particular, we’ll explore what it’s like to build quality software with no development, QA or staging environments. We will conduct a deep dive into “verifying in production” - what it really takes to build software that can safely be tested continuously in production. Let’s build a solid deployment pipeline, reliable systems, and developer happiness by *knowing* production is correct.

Michael Bryzek, Co-Founder / CTO @Flow.io, previously Co-Founder / CTO @Gilt

Avoiding Alerts Overload From Microservices

Microservices can be a great way to work: the services are simple, you can use the right technology for the job, and deployments become smaller and less risky. Unfortunately, other things become more complex. You probably took some time to design a deployment pipeline and set up self-service provisioning, for example. But did the rest of your thinking about what “done” means catch up? Are you still setting up alerts, run books, and monitoring for each microservice as though it was a monolith?

Two years ago, a team at the FT started out building a microservices-based system from scratch. Their initial naive approach to monitoring meant that an underlying network issue could mean 20 people each receiving 10,000 alert emails overnight. With that volume, you can’t pick out the important stuff. In fact, your inbox is unusable unless you have everything filtered away where you’ll never see it. Furthermore, you have information radiators all over the place, but there’s always something flashing or the wrong color. You can spend the whole day moving from one attention-grabbing screen to another.

That team now has over 150 microservices in production. So how did they get themselves out of that mess and regain control of their inboxes and their time? First, you have to work out what’s important, and then you have to ruthlessly narrow down on that. You need to be able to see only the things you need to take action on, in a way that tells you exactly what you need to do. Sarah shares how her team regained control and offers some tips and tricks.

Sarah Wells, Principal Engineer @FT (Financial Times)

CI/CD: Lessons from LinkedIn and Mockito

LinkedIn and Mockito are two different use cases of implementing continuous delivery at scale. Yet the challenges, benefits and impact on the engineering culture are very similar.

In 2015, LinkedIn’s flagship application adopted a continuous delivery model we called 3x3: deploy to production 3 times a day, with a 3 hour maximum time from commit to production. At LinkedIn scale - hundreds of engineers building products for 500M users - implementing 3x3 was really hard. How did 3x3 change LinkedIn engineering culture, and what have we learned along the way?

Mockito is a top 3 Java library with ~2M users. Even with that large user base, since 2014, the Mockito project has taken the surprising approach of publishing a new version of the library from every single pull request. This approach is challenging and innovative in the Java community, and Mockito leverages Shipkit to ship every change to production. Why did the Mockito team adopt continuous delivery in 2014, and what have we learned to date?

Join and learn from Szczepan Faber, the maker of the Mockito framework since 2007 and the tech lead of LinkedIn Development Tools since 2015.

Szczepan Faber, Mockito Creator, Core Eng Gradle 1.x/2.x, & TechLead @LinkedIn Development Tools

Designing Services for Resilience Testing @Netflix

As an industry, we focus on designing microservices for availability. However, we don’t tend to speak about enabling these same services for resiliency testing. In a perfect world, you wouldn’t need resiliency testing, but that’s not the reality we are currently facing. This talk covers designing microservices to enable resiliency testing and the moving parts you need to consider, both when designing them from the get-go and throughout their lifetime. Yes, the services may all have RESTful calls in place, but those RESTful calls may not always be wrapped in circuit breakers. Yes, the services may already have circuit breakers in place, but they may not always have fallbacks enabled, and service owners may not know what those fallbacks do or how to execute that path confidently. The audience will come away from this talk with tips and tricks on how to design their microservices for resiliency tests, examples of poorly designed services, and ways to ensure these pertinent design decisions are in place on a continuous basis. The audience will also learn how to regularly test confidence in these design decisions through new chaos experimentation techniques.

Nora Jones, Senior Chaos Engineer @Netflix

DevOps: You Build It, You Run It Open Space

Open Space is a kind of unconference, a simple way to run productive meetings for 5 to 2000 or more people, and a powerful way to lead any kind of organization in everyday practice and extraordinary change.
