QCon San Francisco 2021, November 1-5, 2021

This presentation is now available to view on InfoQ.com

What You’ll Learn

Hear why it is important and helpful to have a quantitative evaluation of risk.
Hear how Netflix is using the FAIR methodology to quantitatively evaluate risk.
Learn some tips and tricks from Netflix’s experience.

Abstract

The FAIR methodology is an emerging standard for measuring information risks. But it can be intimidating to get started with a risk quantification program, as people may be reluctant to go beyond Low/Medium/High categories to real numbers. At Netflix, we have introduced risk quantification in our highest impact areas, and are gradually expanding it across the enterprise. I'll share my experience and approach to defining appropriate loss scenarios, and getting real numbers from colleagues.

Question:

What is the work you're doing today?

Answer:

I work as a Detection Engineering lead at Netflix and we have a fairly unique philosophy around how to do detection and response: we call it a SOCless organization. We don't have a large number of people sitting there watching alerts. Instead what happens is each application owner is responsible for their own security, and we help them. Our response team is mostly an escalation point when there's an incident. And my role in detection is two things: one is to address the highest risks in the organization and to build comprehensive detection around those. The second is to build capabilities that can scale out so that all the individual teams can enable their own monitoring.

For the former goal, it was very important for me to understand what the highest risks are and which loss scenarios we need to address with detection. That's where this risk quantification effort came in. It was an effort to understand and break down our risks in such a way that I would know what the actual scenarios are and then address those with various detection efforts. Also, as I study the detection, I often uncover additional controls that would make sense. So my team's role is to engage with those teams to get them to adopt new controls while we also build detection for the things that can't be completely locked down.

Question:

What are your goals for the talk?

Answer:

My goal is to communicate the experiences that I had applying these risk quantification approaches. There's a methodology and an organization called FAIR, which is an emerging standard in the sense that it's being pushed as a standard and a lot of people are finding it useful, but it has not gained wide adoption yet. I think people are still afraid of quantifying risk in this way. A lot of information security teams tend to just use low/medium/high or some one-to-five scale, which is more categorical than quantitative. But there are a lot of good reasons why you want to get more quantitative and more numerical. I want to explain why you should do this, and help overcome some of the fears that it may not work or that these approaches are difficult to apply. Having done it myself, I can alleviate some of those fears. I'll explain why the quantification is important and walk through the tips and tricks that I learned while applying these approaches.

Question:

Can you summarize FAIR?

Answer:

Sure. The basic approach is that you want to come up with two numbers. One is the frequency with which you expect some loss to occur. That's just a number saying how many times per year you would expect this loss. Hopefully, it's less than once per year, in which case it becomes a fraction. If it's once every 10 years, then you would say 0.1 is the frequency.

Then the second number is actually more of a distribution of impacts, and you quantify it in actual money terms, in our case dollars. You set a low-to-high interval, and then it is typically modeled with a lognormal distribution.

By combining these two you can calculate an expected loss per year, and that lets you prioritize all of your losses and say which ones are the most important. Something that's not frequent but very high loss can be compared to something that's more frequent but low loss, and you can rank them.
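As a rough illustration of combining the two numbers, here is a minimal Python sketch. It rests on assumptions that are not from the talk: the low-to-high interval is treated as the 5th and 95th percentiles of the lognormal distribution, the expected annual loss is taken as frequency times the mean loss per event, and the scenario names and dollar figures are made up for the example.

```python
import math
from statistics import NormalDist


def lognormal_params(low, high, confidence=0.90):
    """Fit lognormal (mu, sigma) so that [low, high] is a central interval.

    Assumption: low and high are the 5th and 95th percentiles by default.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z)
    return mu, sigma


def expected_annual_loss(frequency_per_year, low, high):
    """Frequency (events per year) times the mean of the lognormal loss."""
    mu, sigma = lognormal_params(low, high)
    mean_loss = math.exp(mu + sigma ** 2 / 2)
    return frequency_per_year * mean_loss


# Hypothetical scenarios: (frequency per year, low loss, high loss) in dollars.
scenarios = {
    "rare_but_severe": (0.1, 1_000_000, 20_000_000),
    "frequent_but_small": (2.0, 10_000, 100_000),
}

# Rank scenarios from highest to lowest expected annual loss.
ranked = sorted(
    scenarios,
    key=lambda name: expected_annual_loss(*scenarios[name]),
    reverse=True,
)

for name in ranked:
    print(f"{name}: ${expected_annual_loss(*scenarios[name]):,.0f} per year")
```

With these made-up inputs, the once-a-decade scenario ranks above the twice-a-year one, which is the kind of comparison that a low/medium/high label cannot express.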

A second thing you can do is to actually calculate a "loss exceedance curve", which is a picture of the chances that the loss would exceed some amount. This is what insurance companies use, for example, to insure a building against hurricane damage. They'll say, well, based on historical data, here's the range of different kinds of wind conditions that we might expect and the amount of damage that would occur to the building, and we create a probability picture where we expect the loss will exceed a million dollars with 10 percent probability. At that point you have some quantities you can use to buy insurance against that loss. You can say, OK, my losses definitely won't exceed X, because if they do the insurance company will pay, but I have to pay them based on the level of risk.
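A loss exceedance curve can be estimated by simulation. The sketch below is one possible approach, not the method used at Netflix or prescribed by FAIR. It assumes that loss events arrive as a Poisson process at the stated frequency, that each event's loss is lognormal with the low-to-high interval read as the 5th and 95th percentiles, and it uses invented dollar figures.

```python
import math
import random
from statistics import NormalDist


def simulate_annual_losses(frequency, low, high, years=20_000, seed=7):
    """Simulate the total loss in each of `years` independent years."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.95)  # assumption: [low, high] is a 90% interval
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z)

    totals = []
    for _ in range(years):
        total = 0.0
        # Poisson arrivals: exponential waiting times until the year is over.
        t = rng.expovariate(frequency)
        while t < 1.0:
            total += rng.lognormvariate(mu, sigma)
            t += rng.expovariate(frequency)
        totals.append(total)
    return totals


def exceedance_probability(totals, threshold):
    """Fraction of simulated years whose total loss exceeds the threshold."""
    return sum(1 for x in totals if x > threshold) / len(totals)


# Hypothetical scenario: once per decade, $1M to $20M per event.
totals = simulate_annual_losses(0.1, 1_000_000, 20_000_000)
thresholds = (1_000_000, 5_000_000, 10_000_000, 20_000_000)
curve = [(t, exceedance_probability(totals, t)) for t in thresholds]

for threshold, probability in curve:
    print(f"P(annual loss > ${threshold:,}) = {probability:.3f}")
```

Plotting the thresholds against the probabilities gives the curve. Each point reads like the hurricane example: the chance that the loss in a year exceeds a given dollar amount.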

This is where the industry wants to get to with IT losses: they would like to be able to quantify them so they could insure against large losses. But it's going to have to be a bottom-up approach, and the community has to adopt practices that are actually common in other industries. The FAIR methodology is actually a bit simpler than what other industries use, but it's a good way to introduce this approach of quantifying risks and putting numbers to them.

I think it also has great benefit for information security teams in explaining to their management the value they're providing. I give the example of a department that comes to the CEO and says, hey, if you give me a million dollars I have an investment I can make that will return five million dollars. Then the information security team comes and says, if you give me that million dollars I can turn this red risk into a yellow risk. Which one is the CEO going to want to put the money into? Instead, you can say, I'll reduce this 10 million dollar risk to a 2 million dollar risk, in which case my rate of return is higher than the other person's. That argument is much easier to make, and unless we get to that point it's very hard to make those business cases.
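The arithmetic behind that comparison can be written out in a few lines of Python. This is only a sketch using the dollar figures from the example. It assumes the reduction in quantified risk is counted as the security investment's return, and that both returns are compared on the same gross basis. The helper name `return_multiple` is made up for illustration.

```python
def return_multiple(cost, benefit):
    """Benefit gained per dollar spent (a gross multiple, not net of cost)."""
    return benefit / cost


investment = 1_000_000           # the same budget requested by both teams

business_return = 5_000_000      # the other department's promised return

risk_before = 10_000_000         # quantified risk before the security spend
risk_after = 2_000_000           # quantified risk after the security spend
risk_reduction = risk_before - risk_after

business_multiple = return_multiple(investment, business_return)
security_multiple = return_multiple(investment, risk_reduction)

print(f"Business investment: {business_multiple:.1f}x")
print(f"Security investment: {security_multiple:.1f}x")
```

Under these assumptions the security spend removes 8 million dollars of risk for 1 million dollars, against a 5 million dollar return for the competing investment.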

Question:

Where are some of the places you've applied this at Netflix?

Answer:

I won't be able to get into too much detail about the areas where we applied it or what the actual risks are. Essentially we've been applying it to our highest risks: what are the most sensitive data stores that we have, and what are the threats against those data stores.

I'll be presenting the process in a generalized way that people will be able to follow and apply within their own environment. Those risks can be different depending on what your business is and what kind of sensitive data you tend to store.

One thing that everyone stores, for example, is private information about their employees, and that's certainly an asset you want to protect. But how does that balance against the types of user data that you're collecting? It comes down to how sensitive the user data is, and how much of it there is. Getting some quantities around these things is very important, because then you can allocate your time appropriately to maximally reduce your risks.

Question:

What do you want people to leave the talk with?

Answer:

I want them to leave the talk with less fear around putting numbers to things, and with a sense that they could go back and apply these approaches, that they have some practical steps that they can follow in order to actually execute on a quantitative risk analysis. The idea of quantifying risk has been around for a while, but the adoption is difficult because people aren't used to doing it. I want to show that it's not a terribly complicated thing and that they would have most of the skills already.

Speaker: Markus De Shon

Sr. Security Engineer, Detection Engineering Lead @Netflix

Markus has worked in security since 2000 at SecureWorks, CERT, Google and Netflix, mostly on problems in Detection Engineering. He has a passion for developing a comprehensive framework to guide the engineering of detection and response systems, an effort that he has written about and continues to work on today.