Presentation: License Compliance for Your Container Supply Chain
This presentation is now available to view on InfoQ.com
What You’ll Learn
- Understand issues around OSS licensing with containers, governing distribution, and satisfying license obligations.
- Learn some of the strategies to mitigate vulnerabilities and risk exposure when building your software supply chain.
- Hear how Tern can automate some of the investigation and license auditing for you.
Abstract
Modern container images are an Open Source Software (OSS) legal compliance nightmare. In the simplest case of building a container using a Debian base OS, installing dependencies using the package manager, and adding a homegrown app at the end, meeting legal compliance obligations is as simple as using Debian's own machinery to pull corresponding sources. However, container images are built and used in so many different ways that it becomes impossible to track the provenance of such images, let alone figure out what is in them.
In this session, Nisha Kumar will talk about Tern, an open source tool for inspecting container images for OSS compliance. Nisha will provide examples of how enterprises can evaluate container images, Dockerfiles, and container supply chains using Tern, even in seemingly impossible situations. Along the way, you will learn about the pitfalls of long-advocated best practices for building and reusing container images for the software supply chain, and what you can do to correct these practices.
Tell me about the work that you do.
I am the technical lead for containers at the Open Source Technology Center at VMware. The OSTC is the technical arm of the Open Source Program Office and the representative for VMware’s upstream Open Source engagement.
What is this talk about?
To support a microservices architecture, modern app developers package their apps in containers. Typically, an app developer makes their app or service work in a container, creates a Dockerfile to automate that process, and then pushes the resulting container image to Docker Hub. They do this for all of their services.
This process of containerizing a service is done in a superficial way. There's not much understanding of what exactly is in the container, and the prevailing view is that we shouldn't care as long as we test the container’s endpoints. This leads to the assumption that the containers being consumed by modern app developers are updated, recoverable, and devoid of vulnerabilities, and that the supplier has gone through the process of publishing the corresponding source code, which is not always the case. Another supposed best practice is using Docker’s multi-stage builds. This method does not provide any continuity to a build and release pipeline: we cannot trace the components of this type of container image back to source, let alone figure out what components are installed.
If I'm using Spring Boot and Alpine, that should be enough, right?
For demoing your app, that's totally fine. If you are going to send your app into production, you probably want to know whether the Alpine image contains any vulnerabilities. You also want to know whether you can meet the license obligations of the open source software components that are in the Alpine image.
A good number of licenses that govern the distribution of an Alpine image are copyleft. That means you also have to provide the corresponding sources. How are you going to do that?
Everybody has that problem, right? Why would I be the one to fix all this?
This is true. Everybody has that problem. But if you are a serious business looking to scale and there’s money at stake, you probably want to think about mitigating your risks. Your app is not going to stay in demo mode forever. Another thing you may be doing, possibly without knowing it, is using an Alpine image in your build and release pipeline rather than distributing it. It could be an intermediate container used to build the binary which you will eventually ship. In this case you want to know in great detail what the toolchain is and what licenses govern using it. This is especially the case with Golang binaries, which are statically compiled. The toolchain uses components which are under a copyleft license, but by the time you finish compiling and shipping your binary, you will have lost that information.
How does Tern help?
VMware originated this open source project called Tern. The project has two goals. The first one is to find all the components that are installed in a container image. The second one is to find the corresponding sources for the Open Source components. This is an exponentially hard problem when it comes to container images because, as I said earlier, the final container image could be the result of many intermediate steps that use other container images. Tern aims not only to list what is in the container image, but also to give you an idea of what is known and what is not known.
Tern operates differently from security and license scanners in that it tries to maintain the context in which the container image got built, and that usually means analyzing the layers in order and relative to the information in the previous layers.
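The idea of analyzing layers in order, relative to what earlier layers established, can be illustrated with a small Python sketch. This is not Tern's implementation: the Layer class, its fields, and the sample data are hypothetical and exist only to show how build context can be carried from one layer to the next while separating what is known from what is not.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One image layer in build order (hypothetical, simplified model)."""
    created_by: str                                  # command that produced the layer
    installed: dict = field(default_factory=dict)    # package name -> license id
    removed: set = field(default_factory=set)        # package names deleted here
    unexplained_files: int = 0                       # files no package manager accounts for


def analyze(layers):
    """Walk layers in order, carrying forward what earlier layers established."""
    packages = {}   # name -> (license id, index of the layer that introduced it)
    unknowns = []   # (layer index, command) pairs that need manual review
    for index, layer in enumerate(layers):
        for name in layer.removed:
            packages.pop(name, None)
        for name, license_id in layer.installed.items():
            # Keep the first layer that introduced a package as its origin.
            packages.setdefault(name, (license_id, index))
        if layer.unexplained_files:
            unknowns.append((index, layer.created_by))
    return packages, unknowns


# Illustrative data only; a real tool would derive this from the image itself.
layers = [
    Layer("base OS import", {"busybox": "GPL-2.0-only", "musl": "MIT"}),
    Layer("apk add curl", {"curl": "curl"}),
    Layer("COPY ./app /app", unexplained_files=3),
]

packages, unknowns = analyze(layers)
for name, (license_id, origin) in sorted(packages.items()):
    print(f"{name}: {license_id} (introduced in layer {origin})")
for index, command in unknowns:
    print(f"layer {index} ({command}): contents not traceable to a package")
```

Recording the originating layer for each package, and listing layers whose contents no package manager explains, is one simple way to report both what is known and what is not.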
How does it work? Is this a statement inside of a Dockerfile, or something that calls Docker for you?
As far as Docker images are concerned, it has two modes of operation. The first uses just the image: if you give it an image, it will tear it apart and try to rebuild it step by step, and at each step it tries to figure out what exactly got installed. It's almost like debugging a container image. In the second mode, if you give it a Dockerfile, it will build the image and map the Dockerfile to the image. This is useful in cases where you are basing your app off a container image that may not be ideal for your use case. Tern gives you this extra information to help you decide what container image you should be basing your app on. You could be using a Golang image without knowing it was actually built on top of an Alpine image.
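As a rough illustration of why the Dockerfile alone does not tell the whole story, the following Python sketch lists the base image of every build stage in a Dockerfile. It is a simplified, hypothetical helper (base_images is not part of Tern), it ignores ARG substitution and line continuations, and the sample Dockerfile is an assumption made for the example.

```python
def base_images(dockerfile_text):
    """Return (stage_name, base_image) for every FROM instruction."""
    stages = []
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop optional flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        has_name = len(args) >= 3 and args[1].upper() == "AS"
        stages.append((args[2] if has_name else None, image))
    return stages


# Sample multi-stage Dockerfile, for illustration only.
DOCKERFILE = """\
FROM golang:1.13-alpine AS builder
WORKDIR /src
COPY . .
RUN go build -o /out/app .

FROM scratch
COPY --from=builder /out/app /app
"""

stages = base_images(DOCKERFILE)
for name, image in stages:
    print(f"stage {name or '(final)'} is based on {image}")
```

In this example the shipped image is based on scratch, so the Alpine-based toolchain used in the builder stage leaves no trace in the final image. The stage list only names the images involved; finding out what each one contains still requires inspecting it.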
So it's not part of your CI/CD pipeline, it's a separate tool you run against your Dockerfile?
It’s a standalone tool that anyone can use, but you could run it in a CI/CD pipeline and use the results for auditing. You can either look at the report to help you decide whether to promote a build or not, or you can use another tool that automates the auditing for you.
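A promotion gate in a pipeline could look something like the Python sketch below. The report structure shown here (a "packages" list with "name", "license", and "source_url" keys) is a hypothetical schema invented for the example, not Tern's actual output format, and the copyleft set is a small illustrative subset of SPDX license identifiers rather than a complete policy.

```python
import json

# Illustrative subset of copyleft SPDX identifiers; a real policy would be broader.
COPYLEFT = {
    "GPL-2.0-only", "GPL-2.0-or-later",
    "GPL-3.0-only", "GPL-3.0-or-later",
    "LGPL-2.1-or-later", "AGPL-3.0-only",
}


def audit(report):
    """Return findings that should block promotion of a build."""
    findings = []
    for pkg in report["packages"]:
        license_id = pkg.get("license")
        if not license_id:
            findings.append(f"{pkg['name']}: license unknown")
        elif license_id in COPYLEFT and not pkg.get("source_url"):
            findings.append(
                f"{pkg['name']}: {license_id} but no source location recorded"
            )
    return findings


# Hypothetical report, standing in for whatever the inspection step produced.
report = json.loads("""
{
  "image": "example/app:1.0",
  "packages": [
    {"name": "musl", "license": "MIT"},
    {"name": "busybox", "license": "GPL-2.0-only", "source_url": null},
    {"name": "vendored-lib", "license": null}
  ]
}
""")

findings = audit(report)
for finding in findings:
    print(finding)
print("promote" if not findings else "hold for review")
```

Treating an unknown license as a blocking finding, rather than silently passing it, keeps the gate aligned with the goal of reporting what is not known as well as what is.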
Who is the target audience for your talk and what do you want them to leave the talk with?
I am hoping to reach software architects in particular, because one of the disturbing things I find when working with this tool is that a number of container best practices run counter to tried-and-tested software distribution practices. We want our builds to be repeatable and our artifacts to be descriptive, and we want to be able to update all portions of the pipeline. For license compliance, we want to be able to trace our build and release pipeline back to source. The best practices for building container images do not take any of these considerations into account. I would like technical leaders to understand why these considerations are still valid, and to think about changing the way containers are built and distributed.