Podcast: Kim Lewandowski and Michael Lieberman on Securing the Software Supply Chain with SLSA
Article originally posted on InfoQ.
Transcript
Introductions [00:21]
Charles Humble: Hello and welcome to the InfoQ podcast. I’m Charles Humble, one of the co-hosts of the show and editor in chief at cloud native consultancy firm Container Solutions. I think listeners to the show are probably familiar with the idea that the software supply chain is under increasing levels of attack, with a number of major stories having hit the headlines in recent years. You might think of the NotPetya attack that targeted M.E. Doc in Ukraine in the spring of 2017, or maybe the SolarWinds attack from 2020. Attacks rose a staggering 650% in 2021 compared to the previous year, for a total of 12,000 malicious incidents, according to Sonatype’s 2021 State of the Software Supply Chain Report. Synopsys, meanwhile, says that some 84% of commercial code bases have open source software vulnerabilities. And according to Gartner, that trend is likely to continue, with the analyst firm claiming that by 2025, some 45% of organizations will have experienced software supply chain attacks.
My two guests on the show this week are both focused on the problem. Kim Lewandowski created the SLSA framework while she was at Google, and has since left and co-founded Chainguard, where she is CEO. Michael Lieberman works in the software supply chain space, is co-chair of a CNCF financial services user group, is a SLSA steering committee member, and recently co-led the secure software factory reference architecture for the Security Technical Advisory Group. The supply chain problem is something that the industry is really just beginning to get to grips with, and the SLSA framework is still very young; it hasn’t yet reached 1.0 status. It is, however, something that builds on existing work done at Google, so it’s maybe a little bit more mature than its current status implies. It’s a very interesting topic, and an important one for our industry. So welcome both Kim and Mike to the show.
Kim Lewandowski: Thank you, Charles.
Michael Lieberman: Thanks for having us.
Why do you think we’re seeing a growing number of supply chain attacks? [02:18]
Charles Humble: As I mentioned in the introduction, we are seeing a growing number of attacks really all the way along the supply chain from source, to build, to deploy. There are obviously a lot of factors in this, but why do you think it’s happening?
Michael Lieberman: I definitely think there are a lot of factors at play, but I think a common one is that in the past several years, with the big push for DevOps, DevSecOps, everything is code. It kind of goes across the board. That includes attacks. So attackers are now targeting the code itself rather than just a firewall, or just trying to steal a password and get into one account. They want to figure out ways to say, “Hey, if I get into a library somebody is using, now I don’t just have access to one system or server. I have access to all of your systems or servers.”
Kim Lewandowski: Just to add to Michael’s point, the attack targets are juicier now. So instead of just going after one entity, you can get malicious code into one organization and it spreads out to all of their users and impacts that entire population.
Charles Humble: I was thinking as well, I mean, it will presumably vary from organization to organization, but I imagine that in a lot of companies, the CI/CD pipeline is maybe something that is more of a developer concern. So it might be something that security teams don’t necessarily have that much oversight of or that much insight into.
Michael Lieberman: I think it depends on the organization or institution. It’s definitely been a big concern in a lot of regulated enterprises. They’ve had to have a lot of these things, at least conceptually in place. Practically there’s probably a lot of issues in implementing it, but it is a fairly common attack point for these sorts of vulnerabilities. Because when you think about it, when I’m running code, I want to understand what I’m running. When I’m running a build, a build is potentially running all sorts of arbitrary actions. So you could be potentially running a compilation, and what exactly is that compilation doing and how is it doing it? It can be very difficult compared to let’s say certain other things where if I have a piece of software and it’s managing a firewall, I know that that piece of software should only be managing a firewall and if I detect it doing something different, I know that something’s gone wrong.
Whereas when it comes to just generally like a CI pipeline, there are so many different things taking place, it becomes very difficult to know what to look for in there. So the attack can come from almost any direction.
What is SLSA? [04:44]
Charles Humble: Right. Thank you. Now Kim, I mentioned that you kicked off the work on SLSA while you were at Google. So could you maybe just talk a little bit about what SLSA, S-L-S-A, what it actually is?
Kim Lewandowski: SLSA as an acronym stands for Supply chain Levels for Software Artifacts, and it’s a leveling system. There are currently four levels: SLSA 1, 2, 3 and 4, and the idea is that you get stronger security guarantees the higher the level you go. For each level, there’s a list of requirements around the source control system that’s being used for the code, the build system, and provenance. A lot of the inspiration for SLSA came from what Google does internally for its production software. Engineers have to get their code bases to reach SLSA level three or four, though it’s called something different internally at Google, before they’re even allowed to deploy to production. So it’s looking at critical weaknesses along the entire supply chain and trying to fill in a bunch of gaps exposed by a lot of the attacks that we’ve seen lately. We’ve been mapping those attacks to the actual SLSA requirements and trying to make sure SLSA is as good in practice as it is in theory.
Charles Humble: Right, and it basically gives you a set of tools that you can use to start reasoning about the problem.
Kim Lewandowski: It’s a framework right now, a specification, with a lot of terminology that we’re trying to get the broader industry to a shared understanding of. So it’s more of a supply chain integrity framework, with some tooling falling out of it to help projects reach the SLSA levels.
What challenges did you find as you started trying to apply those ideas to the wider open source community? [06:16]
Charles Humble: Now, given that it came out of Google, and specifically out of the Binary Authorization for Borg work, and Google is obviously a very large company and also famous for things like having a monorepo. What challenges did you find as you started trying to apply those ideas to the wider open source community? I mean, I’m thinking that even just getting a two-person review on a fairly typical open source project that might only have one maintainer, even that might not really be realistic.
Kim Lewandowski: I mean, that’s a great question. I think Michael can talk to some of the discussion that’s happening now within the framework. I think everyone recognizes that what works for Google won’t work for the entire sort of open source community and every company.
Charles Humble: Mike, do you have anything to add there, maybe particularly thinking with your regulated industry’s perspective?
Michael Lieberman: I think especially in sort of the regulated enterprise space or legacy enterprise space, one of the big things that sort of come out is Google might be sort of tracking these sort of things and how long they take in milliseconds, whereas for a lot of banks, pharmaceuticals, medical stuff, et cetera, they’re looking at a lot of these things in hours, days, weeks sometimes. Those are the timeframes. And so when it comes to applying these rules, these practices, there’s a different set of, I think, expectations that a lot of regulated enterprises are taking compared to your average tech company startups and those sorts of things.
Kim Lewandowski: I’ll just add to that. A lot of the inspiration was to really uplift the open source software that everyone relies on and fully recognizing that, like you mentioned, many projects only have one maintainer. But the thing I just wanted to point out is like the spec is about a year old now since we created SLSA and it’s very much in its early phases of really ironing out all the kinks. And so working with the broader industry and people from different backgrounds and different organizations and coming together and kind of trying to figure out those exact details that work for the majority, is really what we’re trying to do within the community and the project.
What is SLSA provenance? [08:16]
Charles Humble: Now one of the kind of key ideas in SLSA is this concept of provenance. Mike, maybe you could explain what that means.
Michael Lieberman: Provenance more or less means figuring out where something came from. And so one of the big things in SLSA is we’re not trying to say stop doing what you’re doing from a security perspective, do a whole set of other things, it’s more like a set of practices that are layered on top of what you’re doing to sort of make sure that the security actions you’ve performed, the builds you’ve performed, et cetera, all these different things that are part of your CI pipeline actually happened against the things you expected them to. So that means if I have some source code, I want to have the provenance. I want to be able to track a chain of custody of: I downloaded this source code, I performed some linting on it, I performed some security scans on the code. I then compiled and built the code.
I then ran some additional security scans against it and then eventually pushed it into production, right? That seems like a very normal sort of CI layout. And what SLSA does is sort of says, “Great, now let’s track that. Did you actually pull down the source code that you thought you did, and do we have a record of that? Do we have a record that the security scans you ran were against the source code that you downloaded? Do we have a record that the code you built was actually the code that you downloaded and scanned?” Those are the sort things that provenance is trying to answer there.
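To make the chain-of-custody idea concrete, here is a minimal sketch of the kind of record SLSA provenance captures: which builder produced which artifact from which sources. The field names loosely follow the SLSA provenance predicate, but the builder ID, source URI, and digest here are hypothetical, and this is an illustration rather than the exact schema.

```python
import hashlib
import json

def make_provenance(builder_id, source_uri, source_digest, artifact_bytes):
    """Record who built what, and from which source material."""
    return {
        # The trusted build service that performed the build.
        "builder": {"id": builder_id},
        # The inputs: source repo (and, in practice, every dependency).
        "materials": [{"uri": source_uri, "digest": {"sha256": source_digest}}],
        # The output artifact, identified by its content hash.
        "subject": [{"digest": {"sha256": hashlib.sha256(artifact_bytes).hexdigest()}}],
    }

prov = make_provenance(
    "https://ci.example.com/builders/v1",     # hypothetical builder identity
    "git+https://example.com/repo@main",      # hypothetical source URI
    "ab" * 32,                                # placeholder source digest
    b"compiled artifact",
)
print(json.dumps(prov, indent=2))
```

A consumer can then answer the questions Michael raises by comparing digests: the artifact they downloaded should hash to the `subject` digest, and the source they audit should hash to the `materials` digest.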
Kim Lewandowski: One of the key principles of SLSA is just to make sure that the software artifact that you’re getting hasn’t been tampered with. I think the reality of the situation today, and I used to be a developer and very guilty of this too, is you would look for a software package to do something that you’re trying to do and just go take it off the internet, not thinking about who’s behind the project, how it was built, anything. It was like, “I need to get my job done as fast as possible. What package might help me do that?” And then not think about any of the security things along the way, or what risk you’re opening yourself up to after you start using this thing in your production environments.
Michael Lieberman: And to some extent, a lot of folks in the regulated enterprises almost have like a little bit of the opposite problem, where it’s everything has been locked down for so long that in order to sort of keep up with everybody else, we need to have more automation to make it easier for developers to be able to use libraries that meet the compliance requirements. Because today a lot of times that is a lot of manual spreadsheets and manual tickets and weeks of meetings, whereas with stuff like SLSA, that could be sped up via policy as code. You could just sort of say, “Yep, this library meets all of our SLSA requirements. You are allowed to use it just by default. You don’t have to ask for approval.”
Charles Humble: Just to carry on thinking about the provenance side of things for a little bit longer, once I have my provenance metadata, what does that do? What at a high level are we able to do with the metadata that we have?
Michael Lieberman: At the highest level, the main idea behind SLSA and behind a lot of the supply chain security things that are going on is it’s applying zero trust principles to your software and the software you write. So the expectation here is yes, your code has probably already been compromised. So how do I make sure that I’m only using stuff that I expect to? So the sorts of things that at one level is going to be, you are looking at who built the code, as in what system? Who wrote the code, what developers? And as long as I assume like, yep, I trust all these people or I trust all these systems, then yes, that can move forward. And then there’s still sort of, I would say some conversations going on with how you would then be able to go out and independently verify. There hasn’t been a lot of conversation yet outside of just saying that that’s going to be definitely something in the future that we want to look at.
Kim Lewandowski: A lot of the focus too right now has been largely around the build systems and the build system environment. So I think ideally if we can kind of get to a place where we know specific build systems, they reach the highest level of SLSA then it gets us a lot further in the supply chain challenge than we are today. Just using a build system in general is a big first step. And then when we know that code was built on this hardened build system, then we can have more confidence that it was built correctly with the dependencies that were expected and from the source code that we expected.
How does that provenance metadata get generated? [12:35]
Charles Humble: So that makes sense from a theoretical point of view to me, but then how does that provenance metadata actually get generated?
Michael Lieberman: So there’s a couple of different ways. Largely, it could be generated by really anything that is tracking what the build is doing. However, a lot of the SLSA requirements like non-falsifiability and so on, they really sort of focus that your build service, your CI service should actually be recording what is happening in the build. So as opposed to having the build sort of record its own metadata, because obviously if that build is running arbitrary code, that build could compromise itself and generate bad data. So the idea there is you should have your trusted CI service that is looking. They’ll track everything that’s going through the logs, track the inputs to build, so like what source that got built and so on and what was the output artifact? Record all that information, and then one of the other key requirements there is to then have that build service.
Once it’s generated that metadata, sign it with an identity, preferably one that’s short-lived, as in not a long-running key. Stuff like OIDC identities and the like are pretty common nowadays, and I would say that’s probably the most common way of generating and signing that metadata right now.
Charles Humble: And because the metadata is signed, that prevents it from being tampered with at least to some extent.
Michael Lieberman: Yeah, correct. The signature serves a couple of purposes. One is to sort of verify that yes, the thing that had the signing credentials, whatever they may be, is the only one that generated that metadata. And it helps out because it also helps out that if something was compromised, you can now invalidate that key or those signing credentials, generate new ones and you’ll now know like, “Okay, these artifacts are potentially ones that are now suspect. We now need to reevaluate, rebuild against a more secure build system.”
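The two purposes Michael describes, tamper evidence and the ability to invalidate a compromised credential, can be sketched with a few lines of code. Real build services use asymmetric, often short-lived identities (for example OIDC-backed certificates); a symmetric HMAC with made-up key names stands in here just to illustrate the mechanics.

```python
import hashlib
import hmac

def sign(metadata: bytes, key: bytes) -> str:
    """Produce a signature only the key holder could have generated."""
    return hmac.new(key, metadata, hashlib.sha256).hexdigest()

def verify(metadata: bytes, signature: str, key: bytes) -> bool:
    """Check the metadata against the signature using the trusted key."""
    return hmac.compare_digest(sign(metadata, key), signature)

old_key, new_key = b"build-key-2021", b"build-key-2022"  # hypothetical credentials
sig = sign(b'{"builder": "ci"}', old_key)

print(verify(b'{"builder": "ci"}', sig, old_key))   # untampered, key still trusted
print(verify(b'{"builder": "xx"}', sig, old_key))   # tampered metadata fails
print(verify(b'{"builder": "ci"}', sig, new_key))   # after rotation, old signatures fail
```

The last case is the invalidation scenario: once the old credential is rotated out, every artifact signed with it becomes suspect and can be flagged for rebuild on a more secure build system.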
Kim Lewandowski: It’s a common misconception that signatures mean the code is safe, and I think we need to break that thought process in people’s minds. Signing really serves a couple of purposes. You can start defining policies around the signature piece. So to use the example of a CI system or something signing that artifact, an organization might say, “Okay, we only trust these particular CI systems.” And they can then make a policy that says, “Any code that’s not built by these systems we’re not going to use.” So it’s really that policy piece at the end that makes signatures pretty valuable.
Charles Humble: Right. But even if the metadata is signed, an insider can still falsify results.
Michael Lieberman: A lot of it is inevitably going to be based on the organization’s own internal governance and policy. If you give developers access to modify those CI systems without adequate controls, sure. If you sort of lock those systems down enough where these systems themselves are also following SLSA good practices, where you have two person code review for changes to the CI system and so on, then that also helps out there.
What comes between SLSA level two and SLSA level three in the context of the metadata specifically? [15:37]
Charles Humble: And then we’ve said that SLSA is a sort of four level framework as it currently stands. So maybe just using the metadata as an example, the signing comes in at SLSA level two. So how do we get to the higher levels? What comes between SLSA level two and SLSA level three and so on in the context of the metadata specifically?
Michael Lieberman: There’s a lot of debate on exactly what counts, and this is one of the reasons why SLSA is not 1.0 yet, but hopefully in the next few months it will be 1.0. But to some extent, a lot of the differences sort of come into making sure that you sort of retain the history of your source code indefinitely so that you could always go back and audit the source code that was used to build an artifact. Another thing to really note here is SLSA is not just purely to prevent necessarily supply chain attacks, but help you detect when they happen and also help you react, which is huge. And so if something does happen, you want to go back, you want to audit, SLSA 3 is about a lot of the stuff that helps you in that area.
So the requirements are keeping the source code, and the history of that source code, indefinitely; the build should be 100% as code, as in there should not be manual steps in the build process, and those sorts of things. There’s also another requirement around isolated and ephemeral environments. So that means if I run a build on a server, that server should be cleaned up afterwards. That’s why builds are often in containers, right? You can get rid of the container at the end and spin up a new one, so you don’t have any cruft from the old one. The environment should be isolated, meaning no build should be able to interact with another build, right? Because if you do have a compromised build, one of the first things it always tries to do is reach out to the host and figure out, “What can I do here? How can I compromise everything else that’s going on?”
So SLSA 3 helps out in that case, because if that build environment is isolated, it can only compromise its own artifact. You don’t have to worry about it compromising artifacts further down the line. And so those are just a few of them. And then the other big one is the non-falsifiability requirement, which pretty much means that as a developer, if I have a build.sh script, my build.sh script should not be the thing that’s generating the SLSA metadata, because I could put whatever I want in there. The SLSA metadata should be generated by the build system that has gone through appropriate security approvals and is validated to be generating good provenance.
Kim Lewandowski: Yeah, I think that covers most of it. And then there’s just a few common requirements that are tacked on at the highest SLSA level, just around more like physical access to machines and remote access and who has access to what? So doing the least privilege approach to these systems.
What’s between where you are and a 1.0 release of the framework? [18:07]
Charles Humble: In terms of the overall SLSA project, you said that it’s still not reached the 1.0 status. So what’s left to do? What’s between where you are and a 1.0 release of the framework?
Kim Lewandowski: I think a couple of the conversations happening now are around the scope of SLSA, like which parts of the supply chain does it cover, which does it not? And then from there, trying to break the problem down into smaller chunks. So I think Michael mentioned that we’re aiming for a 1.0 release, and I think that 1.0 release is really going to be scoped down around the build requirements, is my understanding. And so there’s a few other things on the roadmap. We keep hearing from lots of companies that are trying to adopt and use SLSA, and it’d be great if we can get more folks to share their experiences and have more of these case studies, so the whole community can use them as references as they’re trying to apply the principles to their own projects.
Michael Lieberman: I think one of the big things for SLSA compared to a lot of other requirements you might be seeing coming out of NIST or some of these other things, is that we are trying to kind of focus on a limited scope first and expand later, because some of the feedback we have gotten from users is just a lot of times if I just get, “Here are the 500 things you need to do.” It can be very difficult to really understand, okay, how do I prioritize? Like some of the stuff there might be something that you only want to focus on after you’ve done 90% of the other things, and you might be spending most of your time on that one little thing.
So with SLSA, we’re trying to make it as easy as possible for folks to begin to adopt, to start that journey, which is also why if you look at something like SLSA 1, why SLSA 1 is very few requirements and we are also being very clear on in SLSA. Like if you’re following SLSA 1, that’s the start of your journey. That does not mean you’re secure in any way, it just means it’s the start. And then as you move on up, there’s going to be more and more and more. But I think those sorts of things are really important for a lot of end users to sort of slowly ease them into the supply chain security journey.
Kim Lewandowski: The other thing I’ll add is folks are working on the tooling. We talked about a little bit of that in the beginning. So trying to reach SLSA Level 3, and here’s an example repo of how you would use GitHub Actions to get to SLSA Level 3. I think there’s another project too, that shows how you can emit the SLSA provenance data and how that can be generated, which can then potentially be used in making trust-based decisions within organizations.
How well does SLSA fit in regulated industries? [20:32]
Charles Humble: Mike, I mentioned at the beginning that you have a background in financial institutions predominantly. So I was interested in whether there was a particular sort of match between SLSA and the kind of things you are dealing with in a financial institution.
Michael Lieberman: I think actually one of the reasons why SLSA made perfect sense to me was in a previous life at a hedge fund, we had sort of built out something similar. Also we had some former Googlers, so there’s a lot of Borg authorization style of things in there. But at a lot of large financial institutions, we’re not just purely looking at what open source are we using and is that good? But we’re also looking at vendors, right? Are our vendors building the code the right way? Because a lot of times it’s not that the code that we’re writing maybe is fine, but “Hey, the code the vendor gave us has some issues.” And so there’s a big worry about external, and then there’s also a big worry about internal, right? There’s a huge worry about internal threat actors especially at investment banks, hedge funds and the like, where a big concern is that a developer is going to do something to front-run trades, for example.
And so they want to be able to sort of detect also the chain of custody of the software they’re writing so that they know that the binary, they can audit that code and make sure that no developers have done something bad there. But in addition to that, I think there’s also with the US executive order from last year around supply chain security, they did mention critical infrastructure and a lot of banks and financial institutions are a little surprised. They’re like, oh, banks count as critical infrastructure. So you will need to secure your supply chain, not just for the sake of it, but because there’ll be regulatory requirements coming out of it. So those are some, I think, big challenges that banks are facing.
And then banks, of course, also face the legacy challenge that a lot of the code that we’ve written at banks, some of that code is from the 1960s written in COBOL. And even if a lot of things have been replaced by new modern 12-factor apps or whatever, there still is that legacy there. And so it’s a huge challenge when trying to adopt new techniques, especially new security techniques.
What do you hope to see in 5 or 10 years time?[22:44]
Charles Humble: So if we now fast-forward five or 10 years into the future, what would the ideal state be? What would you like to see?
Michael Lieberman: I think in five to 10 years, the sorts of things I want to see coming out of SLSA are, we do want to see folks adopting SLSA. We want to see wide adoption, because I’m coming at a lot of this from the perspective of an end user, right? And so for me, it’s a lot about, “Hey, look at what SLSA can do.” And a lot of it is, I’m just going to be blunt, it’s irrelevant if the rest of the world doesn’t follow. There is some value in just running your own stuff with SLSA, but you want to be able to also verify that your vendors, that your open source providers, are also generating SLSA provenance, because if they’re not, then there are big gaps in that knowledge picture about your supply chain. The real goal of SLSA is to help you protect your supply chain, but also build that knowledge around what is in your supply chain.
SLSA helps you understand that yes, this build came from this source code and used these dependencies, and those sorts of things. And so that lets us build out that picture. Today there are not very many people using SLSA, so we only have a limited view into that supply chain. But over time, as more people start using it, we get increased visibility into the supply chain, and it’s useful not just for any particular vendor; it helps make the industry as a whole significantly more secure, because we’re all following those good practices. We’re all generating good metadata. And in the case of open source vendors, we’re sharing that metadata with folks so that they can better understand that supply chain picture.
Kim Lewandowski: And I think for organizations too, it will help them do risk assessment and it gives them a benchmark for what risks they’re willing to take on. And then just to add one other thing, like some of the inspiration was it was just around open source software. Once we have this kind of worldwide picture of SLSA levels across these critical pieces of software artifacts that we’re all using, we can start breaking down that problem too. So say like the top 1000 critical open source projects, if they don’t have high SLSA scores, ideally we could start improving those as an industry and getting all of this critical software into a better state.
Are you starting to see any interest or take up amongst the open source communities that you are involved in? [25:03]
Charles Humble: Now it is obviously very early days, but I was wondering, are you starting to see any interest or take up amongst the open source communities that you are involved in?
Michael Lieberman: Especially in the OpenSSF and CNCF, there’s definitely interest in trying to make a lot of the projects that fall under their purview, start to generate that SLSA metadata. You see that now with Kubernetes, and we want to kind of build significantly more of that as time goes on. Because once I think a few big names start to generate SLSA provenance, it’ll help compel others to do the same.
If listeners want to find out more about SLSA, where would you suggest they look? [25:36]
Charles Humble: Right. Yes. I’m sure that’s true. If listeners want to find out more about SLSA, where would you suggest they look?
Kim Lewandowski: So a couple different places, there’s a landing page at slsa.dev. S-L-S-A.dev. And I think it should have links to a few of the community places where we’re having these discussions. Like we have a community slack channel, there’s a SLSA repo on GitHub under the OpenSSF. And we have every other week community meetings I think on Thursdays 9:00 AM Pacific time that a lot of us get together and talk about different items that come up and what we want to do in the future.
Michael Lieberman: I second Kim on that. Mostly OpenSSF I think is super valuable. There’s also the CNCF under the Security Technical Advisory Group. If you want to understand how people are starting to use SLSA, I think that there’s some interesting work happening on that end, where we are trying to build architectures that make it easy to both follow the SLSA requirements as well as generate the SLSA metadata. And so there’s some interesting projects out there that are trying to be SLSA first.
Charles Humble: I will make sure to provide links to all of those resources in the show notes, which listeners can find on the infoq.com website. Thanks to everyone for listening and particular thanks of course, to Kim Lewandowski and Mike Lieberman for joining me this week on this episode of the InfoQ podcast.
Kim Lewandowski: Thank you.
Michael Lieberman: Thank you.