Presentation: From Anti-patterns to Best Practices: A Practical Guide to DevSecOps Automation and Security
Spyros Gasteratos
Article originally posted on InfoQ.
Transcript
Gasteratos: We start with something we all know: fear, uncertainty, and doubt. This concept is truly ancient. While researching this, I found out that it dates back all the way to the 1600s, but in the context of computers, because this conference is about computers, it started with a gentleman named Gene Amdahl in 1975, who used FUD to describe the sales tactics of his competitor, a really big company, three letters. Essentially, that competitor went to his customers and started saying that if you keep using Gene’s hardware, your business will fail, and we don’t know how it will fail. If you use our safe hardware, you will definitely succeed, so buy from us. As you see, in 2024, this is considered a little bit of a bad practice, despite the fact that it is literally everywhere.
However, in all our operations, one thing we keep forgetting is that computer security is not about computers. LLMs have seven-year-old knowledge, but everything else is about the humans who write the software, better known as the keyboard-chair interface. Humans do not perform very well when they are fearful, doubtful, or uncertain. Humans need a stable environment, as any new management book will tell you. Despite that, we have seen an increasing number of people, consultancies, and general security personnel using FUD tactics or buying into the industry’s FUD tactics. How many of you have said, or have heard someone else say in your day to day, do what I tell you, or else we will be in trouble? I see the non-security people definitely have. We can all agree that we have fallen, me first, into tactics that do not work well for humans. We know already that this is not good.
In this talk, I’m going to talk about five big mistakes I made, and that I’ve seen industry friends and colleagues make, while running security teams. I’ll talk about why they were mistakes and how we fixed them, mostly with technology, because this is a technology conference. It’s not a management conference. We do not set policies here. We write software.
Background
My name is Spyros. You can find me on LinkedIn. You can find me on GitHub and see all the contributions I’ve made there. My wife says I’m a compulsive thinker. I prefer the term open-source developer. As such, I maintain the biggest knowledge graph of security information in the world; it’s called opencre.org. Yes, it has an LLM. Yes, you can prompt inject it to do anything you want. With some friends, I founded a small company that helps fix security teams, and we combine open-source projects to fix them permanently, or at least do a little bit better than just giving them advice. I help with OWASP. I help with the Linux Foundation, and a lot of other places.
Itinerary
Here’s what we’re going to talk about: the five big mistakes. We start with how I personally shifted left very badly. We move on to how I learned to shift left correctly, but then messed up by offering bad security services. We go on to reporting and measuring, and how I managed to mess up literally every single part of my journey so far, because if you don’t make mistakes, you don’t learn.
Observability
We start with shift left. How many of you are working in a shift-left environment? Here’s how I messed up shifting left. I got all the open-source tools that seemed shiny and new and were in all the nice newsletters, and I auto-ran them. I played with them for as long as I had time, and then I outsourced them to developers, because that’s what shift left is, right? That’s not what shift left is, that’s what marketing says shift left is. When the developers would not fix anything, I would enforce strict SLAs, and I would complain. Obviously, this is not very good. In more specific terms, I ran several SAST tools on a cron job per team. Any team that failed a specific number of scans, essentially any team that had more than a specific number of criticals, would have its path to production blocked, as you do. I would demand everybody fix everything within three days. Can anybody guess what happened? How many of you are CTOs here? That’s who came down and did not really like it.
In the end, I messed that up. Why? Because tools create noise. Tools know nothing about the development and engineering context and, generally, who you are and why you write software. Remember, computer security is not about the computers, despite what we all try to make it. Tools will report what they think makes sense. If it’s a vendor, it will report what the vendor thinks will give them bigger contracts, because that’s why they’re a vendor. I haven’t seen any business yet that does not try to make more money. That would be amazing. When I did all of that, when I auto-ran a bunch of tools, I lost a lot of human capital. I ended up with a bunch of developers who would not talk to me, and who would end up working around the tools, trying to find out what kind of code they could write so that the tool, and hence the weird security guy, would not complain. Until a very helpful CTO sat me down and told me, you need to fix this. Great CTO.
How did we fix this? Eventually, we used technology that helps instead of hinders. If you have a tool that has 12 million rules, fires all of them on every single PR and every single line of code, and gives you, for a JavaScript file, results from 20-year-old PHP rules, it’s probably not a good tool. We changed tools, and we used technology that helps. We added context, simple things: is this team writing internal versus external software? We ignored findings automatically. Do we use PHP? No. Then why do we care about PHP rules? If I had to do this again, I would dare to fix things automatically. I would use LLMs, and I would prompt them to fix something. Yes, you can inject them. When I did that, my developers wrote a SQL injection in a comment so that the LLM would try to fix it automatically, and then prompt injected the LLM.
That was great. I got them beers for that. In specific terms, here are three tools that help. They’re all open source. dep-scan is under the CycloneDX umbrella. It’s a software composition analysis tool that has reachability analysis. Reachability analysis means that, for a specific CVE, it tells you whether your code actually uses the affected code, which means you can ignore CVEs that you don’t actually reach. CycloneDX is, to my knowledge, the biggest SBOM generation platform out there. We all know why SBOMs are important. I’m not here to preach using bills of materials. It allows you to find out what you have. Dracon is an open-source project I am very passionate about and very much involved in. It’s an open-source ASPM that has capabilities to fix findings and send things to LLMs.
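To make the reachability idea concrete, here is a minimal sketch of the triage step it enables, assuming a scan report that tags each finding with a reachability flag. The file name and the fields ("id", "severity", "reachable") are illustrative assumptions, not dep-scan's actual schema; the point is simply that unreachable CVEs never reach a developer's queue.

# Minimal sketch: drop findings the scanner marked as unreachable, so
# engineers only see CVEs their code can actually hit.
# Field names and file name are illustrative, not a real tool's schema.
import json

def reachable_findings(report_path: str) -> list[dict]:
    with open(report_path) as f:
        findings = json.load(f)
    # Keep anything we cannot prove unreachable, to stay on the safe side.
    return [f for f in findings if f.get("reachable", True)]

if __name__ == "__main__":
    for finding in reachable_findings("scan-report.json"):
        print(finding["id"], finding["severity"])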
Here’s how specifically we fixed our bad setup with SAST. We tagged teams by scope and deployment setup, as I said, internal versus external teams. We ignored noisy tools and rules. If I had to do this again, I would go to Google, choose the Vertex AI API, use their very helpful code-bison model, and then write this simple prompt, and let it be injectable by whatever the developers choose to put in those brackets. Then maybe give my customers a little bit of an idea of how this can be fixed.
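As an illustration only, here is roughly what that could look like with the Vertex AI Python SDK of that era and the code-bison model. The project ID, the prompt wording, and the function name are assumptions, and, as noted above, anything interpolated into the prompt is developer-controlled and therefore injectable, so treat the output as a suggestion for a human to review.

# Sketch of "dare to fix automatically" with Vertex AI's code-bison model.
# The prompt wording and project ID are illustrative assumptions.
import vertexai
from vertexai.language_models import CodeGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")  # hypothetical project
model = CodeGenerationModel.from_pretrained("code-bison")

def suggest_fix(rule_description: str, flagged_code: str) -> str:
    # flagged_code comes straight from the repository, so it can prompt inject the model.
    prompt = (
        f"The following code was flagged by a SAST rule: {rule_description}\n\n"
        f"{flagged_code}\n\n"
        "Suggest a minimal fix and explain it in one sentence."
    )
    return model.predict(prefix=prompt, max_output_tokens=512).text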
Speaking of fixing: after I figured out how to actually run tools and how to shift left successfully, as I said, by using tools that help and curated information, a very small company called Google released a security guide, and in it they mentioned shift left exactly zero times, because they don’t do marketing as much. They don’t shift left in the same way everyone else does. Instead, they introduced the term secure by default. They have centrally managed services that they offer to their developers, which makes sense. I thought, let’s do that. Let’s automagically run some centrally configured, immutable scans somewhere else, over the fence, and give teams only the reports with things to fix. In my specific case, we ran an open-source container image scanning tool, and teams would get results for their production images when they released.
If there were any criticals, releases would get blocked. Do you see what can go wrong with this? I didn’t ask what could go wrong. Turns out a lot can go wrong, because in reality, we did not get anything secure. We got a lot of noise, and we got a secure baseline with which we blocked a lot of teams. At best, the computer had no idea what it was saying, because a CVE, as I said, without reachability analysis or tools that help, means nothing. The tool has no idea what the CVE means for you.
Hopefully, the teams that wrote those rules are experts in their field, but they have no idea what your field or your context is. Most often, a tool will tell you, fix this ID, this CVE from 2007, or fix this lingo, fix XSS, or Log4j, or something like that. Or it will provide some generic information pointing towards a generic fix from five years ago, which is the last time that rule was touched by that specific vendor. Your engineers end up with confusing information, which doesn’t really help them do their job. Humans need to ship; they don’t need to figure out what the computer says.
Instead, we provided everybody with an immutable baseline. We sat down as a team and said, as a minimum, we care about x, something that makes sense. Then we let the teams and the champions, meaning somebody who knows what they’re doing within the team, hopefully not the junior engineer or the intern, decide what their particular team or scenario needs. We customized both scanning and, most importantly, reporting based on that. This way, the teams need fewer rules, and they need to translate fewer rules. Speaking in software terms, or about tools, any modern ASOC or ASPM solution should help with that, because they allow you to tier and layer your rules.
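As a rough illustration of tiering, here is a minimal sketch of merging an immutable baseline owned by the security team with a per-team override owned by the champions. The policy shape (min_severity, ignored_rules) is an assumption for the example, not any particular ASPM's format.

# Minimal sketch of "immutable baseline + team overrides".
# Teams may tighten the severity threshold and add ignores that only make
# sense in their context, but they can never loosen the baseline threshold.
SEVERITIES = ["low", "medium", "high", "critical"]

BASELINE = {"min_severity": "high", "ignored_rules": {"php.*"}}  # we don't use PHP

def effective_policy(baseline: dict, team: dict) -> dict:
    baseline_idx = SEVERITIES.index(baseline["min_severity"])
    team_idx = SEVERITIES.index(team.get("min_severity", baseline["min_severity"]))
    return {
        # A lower index blocks on more severities, so min() only allows tightening.
        "min_severity": SEVERITIES[min(baseline_idx, team_idx)],
        "ignored_rules": baseline["ignored_rules"] | set(team.get("ignored_rules", [])),
    }

payments_team = {"min_severity": "medium", "ignored_rules": ["generated/*"]}
print(effective_policy(BASELINE, payments_team))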
Dracon, the tool I mentioned earlier that I maintain, is the only free and open-source one I know of. That’s why I made it; if it existed, I wouldn’t have made it. You can also do the same thing with GitHub, GitLab, or any CI/CD: you can layer rules on top of your tools. For example, here is how we fixed our container scanning. We used sane, distroless, or minimal base images, so no Ubuntu, Alpine, or Debian, unless they’re official images. If they’re official images, if I use Postgres bookworm, for example, and there is a high-severity CVE on Debian, it is literally everybody’s problem, and I hope that the Debian team has more resources to fix it than my tiny security team, so it doesn’t matter. We sat down and decided that, as a minimum, we care about our own code first and foremost, and then everything else. Then we let teams and champions decide what criticality level they need to fix.
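To show what that gate could look like in practice, here is a minimal sketch that only blocks a release on findings in the layers we build, at or above the team's chosen threshold, while base-image CVEs from official images are reported but not blocking. The report fields ("layer", "severity", "id") are illustrative assumptions, not any specific scanner's output format.

# Minimal sketch of the fixed container gate: block only on findings in the
# layers we build, at the severity the team chose; report the rest as FYI.
import json
import sys

BLOCKING_SEVERITIES = {"critical", "high"}   # team-chosen threshold
OWN_LAYERS = {"app", "dependencies"}         # layers we build, not the base image

def blocking_findings(report_path: str) -> list[dict]:
    with open(report_path) as f:
        findings = json.load(f)
    return [
        f for f in findings
        if f.get("layer") in OWN_LAYERS
        and f.get("severity", "").lower() in BLOCKING_SEVERITIES
    ]

if __name__ == "__main__":
    blockers = blocking_findings("image-scan.json")
    for f in blockers:
        print(f"BLOCKING: {f['id']} ({f['severity']}) in layer {f['layer']}")
    sys.exit(1 if blockers else 0)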
Reporting
After we finished scanning and observability, we went to reporting. In my mind, reporting is how you tell people what needs fixing and how it needs to be fixed. After all, security is a very consultancy-based function. Reporting is our interface to the outside world. For me, when my interface was bad, my service was even worse, and people would not grab beers with me because they hated me. How many of you use reporting or have reporting dashboards? Do you look at them every month? Do you look at them every week? Every day? Here’s a problem statement that I ran with: we, as security people, deal with a huge number of data points daily, and a lot of this information is hyper-specialized. You need a lot of training to do security, otherwise we wouldn’t be here. There is a large barrier to entry and a high degree of specialization required to understand that data.
If you mess up and make my mistake, where I misused single-pane-of-glass dashboards, you are going to end up where you started, with a lot of extra steps. I personally fell for that: hook, line, sinker, rod, the guy who held the rod, the pier, literally everything. I maintained a Google Data Studio with over 12 pages of dashboards. I pulled data from Jira, every single tool we had, AWS, GCP, literally everything that could produce a data point. Because that was not enough, I also installed and maintained an open-source vulnerability management solution, which could produce even more data points. It’s called DefectDojo. In DefectDojo, I made a single page for every single team.
That took a couple of months, and here’s what happened. I had so many dashboards, and the information was so dense, that I ended up not looking at any of them, since it took me 30 minutes every day just to decode what had happened in yesterday’s scan and what our current vulnerability posture was. In the end, I had a single pane of glass, which was what every vendor was pushing back then, yet another shiny new thing. My developers needed to sit down and decode all this information that did not make any sense to them. My excuse for maintaining it for way too long was, it has graphs.
The answer to this is: precisely zero people care. It took way too long to realize that, until Andra Lezza from OWASP here in London told me exactly that, that was a quote, and snapped me out of it. Which is great, when your friends are being a bit mean so you figure out reality. Also, maintaining all these things has a huge operational cost, because shipping data around is not cheap. Otherwise, we wouldn’t have that many super big companies that specialize in exactly that.
Here’s what we did and how we fixed it. We presented only relevant, cherry-picked information on dashboards already used by each stakeholder. For developers, that was ideally their IDE and, failing that, Jira, GitHub, GitLab, or whatever else they need. Here are some more specific examples. Some engineers, let’s say your average backend developer, need a statement of work, a well-described ticket. Some other engineers need problem statements, because they do root cause analysis. I don’t know about all of you who are SREs or the SREs you have worked with, but the SREs I was lucky enough to work with wrote Terraform or CloudFormation, and they didn’t click around in the AWS console.
When we had 3000 instances of a misconfigured firewall, hopefully somebody did not spend their days and nights clicking 3000 times, misconfiguring firewalls in the AWS console. It was a single line of Terraform that they needed to fix. We decided to aggregate by that. For directors, this is a little bit harder. Directors vary: there are directors who are very technical and who code, and there are directors who just want to see aggregates and risk, cost, and benefit analysis. For you: any metrics you want. The important thing is only relevant, only cherry-picked information. Data lakes become data swamps very quickly. Please don’t.
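Going back to the firewall example, here is a minimal sketch of that aggregation, assuming each finding carries a pointer to the infrastructure-as-code line that produced it (the field names are illustrative): 3000 firewall instances collapse into one ticket pointing at one Terraform line.

# Minimal sketch: group findings by their IaC source location so one root
# cause produces one ticket, not 3000. Field names are illustrative.
from collections import defaultdict

def group_by_source(findings: list[dict]) -> dict[str, list[dict]]:
    groups: dict[str, list[dict]] = defaultdict(list)
    for f in findings:
        # e.g. "modules/network/firewall.tf:42" instead of 3000 AWS resource IDs
        groups[f.get("iac_source", f.get("resource_id", "unknown"))].append(f)
    return groups

findings = [
    {"iac_source": "modules/network/firewall.tf:42", "resource_id": f"sg-{i}"}
    for i in range(3000)
]
for source, group in group_by_source(findings).items():
    print(f"1 ticket for {source}, covering {len(group)} instances")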
Speaking of data swamps and reporting and dashboards, one thing that I failed at, and also fell for, was that I expected my audience to know my terms. It is very easy to fall into this for all of us who come from the development world or are mostly technical. Technical people tend to evaluate each other on their knowledge of three-letter acronyms. After all, why shouldn’t you show your customers that they can trust you by showing them that you know all the terms, all the CVEs by heart, and all the IDs of all the Metasploit exploits? Does it make sense? Not really, because this is alphabet soup, and technology is already an alphabet soup.
If you layer security lingo on top of it, it becomes this spaghetti pile, which sounds delicious but is very hard to digest. Here’s an example of the tickets I would open: you have a SQLi on service blah, doesn’t matter, which prevents your product from getting PCI due to a violation of clause 3.1. In my mind, I am a genius. I know what SQLi is. I know PCI by heart. This is great. To any average developer who reads it, it’s a typical FUD ticket. I might as well communicate in Klingon, or with colors, or by tap dancing, because what is SQLi? What is PCI? What is clause 3.1? You mean it will prevent my product from going into production? I will lose my job. A typical fear, uncertainty, and doubt ticket. It doesn’t even tell you how to fix it. Pretty much terrible information.
There are enough TLAs, three-letter acronyms, and there is more than enough poorly written and conflicting information, and even more security standards. A while ago, ENISA, the European Union Agency for Cybersecurity, did a study on how many security standards there are out there. The study is a list of existing security standards. How many think that it’s under 100 pages? Over 200 pages? It’s exactly 200 pages. This is insane. Just a list of the security standards out there is 200 pages. It’s longer than the average book that I’m willing to read on my holiday. It doesn’t make any sense. You shouldn’t write more security standards unless it’s a new, emerging technology, at which point you should only write about the new things.
This is why, a while ago, with a bunch of very nice people at OWASP, we created a very nice piece of software which allows you to link, or, as a last resort, create information tailored to humans, reviewed and maintained by the community. We created something called OpenCRE. It’s free. It’s open source. It’s open data. You can find it at opencre.org. OpenCRE is, to my knowledge, the world’s largest knowledge graph of security information. Instead of linking to standards or telling people, fix PCI, which means nothing to most people, you can tell them, go and fix CRE 616-305, with a link to this page.
This page contains ASVS, it contains ISO, it contains OWASP Proactive Controls. It contains cheat sheets further down, and a bunch of NIST. This is most of the information that anybody in a company should need access to, at least in my humble opinion. We created and maintain this with a group of very talented people, one of whom is Rob van der Veer, the co-leader of this project, and it holds most of the information around cybersecurity. If you find standards that are not there, tell us, and we’ll add them.
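As a small illustration of linking rather than lecturing, here is a sketch of a ticket helper that points at an OpenCRE page instead of citing PCI clauses. Only the CRE ID comes from the talk; the ticket wording, the function, and the assumption that pages live under opencre.org/cre/<id> are illustrative.

# Minimal sketch: a ticket that names the problem in plain language and links
# to one OpenCRE page that fans out to ASVS, ISO, cheat sheets, and NIST.
def ticket_body(service: str, problem: str, cre_id: str) -> str:
    return (
        f"{service}: {problem}.\n"
        f"Background and how to fix it: https://www.opencre.org/cre/{cre_id}\n"
    )

print(ticket_body(
    "payments-api",
    "user input is concatenated into a database query (SQL injection)",
    "616-305",
))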
Measuring
Speaking of metrics, and speaking of a common language, because OpenCRE gives you a common language to connect with the rest of the organization: we all deal with numbers, and we all like measuring things. Personally, what I measured defined the way the whole company did product security. Changes in what I measured influenced the security culture big time. Who here uses metrics to derive decisions? Metrics are important. Who here makes decisions because of metrics? I have too many vulnerabilities of a specific type, so that means I need to do something. For me, I was a developer first, then I became a pentester/red teamer, and then I moved on to being an internal security person. While I was a pentester/red teamer, I had the idea that all those devs make terrible mistakes that I have to find and fix, because I am a hacker.
As you see, this is a terrible approach to have, and it gives you terrible data. Hence, I would measure resolution time and SLA misses, so I could punish people. I would measure vulnerabilities by severity, because, of course, my tools say only the truth and nothing but the truth, and they are infallible. I would measure the number of findings per team, so I would know which teams needed training, and then I would push training on them. Can you see how this can go wrong? I couldn’t back then. Contrary to popular belief, data does not tell a story. You can torture anything out of data, is what my academic friends say, and it is true. What you torture out of the data is usually what tells the story, and how you interpret the data matters.
Here’s why these measurements, at least for me, were terrible. Measuring resolution time or SLA misses shows that there’s maybe a problem. Maybe a team is overworked, or maybe your findings are getting ignored. Maybe a team needs to log in to yet another single pane of glass that you introduced to look at things, and they forget. This is smoke, but you have access to the team and you have access to the data, so you have the ability to see the fire that is generating the smoke, because you have access to a high-powered thermal camera. Instead, you are removing your glasses, squinting, and trying to see the smoke out there, which does not make any sense.
Grouping vulnerabilities by severity: we just established that severity is wildly inaccurate. What the tool says means nothing; it’s what the tool thinks it should say, and it creates FUD. Most of my tools say that everything is a critical most of the time, despite the fact that the files I’m scanning are test files nobody cares about. The number of findings per team is quite inaccurate too, as we said. If you have 3000 misconfigured firewall instances, it’s probably one dev coding at 3 a.m. after a night out, or sometimes during a night out, who missed a line in their CloudFormation or Terraform, or maybe they were seeing double. I know I have. Then there’s last training taken.
If you measure training taken and you punish people for not taking training, well, for me, they hated me. I had several lead and principal engineers who were constantly missing training because they had taken the training 20 times in a row. They know everything there is to know about XSS, and they do not need to be reminded during crunch time what XSS is, or to take time out of their very busy day to learn about SQLi, which at this point is a 15-year-old vulnerability, or older.
Here’s how you can fix this. The theme of this presentation is that security is about the people. Here’s a derivative thought that I want to leave you with: the security team is not the police. We’re not there to punish. We’re not there to dictate. As a security team, I like to think of us as the team’s physician. We’re there to observe, measure, detect, alert, and prescribe good ways of operating, or remediation as a last resort. The police would tell you what I said at the beginning of the talk: “Do what I tell you, otherwise we’re in trouble”. Your physician would mostly tell you, “I’ve noticed that you don’t exercise, you eat McDonald’s multiple times a day, and you have trouble getting up my stairs. Here’s a training and exercise regime that I prescribe so that you can live past 50 years old”.
You see the difference; it kind of makes sense. The first one is punitive. If you find architecture or service gaps, then you can offer a better service. Your physician can offer you better advice. If you cluster vulnerabilities by root cause, you can again find better services and secure-by-default setups to create. If you measure which security services are used, you can avoid wasting time on services that sounded like a good idea at the beginning but that nobody uses. They are either not fit for your context, or they are definitely misimplemented somehow. If you calculate your tool effectiveness, you realize that your average SAST has at least a 70%, maybe a 90%, false positive rate.
Perhaps you should figure out what to do with that. This is very hard to do, but as security staff, it makes sense to hold ourselves to a very high standard. If you calculate your overall risk, you can save resources that you would otherwise waste on teams, situations, or scenarios that seem interesting, for example, your brand-new R&D application, but have very little overall risk. After all, your new chatbot might be really cool to threat model and spend a bunch of time preventing prompt injections on. But if your chatbot only has access to reply according to the FAQ that’s already published on your website, it doesn’t really matter if every second reply has the word batch in it, because you can already do that.
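On the tool-effectiveness point above, here is a minimal sketch of how that false positive rate could be computed from triage decisions a team already records; the data shape is an illustrative assumption.

# Minimal sketch: per-tool false positive rate from recorded triage verdicts.
from collections import Counter

triaged = [
    {"tool": "sast-a", "verdict": "false_positive"},
    {"tool": "sast-a", "verdict": "true_positive"},
    {"tool": "sast-a", "verdict": "false_positive"},
    {"tool": "container-scan", "verdict": "true_positive"},
]

totals, false_positives = Counter(), Counter()
for finding in triaged:
    totals[finding["tool"]] += 1
    if finding["verdict"] == "false_positive":
        false_positives[finding["tool"]] += 1

for tool, total in totals.items():
    rate = false_positives[tool] / total
    print(f"{tool}: {rate:.0%} false positives across {total} triaged findings")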
Key Takeaway
Security is about the humans. As a security team, we’re not the police, we are the team’s physician. We are there to inform, alert early, and help fix.