Month: March 2024
MMS • Zack Butcher
Article originally posted on InfoQ.
Transcript
Butcher: I’m going to talk a little bit about zero trust. We’re going to cover four things primarily. First, we’ll get a working definition of zero trust that is tangible for folks. I’m really tired of the FUD around zero trust, so in the SP we settled on a very specific definition, which I’m going to introduce as identity-based segmentation. Then we’ll discuss one possible way to implement it, using a service mesh as the architecture. More generally, I’m going to outline how we can move incrementally from network-oriented policy to identity-based policy.
Why am I here talking to you all about it? I was one of the early engineers on the Istio project, one of the more popular service meshes. Before that, I worked across Google Cloud on a whole bunch of different stuff. I jokingly say that if you do enterprise-y things in Google Cloud, I probably worked on some of that code; projects, in particular, were my baby for a long time. From there, we actually did very meshy things at Google, deployed that architecture internally, and said, we think this might solve some powerful problems that are coming up in the Kubernetes space. That’s when we started to work on the Istio project. One of the other big hats that I wear, and probably the biggest reason I’m here talking to you, is that I co-author a set of security guidelines with NIST. I helped write SP 800-204A and SP 800-204B, which provide guidelines for microservice security. The other one is hot off the press: SP 800-207A was in draft for a few months and has just been finalized and published, and it’s all about zero trust. The 207 series is NIST’s series on zero trust, and 207A is the second installment in it. That’s really what we’re going to be digging into. This is relatively new stuff. Again, we’re going to break it down into the four parts I just outlined.
What Does Zero Trust Really Mean?
First, let’s take a step back. What does zero trust actually mean? We’ll work our way to a definition. The key thing I want you to understand is that a motivated attacker can already be in the network. The question is, how can we limit the damage they can do? It’s all about risk mitigation. Let’s start with something hopefully most folks are familiar with: an API gateway, or just serving an application to end users. There are a couple of things we always do there. We always authenticate the user when they come in, and we hopefully always authorize that user to take some action. With an API gateway, maybe we additionally do some rate limiting, or apply other policy like a WAF. A bunch of different things happen at the front door, but the two big ones are that we always authenticate and authorize the user. If somebody can be in the network, inside the perimeter, how do we minimize the damage? We probably want to start doing that same kind of authentication and authorization for the workloads that are communicating as well. Not just, do we have a user in session, but which workloads is that user’s request passing through? We want to be able to say, the frontend can call the backend, and the backend can call the database. On all of those hops, there had better be a valid end user credential. We might even go further: if you’re going to do a PUT on the frontend, you’d better have the right scope in the end user credential; if you’re just going to do a GET, maybe you only need the read scope. We can build really powerful policies as we start to combine these.
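To make the per-hop idea concrete, here is a minimal Python sketch of a decision that combines a service-to-service check with an end-user scope check. The call graph, the method-to-scope table, and the `Request`/`authorize` names are all hypothetical illustrations, not an API from the SP or from any product.

```python
from dataclasses import dataclass, field

# Hypothetical service-to-service graph: which workload may call which.
ALLOWED_CALLS = {
    ("frontend", "backend"),
    ("backend", "database"),
}

# Hypothetical mapping from HTTP method to the end-user scope it requires.
REQUIRED_SCOPE = {
    "GET": "read",
    "PUT": "write",
}


@dataclass(frozen=True)
class Request:
    source: str        # authenticated workload identity of the caller
    destination: str   # workload being called
    method: str        # HTTP method of this request
    user_scopes: frozenset = field(default_factory=frozenset)  # scopes from a validated end-user token
    has_user_token: bool = False


def authorize(req: Request) -> bool:
    """Per-request decision, applied at every hop rather than only at the front door."""
    # 1. Service to service: may this caller reach this workload at all?
    if (req.source, req.destination) not in ALLOWED_CALLS:
        return False
    # 2. A valid end-user credential must accompany the request on every hop.
    if not req.has_user_token:
        return False
    # 3. The end user's token must carry the scope this method requires.
    needed = REQUIRED_SCOPE.get(req.method)
    return needed is not None and needed in req.user_scopes
```

With this sketch, a GET from the frontend to the backend with a `read` scope is allowed, the same caller attempting a PUT with only `read` is refused, and the frontend calling the database directly is refused regardless of scopes.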
Again, I led off with the idea that the attacker is already in the network. Every single time I give a variation of this 207A talk, there’s a different network breach that I can cite. For folks that follow the news, this time there was a big bulletin put out by CISA in conjunction with Japan about network breaches driven by China across a variety of networks. That one’s very relevant. This one is from a decade ago. Anybody know offhand where that picture is from? It’s actually from the Snowden leaks, showing where the U.S. government was inside the infrastructure. The point here is that a motivated attacker can be inside the network. If our only security control is at the perimeter, then we’re already cooked. This brings us to zero trust. We want trust that is explicit. I actually hate the phrase zero trust; it really should be called zero implicit trust, because it’s all about making the places where we have trust in the system explicit, ideally with a policy, so that an actual piece of code enforces it. We want trust that’s not based on the perimeter, because the perimeter is breachable. Instead, we need decisions based on least privilege. We want to make them per request, because different requests may perform different actions. We want them to be context based: if you’ve been accessing from the U.S. a whole bunch, and then 5 minutes later you’re suddenly accessing from an entirely different geography, something’s probably a little fishy there. And those decisions need to be based on identities. More than just service identity, we also want end user identity and potentially device identity, because all of those factor into the context for how we allow a user to access a system. If I’m on a corporate-approved device, I’m logged in, and I’m coming from my home geography, I can probably have a lot of access.
If I’m on an untrusted random device that’s popping up in a weird geography like Russia, or Eastern Europe, or something like that, then I probably want to give it less access, because that’s not typical of how the system operates.
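A context-based decision of this kind can be sketched as a small risk function. The Python below is an assumption-laden illustration: the `AccessContext` fields, the three access levels, and the two-hour "impossible travel" threshold are hypothetical choices, not values from 207A.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AccessContext:
    user_authenticated: bool
    device_trusted: bool   # e.g. a corporate-approved, managed device
    region: str            # coarse geography the request appears to come from
    home_region: str       # where this user normally works from
    last_region: str       # region of the user's previous request
    last_seen: datetime    # time of the user's previous request
    now: datetime


# Hypothetical threshold: changing regions faster than this is treated as suspicious.
MIN_TRAVEL_TIME = timedelta(hours=2)


def access_level(ctx: AccessContext) -> str:
    """Illustrative per-request, risk-based decision: 'full', 'limited', or 'deny'."""
    if not ctx.user_authenticated:
        return "deny"
    # "Impossible travel": a different region only minutes after the last request.
    if ctx.region != ctx.last_region and ctx.now - ctx.last_seen < MIN_TRAVEL_TIME:
        return "deny"
    # Trusted device from the usual geography: the most access.
    if ctx.device_trusted and ctx.region == ctx.home_region:
        return "full"
    # Anything unusual but not clearly hostile gets reduced access.
    return "limited"
```

The design choice here is graded rather than binary: an untrusted device from the home region is not rejected outright, it simply receives "limited" access.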
Identity-Based Segmentation (Zero Trust Segmentation)
With that high-level definition of zero trust, let me bring in identity-based segmentation. This is the main thrust of 207A. We defined a few things in 207A, about three that I’ll talk to you about, but identity-based segmentation is the big one. Microsegmentation is isolating workloads at the network level. Ideally, we do that down to the individual IP address, so we have pairwise policy for who can access whom. We want to do the same thing at the identity layer. We want tamper-proof, cryptographically verifiable identities for the user, for the device, and for the service, and we want to use those identities to make the decision I described earlier: a context-based, per-request, least-privilege decision. We might well use network parameters as part of our risk-based assessment, but they should not be the only discriminator we make an access decision on. Otherwise, again, we’re cooked, because a privileged spot in the network is not a good enough security boundary today.
What is identity-based segmentation? It’s these five things. If there’s one thing you walk out of this talk with, let it be this: these are the five runtime activities you need to be doing, at minimum, to call your posture zero trust. There’s a lot more we can do beyond these, and maybe should do, but this is a minimal working definition. First, encryption in transit. We want this for two reasons. One, message authenticity: I want to know that somebody can’t change the message that I sent. Two, eavesdropping protection: I want to make sure that somebody else can’t read the data I’m sending if it’s sensitive. Second, on top of that, we want to authenticate the workloads that are communicating. Is the frontend calling the backend? Is the database calling the frontend? Which parts of the software we’re deploying are talking to each other? Third, with those identities, we should authorize that access, again per request. Then, exactly like I said before, we additionally want to incorporate the end user: fourth, authenticating the end user credential, and fifth, making the end user authorization decision. We want to do all five of those things, ideally at every single hop in our infrastructure. If we achieve that, we have a pretty decent answer to, what happens if there’s an attacker on the network? The answer is that now they need to steal credentials. We’ll get into the model a bit more, but they need to steal an end user credential, and they need to compromise a workload or steal a workload credential. Those are very ephemeral credentials, too. In the service mesh, and Istio in particular, workload identities may last 24 hours, or as little as one hour. End user credentials typically last on the order of 15 minutes without refresh.
As we combine these things, we limit an attacker in time and in space, and we mitigate the damage they can do. I’ll touch on this more.
Service Mesh
The service mesh is one of the key ways we discuss implementing this in the SPs. It’s not the only way to implement these capabilities, but it is a very powerful one, and I’ll dig into it. If you’re interested in use cases beyond identity-based segmentation, the other SPs covering microservice security go into the service mesh a lot more. How can we use a service mesh to do the segmentation? Just to level set on what a service mesh even is: we take a proxy, a reverse proxy (in Istio we use Envoy, but you can think of something like NGINX or similar), put it beside every instance of every application, and force all network traffic into and out of the application through that proxy. That lets us hold an application identity and do encryption on behalf of the application. We can do service discovery and load balancing on its behalf, as well as resiliency features like timeouts, keepalives, and retries. And you get operational telemetry out of the system to help you close the loop: when you make a change, you can see its effect. Is it doing what you intended? If not, change it again.
Again, there are quite a few capabilities here, and any distributed system needs them; the service mesh is not novel in bringing them into play. What is novel about the service mesh is that it lets us do all of this consistently across the entire fleet of applications, regardless of the language they’re written in, because we’re working at the network level and intercepting network traffic. It also gives us centralized control. We have those proxies, the sidecars, beside every application instance, and a central control plane that manages them. We push declarative configuration to the control plane, and it enacts that configuration on all the sidecars, typically in less than a second. So we have a very fast, very responsive system. We can watch the signals coming out to see whether it’s in the state we want. If it’s not, we push configuration, have it take effect almost instantaneously, and then watch again to see whether it’s in the state we want.
I worked on Istio. Istio is the most widely deployed mesh, and it’s now a CNCF project. Envoy, also a CNCF project, is the data plane: it’s the sidecar, handling the bits and bytes of your requests. Istio is what programs Envoy at runtime. There are a couple of key ideas I want to give you as we think through this framework of, how do I limit access for zero trust, and how might we use a service mesh? The first key idea is that the sidecar forms a policy enforcement point. Because it intercepts all the traffic in and out, we can use it to apply whatever policy we want. I mentioned encryption and observability. We can also integrate with OPA, or with our identity provider to do SSO on behalf of applications. We can build a custom library that does something specific to our business and enforce that it’s called via the service mesh. The idea is that when we have policy we want to apply across the board, not just to one set of applications but to most applications in our infrastructure, we can integrate it once with the service mesh to do the enforcement, rather than going app by app.
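As a toy illustration of the policy-enforcement-point idea (not how Envoy is actually implemented), the Python sketch below routes every request through one object that runs a list of pluggable checks before the application sees it. The dict-shaped request and the policy functions `require_mtls` and `require_user_token` are hypothetical.

```python
from typing import Callable, List, Optional

# A policy inspects a request (modelled here as a plain dict) and returns
# None to allow it, or a string explaining why it is denied.
Policy = Callable[[dict], Optional[str]]
Handler = Callable[[dict], str]


def require_mtls(req: dict) -> Optional[str]:
    # Stand-in for "the peer presented a valid workload certificate".
    return None if req.get("peer_identity") else "no authenticated peer"


def require_user_token(req: dict) -> Optional[str]:
    # Stand-in for "a validated end-user credential is attached".
    return None if req.get("user_token") else "missing end-user credential"


class Sidecar:
    """Toy enforcement point: every request to the app must pass through handle()."""

    def __init__(self, app: Handler, policies: List[Policy]):
        self._app = app
        self._policies = policies

    def handle(self, req: dict) -> str:
        # Run each cross-cutting policy in order; the first denial wins.
        for policy in self._policies:
            reason = policy(req)
            if reason is not None:
                return f"403 denied: {reason}"
        # Only requests that satisfied every policy reach the application.
        return self._app(req)
```

The point of the shape is that adding an organization-wide policy means appending one function to the list in one place, while the application handler itself never changes.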
If you go back to the old-school access control literature of the 1970s, that’s where the notion of a security kernel comes from, along with the idea of a reference monitor: the thing that intercepts all access and enforces policy. Extending that, and this is one of the key reasons the NIST folks were so interested in the mesh, the service mesh itself can potentially be the security kernel for our modern distributed systems. Think of the operating system kernel and the security capabilities it provides for the processes running on it. It provides isolation. These days it provides cgroups and namespaces for soft multitenancy with containers. It provides Unix file permissions as another access control mechanism. We build quite a few access control systems into the operating system kernel, and we leverage them to protect our applications. The idea is that the service mesh can play the same role in a distributed system. Rather than deploying processes onto a host, we’re deploying microservices into our infrastructure, and in the same way that the operating system provides a key set of capabilities regardless of the application, the service mesh can provide a consistent set of security capabilities regardless of the app that’s running.
In this way, the service mesh facilitates cross-cutting change. Whatever the policy is, whether traffic routing, security, or observability, we can integrate it into the service mesh to enforce it. We can change and manage it centrally, which means a small group of people can act on behalf of the entire organization to enact policy and make change. There’s no free lunch; we can’t get rid of the complexity. If we want encryption everywhere, somebody still needs to integrate with a PKI to issue certificates. The service mesh lets you do that integration one time, and then all apps benefit. That’s one example. The mesh can be a force multiplier that concentrates the complexity we need to deal with on a small team that’s well equipped to handle it, rather than, in the case of encryption, making every app team go and implement an encryption library in their own system.
To bring this back, if we look at our five tenets of identity-based segmentation, we can use the service mesh to achieve all of them. First, encryption in transit. The service mesh gives you a strong application identity. We use a system called SPIFFE for this: we issue a normal certificate that encodes the application identity, and we use it to do encryption in transit, specifically mutual TLS. Using that certificate, we can authenticate the workloads that are communicating, which gives us the opportunity to apply authorization. We know exactly which workloads are communicating; we’ve authenticated them, so we know it’s really them. Now we can decide: should the frontend be able to call the database directly? No, probably not; it needs to go through the backend. We can start to enforce that policy. Then we can use the sidecar as a policy enforcement point for our existing identity and authorization systems for end users. The service mesh itself is not well suited to modeling user-to-resource access; it’s good at modeling service-to-service access. We want to delegate end user authentication and authorization to our existing systems, and the service mesh provides tools for integrating with them.
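Istio encodes workload identity as a SPIFFE ID of the form `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>`, with `cluster.local` as the default trust domain. The Python sketch below parses IDs in that form and applies a hypothetical "frontend may call backend, backend may call database" policy. It assumes the ID has already been taken from a peer certificate validated during the mutual TLS handshake; certificate validation itself is out of scope here, and `is_allowed` and `ALLOWED` are illustrative names.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class WorkloadId(NamedTuple):
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(uri: str) -> WorkloadId:
    """Parse an Istio-style SPIFFE ID: spiffe://<trust-domain>/ns/<ns>/sa/<sa>."""
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    if parsed.scheme != "spiffe" or len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        raise ValueError(f"not an Istio-style SPIFFE ID: {uri!r}")
    return WorkloadId(parsed.netloc, parts[1], parts[3])


# Hypothetical policy keyed on (namespace, service account) pairs.
ALLOWED = {
    (("default", "frontend"), ("default", "backend")),
    (("default", "backend"), ("default", "database")),
}


def is_allowed(caller_uri: str, callee_uri: str, trust_domain: str = "cluster.local") -> bool:
    caller, callee = parse_spiffe_id(caller_uri), parse_spiffe_id(callee_uri)
    # Identities from a foreign trust domain are rejected outright.
    if caller.trust_domain != trust_domain or callee.trust_domain != trust_domain:
        return False
    pair = ((caller.namespace, caller.service_account),
            (callee.namespace, callee.service_account))
    return pair in ALLOWED
```

Under this policy, `frontend` to `backend` is allowed, but `frontend` calling `database` directly is denied, because it must go through the backend.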
Moving Incrementally
The final thing I want to cover is how we can move incrementally from the system we have today, a perimeter-based system, to identity-based policy, and what some of the benefits of doing that are. First off, policy at any one layer is going to have shortcomings and pain. Comparing network policy and identity-based policy, one of the biggest pain points with network-based policy today is that it usually takes a long time to change. Who here has a firewall in their organization that they have to go change when they do stuff? Who here can change the firewall in less than two weeks? At nearly every organization I’ve ever talked to, it’s six weeks to change the firewall. I don’t know why it is, but how long does it take? Six weeks. You’ve got to go to the spreadsheet, you’ve got to find the CIDR. It’s not well suited to the highly dynamic environments we deploy into today. Cloud in particular, with autoscaling and the like, is not well suited to traditional network controls. We either need more sophisticated and usually more expensive technology, or we need to hamper our use of the cloud and slow the organization down to fit the existing process, which is a nonstarter.
With identity-based policies, we as an industry don’t have nearly as much experience managing them, in this sense, as we do with network-oriented policy, so there are challenges people haven’t hit yet. One of the more obvious initial ones is that even something simple like a service has a different identity in different domains. If I’m running it on-prem on a VM, it probably has a Unix username we’ve allocated for it to run as. If it’s in GCP, it’s going to have a service account. In AWS, Azure, and many other providers, it will have some other kind of cloud identity. The problem is that those identities aren’t the same. If I want to say the frontend can call the backend regardless of where the two are deployed, there’s some complexity in mapping identities across different identity universes, like different cloud providers and on-prem. Finally, the other 800-pound gorilla in the room is that even if we totally bought into identity-based policy, in most regulated environments we can’t actually get rid of network-based policy yet. Either auditors or regulators expect it, or there’s a document written in 1992 that sets the security posture for the organization and says there have to be network segments, and it’s more expensive to change that document than it is to pay Cisco millions of dollars to do network segmentation. For a variety of reasons, organizations can’t totally eliminate network-level policies, even if that were desirable, and I don’t think it necessarily is.
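One common way to cope with identities that differ per platform is to map each platform identity onto a single logical service name and write policy only against the logical names. The Python sketch below assumes such a registry exists. The `unix:`/`gcp:`/`aws:` prefixes, the VM name, the project name, and the AWS account number are placeholders, and the mapping table is hypothetical rather than a feature of any particular mesh.

```python
from typing import Dict, Optional

# Hypothetical registry: platform-specific identity -> logical service name.
IDENTITY_MAP: Dict[str, str] = {
    "unix:svc-frontend@onprem-vm-17": "frontend",
    "gcp:frontend@my-project.iam.gserviceaccount.com": "frontend",
    "aws:arn:aws:iam::123456789012:role/frontend": "frontend",
    "gcp:backend@my-project.iam.gserviceaccount.com": "backend",
    "aws:arn:aws:iam::123456789012:role/backend": "backend",
}

# Policy is written once, against logical names, not per-platform identities.
LOGICAL_POLICY = {("frontend", "backend")}


def canonical(identity: str) -> Optional[str]:
    return IDENTITY_MAP.get(identity)


def allowed_across_platforms(caller: str, callee: str) -> bool:
    src, dst = canonical(caller), canonical(callee)
    # Unmapped identities are denied rather than guessed at.
    if src is None or dst is None:
        return False
    return (src, dst) in LOGICAL_POLICY
```

The trade-off is that the mapping table becomes security-critical: whoever can add an entry to it can effectively grant access, so it needs the same change control as the policy itself.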
Multi-Tier Policies
Instead, we want these two layers to live together. What we’d like to do is trade off between the two, minimizing some types of network-level policy and offsetting them with identity-based policy. The hope is that we get more agility out of the system as a result. I’ll give you some specific examples. When we talk about multi-tier policies, there are actually many more layers of policy, but we tried to keep the SP very short and focused. It’s about 18 pages long, with about 12 pages of content, if you want to go read it; most SPs tend to be 50 or 60 pages long. At minimum, there are two tiers: network-tier policies, such as firewall rules, and identity-tier policies, such as the application-based strong identities with authorization I just described. Where we need to do things like traverse a firewall, we can start to incorporate identity-based policy and relax some of our network rules. In particular, with a network-oriented scheme today, when I have two applications that need to communicate (maybe one on-prem and one in cloud, or in two on-prem data centers, or in two different cloud providers; it doesn’t really matter), I typically need to change the firewall rules in my organization pairwise. I need to say App A can now call App B, or Service 1 can call Service 2. If tomorrow Service 1 needs to call Service 3, I need a new firewall change, and I’m back in the six-week cycle of firewall updates. That is a huge agility killer for the folks we talk with.
Instead, we can deploy identity-aware gateways. Rather than pairwise rules that we need to update regularly, we deploy these identity-aware proxies or gateways and instantiate a single set of firewall rules that says these two sets of workloads can communicate. Then, when a pair of applications needs to communicate, we authorize that with an identity-based policy that says Service 1 and Service 2 can communicate over the bridge. We still have all of our network controls, and user requests still traverse every network control the organization already had, but we’ve offloaded the policy change from the network tier to the identity tier. The key idea is that a network-based policy is a CIDR range, and who knows which app a given CIDR corresponds to? An identity-based policy should correlate very strongly with the application that’s running; if your runtime identity doesn’t match the application’s identity internally, something is probably a little funky. We want those to line up. Typically, we can change an identity-based policy much more rapidly, because a human can read and understand “the frontend in the default namespace can call the backend in the default namespace” a lot more easily than “10.1.2.4/30 can call 10.2.5/24”. Who knows what that means? That’s why offloading policy to the identity tier gives us a faster rate of change.
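The two-tier arrangement can be sketched in Python: one coarse, rarely changed network rule between the two gateways' address ranges, plus a fine-grained identity rule set that changes as applications evolve, with a request admitted only if both tiers pass. The CIDR ranges and the `namespace/service` identity strings below are made-up examples, and the functions are illustrative, not a gateway's real configuration model.

```python
import ipaddress

# Network tier: ONE coarse rule between the two sides of the bridge.
# (Made-up ranges; in practice these come from your network team.)
NETWORK_RULES = [
    (ipaddress.ip_network("10.1.0.0/16"), ipaddress.ip_network("10.2.0.0/16")),
]

# Identity tier: fine-grained pairwise policy that changes as apps evolve.
IDENTITY_RULES = {
    ("default/service-1", "default/service-2"),
}


def network_allows(src_ip: str, dst_ip: str) -> bool:
    src, dst = ipaddress.ip_address(src_ip), ipaddress.ip_address(dst_ip)
    return any(src in s and dst in d for s, d in NETWORK_RULES)


def identity_allows(src_id: str, dst_id: str) -> bool:
    return (src_id, dst_id) in IDENTITY_RULES


def allow(src_ip: str, dst_ip: str, src_id: str, dst_id: str) -> bool:
    # Both tiers must pass: identity is stacked on the network controls, not substituted for them.
    return network_allows(src_ip, dst_ip) and identity_allows(src_id, dst_id)
```

Letting Service 1 call Service 3 then means adding `("default/service-1", "default/service-3")` to `IDENTITY_RULES`; `NETWORK_RULES` is untouched, so no firewall ticket is needed.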
In practice, we have far more than two layers of policy in our systems. We typically have a coarse-grained segmentation of internet and intranet, and likely broad segments within that: a DMZ, an app zone, a business zone, a data zone. We already tend to stack these policies today. For folks that implement microsegmentation in their networks, how many of you did away with your coarse-grained segments when you moved to microsegments? The answer is that we tend to keep both and stack them additively. Kubernetes then brings a whole additional challenge, because it’s a virtual network. We have a new set of techniques, like CNI providers, to control network policy there. Those have better tradeoffs than traditional network rules, because they’re built for that dynamism, but fundamentally they’re still layer 3 and layer 4 policy. We really want application identity and end user identity, and those are layer 7 things. That’s why we like to see the service mesh stacked on top of all of these. I think of it almost as a layer cake, going up. If you want to think about defense in depth, you can think of it as Swiss cheese: we’re trying to make it hard to get all the way through.
Advantages of Multi-Tier Policies
Why would we do this? Done this way, we can layer new identity-based policies on top of our existing network policies. That provides defense in depth: we still go through our traditional network controls, but now we have an identity layer on top of them as well. And as I already mentioned, the service mesh can enforce this, because it’s non-bypassable and verifiable; we talk about this at length in SP 800-204B. The key thing I want you to take away is that we don’t need to get rid of firewalls or WAFs or similar. What you should feel comfortable justifying to your organization is relaxing those controls in exchange for introducing identity-based controls. The key is finding the right balance in your organization, so that you can still move fast while keeping security and risk where they need to be for that part of the org.
How to Define and Deploy Identity-Tier Policies
At a high level, how would we start? We want to begin implementing identity-based segmentation in subsets of our infrastructure. If we have multiple data centers, start by implementing that policy in a single data center. Once you have a good handle on identity-based segmentation, you can move on to the more advanced patterns, like tiering these gateways together. Certainly, you don’t need 100% identity-based segmentation rolled out to use some of those gateway-style patterns, but you do need at least enough to authenticate and authorize the services and users going over that tunnel. Again, that traffic traverses the same controls that users’ app traffic already does. That’s key, because we don’t want to have to change the existing security policy; we want to make things better where they are, so that we can move faster as an organization.
Zero-Trust Mental Model (Bound an Attack in Space and Time)
I mentioned the big mental model for zero trust: we want to bound an attack in space and in time, minimizing what an attacker can do inside. One key point is that by stacking identity-based policies with network-based policies, we help bound an attacker in space and in time. Network-based policies limit where an attacker can pivot once they already control one workload. In the same way, fine-grained authorization policies at L7 help limit the blast radius of what an attacker can reach. Then there are the ephemeral credentials I mentioned. End user credentials usually have about a 15-minute expiry, and service mesh workload credentials tend to have anywhere from 1 to 24 hours. To perpetrate an attack, an attacker needs both a service credential and an end user credential, with the right scope to get through and pivot to the system they actually want to target. Because of the expiry, they either need persistent control of the workload so that they can keep re-stealing those credentials, or they need to perpetrate the attack repeatedly to re-steal them after they expire. Either way, the goal is to make things as hard for the attacker as possible, and both of those schemes increase the difficulty and help bound an attacker in space and in time.
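The time bound can be made concrete with a little arithmetic: because an attacker needs both credentials at once, a one-time theft is only usable until the earlier of the two expiries. The Python sketch below uses the lifetimes quoted in the talk (about 15 minutes for end user tokens, up to 24 hours for mesh workload certificates); the function name and parameters are illustrative.

```python
from datetime import datetime, timedelta

# Lifetimes quoted in the talk: end-user tokens ~15 minutes,
# mesh workload certificates between 1 and 24 hours (24 hours used here).
USER_TOKEN_TTL = timedelta(minutes=15)
WORKLOAD_CERT_TTL = timedelta(hours=24)


def attack_window(stolen_at: datetime, user_token_issued: datetime,
                  cert_issued: datetime) -> timedelta:
    """How long a one-time theft of both credentials stays usable."""
    user_expiry = user_token_issued + USER_TOKEN_TTL
    cert_expiry = cert_issued + WORKLOAD_CERT_TTL
    # The attacker needs BOTH credentials, so the window closes at the earlier expiry.
    remaining = min(user_expiry, cert_expiry) - stolen_at
    # Credentials that have already expired give no window at all.
    return max(remaining, timedelta(0))
```

For example, stealing a freshly issued pair five minutes after issuance leaves a ten-minute window, bounded by the user token; after that the attacker must steal again, which is exactly the "persistent control or repeated attack" cost described above.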
Questions and Answers
Participant 1: With all these identities and [inaudible 00:29:43] with credentials, I think debugging this configuration will be tricky. Do you offer any tooling, like formal tools to verify the credential?
Butcher: This is an area that’s still pretty nascent today. The service mesh is not the only way to implement these controls; there are a lot of ways. One of the advantages of the service mesh is that it produces metrics, so you can actually observe the state of the system: let me apply a policy; is it correct or not? What we’re starting to see is tooling built on top of that, and it’s still early days there. For example, one thing we have is a CLI that basically says, give me 30 days of telemetry data, and I’ll spit out fine-grained authorization policies that model those 30 days of access. Tooling is getting there, but it’s still very early, and I think it’s one of the things we’ll see mature pretty quickly.
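The telemetry-to-policy idea can be sketched generically: aggregate the (source, destination, method) tuples observed over a window into an allow-list. This is a hypothetical Python illustration, not the CLI mentioned above, and the record format is an assumption. A generated policy only models what was observed, so it should be reviewed before being enforced.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set, Tuple

# Assumed telemetry record: (timestamp, source identity, destination identity, HTTP method).
Record = Tuple[datetime, str, str, str]


def derive_policies(records: Iterable[Record], now: datetime,
                    window: timedelta = timedelta(days=30)) -> Dict[Tuple[str, str], Set[str]]:
    """Build an allow-list of (source, destination) -> methods seen within the window."""
    policy: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for ts, src, dst, method in records:
        # Ignore traffic older than the observation window.
        if now - window <= ts <= now:
            policy[(src, dst)].add(method)
    return dict(policy)
```

Each resulting entry corresponds to one candidate rule, for example "frontend may call backend with GET and PUT"; any call pair that never appeared in the window gets no rule and would therefore be denied once enforced.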
Participant 2: Since you mentioned service mesh is just one layer you might drop it for this auth. Service meshes have a big [inaudible 00:30:59] especially at some very large companies’ scale, if you want to run on the same control plane, it can present a large single point of failure. For those of us that will maybe sell us service mesh, if you want, but what recommendations do you have for those [inaudible 00:31:24]?
Butcher: In this world, consistency is your biggest asset. Say I’m coming at this de novo: I’m a small organization, or I’m trying to approach this without a service mesh. There are some more modern ways to deploy the service mesh that mitigate some of the pain you mention, but not all of it. In short, I would start by focusing on libraries. If you’re in a smaller organization, hopefully you have a consistent set of languages; if you’re a pretty small shop, maybe exactly one language. In that case, first off, make it a monolith. Second, implement those exact five controls, but do it in a library. Nowadays, regardless of language, you can get pretty far with a framework like gRPC in implementing all five of those controls. So my first piece of advice would be to tackle it with a library if you have a very small set of languages. And if you’re a small shop with a small number of services and developers, don’t do microservices. Why would you do that? Keep it a monolith for as long as you can get away with it, and keep things simple for yourself. That’s how I would start.
Then, as you grow in scale and heterogeneity, you allocate a bigger team to do that library work in the different languages your developers use. There’s a set of challenges there around updates, deployment, and lifecycle. That’s eventually where we see the tipping point at which the service mesh becomes interesting for folks: maintaining a consistent set of libraries, updated across the organization and compatible across different versions, tends to be expensive. Things like gRPC make it a lot more tractable, and not just gRPC; you could use Spring Boot or other frameworks. But somewhere along that journey, the service mesh tends to become more attractive. There’s also a lot you can do these days to get a more refined blast radius for your service mesh deployment. In particular, we recommend keeping control planes one to one with your Kubernetes clusters, so that the blast radius is your cluster. Then there’s a set of techniques to shard workloads within a cluster if you’re at really huge scale and need multiple control planes. There’s also a set of service mesh configurations that limit which config has to flow where, such as the Sidecar API resource in Istio, which gives you much better performance overall. The final thing I’ll mention in the Istio world is Ambient, which uses a node-level proxy to provide some of these capabilities. That is yet another, cheaper way to start service mesh adoption in places where you don’t have these capabilities yet.
Participant 3: [inaudible 00:34:32], it’s a very diverse type of shop. The moment we bring up zero trust, it’s always the network thing that comes into the picture, and this and that, now we’ve connected the network. Do you have any recommendations about how we can make the case [inaudible 00:34:51].
Butcher: Show them 207A; that’s why I wrote it. I’m joking, but I’m not. Legitimately, this is a big problem that we hit all over the place. I regularly talk with folks who think or say, or vendors who market, that zero trust is a network activity. What I talked about is only the runtime side of zero trust, and that’s the easy part. The people and process changes are the hard part, and I totally skipped over those. As for what arguments to make to the network team: one is the agility argument I made. What is the time to change a policy, and is that hurting development agility? Then you can frame it in terms of which features aren’t getting deployed to our users because of network pain, because of the time it takes us to do things like policy changes. That’s one angle I would attack it from. The other one is that, again, you need the authentication and authorization pieces as well, and the network can do only a portion of that. We really need a stronger identity, certainly at the user level, and ideally at the service level too. Those are some of the things I would harp on. You can also start to get into layer 7 controls. For example, it’s not just that the frontend can call the backend; a microsegmentation policy can model that much. With layer 7 controls you can say the frontend can only call PUT on the backend, only that one method. That’s something a traditional network policy could not model. Then we can go even further and say you can only call PUT in the presence of a valid end-user credential, and only if that credential has the right scope, a write scope in the PUT case, for example. Hopefully it’s clear why that is a tighter boundary, and a better security posture, than a strictly network-oriented one. That’s how I would start. Legitimately, the reason I helped write some of these with NIST is to move the ball for those teams.
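The difference between a network rule and a layer 7 rule can be sketched in a few lines of Python. This is an illustrative model, not Istio’s or any other product’s API: the `Call` type, the service names, port 8080, and the `write` scope are hypothetical, and caller and callee names stand in for authenticated identities rather than IP addresses. The network rule can only say the frontend may reach the backend; the identity-based rule also pins the HTTP method and requires an end-user credential carrying a write scope.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class Call:
    src: str                               # caller identity (hypothetical name)
    dst: str                               # callee identity (hypothetical name)
    port: int                              # visible at layer 4
    http_method: str                       # only visible at layer 7
    user_scopes: Optional[FrozenSet[str]]  # scopes from a validated end-user token, None if absent


# Network (L3/L4) microsegmentation: frontend may reach backend on port 8080.
NETWORK_RULES: Set[Tuple[str, str, int]] = {("frontend", "backend", 8080)}

# Identity-based (L7) rule: (caller, callee, HTTP method, required end-user scope).
L7_RULES: Set[Tuple[str, str, str, str]] = {("frontend", "backend", "PUT", "write")}


def network_allows(call: Call) -> bool:
    return (call.src, call.dst, call.port) in NETWORK_RULES


def l7_allows(call: Call) -> bool:
    if call.user_scopes is None:  # no valid end-user credential at all
        return False
    return any(
        (call.src, call.dst, call.http_method) == (src, dst, method)
        and scope in call.user_scopes
        for src, dst, method, scope in L7_RULES
    )
```

A DELETE from the frontend, or a PUT without a valid end-user token, passes `network_allows` but fails `l7_allows`, which is the tighter boundary described above.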
For folks in banking, for example, the FFIEC has already included some of this in its guidance and in its IT handbooks. If you’re in the banking or financial space, go pull up the FFIEC IT handbook on architecture. There’s a microservices section, and it cites all this stuff. That would be your justification in that industry. It may or may not help in yours. In different industries, we’re already seeing these guidelines being turned into enforced standards.
Participant 4: I’m pretty sold on service mesh. We’re using it in production in a couple of different places, and a lot of what you’re talking about resonates really well. I’d be curious, since you brought up all these white papers: one thing we worry about is an obvious tradeoff, all the power of the centralized control. At this point, if I’m an attacker of the Kubernetes system, what I really want is to take over Istiod. That’s the attack vector I’m looking for. I’ll be straight about that, and I wanted you to speak to it a little bit. Is that something you think about?
Butcher: Yes, it definitely is. This goes back to the core idea of the service mesh potentially being the kernel of a distributed system. One of the implications there is that, in the operating system, the kernel code gets pretty close inspection and a lot of security review. There are a lot of bug bounties. There’s a lot of value in finding a kernel exploit, whether you’re white hat or black hat. The economics of the environment are set up so that, hopefully, whoever finds one reports it rather than exploiting it. We do similar things with the service mesh. Would you rather have a service mesh that tackles these security postures in one code base that you audit? All the enforcement happens in Envoy, so let’s do security audits on the Envoy data path. The configuration happens in the control plane, so let’s do audits on that. Or would you rather do encryption, AuthN, and AuthZ for end users and for services in every app? That’s either app developers writing it 10 different times, or 100 different times, or hopefully using a library or something like that. The point is, there’s one code base that we can audit as opposed to many. Because that’s a shared code base, it’s open source, and things like bug bounties already exist for the Istio code base, we can have a higher level of assurance that it’s secure.
There’s no magic there. Just like the operating system kernel is an attack vector people are very interested in, in the distributed world the service mesh can be too. I think all the service mesh vendors have pretty robust security practices; the Linkerd folks have pretty good practices there, and I know firsthand that Istio’s security practices in response to CVEs are excellent. It is something we think about deeply. We know it’s an attack vector; it’s the clear thing you want to take over. The point is, we can focus our inspection there, focus our security audits there, and gain assurance for the whole system.
Participant 5: They can definitely add propagation of service principals as well as user principal context across the vertical. As you hinted at with Istio’s service mesh, you can propagate service identities through DNs, through the mesh’s encryption, mutual TLS. What do you think about that versus pushing it slightly further up into layer 7, into, say, a JWT [inaudible 00:41:11]?
Butcher: There’s actually prior art in this space. The Secure Production Identity Framework for Everyone, SPIFFE, is what all the service meshes today use for application identity. SPIFFE was created by Joe Beda of Kubernetes fame, and it was loosely based on a Google internal technology called LOAS. There was a white paper written on that, which they called ALTS, Application Layer Transport Security. That’s exactly what you’re talking about, and it has an interesting set of capabilities. If we do it at that layer, we can do things like propagate multiple service identities. We can propagate the service chain: the frontend called the backend, which called the database. Then we don’t need to look at policy pairwise. We don’t have to say the frontend called the backend, and the backend called the database. We can have the full provenance of the call graph and make a decision on that. We can say the database can only be written to if the request traverses the frontend and the backend, and only these methods on them. It can only be read if it traverses these paths. There’s a huge amount of power in carrying application or service identity at that layer. One problem there is that you still need to handle encryption in transit. That’s fine; there are a lot of ways to do that, and it’s not a huge deal, but you do need to handle it. Two, there tend to be more runtime costs. A lot of folks have not pursued that approach because, due to the chaining, you tend to have to reissue JWTs, so there’s a lot of signing pressure on your JWT server. There’s a set of tradeoffs there. Basically, at runtime, we’ve seen that mTLS works pretty well for app identity. There’s also plenty of prior art for doing service identity at the app level in something like a JWT, even nesting JWTs so you carry end-user and service JWTs together. All of those are good and valid. You don’t have to do it with SPIFFE and mTLS like the mesh does.
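Here is a minimal Python sketch of why full call-chain provenance is stronger than pairwise policy. It assumes each hop appends its verified identity to the chain (for example via nested, signed JWTs, which this sketch does not model); the service names, operations, and both policy tables are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Pairwise policy: which (direct caller, callee) pairs may talk.
PAIRWISE_ALLOWED: Set[Tuple[str, str]] = {
    ("frontend", "backend"),
    ("batch", "backend"),
    ("backend", "database"),
}

# Provenance policy: for each (target, operation), the complete call paths
# (originating service first, direct caller last) that may reach it.
PATH_POLICY: Dict[Tuple[str, str], List[Tuple[str, ...]]] = {
    ("database", "write"): [("frontend", "backend")],
    ("database", "read"): [("frontend", "backend"), ("reporting",)],
}


def pairwise_allows(chain: Sequence[str], target: str) -> bool:
    """Only the direct caller, the last element of the chain, is visible."""
    return (chain[-1], target) in PAIRWISE_ALLOWED


def provenance_allows(chain: Sequence[str], target: str, operation: str) -> bool:
    """The whole chain must exactly match one of the allowed paths (default deny)."""
    return tuple(chain) in PATH_POLICY.get((target, operation), [])
```

If some other service (`batch` here) gets the backend to write on its behalf, the pairwise check still passes because the database only sees the backend as its caller, whereas the provenance check rejects the unexpected path.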
Participant 6: In terms of identity security, how do you see role-based security fitting with the service mesh?
Butcher: How do we see RBAC, role-based access control, and other access control paradigms when it comes to the service mesh?
When it comes to authorizing service-to-service access, use whatever scheme works well for you. In Istio, there’s a native authorization API that is an RBAC-style API. In that world, if you’re doing pure upstream Istio, you’re basically writing straight RBAC; that’s the only option you have when it comes to service-to-service access. There are definitely other schemes out there. Plenty of folks implement it with something like OPA, Open Policy Agent, and encode their policy about service-to-service access there. That has a very different manifestation and policy language, comparatively. From the point of view of 207A, we don’t really care what the authorization mechanism is. If you go read SP 800-204B, we make the strong argument that next generation access control, NGAC, is the access control system you want to be using for service-to-service access. It’s a modern RBAC. Access control in general, and next generation access control specifically, is one of the key areas I do research in.
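For the RBAC style of service-to-service authorization, a minimal Python sketch of the role-binding model might look like this. It is not Istio’s AuthorizationPolicy API, OPA, or NGAC; it only shows the indirection where service identities are bound to roles and roles grant permissions. The role names, paths, and identities (written in Istio’s `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>` form) are hypothetical.

```python
from typing import Dict, Set, Tuple

Permission = Tuple[str, str]  # (HTTP method, path prefix)

# Roles group permissions...
ROLES: Dict[str, Set[Permission]] = {
    "catalog-writer": {("PUT", "/items/"), ("POST", "/items/")},
    "catalog-reader": {("GET", "/items/")},
}

# ...and bindings attach roles to service identities.
BINDINGS: Dict[str, Set[str]] = {
    "spiffe://cluster.local/ns/shop/sa/frontend": {"catalog-writer", "catalog-reader"},
    "spiffe://cluster.local/ns/shop/sa/reports": {"catalog-reader"},
}


def authorize(principal: str, method: str, path: str) -> bool:
    """Allow only if some role bound to the caller grants this method on this path."""
    for role in BINDINGS.get(principal, set()):
        for allowed_method, prefix in ROLES.get(role, set()):
            if method == allowed_method and path.startswith(prefix):
                return True
    return False  # default deny for unknown identities and ungranted actions
```

The indirection is the point: granting a new service read access is one binding change, not a new rule per endpoint.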
LLM Strategies, Platform Engineering, Observability and More: InfoQ Dev Summit Boston 2024
MMS • Artenisa Chatziou
The software development landscape evolves rapidly, and staying ahead requires continuous learning to inform strategic decision-making, implement new technologies appropriately, and enable your teams to collaborate effectively. Join us at InfoQ Dev Summit Boston, running June 24-25, to network with your peers and experience a curated agenda with topics such as Generative AI, security, and modern web applications. You’ll learn from those who have successfully implemented these technologies and navigated related challenges to help you be successful with your projects.
The conference will feature 20+ technical talks by senior software practitioners over two days, with parallel breakout sessions emphasizing the essential topics development teams should prioritize now. Talk highlights include:
How to Take a Large Language Model from a Data Scientist and Put It in Production
Francesca Lazzeri, principal data scientist manager @Microsoft, author of “Machine Learning for Time Series Forecasting with Python”, will share practical advice on successful LLM deployment in various domains and settings, lessons learned, and best practices such as:
- Choosing the right model for the task based on the model’s size, complexity, and domain.
- Optimizing the model using prompt engineering, fine-tuning, and context retrieval techniques to improve accuracy and relevance.
- Deploying the model using scalable and secure infrastructure, such as cloud platforms or vector databases, to handle variable workloads and latency.
- Monitoring the model performance and user feedback, using metrics and tools such as MLOps and Responsible AI Framework, to identify and address any issues or risks.
- Iterating the model based on the changing needs and expectations of the users and the domain, using continuous integration and deployment strategies.
OpenTelemetry, Opening the Door to Observability for All
How do I start with OpenTelemetry? How can I automatically instrument my application? How do I use the OpenTelemetry Collector? Ken Finnigan, OpenTelemetry architect @Lumigo, author of “Reactive Systems in Java,” will answer these and many more questions. While detailing the ins and outs of application instrumentation and collection with OpenTelemetry, Finnigan will explore critical challenges to the success of observability in an enterprise.
Best Practices for Building Secure Web Applications
In this presentation, Loiane Groner, development manager @Citibank, will delve into essential but often neglected aspects of securing your applications. Groner will cover best practices for secure coding, meticulous input validation techniques, the importance of strategic error handling and logging, how to manage file uploads safely, and much more.
Blueprints of Innovation: Engineering Paved Paths for a User-friendly Developer Platform at The New York Times
In this talk, David Grizzanti, principal engineer @nytimes, will provide valuable insights into the nuances of platform engineering in the cloud, offering attendees a blueprint for implementing similar strategies in their organizations. Discover challenges faced while building a multi-tenant platform in the cloud, including specifics on how the New York Times designed:
- A simple onboarding experience, complete with starter software templates
- A secure and isolated environment on top of its centralized Kubernetes clusters
- A simple onboarding experience for multiple cloud accounts for each tenant
- A centralized approach to CI/CD, including standard build and test pipelines
Build better-connected teams in Boston
Attending InfoQ Dev Summit Boston as a team empowers collaborative learning, leading to informed project decisions and clarity for your dev roadmap. Teams as small as three attendees working for the same company are eligible for a group discount. For more details, email info@devsummit.infoq.com and mention the size of the group. Early bird savings and team discounts are available until April 16.
Team discounts for InfoQ Dev Summit Boston:
- 3-6 attendees – save up to $35 on each conference ticket
- 7-10 attendees – save up to $55 on each conference ticket
- 11-15 attendees – save up to $65 on each conference ticket
- 16+ attendees – save up to $75 on each conference ticket
Why attend InfoQ Dev Summit Boston as a team?
- Practitioner-curated talks handpicked by senior developers to focus on the critical technical challenges your team faces.
- Learn actionable insights from 20+ senior developers who’ll share their learnings to help your team navigate today’s critical dev priorities.
- Zero hidden marketing. We’re committed to providing genuine learning experiences. That means no hidden marketing or sales pitches, just real-world, relatable talks from senior software developers so your team can focus their time on learning.
- “We Care” experience. InfoQ Dev Summit is an inclusive and safe environment with a strict Code of Conduct where everyone’s contributions are valued.
- Social events create dedicated time for your team to connect with speakers and peers during breaks and evening social activities.
Right now, tickets for InfoQ Dev Summit Boston are at the lowest price. Book your place before April 16 and save with the early bird tickets.
MMS • Erin Schnabel
Transcript
Schnabel: My name is Erin Schnabel. I’m here to talk to you about going your way and navigating your own career path, the one that works for you. I’m a developer of things at Red Hat. I made Quarkus [gestures to the I made Quarkus 3.0 T-shirt]. I was an IBMer for 21 years. I’m a Java champion, a distinguished engineer at Red Hat. I prefer building absolutely ridiculous things.
Career Trajectory
I’m not sure how you thought about your career when you got started. Did you think about it as a path that just happened, maybe with some bends? Did you think about it as a timeboxed thing? If I’ve been here for two years, I should be getting promoted. Then maybe three years after that, and then maybe five years after that, and it should just progress that way. Or you might have been a little bit more like me: I can’t say that I thought about it right away, and when I did, I can’t say that I saw a single path. I did have to learn along the way not to compare my path with other paths. It was a difficult lesson to learn sometimes. You really want to keep up with the Joneses, as they say. You want to keep pace with your peers. They’re getting promoted, how come I’m not getting promoted? It isn’t necessarily that I’m missing something every time, and sometimes I know somebody else is thinking the same thing because I got promoted. From where I am now, I have to say that’s a recipe for sadness; don’t do that. Your path is your path. You have things outside of work. You have things inside of work. You have family things, other stuff going on, your own pace. Choose your own pace, and then achieve your goals as that dictates. Do what’s right for the whole of you and don’t worry about what your peers are doing, because hopefully they’re doing what’s right for them, and that pace may be different, and that’s ok.
The Great Reflection
A lot of these thoughts also gelled during what I call the great reflection, the COVID era. My move from IBM to Red Hat spanned 2019 into 2020; it really happened in May of 2020, which is right when everything shut down. Being at home, with my kids and my husband at home too, gave me space to ask, what am I doing? I don’t think I’m alone in that; I think it happened to a lot of people. You really got to check in with yourself: why am I doing this? I had been at IBM a long time, and even though I knew it was going to happen, it really did hurt to leave. Leaving, especially once I got to the other side, helped me really unpack what I was carrying. Relationships that exist for that long come with baggage. They mean something to you, and that can bring feelings of obligation, and then you can feel guilt about leaving. Maybe not you, but maybe some of you. I really got the chance to unpack that, understand where it came from, and realize how self-imposed it was, along with that emotional connection. I feel very strongly about connection with others. I hadn’t fully appreciated the extra baggage that connection can bring, or that you can recognize it for what it is and set it down, so that you have a little more room to make rational decisions rather than emotionally laden ones. All of that came out of digesting, unpacking, reflecting, and thinking about where I was. I highly encourage you to reflect on where you are, so you know you’re good.
Essentials For Going Your Own Way
Out of all that reflection, I came up with some essentials that I think you should keep in mind for crafting your own career path, at your own pace, that meets your own requirements. First, you have to know what your requirements are; you have to know yourself and what’s important to you. You have to feed the tree; I’ll explain that in a moment, but it’s important to feed the tree. You also have to push your limits. You cannot remain static in a dynamic environment: you must change, you must grow. The question really is how to do that sustainably, and in ways that are challenging, broadening, and purposeful. Why do I say you need to know who you are and what makes you unique? When I was putting my package together at Red Hat, I hadn’t been at Red Hat for that long, so I really had to be able to articulate the value I bring that is different from others. I am also a bit of an anomaly. I had made a decision before leaving IBM. I was at the stage in my career where pursuing distinguished engineer at IBM was on the table. I had help; I had offers from people to help me review a package if I was going to put one together to try to get to DE at IBM, and I had more or less decided that I wasn’t going to try. That was because the way the role is defined at IBM doesn’t match who I am. I would have had to turn myself into such an unnatural pretzel that even if I had crossed that hurdle, which I may have been able to do, I would not have been happy afterwards. I’m not sure I would have been successful, because the way the role is defined, and the expectations of the role, are so not me. I only know that because I know me.
Who am I? Why do I not fit? I’ll give you an example that explains it a little bit. This one is about rules. We had an assignment. It was an art class, so there were not really a lot of rules, but it was an assignment. We came in, and the teacher had the papers already on the desks in landscape orientation. Look at the person across from you. She happened to look a lot like me. Look at the person across from you and draw that person in the style of Picasso. The teacher drew an example, and her example included the shoulders, the head, the other shoulder. Everyone, every single person in the class, drew that. People changed the colors. There was definitely variation; it wasn’t like everybody did exactly the same thing, but everyone followed that original outline. The page stayed in that orientation, and the outline of the head and shoulders remained constant through everything else. This was mine. I can remember people looking at me funny, because the first thing I did was turn the paper around. I had the ruler out; I was trying to figure out how to bring this face into the whole picture. It’s weird, because there’s so much of my childhood that I don’t remember, but I remember this, and you can see it in the video; I still have it [see minute 8:54 of the video]. It was a moment when I recognized that my brain was doing something different from the other people in the class. It just came out cool. I’m so proud of it. I love it. I just remember the whole process being a realization that there’s stuff going on in my brain that’s not the same.
Core Values
There are things that I value that I know other people don’t value. You will hear mention in some places of core values. I’m not talking about corporate values; I’m talking about your values. What is really important to you? What energizes you, and what drains you? You really should know both, because they are different things. I have a feeling that if you go back through your projects, in school or in your career, and look at the ones you were really excited about versus the ones that were a real drag, the ones that excited you will have ticked some box for an attribute you find really important. It wasn’t just that the project was cool; it did something for you with one of these values. This is a word blob. It’s got lots of words in it. It is the most common way I’ve found to try to understand what your core values are. If I just went and stared at the sky, I would never understand what my core values are; sometimes they sound like mythical unicorns. I did find several sites. There are lots of sites out there, several of which have word lists. The idea with a word list is that you look through the words and find the ones that resonate with you, that mean something to you. Then you can extrapolate meaning and importance from those words. That way, you’re not just saying, I value family. Maybe you do value family, but you’re not thinking about it in a vacuum; you really have all of these options.
I will warn you, because I had this problem, and maybe some of you will too. If you’re one of those people who grew up always wanting to do well on tests, you may have the problem of thinking, this is an important word because functioning adults need to do this. I recommend making a functioning-adult bucket, so that you can say, I know that one is important, but I’m going to put it in the functional-grownup bucket, because it doesn’t actually do anything for me. I know I have to do it, or I know it’s important, but it’s not really me. If you need to do that, do that. The general gist is that you come up with three to five things that really make you what you are. These are things you bring to conversations. These are things people notice about you. These are things you feel strongly about. Once you start thinking about these words, you’re like, yes, they drive my behavior. They are important to me. When they’re not satisfied, I’m not happy. It’s that kind of thing. I found it very useful. Even when I went through my DE package, some of the core values I had identified about myself were part of what we emphasized, and they were reinforced by my recommendation letters. It was stuff I had been doing for a long time, but I hadn’t put it together: this was me. Please do that. Depending on where you are in your life, come back to it periodically, because I’m not sure I would have picked the same five words when I was 30-something, or when I was 20-something. You change over time. You grow. Some of your values might shift a little bit too. That’s fine. Normal, expected, great. We grow, we change. All of that is good.
Goal Setting
I also want to mention goal setting, but not in a stringent sense. I’m not trying to give you homework or chores. To the Cheshire Cat’s point, though, if you don’t know where you’re going, it doesn’t matter what road you take; you’re going to get wherever you get, and since you didn’t know where you wanted to go anyway, that’s fine. When you’re doing this goal setting, you’re looking for things that still align with your values. Where you can, look at how the next career move fits with taking care of your family, or making it a priority to spend time with your kids, within a time horizon of three to five years. Do I want to go for this promotion now, or would five years from now be better? Do we need to spend more time over here? Do I need to find a project that will still grow and stretch my skills but won’t be quite so intense, because my attention is needed elsewhere? It’s that kind of goal setting, the big-picture stuff. Know what you want to do and revisit it, but on maybe a three-to-five-year time horizon.
My husband and I started dating when we were in college, and this kind of thing just happens for us. It’s not a formal thing, but we will check in: what are we doing? We don’t know. Maybe we should be thinking about this in the next couple of years, or I’m thinking about this, or my manager’s talked to me about that. We try to fit it together. How does this flow? How are we moving forward? What’s the whole picture? What do I want out of this picture? What do other people need from me in this picture? It allows you to be a little more deliberate about which opportunities you take. If you do need something else, it helps you have a more concrete conversation with your managers, and also with your mentors, about what else you might try, or what might fit better with where you are and what you want to do in the next couple of years. You can start laying the groundwork for that long-term horizon. You have to be able to articulate those goals.
Push your Limits (Feed the Tree)
Pushing your limits. You can do more than you think you can do. There are limits, but you can do more than you think, and you’re only going to find that out if you try. There’s another part to pushing your limits, and that is feeding the tree. I have a really hard time separating these two concepts. They’re very different things, but they’re very closely related in my mind. I call it feeding the tree; it’s probably more commonly called networking. I don’t like calling it networking, because networking feels transactional to me: I want to meet that executive so that that executive knows who I am. I really don’t like that. There is an element of truth to it; it definitely helps if you’ve met that executive. But if I’m just saying hi so that they say hi, it doesn’t actually mean anything. I collected mentors for a long time. Every person I met who was interesting, I would have conversations with: what do you think about this? Here’s where I am. What are you doing? I learned from them, and sometimes bounced ideas off them just to get a different point of view on whatever I was thinking: I’m thinking about this, does this make sense? I was trying to get feedback to validate my own perspectives about how things were working.
I was really resistant to the whole networking concept for a very long time. At one point, I met a few women in a leadership program, and they said, what you do with your mentors and how you learn from them, that’s what networking is. I was like, yes, ok. Something about that conversation made me realize that this living system, this interconnection between people, this fabric of humans I had met and grown relationships with over time, was in essence a muscle I could flex. Not in a negative, I-need-to-use-you way, but in a how-can-I-impact-the-business way. How can I improve our outcomes? How can I get people who are disagreeing to the table? How can I get them to come to the table with a different idea, a different framing of the whole problem, so that we can leave the table with a different answer? It took my friends, some of my peer mentors, to open my eyes to what I had built, which was a nice interconnected system of trusted relationships. I can’t emphasize enough how important that is. We are all humans, and we need each other.
I am chaos walking. I have no concept of time. I think many things at once. I have a terrible time focusing. Do not ask me to estimate or size anything because I don’t know time. I don’t know how long it takes. It takes how long it takes. I can tell you this is harder than that. That’s more complicated than that. I can’t tell you how long it’s going to take, because time and I don’t get on. When working on a project, I need a partner. I know I need a partner. I function much better with a partner. I need that partner that can build and focus and understands all of the stuff that I don’t naturally do. The person that catches the I’s and dots the T’s, that understands the legal requirements. I need that other person with me. I know that, and then they know that. Then we work together. I can give them more ideas than they had themselves. They can give me things to do that I will do. We feed off each other in the most positive way. Alasdair Nottingham was my man, and we did amazing things. We were a great team. I thoroughly enjoyed working with him. I miss that dynamic all the time, because we were really good for each other, in that we pushed each other, and we supported each other. We built something really great with a whole bunch of people because of the dynamic that we shared. That’s one kind of tree, but there’s more than that too.
Active Participation
Again, these are my early-career mentors, and I’ll talk about some of them. I needed that hand up. I also needed the catch: I did make mistakes, and I needed someone to catch me. Growing this kind of organic ecosystem helps you take risks and push your limits, because you have support from good mentors who understand how to support you. Even people at senior levels still need mentors. And when you get to senior levels, you are that mentor for the juniors; they need it too. They need the hand up. They need the encouragement. They need the push to form their own opinions, to get the big picture, to put things in perspective. Most importantly, they need to be encouraged to be themselves, to find themselves, to understand how they can contribute, and to feel supported enough to do so.
This all started with my dad. I never expected to be a grown person talking about my father, but it really did start with him. I am way too much like my dad. I get a little concerned about what my 70s are going to look like, because he’s a lot. The conversations we had around our dinner table were maybe a little different, but also fundamentally formative. My dad is not a passive person. He’s an active participant in shaping direction. He’s high school educated. He had an apprenticeship at NASA. He’s a tool and die maker. He went into what was essentially a mechanical engineering career, in a way as an influencer but also as a builder. He would design the machines in stamping plants that take blanks and turn them into roller rocker arms; that was most of the output. There was also a whole period when it was corrugated cardboard cut with laser cutters, and later water jets. It was crazy. His unique contribution was how he thought about a problem: his ability to visualize and to come up with a unique solution that solved the customer’s problem, and in the process made the company successful.
He challenged rules all the time. He challenged practices all the time. He did it for the right reasons, and he did it in the right forums. Some of this was in the ’80s. He really did not like bullies. I got my fair share of office stories where the bully would come into the office and my dad would respond. It depended on the situation, and it wasn’t reverse bullying. Whenever the opportunity came to knock this bully down a peg, or to let other people see that this was bullying behavior, my dad would take it, so that people would start to understand that the person bullying everybody around them was not worth fearing. Those were the stories I heard in my formative years. By the time I entered the workforce, I didn’t tolerate bullies either. I will talk to anyone at any level, at any time. The content of the conversation may get scary, and if I get nervous, it’s because of the content. I do try not to waste time. Executives don’t have a lot of time. If I’m going to go and talk to a person at a much higher level than me, I try to make sure that I’m focused, because their time is precious. If you just waste it, you’re not getting anything done. You’re not accomplishing anything.
I learned from him not to fear authority, and that really changed how I’ve gone through my whole career. It is a little different. I also never waited. I would try to encourage my friends. Again, this isn’t about comparing people to each other, but I can remember some of my friends, when we were still young, saying, “I haven’t been given a leadership role yet.” What do you mean, given a leadership role? Be a leader; you don’t have to wait. I didn’t understand the whole giving thing. When are they going to give it to you, tell you you’re a team lead? How does that work? It was such a foreign concept to me, because to me you just do the thing and you become a leader. Then maybe a team gets formed because you need to do the next thing, and at that point it’s the obvious answer. It happens because it’s the natural thing to happen, not because it’s supposed to. I never waited for anyone to give me anything. That was largely because of my dad’s impact on my formative brain.
Here’s a funny story to show you how differently he thinks. He does this kind of thing now as a FIRST Robotics mentor. Even in his retirement, he’s working in CAD, teaching high schoolers how to design robots. When I was in high school, my engineering class had a king-of-the-mountain competition. It was a bunch of cars; you had to get to the top and stay there. Everybody hot-rodded their cars. My dad asked me a few questions, like, do the rules say how many wheels the car has to have? That just opened the box: what do you mean, how many wheels? It was a really different way to think about the problem. He didn’t tell me what to do. He just asked me brain-opening questions. On competition day, I showed up with this, which is the weirdest thing ever created. I still have it; this is actually a recent picture. The back wheel is a water chestnut can with Velcro loops on it. This thing would crawl up the mountain, get to the top, and drop a trapdoor with nails into the carpet. It was a beast. The whole competition was hilarious, because a couple of the guys had cars that were supposed to go really fast. They would race to get to the top first, hit that plow blade, and end up on the other side of the room. It was funny. It was a great time. It was another case where my father’s questions helped me shape a different idea about what was possible to build. Then I built this ridiculous contraption [see minute 27:53 of the video]. It still cracks me up to this day. That’s what came out.
Years at IBM
I went directly to IBM out of school. I had a manager I felt I had a good rapport with; she actually interviewed me on campus. I went on site to IBM and interviewed with a bunch of other managers, and I felt she was still the one I wanted to work for. As a woman going into engineering at that time, there were not a lot of other women there. There was another company I interviewed with where the job would have been basically a straight extension of my master’s thesis; I could have gone right there and kept working on what I was working on. It was a direct mapping. When I went to that site, as a young 20-something woman, every engineer I met was a white male. It was a while ago. Every single one of them. The only other woman I met was a technician, which was an hourly position. No. I’m all about challenging barriers, but that was not a dynamic I wanted to be in the middle of. It just didn’t seem right. When I came to Poughkeepsie, this manager introduced me to her manager, who was a woman, who introduced me to her manager, who was a woman. For three or four levels up, the management chain was all women, even though I was working with men. I’d always worked with men; my computer science classes were mostly men. I had no problem working with men, but knowing from a support perspective that there were a lot of women in management above me was awesome.
I know it can make people uncomfortable to talk about gender in that way. As my career went on, I discovered how much it mattered to have managers who understood the impact of having children. I had a man as a manager just after I had my first child, and the questions he asked me were not ok. I frankly unmanagered him, because I could not trust him to have my back. I was working part time as a mother. I was still contributing. I was still participating. I was still working. You have to trust management to ensure that your relative contributions are accounted for correctly. This manager is the one who said to me, “Your being part time shouldn’t matter, except that it does.” That’s a nonstarter. That’s not ok. I challenged it. I took it to his manager. He also asked me if I was going to have any more children. You’re not allowed to ask that either. No, out. That’s being proactive about myself: knowing myself, knowing what I wanted, knowing why I was still working, and making sure that my contributions counted and would continue to be credited, so that I could keep moving forward even while I was on reduced hours to be with my young son.
There were so many good mentors and experiences for me. I started on the mainframe, and I got all kinds of interesting opportunities from that team early on. Very early, I had some mainline projects; if I had messed them up, everybody would have known. I was encouraged to try them, and I had a lot of support. Everything went great. Then I did have the big mistake. I almost think that interaction with that manager was an offshoot of this earlier mistake. I should have escalated someone for basically not doing work that I needed done. It was another team in another area, and I should have escalated that I was not getting this work done, and I didn’t. It was a big catastrophe. A lot of people knew about it. My name was in the wrong places. Again, this is the feeding-the-tree thing: I had great mentors who said, “Making mistakes is ok. You got it done. We met the dates. Nothing slid. No permanent harm done, but right now executives know who you are, and it’s not for the right reasons. We have this research we want done; we want to understand what our choices in this area should be. While we wait for the executives to forget who you are, how about you go work on this?” That was a huge opportunity. It eventually grew into WebSphere Liberty, which was amazing. It was a really cool experience to take an idea with recommendations into a prototype with 7 people, and then back into an organization of several squads of about 50 people, which then grew to 100 people, and then to 300 people. It was just the coolest thing.
Then I opted to leave that project. That was like a reverse feed-the-tree moment. I had been in that organization for a very long time, and it was time for me to try something new. I knew that if I stayed, whoever was the next leader was not going to get a fair shake. I was too embedded in everything. I was an admin of everything. I had been there forever. So I took another small team. I told people, “I have to disappear. I can’t stay.” We actually recommended fairly new people to be the next leaders of the org: not manager leaders, but team leaders and technical leaders, because they had the right supportive attitude for all of the teams and all of the work they were doing. We said, no, these people need to do it, and if I’m here, they won’t get a fair shot. It was right as microservices and cloud native concepts were becoming the big thing, so: let me take this team and we’ll go investigate something else. I disconnected myself from all the admin things and effectively disappeared. That’s exactly what happened: that leader was allowed to grow. I was there for them; it wasn’t like I left them high and dry. One of my dad’s other aphorisms was “train your replacement,” and I had trained my replacement, so it was ok for me to leave. But I did effectively disappear to give that person room to grow. I knew that if I stayed, he wouldn’t have been able to grow. So I moved out.
We explored for that year. Within the next couple of years, we wrote books. We built a text-based adventure game to teach people about cloud native microservices, which was super fun. We ran workshops. It was outrageously fun, and an excellent departure from the usual. Eventually, because of what we were doing and our ties to microservices, cloud native development, and all of those activities, most of my small team of six moved to IBM Cloud to work on the developer experience. In the process of that transition, I picked up the mission of bringing support for another Java technology, Spring, which is probably one of the most widely used application frameworks there is. I owned the mission to get all of our new IBM Cloud technologies to support the Spring way of configuring things, to make all of that happen, and even to push it into some of our more traditional IBM products, which had never been done before. That also grew from those initial teams. Some of the people I worked with at IBM Cloud, and at the beginning of Liberty, were relationships I had formed from the beginnings of WebSphere and some of those early architecture board meetings. Even now, I still talk to my IBM friends, and some of those people are people I’ve known for 20 years. Those relationships are hugely important in terms of feeding the tree, closing the circle, and trying to make a difference based on who you know, who you’ve worked with, and how you can see that things should come together.
My move to Red Hat felt very natural at the time. It was a role I invented. It was time for me to go; it was time again for me to change. I realized that I probably would have been able to get to the distinguished engineer role at IBM, but I don’t think I would have been happy. It would have been painful, because as chaos walking, as someone who is a gluer, a connector, and a bridger, the way I fill gaps and break down communication barriers is all very important. It’s a force multiplier, but it doesn’t directly deliver to the bottom line in the way the IBM distinguished engineer package expected you to. I don’t have a swim lane. I don’t want a swim lane. I don’t want to own things. I want to help the people who own things build better things. I want to figure out how to get the things other people are building to grow together and evolve together, to ensure that we have a consistent story. It’s a different kind of role, and getting that through at IBM would have been a real struggle, which is why I had more or less written it off. Being able to do that at Red Hat was empowering. Being able to bring what I know about myself to work, to work with it, and to try to make a difference is amazing. I have great support at Red Hat, and I try to provide good support at Red Hat to keep that nice circle of growth going.
Summary
My summary for you, my go-do-the-things-and-conquer-the-mountains comment, is: understand what fills your cup. What makes you happy? Not just happy, because outside things can make you happy, but what actively charges your battery at work rather than draining it. Leverage that to find sustainable ways to stretch. You do need to learn new things. You need to keep your head up, understand the horizon, and understand how anything in your near vicinity fits with the other things going on, either in the company or in the industry. Doing that in a sustainable way, where you’re not going to burn yourself out, is really important. Always feed the tree. It’s rewarding to have a mentee. It’s also rewarding to have really good mentors. It’s rewarding to have solid relationships with your peers. Do it. It benefits everybody. There’s no reason not to, even if you’re shy, and even if you find it tiring at the end of the day. I am secretly an introvert. I love people, but only for so long, and then I need to go take a nap. It’s ok. You can do this. Those relationships are what make it safe to take risks. They make it easier for you to push your boundaries. They help catch you if you fail. By all means, do that.
MMS • Vanessa Huerta Granda
Article originally posted on InfoQ. Visit InfoQ
Transcript
Good day, folks. This is Shane Hastie for the InfoQ Engineering Culture podcast. Today I’m sitting down with Vanessa Huerta Granda. Welcome. Thanks for taking the time to talk to us today.
Vanessa Huerta Granda: Thank you for having me.
Shane Hastie: So Vanessa, you were the track host for the Resilience Engineering: Culture as a System Requirement track at QCon San Francisco, and I love that second half. Let’s start delving into the track. What message were you trying to convey when you put that track together?
Resiliency in sociotechnical systems [00:52]
Vanessa Huerta Granda: I and a lot of folks from the Learning from Incidents community, when we think about our tech systems, don’t just think of them as technological systems. We like to think about the sociotechnical system. The socio part, the people, is part of our systems. Everything we do depends on people; things are running because people made decisions at some point or another. Maybe when something was first developed or first architected, but also when you’re maintaining something, when you’re handling issues and incidents, and when you’re learning from them. So the idea here is that culture is part of the system, part of that sociotechnical system. And resiliency, making sure that your organization has a resilient culture, is a huge part of that.
Shane Hastie: Taking a step backwards, what brought you to focusing on this? What’s your background?
Introducing Vanessa [01:43]
Vanessa Huerta Granda: I am an industrial engineer, which, weirdly, actually fits in perfectly with resiliency work: understanding how people work, and how they create, write, and maintain software. I have worked in operations as a Site Reliability Engineer and leader for the past decade. I am currently the Manager of Resiliency Engineering at Enova, and previously I was at a startup called jeli.io, focusing on products that help people handle their incidents and learn from them, across the entire lifecycle.
Honestly, I got into this kind of work because I just really enjoy solving problems and I really enjoy talking to people. And at some point, a boss of mine realized that was actually a good fit for this kind of work, and so I’ve been doing it ever since.
Shane Hastie: So going right down to first principles, what do you mean by resilience?
Defining resilience [02:34]
Vanessa Huerta Granda: A mentor of mine once said that resilience is something your system does rather than something your system is. Resiliency is our ability to sustain challenges, fractures, failures, whatever it is. I think we often assume that having a resilient system means our system is never going to break. That’s just not true at all. It means accepting that sometimes something bad is going to happen to your system, including the sociotechnical system, and something is going to break. The question is how we recover from that.
Shane Hastie: And what does a culture of resilience look like and feel like?
Vanessa Huerta Granda: Well, I can tell you I’ve experienced that all week. A culture of resilience really means that the folks working at your organization are able to handle whatever is happening, whatever is thrown at them. We have this idea that we have plans for the quarter, and we’re going to get so many projects done because we have so many T-shirt-sized things to do. But at some point something is going to happen: someone’s going to go on vacation, someone’s going to get sick, or maybe the code we thought we understood doesn’t actually work the way we thought it did.
And so that’s when you have your socio part of the system figuring out a way to make it work. And that can be through automatically having failovers in your system or just having a process where people talk to each other and figure out like, “Oh, hey, can you help me with this? Can you help me with that? Let’s prioritize this. Let’s prioritize that.”
Shane Hastie: Incidents and resilience, of course go hand in hand, but incident management in my experience is something that many organizations do haphazardly at best and often very badly.
Vanessa Huerta Granda: I hate that.
Shane Hastie: So what does good incident management look like?
Good incident management [04:11]
Vanessa Huerta Granda: Oh my gosh, how much time do you have? When I think about incident management, I think about allowing your engineers to do their best work. As an incident manager, I am not here to yell at anyone, and I hate that idea. We often call the person leading an incident the incident commander. I can tell you that when I first started this job, I was 25, the only woman in the room, the only Latina woman in the room, and I definitely did not feel like a commander. What I did feel like was somebody who could talk to people, get them to actually discuss what was happening, and figure out the best way forward. Incident management that actually works means understanding that we’re all on the same team, working toward the same goal, which is to get our systems back to normal, and that we need to work together to do that.
The other side of the coin is that when you’re in an incident, if your company makes money from that system, then during the incident you’re not making money, or something else bad is happening. Clearly people are going to care; your stakeholders and your leaders need to be aware of what’s happening. So I like to tell the responders, “Worry about responding, worry about doing the engineering things. Use your brain for that. I’m going to focus on the communication, the coordination, and the collaboration, and I’m going to make sure that you’re not answering a million things from your CTO, that you’re not worried your CMO is going to be upset because the website is down. I’m going to take that away so we can work towards resolution.”
Shane Hastie: And then what happens afterwards?
Vanessa Huerta Granda: Oh, my favorite part: you gossip about it. The incident is over, and, well, I work from home a lot more nowadays than I did back then, but you’re outside of the office, outside of the war room, whatever you want to call it, and you’re talking about it, right? You’re discussing, “This happened, and that happened.” Maybe you go to lunch, maybe you go to happy hour. People are always going to talk about it. In organizations without that culture of resiliency, you have a postmortem, and that postmortem is usually some sort of document, an incident report that tells you, “This incident started at 5:00 AM, it was over by 7:00 AM, it was Shane’s fault, and Shane really sucks.”
Shane Hastie: Yes, the who-can-we-blame session?
Vanessa Huerta Granda: Right?
Shane Hastie: That’s a really, really important part.
Vanessa Huerta Granda: Oh my gosh. And that to me is really, really sad, because an incident is just this big red arrow pointing you towards something happening in your organization that you can learn from. So I like to think of incidents as learning opportunities. People are going to talk about your incidents either way. After an incident, I’m certainly Slacking my work bestie, “Hey, can you believe that happened?” Or we’re going to talk about the incident during lunch afterwards. So we might as well talk about it together and learn from it.
So what I usually like to do during a retrospective is make sure that people are sharing what happened from their own point of view. In my previous role, I was doing a lot of consulting work. I was not an engineer, and when I was first in incidents, I wasn’t seeing what was happening in the code; I was seeing what our customer was experiencing. That is a different point of view from the engineer’s, and it can certainly make a difference in how we move forward. The idea is that you have a learning review, a postmortem, a retrospective, whatever you want to call it. You learn from it, and then you come up with action items that help move the needle in the future.
And it can be as easy as like, “You know what? Maybe we need to have better post-release checks.” Or maybe it’s something like, “You know what? This process that we’ve been working, it probably doesn’t make sense anymore. It made sense back a year or two years ago. It doesn’t make sense anymore. Maybe we need to do some more training,” et cetera, et cetera. There’s many things that you can learn from an incident.
Shane Hastie: How do we avoid it being that blamestorming activity?
Avoiding blamestorming – turn postmortems into learning opportunities [07:59]
Vanessa Huerta Granda: Well, that’s the culture part, right? That’s where, as an engineering leader and as an individual contributor, you have to make sure that when you’re leading a retrospective or a postmortem, you are not just filling out a document, that you are speaking up and letting other voices be heard. There’s a lot of good information out there. There’s Etsy’s Debriefing Facilitation Guide. There’s the Howie guide from jeli.io, which I co-authored a few years ago; it helps people understand how to best position themselves so they can turn their postmortems into learning opportunities.
From my standpoint, the best advice I can give you is this: if you’re leading a retrospective, never be the only person speaking. Let other people speak up, let them share what happened from their points of view, and be firm about it: “This is not a blame game. We’re all on the same team.”
Shane Hastie: And communicating the outcomes. What I want to explore there is getting off the hamster wheel of just incident, incident, incident response and breaking what feels to me at times like a never-ending spiral for folks.
Vanessa Huerta Granda: I think I mentioned this earlier: I like to think in terms of an incident lifecycle. You have your incident, where something breaks; then you have your retrospective, where you’re learning; and out of the retrospective come action items. Those feed back into the process, and there are things you can do throughout that entire lifecycle to make things better. This is where I can give you the example that really highlighted for me how the incident lifecycle can be applied to anything.
I currently have 2-year-old twins, and they’re my only children. When they were first born and we took them home from the hospital, it was the craziest incident I had ever had in my life. And I had had a lot of incidents, but this was kind of bananas. I was like, “Shoot, I can’t sleep at all.” My husband couldn’t sleep either, because if one wasn’t crying, the other one was. We felt like we were on this hamster wheel, fighting this incident over and over again, and we just did not have the brain power to do anything about it. So we said, “Let’s try to make this process easier for us.” And that’s what I recommend a lot of organizations start doing: make the process easier for your responders. A lot of organizations start by introducing tools, Slack bots or Teams bots, whatever chat platform you’re using, to start communicating and automating some of the things.
And what we did was hire a night nanny. Once you have the bandwidth, once you’re not constantly fighting incidents, you’re able to start having productive retrospectives and productive postmortems and to figure out what you can do at a higher level, right? At that point, my husband and I were like, “You know what? We don’t have to drag our two infants in their car seats to Costco. We can just have the diapers delivered.” So we were thinking of a change to the process that would make the entire lifecycle easier, not just one specific incident. I think I gave that example earlier, right? After an incident, once we’re able to have a retrospective, maybe we realize, “You know what? We need better controls for this process. We need a testing suite that makes more sense, or blue-green deployments,” whatever it is that you want to call it. Those action items make things easier, lead to fewer incidents, and give you more bandwidth.
And then the last thing we like to do is cross-incident analysis. Once you’ve made those easy changes and addressed the low-hanging fruit, you take a holistic look at your incidents. Maybe you look at all the incidents you had in the last quarter and you’re able to say, “Okay, you know what? This team has a lot of incidents. Let’s maybe give them a little more headcount.” Or, “It seems like these two teams are working on something similar,” or, “It seems like a lot of these incidents are related to this antiquated pipeline. Let’s give more resources to that.” So you go from making small changes to the incident process itself, to making changes based on those incidents, to making larger transformations.
And actually, during my time there, we were able to make a case for becoming more of a DevOps organization, moving away from an ops team that everything was funneled through, to having an SRE team that not everything has to go through. We could make that case because we were able to look at the incidents and find patterns there. And I did the same thing with my children, and now I love being a mom.
Shane Hastie: Two-year-old twins, your life is a hurricane.
Vanessa Huerta Granda: It’s fun. I do incidents for a living.
Shane Hastie: One of the things that we touched on earlier, but I would like to dig in deeper is stress and burnout. How do we help folks reduce the stress and avoid burnout? Because we certainly know that burnout is a significant issue in our industry at the moment.
Reducing stress and avoiding burnout [12:51]
Vanessa Huerta Granda: Absolutely. We’ve seen it everywhere, right? Burnout. I don’t see people getting burnt out because they’re working so much, so much as because they’re working hard and it never stops. Nothing ever changes. That’s why I am so passionate about the learning part of things, and not just resolving problems. If you’re on the hamster wheel, I want to hear what you are seeing, and I want to see where I can help. You have short-term action items that can help things a little bit, but then when we’re making higher-level recommendations, we include things like, “Let’s add more headcount, because things are hitting the fan.” Or you always hear engineers saying, “The problem is that we have this outdated architecture that made sense a while ago, but no one’s going to put in the resources.”
Well, if I’m the person who has all of the data around incidents, and I’m able to go to your leadership team and tell them, “Given all of these incidents you’re having, maybe you should put some resources into doing something different,” that’s going to allow people to see that their hard work isn’t for nothing. I’m also the manager of my own team, and I take that very seriously. I make sure to have personal connections with them, to give them the time they need to rest, to listen to them, and to be proactive about that, right? Like, “If you were up all night working on an incident, please don’t come in today. Please take some time to sleep.” And the same goes for your personal life, right? Like, “If you were handling your three-month-old baby overnight, there are more important things out there.”
Shane Hastie: You are a manager, you’re a leader of a team. A lot of our audience are stepping into that role, often for the first time. What advice would you have for them?
Advice for new leaders [14:40]
Vanessa Huerta Granda: When you’re an individual contributor, it’s sometimes hard to understand the constraints that management is working with. As a first-time manager, I wish I had given myself a little more grace and realized that I can’t change everything. One of my mentors had a mantra: “Grit and grace.” Yes, try to work through things with grit, but also give yourself grace. Give other people grace. No one’s out to get you. It’s taken me a while to realize that, especially when you’re working with incidents and with people from different functions: they’re all working with their own constraints. Remembering that you’re on the same team makes a lot of difference, I think.
And then when you’re managing your own team, I mentioned that I take that very seriously. These are people’s livelihoods that I have on my hands, right? I’m their manager, and so they spend a lot of time working, and I just want to make sure that I’m listening to them, that I’m understanding where they’re coming from, not making assumptions, giving them grace as well.
Shane Hastie: Grit and grace. I like the combination, grit and grace. Thank you very much. Vanessa, if people want to continue the conversation, where will they find you?
Vanessa Huerta Granda: On X, I guess that’s what it’s called now, I’m v_hue_g. You can also find me on LinkedIn under my name, Vanessa Huerta Granda. And yeah, I talk about incidents all the time, and a little bit about reality TV, but mostly about incidents.
Shane Hastie: Thank you so much for taking the time to talk to us today.
Vanessa Huerta Granda: Thank you, Shane.
MMS • Sirisha Pratha
Article originally posted on InfoQ. Visit InfoQ
Red Hat released version 8.0 of the JBoss Enterprise Application Platform (EAP), an open-source Jakarta EE-compliant platform. The latest release brings several improvements that include support for Jakarta EE 10, changes to the management console and CLI, and removal of legacy security subsystems.
JBoss EAP 8.0 provides implementations of the Jakarta EE 10 APIs and supports the Jakarta EE 10 Web Profile, Core Profile, and Full Platform standards. See the complete list of specifications in the release notes. The previous version, JBoss EAP 7.4, supported Jakarta EE 8. Jakarta EE 10 differs from Jakarta EE 8 in several ways, the most significant being that the package namespace of the Jakarta EE APIs changed from javax to jakarta. To facilitate migration to JBoss EAP 8.0, Red Hat has updated the Migration Toolkit for Applications (MTA) to handle this namespace rename, among other changes. The Galleon provisioning layer ee-core-profile-server provisions a server with the Jakarta EE 10 Core Profile.
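The namespace change is why migration is more than a blind find-and-replace: only the Jakarta EE API packages were renamed, while JDK packages such as javax.sql, javax.naming, and javax.security.auth kept their names. The Java sketch below illustrates that distinction by rewriting import statements for a hand-picked, deliberately incomplete list of moved packages. The class JakartaImportRewriter and its package list are hypothetical illustrations, not how MTA or any Red Hat tool works.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of MTA or JBoss EAP.
final class JakartaImportRewriter {
    // Assumption: a hand-picked, incomplete subset of Jakarta EE API packages
    // that were renamed from javax.* to jakarta.*. JDK packages such as
    // javax.sql, javax.naming, and javax.security.auth are deliberately absent.
    private static final List<String> MOVED = List.of(
            "servlet", "persistence", "ws.rs", "inject", "enterprise",
            "ejb", "faces", "json", "validation", "jms", "mail");

    // Matches "import javax.x.y" and "import static javax.x.y" at line start.
    private static final Pattern IMPORT = Pattern.compile(
            "^(import\\s+(?:static\\s+)?)javax\\.([\\w.]+)", Pattern.MULTILINE);

    private JakartaImportRewriter() {}

    static String rewrite(String source) {
        Matcher m = IMPORT.matcher(source);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String pkg = m.group(2); // e.g. "servlet.http.HttpServlet"
            boolean moved = MOVED.stream().anyMatch(p -> pkg.startsWith(p + "."));
            // Rewrite moved Jakarta EE packages; leave JDK javax.* imports untouched.
            String replacement = moved ? m.group(1) + "jakarta." + pkg : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Given the two lines import javax.servlet.http.HttpServlet; and import javax.sql.DataSource;, the sketch rewrites only the first, to jakarta.servlet.http.HttpServlet, because javax.sql belongs to the JDK.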
JBoss EAP 8.0 introduces the JBoss EAP Maven plug-in, which provisions a trimmed server using Galleon and installs the application on that server. Under the hood, the plug-in uses the wildfly-ee-galleon-pack and eap-cloud-galleon-pack feature packs to customize the server configuration file. The plug-in can also execute CLI script files for further server customization and install extra files, such as keystore files, on the server. All of the necessary build configuration is kept in the project’s Maven pom.xml file.
The legacy security subsystems, PicketBox and PicketLink, were removed in this release of JBoss EAP. To use custom login modules with the elytron subsystem, use the jaas-realm security realm, which is based on the Java Authentication and Authorization Service (JAAS). Red Hat recommends using the Elytron subsystem’s existing security realms, such as jdbc-realm, ldap-realm, and key-store-realm, over jaas-realm; refer to the release notes for use cases of jaas-realm. Different security realms can be combined by configuring an aggregate-realm, distributed-realm, or failover-realm. Another noteworthy enhancement is the introduction of the elytron-oidc-client subsystem, which provides native support for OpenID Connect (OIDC). Read more about securing applications deployed on JBoss EAP with OIDC.
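The three composite realms differ mainly in how they delegate. As we understand the Elytron documentation, an aggregate-realm authenticates against one realm and loads authorization data such as roles from another, a distributed-realm searches a list of realms in order until one of them knows the identity, and a failover-realm uses a delegate realm and switches to a fallback realm only when the delegate is unavailable. The Java sketch below models those semantics with hypothetical, heavily simplified types (Realm, MapRealm, UnavailableException, and the three composite classes). It is not the Elytron API; real realms are configured in the elytron subsystem rather than written as application code.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of Elytron's composite realms.
// None of these types belong to the real Elytron API.

/** Stand-in for "the realm's backing store (LDAP, database, ...) is unreachable". */
class UnavailableException extends Exception {
    UnavailableException(String message) { super(message); }
}

/** Minimal realm contract used only in this sketch. */
interface Realm {
    boolean exists(String user) throws UnavailableException;
    boolean authenticate(String user, String password) throws UnavailableException;
    Set<String> roles(String user) throws UnavailableException;
}

/** In-memory stand-in for a backing realm such as ldap-realm or jdbc-realm. */
class MapRealm implements Realm {
    private final Map<String, String> passwords;
    private final Map<String, Set<String>> roleMap;
    private final boolean available;

    MapRealm(Map<String, String> passwords, Map<String, Set<String>> roleMap, boolean available) {
        this.passwords = passwords;
        this.roleMap = roleMap;
        this.available = available;
    }

    private void ensureAvailable() throws UnavailableException {
        if (!available) throw new UnavailableException("backing store unreachable");
    }

    public boolean exists(String user) throws UnavailableException {
        ensureAvailable();
        return passwords.containsKey(user) || roleMap.containsKey(user);
    }

    public boolean authenticate(String user, String password) throws UnavailableException {
        ensureAvailable();
        return password.equals(passwords.get(user));
    }

    public Set<String> roles(String user) throws UnavailableException {
        ensureAvailable();
        return roleMap.getOrDefault(user, Set.of());
    }
}

/** aggregate-realm: one realm checks credentials, another supplies roles. */
class AggregateRealm implements Realm {
    private final Realm authn, authz;
    AggregateRealm(Realm authn, Realm authz) { this.authn = authn; this.authz = authz; }
    public boolean exists(String u) throws UnavailableException { return authn.exists(u); }
    public boolean authenticate(String u, String p) throws UnavailableException { return authn.authenticate(u, p); }
    public Set<String> roles(String u) throws UnavailableException { return authz.roles(u); }
}

/** distributed-realm: the first realm, in list order, that knows the user handles it. */
class DistributedRealm implements Realm {
    private final List<Realm> realms;
    DistributedRealm(List<Realm> realms) { this.realms = realms; }
    private Realm owner(String u) throws UnavailableException {
        for (Realm r : realms) {
            if (r.exists(u)) return r;
        }
        return null; // no realm in the list knows this identity
    }
    public boolean exists(String u) throws UnavailableException { return owner(u) != null; }
    public boolean authenticate(String u, String p) throws UnavailableException {
        Realm r = owner(u);
        return r != null && r.authenticate(u, p);
    }
    public Set<String> roles(String u) throws UnavailableException {
        Realm r = owner(u);
        return r == null ? Set.of() : r.roles(u);
    }
}

/** failover-realm: use the delegate; fall back only when it is unavailable. */
class FailoverRealm implements Realm {
    private final Realm delegate, failover;
    FailoverRealm(Realm delegate, Realm failover) { this.delegate = delegate; this.failover = failover; }
    public boolean exists(String u) throws UnavailableException {
        try { return delegate.exists(u); } catch (UnavailableException e) { return failover.exists(u); }
    }
    public boolean authenticate(String u, String p) throws UnavailableException {
        try { return delegate.authenticate(u, p); } catch (UnavailableException e) { return failover.authenticate(u, p); }
    }
    public Set<String> roles(String u) throws UnavailableException {
        try { return delegate.roles(u); } catch (UnavailableException e) { return failover.roles(u); }
    }
}
```

In this sketch, a wrong password against an available delegate does not trigger the failover; only an unreachable delegate does. Because each composite implements the same Realm interface it wraps, the sketch also allows composites to be nested, for example a FailoverRealm whose delegate is an AggregateRealm.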
Other significant enhancements include the new jboss-eap-installation-manager, used to install and update JBoss EAP 8.0, and its integration into the Management CLI as the installer command, which performs several server management operations in standalone or managed domain mode. JBoss EAP 8.0 also requires JDK 11 or JDK 17, as support for JDK 8 has been removed. See the complete list of unsupported, deprecated, and removed features.
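Because JDK 8 is no longer supported, a quick pre-flight check of the running JDK can catch an unsupported runtime before installation. The class EapJdkCheck below is a hypothetical helper, not part of jboss-eap-installation-manager or any JBoss EAP tooling. It hard-codes the JDK 11 and JDK 17 requirement stated above and reads the running version through the standard Runtime.version().feature() API, available since Java 10.

```java
import java.util.Set;

// Hypothetical pre-flight check; not part of JBoss EAP or its installation manager.
final class EapJdkCheck {
    // JDK feature releases that JBoss EAP 8.0 requires, per the release described above.
    private static final Set<Integer> SUPPORTED = Set.of(11, 17);

    private EapJdkCheck() {}

    static boolean isSupported(int featureRelease) {
        return SUPPORTED.contains(featureRelease);
    }

    public static void main(String[] args) {
        // feature() is the major release number, e.g. 17 for JDK 17.0.9.
        int feature = Runtime.version().feature();
        if (isSupported(feature)) {
            System.out.println("JDK " + feature + ": supported by JBoss EAP 8.0");
        } else {
            System.out.println("JDK " + feature + ": not supported; use JDK 11 or JDK 17");
            System.exit(1);
        }
    }
}
```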
Check out the product documentation to learn more about the Red Hat JBoss Enterprise Application Platform 8.0.