Podcast: InfoQ Culture & Methods Trends in 2025

Charity Majors, Ben Linders, Rafiq Gemmail, Craig Smith, Shane Hastie

Article originally posted on InfoQ.

Transcript

Shane Hastie: Good day, folks. I’m Shane Hastie, and I’m the lead editor for Culture & Methods on InfoQ.com. This is our annual Culture & Methods trends report episode, where we bring the whole team together and we have at least one special guest. Today our special guest is Charity Majors. I’m just going to go around the screen and, Charity, for those three people in the audience who haven’t come across you before, would you give us a little bit about who you are?

Introductions [01:24]

Charity Majors: Yes. Hi. Thanks for having me. Big fan of InfoQ. Happy to be here. This is one of my favorite topics. I am co-founder and CTO of Honeycomb.io, an observability company, and I come from ops, back when we called it ops, before we called it DevOps, before we called it platform/SRE. But I still very much identify as … I used to say production is where we take all that beautiful computing theory and combine it with messy reality, and that is where I love to live.

Shane Hastie: Welcome. Going around my screen, Ben Linders. Welcome, Ben.

Ben Linders: Thank you, Shane. Great to be at the podcast again and looking at the trends. Based in the Netherlands, but working worldwide. Most of the stuff I do has to do with improvement in organizations, at team level and at organizational level. There’s usually a combination of technical stuff in there, but also people stuff. A lot of my focus is on collaboration, psychological safety, exploring what’s happening in the organization, some kind of assessment metrics, gamification as a way to really get people engaged. Many different things for many different customers that I work with.

Shane Hastie: And moving on from Ben to Raf Gemmail. Raf, welcome.

Rafiq Gemmail: Hello, Shane. Hi. It’s good to see everyone. I am based in New Zealand. I’m originally from London. I am someone who is still passionate about building things. Started cutting code in the ’80s, and I went through the journey of building and then seeing ways of building and then getting involved in a bit of coaching and technical coaching teams, some agile coaching, bouncing back and forth. I love the idea of the pendulum. That’s been part of me because I can’t stop building because it’s a passion. But using that to empower teams to build it, to own it, to solve it, more importantly, ship it, and keep validating it.

And so, that’s my endless battle. I have been through educating people. All of this came up behind me as I thought I was going to get back into a hands-on role in an engineering team, and somehow in a short space of time, I’m like a senior director now in a MarTech firm. But it’s fun, and I’m waiting for the swing back. No, I’m not. I love what I’m doing, bosses.

Shane Hastie: Yes. Welcome, Raf. Craig. Craig Smith.

Craig Smith: Thank you, Shane. I’m Craig. I’m based in Australia. I am at the intersection, I guess, of the product space and the transformation space, but like Raf, originally started building things, realized that that probably wasn’t my forte, so I then liked breaking things, and quality has been something that’s always stuck with me on my journey. Now I do a little bit more of the inventing things. But, yes, again, it’s that product space.

But as Charity was saying, getting things into the hands of users is why I’ve always done what I’ve done, which is why the Culture & Methods part of InfoQ is so important because, yes, you can build things, but how you get people to work together and build things I think is the thing that sometimes people miss a little bit. I think we’re in a bit of that swing at the moment in the industry where it gets forgotten underneath the technology.

Shane Hastie: But aren’t we just going to adopt, I don’t know, a generative AI tool, talk to the computer and it’ll write all the code for us?

Craig Smith: Aren’t we all generative AI bots on this call, Shane?

AI Hype and Reality [05:08]

Shane Hastie: Well, I was a little bit sarcastic there, but I do feel that there seems to be a huge amount of hype. If there’s one big trend that I’m seeing, it is the AI hype. How do we get past the AI hype to AI as a potentially useful reality? Charity, can I throw that one at you to start with?

Charity Majors: Yes. I do think that sometimes in technology, the size of the hype predicts the eventual impact, even if it’s elevated. Then also I do feel like, in the industry, we’ve been through some successive cycles of hype around blockchain, for example, and crypto and stuff where I think that people are a little more fried than usual and a little bit cynical about this stuff. It doesn’t really help that so many of the loudest voices seem eager to put us all out of jobs. It makes it hard to warm up to some of this when it’s like, “Hooray, none of these skills you’ve spent the last decade or two or three decades building are ever going to be relevant again. Hooray”. It makes it hard to embrace it.

The thing that I have a hard time getting over right now is the fact that anytime the word AI, or the letters, gets spoken, it’s super unclear what they’re talking about, because there are so many different ways that it manifests all at once. Maybe they’re talking about how I can be doing things as an individual to be more productive or to build things faster, or maybe they’re talking about things that I could be building for my customers, or maybe they’re talking about things that I could be integrating … Data stuff. There’s just so many angles. If you’re not specific about what part of AI you’re talking about, I just see people talking past each other a lot.

Shane Hastie: Raf, you were talking before we got on about a hackathon and bringing in some of the tools. What’s your experience been?

Rafiq Gemmail: Yes. It’s been interesting watching over the last few years. I think when ChatGPT came out, I was in an education startup. It was a safe space to play and we were encouraging it. Then I moved into the corporate world, and adoption was growing a little bit more. We wanted to use it in our day-to-day. You’ve got those obvious questions of who am I surrendering my data to? Where’s it coming from? What’s our liability?

And so, we went through this initial, this … Yes, as an industry, this caution. I think that’s important because, really, we are throwing our data across to someone else, and the safeguards keep coming in, like don’t train your model with our data. We’ve got to a place now where I think the latest State of DevOps report talks about Copilot and AI-assisted tools being really, really important and just part of the practice of development now.

It also calls out the fact that the change failure rate has gone way through the roof, which is an interesting thing, because when I was reading it, I think it said something along the lines of we’ve got larger chunk sizes. We were all, for ages, trying to get down to small chunk sizes. Now you just say, “Robot, give me a solution”, and you get pages of stuff you might not even be able to read. If you’re really irresponsible, you’ll copy-paste that, or you’ll just accept it and it goes into production, and then you get more failures.

So I think we’re getting to a place where we’re using it more and we need to learn to use it a bit better. At the hackathon, though, it’s a safe space to fail. I went in and I was like, “This is skunkworks. We get to pull things together, do whatever you want”, and people came up with all sorts of creative stuff. Some were using the OpenAI API, and I think that’s often where AI is in many people’s minds, the de facto choice, but I’m seeing the teams now also using SLMs, local models running on Llama.

Just about two or three days ago, I think I read about Docker pushing out a Docker model command that even runs on the Mac, which is special, because I could run … I’m going to digress, but they’re creating the tools here that are really enabling developers to do all sorts of spectacular things. And our demo at the end, I don’t think I’ve seen a demo that good, because people were using the assistive tools to do creative things in a crazy short amount of time. We went in and I’m like, iron triangle. Hack days are always about optimizing scope. I think some people didn’t have to give up scope because they could smash out more.
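For anyone who hasn’t played with local models, a minimal sketch of the kind of thing those hack-day teams were doing might look like the following. It assumes an Ollama server exposing its OpenAI-compatible endpoint on the default port and a small model that has already been pulled locally; the model name is only an example.

```python
# Minimal sketch: talk to a locally hosted small language model instead of a
# cloud API. Assumes an Ollama server running on its default port with an
# OpenAI-compatible endpoint, and a model pulled beforehand, e.g. with
# `ollama pull llama3.2`. No data leaves the machine.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local endpoint, not api.openai.com
    api_key="not-needed-locally",          # required by the client, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.2",  # example model name
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Suggest three hack-day ideas that use a local SLM."},
    ],
)
print(response.choices[0].message.content)
```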

Charity Majors: Yes. Hilariously, we did an AI hack week last week, and that thing you just said about they didn’t need to sacrifice scope really resonates with me because we had everything from … We had a team that just whipped out dark mode for the product internally. Not only that, but also used it as a … We’ve been rolling out an internal design system, and they used this as a way to see which part of the code base had been converted to use the design system and which hadn’t, because it would show up in dark mode or it wouldn’t. I was just like, “Damn. This would’ve taken so long”.

We had some folks doing like explain your database query in the Honeycomb UI. We had folks doing like import all your dashboards from this other vendor to Honeycomb. We did … There were 10 different teams and they were all over the place. I was expecting great things, but I was still taken aback by just the range and the depth and the creativity and how much ground they were able to travel.

But the thing that you just said about … God bless DORA. The DORA report, I think they’ve been around for 10 years, and this year’s report, I think, was the best they’ve ever done, because they really dug into how is it going with AI, and some of this stuff is good and some of it is a real step back. Like, yes, we’re shipping faster than ever and the code is worse than ever.

I feel like learning to deal with this unprecedented influx of code of unknown quality, you could … Most teams to this day are used to debugging hard problems by looking for the expert, who did this, who wrote this, who built this, who understands this deeply, and you can’t count on that being anyone anymore.

Shane Hastie: Craig, your thoughts?

The AI divide [11:30]

Craig Smith: Yes. So I agree with what both Charity and Raf said. I think the thing I’m also seeing is the divide between the haves and the have-nots in this as well. So if you’re on the cutting edge and really have invested the time in this … Which I mean you do have to, and I suspect many InfoQ readers and listeners will be there. I just looked at my inbox as I joined today and there’s 15 AI newsletters in from overnight.

I think unlike any other technology, this is something that’s both accessible to everybody, but also has probably been the most publicized. So, therefore, its hype is also the highest. But by the same token, there are so many organizations that are reluctant in this space.

And so, you talked about ChatGPT, which I think is, yes, the default in the media and for most people the go-to for this, but actually when you get down to more organizations, like Shane and I spend a lot of time with people like government and large corporations, it’s Copilot. That’s their experience of it … And a very locked-down Copilot.

So there’s lots of stuff heading way off into the atmosphere, but also the amount of people being restricted is equally concerning. Like any technology that takes off like this, there’s the impact that it’s having on things like, yes, being able to understand what comes out, because, as you said, I can suck in a massive document and summarize it or put it into place somewhere, but sometimes at that speed, my brain then goes, well, now have I read all of it? You scan it and go, “Yes, that looks okay”. When you start doing that with code, that’s where it starts to get really worrying.

And so, like everything, we’re going to go through this cycle. And so, part of my prediction is that … I talked about quality at the start … testing hasn’t kept up. We’ve moved to this place of we don’t really need as much testing anymore. We can give that to the engineers. We can give that to the public to do. We can outsource that. But now the ability to actually stop and question and go, “What does this actually do?” is something that’s going to come back around again, because it’s going to have to, because I think with that DORA report, we’ve got the start, the people who’ve … those early adopters where maybe the quality issues haven’t done anything too dramatic at this point.

But as we start to get into things more than that, I think then we’re going to actually start to see a bit of a pullback, and it’s going to be how can we use this in a measured state, as opposed to the moment, which is more of a hack or an early adopter state.

Charity Majors: Spray and pray.

Shane Hastie: Yes. One of our recent guests on the podcast, Adam Sandman, spoke about a study that was looking at 300% more code being released and 400% more bugs as a result of that.

Charity Majors: That they know of. That they know of.

Shane Hastie: That they know of, yes. Ben, what are your thoughts on this?

Impacts on quality [14:39]

Ben Linders: My thoughts on AI are that it’s a useful tool if you know how to use it, if you look at it on an individual level. So if you’re an engineer, also if you’re a tester, if you’re involved in product management, and you know how to work with AI, how to come up with good prompts, how to validate the output that AI is providing, yes, it can help you to be more productive, to deliver better stuff, and to do a better job there.

But software development is not about an individual doing something and another individual doing something. Software development is about working together. The key thing in software development, if you look at high-performing teams, is collaboration, and I don’t see AI providing a solution for that.

On the contrary, I think people are asking much more of AI and then looking at what AI is producing and thinking, okay, we’ve got the answer right now, so we don’t need to go to other people in the organization anymore. We don’t need to collaborate. We can do it all individually and just rely on AI to get stuff together.

So I think the key thing is collaboration. Right now, I don’t think AI is providing a solution for that, or people are at least not exploring if there’s a solution for that. That is where really high-performing teams are making a difference.

Rafiq Gemmail: There’s something I’ve seen, Ben, that relates to this, which is that we’re in this place where, as you said, people are writing code very easily, more bugs are going out. They feel that they can be autonomous and work on their local machine and, “Hey, I’m talking to an assistant. It’s told me all about the product requirements”, whatever, or they write code and they ship it really quickly.

I’m hopeful that we’ll start realizing soon that there is something else that needs to be in there, like an enabling function, because of the things that I’ve seen. The move away from the test team is great, but there are those who are moving really quickly without observability in prod, and they’re not looking at: this thing I pushed out, am I pushing it out in a conservative fashion? Am I rolling it out? Am I blue-green, using canaries? Am I watching metrics on it? Am I …

Beyond code generation [17:00]

The person who inspired that is here. But that whole testing in prod thing may be encouraged more, validating safely that you’re releasing, which means flow might increase, perhaps. People talking. We still need reviews. You still need to have some gates before something goes out, maybe little micro ones if you’ve got small changes. But as we screw up in prod, maybe we’ll shift some of those processes back in so that we’ve got some guardrails, is the hope. At the same time, there’s a part of me which is like, wouldn’t it be great if we didn’t block on reviews and we could just take out a multi-factor review? So yes.

Charity Majors: I think … Actually the AI revolution in tech started with generating code, because that’s always been the easiest part, despite … I’m a little sensitive about the engineering classes and like … When I was coming up, it was like, well, the developers are the ones whose time is valuable, so they write the things. They throw it over the wall. Ops people will figure it out.

I’ve always been like, that’s the hard part. Production, reality, migrating, upgrading, maintaining, extending … That’s the hard part. And so, I feel deeply vindicated by this series of events, and then also …

Somewhat tongue in cheek, somewhat not, but I also feel like of course it was first and of course we’re now seeing people apply these very powerful tools to what comes next.

Testing has been mentioned a few times, specifications. Who the hell knows how to use specifications, right? If ChatGPT is just going to go off and generate code to do whatever you told it to, but there isn’t really any way to consistently … It’s almost like that’s … We went from machine language to assembly language, to C to C++, to Python and Ruby, and next it’s like a specification language. It feels like there’s interesting stuff bubbling up there around how we can use these powerful tools, not just to generate lots of code but also to …

Some of the most exciting pieces that I’ve seen have been using generative AI models to accomplish giant gnarly refactors or rewrites. Now many of us have been … I’ve been scarred by rewrites. At Parse, we wrote the original Parse API on Ruby on Rails. Turns out when you have a million customers and a fixed pile of workers, it’s just not … You need a threaded model. So we were like, “Ah, this will probably take us six months to rewrite from Ruby on Rails to Golang”. It took us two years, almost nothing shipped during that entire time.

I think a lot of us have these really like … We’re scarred in our memories, and this is the kind of thing. We talk … It’s easy to use words like, “Oh, well, these things can be good at doing the things that users don’t want to do”, but it can be really good at slogging through a lot of these really time-consuming, detailed things that were really the beating heart of software that runs the world, airlines, banks, delivery companies.

I’m really excited about the fact that maybe The New York Times won’t have to run their core billing thing on something that uses COBOL anymore, which they’ve never [inaudible 00:20:25] get rid of for 50 years. I’m really excited about this stuff. It’s like, yes, we can generate endless lines of code. What else can we do with these tools?

Rafiq Gemmail: I had, Shane was there two days ago, like a pre-conference, the hackathon, and an AWS guy there was demoing Q. When AWS pushed Q out, they said these … I think it was like 40 years of development, or maybe even more, that they said they’d saved in migrating things from a historic version of Java to Java 17 or 21 or something, and that was really impressive. So we’ve got a migration in play from Rails to Micronaut. I challenged him to do it. He didn’t actually do it live, but he did give us little live demos of it, and it was quite impressive.

Yes, I think there are lots of stories there, but that, again, could take … As you said, something that could be a massive project and at least give you a big boost even if it doesn’t do the whole thing. I still think that you need the developers looking at the code, validating it makes sense, all of that stuff, but it’s almost like people are getting … Actually this is an insight right now. People are getting to step up as coaches a little bit more to be the senior dev to the … Yes.

This is actually what I’m trying to achieve: some of the cognitive load of … What was it? The extrinsic cognitive load of bad code and running it and spending ages slogging through is gone, and now you’ve got the focus on here’s the problem, and I want to solve the problem and read the code and understand it and get there.

Charity Majors: The more tightly scoped the problem, the greater chance AI can help with it. I think it’s telling that a lot of these wins … Like Stripe had another one of these. They rewrote from … Or they rolled out TypeScript instead of JavaScript or something like that, and it’s still like a six-month project. But there are some of these projects that it’s like the longer it takes, the scarier it gets, because the more changes under the … And I feel like the wins that we’re seeing with generative AI there, those are the ones that get me really excited.

Shane Hastie: Segueing a tiny bit, what do our junior engineers do in this space and how does a junior engineer become more experienced when they’re not involved in defining the code, writing the code?

Accelerating the advancement of junior engineers [22:52]

Charity Majors: I’m so glad you brought this up. I’ve seen a lot of folks out there saying things like, “Oh, I’m never going to hire another junior engineer”, and I think that is so short-sighted. This is an apprenticeship industry. We all learned by learning from other people.

Today’s junior engineers, they know so much more. They’re so much faster at learning. When I was a junior engineer, it was like, “Okay, some HTML and you can use a Linux command line. Cool”, drop you into the deep end and you’ll figure it out. Today’s junior engineers, they know a lot of stuff and they have a lot of resources.

One of our junior engineers, Ruthie, they wrote this wonderful thing about how they use AI every day. It’s like they’ve got a little coach sitting in the corner, and they’re constantly asking it questions. It’s not that they don’t ask our senior engineers questions, but they go through the early drafts with their little chatbot until they have a really sharp … They’ve gotten the low-hanging fruit out of the way and they’ve got a really sharp question to ask the senior engineer. The junior engineers that I see are so impressive, and they’re not staying junior for long.

There’s this perception out there. I saw somebody post something on LinkedIn, I got super riled up and I had to respond, where he was just like, “Why would you even want to hire a junior engineer?” He was acting like they’re a net negative indefinitely. I’m like, look, any engineer that you hire is a net negative for a while, but you could hire the most senior principal engineer, they’re going to be a net drain on your resources for some amount of time while they’re leveling up. Junior engineers will be a net cost or expenditure, but all of these are investments.

If you’ve built a team that values learning, that values building systems that are easy and simple to navigate, if you build a team where everyone is expected to be curious and creative and constantly pushing their boundaries, that is work that gets reused over and over again, whether it’s somebody deploying code and owning their own code in production, whether it’s individuals moving around within a company, whether it’s new hires, whether it’s leveling up juniors. Building a system that values learning and growing is the number one thing that you can do to future-proof your team, and junior engineers are essential to that ecosystem because they stress test it.

Shane Hastie: How do we as an industry get better at building high-performing teams? What are we seeing?

Building high-performing teams [25:22]

Ben Linders: Well, one of the things we’re seeing, and it actually also relates to the thing that we were discussing on junior engineers, is if you want to build high-performing teams, you need to have everybody really involved in the team, everybody contributing the best that they individually can and contributing to the team as a whole. If you want to make that happen, a key thing is that there should be psychological safety in the team, that people feel safe enough to try out things, that people feel safe enough to bring in their ideas, to experiment, to work together with other people, and to share their thoughts.

That’s also a key thing, by the way, with the junior engineers. If they don’t feel safe enough to bring in their ideas, they’re going to go back to their room and just do their stuff, and you’re going to lose out on a lot of interesting things that they’re actually doing. So the key thing is to get them involved, and you need to have a level of trust and psychological safety for that to happen.

This is something that … I think I said in last year’s podcast that psychological safety has taken a hit with COVID and all the things happening afterwards. I was hoping that things would be getting a little bit better by now, but I think this is still something that’s being challenged with all the stuff that’s happening in the world, and it needs specific attention in teams to get this level of safety, so that people feel safe enough to bring up stuff and bring up their ideas. So this is something that teams really need to work on.

Rafiq Gemmail: There’s something I’m seeing, Ben, in relation to that, which is that I know a lot of people in the coaching space, the capability enablement space, and in the current climate, it’s not very nice to see, but we’re not investing in that area. I was chatting with someone who’s been a victim of that … Someone who moved from agile coaching to enabling product teams and such at a financial company, a fast-moving technology company, and he was talking about the fact that there’s this move in his org towards technical product managers who are coming in, and it’s very much cost, cost, cost, plan, plan, plan.

The impact of losing coaching roles [27:40]

I see a lot of people being laid off at the moment, and we need those capabilities to enable the team, to make them step back, to help them step back. Those are leaving a little bit. I’m seeing it fall often into the hands of managers or people who’ve stepped into that. To Charity’s point, it’s a different role. It’s a role where people are still building those capabilities.

I run a new managers’ circle because we’ve tried to promote some people internally. The people I see having a really interesting growth path, who want to track towards management, they’re in that space for now. They coach each other. It’s a learning journey, and I’m putting in the effort to help them on that journey. But I think there are many who may not get that, and suddenly it’s on them to do the bigger enablement of the team, and sometimes they’re engineers who’ve just stepped up, and you do need those capabilities.

I don’t know if … The controversial one is that there’s this side conversation about post-agile. I don’t know where I’m going with that, but that’s a topic out there. Where are we next? Are we living the values and principles enough? Many people think we are in our teams, but they’re strangling out that capability. Maybe we can be complacent in the short term that, hey, we’re already doing CI/CD, we’re already enabling our teams. But I feel that we’re letting go, quite often, of a key capability to help people work together and to help them understand what the bigger picture is and step back, so we’re not just reactive and cost focused. I’m not sure where I was going with that one, but I see something going on out there, and-

Craig Smith: I think what we’re seeing is the gap between the … Again, I mentioned this before. The haves and the have-nots is growing. So if you come back to the junior programmer thing, I remember 15 years ago bringing on new developers and reminding engineers that when they’re teaching them things like TDD, just because you know that you can jump six lines of code ahead and get a test, when you bring on a new engineer you’ve got to actually take them through it. As Kent Beck laid out in his book 20-odd years ago, you have to go line by line by line, and that was 20 years ago.

But if you go now, it’s not just things like TDD, it’s all the technology in our IDEs and all the technology in AI and all the things that jump around it. For most of us on this call, not to date anybody, but we’ve at least been around through successive technologies, so you can use something like AI because you know what it’s trying to do behind the scenes.

The disparity across organisations [30:10]

And so, this has been a progressive problem. It’s been there since the start of programming. Now that’s the same thing that’s happening in relation to the culture part. We’ve been through the 20, 25 years of agility and now there’s this perception that, well, everybody understands what that means. But if you haven’t been through that process before, you start to take for granted things like psychological safety and inclusion and all of those things that need to be there because, again, you’re used to a way of working.

But what really concerns me is that, Raf, you were saying that, yes, you have all the things like CI/CD and things like that. The majority of organizations that I work with don’t. They’re living under the illusion of having those technologies in place. They’re still only just grappling with the basics of working in an agile way because it’s still mostly more traditional ways of working. They’ve got some elements of CI/CD in there, but it’s not really working appropriately. And so, then you start to add newer technologies and newer ways of working on top of that and they haven’t got the foundation properly done. That’s the problem with any of these technologies. It’s okay if you’ve got that foundation.

And so, the smart people in this world, like all my learned friends on here, we get that because we’ve been on that journey. But so many organizations are just scrambling to make ends meet. What’s worse in this climate is that then you get a sniff of a new technology and you get rid of those people who understand how it actually worked. What concerns me is that we’re going to see more failures come because we haven’t quite got the foundation, or we haven’t had someone who actually understood what is going on to be able to pick it up.

Charity Majors: That’s such a good reminder of just the sheer diversity in the ecosystem of companies. I somewhat wish that anytime anyone gave me advice, there was a law that mandated them to give the context that it applies to, because I feel like I come from one world. It’s very different from the world that you’re talking about, Craig.

When it comes to how we build more high-performing teams, I feel like these are sociotechnical systems. From a technical standpoint, the thing that I feel is the starting point, the tip of the spear, is always to reduce the amount of time between when people are writing code and when it’s in production. Make that feedback loop as short and as tight as you possibly can, because these are systems, these are feedback loops.

Every engineering org is so many interwoven feedback loops. Then what happens with these amplifying feedback loops is the farther upstream you can make a change, it has these ripple effects as it goes downstream. That loop between writing the code and seeing it in … Hopefully you’re instrumenting it as you go. You look at it in production, you’re like, “Is it doing what I expect? Does anything else look weird?” It is a different company that does this on the order of minutes or a small number of hours versus a large number of hours or days or weeks. These are different companies, right?

Shane Hastie: Yes.

Charity Majors: So from a technical standpoint, that’s where I always tell people to start, that and … This is somewhat self-interested. Obviously I care a lot about observability, but observability is the über feedback loop. It is the sense-making for everything you do in production. The farther upstream you introduce it, the more reliably you can make sense of your production systems through these feedback loops. Shrink the loop between when you do something and when you can see what it’s doing, and make sure that you can actually see what it’s doing. Those are the two things that I always tell people to start with.

I also feel like there’s this tendency … In sociology, they call it the FAE, the fundamental attribution error, to overestimate our control over the world and underestimate the system’s influence over us. Where this manifests in engineering orgs is we scrutinize individuals, but what matters is the team. What matters is what we can ship and support as a collective unit.

The dangers of monoculture [34:07]

I also want to put … This is something that’s very controversial in today’s world, but diverse teams are resilient teams. A monoculture can move super fast, but it gets derailed. Someone gets sick, someone gets pregnant, someone leaves, someone … And diverse teams, where … I’ve been the first woman to join many companies, and I’ve never minded it, but I come from an unusual background. In general, you really want no one to feel like the outlier. You want it to be like, okay, we’ve got a diverse range of experience, ages, gender identities, racial backgrounds, so that nobody has to feel like the outlier, because these are resilient teams. They know how to roll with it when things change.

I also feel like something that really comes into play with AI is outcomes over outputs. Outcomes are what matter, outputs are not. I think that when you’re looking at how to build more high-performing teams, it’s about looking for ways to identify outcomes and to focus on those. Outputs are just so messy, and it’s gotten easier to generate more of them than ever, but really zeroing in on what outcomes we’re trying to achieve and how we get there is, to me …

And the last thing I want to say is, the world that you inhabit, Craig, where these teams are just trying to get a handle on agile. It is so much easier to do this once you’ve seen it, once you know what good looks like, once you know what great looks like, once you know what it feels like to be part of that team, which is why I was saying earlier, this is an apprenticeship industry. We learn by observing each other, by experience. I’m going to pull up here. I don’t know where I was going with that, but I think it’s really important.

Shane Hastie: Yes.

Rafiq Gemmail: To build on that, this whole podcast produces a trends model, and the trends model follows the trend adoption curve. What’s it called again, Shane?

Shane Hastie: Technology adoption curve.

Rafiq Gemmail: Technology adoption curve. You’ve got these phases of innovator, early adopter, early majority, late majority. I’ve been in government, I survived. I’ve been in corporate banking, I’ve been in startups, and they all fall in different places on this curve. I’m in a megacorp, which works really well and we have a lot of really good practices. But even in there you see that you can have different islands which are different stages of adoption. You can go into one island and you can point at another one.

I think there was a term I remember hearing long ago, lighthouse teams. You point out … Like there are some really good teams doing amazing things here, because you see people have got into a local optimum, and one team thinks that a two-weekly release is really good, or a monthly release, and we’re doing really, really well, and even today, sometimes, because of constraints and requirements. You can point at where that’s been addressed elsewhere. Being able to perhaps recognize through your own lens where these people are helps to point out others.

The other one when I get in a new context, which is really helpful, is, to your point, I want to bring in some people who know what good looks like, because they can help others on that journey. You see those in the team, and that’s an accelerating factor.

Measuring waste and productivity [37:38]

So I think from what I hear is that there are people in different stages, and understanding where they are is helpful. I often make the point when people are saying, is AI going to wipe out engineers? I’m like I don’t know what the longer term thing is. I think we really need them. I see people doing vibe coding and messing up, but we’re at the start of a curve. This thing is going to get better and better.

But at the same time, there are people out there, as you said, that are still maintaining COBOL. I don’t know why, but there are. There are people who are discovering agile. We think that’s really weird. But somewhere in the world, there is a team which is like, “Have you seen this thing people are doing?” I’ve been in places where people see something that you’ve seen for 10 years and they think it’s brand new. So I think there’s going to be like a curve. There’s the long tail that will catch up.

To high-performing teams, I’ll throw in the other one, which is engineering metrics. Abi Noda’s people at GetDX came in earlier. Was it last week or the week before? And they’ve got an amazing platform. I use spreadsheets and other things to measure lean waste. People can go into Slack, it’s like a cannibalized machine, and they can say, “I just lost time. I lost an hour, 20 minutes, 30 minutes because I can’t run my build or I didn’t understand the requirements”, and they can classify it. It’s helped me prioritize things that speed up their flow.

There are tools out there that let you measure that more readily. Not to plug a company I’ve got nothing to do with, but their platform’s amazing because it’s very engineering-targeted-

Charity Majors: What’s it called?

Rafiq Gemmail: … for people to use. Getdx.com. They essentially give you a big data lake. They pull your Jira data, your GitHub data, and they give you ways of combining that and writing your own CTEs and queries to pull out the metrics you want.

Charity Majors: We’ve been using one called Multitudes that we also really like.

Rafiq Gemmail: I’ll make a note of that. But those things are really, really useful for teams to look at. So I do service delivery reviews where we look at cost, we look at cycle times, we look at throughput, we look at incident counts, we look at mean time to recovery, we look at product metrics, the most important thing, because we could be optimized but not actually shipping and delivering value to the customers. Looking at that on a regular cadence, for me, is another part of a high-performing team, because you know the impact, the feedback loops, of what you’re doing.
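To make the idea of combining delivery data and querying it concrete, here is a minimal sketch. It is not GetDX’s actual schema or API; it assumes two hypothetical CSV exports, one from Jira and one from GitHub, and uses pandas to produce a simple cycle-time and throughput view of the kind discussed above.

```python
# Minimal sketch of the "combine your delivery data and query it" idea.
# Assumes two hypothetical CSV exports; column names are made up for
# illustration, not any vendor's real schema.
import pandas as pd

# issues.csv: issue_key, created_at, resolved_at
issues = pd.read_csv("issues.csv", parse_dates=["created_at", "resolved_at"])
# pull_requests.csv: pr_number, issue_key, opened_at, merged_at
prs = pd.read_csv("pull_requests.csv", parse_dates=["opened_at", "merged_at"])

# Join the two sources on the issue key, much like a CTE joining two tables.
combined = issues.merge(prs, on="issue_key", how="left")

# Cycle time: from issue creation to PR merge (fall back to issue resolution).
combined["done_at"] = combined["merged_at"].fillna(combined["resolved_at"])
combined["cycle_time_days"] = (
    combined["done_at"] - combined["created_at"]
).dt.total_seconds() / 86400

# Weekly view: how many items finished and the median time they took.
weekly = (
    combined.dropna(subset=["done_at"])
    .set_index("done_at")
    .resample("W")["cycle_time_days"]
    .agg(["count", "median"])
)
print(weekly)
```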

High-performing teams are not afraid of metrics [40:09]

Charity Majors: High-performing teams are not afraid of metrics. I know a lot of engineers are very … Believe me, I’ve seen this internally. People are just like, “Don’t measure the stuff that we’re … “, and it’s like, look, this is not going to be used in performance review. We’re not going, “Well, gee, who has shipped the fewest lines of code this quarter? You’re … ” That is not what the … Engineers being engineers, they will optimize for anything that they know that you’re watching closely, but it’s not about that. It’s about giving people some visibility.

These are conversation starters, not conversation enders. You can’t just have a couple of metrics be like, “Well, we’re looking at number of pull requests”. Insanity ensues. But anytime that you’re looking at this, you need to make it clear that you’re looking at a basket of metrics and you’re not using it to make judgments on people. You’re using it to help identify bottlenecks in the system.

Ben Linders: But let’s go back again to what you mentioned earlier, Raf. This goes back to giving attention to these kinds of aspects. This is where a lot of teams are failing right now, certainly when they’re under more pressure. So the mere fact that you spend time looking at how we are performing, what are we doing, what are the metrics telling us, what can we derive from this, having time for curiosity, trying to really understand what’s happening in the team, that is the key thing to get this kind of improvement going. So it’s making space for improvement to happen and giving attention to it. That is the key thing that a lot of organizations are lacking.

Craig Smith: And it needs that-

Observability trends [41:50]

Rafiq Gemmail: I would add, for the listeners, that it can take 20 minutes to set up a dashboard sometimes, even a minimal dashboard. You can look at it, and those sessions don’t have to be long. It’s like looking in a mirror every morning.

Ben Linders: It’s not much time, but it’s the mere fact of allowing people to actually give attention to this and to spend time on this, even if it’s just five minutes.

Charity Majors: Yes. Any team that is running at 100% … This is a fuck it … Sorry, my mouth. This is a failure. Any freeway that is 100% full is stopped. It’s at a standstill, right?

Craig Smith: Yes.

Charity Majors: These systems, there’s a bunch of … I don’t remember the studies or whatever, but you don’t want to run a system at more than 70, 80% capacity because you’re going to need to spike sometimes. There needs to be flex. There needs to be room for curiosity and there needs to be room for experimentation. If you’re running a team at 90% all the time, you’re burning people out, and that’s stuff that takes a long time to recover from.
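The studies behind that rule of thumb largely reduce to basic queueing behavior. In the simplest single-server queueing model (M/M/1), the average number of items sitting in the system is

L = \frac{\rho}{1 - \rho}

where \rho is utilization. At \rho = 0.7 that is roughly 2.3 items, at 0.9 it is 9, at 0.99 it is 99, and as \rho approaches 1 the queue grows without bound, which is exactly the 100% full freeway at a standstill.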

Craig Smith: Yup. That’s all good, I think, if you’ve got the culture throughout the organization to allow the time, because I take your point, Raf. When I go in, it’s just like, surely you can find 20 minutes. But by the same token, I see so many organizations … And, in fact, if you think of the way the world is right now, people aren’t stopping to look at the things below. They’re making strategic decisions with absolutely no thought for what’s actually happening below the surface because they’re reacting to the fast-natured world that we’re in, and you have engineering leaders that aren’t really engineering-focused.

So teams are often then left in that small ecosystem where they almost … The way I started even my whole agile journey was just by hiding it because I wanted to get stuff out. That works if you can get the whole team behind you, but all you need is someone on that team who’s like, “No, no. I think I’m going to follow what the whole organization wants to do”, and they fall into the trap.

That’s why we’re getting that divide so wide and that’s why I say there are so many organizations out there with so many people who just don’t even get the ability to try these things because it’s like, no, let’s continue to go the tried and trusted route, or they’re actually eliminating a lot of these steps because it’s like, no, we’re in this world now where we just have to make a strategic decision and run with it, or we don’t have the money because we just have to … Don’t do observability, just ship stuff because we want to cut that thing out. That’s the scary thing about the world that we’re sitting in right now.

Charity Majors: One thing I will say, I think the zero interest rate phenomenon era was terrible when it came to engineering leadership, because cost is an attribute of systems and architecture. I just think it was really bad for us. I think we got really detached from a lot of the realities of business, and I think we’re in a bit of a catch-up period now.

But when it comes to C-levels, VPs who are charged with overseeing, let’s say, engineering, R&D organizations, but they don’t have a background in engineering themselves, it’s really dangerous, because there are so many things about building and supporting high-performing R&D organizations that sound completely counterintuitive. They sound crazy, let’s just put it that way.

The idea … Like change review boards. Well, that just makes sense. Especially if you have someone as your CEO who comes from a finance background, they’re pegged to accounting rules where it makes no sense to have the same person submit the receipt and sign the receipt. So obviously you need this in your security function, too. You can’t have the developer write the code and commit the code.

This is such a source of just fucked up shit. I’m sorry, there are no other words that will describe it. I feel like execs, look, if you think that you’re running an organization where technology is a differentiator for you, it’s a value creator, you have an obligation to understand enough so that you’re not just cutting the legs out.

What I tell folks is read Accelerate, at least read Accelerate, because they’re so cautious, data-driven. Data-driven. I’m just going to underline that again. It’s like, look, this is what hurts engineering orgs and this is what builds good ones, because it does sound crazy that a change … A board of people looking at each release wouldn’t lead to higher quality outcomes. But the data shows that all that does is slow you down and make your outcomes worse.

Shane Hastie: Leaning into observability and FinOps, Charity, you made the point cost is a first-class citizen. It hasn’t been.

FinOps: cost is an attribute of systems architecture [46:44]

Charity Majors: It hasn’t been, and we’re all paying the price now, aren’t we? This is so top of mind for me because I’ve spent the last … So in 2018, I wrote this tiny little blog post, it’s like 500 words, where I’m like, I have no data. I’m pulling this out of my ass. But observationally, it seems like people who have good observability are spending between 20 and 30% of their infra bill on it. I had forgotten about this. Nobody looked.

This last year, this blog post has started showing up. Everybody’s … There are funding announcements that are linking to it, and journalists are like, “Experts say…” I’m like, the experts said that they were pulling that out of their ass. The experts were very clear. That was just an opinion.

Anyways, the cost of observability has been in a lot of people’s minds for the past year or two for really good reasons. And so, I’ve spent the last couple of months doing research, like trying to talk to people. It turns out it’s really hard to find out what enterprises are spending on observability because they don’t know. They don’t know how much they’re spending on infrastructure so they definitely can’t tell you what percentage of their bill is being spent on observability. But it’s becoming …

Gartner did this webinar recently where they produced some actual data. They talked about this one customer of theirs that in 2009, they were spending $50,000 a year on observability. Now in 2024, they’re spending $14 million a year on observability. This is like a 40% year over year increase for 15 years. So this is going to put us all out of business if we don’t get a handle on this. So I’m doing research … Anyway.

The first question that comes out of all this is – is your observability an investment or a cost? If it’s a cost, it should just be minimized. If it’s an investment, that means it pays for itself and then some. Maybe it pays for itself 5 or 10 times over. And so, your job in stewarding this is to figure out where that diminishing returns line is and spend more up to that line, and then find ways to standardize, find tiers, all this stuff. Anyway, very much on my mind right now.

Observability overwhelm [48:58]

I feel like the complexity of the stuff and the numbers that we’re starting to talk about means that this is turning into a data engineering problem. Actually, I feel like all of the observability stuff is moving over the next five years into a data lake where you store things once and you rely on AI-enabled clients to surface it. But having everyone pull data from the same data source actually has these great sociotechnical ripple effects, because you stop arguing about the nature of reality. You stop having this cost multiplier.

Gartner said that most of their customers are using between 10 and 20 tools, which means that for every request that enters their system, they’re storing observability data 10 to 20 times over. It’s a 10 to 20x multiplier. Their observability bill is growing 10 to 20x faster than their business is growing at a baseline. Clearly unsustainable.

But the other thing I want to say is in some ways this is a good thing. In 2009, when people were paying $50,000 a year, what were they getting? They were getting some health checks, some super basic CPU metrics, memory metrics, disk space, health checks for their monolith app and their database and their web tier. I feel like, at its root, this explosion of cost is due to the fact that the complexity of our systems is exploding, and it’s been exploding and there’s no end in sight. It keeps curving upwards faster and faster.

The fact that they had quotes from folks who were like, “It took off like wildfire. We rolled out real observability and it just took off … “, and you can’t pry it out of their cold, dead hands. I’m like, yes, because for a long time we used to interact with our systems mostly via the snapshot that we kept in our heads. This is what the system looks like. So we’d SSH to that node. We’d pull up this dashboard. But our systems are now so complex and so dynamic and they’re changing so fast that this snapshot in our heads is out of date by the time we try and use it.

The only way to understand and interact with our systems is through our observability tools, which means that they have to be complex enough and dynamic enough and fine-grained enough. They have to be an analytics platform for us to understand how deployments are working, how our software is behaving, and what kind of experience our users are getting. I feel like the last five years have been an era of extreme catch up in that regard, but I do hope the cost will start leveling out for folks soon.
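A minimal sketch of the wide, structured, per-request event idea behind that kind of analytics-style observability follows; the field names are illustrative rather than any particular vendor’s schema, and printing stands in for an exporter or agent.

```python
# Minimal sketch of "observability as wide structured events": one rich record
# per request, with enough dimensions to slice by deploy, user, endpoint, etc.
# Field names are made up for illustration.
import json
import time
import uuid

def handle_request(user_id: str, endpoint: str, deploy_sha: str) -> None:
    event = {
        "timestamp": time.time(),
        "trace_id": str(uuid.uuid4()),
        "service": "checkout",
        "endpoint": endpoint,
        "deploy_sha": deploy_sha,  # lets you compare behavior across deploys
        "user_id": user_id,        # lets you find one user's bad experience
    }
    start = time.monotonic()
    try:
        # ... do the actual work here, adding fields as you learn things ...
        event["cart_items"] = 3
        event["payment_provider"] = "example-pay"
        event["status_code"] = 200
    except Exception as exc:
        event["status_code"] = 500
        event["error"] = repr(exc)
        raise
    finally:
        event["duration_ms"] = (time.monotonic() - start) * 1000
        # Ship one event per request to wherever you analyze them; print()
        # stands in for an exporter or agent.
        print(json.dumps(event))

handle_request(user_id="u-123", endpoint="/checkout", deploy_sha="abc1234")
```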

Rafiq Gemmail: I was at the BBC back in the late 2000s and we used Splunk. Very expensive. People will not pay for it today unless you force them and you beg them. But there are other options out there. So I love all your observability 2.0 stuff, because we used to get everything in there. You could understand customer metrics and you could dashboard things. It was really, really powerful. But, as you said, it was the lake. Everything went in. I was able to create product metrics, performance metrics, see patterns in behavior over time, compare it, all of that stuff.

Then we started … I came out to New Zealand and we were starting to use the cheapest stuff. We had the Elastic Stack, the ELK Stack, and we had other things which were a bit more restricted. Then people started having different solutions. I mean, historically, you did as well. But I see so many teams in the microservices world which have a bit of Grafana and they use CloudWatch for something else.

One team is all Datadog and whatever it is. You’ve got all these different tools. And so, your thing about being able to ask the right question, I remember being able to go in with the question a decade ago, and you try and make structured events and make the things easier to query. But I love … For those who don’t know, there is this idea of the lake of observability stuff that Charity was talking about. You, in fact, had a post I saw around April Fools and I thought it was an April Fools joke because it was observability 3.0.

Charity Majors: Yes. I didn’t think about that.

Rafiq Gemmail: I probably shouldn’t use the 2.0 term, but this lake of observability where you can just go and ask the right questions. That’s really powerful because sometimes I find myself wanting to ask a question, and half of it is in one observability tool, another in another one, and you’re left to correlate.

Even in the cost environment, I’ve seen people move away from the right tool, which gives you product-level visibility, to something that’s cheaper, or try to cheap out on it, which makes your life harder. And the value, I mean there’s a question about what is the value of that observability tool? What’s the return on investment? Because I don’t think we’re often asking that. We’re trying to see it as part of the infra-cost problem sometimes, and that disables teams.

Charity Majors: Yes. One of the … Like this field is so diverse that there are very few blanket recommendations that I will issue. One of the blanket recommendations that I do issue in my post, which is coming out soon, is your observability costs need to be managed by the CTO, the VP, not the CIO. If they’re getting managed by IT, they’re getting managed as costs, which assumes that all solutions are identical and you should negotiate the best price. You need to have this be owned by someone who knows how to manage tools as an investment, which means sometimes paying a little bit more, but in ways that will return on those investments.

Rafiq Gemmail: 100%.

Shane Hastie: A lot of passion, some great advice in there. We are coming close to the end of our time, so I’m going to ask just a wrap-up from everyone. What are the things that you are seeing or hopeful for going forward from today? Craig, can I start with you?

Closing thoughts and future directions [55:08]

Craig Smith: I’ve talked about it a lot through here. I’m hoping that the gap can start to wind in. I don’t know what that looks like, but there is so many good tools and processes and things out here. But I think we’ve also lost … We’re going back to almost 20, 30 years ago, and it’s been interesting that the culture and methods area in InfoQ once was really all about things like agile, and we were in that post-agile period. It’s interesting that it hasn’t really been mentioned once in this whole podcast, which sums up the world.

The problem is that where we came from before that, and we might refer to, say, waterfall as a generic term, was that there was nothing to look at in relation to waterfall. There was a paper and it was just big clunky processes of doing stuff. Then we had everyone move to agility and the principles and the values, and we all went, “Yes, we’re doing it”. Now, as we move on to the other side, what is it that we move to?

What worries me is we’re now in this almost messy type of situation that we can’t really refer back and go, “Hey, we’re doing this in an agile way”. We just accept that that’s good practice. But as all these new tools and technologies and things come into the pot, it becomes harder to actually get people to focus back on a core set of values or principles.

And so, in this world where it’s a mess and there’s lots of uncertainty and things like that, that’s where things like that are very useful. I don’t know where that comes from because no one organization can come up and say that. I don’t know, it has to be something that comes from somewhere. But I think we’re missing that next frontier in technology and how we do it in just a simple statement on what this actually means. So what is the next big DevOps or agility or cloud or something like that that actually becomes the stake we can put in the ground?

So what I’m looking for is: what is that one next thing? There are so many good voices out there, but we need to actually bring them into a place where everybody can start to use that and make it a north star that we aim for, because, otherwise, we’re going to get further and further apart from this world of great tools and things that we’re building.

Ben Linders: The key thing that I would like to see happening more, and it is actually happening a lot already, is to see that improvement, be it continuous improvement or people reflecting on how things are going, how their work is going, how the team is going, to have that ingrained into the people and into the teams themselves. I think we moved away already a long time ago from a world where we had things like external auditors and people coming in externally to tell us how to do our work, having consultants coming in and telling the organization to start working in a different way. So I really want to see more and more happening, that people take the time and find the time to reflect on how things are going and look for improvements there.

It’s actually getting embedded more and more in different roles. I see a role in there, for instance, for staff engineers, for the principal engineers who are not just focused on the technical stuff, but from a sociotechnical aspect are also focused on how the teams are working.

I see this for people who are rotating the coaching role in their team, not having one person being a coach or an agile coach but taking coaching roles in turns and taking the time to reflect, rotating the role of facilitators, for instance, on retrospectives, making sure that everybody has the skills and the insights to look for improvement there.

So we need to make sure that improvement is embedded into the teams, into the people themselves, because if you can really embed it, you can give it a boost. Then the people are going to be focused on the things that really make sense for them and that they want to work on.

What we can do as leaders working with organizations is focus on creating the conditions to do this. Sometimes one of the main conditions is just to make it possible for teams to stop and take some time to reflect. So hold the space for people to do it. That alone is a key thing to make this happen.

Shane Hastie: Raf?

Rafiq Gemmail: Everything that Ben just said. Craig talked about what comes after the agile principles. There are many lenses out there. The one that resonates with what we were talking about today is Gene Kim’s Three Ways, and we’ve talked about flow. I think flow engineering is a big part of it.

I was talking to someone recently about trying to optimize how we work, trying to remove the waste so that we’re looking at the value stream we want to optimize. It’s something I hope people will continue doing. It’s a big part of it. How do we use things like AI? How do we upskill juniors, as we talked about, so they’re able to get into a state of flow themselves? How do we make sure that continues into production as we ship a thing, as the pipeline flows into the team? So maybe the AI tools will help us there.

To get there, the second one is feedback, the feedback loops. We talked about observability, and I can’t remember if we talked about experimentation, but that’s huge. That’s hopefully a part of it, that we’re creating the signals and the visuals we need to ask: what’s the state in production? How are these AI tools helping us? What’s the return on investment on something? Are we getting it if we’re making a cost-based decision?

The other one is that learning culture, which we touched on as well. Psychological safety and learning. How do we, in that environment, safely bring in the new tools? How do we change our ways of working? How do we acknowledge that everyone’s using copilot and we’re pushing out more bugs? Can we try something to guard against that? Can we use the AI to test? That’s something I’ve been throwing around with someone recently around our test cases.

Are there other gaps where we can use these tools? I’m using AI willy-nilly here, but can we use a copilot assist, a code assist tool, to write some of our tests for us or to guide our observability? What should we be measuring for this particular story when it goes to prod?

So I think maybe there are principles there which are coming from the DevOps space that we can learn from if we’re saying a lot of the agility is happening. Maybe those principles were built on the foundations of the agile principles around technical excellence, around delivering value regularly, around communicating with our customer, working as a team, all of that stuff. So even though we’re not using the old scaffold, maybe there are new ones that we need to bump up a little bit and use as our lenses. But, yes, that’s my hope. That was a long sermon.

Shane Hastie: And Charity?

Charity Majors: There’s that famous William Gibson quote about how the future is already here, it’s just unevenly distributed. I keep thinking about that listening to all these great answers. Craig, you were talking about agile. I feel this way about DevOps. I feel like we are in the twilight of the DevOps movement, not because DevOps is no longer relevant but because it’s now the air that we breathe. There are no enterprises out there spinning up a dev team to write the code and an ops team to operate it.

The idea that half of your engineers would write the code and the other half would understand it is just like … Nobody’s doing it. I’ve never actually worked in an agile shop, but without knowing that’s what I was doing, I’ve absorbed a lot of those agile principles throughout my career because they’re in the water.

I feel like for DevOps, for agile, even to some extent for observability, the fact that we’re no longer obsessing over it or capital A-ing our agile, it’s a sign of success. It’s a sign that the battle has been won even if the benefits are not evenly distributed.

One of the things we didn’t actually talk about today, with so many topics, was platform engineering. To me, what platform engineering means is your customers are internal developers. You bring a product mindset and a design mindset to tools that you’re building for engineering experience.

This to me is the pinnacle. It’s the next … The idea that engineers are people too, and that good design might make our developers more productive, is something that we’ve only really started to recognize in the past five years. Look at the Vim interface or Emacs. Much as I love them, they were not developed for human beings. They were developed for engineers. I think this is …

When I think about what makes me optimistic, I think about platform engineering. I think about the fact that we’re starting to grapple with the skyrocketing complexity of our systems and the fact that we have to use our tools to make sense of them, because we can’t just cache a snapshot in our heads.

The last thing I’ll say is a little bit out of left field, but we mentioned the pendulum a couple of times. I wrote that blog post in 2017. It was one of the first … It’s coming up on almost 10 years old. When I wrote it, the only avenue for career progression, more salary, more status, if you were an ambitious engineer, was to become a manager. That is no longer the world we live in. The entire emergence of the staff-plus discipline, the parallel hierarchies for builders and doers as well as managers, this stuff brings me hope. It makes me feel like, for all the unevenness and the broad distribution of experiences, there are things happening in the industry that are moving in the right direction.

We did spend some time talking about junior engineers, so I just want to put in a final plug. I know of a bunch of places that have spun up new programs to recruit, hire, train, mentor junior engineers, and every single one of them was started by a senior staff or principal engineer. None of these were started or run by managers. They were started by engineers who were like, “I’m willing to take this on because I know how much learning is essential to our entire culture, to our investing in the future, to the future of engineering in our profession”. Engineers have the credibility to do this. Nobody has the credibility of a builder to talk about what it is that builders need.

So I just want to put that out there, because I hope that more of our very senior ICs will leverage … It’s one of the things about having power. You never really feel like you have power, even though other people can see from the outside that you do. There’s a lot of power that you get by being a very senior builder in this industry, and I hope that we’ll use some of that power to bring the next generation on board.

Shane Hastie: That’s a hopeful note to end on. Folks, thank you very much. This has been the 2025 trends report for the InfoQ Culture & Methods team. Thank you so much.
