Podcast: Engineering Excellence as a Journey: Platform Engineering, Culture, and Technical Leadership

MMS • Ganesh Datta
Article originally posted on InfoQ.

Transcript
Shane Hastie: Good day, folks. This is Shane Hastie for the InfoQ Engineering Culture podcast. Today, I’m sitting down with Ganesh Datta. Ganesh is the CTO and co-founder of Cortex. Ganesh, thank you very much for taking the time to talk to us today.
Ganesh Datta: Thanks for having me.
Shane Hastie: My typical starting point with these conversations is who’s Ganesh?
Introductions [00:45]
Ganesh Datta: A good question. Like you said, I’m the co-founder and CTO at Cortex, which is an internal developer portal. I was a software engineer in a past life. I wrote a lot of the code at Cortex as well, and I stay very involved in that side of the world today too. I had a chance in my last role to see the monolith to microservice journey as part of the team that pulled the first service out of the monolith, got to see it get to a couple hundred services by the time I left.
Built out our production readiness standards, our code scaffolding and all those things, and got a real appreciation for the amount of work and effort it takes to build the organization that operates at levels of excellence, and it’s something I’ve been passionate about for a very long time.
Shane Hastie: An engineering organization that operates at levels of excellence. What does it take to get there?
Ganesh Datta: Excellence is a journey, not a destination. I would actually say there is no there for excellence because I think excellence is constantly changing. I always like to say there’s no company that I would consider excellent because excellence is a continuum, and every organization’s level of excellence is more excellent than they were the day before or a week before or a quarter before. But that being said, I do think excellence generally means are we operating to the best of our abilities across engineering practices that help drive business value?
So as a company, as an organization, if we care about shipping features faster to the market, being innovative, building a great customer experience, doing so with an eye towards cost and efficiency, do our engineering practices allow us to do so? And that generally gets broken down into are we operating at a level that we’re proud of when it comes to reliability, security, efficiency, and velocity? Like do we have the tools to do all of that? That’s how I think about excellence and that’s why I think it’s a continuum versus an end state. I think it’s a culture of excellence.
Shane Hastie: One of the things that brought us together was thinking about platform engineering and platform as a product. How do platform engineers take that stance?
Treating platform engineering as a product [02:54]
Ganesh Datta: Platform engineering has been around in some shape or form for quite some time. We’ve seen various iterations of this over the years. We’ve seen developer experience teams, we’ve seen DevOps teams, we have our traditional product teams internally, and I think all these groups have come together and turned into the modern platform engineering group. But the platform engineering group is given a charter to build a platform that their end users, who are the developers within the organization, can consume to build better products for their end users, who are the actual customers.
And so the challenge has been platform engineers have been given this charter to build a platform as a product without a background in product management, but that is what they’re being asked to do. And so I think it’s very important for platform engineering organizations, especially when they’re being spun up, to adopt a product-management mindset for the platform they’re building, which will allow them to build a better platform that enables their end users as well as the business at the end of the day.
Shane Hastie: What does a good platform do?
Ganesh Datta: I think it’s worthwhile first defining what product management looks like, because I think that answers the question of what a good platform looks like. And I think the way I would start by defining that is a product manager is there to solve customer needs in service of business outcomes. So as a business, we want to drive revenue, and therefore, as a product manager, I will figure out how my product solves for the highest number of monthly active users. That’s my goal as a product manager.
And taking a step back as a platform engineer, if you apply that same lens, the engineering organization cares about delivering value to their customers, they care about reliability and quality, they care about cost efficiency, and they’re investing in the platform as a way to enable those end states. And so a good platform is enabling the business to do better at those things without having to invest in having every team spend time thinking about the foundations. A good platform is the foundation on which the engineering team can go and solve business goals. So a platform therefore transitively enables the company to achieve its business outcomes.
Shane Hastie: One of the key elements of product management is understanding your customer. As an engineer and a platform team, how do I avoid the “I am my customer” bias?
Avoiding the “I am my customer” bias [05:24]
Ganesh Datta: It is very, very hard to do that. If you think of it from the lens of a platform team, we are a fraction of the larger organization. So if we were to solve for ourselves, we would be solving for a 10-person team, 50-person team, but we’re actually solving for a 500-person engineering team or a 300-person engineering team or a 2000-person engineering team. And so literally just looking at the scope we’re solving for will give you that unlock of we don’t actually look like our customer.
Yes, I as an individual look like the other customers that I’m serving, but if you frame it as we’re solving for the engineering organization, not the individual developer, then the organization I’m solving for is a very different shape and size than the platform engineering group itself, so therefore the problems themselves are vastly different.
So I think it’s that framing of is it the individual or is it the team and what do those things look like. And actually one thing I would add to that is even within the individual, one of the problems we see in platform engineering is treating the developer as the only consumer of the platform, when actually, there are multiple personas that will consume the platform you’re building. You have the developer who is actually doing your day-to-day feature development, but you also have your reliability teams who are trying to build a reliable product on top of the platform you’re building or ensuring reliability.
You have a security team who’s trying to get developers to focus on security within the guardrails of the platform you’re building. And then finally you have engineering leadership who’s trying to understand are we actually solving the things we should be solving? Like are we adopting all the platform capabilities? Are we solving for reliability, security, and feature development and so on?
So actually, you are solving for a variety of stakeholders through the developer experience you’re producing as a platform team. So I think that’s the other important thing to think about is you don’t have one persona. You actually have multiple personas, and it’s very unlikely that you yourself as a platform engineer are all of those personas in one.
Shane Hastie: So how do I go and hear from those personas, and in some ways convince them that they should even talk to me?
Connecting with engineers as users of the platform [07:33]
Ganesh Datta: Yes. I mean this is a classic problem for all product managers, but I think platform engineers are lucky in the sense that they’re already inside the organization and so people see you as a partner and you should be seen as a partner. I think it’s really important to start with what are the problems we’re trying to solve?
Platform engineering teams sometimes get stuck into a world of, “Hey, we’re going to build a really broad, really deep platform and everything, then we’re going to go and take this to the user and show them, and if you build it, they will come. Look at this great amazing platform we built, look at all the things it does, look how easy it is to use”. Why is nobody using it?
But I think if you flip that around and you go to the developers and say, “Hey, our goal as a platform engineering team is to enable you to move fast and not have to think about all the things that we’re going to solve for you. What are the areas of friction for you right now? Where do you feel like you’re bottlenecked? Are there processes in your day-to-day that are slowing you down?” then you’re doing user research like a product manager would.
As an example, “If you’re spinning up a new microservice, walk me through that process from day one. Where do you go to get started? How do you create a repository? Who owns that? Are you allowed to do that by yourself or do you have to go get approval from somebody? How do you know where the code comes from? Is there a template you can use? Walk me through the process”. And through that kind of user discovery, you can find bottlenecks.
And then you can go back and say, “Hey, as a platform team, what are the immediate bottlenecks that I can go and solve for and show value to the developer?” And now you’ve built their trust. They’re saying, “Hey, the platform team is actually trying to make my life easier. They’re not here giving me yet another thing to think about.
There’s enough people in the organization who are giving me new responsibilities and tasks, the platform team is actually enabling me”, and if you can deliver that kind of value in the short term, you’ll build the trust to say, “Hey, as we build a new platform or new capabilities, we have your best interests in mind”. So I think it’s kind of that incremental value and user research that are great ways of building trust with the broader community.
Shane Hastie: Iterative incremental development has been the mainstay of the way we build software now for the last 20 years. Are we still not doing well?
Ensuring the platform solves real problems for engineers [09:35]
Ganesh Datta: There’s been a new framing of this recently, which I really appreciate. In the startup world, people talk a lot about MVPs. Just build an MVP and get it out there and see what people say. But there’s so much software out there and so many things going on and so much noise that it’s important to move away from the minimum viable product, which is this is just barely… it just works, to minimum valuable product. Does it actually solve something for the customer that they care about?
The same holds true for platform teams. It’s very easy to get stuck in a world where you build a brand new capability, especially in an organization that’s modernizing for the first time or moving from an on-prem environment to the cloud, and say, “Look at these really cool new cloud features that we have”. Okay, but what value does it give me? Is it actually solving something that is causing me friction?
The way I like to think about it is if you make it really, really easy to do the right thing, and it’s easier than the current status quo, people will naturally adopt it. We’re all lazy, we’re all human. If you give me an easier way of doing something and it’s going to make my life easier, why wouldn’t I go and install that? Versus if you show me something really cool, I would say, “That’s really cool. I have other things going on right now. Talk to me in six months. I’ll definitely consider it then”. This is the classic fallacy in product management.
And so for a platform team, I think that’s where the minimum valuable product is so important. If you can find a specific pain point and solve something for them, then you’re showing them value. You’re saying, “Hey, I’ve delivered something that’s actually going to make your life easier”, and now you have developers actually using your platform versus just showing something cool.
Shane Hastie: In your CTO role, stepping up, how do you create an environment that allows great culture? And maybe what is great culture?
Enabling great culture [11:17]
Ganesh Datta: As a founder or as a leader, you have an idea of the culture that you want or the type of team you want to operate in, but what culture really is, is do people behave the way you want them to when you’re not in the room? And do people hold each other accountable to that kind of behavior when you’re not there?
That to me is culture. And a great example of this is if you say customer obsession is your value, that is something you care about deeply in your culture, if you have a group of engineers looking at a bug ticket, do they say, “This doesn’t fit the strategy that we have for a product, therefore we’re not going to build it”, which is a culture of product obsession, or do they say, “Hey, we need to unblock the customer. Let’s go figure out how to solve this and then we’ll figure out how that looks in the product in a little bit”, and that’s customer obsession.
Neither one of those is right. Some companies really lean on the customer obsession, make the customer come first. And some companies are like, “Hey, we need to build a great product. Even if sometimes that creates friction for the customer, we have to be true to our vision”, kind of like the Apple view of the world. And so culture is do you have confidence that that group of people in the organization will make the right decision because they understand the culture, they all are operating at that level? So that’s what I think about as culture. And this applies to a variety of things.
The way you work with each other, the way you communicate, the way you hold each other accountable to performance bars, quality bars, and in engineering, I think it’s engineering excellence like we’re talking about. Do we hold ourselves to a level of excellence? Is that part of our culture?
Move fast and break things is a valid culture, or make sure we don’t break things ever is also a valid culture. Those are different cultures. And from an organizational perspective, I think, as a CTO, as a leader, the things that I think about are how do we reward examples of the culture and behavior that you care about? How do you make sure you highlight those and showcase those to the organization? How do you yourself exemplify those? If you as a leader are going against your purported culture and values all the time, then whatever you’re saying is actually not aligning with whatever you’re doing, and so everything’s broken from day one.
I think it’s about hiring, make sure that’s part of your hiring process. You are including things, you’re testing for cultural things. And a lot of people say, “Oh, we have a mission and values step in our interview process”, and it’s usually just kind of like a chit chat with the candidate. You have a great conversation for half an hour, but you don’t really get anything out of it other than, “This person’s cool, I can see myself working with them”.
But can you actually structure the interview process and say, “What are the kinds of things we care about?” Do you want to build an engineering culture where people are really passionate about their craft? I want people who just get sucked into the problem, it’s 8:00 PM, and they’re like, “Man, this problem is so fascinating. I need to go learn. I want to fix this thing”. Or do you want people that have a different culture? So are you testing for those things in your interview process, and are you giving them room to do those kinds of things as well?
So if you say that quality or reliability are things that you care about, are you giving your teams room to flex those muscles and showcase how they can exemplify that culture? I think all those things matter. I think it’s tooling, it’s prioritization, it’s timing, it’s hiring, and it’s rewarding. All those things come together.
Shane Hastie: That’s a complex and complicated mix, and it’s always going to be a journey, but what are the stages? What are the steps? How do we map out that journey?
Map out the culture journey [14:39]
Ganesh Datta: I think it starts by thinking about what it is you want your company to stand for. What is the type of environment you want to work in? As a founder, I’m lucky in that I get to shape the organization, and I started the company because I want to build a culture that I really love. And so that was a big part of it. But for any engineering leader, whether they started the company or not, I think it’s saying what are the outcomes we care about, what is the culture that we believe is going to help us get there, and what kind of a place do I want to work in? So I think just defining that is a great starting point.
If it’s as a company, we want to build a great product, we want to build a reliable product, we want to be known for our velocity and how responsive we are to our customer, all those things are different.
But then from there it’s like, okay, what does that look like in our day-to-day? And going back to that question around decisions, what kind of decisions are people making and how do we give people that framework? I think starting with those kinds of questions and saying, “How do we structure our process, our team, our practices, our hiring, and our reward structure to really reflect those outcomes that we care about?” Because culture is not static. Culture can be changed.
And sometimes if you’re an engineering leader who’s just come into a new organization, maybe you want to change the culture, maybe the culture has been a certain way and it doesn’t align with what the company needs and what it’s trying to deliver, so maybe you want to change the culture. And so I think that’s why it’s really important to start with what is the outcome you care about? What does that look like in our day-to-day, and how do we exemplify that over time through practices, process, and people?
Shane Hastie: I know that you recently gave a talk on Production Readiness 2.0. What do we mean?
Production readiness as a form of continuous improvement [16:22]
Ganesh Datta: Production readiness is very near and dear to my heart. In my last role, as we moved to a microservices architecture, we realized every time there was an incident, it was impossible to know whether it would be resolved quickly or take a while because the way each team and each service was doing things was so drastically different. And so to just define what production readiness is, it’s in the title of production readiness checklist or whatever you want to call it internally.
It’s is this thing ready to be in production? Does it have the right monitoring and alerting and observability and logging and people and all those kinds of things that if something were to go wrong, can we remediate it, can we mitigate the issue, or can we prevent something from going wrong in the first place? And if the answers to those things are yes, then something is production-ready.
I think for a long time, production readiness has just been this process that is a checklist or a set of things that you just have to show and get checked off. It’s living in a Confluence page or a spreadsheet. I personally did it as a spreadsheet in my last job, and you kind of just go through all your services and you say like, “Yep, yep, yep, I’m doing all these things”, and then you pass it and you move on to the next thing, because you have other things to worry about.
But I think there’s a couple of problems with that process, and it kind of goes back to what we were talking about earlier on engineering excellence. Just as engineering excellence is a continuum, production readiness is also a continuum. You may have new services and new software you’re building where you want to hold those things to a higher standard because they’re brand new, but you have a lot of legacy stuff where maybe those standards weren’t even around when those things were shipped in the first place.
So you can’t hold them to the same standard. You want to bring them along with you for the journey, but they were shipped in a different time. So how do you get them to where you are today from a standards perspective? And so I think when I talk about production readiness 2.0, it’s A, about moving away from a checklist into more of an automated codified set of standards where you can kind of create clear visibility into that.
But then more importantly, it’s about treating it as a form of continuous improvement. It’s not just pass-fail or good or bad or binary or a one-time thing. It’s this is a spectrum of what good looks like. Here’s where we are today, here’s where we want to be, and how do we get there and how do we continuously push ourselves there? And I think that’s what’s really important about production readiness, because you will constantly be learning new things about your production ecosystem, and new standards you want to roll out.
So how do you, in your organization, adopt a culture around production readiness that allows you to flex and grow and bring all your existing software with you to that new level of production readiness? And that’s what I mean by production readiness 2.0.
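To make the contrast between a one-time checklist and a codified, graded standard concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the rule names, weights, level thresholds, and service fields are all hypothetical and are not taken from the conversation or from any Cortex product.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    """One codified readiness standard (all names here are hypothetical)."""
    name: str
    weight: int
    check: Callable[[dict], bool]


# Hypothetical standards; a real organization would define its own.
RULES = [
    Rule("has_owner", 3, lambda s: bool(s.get("owner"))),
    Rule("has_alerting", 3, lambda s: s.get("alerts", 0) > 0),
    Rule("has_runbook", 2, lambda s: bool(s.get("runbook_url"))),
    Rule("has_dashboard", 2, lambda s: bool(s.get("dashboard_url"))),
]

# Graded levels instead of pass/fail: (minimum percent, level name).
LEVELS = [(90, "gold"), (60, "silver"), (30, "bronze")]


def score_percent(service: dict) -> int:
    """Weighted share of rules the service passes, as a whole percent."""
    total = sum(rule.weight for rule in RULES)
    earned = sum(rule.weight for rule in RULES if rule.check(service))
    return earned * 100 // total


def level(service: dict) -> str:
    """Map the score onto a spectrum of levels."""
    pct = score_percent(service)
    for minimum, name in LEVELS:
        if pct >= minimum:
            return name
    return "unrated"


def failing_rules(service: dict) -> List[str]:
    """What the owning team would work on next to move up a level."""
    return [rule.name for rule in RULES if not rule.check(service)]


# A legacy service that predates most of the standards.
legacy = {"owner": "payments-team", "alerts": 0}
# A new service scaffolded with everything in place.
fresh = {
    "owner": "search-team",
    "alerts": 4,
    "runbook_url": "https://example.invalid/runbook",
    "dashboard_url": "https://example.invalid/dashboard",
}

if __name__ == "__main__":
    for name, svc in (("legacy", legacy), ("fresh", fresh)):
        print(name, score_percent(svc), level(svc), failing_rules(svc))
```

In this sketch the legacy service is not simply failed. It gets a level and a list of gaps, and rerunning the same rules after each improvement shows it moving along the spectrum.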
Shane Hastie: You make the point that this is an incremental journey, that there’s never a nirvana state. How do we stop this being overwhelmingly monotonous?
Ganesh Datta: Yes, I think this is where developer experience comes in, and I’ll talk about that in just a second. So I think, first and foremost, the kind of continuous improvement or incremental improvement in each organization is the place you’re investing in will constantly change. It’s never going to be the same.
An organization may be really focused on reliability this year, and security next year, and efficiency a different year, and then maybe your organization structure changes and you’re really focused on communication and silos. And the type of challenges you’re facing and are trying to improve excellence on are very rarely going to be the same. And so I actually think that immediately cuts a lot of the monotony. In most organizations, there are variations of this ongoing at any given time. So I think that’s one thing, but I think the second thing is where developer experience comes in.
I’m of the belief that developer experience is not an end. Just building a great developer experience is not that important. Developer happiness and satisfaction is important, and I mean it’s well-studied, all the outcomes you get with developer happiness, including retention and productivity and quality and all those kinds of things.
But again, developer experience is really a means to an end. And that end could be… I’ll take reliability as an example. Let’s say the area that we’re trying to incrementally improve this year is our reliability posture. In particular, we’re trying to improve our MTTR. I’ll tie this back to the platform engineering team. You may start by building out a platform that just allows you to give the SRE team the ability to do automated rollbacks or blue-green deployments or canary deployments into the infrastructure. And then you invest in developer experience that makes it very easy to adopt those things for new services.
So now you’re going to go and invest in how do we make it really easy for developers as they spin up new projects to get those capabilities for free from scratch? And so over time, it actually becomes less about, oh, we’re asking people to adopt new processes and new things and we’re trying to push new standards across the organization versus it’s just the way things work. I click a button, it does this thing for me, or it’s built into the platform and I’m not thinking about it. And this has been part of the DevOps mindset for a long time. It’s like if you have to kind of repeat a process over and over again, then you should automate it.
And so platform engineering is the next iteration of that, where continuous improvement is we found an issue or a bottleneck in our engineering process that’s preventing us from achieving the level of excellence we care about right now with reliability or security or whatnot. Let’s go and build a platform that enables us to solve those things. Let’s roll those things out, now let’s automate it, and now let’s build a developer experience around it where it’s just the way things work and developers will constantly adopt it, and now let’s move on to the next problem.
And so the existing things actually are no longer monotonous because they’ve just been codified into your practice through developer experience. So that’s how I think about where developer experience and continuous improvement, all these things kind of fit together over time.
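The reliability example above names a recovery-time metric as the thing to improve incrementally. As a rough illustration, assuming it is measured as mean time to recovery (the average of resolved time minus detected time across incidents), the calculation can be sketched in Python as follows. The function name and the incident data are made up for the example.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Incident = Tuple[datetime, datetime]  # (detected_at, resolved_at)


def mean_time_to_recovery(incidents: List[Incident]) -> timedelta:
    """Average of (resolved_at - detected_at) over all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),  # start value so sum() adds timedeltas, not ints
    )
    return total / len(incidents)


# Made-up incidents lasting 30, 90, and 60 minutes.
INCIDENTS = [
    (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 30)),
    (datetime(2024, 2, 11, 14, 0), datetime(2024, 2, 11, 15, 30)),
    (datetime(2024, 3, 2, 9, 0), datetime(2024, 3, 2, 10, 0)),
]

if __name__ == "__main__":
    print(mean_time_to_recovery(INCIDENTS))  # 1:00:00
```

Tracking this number per quarter, before and after a capability such as automated rollbacks is rolled out, is one way to check that the platform investment actually moved the outcome.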
Shane Hastie: Another trend that we’re seeing at InfoQ and in QCon is the emergence and importance of green tech, of being conscious of sustainability and so forth in our engineering organizations today. Are you seeing much of that, and how do we address it?
Sustainability as engineering excellence [22:29]
Ganesh Datta: There’s definitely been an increased focus on it. The way we like to think about that is it’s yet another aspect of excellence. At the end of the day, you can’t necessarily treat it as its own thing versus it’s an aspect of our engineering organization that we want to excel at. And if you treat it that way, then that bleeds down into your culture, your platform, your tooling, and so on and so forth.
The same way you wouldn’t ask your developers to become database experts. I think we’ve moved away from that world and we say, “Hey, we’re going to build a platform that makes it easy for our developers to build on top of common data tooling, and we’ll kind of abstract away the requirement to become a data expert from our developers because our platform handles that for you”. I think sustainability is the same way.
Expecting the entire organization to adopt a sustainability mindset is difficult. It’s part of the equation for sure. I mean, we see this with cost and efficiency today where you’re trying to create visibility for developers around the cost of their services and the infrastructure so that they can take more ownership of it, but you kind of need both aspects where the platform takes into account the cost efficiency you’re trying to drive, plus you give developers visibility, and that kind of top-down, bottoms-up approach of platform enablement plus visibility drives the outcome you care about.
And so I think sustainability is the same thing where wherever you can, you want to codify those practices into the way you operate as an organization, but then give developers visibility to that too. And so if there are practices you want to adopt there, whether it is around… What’s funny is I think a lot of our impact as developers is really around energy consumption from a sustainability perspective, especially in the modern remote world, where travel and things like that are less important. I think it’s our energy impact that is most top-of-mind.
In some ways, the current focus on cost has a secondary benefit on reduced energy consumption because you’re reducing the amount of infrastructure you’re consuming. So it’s been, I guess, an interesting second-order effect of it. But I think fundamentally, my take on it is that it has to be part of the platform, it has to be something that’s built into your organization’s outcomes and you drive visibility, and that’s the way you drive progress on any of these outcomes, including sustainability practices.
Shane Hastie: A fair number of our audience are technologists who are moving into a leadership role for the first time, or they’re early in their leadership careers and they want to move forward. You’ve been through that path. What advice would you give to the younger you?
Advice for new leaders [25:01]
Ganesh Datta: I would say I still have a long way to go as well, and I think that’s probably the number one piece of feedback I give folks is never lose your self-awareness. I think every single leader at every stage has never done that thing before. And it doesn’t matter how experienced you are. If you’ve been doing it for five years, you are maybe running an organization that didn’t look like it did six months ago, and you’re doing that for the first time.
Every leader is constantly faced with something new. And so being very self-aware is important, because we’re always getting better. There’s always something you can get better at. And I think it’s particularly important, especially for folks who are moving from more of an IC role to a leadership role. Sometimes, you are going from a pure IC role to more of a staff or distinguished-level role where a lot of people work is a part of your IC role, or sometimes some folks move directly into management.
And I think it’s really important to understand that being successful as a people and a technical leader is very different from being successful as just a technical leader. And especially for new technical leaders, I would say learn storytelling. A lot of what you do as a leader is tell stories, and telling stories is not a bad thing, but storytelling is really how do you bring people along with you and help them see what you see?
Because as a leader, you’ll have context that the people you work with don’t have, or you may be having to make decisions that everyone doesn’t agree with initially, or about where the organization is going or the future state. And so can you tell a story for those kinds of things, whether it’s, “Hey, here’s why we’re making this technical decision, and here’s the current state and the future state”, or, “Hey, here’s a product that we’re really excited about, and I know there’s a tight deadline around it, but here’s the impact that has on our customers and why it’s so important, and here’s what we’re going to do about it on the other side”.
Those are all stories we’re telling, and so having strong storytelling skills is really important, both for managing down and managing up, because your managers or your leaders will ask you, “Hey, what’s going on with this project?” Or, “Hey, what’s our plan for the next six months?” You’re telling the story.
So storytelling is really important, being self-aware is really important, and understanding that people management is different is also really important. And I say storytelling because I know as an engineer I would say like, “Oh, I don’t want to work in a political company”. But I think when people are involved, there’s always an element of, quote-unquote, politics, but politics is really about we’re all trying to deliver for the business, and how do we do that well together?
So how do I make sure that people understand what I’m trying to achieve, how do I understand what other people are trying to achieve, and how do we come together and deliver something for the business? And so don’t shy away from those kinds of conversations. Don’t shy away from having tough conversations with your peers and your stakeholders and your managers around things you’re trying to build and ship. I know that there’s a lot of things, but those are the lessons that I’ve found to be the most impactful. And always stay learning. I think that’s the most, most important thing.
Shane Hastie: Ganesh, there’s a lot of good advice and points to dig into there. If people want to continue the conversation, where do they find you?
Ganesh Datta: The easiest place would be to add me on LinkedIn.
Shane Hastie: Thank you so much for taking the time to talk to us today.
Ganesh Datta: Thank you so much for having me. Really enjoyed it.