MMS • Sam Newman
Article originally posted on InfoQ.
Transcript
Newman: Welcome to this virtual presentation of me talking about the pitfalls associated with getting the most out of microservices, and the cloud, or cloud native microservices, as we now call them. I’ve written books about microservices. You probably don’t care about that, because you’re all here to hear about interesting pitfalls, and also some tips about how to avoid the nasty problems associated with adopting microservices on the cloud. Also, some guidance about the good things that you can do to get the most out of what should be a fantastic combination: cloud native and microservices. Microservices offer us so many possibilities. Many of you are making use of microservices to enable more autonomous teams that are able to operate without too much coordination and contention. Teams that own their own resources can make decisions about when they want to release a new version of their software. Reducing the release coordination in this way helps you go faster. It can improve your sense of autonomy, which in turn has benefits in terms of motivation and even employee morale. This allows us to ship software and get it into the hands of the users more quickly, and to get fast feedback about whether or not these things are working. These models nowadays are shifting towards a world where we’re looking for our teams to have full ownership of their IT assets: you build it, you run it, you ship it. It’s amazing how developers that have to also support their own software end up creating software that tends to page the support team much less frequently.
Why Have Cloud Native and Microservices Not Delivered?
Cloud native and microservices should have been a wonderful combination. It should have been brilliant. It should have been like Ant and Dec, or Laurel and Hardy, or Crypto and Bros. Instead, it’s ended up being much more like Boris and Brexit. Something that’s been really well marketed, but unfortunately has failed to deliver on some of its promises, has ended up costing us a lot more money than we thought, and left many of us feeling quite sick. Why have cloud native and microservices often not delivered on some of the promises that we might have for them? Many of the big enterprise initiatives you may have been involved with around these transformations have somehow failed to deliver. Part of this, I think, is about confusion. Let’s look at the CNCF landscape. It’s easy to poke a bit of fun at this. This is a sign of success, really. This is all the different types of tools and products in the different sectors that the CNCF manages. This is a sign of success, but it’s a bewildering sign of success. How do you navigate this to pick out which pieces of the big CNCF toolkit you could use? It’s ok, because help is at hand: there is now an interactive website where you can navigate this space and try to find the thing you want. Although whenever I read this disclaimer, I have some other view in my head. Namely, this is going to cost me a lot of money, isn’t it? In terms of money, the CNCF landscape also talks about money quite a bit. It talks about how much money has been pumped into this sector. If we just look at the funding of $26 billion, where does all that money go? Obviously, a lot of that money goes into creating excellent and awesome products. A lot of that money also goes into marketing, which leads to dilution of terms, misunderstanding of core concepts, and increased confusion.
Undifferentiated Heavy Lifting – AWS (2006)
Back in 2006, AWS launched, which was interesting. At the time, we didn’t really realize what was going to come, because at this point, we were dealing with our own physical infrastructure, and we were having to rack and cable things ourselves. Along came AWS and said, no, we can do that for you. You can just rent out some machines. When you actually spoke to the people behind AWS about why they were doing this and why they thought it was beneficial, the term undifferentiated heavy lifting would come up again and again. This describes how Amazon thought about their world internally. We want autonomous teams to be able to focus on the delivery of their product, and we don’t want them to have to deal with busy work. It’s still heavy work, the heavy lifting, but your ability to do that work, to rack up servers, doesn’t differentiate what you’re doing compared to anybody else. Can we create something whereby you can offload a lot of that heavy lifting to other people that can do it better than you can? If you think about what AWS gave us, it gave us, hopefully, these capabilities, in the same way that Google Cloud does, or Azure, or a whole suite of other public cloud providers.
APIs
Really, when you distill down the important part of a lot of this, it was around the APIs they gave us. A few years ago, I went up to the north of England. It’s not necessarily quite as cold and inhospitable as this, but it definitely has better beer than the south. I was chatting to some developers at a company up there. I was doing a bit of a research visit, which I do every now and then. I was talking to them about what improvements they wanted to see in the development environment that would make them more productive. Partly this was just because I’m interested in what developers want. Also, I was going to feed some information back to CIOs, so they could get a sense of key things they could do to improve. These developers said, what we want is access to the cloud. We’d like to get onto the cloud, so that we don’t have to keep sending tickets. I went and spoke to the CIO, and the CIO said, we already have the cloud. I went back to the developers and said, developers, you already have the cloud. They were confused by this.
I dug a bit deeper, and it turned out that this company had embraced the cloud in a rather interesting manner. If a developer wanted to access a cloud resource or spin up a VM, they would raise a ticket using Jira. That Jira ticket would be picked up by someone in a sysadmin team, who would then go and click some buttons on the AWS console, and then send back the details of the VM. Although the sysadmin side of this organization had adopted the public cloud, for some value of the word adopted, from the point of view of the developers, nothing had changed. This is an odd thing to do. Using public cloud services, which give us such great ability around self-service, in this fashion is really bizarre. It’s like putting a wheel clamp on a hypercar. You could do it. Should you really do it?
Why Do Private Clouds Fail?
I’m a great fan of cherry picking statistics and surveys that confirm my own biases. I was very glad to find this survey done by Gartner at their data center conference several years ago. This is from people who went to a data center conference run by Gartner. That’s already an interesting subsection of the overall IT industry. They were asked, what is going wrong with your private clouds? Are your private cloud installs working or not working? The survey found that only 5% of people thought it was going quite well. Most people found significant issues with how they were implementing private cloud. Really interestingly, the most common answer amongst people who went to a data center conference run by Gartner was an acceptance that they’d been focusing on the wrong things. By implementing a private cloud, they had been focusing on cost savings rather than on agility. Also, as part of that, they had not changed anything around how they operate, or how the funding models work. Thinking back to that previous example, this seemed to tie up. You adopted the public cloud, but didn’t really want to change any of the behaviors around it.
Pitfall 1: Not Enabling Self-Service
This leads to our very first pitfall around this whole story, and that’s not enabling self-service. If you want to create an organization, where teams have more autonomy, have the ability to think and execute faster, you need to give them the ability to self-service provision things like the infrastructure they’re going to run on. You need to empower teams to make the decisions and to get things done. AWS’s killer feature wasn’t per hour rented managed virtual machines, although they’re pretty good. It was this concept of self-service. A democratized access via an API. It wasn’t about rental, it was about empowerment. You only saw that benefit if you really changed your operating models.
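To make the contrast with the ticket queue concrete, here is a minimal sketch of what self-service behind an API might look like. Everything in it is hypothetical and in-memory: the `Platform` class, the `provision_vm` and `release_vm` methods, and the per-team limit are illustrative assumptions, not any real cloud provider’s API. The point it tries to show is that a team gets its resource immediately, while a simple guardrail still applies.

```python
from itertools import count


class Platform:
    """Hypothetical in-memory stand-in for a self-service provisioning API."""

    def __init__(self, max_vms_per_team: int = 5) -> None:
        # Guardrail (assumed policy): how many VMs one team may hold at once.
        self.max_vms_per_team = max_vms_per_team
        self._owners: dict[str, str] = {}  # vm_id -> owning team
        self._ids = count(1)

    def provision_vm(self, team: str) -> str:
        """Hand a VM to `team` straight away: no ticket, no human in the loop."""
        owned = sum(1 for owner in self._owners.values() if owner == team)
        if owned >= self.max_vms_per_team:
            raise RuntimeError(
                f"{team} already holds {owned} VMs (limit {self.max_vms_per_team})"
            )
        vm_id = f"vm-{next(self._ids)}"
        self._owners[vm_id] = team
        return vm_id

    def release_vm(self, team: str, vm_id: str) -> None:
        """Let a team give back a VM it owns."""
        if self._owners.get(vm_id) != team:
            raise PermissionError(f"{vm_id} is not owned by {team}")
        del self._owners[vm_id]


if __name__ == "__main__":
    platform = Platform(max_vms_per_team=2)
    print(platform.provision_vm("payments"))  # the team gets its VM immediately
```

In the ticket-based model described earlier, the call to `provision_vm` is replaced by a human clicking buttons on the team’s behalf, which is exactly the step that self-service removes.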
Top Tip: Trust Your People
A lot of the reason that people don’t do this and don’t allow for self-service, I think, comes down to a simple issue: you need to trust your people. This is difficult, because many of us have come from a more traditional IT structure, where you have very siloed organizations with very narrowly defined roles. You had to live in your box: the QA did the QA things, the DBA did the DBA things, the sysadmins did the sysadmin things. Then there’d be some yawning chasm of distrust over to the business who we were creating the software for. In this world of siloed roles and responsibilities, the idea of giving people more power is a bit odd. It doesn’t really fit. We’re moving away from this world. We’re breaking down these silos. We’re moving to more poly-skilled teams that can have full ownership over the software they deliver.
Stream Aligned Teams
I urge all of you to go read the book, “Team Topologies,” which gives us some terms and vocabulary around these new types of organizations in the context of IT delivery. In the “Team Topologies” book, they talk about the idea of stream aligned teams. Instead of thinking about short-lived project teams, you create a long-lived product oriented team who owns part of your domain. They are focused on delivering a valuable stream of work. Their work cuts across all those traditional layers of IT delivery. They’re not thinking about data in isolation from functionality, or thinking about a separate UI from the backend. They’re thinking holistically about the end-to-end delivery of functionality to the users of your software. This long-lived view is really important, because you get domain expertise in terms of what you own. This is why microservices can be so valuable, because you can say, these microservices are owned by these different streams. That strong sense of ownership is also what gives you the ability to make decisions and execute without having to constantly coordinate with loads of other teams.
The issue, obviously, with this model is that there are lots of other things that need to happen. There’s other work that needs to be done to help these teams do their jobs. Traditionally, those kinds of roles would be filled by separate siloed parts of the organization, and there was often a functional handoff. I have a need for a security review. I need a separate Ops team to provision my test environment or deploy into production. That work still needs to be done, and we can’t expect these teams to do all that work as well. Where does all that extra work go?
Enabling Teams
This is where the “Team Topologies” authors introduced the idea of enabling teams. You’ve got your stream aligned teams, and they’re your main path to focusing on delivery of functionality. What we need to do is to create teams that are going to support these stream aligned teams in doing their jobs, so we might have a cross-cutting architecture function, maybe frontend design, maybe security, or maybe the platform. These enabling teams exist to help the stream aligned teams do their job. We want to do whatever we can to remove impediments to help them get things done. This isn’t about creating barriers or silos. It’s about enablement. At this point, many of you who have got microservices are sitting there going, but that’s ok, because I’ve got a platform.
Amazon Web Services (2009)
Let’s go back in time a bit further. Back in 2009, I found myself accidentally involved in helping create the first ever training courses for AWS. At this point, AWS itself were just saying, here’s our stuff, use it. They weren’t putting any effort into helping you use these tools well. I remember myself and Nick Hines, a colleague at the time, were having a chat with them. The AWS view was, “We’re like a utility. We just sell electricity.” Nick turned to them and said, “Yes, that’s all well and good, but your customers are constantly getting electrocuted.” Without good guidance about how to use these products well, you can end up making mistakes, and you may then end up blaming the tool itself. It’s amazing that still in 2022, I meet people that run entirely out of one availability zone, for example. To be fair to AWS, they spotted this gap and have plugged it very well. If you go along to the training and certification section that they run, and this is the same story for GCP and Azure as well, you’ll see a massive ecosystem of people able to give you training on how to use these tools well. All of these vendors recognize that without that training and guidance, you won’t get the best out of them.
Pitfall 2: Not Helping People Use the Tools Well
This is a lesson that we need to learn for our own tools that we provide to our developers. Having cool tools is not enough. You’ve got to help people use them. This is another common pitfall I see, people bring in all these tools and technology, here is some Kubernetes, and here is some Prometheus. Unless you’re doing some work to help people get the most out of those tools, you’re not really enabling them. If you’re somebody working on the platform team, your job isn’t just to build the platform. It’s actually to create something that enables other people to do their job. Are you spending time working with the teams using your platform? Are you listening to their needs? Are you giving them training when they need it? Are you actually spending time embedded with them on day-to-day delivery to understand where the friction points are? Because if you’re not doing these things, the platform can end up just becoming another silo.
Top Tip: Treat Your Microservices Platform like a Product
This leads me to my next tip, you should treat your platform like a product. Any good product that you create is going to involve outreach, is going to involve chatting to your customers, understanding what they need and what they want. This is the same thing with a platform. Inside an organization, talk to your developers. Talk to your testers. Talk to your security people. What is it that you need to do in whatever platform you deliver to help them do their job? This is all about creating a good developer experience. Although maybe the term developer experience should be delivery experience, because obviously, there are many more stakeholders than just the developers. Yes, that’s right, developers, there are people other than you out there. Think about that delivery experience if you’re the person helping drive the development of the platform team. I actually think this is a great place to have a proper full time product owner. Have a person who has got product management experience or product owner experience. Have them head that team up, and drive your roadmap based on the needs of your customers.
If you are the person providing that platform into your organization, it is your job to navigate this mess. Again, I don’t mean to beat up the CNCF at all. Absolutely not. This is a sign of success, but it is bewildering, trying to navigate this world. If we go and look at the public cloud vendors, we also see a huge array of new products and services being released all the time. AWS is in many ways the worst culprit. Again, a sign of success. It’s interesting that the easiest way to keep up with all the different things that AWS are launching is actually to go to a third party site. This particular site tracks the various different service offerings that are out there. I screencapped this probably about three months ago. The number of products offered by AWS is constantly growing. As of 2021, they had as many as 285 individual services, many of which overlap with each other. Each of those services can have a vast array of different features. How are you supposed to navigate this landscape?
Top Tip: It’s Ok To Provide a Curated Experience
There is this idea that if I’m going to allow you to use a public cloud provider, like Azure, GCP, or AWS, and I want to let you do that in a self-service fashion, that I should just throw you in at the deep end and say, just go for it. No, you should train and support people in using those tools well. It is also true that it’s ok to provide a curated experience. If you’re one of the experts in that platform team providing these services to the people building your microservices, you are the person who should be responsible for helping them navigate this landscape and curating the right platform for them. If you’re going to build your own private cloud, which is something I don’t tend to advise you to do, you might start with Kubernetes as your core. Then you’re going to have to pick the right pieces to create the right platform for your team, for your organization.
Governance
Another potential issue around the platform, though, comes in the shape of governance. Governance often gets a bad name. Partly, I feel, not because of what governance is, but because of how governance is often implemented. We see governance as a barrier. On the face of it, governance is an entirely acceptable and sensible thing that we should be doing. Governance is, simply put, deciding on how things should be done, and making sure that they are done. I think this is totally fine. In many small teams, you’re doing governance without ever realizing it. In a large scale organization, you have different governance that needs to take place. This is all governance is: how should things be done, and making sure that they are done. The problem starts when people who are more familiar with centralized command and control models are dragged kicking and screaming into the world of independent autonomous teams. They ask, how do I use my skills to operate in this environment? Then they see the platform, and they go, I can use the platform to decide what people can do.
Pitfall 3: Trying To Implement Governance through Tooling
This leads to another pitfall. You say, what we’re going to do is use the platform as a way of enforcing what people do. We’re going to limit what you do. We’re going to stop you doing things. We’re going to do that through the platform, through the tools that we give you. The issue with that mindset is that the moment you say that, the next thing follows: because if you use the platform, we know you’re doing the right thing, therefore you must use the platform. Because if we enforce that you use the platform, we know you’ll be doing the right thing, and my job is done. This is another real problem with people who are adopting platforms. Enforcement of tools in this fashion really undermines the whole mindset around this. If you force people to use a platform, it’s not about enablement, it’s about control. Here’s the reality. If you make it hard for people to do their jobs, people are either going to bypass your controls into the world of shadow IT, or they’re going to leave. There are also the other people that will just put up with it. Often, those people have already been beaten down by other problems and issues in the organization.
I remember Werner Vogels telling a story many years ago. He was going into these Fortune 500 companies in the early days of AWS, when they were trying to encourage these big U.S. companies to take AWS seriously. The CIOs would say things like, we’re never going to use a bookseller to run our compute. He said, “You already are.” He put out a list of all the people that worked at that company that were already being billed by AWS, paying for things on their corporate credit cards. People used AWS as a way of getting their job done without having to go through the traditional corporate controls. A lot of people are horrified by this idea. This is what we now call shadow IT: IT that isn’t centrally provisioned and isn’t centrally managed. Shadow IT is massive now and is growing. People want to get their jobs done, so they find a way to get their job done. It’s almost like a logical manifestation of the world of SaaS. SaaS has made it so easy to provision software services that people are cutting out the middlemen.
In an environment like this, you can try to mandate a platform and say, you’ve got to use the platform, and this is our way of stopping you doing stuff. Is it really going to stop people anyway? Because here’s the thing: those people that actually bypass your controls to get the job done, they’re motivated. They’ve gone out of their way to find a better solution to solve the problems that they’ve got, because they want to do the job. These are the people you probably want to promote, or at the very least listen to and help. You don’t want to sideline them, or make it so difficult for them to do their job that they become demotivated and just go somewhere else. Governance is about deciding what should be done and making sure it is done. Part of that is being clear and communicating why you’re doing things in a certain way. If you explain why you want things done in a certain way, it’s going to be much easier for people to make the right decisions. It’s also completely appropriate for you to make it as easy as possible to do the right thing.
Top Tip: Provide a Paved Road
I love this metaphor of the paved road: creating an experience for your delivery teams that makes doing the right thing as easy as possible. We’ve laid a path out in front of you. If you just do these things, everything’s going to be absolutely fine and rosy. If you realize that the path isn’t quite right for you, you can head off into the woods yourself, but you’re going to have to do a bunch of work yourself. You’re still obligated to follow what should be done. You’re still accountable for the work you do, but you’re going to be a bit more on your own. On the paved road, in most situations, it’s all going to be gravy. In niche situations, you can justify going into the woods, and that’s ok.
Provide that paved road. Create an experience that makes it easy for people to do the right thing. If you make it easy for people to do the right thing and explain why those things are done in a certain way, you’ll also find that when people do need to go off that beaten path, they can be much more aware of what they’re doing and how that fits in. When you identify that people aren’t using your paved road, that becomes feedback back into your platform. What was it about what we gave them that didn’t help them do their job? Why did they go to this third party service? Is that actually something we’re not bothered about, and actually, their situation is niche enough that we don’t have to worry? Or does that speak to a gap in what we’re offering our customers? Our customers in this case being our fellow developers, and testers, and sysadmins, and everything else.
There are some great examples of companies doing things like this, creating not only a paved road, but also something that doubles as an educational tool. Sarah Wells has spoken before about the use of BizOps inside the “Financial Times,” which on the face of it looks almost like a service registry, but it goes further than that. It talks about certain levels of criticality, and what microservices and other types of software products need to deliver to reach those levels of criticality. If you want to be a platinum service, you have to do this and this. A lot of those checks are automated. There are links you can go to, to find out how to solve those problems. At a glance, you can see what you should be doing, what you are doing, and get information about how to address those discrepancies. This isn’t the big stick. This is the paved road with guidance, with a map.
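As a rough illustration of that idea, here is a minimal sketch of automated checks per criticality level, each paired with a guidance link. It is not the Financial Times’ BizOps implementation: the tier names other than platinum, the check names, the `Service` type, and the example.com URLs are all hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical requirements per criticality tier. Each check name maps to a
# placeholder guidance link explaining how to satisfy it.
TIER_REQUIREMENTS: dict[str, dict[str, str]] = {
    "bronze": {
        "has_owner": "https://example.com/guides/ownership",
    },
    "platinum": {
        "has_owner": "https://example.com/guides/ownership",
        "has_runbook": "https://example.com/guides/runbooks",
        "multi_zone": "https://example.com/guides/availability-zones",
    },
}


@dataclass
class Service:
    name: str
    tier: str
    # Results of automated checks: check name -> passed? Missing means not passed.
    facts: dict[str, bool] = field(default_factory=dict)


def gaps(service: Service) -> list[tuple[str, str]]:
    """Return (check, guidance link) pairs the service does not yet meet for its tier."""
    required = TIER_REQUIREMENTS[service.tier]
    return [
        (check, link)
        for check, link in required.items()
        if not service.facts.get(check, False)
    ]


if __name__ == "__main__":
    checkout = Service("checkout", "platinum", {"has_owner": True, "has_runbook": False})
    for check, link in gaps(checkout):
        print(f"{checkout.name}: missing {check}, see {link}")
```

The output is a list of discrepancies with a pointer to help for each one, rather than a hard block. That keeps the registry in the role of a map rather than a gate.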
Top Tip: Make the Platform Optional
This leads us on to maybe one of the more controversial tips that I’m going to give you. This is actually a piece of advice that comes straight from the “Team Topologies” book. Many of you who run a platform might be quite worried by this idea, but it’s this, you should make the platform you give to your microservice teams, optional. This is scary. Why would I make it optional? Partly, because of reasons I’ve already talked about. We don’t want to just put arbitrary barriers in front of people. If we really want to create independent, autonomous teams that are focused on delivering their functionality, they might have real needs that aren’t met by the platform. If we force them to use the platform, and all aspects of the platform, we’re actually effectively undermining their ability to make decisions that are best for them. That’s part of it. Absolutely.
There’s another thing which is a little bit more insidious. If you make the platform optional, it means that the owners of the platform are going to be focused on ease of use. If everyone has to use a platform, and that’s mandated, it’s very easy for the platform team to stop caring about what it’s like to use, about that delivery experience. Whereas if the platform is optional, then one of the key things that’s going to drive how successful your platform is viewed to be is how many people are using it. By making it optional, you will go out of your way to make sure that it’s easy to use and easy to adopt. It triggers you into doing that outreach, aside from also, of course, enabling this self-service where it’s warranted.
Summary (Pitfalls)
The big pitfall we started off with was giving people these awesome tools, these awesome platforms, without enabling self-service. Once you’ve given people tools that maybe do allow for self-service, the next pitfall is not helping people use those tools well. Then, finally, we talked about the challenges of trying to implement governance, or enforcement around governance, through the tooling, and all the pitfalls associated with that.
Summary (Top Tips)
Firstly, and really importantly, you’ve got to learn to trust your people. This might be difficult, but fundamentally, this is where a lot of the journey starts from: trust your people. I didn’t say verify. Verification is also useful, but you should start with trusting. You need to treat your platform like a product. You need to treat the people that use that product like users. Understand what they want, and do the outreach. As part of that platform, it is ok to deliver a curated experience. Make it easy for people to navigate their world. This lines up really nicely with this idea of the paved road. The paved road helps deliver the things that people need most of the time. Finally, but maybe controversially, make the platform optional. Making the platform optional signals that it is ok to use alternative products where warranted. Also, it makes sure the team that builds the platform is going to be focused on making that platform as easy to use as possible.
If we need to distill all of it down, when I’ve looked back at these different companies I’ve worked at and people I’ve chatted to, it still feels that so many people are using the cloud without really using it. Many of us have bought the hypercar and stuck the wheel clamp on it. Taking that wheel clamp off all starts with trusting your people.
Resources
There’s more information about what I do over at my website, https://samnewman.io, including information about my latest book, the second edition of, “Building Microservices.”