Presentation: Overcomplicated Architecture: Scaling Bottleneck

Article originally posted on InfoQ.

Transcript

Shum: I will be talking about, how do we get into this scaling bottleneck? What is this scaling bottleneck? What are the things that we need to think about when we’re looking at our overcomplicated architectures? My name is Cassie Shum. I have been a technologist for the past 15-plus years. In those years, I’ve seen quite a few organizations, from enterprises all the way to startups. Most of my talk will be focused on somewhere in the middle, the digital scaleups. I’ll explain a little of what I mean and why this matters in terms of scaling bottlenecks.

Outline

The agenda that we have is really around, what are the scaling bottlenecks that we discuss? Then focusing and zooming in on one of the major scaling bottlenecks, which is overcomplicated architectures. How did we get here? What are the common patterns that we’ve seen in organizations getting to this state? What are the warning signs of an overcomplicated architecture? How do we get ahead of it and understand some of the signs? Then, more importantly, how do we get out of some of these overcomplicated architectures, or how do we begin that journey? Then we will wrap up with a nice, little summary and some questions.

What Are Scaling Bottlenecks?

What are scaling bottlenecks? One of the things that my old company, Thoughtworks, has been doing for the past many years is really looking at the digital scaleup. This initiative, led by Timothy Cochran, looks at the whole journey of how organizations come into being. Essentially, one of the things that we have seen and observed over the years is that these organizations go through what we call different phases. We’ve categorized this into four phases here. Phase one is where you’re experimenting with the product. Phase two is when you start seeing traction in the market, and there are a lot of interesting things that people are excited about. The bottlenecks we’re looking at appear where organizations are trying to grow and hit this hyper-growth state: that is phase three. Then getting into more of a steady state and optimizing for the product is phase four. At this inflection point is where we see some of the common technology bottlenecks. I’m referring to what we call the bottlenecks of scaleups, which is a series now published on Martin Fowler’s blog, https://www.martinfowler.com/articles/bottlenecks-of-scaleups/. There are a few bottleneck articles out there already.

Through all of these experiments and experiences working with different companies that are scaling up, some common bottlenecks have been identified. Here are some of them. You’ll see the first three on Martin’s blog at the moment. Then there are a few more towards the end: cost of serving traffic is too high, not being able to find that next product evolution, internal team and partner dependencies, legacy technology, and developer productivity. These are some of the upcoming articles that will touch on the technical bottlenecks.

Scaling Bottleneck: Overcomplicated Architectures

However, we will be talking about overcomplicated architectures as they relate to scaling organizations. Let’s dig into my topic: overcomplicated architectures. A few years ago, I did a few talks around microservices, the pros and cons. You can see here, with my wonderful colleague, Rachel Laycock, we talked a lot about the pros and cons of microservices. What are the do’s and don’ts? What are the things to watch out for? In those talks, we talked of course about the big ball of mud. Essentially, at the time, we were talking about, how do you move from a monolith, your big ball of mud, to microservices? How do you streamline these things to really take advantage of how microservices can benefit you? In those talks, we looked at each other thinking, if organizations don’t do microservices right, we might get into some trouble in the future. That was back in 2016.

Let’s look now at 2022, and some of the things that we’re seeing. Similar to the big monolithic ball of mud, I am now starting to find a lot of different organizations in a distributed big ball of mud. Essentially, we moved over all of the issues and problems that we had in the monolith. Instead of addressing some of the foundational things we needed to address, we’re finding a lot of organizations just rushed into the microservices age and broke everything up. What we see here is that we broke everything up into more of a distributed big ball of mud, with a lot of interdependencies between all of these different services. In a way, it’s gotten a little bit worse, because back in the day, your big monolith, yes, was slow to move, hard to deploy, hard to make changes in. What we’re calling the distributed big ball of mud has a lot more overhead in terms of communications and interdependencies between all of these microservices. A lot of what I’m going to get into is, did we actually get the foundations of understanding how to optimize for these microservices? That’s the thing that we’re seeing in some of these overcomplicated architectures, and where we’ve missed the boat a bit as we moved a lot of our architecture into this big ball of mud.

How Did We Get Here?

The big question here is, how did we get here? I’m sure there are a few other reasons why organizations get into this big of an issue, but I want to focus on the two key patterns that I’ve noticed. The first pattern is all of the buzzwords in the world. One of the things that I’ve seen every time I’ve gotten into different organizations is, we get very excited, and I am excited. I’m a technologist who loves shiny objects. I look at event-driven architectures, CQRS, Kubernetes all the way, microservices, multi-cloud. I love all of these technologies. I’ve used them in different instances, in different areas, for different things. I’m not saying these things are bad. What I am looking at is, people get a little bit overly excited about all the things. This is what I call buzzword-driven architecture. It’s a bit of an interesting one, because when I look at different organizations who are starting off, or scaling up, or trying to figure all these things out, I find myself advising them to slow down: what’s going on? Because now we’re just saying all the buzzwords, and we need everything in one thing. My advice is not to avoid these wonderful architectures and technologies, but to take a step back and understand why these things came about. Are they applicable for your product and for your architecture? That’s one of the patterns that I’ve seen. One of my favorite cartoons, from Pat Kua’s blog, is called “The Builder’s Trap.” One of the things that gets us into this buzzword-driven architecture is that, as engineers and software developers, we tend to overestimate our own ability to write code. We look at CQRS, or event-driven architectures, or how to do microservices, and say, “I think that’s actually quite simple. We can figure this out.” How hard could this be, is a catchphrase that comes up all the time. We introduce all these different technologies and buzzwords, and we don’t recognize the hidden complexity in the beginning, which all starts appearing at the end when things get very hard and very complicated.

The other reason why we really love overdoing it on the architecture and bringing in all the shiny new toys is our inherent desire to learn while we try to solve problems. I remember getting into this industry as a software engineer. I was excited to be able to solve this very hard problem, or this algorithm, or this thing. We get very excited and we over-index on all of these new tools: how do I learn how to spin up a local Kubernetes cluster? As opposed to, is that what this product actually needs? Is this a decision that we should actually make for this product? This is how we end up in this builder’s trap, using all of these wonderful technologies, and all of a sudden, a few years later, we are in this very overcomplicated architecture, because we’ve pulled in a lot of things that we may not have actually needed. Now we have to maintain all the decisions that we made as we were putting this together. That’s the first pattern that I’ve seen.

The second pattern is quite related to that. It is, on the reverse, not that we set out to put these elaborate architectures in. From a digital scaleup point of view, it’s more when you have an accidental architecture, or an accidental, overcomplicated architecture, driven by many local decisions. This cartoon is a very famous one about how, as one software engineer, I can have a clean slate, I can build all of these different things, much like the builder’s trap. I will say that most of these decisions are actually made in the context of your own team or your own code base. Once you start multiplying that across multiple teams and multiple people making decisions, imagine the accidental architecture that could come about if there is really no standardization, and everybody’s doing their own thing, wanting to build their own design.

Now you have this accidental architecture and accidental design that has been a bit more reactive, making decisions based off the context in front of you, versus proactive, which is looking at the bigger context: where does my piece fall into the bigger picture? Without that proactive mentality, we often find ourselves in what we call accidental architectures that get completely overcomplicated without you seeing it. This is a slow burn. This is something that happens over time. All of a sudden, you see a lot of different frameworks and different languages, because this team over here really liked to do things in Rust but this one over here likes to do things in Go. Different design patterns start springing up, and all of a sudden, you’re in this very overcomplicated state. It’s something to think about when you, as an individual, are driving some of these decisions. One of the things that we have seen is that everybody likes to make those decisions in isolation, living in their own context. Looking at that bigger picture is going to be very important as a step going forward. Those are the two patterns that I’ve seen most commonly. I’m sure there are more, but these are the main ones that come about. One, again, is really around overarchitecting, coming up with very complicated things and introducing everything to it. The second one is the accidental one, which is more of a reactive approach. A lot of the warning signs of these overcomplicated architectures are quite similar between the two. One thing I would urge everyone to look at is, how did you get there, and does this resonate with anyone?

What Are the Warning Signs of An Overcomplicated Architecture?

Let’s move on to the warning signs of some of these overcomplicated architectures. When I say warning signs, this is going back to the bottleneck series on Martin’s blog. This is a template that we’re following, because with warning signs, it’s ok to pragmatically incur tech debt, as a good example, or it’s ok that you’re making certain decisions. What we want to look at is signs that say: if you’re starting to go down this road, it’s about being aware and understanding, are we starting to get into an overcomplicated architecture? Is this something we pragmatically want to defer, but know that we need to get ahead of once things happen? This section looks at some of these warning signs.

I think the first warning sign, going back to what I alluded to earlier, is looking at how the traditional monolith, as we’ve seen before, has many jumps to different things. In the picture to the left, I tend to talk about making different method calls within your monolith, making calls to different classes in order to get information and do all the things. Think about the user talking to this monolith and asking, how do I get the color of a certain SKU in a retail domain, in a retail monolith? In this monolith, you can see that they’ll probably call a catalog class, go into the inventory, hit the item services, or look at the attributes. You can see the different calls that they’re trying to make, all of course, speaking to one datastore, then bringing that response back to the user. That’s how we traditionally have been doing things in the past. One of the warning signs that I talk about is the distributed monolith. Have you literally taken those same hops, those same method calls, and split them up into different services, with many different calls needed to drive those services? If so, this is a warning sign. One of the things that I ask a lot of my different clients is, for a simple call to get the color of that SKU, or to get information about a person, how many calls are you actually making through how many different microservices? The reason I ask this is because, a lot of the time, the answer is, I don’t know. The answer is, I think there are a few microservices involved, but we have to go check. This is very important for us to be a lot more aware of when we design some of these things. It’s about looking at the latency cost. Once you move from that monolith to a distributed monolith, there’s a much higher incurred cost in jumping from one thing to another and getting those responses back; multiply that by many different calls, and now we have a little bit of a scary distributed monolith. I’ll jump into some of the things that we see from that scary distributed monolith.
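
To make the "how many hops?" question concrete, here is a minimal sketch in Python of what that SKU color lookup can turn into in a distributed monolith. The service names and URLs are illustrative, not from the talk; the point is that each in-process method call has become a network round trip.

```python
import requests  # each hop below is now a network call, with latency and a failure mode

def get_sku_color(sku_id: str) -> str:
    # Hop 1: the catalog service resolves the SKU to an item id.
    item = requests.get(f"http://catalog/skus/{sku_id}").json()
    # Hop 2: the item service returns the item record.
    record = requests.get(f"http://items/{item['item_id']}").json()
    # Hop 3: the attribute service finally holds the color.
    attrs = requests.get(f"http://attributes/{record['attribute_id']}").json()
    return attrs["color"]
```

Three hops here; in a real distributed monolith the honest answer to "how many services does this touch?" is often "we’d have to go check."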

Some of the symptoms or signs that we see: as an engineer, any small change to the application logic requires concurrent changes to many other services. One of the big things that we talked about with microservices is being able to have these bounded contexts in a certain domain, and within that domain, understanding the wants and needs and all of those things. Being able to isolate these things means you can have a smaller, more independent, deployable artifact. If you see a pattern of your engineering teams trying to make one small change or one feature change, and you have to touch 10 or 20 microservices, we may have a bit of a smell there. That’s one warning sign that you have a distributed monolith. The second one that I look at is really around God objects. We see this in monoliths as well; this is not just a microservices problem. Did you actually architect your microservices correctly if you have single structures that contain details from multiple domains, or that many domains depend upon? This is about the efferent and afferent coupling that you have to certain things. The design principles that we talked about back in the monolithic days still pertain to the microservices age; it’s just more expensive now when there are a lot more dependencies on these microservices.
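
As a hedged illustration of the God object smell, here is a hypothetical shared structure that mixes fields from several domains. High afferent coupling means many services import it, so any change to it forces coordinated changes and deploys everywhere.

```python
from dataclasses import dataclass

@dataclass
class Product:                  # hypothetical; imported by many services (high afferent coupling)
    sku: str                    # catalog domain
    warehouse_stock: int        # inventory domain
    list_price_cents: int       # pricing domain
    carrier_code: str           # fulfillment domain
    loyalty_points: int         # marketing domain
```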

Looking back at the last few things, the need to deploy these microservices all at once is probably a sign that there are multiple dependencies across all of them. This usually comes from having no clear boundaries between groups of services. This is a problem that we see a lot, because a lot of organizations rush to split everything up very quickly. There’s little thought given to what the actual domains and the clear boundaries are. We’ll talk about that in some of the solutions as well. I see this antipattern a lot when people are splitting up microservices: they like to split up the application layer into different services, but the services are still talking to the same datastore, as you can see here. That is another warning sign of a distributed monolith: if you’re still talking to the same datastore, how are you actually refactoring and breaking those things up appropriately? If you’re having a hard time doing that, go back to what I said before about clear boundaries. Another warning sign is the number of parameters passed between microservices continually growing over time, because of these interdependencies across all the microservices. Then, by nature of that, you’re probably going to have a lot of shared common boilerplate code within each microservice, so how do we address that? These are some of the signs that we look at when we’re looking at a distributed monolith.
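
The shared-datastore antipattern from the diagram can often be spotted in configuration alone. A minimal sketch, with illustrative connection strings:

```python
# Two "independent" services whose configs point at the same database.
# A schema migration for one silently breaks the other, which is why
# they still have to be deployed together.
CATALOG_DSN   = "postgresql://db-main/shop"   # catalog-service config
INVENTORY_DSN = "postgresql://db-main/shop"   # inventory-service config: same datastore!
```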

One of the other signs of the distributed monolith is a lack of observability and monitoring; this is something I warned about when going into microservices. I think a lot of people are logging and putting in tracing and all of these kinds of things, but it may not be very useful. One of the warning signs I urge everyone to look at is: look at your teams today. Does debugging issues, production bugs, and everything take a very long time? Because if it does, your tracing, your logging, your monitoring may not actually be correct or useful. If you look at your architecture and the services that you have right now, are there orphan services and code that exist without you knowing it? If that’s happening, that’s definitely a warning sign that your logging and monitoring are either not useful, or you don’t have any. The last thing, related to the second one, is the inability to understand the tech sprawl in your organization and your architecture. If you can’t look at it and visualize these kinds of things, then please go back and ask, do I have observability? Do I have monitoring? If I do not, let’s talk about introducing that sooner rather than later. If you do, is it useful? That’s why we look at these warning signs: not to say, yes, I have it, so we’re good. We need to look at the actual symptoms and ask, are you able to debug your issues in a timely manner? Are engineers on support able to fix something quickly because they can pinpoint where it is, instead of guessing where everything is?
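
One way to make tracing useful rather than noisy is to put a span around each business operation, carrying the attributes you would actually search for during an incident. A minimal sketch using the OpenTelemetry Python API; the service name and attribute keys are assumptions for illustration.

```python
from opentelemetry import trace

tracer = trace.get_tracer("inventory-service")  # illustrative service name

def reserve_stock(sku_id: str, qty: int) -> None:
    # One span per business operation beats tracing every internal call:
    # during an incident you can search by sku.id instead of guessing.
    with tracer.start_as_current_span("reserve_stock") as span:
        span.set_attribute("sku.id", sku_id)
        span.set_attribute("reserve.qty", qty)
        # ... call the datastore or downstream service here
```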

The next sign is potentially too much abstraction. A sign of a very overcomplicated architecture is having layers of abstraction in different areas. Don’t get me wrong. I grew up learning Java. I grew up with object-oriented programming. Abstraction is one of the key factors of how I learned how to code. Abstraction is a good thing; I’m not saying it’s not. However, it can also be a sign if you have too much abstraction, or over-abstraction happens too quickly, because sometimes we rush to say, let’s abstract out all of these different things so we don’t have to deal with them later. This goes back to one of the signs around debugging: do you actually know what’s going on behind the scenes? If I have to get through 10 different layers to understand what’s under the covers, that can be quite difficult, and now you’re overcomplicating your architecture unnecessarily. One of the things that I talk about is pragmatic abstraction. How are you looking at the different entities, and what makes sense to abstract away, versus abstracting for the sake of abstracting? Everyone, look at your code base and see, how many layers of abstraction do you have at the infrastructure level, or from a data point of view? Take a look and see, is that causing some of the overcomplicated architecture? Is it getting too confusing?
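
Here is a hypothetical sketch of that smell: layers that only forward to the next layer, adding indirection without adding behavior.

```python
class ColorRepository:
    """The only layer doing real work: it isolates the datastore."""
    def find(self, sku: str) -> str:
        return "red"   # imagine a real query here

class ColorService:
    def __init__(self) -> None:
        self.repo = ColorRepository()

    def get(self, sku: str) -> str:
        return self.repo.find(sku)     # pure pass-through, no behavior added

class ColorFacade:
    def __init__(self) -> None:
        self.svc = ColorService()

    def fetch(self, sku: str) -> str:
        return self.svc.get(sku)       # another pass-through
```

Pragmatic abstraction keeps the repository, which earns its place by isolating the datastore, and collapses the pass-through layers above it.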

This leads to the next symptom, high cognitive load. Is it really hard to onboard an engineer into your code base, maybe because of those many layers of abstraction? The question I would ask for this sign of an overcomplicated architecture is, what is the time for an engineer to onboard into a tech stack, or to start building a feature, as quickly as possible? Going back to all the multiple domains and multiple microservices that someone may have to touch in order to make a change: that increases the cognitive load. That increases the amount of time it takes to learn different parts of the code base in order to deliver one feature. Think about it as a developer: how many repos do I need to pull? How many services do I need to spin up on my local machine just to test one thing? This is what you want to watch out for in your organization. Go ask your developers right now, how long does it take to onboard into the code base? If it’s a long time, then this, again, may be another warning sign of an overcomplicated architecture. Sometimes, as you see here, the difficulty of making changes for the engineering teams then leads to the fear of changing or evolving any code base. Look at the different practices. If you hear people saying, I don’t want to touch that area of the code base because it’s a bit scary, it’s not tested, and if I change something, it might break the whole system — that’s probably the time to say, let’s look at that area. Is it very overcomplicated? Are too many domains talking about one thing, and those types of things?

The next warning sign is when we see individual contributors owning whole domains and services, because it’s literally too complicated for anyone else to understand. One thing I have seen is that some of your key engineers were there from the very beginning of creating the code base. Of course, over time, they’re the ones who know the most about those particular services or how things actually work. Once you get to the point where it’s just too hard to explain that to anybody else, you have to take a look, because you probably have a very overcomplicated situation on your hands. Of course, the fallout from that is high build times and long developer feedback cycles as your organization grows.

Related to all of these, another massive sign is difficulty in making technical decisions. The tech sprawl is large. It takes too long to understand the current state. Maybe there is a transitional architecture that we were in the middle of, but it’s not cleaned up and not documented, so we’re unsure why this pattern exists. Then we have to go talk to, again, that individual contributor who knows everything, to understand what it actually means. If you, as an engineer, cannot clearly see the different patterns or determine which of the many patterns to follow, it’s a big sign that we’re overcomplicating the situation. Then, alluding to another bottleneck, there is tech debt. Tech debt not only in the code base, but in your architecture, is something that you’ll see, because it’s much harder to grasp, especially in an overcomplicated situation. One of the things that I will say is, making technical decisions takes too long at every single level. It’s not just, how do I change my code? It’s decisions about, how do we bring in a new technology? How do we refactor something? How do we rearchitect things? People tend to defer these technical decisions when things are a lot more complicated. That’s a big one.

The last sign that I will talk about is the build in-house mindset. This is something I definitely see at a lot of different scaleups and startups, going back to the builder’s trap: “We can build everything. We can figure this out in-house. I can custom build and create custom software for all the things.” I think that’s well ingrained, and we as developers really love doing things like that. One of the things that you’ll have to start thinking about is, how do you start replacing some of the things that are more commoditized and move out of the build in-house mindset? Stop reinventing the wheel, which is what we tend to do as engineers. I am guilty of all of that. If you’re seeing a lot of custom things that you could buy off the shelf or use third-party integrations for, think about that for a second and look at your organization. It’s a sign that can also lead to these overcomplicated architectures, because everything’s bespoke. That’s the last sign that I’ll talk about.

How To Get Out of Overcomplicated Architecture

Then, let’s get into getting out of this overcomplicated architecture. This is the most important part. A lot of what I’m going to talk about is not rocket science; it is a lot of things that we’ve talked about throughout the years. A lot of this advice applied when we were in the monolithic world, but it’s so much more important to figure it out now that we’re in this microservices world. One theme for getting out of this overcomplicated architecture is that simplicity is key. We are now in a world where there are so many different things. There are managed services from different cloud providers. There’s Kubernetes. There are events. There are microservices. There’s streaming, Kafka, all of these different things. Things can get complicated quickly without us helping that along. Go back to the fundamentals of good architectural design principles, and think about evolutionary architecture; this refers to Neal Ford’s and Rebecca Parsons’ book, “Building Evolutionary Architectures.” It’s really about understanding what the need is from a business point of view, and architecting around that. The second theme that I always like to put out there is, even if you can, it doesn’t mean you should. Even though it is possible to put this in here, should you actually do it? Going back to simplicity is key: always ask yourself, what should we be doing to justify that business outcome and that product that we’re trying to drive toward?

Some of the actions that I would look at, and this should come as no surprise to everyone, are really around domain-driven design: going back to the fundamentals and the basics of what we mean by clear domains. Again, going back to that sign: if you’re finding that you have four or five microservices that have to be deployed together, because of that interconnectivity, then you seem to not have clear boundaries within the domain itself. Go back to the basics of domain-driven design and look at the bounded contexts. Look at how many hops you’re taking to make simple calls. Start with some of these simple calls and figure out, should these two entities or microservices actually become one service, because there are so many interdependencies between them? Going back to that theme, putting things back together where it makes sense will simplify your architecture and that cognitive load. It’s ok to evolve your architecture and realize, we thought these domains were separate, but now they’re not that separate, or the business has changed a little bit and this is actually one domain. Think about simplifying the situation while looking for those clear domain boundaries. Again, the basics of domain-driven design need to come back into play here.
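
A hedged before-and-after sketch of that "put things back together" advice, with hypothetical services:

```python
# Before: catalog and attributes were separate services, so every color
# lookup was a network round trip across a boundary that wasn't real:
#
#   color = requests.get(f"http://attributes/{attribute_id}").json()["color"]
#
# After: if the two always change and deploy together, they are likely one
# bounded context. Folding them into one service turns the hop into a plain
# function call inside a single deployable:

ATTRIBUTES = {"a1": {"color": "red"}}   # illustrative in-process data

def get_color(attribute_id: str) -> str:
    return ATTRIBUTES[attribute_id]["color"]
```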

The next one is quite obvious as well. It follows the line of what I talked about back in 2016, around observability and monitoring. One of the things I will say is, you can’t just put a tool in there and say, ok, we have observability and we can monitor all the things. On the other side of what I’ve seen now, we have a lot of dashboards, a lot of logs, and a lot of tracing. Again, going back to that sign I talked about: is it actually helping you? Can you debug things quickly? If you cannot, then it may be worth going back to the tools that you put into place, and the processes around them, and asking, is this helpful? Do we need to remove the noise of too much tracing or too much logging, and focus on some of the more specific services that may be higher risk? Being pragmatic about that, and looking at what that actually looks like, is going to be really important.

The next one is around lightweight architecture decision records, or ADRs. This is really to address a lot of that cognitive overload, to help with the onboarding of your teams, and to switch your mentality to thinking, ok, when I make an architectural decision, how do we proactively write it down? Instead of, again, reactive decision-making within a single context, how do we write these things down and understand why we made those decisions? That’s really important here. Because as your architecture evolves, it is really important to record some of these design decisions. It’s ok to be wrong in some of these decisions. It’s ok that this might not be the solution going forward in the future, but that’s why we call it evolvable architecture. Being able to understand why people made decisions, at the time and in the context they were in, is very important for the future, so people can understand what it meant. Maybe they say, “I see why they made it. It made sense at the time. Maybe we should remove it now that our architecture has evolved.” This is quite important to embed very early on into your organization as a process and a practice. A little bit about ADRs: they are lightweight. A few tips: they should be stored in source control. The closer you can keep them to the code itself, the better they will remain in sync. You can also use automated scripts to detect updates to these ADRs when they’re in source control. This gives your engineers a nice process: something has changed, let’s see what that looks like. On the slide here is a great example of a template that we use in some of our ADRs. Something to think about in terms of driving this as a process very early on in your engineering teams.
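
The template slide isn’t reproduced in this transcript, but a typical lightweight ADR has roughly this shape; the content below is illustrative, not from the talk.

```
ADR-007: Merge the attribute service into the catalog service

Status: Accepted (supersedes ADR-003)

Context: Every color lookup crosses two service boundaries, and the two
services have shipped together in every release this quarter.

Decision: Fold the attribute service into the catalog service as a module.

Consequences: One fewer network hop per lookup; the catalog deployable
grows; ADR-003's split no longer applies and is marked superseded.
```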

The next thing that I always talk about is cross-functional requirements. As you can see here, this is some of the list; it’s not all of the list. Again, referring back to the “Building Evolutionary Architectures” book that Neal and Rebecca have written, it’s really about looking at these cross-functional requirements, these -ilities. They tend to get forgotten very quickly when we’re trying to push features out as fast as possible. Being able to prioritize these CFRs based on business value is going to be extremely important from the beginning. Here are some examples of how you want to think about prioritizing CFRs on business value. More often than not, I see architects and engineers saying, this needs to be the most scalable thing in the world, so that’s why we need event sourcing or event-driven architectures. If you move it backwards a little bit, going back to that simplicity theme: what does your business actually need? What does the product actually need? How do you weigh the value versus the cost of development and maintenance? Something to think about when you’re looking at your CFRs and prioritizing them.
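
As an illustrative example of weighing CFRs against business value and cost — not taken from the slide — such a prioritization might look like this:

```
CFR            Business driver                       Priority  Cost note
Availability   Checkout downtime loses revenue       High      Multi-AZ is enough; no multi-cloud yet
Scalability    Traffic roughly triples at peak       Medium    Autoscaling before event sourcing
Auditability   Required for payments compliance      High      Must be designed in early
Portability    No plan to switch cloud providers     Low       Defer; don't abstract the cloud away
```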

One of the things I talk about as well is that this prioritizing of CFRs needs to happen on a quarterly basis, if not more often. This conversation needs to be more seamless. It can’t be something that is one and done: this is what’s going to happen, and then we never revisit it again. This is a continuous, evolvable thing that needs to be driven by your CFRs and the business value that you’re looking at. Of course, the architects need to be able to articulate the options and the impact, so the product and business can make these informed decisions together. This is a partnership that needs to happen. We cannot make these decisions in isolation. I’ve done this before: I’m pretty sure we need to scale this up. Then every time we ask someone, what is the SLA? What does the business expect for your uptime, for example? Most of the time, the architects and engineers look at each other and say, “I don’t know what the business thinks, but this is what we think.” This is the perfect time to start communicating and talking about this.

The next one is around removing that hero culture, removing that cape. Essentially, this goes back to when you have overcomplicated architectures, and you’re going down the route of, only these three people know how to change code in these domains, so let’s just keep talking to them and making them the heroes. What you want to do is break that cycle quickly. If these three are the only people who actually know this particular domain, then let’s look at that domain, that microservice or group of microservices, or the technology there, and figure out, what can we start simplifying in order to uncomplicate your architecture? The first thing, from an organizational point of view, is to not enable this type of behavior, but actually remove that hero mentality. The KPI that we’re looking for is, how quickly can we onboard engineers into the code base? Not, how did you save the day on that production issue, because you’re the only one who knows how to deal with that part of the code base?

Technical debt. This is actually one of the bottlenecks. What I wanted to talk about here is, how do you apply this to your architecture itself? How do you build a better understanding between your product and engineering teams? You have the product person who says, “Why is it taking so long for this feature to come out? I just want to add a new window.” Of course, the engineers are scowling because it’s all falling apart. It’s too overcomplicated. It’s really hard to make those changes. Having that continuous conversation and collaboration between product and engineering teams will help pull a lot of that tech debt out of the backlog and make it a first-class citizen with the product team, so you can both get on the same page.

Then, of course, one of the things that we talk about is developer effectiveness. How are we, as developers, delivering the maximum value to our customers? This is really about applying more of your energy towards the product itself, instead of the churn. When you’re in an effective environment, you can put high-quality software into production. We don’t want to deal with unnecessary complexities. We don’t want to deal with churn, long delays, and things like that. There are four things that I talk about here. First, optimizing for the four key metrics and availability: being able to incrementally and continuously deploy, looking at the four key metrics from DORA (deployment frequency, lead time for changes, change failure rate, and time to restore service). Second, looking at the micro feedback loops: low-level friction can quickly compound to affect developer motivation and productivity. How are we measuring those things day to day? What is the churn that a developer has on a given day? This might be compiling an application, running a test suite, trying to debug problems. How long are those things taking? How are we measuring them and looking at them very closely, instead of saying, where’s my window? The third thing is organizational effectiveness, looking at some of these broader capabilities. I’ll talk about this in a bit, around platform strategy: how do we holistically uncomplicate the architecture by taking some of these shared capabilities and improving efficiency for developers and their cognitive load? Then, I’ve talked about this a lot, but information discoverability: being able to understand what’s going on. This is where ADRs come into play. This is how to see why things have become what they are, and make it very easy for developers and engineers to onboard and understand these kinds of things. A developer portal here can be quite powerful. There are a lot of things that you can introduce into your systems here.

The next thing I’ll talk about is platform thinking. Essentially, how do we start uncomplicating our architecture by looking at foundational capability services and shared capabilities? How do we look at the business capabilities that are driven by domain-driven design? Platform thinking takes some of these things and extracts a lot of the shared capabilities. It lowers the barrier to productivity within the engineering teams themselves. There are things like developer platforms and business capability platforms. Platform engineering teams have been built to optimize the infrastructure and the developer experience, treating engineers and developers as the customers of these developer platforms. One of the things that we want to do is think about platforms that reduce friction for engineers. As you can see here, and this is something that we talked about at Thoughtworks quite a bit: instead of having your teams think about not only the customer value, but also supporting infrastructure, building CI/CD pipelines, environment setup, crosscutting concerns, all the things — when you start treating your product teams as customers of your platform and developer platforms, you can lower their need to support all of the things that they don’t really need to think about. This is about lowering their cognitive load, so they can focus on the product itself. You want to remove that friction so your teams can focus on the customer, and the product can move faster. That’s a big one when I think about platform strategies.

Finally, I talk about commodity versus differentiation. This is the last one: are we making all the things bespoke? Are we trying to build everything in-house? I urge everyone to look at things like Wardley Mapping and tools like that. This is a very good way to understand, from your organization’s point of view, what are the things that are commodity, that your engineers should not be spending time on? Are these things that we can buy, because they’ve been done before, and probably in a very good way, versus putting all of your energy into the differentiated things that actually drive your organization? What is the product that’s your bread and butter? What is that special sauce? What is that differentiation? Should we not put all of our resources and energy into that differentiation, as opposed to all the commodity things? Should we be building our own auth system, or has that already been done before? Thinking about that can really simplify your architecture, and focus the bespoke mentality on the things that make your organization truly special. Something to think about in terms of looking at the bigger picture: what are the things that you want to invest your resources in?

Summary

What are the warning signs of your overcomplicated architecture? Going through them: do you have a distributed monolith? Lacking, or not useful, tracing and observability? Over-abstraction: how many layers does it take to get to some of the core functionality? Again, going back to your engineers’ high cognitive load, which leads to the difficulty of making technical decisions. Then, are you building all the things? Have you thought about the build versus buy versus rent mentality? Should you start thinking about that in order to uncomplicate your situation? Of course, some of the advice for getting out of overcomplicated architectures goes back to the need for clear domain boundaries. This is one of the fundamental things that I’d urge everyone to come back to. Look at proper observability and monitoring, and ask, is it actually helping you? Please test that out in production. Go test some of the tools and ask, can I debug this quickly? If you can, then you’re in a much better state. Then, drive out lightweight architecture decision records. Again, this is not big governance decreeing what shall be done. It’s more about recording the things and understanding how to evolve your code base and the architecture itself.

Looking at cross-functional requirements, and driving them with the business outcomes, is really important, because you can’t be all the things. You have to pull those levers on some of these requirements, and they have to be somewhat pragmatic. Remove that hero culture, again. If one person knows all the things in the code base, then break that cycle and start asking, how do you onboard everyone? How can I send any engineer over there, and they can figure out what’s going on? That speaks to an uncomplicated architecture. Look at technical debt, making it a first-class citizen with your product team and your business team. Then, of course, the biggest things around developer effectiveness: looking at the churn of your developers and understanding, do you need some of this platform thinking that we just talked about? Then, again, look at your organization around driving the commodity versus differentiation, and see what that means for you.
