Article: Rules of Thumb & Traps When Approaching Tech Stack Decisions

MMS • Stefan Miteski

Article originally posted on InfoQ.

Key Takeaways

Every company has several budgets: capital, knowledge, and risk. Each company holds them in different amounts and can use its specific mix to its advantage.
The MVP stage includes only the stuff you would do in a hackathon. We care about scalability once survivability is established.
Delaying tech decisions to the Last Responsible Moment increases team velocity in the long run, so it is not in conflict with the Build Fast and Break Things approach.
Outsource every aspect of the tech solution that is unrelated to your competitive advantage. Control and build yourself everything that is part of your competitive advantage.
Create a tech stack direction, but balance it with the team’s autonomy to make decisions outside the directive.

Every company manages money, knowledge, and risk. A larger risk budget lets startups adopt a different mindset and use risk as fuel to obtain knowledge faster. A startup can set up a Wizard of Oz or a Landing Page MVP that an established company would think twice about implementing.

I was running my startup with this sentiment, and I was unhappy with my inability to make faster tech stack decisions. A few years later, working at TeamViewer and doing some gigs for Fortune 500 companies taught me a few tips that I want to share.

I have decided to invite Erik, an imaginary character from the book The Unicorn Project, and do this in an interview format. Erik is the legendary barman who coined the Three Ways of DevOps. I hope he will help us out.

Stefan Miteski: Erik, let me dive right into the topic. When should the team switch from the “build an MVP as quickly as possible” mindset to a more scalable, maintainable, and sustainable one?

Erik: First of all, not all hypotheses need code. Use whatever is cheapest and fastest. I prefer the tech that finishes the job fastest. If this is a low-code platform that I will have to rewrite entirely later, it is not a problem. So this decision rests entirely on what the team already knows. If you have a person who knows how to code in Node.js, you use that. If you have Python, then Python it is. At this point, you should not worry about the best quality processes, CI/CD pipelines, or the best agile approaches. Those are extra burdens that the startup should not care about at this stage. At the MVP stage, do only the stuff you would do in a hackathon.

So the simple answer is that scalability and processes become important once we have predictable revenue streams.

Miteski: What about when money is not a problem, and an already established company launches a new project?

Erik: It is more or less the same. We care about scalability once survivability is established. If the project is a testing ground, you go the MVP path. If the project is going to be there for good, think about scalability.

Miteski: OK, I am glad you put this so elegantly. I often mentor early-stage startups, and I have met many that failed because they spent too much effort on scalability way too early. A startup is far more likely to fail by creating a product that nobody needs (35%) than to die from inadequate technical decisions (less than 8%).

But what about the scaleup phase? We have a predictable revenue stream. We can finally ditch that low-code spaghetti. What technology should we choose now?

Erik: Conway’s law says that any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure. So at this point, we should think about structuring the whole organization, not just the tech stack or software architecture. How many teams will there be, how will they work together, and how will they align? What processes will we have for testing and CI/CD? What will be our discovery strategy? How will we align our marketing and development efforts? How are we going to ensure that we have a healthy culture overall? How do we acquire and integrate talent?

Every technology is built with a set of problems in mind, so for some projects the choice is quite obvious. But should you develop your web app using Java or C# on the backend? It might not be so obvious. SQL or MySQL? The short answer is to let the team decide and to use the principle of Emergent Architecture and the Last Responsible Moment rule. Look at the decision from both perspectives: the functional and the non-functional.

The non-functional requirements:

How big is the community online? Suppose you choose C# over some exotic language. For C#, you already have comprehensive documentation and support from the likes of Jon Skeet, who has reached over 1 million reputation points on Stack Overflow. You are far more likely to find answers online for C# than for the exotic alternative. How many libraries are available, and how mature is their support?

How big is the total hirable market in terms of developers? What languages do graduates study at the nearby universities? Yes, you can always train people in a technology, but then add the time needed into the equation, and budget for higher salaries if you will be hiring for a language that is unpopular or dead 5 or 10 years from now.

How long do you expect the system to last? Sometimes we forget that we as humans are developed in 9 months and have a life expectancy of roughly 70 years. We need a similar approach if we build things that are meant to last. Still, it is also a matter of ongoing tech strategy, not only of the first choice we make.

Making significant changes in production sometimes feels like repairing a plane while flying it. That is why it is wise to postpone decisions about critical parts of the system to the Last Responsible Moment.

Miteski: So what is the Last Responsible Moment?

Erik: It is a principle that originates in the Lean philosophy. Sometimes the cost of rework is higher than the cost of slowing down. The Agile philosophy favors empiricism: learn by doing. However, you would be taking a significant risk by making irreversible decisions too early. So we identify those decisions and carve out the parts we can learn from first, reducing the risk before we make the big decision.

Miteski: Delaying things sounds like it conflicts with the lean startup principle that speed is our main competitive advantage.

Erik: Postponing the decision does not mean we just sit and think. It means that we run many architectural spikes around this decision and learn about the business aspects and the user needs, so that we can make the best decision at the Last Responsible Moment. We can mock the system and build all the APIs around it.
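As a minimal sketch of mocking the system and building the APIs around it, assume the open decision is which storage engine to use. The names here (`Order`, `OrderStore`, `InMemoryOrderStore`, `place_order`) are hypothetical and exist only for illustration. The business logic depends on a small interface, and an in-memory mock stands in for the real system until the decision has to be made.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Order:
    order_id: str
    total_cents: int


class OrderStore(Protocol):
    """Hypothetical boundary; the real storage engine is not chosen yet."""

    def save(self, order: Order) -> None: ...

    def find(self, order_id: str) -> Optional[Order]: ...


class InMemoryOrderStore:
    """Mock implementation used while the storage decision is still open."""

    def __init__(self) -> None:
        self._orders: Dict[str, Order] = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def find(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)


def place_order(store: OrderStore, order_id: str, total_cents: int) -> Order:
    """Business logic that depends only on the OrderStore boundary."""
    if total_cents <= 0:
        raise ValueError("total_cents must be positive")
    order = Order(order_id, total_cents)
    store.save(order)
    return order
```

When the storage choice is finally made, only a new class implementing the same two methods is needed; `place_order` and everything learned while building it stay as they are.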

Miteski: So, how do business aspects influence our tech-stack decision-making?

Erik: In many ways:

Where do you expect your competitive advantage to be? What technology can best support this competitive advantage?

Outsource every aspect of the tech solution that is unrelated to your competitive advantage. Control and build yourself every piece that is part of the competitive advantage. We can extend this mindset to all tech and other non-essential services. Be careful, though, not to turn your suppliers into competitors. After all, Microsoft and Intel were IBM’s suppliers.

So if going bare metal is not essential, it makes a lot of sense to go with a cloud provider. Which one? Well, again, it depends on the nature of the business. If it matters to your customers how quickly the cloud provider reacts during an incident, then look closely at its SLAs and the type of partnership you can build. If it is not crucial, then you can think in terms of the services the provider offers, from simple things like CloudWatch to flavors of Kubernetes. Which of them would speed up your development in the long run? And lastly, look at the pricing. Choosing a cloud partner can be like deciding on your startup co-founder. Take your time and don’t rush it, especially if you go into deep integration with their services.

Miteski: Can you tell me some traps we can fall into as the organization matures?

Erik: The first one is the Tram trap. It happens in unhealthy organizational environments where developers build silos of knowledge. I have talked to a tech giant where a single engineer wrote essential services. He held the organization hostage to receive a better salary, did not get what he wanted, and left the company in the end. They had to rewrite the services, as nobody was able to support them.

Silos, however, can occur naturally due to high pressure from management for fast delivery. In high-pressure environments, developers have to specialize in certain areas to be more efficient. So the de-silofication should be considered a complementary task while dealing with technical debt.

Regardless of why such silos occurred, we should know how many of them are critical. I have witnessed huge companies that should not allow five specific engineers to travel on the same tram, as the risk to the company’s survival if something happens to the tram is simply too high. If this is the case in your company, then it is time for you to think about doing things differently. Spread the knowledge, and implement a proxy strategy where other engineers start taking the tasks intended for the “tram-people.” We then consult the experts only when the “proxy engineers” are stuck. This tactic slowly spreads the knowledge at the cost of short-term efficiency. We are creating long-term robustness, which will pay off as higher efficiency in the longer run. Does it make sense?

Miteski: Yup, it does.
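To make “how many silos are critical” concrete, here is a rough sketch. It assumes you can write down, per component, which engineers are able to support it. The mapping, the component names, and the threshold of two people are illustrative assumptions, not data from any real company.

```python
from typing import Dict, List, Set


def critical_silos(knowledge: Dict[str, Set[str]], min_people: int = 2) -> List[str]:
    """Return components that fewer than `min_people` engineers can support."""
    return sorted(
        component
        for component, engineers in knowledge.items()
        if len(engineers) < min_people
    )


def tram_people(knowledge: Dict[str, Set[str]], min_people: int = 2) -> Set[str]:
    """Return engineers who hold the knowledge of at least one critical silo."""
    people: Set[str] = set()
    for component in critical_silos(knowledge, min_people):
        people |= knowledge[component]
    return people


# Hypothetical ownership map: component -> engineers who can support it.
knowledge = {
    "billing": {"ana"},
    "auth": {"ana", "boris"},
    "search": {"chen"},
}
```

With this sample map, `billing` and `search` are critical silos, and `ana` and `chen` are the engineers whose tasks the “proxy engineers” should start taking over first.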

Erik: Another one is the puppy trap. Sometimes our childish naivety overwhelms us when we see a cute puppy. We decide in a rush to adopt it based on how it looks now and how we feel about it at this very moment. This puppy can grow into a big animal that will require care for more than ten years. In many cases, software solutions also last more than ten years, so we need to make the technological decision with this in mind. We should not dedicate the next 10-15 years to a technology just because an engineer drinking a beer or a manager reading an in-flight magazine got inspired by the shiny new thing. Visual FoxPro in the late 90s sounded like a great new technology, but it did not stand the test of time. The tech graveyard has many examples like this. At the pace new front-end technologies are appearing, we can expect the tech graveyard to remain a busy place in the upcoming years.

On the other hand, it is risky to shut the organizational door to new ideas and experimentation. Experimenting is easier with microservices or loosely coupled systems, because we can use a small, isolated part of the system to try out a new technology. Then we can experiment with novel technologies in production, but on completely non-essential system parts on the outskirts (ones that would not affect customers if we had to shut them down at some point). Use these to build up operational knowledge. Building just developmental knowledge is very risky. We need to experience how our tech stack behaves in a live scenario to create a holistic understanding.
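One way to keep such an experiment from affecting customers is to put it behind a switch with a fallback to the existing path. The sketch below is only an illustration under assumed names: `stable_recommendations` stands for the existing code, `experimental_recommendations` stands for a call to a service built with the new technology (here it always fails, to show the fallback), and `experiment_enabled` is a hypothetical flag.

```python
from typing import Callable, List


def stable_recommendations(user_id: str) -> List[str]:
    """Existing, well-understood code path (placeholder data)."""
    return ["bestseller-1", "bestseller-2"]


def experimental_recommendations(user_id: str) -> List[str]:
    """Stand-in for a call to the service written in the new technology."""
    raise TimeoutError("experimental service unavailable")


def recommend(
    user_id: str,
    experiment_enabled: bool,
    experimental: Callable[[str], List[str]] = experimental_recommendations,
    fallback: Callable[[str], List[str]] = stable_recommendations,
) -> List[str]:
    """Use the experimental service only when enabled, and never let it fail the request."""
    if not experiment_enabled:
        return fallback(user_id)
    try:
        return experimental(user_id)
    except Exception:
        # Any failure in the experiment falls back to the stable path.
        return fallback(user_id)
```

Turning `experiment_enabled` off shuts the experiment down without a deployment, which is what makes it safe to gather operational knowledge on a non-essential part of the system.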

Miteski: OK, we have followed the Last Responsible Moment rule and let every team in the organization choose what it wants. Now a dilemma emerges: how do we balance the team’s autonomy to use its “tech stack of choice” against the company’s need to unify the tech stack?

Erik: The organizational leadership should set up a tech directive: a loose set of principles for making these decisions.

There is a significant advantage for all teams in a company to use a similar tech stack. It allows the company to negotiate better prices for licenses. The teams can reassemble faster according to new needs, making the company more adaptable. Tech challenges would be solved once, and the knowledge disseminated throughout the company.

The downside is that the wider the variety of markets and solutions the company offers, the greater the need for teams to use different tech stacks. So the rule of thumb is to create a tech stack direction but give the team autonomy to make decisions outside the directive. The team should be able to articulate the factors that led it to deviate from the company’s direction, though.

Miteski: If we are using a loose coupling approach (such as microservices), we can allow different teams to create services or system parts in various languages. That level of flexibility can bring us certain advantages, but when should we exercise it? What would be a rule of thumb here? Engineers usually know best, but they do not always have the proper motivation when proposing a new tech stack. I have recently seen a situation where some engineers created services in Go just because they wanted to learn Go, as it is a novel and incredible language.

Erik: Loose coupling is always a good idea. Using various languages can enable the company to evolve slowly. What is essential is to start with a small change. If it works well, we learn from it, and we encourage the other teams to build on top of this knowledge. Depending on the size of the company and the code base, the time it takes for the whole company tech stack to evolve towards a new concept will vary. Maintaining a continuous evolution and taking an incremental approach is less painful than cutting the cord and starting again. However, the rule of thumb is that those further language and tech explorations should be limited, planned, and executed on non-essential services or system parts. We do this exploration on a service that we can kill at any moment without a significant impact. So the waves of evolution and tech change start at the outskirts, where the customer impact is most negligible, and move towards the inner core. Slowly, the whole organization and code base change. We rarely start a new big wave while one is already flowing inwards and has not yet reached all the core system parts.

Focus on what will allow you to develop at a high pace sustainably, without neglecting the operational aspect, and to deliver better value sooner, safer, and happier.

Miteski: Thanks! I enjoyed this conversation. I believe every company should have one like it from time to time.

Erik: Thanks, and don’t forget to revisit me while re-reading The Unicorn Project.

About the Author

Stefan Miteski
