Month: March 2024
MMS • Aditya Kulkarni
GitHub recently shared how it uses merge queues to ship code updates. The merge queue has been developed and scaled to handle over 30,000 pull requests, along with the corresponding 4.5 million CI runs, for GitHub.com.
The merge queue system organizes pull requests into deployable batches, initiates builds and tests through GitHub Actions, and protects the integrity of the main branch by blocking merges of failing commits, in accordance with branch protection rules. Conflicting pull requests within the queue are automatically identified and removed, with the system regrouping the remaining entries as necessary.
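To make the batching mechanics concrete, here is a minimal TypeScript sketch of how a merge queue might group, test, and bisect batches. This is an illustration of the general technique, not GitHub's implementation; the type, helper names, and simulated CI check are all hypothetical.

```typescript
// Illustrative merge-queue batching (not GitHub's actual code).
interface PullRequest {
  id: number;
  conflictsWithMain: boolean; // stand-in for a real merge-conflict check
}

// Stand-in for "build a speculative merge commit of the batch and run CI".
async function batchCIPasses(batch: PullRequest[]): Promise<boolean> {
  return batch.every((pr) => pr.id % 13 !== 0); // simulated occasional failure
}

// Returns the PRs that actually land on main.
async function processQueue(queue: PullRequest[]): Promise<PullRequest[]> {
  // 1. Evict PRs that no longer merge cleanly; the queue regroups without them.
  const batch = queue.filter((pr) => !pr.conflictsWithMain);
  if (batch.length === 0) return [];

  // 2. Test the whole batch as one candidate commit; main only ever
  //    fast-forwards to commits that passed CI.
  if (await batchCIPasses(batch)) return batch;

  // 3. A single failing PR is evicted rather than blocking the queue.
  if (batch.length === 1) return [];

  // 4. Otherwise split the batch and retry, isolating the failing PR.
  const mid = Math.ceil(batch.length / 2);
  const left = await processQueue(batch.slice(0, mid));
  const right = await processQueue(batch.slice(mid));
  return [...left, ...right];
}
```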
Will Smythe, staff product manager, and Lawrence Gripper, staff engineer at GitHub, elaborated on GitHub’s journey with merge queues in a blog post. Aside from increasing developer velocity, the move to merge queues was aimed at enhancing the overall experience for developers in shipping their work, preventing problematic pull requests from affecting the broader team, and ensuring a consistent and highly automated process across all services and repositories.
In mid-2021, GitHub began piloting the merge queue feature with several smaller, internal repositories. They rolled out changes to the process in stages, testing and reverting modifications early in the morning before the majority of developers started their workday. GitHub then systematically migrated its large monorepo and all repositories linked to production services to the merge queue by 2023.
The merge queue blends into GitHub's existing pull request workflow, enhancing the developer experience by eliminating the need to learn ChatOps commands or to manage state through labels or special comment syntax. Developers can easily queue their pull requests and, if they identify any issues with their changes, exit the queue with one click.
Following the general release of the merge queue by GitHub in around Q3 2023, we came across a relevant discussion on Hacker News. The tech community participated actively, with one user who had spent several months using the system for monorepo pull request merges acknowledging its substantial improvements to the process. They also praised the faster, more reliable trunk-based releases it enabled, expressing appreciation for the team behind the feature.
Another user inquired whether merge queues would be arriving in Azure DevOps anytime soon, indicating interest in its availability. One participant responded in the thread that Azure Repos appears to receive few updates, pointing out its continued reliance on ssh-rsa host keys for Git over SSH, an algorithm that OpenSSH deprecated several years ago.
Each month, more than 500 engineers use the merge queue to merge 2,500 pull requests into GitHub’s large monorepo, resulting in a 33% reduction in the average time to deploy a change. In a periodic developer satisfaction survey conducted by GitHub, an engineer applauded the merge queue as “one of the most significant quality-of-life enhancements for deploying changes that I’ve witnessed at GitHub!”
MMS • Daniel Dominguez
Databricks launched DBRX, a new open-source large language model (LLM) that aims to redefine the standards of open models and outperform well-known competitors on industry benchmarks.
With 132 billion parameters, DBRX outperforms popular open-source LLMs such as LLaMA 2 70B, Mixtral, and Grok-1 across various language understanding, programming, and math tasks, according to Databricks’ own benchmark runs. The new model even competes favorably against Anthropic’s closed-source model Claude on specific benchmarks.
The AI community expressed excitement about the release of DBRX, with Clem Delangue, CEO at Hugging Face, posting on X:
Not a surprise but DBRX is already #1 trending on HF!
DBRX’s performance is attributed to its more efficient mixture-of-experts architecture, making it up to 2x faster at inference than LLaMA 2 70B despite having fewer active parameters. Databricks claims that training the model was also approximately 2x more compute-efficient than dense alternatives.
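The efficiency claim follows from how a mixture-of-experts layer works: a small router scores every expert for each token, but only the top-k experts actually execute, so most of the model's weights sit idle on any given forward pass (Databricks reports DBRX uses 16 experts with 4 active per token, which is how 132B total parameters yield roughly 36B active ones). The toy TypeScript sketch below shows the routing idea; it is a conceptual illustration, not DBRX's code.

```typescript
// Toy mixture-of-experts routing: only the top-k experts run per token.
type Expert = (x: number[]) => number[];

function topK(scores: number[], k: number): number[] {
  return scores
    .map((score, idx) => ({ score, idx }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.idx);
}

function moeLayer(
  token: number[],
  experts: Expert[],
  routerScores: number[], // one score per expert, produced by the router
  k: number,
): number[] {
  const chosen = topK(routerScores, k);
  // Softmax over only the selected experts' scores.
  const exps = chosen.map((i) => Math.exp(routerScores[i]));
  const norm = exps.reduce((a, b) => a + b, 0);

  const out: number[] = new Array(token.length).fill(0);
  chosen.forEach((expertIdx, j) => {
    const y = experts[expertIdx](token); // only k of the experts execute
    y.forEach((v, dim) => (out[dim] += (exps[j] / norm) * v));
  });
  return out; // most expert weights were never touched this forward pass
}
```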
The model was pretrained on 12 trillion tokens of curated text and code data, leveraging advanced technologies like rotary position encodings and curriculum learning during pretraining. Developers can interact with DBRX via APIs or utilize Databricks’ tools to fine-tune the model on their proprietary data. Integration into Databricks’ AI products is already underway. DBRX is available on GitHub and Hugging Face.
Databricks anticipates that developers will adopt the model as a foundation for their own LLMs, potentially enhancing customer chatbots or internal question-answering systems. This approach also provides insight into how DBRX was constructed using Databricks’ proprietary tools.
To create the dataset utilized in developing DBRX, Databricks utilized Apache Spark and Databricks Notebooks for data processing, Unity Catalog for data management and governance, and MLflow for experiment tracking.
DBRX sets a new standard for open-source AI models, offering customizable and transparent generative AI solutions for enterprises. A recent survey from Andreessen Horowitz indicates a growing interest among AI leaders in increasing open-source adoption as fine-tuned models approach closed-source performance levels. Databricks expects DBRX to accelerate the shift from closed to open-source solutions.
MMS • Renato Losio
The creators of DBOS have recently introduced DBOS Cloud, a transactional serverless application platform tailored for TypeScript developers. With all state information stored in a highly available DBMS, the new platform provides transactional serverless computing, offering reliable execution alongside so-called “time travel” capabilities.
Dubbed “the world’s first cloud-native operating system” and a “database alternative to Kubernetes”, DBOS (DataBase oriented Operating System) implements operating system services in SQL, running atop a high-performance distributed, transactional, partitioned, fault-tolerant database. Michael Stonebraker, computer scientist and Turing Award laureate, writes:
The idea for DBOS (DataBase oriented Operating System) originated 3 years ago with my realization that the state an operating system must maintain (files, processes, threads, messages, etc.) has increased in size by about 6 orders of magnitude since I began using Unix on a PDP-11/40 in 1973. As such, storing the OS state is a database problem. Also, Linux is legacy code at the present time and is having difficulty making forward progress. For example, there is no multi-node version of Linux, requiring people to run an orchestrator such as Kubernetes.
According to the authors, DBOS Cloud automatically logs every step an application takes and each change it makes in the database. Two distinct features of DBOS Cloud are reliable execution and time travel: if a DBOS program is interrupted, it automatically resumes from the point of interruption without re-executing any previously completed work. Stonebraker adds:
Providing such guarantees yourself is months of work, but in DBOS, they’re built into every program (…) You can step through past executions to reproduce rare bugs and even run new code against historical state.
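The TypeScript sketch below illustrates the general checkpointing idea behind reliable execution: each completed step's result is persisted, so a restarted workflow skips work it already finished. This is a conceptual sketch with a hypothetical in-memory store standing in for the platform's database, not the DBOS SDK's actual API.

```typescript
// Conceptual sketch of reliable execution via per-step checkpoints.
interface StepStore {
  get(workflowId: string, step: number): Promise<string | undefined>;
  put(workflowId: string, step: number, result: string): Promise<void>;
}

// Hypothetical in-memory store; the real system would use a durable DBMS.
const mem = new Map<string, string>();
const store: StepStore = {
  async get(w, s) { return mem.get(`${w}:${s}`); },
  async put(w, s, r) { mem.set(`${w}:${s}`, r); },
};

async function runStep(
  workflowId: string,
  step: number,
  work: () => Promise<string>,
): Promise<string> {
  const prior = await store.get(workflowId, step);
  if (prior !== undefined) return prior; // finished before a crash: skip it
  const result = await work();           // execute the step
  await store.put(workflowId, step, result); // checkpoint the result
  return result;
}

// Rerunning this after an interruption resumes at the first unfinished step:
// completed steps return their stored results instead of re-executing.
async function checkoutWorkflow(workflowId: string): Promise<void> {
  await runStep(workflowId, 0, async () => "inventory-reserved");
  await runStep(workflowId, 1, async () => "card-charged");
  await runStep(workflowId, 2, async () => "confirmation-sent");
}

checkoutWorkflow("order-42").catch(console.error);
```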
Furthermore, a “time travel debugger” enables developers to replay any DBOS Cloud trace locally on a laptop, observe past application states, and test code changes. In the future, the team plans to release a time travel functionality for disaster recovery, enabling developers to roll back an application and its data to any previous state. Jeremy Daly, CEO and founder of Ampt, comments:
What makes this super interesting is that the project was founded by Dr. Mike Stonebraker (he created Ingress, PostgreSQL, and VoltDB) and Matei Zaharia, the creator of Apache Spark.
Peter Zaitsev, founder at Percona and open source advocate, agrees but adds:
Mike Stonebraker does not seem to be a big fan of Open Source for his companies in recent years.
While the team has released an open-source DBOS TypeScript SDK, DBOS itself, unlike Ingres, PostgreSQL, VoltDB, or Apache Spark, is not open-source. In a thread on Hacker News, Peter Kraft, co-founder at DBOS, explains:
Under the hood of our cloud platform (mostly Go + SQL), we’re building on ideas from the academic project to provide new features like reliable execution/time travel, but like a good OS, we want to hide that complexity from users.
A free tier and a programming guide covering idempotency and workflow execution on the serverless computing platform are now available. The free tier offers fixed resources per application (a Firecracker microVM with 512 MB of RAM and 1 vCPU) that scale to zero when not in use.
Apple Researchers Detail Method to Combine Different LLMs to Achieve State-of-the-Art Performance
MMS • Sergio De Simone
Many large language models (LLMs) have become available recently, both closed and open source, further leading to the creation of combined models known as multimodal LLMs (MLLMs). Yet few, if any, disclose the design choices that went into creating them, say Apple researchers, who have distilled principles and lessons for designing state-of-the-art (SOTA) multimodal LLMs.
Multimodal large language models are built by combining a large language model and a vision foundation model into a single model. MLLMs, which according to Apple researchers “are emerging as the next frontier in foundation models”, aim to consume image and text inputs and generate text in a way that outperforms the foundation models they build upon.
Apple researchers focused on two aspects of the process that leads to the creation of MLLMs: decisions about the model architecture and choices for pre-training data.
On the first front, they found that image resolution, visual encoder loss and capacity, and visual encoder pre-training data are the three most important design aspects. By contrast, architectural decisions regarding how visual data is fed into the LLM do not seem to affect the resulting model performance.
Regarding pre-training, the researchers analyzed three different approaches — image-caption, interleaved image-text, and text-only data — in few-shot, zero-shot, and text-only contexts. Zero-shot models are trained to recognize and classify objects or concepts without necessarily having previously seen any examples of them. In few-shot training, the focus is instead on models that can make accurate predictions based on training that includes only a very small number of labeled examples.
The outcome was that interleaved and text-only training data is key for few-shot and text-only model performance, while caption data is key for zero-shot models.
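To illustrate what these three data types look like in practice, here is a toy TypeScript encoding of each; the structures, field names, and sample contents are made up for illustration, not taken from the paper.

```typescript
// Toy encodings of the three pre-training data types the researchers compared.
type Segment = { kind: "text"; value: string } | { kind: "image"; url: string };

// 1. Image-caption pairs: one image with one short text description,
//    which the study found key for zero-shot performance.
const captionExample: Segment[] = [
  { kind: "image", url: "https://example.com/dog.jpg" },
  { kind: "text", value: "A dog catching a frisbee." },
];

// 2. Interleaved image-text: images embedded in running document text,
//    which the study found key for few-shot performance.
const interleavedExample: Segment[] = [
  { kind: "text", value: "We hiked to the summit before dawn." },
  { kind: "image", url: "https://example.com/summit.jpg" },
  { kind: "text", value: "The view made the climb worth it." },
];

// 3. Text-only data keeps the underlying LLM's language ability strong.
const textOnlyExample: Segment[] = [
  { kind: "text", value: "The mitochondrion is the powerhouse of the cell." },
];
```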
To validate their results, the researchers built a family of models, dubbed MM1, that outperforms current state-of-the-art models, including Emu2, Flamingo, and IDEFICS. Benchmarking was done on captioning, where the model provides a descriptive caption of an image, and on visual question answering, where the model answers questions about an image to help understand its content.
Thanks to large-scale multimodal pre-training […] MM1 enjoys appealing properties such as in-context predictions, multi-image and chain-of-thought reasoning. MM1 also enables strong few-shot learning capability after instruction tuning. These strong results demonstrate that the presented recipe for building MLLMs translates the design principles to a competitive model at scale.
As the researchers explain in their paper, to get these levels of performance with MM1, they investigated different image encoders as well as ways of connecting them to LLMs; different types of data and how to set weights; and how to train the MLLM, including its hyperparameters. Their results include insights such as the importance of image resolution, model size, training data composition, and so on, which they hope can provide a solid foundation for the community to build stronger models across multiple architectures and data strategies.
MMS • Ben Hartshorne
Transcript
Hartshorne: I’m here to talk about a time in Honeycomb’s life. As a company, we had been around for several years and the business was slowly growing. We were getting more customers. As the traffic grew, some scaling issues come up, and they are managed. This is great. We could keep growing and a different thing comes up, and it’s managed. As the pace of growth goes along, these things happen. Each time, there are side effects that push out a little bit further. What you wind up with is a system that is still there and working well, and just creaking. It starts to push its boundaries more frequently in different ways. There are no patterns; it’s rough. We saw that, as a business, we wanted to grow, and we were realizing that what got us to that part in our company’s life would not get us to the next one. I would like to talk a little bit about how we managed that.
Background
My name is Ben Hartshorne. I’m a Principal Engineer at Honeycomb. I worked on this talk with Jess Mink. They joined Honeycomb as a senior director of platform engineering when we were facing this. They started, and we said, “We would like to double the size of our business, please help.” We needed to get a lot of work that was on the periphery here, somehow scheduled in the face of growing the business and increasing the product, and everything else. The thesis is relatively straightforward. A compelling business case is what will get your work scheduled. We need to understand, what does a compelling business case look like? How do we get there? How do we ensure that we can do this for technical work? Because the company has been growing and the product organization has been growing, and there’s more sophistication around deciding what we should build. We need to match that.
Understanding Priorities
I want to start by thinking about what does it mean to understand priorities, just a little bit. It’s a truism in the startup world and probably in all of the business world, that there’s always more work than you can possibly do. There’s this herd of features out there, they are beautiful. They are product ideas. They’re things you can do, new things you can build, ways you can improve. They all must go through this gate of your engineering capacity. You can only do so much work. We’re starting from this perspective of there’s no way we can do everything. The engineer’s view on this herd is on the inside of this fence. There’s this line of beautiful work coming in at you, it is full. You have your roadmap. You have all of your work queued up. You have the new products to build, the existing ones to improve. You look at this and you think, how can I possibly do the work that I see that we need to do in order to maintain this business? When do I schedule this? Can we close the gate for a little bit so we can sweep the entrance? That’s hard. Take for a moment the other perspective, the product managers, the ones who are creating this curated list of things for you to do. This is their view. They see this enormous pile of things they would like to build, ways the business can improve, and they’re not going anywhere. Why isn’t engineering doing our work? They ask. They see this as an enormous opportunity, and need to figure out, engineering has the capacity that they do. We need to understand which sheep to offer them, which are going to make it through the gate first. This is a little bit about what they do.
Let’s imagine this sheep in the front here. This sheep in the front will reduce your hosting costs by 20%. Very good sheep. We should do this. Any project that might show up has to be considered in the context of what the business is doing, what’s currently going on everywhere. Imagine we are a primary color company who does some web search, and we have the opportunity to reduce our hosting costs by 20%. This is a career-making move. This is going to happen. It doesn’t matter really. It matters a little bit how much it costs. This is a project that is absolutely going to happen. Picture this middle group here, classic startup scene, early in their business, they have the same opportunity to reduce their hosting costs by 20%. Their hosting cost is what, like $300 a month? No, you’re not going to do that. Then there’s all of the states in between where your business might be in a merger, or it might be in a reorg, or there might be something going on. You have this obvious benefit, reduce hosting costs, but it’s really not clear whether you should do it. If the reorg isn’t complete, and the entire product that’s being hosted there is going to be disbanded, no, don’t go spend time reducing the costs to host it. The same project, the same decision, the same idea may get a very different result depending upon what’s going on in the rest of the business.
What’s the Prioritization Process?
Let’s look a little bit more at how to choose which of these things to do. It’s not always quite so simple as an enormous cost reduction. Product management has come up in this track before. There are sophisticated tools to help product managers take this enormous mountain of ideas and input and form it into the things that they’re going to present for engineering, to understand whether or not one of these things should be done. This input comes in many forms. There’s feedback from customers. There’s ideas sprouted from anywhere, really. There’s appreciation for a specific aspect of your business. All of these come together. They come in separately, but they need to be understood to be representations of a thing that the business does, a service that it provides, a problem that it’s solving. You can look at each of these different bits, whether it’s a piece of customer feedback that comes in, and try and reframe it so that it is shaped like a problem that the business is fulfilling. This process is managed for every one of those sheep we saw. What we wind up with is this enormous corpus of data about all of the different problems that we are confronting as a business, which ones affect our customers in one way or the other. Some of them are blocking new customers. Some of them are allowing the expansion of existing customers. Each one has this depth to it that has been accumulated by the product organization.
Once you get an understanding of these ideas, you can try and tease them apart into a couple of categories. There’s the obvious good, the obvious bad, the big, squishy middle. I particularly love this graph. I first came across it in a workshop by a guy named Jeff Patton. When I went to look for the image that he used, I found he credited it to a graph called Constable’s truth curve, which was in fact presented here at QCon in 2013. On the left there you have the size of a bet you’re willing to make for this particular piece of work. On the top, going across is how much you expect to get from that. Let me give you an example. I have a new feature that is going to make you several millions of dollars as a business. It’s going to take quite a lot of time. It’s up there with the green dollar sign. You ask me, “So you’re proposing this thing. We recognize all development is a bet. How confident are you? You say it’s going to bring in this much money? What’s the likelihood you’re wrong?” I say, “I think it is, and if I’m wrong, I’ll buy you a cup of coffee.” Maybe don’t take that bet. The green is equally clear. You have some relatively easy task that has a very clear return. You’re very confident, there’s no question about whether this would benefit the business. I’m willing to give you my car if I’m wrong. Yes, we should do that. That’s in the nice green section up there. The squishy middle is like most of everything else. We need tools to help us explore these ideas and better understand where they fit, in hopes of pushing them out of that squishy middle into one of the other edges. There are a couple of good thought-organizing frameworks out there. These are just two samples, an opportunity canvas, a learning canvas. They’re both ways of taking your understanding of the problem, of the people it will help, of the people it might not help, of the cost, structuring it in a way that facilitates some of this conversation.
Ultimately, the goal of this process is to clarify the ROI, the return on investment of any given piece of work. We need to find out who this work will influence, who it will impact, how much, and how much those users care about this particular thing. If it makes a big difference to some aspect that they never use, maybe not so important. A big part of the process of product management is understanding all of these questions. I’ve spent a while talking about this, it seems like a lot of work. We’re here to talk about tech debt. The thing is, yes, it is expensive, and software is more expensive. This has come up a couple of times in this track too. Engineers are expensive. Engineering efforts are expensive. I don’t want to just focus on us as individuals doing this work. You’ll hear somebody say, this is about one engineer’s worth of salary for a year or two for this project. If it’ll take us three months, we should do it, that sort of thing. That’s not enough. There’s a metric out there published by a number of public companies, it represents revenue per employee. We’re beyond engineering here, it’s not revenue per engineer, revenue per employee. This is what each employee of the company represents in terms of value that they’re building in order to bring the company to a good place. If we do a little quick math, engineering is usually maybe 30%, 35% of the company. Look at those numbers a little bit. We’re talking about a million dollars per engineer, in terms of the revenue they’re expected to generate through their work. For high performing companies, this is even higher, $2 million to $5 million.
Our time is expensive, it is worth a lot. We need to recognize that in order to value the work that the product organization has been putting into how they’re curating all of those sheep and to choose which ones they’re going to let through this gate. They are putting an enormous amount of respect on our teams, in order to choose the most impactful projects for us to build. When we think about tech debt, we think about a thing that we need to do that’s outside of this regular stream of work. We think that our piece of work should jump the queue. “This database upgrade is important. We have to do it.” Do we? Maybe not. What’s the business value? What’s the return? How can we measure this piece of work and this feature in the same language, the same framework as all of this effort that’s been put towards the rest of the work that is coming to us from the rest of the organization? Hang on to that question for a moment.
Recap (Understanding the Priorities)
We talked about how, obviously, we can’t do all the work, and how we’re going to base the prioritization strongly on a business case: both the return we get for doing the work and the cost of the investment we put into doing the work, and then that balance. The balance between those must exist in the context of the business and what’s important for the business at that time. The cost of the work, very good.
Selling the Impact of Your Project
Let’s talk about what we get back. What are we going to get for this project that we need to do? We’ve seen our work queues look like this. We need to upgrade the thing. We need to move the other bit. We need to refactor this. You all know what this is. This is tech debt. We can tell. We look at it, we see, yes. In order to focus it back on the business, I want to think a little bit more about what tech debt is. It’s hard to say what tech debt is, so let’s answer the easier question instead. What do we do here? This is a wonderful graph that the VP of Engineering at Honeycomb put up in a blog post about understanding what goes into engineering work. On the left side, we have all the things that are customer facing, the features we actually build. Also, customer escalations and incident response, bug fixes. On the right side is all of the stuff that’s behind the scenes. Customers only see that part when something goes wrong: toolchains, refactors, training, testing, compliance, upgrades. Let’s split it the other way, up top. We think engineers are responsible for writing code, that’s true. The stuff up top, this is in code. This is adding new features, removing features. It’s also the upgrades. On the bottom, we have all of the attributes of running a system that are in production that are not in code. This is going to be part of your CI/CD system and your deploys, your training, your QA. Also, your incident response, the things you need to do that are part of owning your work in a modern software SaaS system. All of this is part of an engineer’s regular job.
Which one is tech debt? I think of tech debt normally in terms of refactoring, of dependency upgrades, of work that you have to do in order to reform the product to fit some new business shape. Really, every category outlined in red here can be described as technical debt. I loved one of the examples from yesterday about doing a feature flag triage and leaving your feature in there just long enough so that you can reuse your feature flag when you’re putting in the next one. You delay removing some code. It’s not just the stuff in the back. It’s not just the stuff that you think of in terms of refactoring. There’s a problem here, when we’re using a term so consistently, and we all have different definitions, it really muddies the conversation. When I’m talking to somebody, and I say, here’s this piece of tech debt, we need to spend some time on it, and they hear something that’s in a completely different portion of the circle, it does not aid the communication. Perhaps we can just stop using the term. We’re going to need to talk about work in a business context anyways. The goal of this workflow is the same as getting it on the roadmap.
Language is Important
As engineers, one of the things that we do is translate language that we see into understanding how that will affect us and our systems. There’s an observability company out there whose website has a big banner on the front, “Find your most perplexing application issues.” When I look at that, when we as engineers look at that, we’re like, how? I want to fit that into my product. I want to understand how I can rely on it, where I can rely on it. Where it will help me, where it won’t. How do you do this? What is it? We’ll make a translation in our head. Honeycomb is a tracing product. I believe it is a good example of what goes on in our heads. This translation is very useful when we’re building systems. When we’re trying to explain business value, it leads us to talk about, we need to upgrade the database. Not so helpful. We’re here to flip that around a little bit. As we are trying to understand how to prioritize and schedule the work that we see within engineering, we need to understand how to switch from talking about what it is that we need to do, upgrade the database, and start talking about why we need to do it so that we can connect it back to this business case. Sometimes that takes a couple of tries. We talked about needing to upgrade this database. Or maybe not. Do we, really? Let’s ask the team, why do we need to upgrade this database? They tell us, actually, it’s hitting its end of life. It’s an EoL database, we need to get off of it. Do we, really? It’s still serving our queries. In this mythical world, even that isn’t good enough. Let’s say that our hosting provider says, we will not run EoL software, so you have four months, and then we are shutting it off. Now we see, ok, what happens when they shut off our database? The entire product stops working. We need to upgrade that database, because we want the product to continue. If you go to your sales reps and say, would you like our product to stop working next September? They will say, no, please schedule the work you need to do. I would like you to fix that, how can I help? Who should I talk to? I will help you get this work scheduled, because it is clearly connected to ensuring that the business can continue. Hosting providers usually don’t go in there and flip off your database just because you’re running an EoL version of MySQL.
I want to tell another story. This one’s a little different, because I didn’t actually get the business case until after it had happened. We were running an old version of a LaunchDarkly library. LaunchDarkly does feature flagging. It was fine. It worked. There was one team that had some extra time, they upgraded it, they had a thing they wanted to try. There was a shift in the model of this library from thinking about a user as the thing that manages a flag, to thinking about a context. Intentionally more broad. The context pulls in all sorts of other attributes of where this flag is being executed. This team used that to flag on their feature for a single node and get a very nice comparison graph. This was awesome. Within just a short period, many other teams were doing the same thing and getting to see, ok, I can do a percentage rollout, rollback. I can flip it on to these. We had been used to only flipping it on for a user; with the data that is more relevant to our application, it opened up a whole new window. We saw, we can safely canary changes in a way that we couldn’t. We can do A/B testing in a way that we couldn’t. This is amazing. If we had been able to see the business case, that we can make our engineering teams able to deploy their software more safely, more quickly, with more confidence around the change, and better understanding of the impact of the change, we would have been able to prioritize that work. It was lucky that it was not so much effort and that one team had the space to make it happen. This idea of finding the business case in order to do the work, sometimes it takes some effort to understand what is going to happen with this. That can feel a little squishy.
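For illustration, a flag check against a multi-attribute context might look like the sketch below, based on LaunchDarkly's documented context model for its Node server SDK. The flag key, attribute names, and exact call details here are assumptions for the example, not Honeycomb's code.

```typescript
// Sketch: flagging on a context (user + node) instead of just a user.
import * as ld from "@launchdarkly/node-server-sdk";

const client = ld.init(process.env.LD_SDK_KEY ?? "");

async function useNewCodePath(userKey: string, hostname: string): Promise<boolean> {
  await client.waitForInitialization({ timeout: 10 });
  // A multi-context carries both who is acting and where the code runs,
  // so a flag rule can target a single node for a canary comparison.
  const context: ld.LDContext = {
    kind: "multi",
    user: { key: userKey },
    node: { key: hostname }, // hypothetical custom context kind
  };
  const value = await client.variation("new-compaction-path", context, false);
  return value === true;
}
```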
Back It with Data
We are engineering teams here. We have a magical power. We can create data. We can find data. Our systems give us data. If they don’t have the data we need, we can make changes to those systems. We can commit code to get more data. We can use this to back up all of our arguments. When we get one of these conversations like, I think it’s going to do this, and someone else says I think it’s going to do that. We can find out. SLOs, top of the list there. They’re my favorite tool for mapping business value back to a technical metric, because in their formation, they are representing the experience of the user. They’re encoding, what does it mean for a given experience to be successful or not? How many of those experiences are we allowed to have fail? Over what period of time before we hit a threshold where our users and our customers get angry and leave?
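To make that concrete, here is a small TypeScript sketch of the error-budget arithmetic an SLO encodes; the target and traffic numbers are illustrative, not Honeycomb's.

```typescript
// An SLO turns "how many failures are we allowed?" into arithmetic.
// Example: 99.9% of requests succeed over a rolling 30 days (illustrative).
const slo = { target: 0.999, windowDays: 30 };

function errorBudgetReport(totalEvents: number, failedEvents: number) {
  const allowedFailures = totalEvents * (1 - slo.target); // the error budget
  const burned = failedEvents / allowedFailures;          // fraction consumed
  return {
    allowedFailures,
    budgetBurnedPct: 100 * burned,
    // Burning budget faster than the window elapses predicts a breach
    // before the window ends: a signal to prioritize reliability work.
    exhausted: burned >= 1,
  };
}

// 50M requests this window, 30k failures: 30k of 50k allowed = 60% burned.
console.log(errorBudgetReport(50_000_000, 30_000));
```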
I would love to spend the rest of the time talking about SLOs. Liz Fong-Jones did a talk, it’s published on InfoQ, on production excellence with a fantastic segment on SLOs. They’re not alone in our sources of data, we have all of our metrics showing various limits of our services. We get to do chaos experiments. When we have a service, and we’re not quite sure what that limit is, and we need to know more, we can artificially constrain that service in order to push its boundaries and find out what we would otherwise find out in an incident, six months down the road. Of course, there are those incidents. They really are two sides of the same coin. I don’t want to leave out qualitative surveys. Often a piece of technical engineering work that fits into this category of tech debt may feel like its target is not a specific amount of money for the business, but facilitating one’s experience in a certain part of the code base. There’s this idea of haunted graveyards, the section of the code where people fear to tread, because it’s dangerous, it’s complicated. Cleaning those up is valuable, to allow the business to move forward. Some qualitative surveys can really lend credence to this as you ask your engineers, how was it to build this new feature? How was your feature flag experience? With all of this data, it’s easy to fall back into those old habits. We need to upgrade the database because the CPU is too high. Who cares? You’ve got to bring it back into business context, take that language about the CPU being too high and translate it to, what effect will that have on our ability to run our business? We’re back to our list here. We want to get this work prioritized. To do so, we understand what is important to the business here, now. We take the projects we need to do that we see are obvious and understand how to translate that into something that’s based on impact, rather than the technical change that we wish to make. We find the data we need to back it up.
I want to take you back to the Honeycomb experience of a year and a half ago, with this system that has been growing, and it’s starting to creak and fail in other interesting ways. Let’s run this playbook. Let’s see what happens. Start with understand the priorities. Honeycomb is a SaaS business. We are software as a service. ARR rules everything, this is annual recurring revenue. This is how much people are paying us, whether it’s monthly, or yearly. It includes people that decide to leave, and end their contract with Honeycomb. It is a big deal. ARR is this number that we have on our fundraising decks that we talk to the board about, that we make projections for. This is what we are organizing our business around, saying we’re going to get to this amount of revenue and we’re going to do it in this amount of time. I know there are many talks here about reducing costs and expanding revenue. For many SaaS companies earlier in their life, the balance is heavily skewed towards revenue. ARR is the thing, so let’s use it.
We get a dollar number of our ARR target, but that doesn’t translate to upgrading a database. By talking to the sales and product team, we get more detail, because this number is such a big deal, there’s a lot of research and work that goes into building it. It’s not just a number. It’s not just throwing a dart or watching a curve and saying, we’re going to make it there. No, they know we’re going to get this many customers of this type and we’re going to see this many upgrades and we’re going to see this many folks leave. All of this comes together to form that top line number. It’s wonderful that we have descriptions of who we’re going to get; it still doesn’t help, because I don’t know how to work with dollars, but they have more answers. When I ask, ok, you’re getting 10 more enterprise customers, what does that mean? Which of our current customers can I model this after? How are they going to use our product? They came back with answers. They’re like, ok, this group of customers, they’re going to be focused more on ingestion and they’re going to use fewer of these resources. These other ones over here, they’re going to be more of a balance. Those ones there are going to be very heavy on the API. Now we’re getting onto my turf. We’re getting to numbers that I can see, the numbers that I have a grasp of. I can now use this information to connect it back to the business value that’s going to come from each of those things. We need to take this data and map it to our infrastructure. Honeycomb is a macroservice architecture. We have a smallish number of relatively large services, they’re all named after dogs. We’re applying these to our infrastructure here.
How Does This Service Scale?
For each of these teams that have a dog under their care, we ask these teams, please spend some time understanding your service, writing about your service, and reflecting upon how it fits in here. Start with just an overview of the service. Then think about where the bottlenecks are, how the service scales. Then look at the sales targets that we covered that we have these numbers for, and write down which ones of those are applicable to your service. Maybe ingestion matters, maybe it doesn’t, depending on what the service consumes, who it talks to. Some are important, some aren’t. This is where we involved all of engineering. Every team did this. It didn’t take all that long. Our SRE team is already embedded in each of them, and they helped by guiding them through some of these processes to write down and give a unified format to these reports. As an example, our primary API service, this is the front door of Honeycomb’s ingestion. It accepts all of the telemetry from our customers. It does some stuff to it, authentication, authorization, validation, transformation. It sends it along to the downstream service, which is a Kafka cluster, for transmission to our storage engine. It has some dependencies. It depends upon a database for authentication tokens. It depends upon a Redis cluster for some rate limiting and things. The upstream services are our customers. The downstream services, it’s Kafka. The sales target affecting the service is clear. It’s the telemetry that our customers are sending us. When thinking about how it scales, it is a standard horizontally scaling service, stateless. These are the canonical scaling things. It can grow as wide as it wants, almost. Through some past experience, both planned and unplanned, we’ve discovered that as the number of nodes grows, the load on the database grows. When we get to about 100, the database starts to slow down, which means you need more nodes, which makes the database slow down, and it goes boom. That is clearly a scaling limit. We have it written down.
Now comes the fun question. We found the scaling limit. That’s great. We have to fix it, obviously. Maybe? Let’s take these sales targets and say, where are we going to be on this curve of our sales target? When are we going to hit this scaling limit? Is it within the next year? Do we need to fix it now? Is it a problem for future us? Let’s decide. For this one, as it turns out, we expect it to hit that volume by mid-year. The database thing was one that we had to focus on. We got this whole report, it did highlight the database load that we will hit before reaching our sales targets, also called out a couple other ones, and some that we don’t really know about. Those are a cue for running some chaos experiments, or other ways of testing our limits.
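The "when do we hit it?" question reduces to a back-of-the-envelope projection like the TypeScript sketch below; all the numbers are illustrative stand-ins, not Honeycomb's actual figures.

```typescript
// Project when horizontal scaling hits a known ceiling (numbers illustrative).
const nodesToday = 60;             // current fleet size for the API service
const nodeCeiling = 100;           // past incidents show the DB melts near here
const monthlyTrafficGrowth = 0.06; // growth rate implied by the sales targets

function monthsUntilCeiling(): number {
  // Nodes scale roughly linearly with traffic, so compound the growth rate.
  let nodes = nodesToday;
  let months = 0;
  while (nodes < nodeCeiling) {
    nodes *= 1 + monthlyTrafficGrowth;
    months++;
  }
  return months;
}

console.log(`Ceiling reached in ~${monthsUntilCeiling()} months`);
// ~9 months at 6%/month: inside the planning horizon, so schedule the fix.
```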
The Business Case
The business case came down to this one. It was very clear. We could go back to our sales team and say, we need to schedule some work to mitigate the load on this database as the service scales out. If we don’t, we will not hit the sales targets that you gave us. The answer is very easy. Yes, please schedule that work. How do we need to do it? We did this for all of our services and came up with a lovely Asana board showing some things to fix, some things to care about now. Some things that are probably relevant in the year. Some that aren’t clearly like yes or no, but might extend the runway, so we can keep them in our back pocket and understand that if we see something creeping up, that’s a thing we can do to make it work. There’s one tag, it’s called explody. You’ll see the label explody on some of these.
As an organization, there’s this fun side effect to this exercise that we got to talk about a couple of things that we don’t always cover in the regular day-to-day engineering. Explody is one of them. That’s a description of this API service horizontal scaling thing that I was talking about. It grows and you don’t really see much until you get to that threshold and then boom, everything comes crashing down. Those problems are scary and huge. Also, you don’t usually have a lot of time to react between once that cascade starts and when you hit the boom. It helps you understand a little bit about how to schedule that kind of work, the non-explody ones. As an example, if you have a service that runs regularly scheduled jobs, once per minute you run a job. Over time, that job takes longer, until it takes like 65 seconds. When it’s overlapping, the next one doesn’t start. That’s not explody. Some jobs won’t run, most will. It will degrade, but it will degrade slowly and it will degrade in a way that you can go back to those SLOs, and use that to help you figure out when to schedule it. Equally important problem, slightly different method of scheduling.
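The same arithmetic works for the non-explody case, as in this sketch with made-up numbers for the job duration and its growth:

```typescript
// When does a once-per-minute job start overrunning its interval?
const intervalSec = 60;
const durationTodaySec = 45;        // illustrative current job duration
const monthlyDurationGrowth = 0.04; // duration grows with data volume

let months = 0;
let duration = durationTodaySec;
while (duration < intervalSec) {
  duration *= 1 + monthlyDurationGrowth;
  months++;
}
// Unlike the "explody" cascade, this degrades gradually: some runs get
// skipped, SLOs burn slowly, and the fix can be scheduled, not firefought.
console.log(`Job overruns its interval in ~${months} months`);
```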
The last bit here, I don’t know how many of you have had a conversation with somebody when designing a system. We look at this diagram and this architecture and say, “No, that can’t scale. No, we can’t do it this way, that won’t scale.” It’s a struggle, because, yes, maybe it won’t, but do we care? Will the system work for us for now? There’s the classic adage of don’t design things the way Google does, because Google has 100 million times your traffic and you will never encounter those problems; if you follow that design, you’re doing yourself a disservice. These scaling problems, which ones to ignore, this mapping of sales targets to scaling limits is a very strong way to have that conversation and make it very clear that from the business perspective, we can wait for that one.
Close the Loop
We got ourselves a strong business case, using the priorities of the business, time to fix some of these problems, and the choice to put some of the other ones off. We are now done. We have gotten this work scheduled, almost. There’s one more step here, which isn’t so much about getting the work scheduled in the first place but is about making our jobs as an engineering team, especially on a platform team, a little bit easier the next time around. We need to celebrate. We need to show the success of this work that we’re doing. When we find one of these projects, and we finish it, we need to make it very clear that we put engineering effort towards a business problem in a way that has given us the capacity to grow the business in the way that we need to. Then, when we have finished it and it’s gone on, and we pass that threshold that was previously impossible, we can call it out again and say, remember this? Yes, we couldn’t have gotten here without it. This stuff is really important. It lends credence to our discussions in the future about what’s going to be happening when we have more technical work to schedule.
The last one, this was a fun experience from a near miss incident. There were some capacity issues, and we were struggling to grow the capacity of the service quickly enough. During the course of this near miss, one engineer bemoaned, it takes 3 whole minutes to spin up new capacity for this service. This engineer had been at Honeycomb for a couple of years, but that was still recent enough that they didn’t remember the time before we were in a Kubernetes cluster, when new capacity took 15 minutes to spin up. We put up a big project here. We’re going to change the way we’re doing our capacity and the way we run our services. We’re going to set a target of 10 minutes for new capacity, and new services to come up. No, let’s be aggressive, let’s set it to 5. Here we were recognizing it takes 3 whole minutes? This is great. This is amazing material because it gives us an opportunity to reflect on where we’ve come. For a lot of the platform work, for a lot of the tech debt work, that work is almost invisible. We need to find ways to bring it to light so that we can see as an organization the effect that this work is having on our longevity, on our speed, on our ability to deliver high quality software.
Summary
This is the recipe for getting technical debt work onto the roadmap. Understand what your business needs. Understand the impact of the projects you need to do to reduce that technical debt. Own the success of that business. Frame it in a way that communicates that priority, not just the technical work. Use our power of data to back it up. And remember, once you’re done and that effect has been realized and the business is exceeding those limits it previously saw, to talk about the effect the work had.
Questions and Answers
Participant 1: One thing you didn’t bring up was establishing a shared language, a shared vocabulary, or a framework for presenting the business benefits of things. What about scenarios where you have potential security issues, a scenario where I need to upgrade a library because there is a potential vulnerability? At that point you’re just projecting what the potential damage could be. How do you solve that?
Hartshorne: How do we express security concerns in the language of business impact? I will admit, our security team is very good at that. I am not so good at it. There are a couple of ways. For larger businesses, there are frameworks that they’re required to operate under for security. There are agreements that one makes with both sub-processors and with customers around the response time to certain types of issues. You can use that to build an argument. If we are contracted to respond to a security incident within 30 days of being notified, we have to. You were asking about upgrading libraries because of potential flaws. That is definitely fuzzier. I was talking about how the projects that you propose are bets. How much are you willing to bet that this is going to have that impact? Is it a cup of coffee? Is it a house? Framing security in terms of bets, it carries a little bit of a different feel to it. Yet, that is what’s going on. None of our services are 100% secure. We are doing risk analysis to understand which risks we are willing as a business to undertake and which we’re not. Do we have the recourse of the legal system if this particular risk manifests? Our business, and I think most businesses, have large documents describing their risk analysis for the current state of the world. The risks from inside, the risks from outside, the risks from third-party actors and so on. I think all of those are tools one can use to better assess the risk of a particular change. Those all come straight back again to business value.
Participant 2: I’ve often found that one of the hardest types of tech debt problems to get moving, to get buy-in on, is the kind of problem where the benefit you’re going to get from solving it is freeing up your engineers to improve velocity. You’re not solving tasks that really should be automated, because the work to automate it is going to take a couple weeks. We don’t do it, and in exchange we do the task manually once every couple weeks, and it takes us a few days each time. It’s an invisible cost from outside of the engineering organization, because you know that it’d free up a whole lot more engineering bandwidth to work on features and other things as a part of [inaudible 00:45:51]. I’m wondering if you have any thoughts on how to frame that as a business objective that would actually [inaudible 00:46:00].
Hartshorne: How do we schedule work that is toil that our engineers are doing successfully, because the job is getting done? There are a couple of ways. The first I want to highlight is the product philosophy of incremental improvements, shipping the smallest thing. There’s a wonderful diagram of how to build a car. One path starts with a set of wheels, and then a frame, and then some shell, and then the engine. At the end of it, you have a car, which, great, you can’t do anything until you get all the way to the end. The other says, maybe start with a skateboard, graduate to a bicycle, maybe a motorcycle, and finally, you’ll make your way to the car. I reference this, because a lot of those types of toil tasks can have small improvements that incrementally transform the task. As an example, you might start with a person that needs to roll the cluster by hand, and they go through and they kick each server in turn, and on they go. Simply writing that down in clear language, forming a checklist for it makes the next version faster. You can take two of those checkboxes and transform them into a script and it makes it faster. Each one of those is significantly easier to find the time to do than the entirety of the toil. That’s the first one.
The second is to bring up again the idea of qualitative surveys. We ask our on-calls, every time they start a shift, and every time they finish a shift, a number of questions. It’s a light survey. Did you feel prepared for your shift? Do you think the level of alerting we have is too little, too much, or just about right? What came up during this shift? Did you feel stressed about any particular bit? What was the hardest part of this shift? Collecting those on a quarterly basis gives us some really good and unexpected insight into some of the types of toil that are coming up. Maybe it’s a flappy alert. Maybe it’s some service that needs to get kicked and needs some improvements in its operational controls. Those are data. We can use those to say, our on-calls are spending 40% of their time chasing down false positives. This is wasted effort. That brings it back to expensive engineering time, and is, I think, a pretty good road into a compelling argument to spend some time improving the situation.
MMS • Susan McIntosh, Jutta Eckstein, Craig Smith, Ben Linders, Rafiq Gemmail
Transcript
Shane Hastie: Hey folks, it’s Shane Hastie here. Before we start today’s podcast, I wanted to tell you about QCon London 2024, our flagship international software development conference that takes place in the heart of London, April 8 to 10. Learn about senior practitioner experiences and explore their points of view on emerging trends and best practices across topics like software architecture, generative AI, platform engineering, observability and secure software supply chains. Discover what your peers have learned, explore the techniques they are using and learn about the pitfalls to avoid. Learn more at qconlondon.com. We hope to see you there.
Introductions [01:10]
Good day, folks. This is Shane Hastie for the InfoQ Engineering Culture Podcast. Today I have the absolute privilege of sitting down in a global conversation with my friends and colleagues from the InfoQ Engineering Culture team and a very, very special guest. So we’ll start, I’m just going to go around and say welcome and ask each person here to introduce themselves. So Susan, please, let’s start with you.
Susan McIntosh: Hello from Denver, Colorado. I am Susan McIntosh. I am a scrum master and coach with a background in web and software development. And ages ago I was a teacher. So sometimes that pedanticism comes out.
Shane Hastie: Welcome, and a regular contributor to the culture and methods staff.
Susan McIntosh: Yes, that’s right. And I forgot that.
Shane Hastie: And Jutta, you are our special guest today. So thank you so much for joining us. Would you tell us a little bit about yourself?
Jutta Eckstein: Yes. First of all, thank you for inviting me. So my name is Jutta Eckstein and I am based in Germany and I’m in the agile field maybe forever, I don’t know, started in ’97 with XP or ’96, I don’t know. And I wrote a couple of books actually. And those books are also telling my journey from looking at how to scale agile and how to distribute agile and what to do on the organizational view. So help bring agile to the organizational level. And my current passion is really about what is Agile’s responsibility in well, understanding that the planet is on fire and that we also have to take our share and look for more sustainable products. So sustainability is my topic.
Shane Hastie: Welcome. Thank you. Craig.
Craig Smith: Hi Shane. I’m Craig coming to you from Brisbane, Australia on the land of the Quandamooka people and I am an agile coach and a delivery lead and been around the agile community almost as long as Shane.
Shane Hastie: And Ben Linders.
Ben Linders: Thank you, Shane. Ben Linders from the Netherlands. These days mostly doing workshops on Agile, occasionally some advice for organizations, also doing a lot of writing, picking up on my second book on retrospectives, making some more time available to dive into aspects like psychological safety and retrospective facilitation. I think those are still very important topics if you want to get real benefits out of your retrospectives. So that’s what I’m mostly doing these days. And writing for InfoQ, of course.
Shane Hastie: Writing for InfoQ like a machine, Ben. I think you’re still one of our most prolific authors. And last but certainly not least, Raf, welcome.
Raf Gemmail: Hey Shane. So I’m Raf Gemmail and I live out in the wilderness of New Zealand, the far north, near a little town called Whangarei. I have been on a journey with computers since I was a little lad, and I was an early developer at booking.com, and around that sort of time I started looking at XP books. I still have a little XP book, which I carry with me almost like a good luck charm, but it was a world where, once I went from there into banking, we were doing waterfall and the transformation was still happening. And so I rode that transformation through banking. I’ve worked at scale for most of my career. So I was at BBC iPlayer where I came across Katherine Kirk who’s spoken at some of our events and has written for InfoQ at some stage. And she changed my life pretty much, making me realize that an engineer has other skills beyond just building software.
And she introduced me to Kanban and she continues to be an inspiration. But that took me on a slightly different path. As I went through my career, I got more involved in tech coaching and some agile coaching, DevOps coaching. I’m very much on the side of let’s ship, let’s build, let’s learn, let’s educate, let’s build the right culture. And I still bounce from code through to delivery and making sure we’re building the right thing and working with stakeholders. So I’m a manager right now with a MarTech company working at massive scale, getting some really, really awesome engineers who want to be in the code to also think about the customer as well, and it’s a fun ride. And I’ve written for InfoQ since 2012.
Shane Hastie: And I’m Shane Hastie, I herd this group of cats, officially they call me the lead editor for culture and methods here at InfoQ and heavily focused on humanistic workplaces and the people elements of getting work done well, particularly in socio-technical systems. And for InfoQ’s case, obviously software teams are my areas of passion. So because I’m leading the conversation, let me go into my passion. What’s happening with software teams in the last year and what do you see happening going forward?
What’s happening with software teams [06:19]
Susan McIntosh: One of the things that I noticed, I don’t know, maybe it was because I was paying attention to the staff engineer track at QCon last fall. One of the things I noticed though is that we’re relying on the staff engineers to basically be a coordinator for teams to communicate with other teams to make sure that all of those little dependencies are handled and managed and tracked. And it doesn’t sound like that’s something that staff engineers really want to do. So I was very curious to see why they were left with that task. And maybe, I guess that’s more just a question and an observation than anything.
Raf Gemmail: An observation in my space is I’m looking very much at how to grow engineers, and I think it’s been something for a while. I love Patrick Hill’s work around what it means to progress in the field, and I have engineers who are growing on that path, and I’m presenting them with the option of a technical route forward towards becoming a principal or a staff engineer or someone who has a broader impact. And I don’t like the idea of individuals becoming proxies, but I do wonder what, there’s a journey usually as you step into those roles, you go through the journey of I will do all the things because I know engineering really well, and I’ve had to personally step away from that quite a bit to, hey, I’m an enabler and as an enabler I’m connecting people. I’m not the proxy. I am able to make teams more performant, make them take a macro view and improve the processes they work with.
And so yes, I don’t know about that specific trend, but I do know that there is a move towards people becoming proficient in those roles and being successful using things like the application of observability to how we work. And that’s things like metrics around DevEx, metrics around friction. I personally measure waste in my teams, I’ve done that for years. Let’s see where the friction is, then we can do kaizen and make things better. If I have a spreadsheet, I can make a case now that we need to invest in local sandboxes or something because it has a cost. But being able to use those metrics really enables teams to put a mirror in front of themselves and say, actually, these are the things we need to improve, and for a staff engineer or a principal to say, actually, strategically these are areas we need to improve.
Now I think that's slightly divergent from what you're saying, but the good principal, the good staff engineer, in my humble opinion, will then use that data to enable others and create an environment where people are more successful and can focus on their own value lines.
Jutta Eckstein: So if you're asking about what happens to the teams, and maybe people think what I'm saying now is boring, but I think the biggest change that we have seen over the last years, and also coming, is working from home and what it does to collaboration. A colleague and friend of mine, Dasha, who is a researcher, called what we are seeing the flexibility and individualization paradox: on the one hand, I guess we all love to be so flexible and work from home, we don't have to commute, and it just provides so many opportunities.
But on the other hand it just means we are all by ourselves. That means the ad hoc conversations are not happening anymore. And from what I heard, and I'm not a researcher so take it as such, but what I heard, which is also my impression, is that as a result the team cohesion is not so strong anymore, and neither is the loyalty to the companies people are working for.
And what I even think is from my perspective more important is that there is less innovation happening because all the side conversations are missed and often it’s the side conversations that trigger a thought like oh yeah, well we have solved it like this here and maybe you can try that as well. Or, oh I see there’s this problem, maybe we should finally start doing something about that. And the way it is right now, we don’t know this anymore.
Raf Gemmail: I think that kind of points at the need to be more intentional in how we do remote, to make up for that. My personal experience is all I can refer to here, because I've seen the counterexamples as well where it's not as successful. But if you are intentional in things like making sure people communicate, having formal and social and non-social catchups, intentionally bringing together people across the business, it can work. For instance, last quarter I took the whole concept of big room planning, and I'm not a big quarterly planning fan, but sometimes you work within the persona of your organization, and in this situation brought together people from product and brought in engineers as well. So they were getting the insight, and there were fruitful conversations.
There'd been some visits from senior staff, people getting together, intentional remote coffees, remote socials. You can be really creative with remote socials, and there are many things you can do to try and fill that gap. The side conversations around innovation, the water cooler ones, may not happen as easily, but if you're having random coffees, like I do, informal conversations with colleagues in other departments, people I don't usually catch up with, and if you put that intentionality in, some of those ideas will trickle up. You can try and make up for that lost innovation, in my opinion.
Ben Linders: I think a lot of organizations still have big challenges with teams, and one of the reasons might be that they're too focused on the team as the sole solution for the problem. The key aspect is that you want to have collaboration between people, and they can be in the same team, but sometimes you don't have all the people in that team and you need collaboration with people in other teams or other parts of the organization. So I think the focus should be much more on how you can establish the means and a culture where this kind of collaboration can happen and flourish, instead of trying to build a strong team. Actually, some of the strongest teams I've seen were so inward focused that they had trouble fitting into the organization; they had difficulties communicating with the other parts of the organization precisely because they were so strong and so good as a team, and they failed due to that.
So I think the focus should be much more on enabling collaboration in different ways and different formats. Yes, people being more remote makes this more challenging, and I agree with you: making explicit what you are trying to communicate, how you're trying to communicate it, and how you want to work together is one of the key aspects to make this work. You need that because you don't see each other on a daily basis, you don't stumble into each other. So you have to be more explicit about what you want to do and what you expect from other people to make it work.
Susan McIntosh: It’s a really good point that now with the working from home option, which is becoming more standard, I’m looking at all of our backgrounds, we’re all working from home today. It just adds another dynamic that we have to pay attention to when we’re working in teams. Not only what is the team trying to deliver, but how are they working and what’s the environment that they’re working in, in order to make sure that they’re all as effective as they can be.
Raf Gemmail: I have a dedicated workspace, and I think this is where I wish organizations would put their money where their mouth is. There's a return sometimes: as you get people to work remotely, you may not need the physical space. I don't know if that saving is always reallocated, but I was looking at a state-of-remote-work survey from last year, and it said that 30% of people had dedicated workspaces, which means, I think, something like 60% of people were couch-surfing with their computers and finding dining tables. And that is the thing that I think is the challenge, which worries me a lot.
You make allowances for people because there's noise around them, there's stuff happening, and I don't know what the specific solution is, other than investing in giving people access to remote working spaces and the tools they need, and making sure that people are as effective as they would be in an office space. There's also an HR and physical responsibility around ergonomics and office equipment. I've seen examples in both directions, but there are definitely examples out there of people who are not working in the most effective environments.
Jutta Eckstein: I know, at least in Europe, that there are some companies who invested in the home workplaces of their employees, and some of them just gave people a budget to set things up the way they need, which I think is a really great approach. But it also has to do with trust, of course, which is maybe a completely different topic.
However, I wanted to go back to what Ben was saying, and there were two things for me. The one thing, and I can't agree more with it: the teams that are really stable are often the ones who say, this is how we do things here, there is nothing new, and they're not open anymore because they're such a closed group. So I agree that we probably also need to look more at collaboration, maybe even across teams and so on.
And building on this, which is another thing that I heard between the lines of what Ben was saying, I think this is really a big trend: what we are seeing more and more is that it's not the supplier, meaning the company building a product, who defines how the user uses the product or what the customer journey is. It is more and more the user defining how to match various products together and build their own customer journey.
And it might be completely different what the one person does compared to the other, which also means in order to be more flexible with that, we cannot stick to the idea of building this one individual product, but we always have to be open to understand that the product is actually just a piece of a puzzle for our clients.
Shane Hastie: I’d like to dig into a topic that has been on our trends report for the last few years and has been highlighted in QCon events and others. Developer experience, what is DevEx today and where is it going?
The importance and value of Developer Experience [17:37]
Ben Linders: Well, what it is, I think you can get thousands of definitions of what it is, and one trend that I'm seeing is that more and more organizations say, whatever it is, it is important and it does matter. So they're spending more time on this, both for their own productivity and for retention of their employees. They also see that if you take care of developer experience, people are going to stay with your company instead of leaving. So it is getting more important. What it is also varies from developer to developer, so there's no single definition. And one other thing that I'm noticing is that if you really want to make this work, you have to do a kind of shift right here: instead of providing solutions and getting people to accept them, provide an environment. That means getting your developers involved early on in what kind of tool you want to build in your organization, up to maybe even rotating developers into platform teams taking care of the tools that they're using, or giving them space to develop their own tools.
So organizations are very much focused on, we made some stuff and now we have to find a way to get our developers to accept it. I think they are on the wrong track. It’s not about acceptance, it’s about making stuff where everybody says, yeah, I’ve been waiting for this for weeks already. I’m happy it’s here now, of course I want to use it.
Raf Gemmail: I think this ties to friction removal as well. When you shift right, what you're starting to do is observe what the impact is. It's that whole build, measure, learn loop, right? And so the earlier point about measuring and metrics ties in very much with DevEx. There are methodologies out there, all the DORA metrics, all the things that give us insight into how we're doing, but it's about responding to that and putting the loop in. So I talked about capturing friction. The things I capture are things which slowed my team down, or multiple teams down; common things like an environment that will never work. One person in a standup in one team is going, oh, I lost about two hours yesterday, I hate that thing; another person in another team's standup says the same, and it accrues.
It's a huge amount of time that the org is losing. Many developers are frustrated, their days have been made worse. And so by addressing that and measuring it, you can start improving the pain points which are causing developer friction. That may be tooling, it may be education, it may be environments, it may be processes. But to your point about shifting right, I really agree with that: observe and then make the improvements. And for me it's about removing waste, which as a manager I can make a direct case for, because it has an impact on our ability to deliver features for the customer. Back to the old point: tech debt is another form of waste too. We can start putting numbers on it if we can see where we're slowed down.
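To make the waste-measurement idea concrete, here is a back-of-the-envelope sketch in Python; the friction entries, hours, and hourly rate are invented for illustration, not figures from Raf's teams:

# Toy friction log: (issue, hours lost per occurrence, people affected per sprint).
friction_log = [
    ("flaky staging environment", 2.0, 6),
    ("slow CI pipeline", 0.5, 12),
    ("manual test-data setup", 1.5, 4),
]

HOURLY_RATE = 100  # assumed loaded cost of an engineer-hour

total = 0.0
for issue, hours, people in friction_log:
    cost = hours * people * HOURLY_RATE
    total += cost
    print(f"{issue}: {hours * people:.1f} engineer-hours, ~${cost:,.0f} per sprint")

print(f"total waste: ~${total:,.0f} per sprint")

Even a rough total like this turns "the environment is annoying" into a number a manager can weigh against the cost of fixing it.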
Shane Hastie: Teams and teamwork. There’s been huge amounts of research that talks to the great value of diverse teams and yet we still don’t see the level of diversity in our industry that is in society. Why not?
Diversity is still a challenge and there are concrete things we can do to create more inclusive environments [20:46]
Jutta Eckstein: Well, I once said, kind of like Conway's Law: an organization produces a design that reflects the organization's biases. On the other hand, it's also known that we try to have friends around us who are kind of like us, which I believe is a natural human thing to do. And in an organization that leads to less diversity, of course, and often, at least I would think and hope, not intentionally; it's just how we are. And so the key thing around that for me is to continuously pay attention to it. And this is tough, it is, and you're never done. That's how it feels to me.
And I'm speaking maybe from a bit different position than some other people who might listen to this, but I'm trying to bring more diversity on stage, at conferences. At times I thought, okay, if I focus on that for a few years, then this will happen more automatically, because then I have this whole network, and people will come back and invite different voices and so on. But it's not happening; every time I'm starting from scratch. And I believe it's the same in organizations. It really needs continuous attention.
Susan McIntosh: Yes, continuous attention, and not just in the hiring funnel. Once you hire somebody, there's so much evidence that having just one person of a given ethnicity or background means that one person isn't going to feel part of the group. If there's just one of them, you need more. My husband works at a university, and I've seen this at universities dealing with first-generation students. Those students who have never experienced what it's like to be at a university, as an example, don't know that office hours are when they are allowed to go to the professor's office; one of them thought that was when you should not go. So how do you make those expectations, the things that we all assume are understood, explicit? How do you make sure those assumptions are made explicit when hiring more diverse staff, so that they're welcome and included?
Ben Linders: I think included is exactly the word that I was thinking about. It's not that much about diversity, it's much more about inclusion: making sure that when you are aiming for diversity, people feel included in the team, that they feel safe enough to join in, to speak from their own values and their own ideas within the team, certainly when those are different from the majority of the team, because that's when it's going to be challenging. So you want to have an inclusive environment for diversity to work, otherwise it's just going to fail.
Jutta Eckstein: Which ties back to the developer experience, by the way, I think, because then people will feel more safe and therefore also try something and learn and be open. Yeah.
Ben Linders: Yes, and feel more included.
Raf Gemmail: One thing to watch out for, from personal experience, is that as we are trying to learn and enable people (I talked about measuring waste), across an org you can sometimes predict the people who are going to be more proactive in capturing these things or who make the loudest noise, so there's always going to be a bias in your data. You need to counter that by making sure you get input from the non-obvious voices. So when you're trying to optimize, even for DevEx, to make things better by bringing in new perspectives, then back to the point earlier: you need to be intentional in how you bring in that diverse set of views.
Ben Linders: This is actually where working from home can be an advantage, by the way. If you have somebody who's more introverted, if they feel safe at home, they can take the time and space there to think about how to do stuff, write it down, and contribute that way to the team. They can feel much more included than they would in a meeting in the organization where, as an introvert, other people are making a lot of noise and pushing forward their ideas. So working from home can actually create a more inclusive environment in that way.
Jutta Eckstein: I would think it's creating a more inclusive environment in terms of, well, if you have health issues, or a family situation that makes it hard to be away from home for long hours and so on. For the other things you mentioned, I'm not so sure; maybe it just comes down to facilitation techniques, because if you're in a room you can also ensure that people do individual work, or work in pairs, and then have the whole plenary, and in this way invite various voices.
Overall, I think we all need to understand that more diversity in a team is not only perhaps attractive, trendy, anything like that, but it also makes the product better, because we all see better what different kinds of users might be out there, and therefore we create products for a bigger audience. I think this is something we all need to understand, and therefore it also pays off. And of course there are various studies, which I don't have handy, that say more diversity also means more productivity; maybe yes, maybe no. Definitely it opens up perspectives for the product.
Raf Gemmail: I've shared this in a previous one of these, years ago, but I was previously working with a startup, which was sadly a casualty of the recent economic climate. That startup was an educational institution, and in it we tried to bring in people from diverse backgrounds, people who were not in technology because they came from an underprivileged background. We countered gender bias and ethnicity bias by being very, very analytical about how we brought people in. We were looking for people who had certain ways of thinking, who displayed the attitude to be effective as software engineers, and there was some machine learning behind it as well to identify that. These people have gone off and been successful. They're people who completely changed industries, from admin or retail work or bar work to working in software.
There's a person out there who I recently saw became a scrum master; she'd been a BA for a little bit, she was an effective communicator, she was great at bringing people together, and she's in the industry now. So I guess where I'm going with this is that in the recruitment phase, if we are being meritocratic, if we keep asking how we challenge those biases, we can get a more diverse set of people in. I'm trying to do that at the moment, or at least a colleague of mine is, in terms of looking at recruitment and how we can be more objective about it: create feedback loops, calibrate on how we assess someone who's coming in, so that it is based on their ability. And there are a lot of people with great ability and good learning mindsets who you can bring into an org and who will be successful.
And then there's the other side of it, which you talked to, which is creating the culture where, once they're in, they can continue to grow and learn and develop. That goes back to the principals, and to others who can support them on that journey: making sure that they're continuing to learn, managers taking them on that path, creating educational opportunities and pairing opportunities, so that once they're in, there's support the whole way. I have my own neurodiversity issues; another part of inclusion here is inclusion of people with different neurodiversities, which can really bring new insights into a team and challenge the way we build software and solutions for the customer.
Jutta Eckstein: What you pointed out reminded me of something else as well. It's not only what's happening in the team, but also how we look, for example, at outsourcing. There are two different kinds, at least; there may be more, but I know about these two. On the one hand there is impact outsourcing, which is about training and hiring marginalized people; this is kind of what I heard in what you said, Raf, about bringing them into the team, but you can also think of doing it in an outsourcing constellation. And the other one is ethical outsourcing, where you "just" (in quotes) ensure that your own values are also adhered to in the outsourcing company, so your own social, and maybe also environmental, ethical standards hold true for the outsourcing company.
Shane Hastie: Craig, you’re being quiet. I saw you try to say something.
Adopting modern leadership approaches is going slowly [29:56]
Craig Smith: So I think the thread from listening to all of those great topics, for me, is that there is an element of how we modernize, I'm going to say, facilitation and leadership, because it's a little bit of both, and we can't put the blame squarely somewhere else. When we talk about things like working from home and the whole remote thing, we need better ways to allow people to bring their best to the workplace. The same goes for the inclusion topic, and the same goes for the DevEx topic: unfortunately, in many organizations we're still running things like an industrial complex. In this calendar year, and as we record this we're only a couple of months in, I've been astounded by the number of managers who have said to me, I have to bring everyone back to the office so I know what they're doing.
And it's because we haven't educated our leaders and facilitators on how to manage that. But I think we also have a challenge as a community about how we better support that. And I think it also comes down to things like tooling. Unfortunately, tools like Microsoft Teams and Zoom, as good as they are, are not enough on their own to have remote collaborative teams working effectively. But investing in technology was also mentioned. And Shane, you and I know, with the organization we work with, the number of people who physically can't turn on a camera because it doesn't work, or it has been disabled at their end, or their bandwidth is throttled by their organization; that is so hard. So we've got a lot of work to do to enable this, but just saying, let's go back to the old paradigm, is an extremely backward way of thinking.
And as we start to think about all the challenges we have as a world, how do we help leaders, but also how do we help those people who are facilitating, who are self-starting and self-organizing? I think there's a real challenge there for us, and I don't think we have all the answers in the toolkit right now.
The opportunities and challenges with LLMs [31:59]
Shane Hastie: There are a few big topics that I want to make sure we get to. One leads very much on from this conversation about remote teams and in-person teams, and also weaves into it: the application of large language models and AI. I've heard and seen and made the statement myself about leveraging tools like Copilot, ChatGPT, Perplexity, the large language models that are out there now, and there are so many more of them (last I heard there were 63,000 new AI companies in the last 12 months, which is phenomenal, and 80-plus percent of them I'm sure will disappear at some point). I've said, and I've seen it said, that using these tools will make good programmers great and bad programmers redundant; the same for scrum masters, the same for business analysts. The thing that I see and that worries me is: how do we bring new people in with the base skills if we're giving them these tools? Or am I being naive?
Susan McIntosh: There’s a danger, right? AI and large language models and generative AI, they can create so much content, but how good is it really? And Ben, you’re laughing, but it’s true, right?
Ben Linders: Yes, it’s true.
Susan McIntosh: It’s so easy to see the articles that have been written solely off a generative AI program. And I admit I use it, I try it because it’s so much easier to start with a draft than it is to start with a blank page when you’re writing, but it’s not the best of editors. And so it’s a good way to start instead of starting from scratch, but then we need to learn a new skill of editing and fine-tuning the information that we get out of AI and generative AI. It’s a challenge.
Jutta Eckstein: Definitely it is. My first thought, especially the way you were stating it in a chain, was: isn't it the same as what we have seen for many years, which is that a fool with a tool is still a fool? I would think that's still true here. And yes, I see the risk that, well, you can get up to speed, but then to really use it and leverage it, you probably also have to cross the chasm. And the question is whether you see that you have to do that, or whether you just lean back because it seems so easy and you're making all that progress, and therefore you're not even seeing the chasm you have to cross. So I think it is the same as we have seen with other tools, and yet again it's very different, because it's so much more powerful and it's just everywhere.
Ben Linders: And being much more powerful makes it even more dangerous. I would say that with AI, 90 or 95% of us should consider ourselves to be the fool, not expert enough to work with these tools. I use them myself in some situations, and I was just comparing what ChatGPT was telling me as a solution for a certain problem with what I could find with Google, and I got more or less the same stuff, with one important difference: with Google I could see the source, I could see where the information was coming from. And for one topic, actually, 95% of the information was coming from commercial parties who were trying to sell something. ChatGPT wasn't telling me that; with Google, I could see it.
And that made a huge difference in judging, okay, how reliable is this information? Is this something that's proven, or is this sales talk? And it was sales talk, but ChatGPT didn't present it like that. So I think we have to be even more careful with using AI tools right now, because stuff may look very nice and may sound reasonable, but you have to be really well-educated and really well versed in the topic to judge whether the information you're getting is valid or not, and what is the stuff you can and cannot use.
Raf Gemmail: I think that's going to evolve though, Ben. There's a thing called RAG, this retrieval-augmented generation thing, where you can have an LLM, get extra data in, and go off and query stuff. I've started playing with Bing's chat copilot thing, which is GPT-4 based, and that thing will show you some citations as well. Now, Bing gives garbage ones as well; I've gone in and found a citation is not exactly what I think it is. I had one which redirected me to the police when I was asking something about cyberbullying, and it didn't actually show me the source; the source and the link were different. But this is an evolution. We're in the early stages of a new product, and so I think it is going to improve. A friend of mine is in a startup with some of the people from Hugging Face, and he's at the University of Auckland down the road. New Zealand is a great place.
He was working on specializing LLMs for your local domain and your local context using open source technology. Their thing is: let's go to your data, pull stuff out, and then you can specialize the model to give you better answers. Because we're in the early stages, I think we're going to get a lot of garbage out of it, and it's buyer beware, or user beware: we need to self-regulate, review what's coming out of it, and also review what we put in. There are lots of people using OpenAI's model and Bing's model, and these come with clauses saying, hey, we can use your data for learning. So look at the fine print, regardless of which one you are using. And then there are open source models; I have a machine here. People are paying fortunes for these massive models.
I have an AI rig here at home which has a consumer-grade GPU in it, and I can do chat-type things, get some information out of it, image generation. So my hope is that people will invest a lot more in some of the open technologies, in building models which are able to look at their local domains a lot better. Because we're in the early stage, it's like a few big players have given us some fancy toys and we're like, wow. But I do think the sophistication will improve. It ties in with something from years ago. I remember, I think it was, who's the hackers and painters guy? Paul Graham, I think it was him. There was a piece I remember seeing long ago, when IDEs were kicking off and I was an Emacs user at the time, and it was like, hey, IDEs make developers stupid, or something like that.
There was some sort of catchphrase and as you went through it, it was like, hey, you’re going to rely on this IDE. And we’ve come to a point now where the features of our IDEs, like templates and things, they don’t take away from our need to think. They give us tools to do autocompletion and we use that to build the right solution to understand the customer’s need and figure out how our architecture plays with the piece of software we’re building, at least from a software perspective. I do think it looks like magic at the moment, but I think it’ll become another tool in our tool belt, at least for a while.
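For readers unfamiliar with the retrieval-augmented generation pattern Raf describes, here is a minimal sketch in Python. The toy embedding and the final model call are stand-ins; a real system would use an actual embedding model and send the assembled prompt to an LLM API:

import math

def embed(text: str) -> list:
    # Toy embedding: real systems call an embedding model here.
    vec = [0.0] * 64
    for i, ch in enumerate(text.lower()):
        vec[i % 64] += ord(ch) / 1000.0
    return vec

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "Our refund policy allows returns within 30 days.",
    "Deploys go through the merge queue every hour.",
    "On-call rotations change every Monday at 09:00.",
]
index = [(doc, embed(doc)) for doc in documents]

def build_prompt(question: str, top_k: int = 1) -> str:
    # Retrieve the most similar documents and stuff them into the prompt.
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQ: {question}"

# A real system would now send this prompt to an LLM instead of printing it.
print(build_prompt("When do on-call rotations change?"))

The point of the pattern is that the model answers from retrieved local data rather than only its training set, which is what makes domain specialization and citations possible.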
Sustainability and climate impact of technology [39:11]
Jutta Eckstein: Well, with my passion for sustainability, I cannot talk about AI without bringing up that topic as well. I'm sorry, I just have to. We still too often only look at, oh, what are all the great options, or what are the risks? But one thing we all have to know is that it really comes with a higher cost to the planet. I know the estimates vary, but one says that training a large language model has about the same energy demand as my country, Germany. And again, I have not done this research myself, so it might be wrong, but definitely the energy needed is extremely high, and just thinking this is all there for free: it's not.
And the other thing, which for me is also connected to sustainability, is probably not news either: the models reflect the data that is there, and the data is biased. So speaking of social sustainability and inclusion and all of that, also what we talked about earlier, diversity, you name it: same problem. And maybe both of these problems are even getting worse as it's used more, and I guess we really need to solve that.
Shane Hastie: Let's dig into sustainability and climate impact. Jutta, the work that you're doing there was the key reason for inviting you on. Do you want to tell us more about that work and what the implications are for our industry?
Jutta Eckstein: So I do various kinds of stuff, but one thing I'm doing is an initiative I created with the Agile Alliance, the Agile Sustainability Initiative. It started in September 2022, and we try to increase awareness of the topic, and hopefully thereby also people's sense of responsibility, so that we actually can do something about it. It's not a topic for other people; we have our part here too. The way I look at it, if we are looking at agile and sustainability in particular: on the one hand, what we can do is use agile to increase sustainability, which is then called sustainability by agile, meaning we use all our practices, values, maybe mindset, principles, you name it, and help others to get more sustainable. And there are a lot of people doing that, in various NGOs mostly.
One example I kind of like is HackYourFuture, an NGO working with refugees, helping them on the one hand to practice their developer skills and also to show those skills, so it's easier for them to get a job, and I know some people who've helped them, as a scrum master for example, to do exactly that. So this is using whatever skills you have, and they are really varied. Another example is what I did together with Steve Holyer: we worked for an NGO in the climate sector and we just used what we know, open space, event storming, story mapping, whatever we felt was needed for them to make their next step. So that's the one part, using all that stuff that we know. The other part, which I sometimes refer to as our homework, is sustainability in agile, which is looking at what we are actually doing when we develop; well, at least my clients mostly develop software.
Carbon footprint as a quality attribute of software [43:30]
So when we develop software, can we do this better? One basic thing, which is basic but maybe not so widely done, is to monitor the carbon footprint as a quality attribute in the Definition of Done, so that it's really embedded in the work we are doing. Maybe one more note: for me it's important also to understand that sustainability really looks at several different things. One is the environment, where we hear about green computing, green software and the like: we look at the carbon footprint of the system we are using, and there's a lot we can do. Another part is a thing we have talked about already, the social, the people aspect. If I translate it directly into software, it has to do with diversity, inclusion, safety, security, and, me being German, privacy, which is really close to our heart. I guess it's in our DNA.
And the third one is economic, or prosperity, where we think more about whether the whole product is responsible, or we talk about sustainable economies. The reason why I think it's so important to look from these three perspectives is that it's so easy to focus on one at the cost of the others. Speaking of Germany: we can do really great on one dimension at the cost of the others, and this is actually what we are seeing right now. Very often I see these waste monitors, and Germany looks really great. Why? Because we ship our waste to other countries, and then they have the problem and we are fine. It's unfortunate, but this is what's happening. I think I'll leave it there; I could go on, of course.
Shane Hastie: And we'll make sure that there are links in the show notes to that work that you've been doing with the Agile Alliance. Anyone want to pick up on that? "Yes, and" to this.
Raf Gemmail: I love this, just throwing it out there, the idea of the carbon footprint in your DOD just made me feel like that’s awesome.
Jutta Eckstein: You know what the interesting thing is? I discover so much while working on this: a lot of the things, if you speak about the environmental aspect, actually go back to really good principles and coding habits that people who are maybe my age learned at one point, because we didn't have the memory, we didn't have the bandwidth and all of that, and maybe we forgot them. If we rediscover all these patterns and principles, that would really help the software to be more environmentally friendly. To give a real example: if we think about jobs that can run asynchronously, what we can do now is look at when the energy mix in our area, or wherever the job is running, is renewable, and then have that job run asynchronously at that time. In the past we did something similar.
We just looked at, well, actually it's similar, because we looked at whether the servers were really under heavy load and therefore we couldn't run that job, so we postponed it to run during the night. Well, now maybe the night is not so good; it depends. If you are relying more on solar power, then maybe the night is not good. It depends on what kind of energy is used. But again, I find it really interesting to understand that the good old principles and patterns should be revived, because they help us.
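A minimal sketch of the carbon-aware deferral Jutta describes, in Python; the get_grid_intensity function is hypothetical, standing in for a real regional electricity-data API:

import random
import time

def get_grid_intensity() -> float:
    # Hypothetical lookup of current grid carbon intensity (gCO2/kWh).
    # A real implementation would query a regional electricity-data API.
    return random.uniform(100, 600)

def run_when_green(job, threshold=250.0, poll_seconds=1, max_polls=10):
    # Defer an asynchronous job until the energy mix is clean enough.
    for _ in range(max_polls):
        intensity = get_grid_intensity()
        if intensity <= threshold:
            print(f"intensity {intensity:.0f} gCO2/kWh, running job")
            job()
            return
        print(f"intensity {intensity:.0f} gCO2/kWh, deferring")
        time.sleep(poll_seconds)
    job()  # policy choice: do not defer forever

run_when_green(lambda: print("nightly batch job done"))

It is the same shape as the old "run it overnight when the servers are idle" trick, with grid carbon intensity substituted for server load as the signal.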
Raf Gemmail: I wonder how you measure things like the carbon footprint there as well because I love this because it ties in also with, we talked about cost savings earlier and having observability of your fleet and does an EC2 instance need to be up at all, right? At a certain point in time. Can we spin things down, use spot fleets, can we use containers or serverless or be more clever with that? And to your point, just going back to engineering, like we’re optimizing our software to be efficient in a way that has an impact, but how do you compute the carbon footprint?
Jutta Eckstein: Very good question. I don't have a good answer, and the reason is that the tools are changing all the time. But there are tools available, and I would point people to the Green Software Foundation, which has developed, and makes available, a lot of open source software that you can use to actually find out about that. Maybe one other thing, because you were talking about the cloud: what you can also do now, for example if you use Google as your cloud, is use the region picker, which allows you to look not only at latency or cost but also at the carbon footprint, and make this one of your criteria for where you go in the cloud. Again, what I'm trying to do with my work is to increase awareness of what's possible. And on the other hand, I also have to admit it's really hard to keep up, because there is so much going on, which is excellent, which is great. It's really a completely new field where stuff is happening.
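For reference on the measurement question, the Green Software Foundation publishes a Software Carbon Intensity (SCI) specification that, as we understand it, scores software roughly as:

SCI = ((E × I) + M) per R

where E is the energy consumed by the software, I is the location-based marginal carbon intensity of that energy, M is the embodied carbon of the hardware involved, and R is a functional unit such as per user or per API call; the precise definitions and tooling are in the specification itself.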
The passion for making a difference [49:21]
Craig Smith: And I think this is a bit of a golden age. I liked how you talked about having to relearn some of the original things. Like many of us on this call, we've been around this since the start of agility, and for example a lot of the issues that we're having in the community are because people have forgotten why we did things the way we did them; it's been overloaded by lots of frameworks and processes and practices and things like that. Sometimes you have to remind people and go, well, the reason we do this is that you are new to this and you missed the 16 steps we went through to get here. I feel the sustainability thing is now the same: many organizations are talking about the fact that they want to be carbon-neutral, green, net zero, all those types of things.
But we haven't been very good at bringing that back to the everyday person, the engineer who's sitting there cutting code in a bank or something like that. And my hope through your work, Jutta, for people listening here, is that it's a bit like agility: that didn't start with a manager or a paradigm in an organization saying we need to be better, it started with the practitioners themselves going, we need to do better. And this to me feels like that rebirth of innovation that I felt when agility started, and that's why I'm so excited about the work that you're doing. It's going to take a while, but we had to start somewhere. And as you say, it's awareness, but it's not going to happen just because your manager comes down and says so; people find it hard to link their work to their organization's targets. It's up to us to start to have those conversations, both as individuals and as teams.
Ben Linders: One thing I’ve seen is that there’s a lot of people out there who are by nature environmentally aware, this is something that’s in their DNA, this is something that’s been important for them for many years already and they’re now trying to combine this with the technical work that they’re doing. So these are people, software engineers, maybe people working as architects and they’re looking for ways to do it in their work. And if you see somebody in your organization who’s having this awareness and is looking for ways to do it, give them space and try to find a way to leverage this and to help them to look for ways to do your software development better because they are driven from the inside, actually.
Jutta Eckstein: They have the passion, right?
Ben Linders: Exactly. They have to have the passion. They have some ideas, they don’t have the solutions, but they’re open to solutions that are out there. They’re looking for ways to apply that in their daily work and do that with their teams. So if you have somebody like that, please give them space, support them, make clear that you do find this important in your organization and support them in any way you can to be there and to do this work.
Jutta Eckstein: I like how you both, Craig and Ben, mentioned that it probably needs to be more bottom-up, kind of like how agile came into place, and I think that way as well. But I see in companies, at least in Europe, that it's often also starting top-down; however, it stalls very soon, because what's done top-down leaves out our software development. They think about, well, let's change our fleet and have e-cars or no cars or bikes or anything like that, but then they're done. Whereas if you embed it in your daily work, then on the one hand you have more awareness throughout the organization, so everyone sees it as their own topic and not somebody else's, and you can also make a difference in the long run, not only pick the low-hanging fruit. So yes, bottom-up, I agree.
Awareness and greenwashing [53:16]
Raf Gemmail: This ties in with something I heard earlier this week, which is slightly tangential, but there's possibly a lesson in it for sustainability. There was a talk by Laura Bell around DevSecOps, and she was talking about how, as with DevOps and many of these other initiatives, there was a really good idea with a very humanistic outcome, one that involved collaboration. Then a term comes around, it gets taken into the org, and it falls into the anti-pattern it was trying to counter. You end up with the DevSecOps team, you end up with the DevOps team, and you do not end up with collaboration in the team. Agile, same thing: we have frameworks like Scrum that came in, we had values and principles, and now we are about ceremonies and the next sprint, and the meaning is lost.
And in this space of environmentalism, we see a lot of greenwashing across products. People are going, “Hey, we’ll do the minimal thing. Look, we’ve got a picture of us on bicycles.” And I don’t know what the solution is there, but often well-meaning initiatives which start from the ground up can sometimes be taken out of context intentionally or unintentionally and then fall into some antipattern which doesn’t address the thing that you were trying to address in the first place. So I do wonder how we can try and avoid something like that for green technology because our world, we don’t have long, we’ve already gone over our emission targets. So it’s like it is imperative. And I agree with you, I don’t think we worry about this as much as we should or give it as much credence as we should.
Jutta Eckstein: I have a maybe very specific opinion about greenwashing, which is that, well, maybe I’m just an optimist. I think perhaps it is good because it says that the companies are really thinking this is an important topic, therefore they have to fake it. And my hope, and again I’m an optimist, is that this is their first baby step before they do the next one. So the fake it till you make it kind of approach. And yes, probably I’m wrong, but I’m just hopeful.
Ben Linders: Some make it, some will stay fake.
Susan McIntosh: Yes.
Jutta Eckstein: Yes, that’s true.
Susan McIntosh: But if organizations emphasize their desire to go green and they talk about it across the organization, with everybody in the company, then it's possible that the software engineer might say, oh, well, maybe I should pay attention to how much energy I'm taking to write this 16-line piece of code.
Ben Linders: That's again giving space.
Susan McIntosh: Yes.
Raf Gemmail: This is why I drew the analogy to security: security is something that we know carries high risk when it goes wrong, and when our planet is past the point of no return, the stakes are very, very high. Yet security is something which is often skipped. Here are the acceptance criteria; what about security criteria? The non-functionals are not always top of mind, sometimes because of lack of knowledge. It's the same with sustainability. The org may value it, but we then need to make the case that this story or this feature will take X times longer because we're being environmentally aware.
Now, I don't necessarily think that's always the case. I think we can get to a place where it's simply the way we work. We can start using low-energy servers, building for Arm architectures. There are all sorts of things we can start doing to get there. We need to value it, and we need to value it more than we value shipping the feature, which is where maybe I have a little cynicism about the way we work in general, but I think if we keep championing it, we can make a difference there.
Jutta Eckstein: Absolutely. And I think security is a great example, because if I think about security 10 years ago, not much was happening; people were talking about it but not really paying attention. But over the years quite a lot has happened, and now it seems, at least in my world, that people are considering it, it's part of development, and it's not a thing that's postponed to Neverland. And so this is what makes me hopeful that the same might happen with sustainability: the more we bring it to our awareness, and of course the more we learn about what we can actually do, the more likely it is we will actually do something. And talking about the doing, I feel like I want to share at least one more concrete example of what people can do, which is based on the fact that very often it is actually software that creates the e-waste, the electronic waste.
So if you think about your phone: whenever you replace it with a new one, it's probably because the OS is not updated anymore or the apps are not running smoothly anymore. There might be some exceptions, I don't know, you have smashed it too often or something, but most often it's really the software that is the reason why people buy a new phone. And the same is true for the products we are creating in our companies for our clients: very often we expect the clients to have the latest hardware. If we just start there, coming up with a responsive design, having what I call sustainability toggles, which are kind of feature toggles where we detect what client device is using the product and then offer these features and deactivate others because they are not supported on that client, that would already save a great deal.
So that’s also probably an example of going back to the old principles, patterns and so on. But it just felt like, I want to give at least one more concrete example so people can have a takeaway here. This is something I can do. Yes, you can.
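A minimal sketch of the sustainability-toggle idea in Python; the capability table and client names are invented for illustration, and a real system would use proper client detection:

# Invented capability table; a real system would use proper client detection.
CLIENT_CAPABILITIES = {
    "old-phone": {"video_preview": False, "animations": False},
    "modern-phone": {"video_preview": True, "animations": True},
}

DEFAULT = {"video_preview": False, "animations": False}  # degrade gracefully

def enabled_features(client: str) -> dict:
    # Serve a reduced feature set to older clients instead of forcing
    # a hardware upgrade (and the e-waste that comes with it).
    return CLIENT_CAPABILITIES.get(client, DEFAULT)

features = enabled_features("old-phone")
if not features["video_preview"]:
    print("serving static thumbnails instead of autoplaying video")

The design choice is the same as any feature toggle; the difference is that the toggle key is the client's capability, so older devices stay usable longer.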
Shane Hastie: Concrete advice for engineers working on real-world products. Thank you so much. Folks, our time is almost up. As always, it's a wonderful opportunity to get together around the screen with this team. We don't often get together in person, but virtually we get a lot done together. We'll make sure that all of the links and references from this conversation are in the show notes, and of course all of our contact information, our InfoQ profiles, and Jutta, your contact information will be there as well. I'd like to end with one point from each of you. What is your hope, wish, or desire for software engineering in 2024?
Hopes and wishes for software engineering in 2024 [60:03]
Jutta Eckstein: I think we should not only always look at the right solution to whatever problem we have at hand, but instead look for better questions and let those guide us.
Susan McIntosh: I like that, Jutta. I keep thinking of the multitude of things that we’re trying to juggle these days, work from home and artificial intelligence and sustainability and all of these different mindsets that we have to think about. Perhaps with all of these different ideas floating around in our heads, we’ll come up with a very simple solution that will solve everything, that would be a dream. But maybe with so many very smart, talented engineers and other people in our industry thinking about it, we can come up with something.
Craig Smith: I think we've spoken a lot about the tools and the things, and my hope is that we can help people find better ways of working to support these tools, whether it be sustainability or AI or other things. We have to keep moving forward, and we've seen this for a long time: people still resist. We're still trying to apply sometimes not just 20-year-old solutions but 120-year-old solutions to the problems. So we have to move both at the same pace, and I don't think that some of our ways of working are keeping up. I hope that's a challenge for some of the people listening, and that we can have those conversations and talk about them here on InfoQ as they start to emerge.
Ben Linders: My wish is that we take time to stop occasionally, to go slower, to reflect, to listen to each other, and use that in the end to go better and to go faster. But it is not something that we all do naturally; things are going faster and faster and everybody's going with the flow, with the current, unless somebody occasionally interrupts, whether that's a reflection or a retrospective, or somebody from lean saying, okay, let's stop the line because something is going wrong here. Whatever term or mechanism we want to call it, it doesn't really matter, but take the time sometimes to just stop a moment, think, and then continue.
Raf Gemmail: For 2024, I’d like us to continue building on the feedback loops we have everywhere. I think we’ve talked about frameworks and things, which can become dogmatic, but I think we’ve got to a stage where we can be really evidence driven. We can build learnings, take those learnings and fit them into how we work, our processes, our tools, our ability to deliver customer value. We were already doing that. We’re measuring so much more than we were, and this has gone early majority and beyond, the whole build, measure, learn thing, experimenting with customers, communicating with them hopefully is something we’re doing a lot more, but let’s apply that everywhere, right? We’ve got the DevOps life cycle. Same thing, right? We’re learning, we’re optimizing. I think there’s a lot more data we can collect and we can hold a mirror to ourselves and look at it, and that’s something that’s become part of my journey a lot more is look at the data, try and understand what’s there, talk to the people, understand the context of the data, but at the same time, we have a lot more ability to make decisions that are informed.
I think we should use that. On another note, there's a lot going on in the world at the moment. I was a child who grew up during the Cold War, and my fears were ones I thought my kids would never have, but now there's a lot of stuff going on. You've got younger developers in the industry, people working across the industry, who may be dealing with new things, new threats, new notions. Maybe I've reverted back to an old model, but I think there's a younger generation out there who are like, whoa, what's all of this stuff going on? Between the environment, wars, and the instability on the planet, there are all sorts of new threats and new fears for people. Let's be compassionate about that.
Shane Hastie: For me, it’s all about the people. We build software that changes people’s lives. Let’s make sure we do it in a way that is kind to the people and kind to the planet.
Thank you all so much. It’s been fun.
MMS • Claudio Masolo
Article originally posted on InfoQ. Visit InfoQ
HashiCorp recently released version 2.3 of Terraform Cloud Operator for Kubernetes with a new feature: the ability to initiate workspace runs declaratively. The Terraform Cloud Operator for Kubernetes was introduced in November 2023 to provide a Kubernetes-native experience while leveraging Terraform workflows.
The Terraform Cloud Operator lets users manage Terraform Cloud resources with Kubernetes Custom Resource Definitions (CRDs), and provision infrastructure internal or external to the Kubernetes cluster directly from the Kubernetes control plane.
The Key Benefits of Terraform Cloud Operator are:
- Flexible resource management: Version 2 introduces multiple custom resources with dedicated controllers for various Terraform Cloud resources, enhancing flexibility and performance for managing custom resources concurrently in large-scale deployments.
- Namespace management: Tailor the operator's watch scope to specific namespaces using the --namespace option, enabling fine-grained resource management tailored to your organizational needs.
- Configurable synchronization: Adjust the synchronization frequency between custom resources and Terraform Cloud with the --sync-period option, ensuring timely updates and operational smoothness (both options are illustrated in the sketch after this list).
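As a rough illustration of where these options land, here is a hypothetical args stanza for the operator's Deployment; the flag names come from the article, while the manifest shape, image tag, and values are assumptions rather than official Helm chart output:

containers:
  - name: terraform-cloud-operator
    image: hashicorp/terraform-cloud-operator:2.3.0  # illustrative tag
    args:
      - --namespace=team-a   # watch only this namespace (value is an example)
      - --sync-period=5m     # re-sync custom resources every five minutes (format assumed)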
In previous iterations of the Terraform Cloud Operator v2, initiating a run was limited to patching the restartedAt timestamp within the Module resource. This method proved non-intuitive, lacked universality across workspace types, and offered no control over run types, hampering migration efforts. However, version 2.3 addresses these challenges by enabling users to declaratively start plan, apply, and refresh runs on workspaces, thereby empowering developers with enhanced self-service capabilities across all Operator-managed workspaces, including VCS-driven ones.
Version 2.3 introduces three new annotations within the Workspace custom resource to facilitate workspace run initiation:

- workspace.app.terraform.io/run-new: set to "true" to trigger a new run.
- workspace.app.terraform.io/run-type: controls the run type (plan, apply, or refresh).
- workspace.app.terraform.io/run-terraform-version: specifies the Terraform version for speculative plan runs.
This is an example of a Workspace resource:

apiVersion: app.terraform.io/v1alpha2
kind: Workspace
metadata:
  name: this
spec:
  organization: kubernetes-operator
  token:
    secretKeyRef:
      name: tfc-operator
      key: token
  name: kubernetes-operator
To immediately initiate a new apply run for the above workspace resource using kubectl:

kubectl annotate workspace this \
  workspace.app.terraform.io/run-new="true" \
  workspace.app.terraform.io/run-type=apply --overwrite
After successful execution, the annotation is reflected in the Workspace resource for observability:
apiVersion: app.terraform.io/v1alpha2
kind: Workspace
metadata:
  annotations:
    workspace.app.terraform.io/run-new: "true"
    workspace.app.terraform.io/run-type: apply
  name: this
spec:
  organization: kubernetes-operator
  token:
    secretKeyRef:
      name: tfc-operator
      key: token
  name: kubernetes-operator
After the run completes, the operator automatically resets the run-new value to false, as the sketch below illustrates.
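To observe that reset, one option is to read the annotation back with standard kubectl; a small sketch (the JSONPath escaping is the standard way to address dotted annotation keys):

kubectl get workspace this \
  -o jsonpath="{.metadata.annotations['workspace\.app\.terraform\.io/run-new']}"

While a run is queued this should print true; once the operator has processed the run, it should print false.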
With Terraform Cloud Operator v2.3, initiating and managing workspace runs becomes more intuitive, empowering teams to efficiently manage infrastructure while embracing Kubernetes-native experiences.
MongoDB (NASDAQ:MDB) Price Target Increased to $500.00 by Analysts at Tigress Financial
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
MongoDB (NASDAQ:MDB – Get Free Report) had its price objective hoisted by analysts at Tigress Financial from $495.00 to $500.00 in a research note issued on Thursday, Benzinga reports. The firm currently has a “buy” rating on the stock. Tigress Financial’s price target suggests a potential upside of 38.45% from the stock’s current price.
Several other equities research analysts also recently issued reports on MDB. UBS Group restated a “neutral” rating and set a $410.00 price objective (down from $475.00) on shares of MongoDB in a research report on Thursday, January 4th. KeyCorp boosted their price objective on MongoDB from $500.00 to $543.00 and gave the company an “overweight” rating in a report on Wednesday, February 14th. Mizuho raised their target price on MongoDB from $330.00 to $420.00 and gave the company a “neutral” rating in a report on Wednesday, December 6th. Royal Bank of Canada upped their price objective on shares of MongoDB from $445.00 to $475.00 and gave the stock an “outperform” rating in a research report on Wednesday, December 6th. Finally, Stifel Nicolaus reiterated a “buy” rating and set a $435.00 price target on shares of MongoDB in a report on Thursday, March 14th. Two equities research analysts have rated the stock with a sell rating, three have given a hold rating and nineteen have assigned a buy rating to the company. Based on data from MarketBeat.com, the stock currently has a consensus rating of “Moderate Buy” and a consensus target price of $449.85.
View Our Latest Stock Analysis on MDB
MongoDB Stock Up 0.6%
Shares of NASDAQ:MDB traded up $2.33 during midday trading on Thursday, hitting $361.13. The stock had a trading volume of 321,265 shares, compared to its average volume of 1,411,965. The company has a market cap of $26.07 billion, a P/E ratio of -145.82 and a beta of 1.24. The company has a quick ratio of 4.40, a current ratio of 4.40 and a debt-to-equity ratio of 1.07. The business’s 50-day simple moving average is $416.10 and its 200-day simple moving average is $390.35. MongoDB has a 12 month low of $198.72 and a 12 month high of $509.62.
Insider Transactions at MongoDB
In other MongoDB news, CFO Michael Lawrence Gordon sold 10,000 shares of the stock in a transaction on Thursday, February 8th. The stock was sold at an average price of $469.84, for a total value of $4,698,400.00. Following the completion of the transaction, the chief financial officer now owns 70,985 shares in the company, valued at $33,351,592.40. The transaction was disclosed in a filing with the SEC, which is available through the SEC website. Also, CEO Dev Ittycheria sold 33,000 shares of MongoDB stock in a transaction on Thursday, February 1st. The shares were sold at an average price of $405.77, for a total value of $13,390,410.00. Following the transaction, the chief executive officer now owns 198,166 shares of the company’s stock, valued at approximately $80,409,817.82. Over the last ninety days, insiders sold 54,607 shares of company stock worth $23,116,062. Company insiders own 4.80% of the company’s stock.
Institutional Inflows and Outflows
Hedge funds and other institutional investors have recently made changes to their positions in the company. Jennison Associates LLC boosted its stake in MongoDB by 87.8% during the 3rd quarter. Jennison Associates LLC now owns 3,733,964 shares of the company’s stock valued at $1,291,429,000 after purchasing an additional 1,745,231 shares during the last quarter. 1832 Asset Management L.P. boosted its position in shares of MongoDB by 3,283,771.0% during the fourth quarter. 1832 Asset Management L.P. now owns 1,018,000 shares of the company’s stock valued at $200,383,000 after buying an additional 1,017,969 shares during the last quarter. Norges Bank bought a new stake in MongoDB in the 4th quarter worth about $326,237,000. Axiom Investors LLC DE bought a new position in MongoDB during the 4th quarter valued at about $153,990,000. Finally, T. Rowe Price Investment Management Inc. boosted its holdings in shares of MongoDB by 77.4% during the 4th quarter. T. Rowe Price Investment Management Inc. now owns 568,803 shares of the company’s stock valued at $111,964,000 after acquiring an additional 248,133 shares during the last quarter. Institutional investors own 89.29% of the company’s stock.
About MongoDB
MongoDB, Inc provides a general purpose database platform worldwide. The company offers MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premise, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.
Article originally posted on mongodb google news. Visit mongodb google news
MMS • Ben Linders
Article originally posted on InfoQ. Visit InfoQ
According to Marijn Huizendveld, discipline is key to preventing the accumulation of technical debt. To be disciplined, you should make the debt difficult to ignore. Heuristics like fixing small issues immediately, agreeing on a timebox for improvements, and making messy things look messy can help tame technical debt.
Marijn Huizendveld spoke about taming technical debt at OOP 2023 Digital.
To prevent the accumulation of technical debt and foster discipline, Huizendveld suggested using a highly visible area to show the debt inventory:
In the past, I’ve made enemies in the short term, both within and outside of the engineering community of organisations, by featuring the technical debt board prominently in the hallway. It was positioned in such a way that you could not get around it, literally. That is confrontational, because it shows where the organisation is failing, but all you do is make a problem apparent that is already there.
It takes some time for the various roles that feel responsible for the underlying causes to adjust, Huizendveld said. In the long term, high visibility results in frequent conversations about the tech debt inventory and what the software organisation is doing about it, he added.
Good technical debt is intentional, enables benefits for the organisation, and is controlled. Teams can use a disciplined approach for managing and repaying technical debt, Huizendveld said.
Not having any technical debt means liquidity, and liquidity means optionality, Huizendveld mentioned. People have a bias towards preventing losses, called loss aversion in bias literature. You can leverage this bias to provoke change.
Huizendveld provided some heuristics that have helped him tame technical debt:
- If you can fix it within five minutes, then you should.
- Try to address technical debt by improving your domain model. If that is too involved, you could resort to a technical hack. If even that is too involved, try to at least automate the solution. And when automation is also too difficult, make a checklist for the next time.
- Agree on a timebox for the improvement that you introduce with the team. How much time are you willing to invest in a small improvement? That defines your timebox. Now it is up to you and the team to honour that timebox, and if you exceed it, make a checklist and move on.
- Don’t fix it yourself, if it can be fixed by machines.
- If it is messy because you have a lot of debt, then make it look messy. Please don’t make a tidy list of your technical debt. The visual should inspire change.
- Only people with skin in the game are allowed to pay off debt, in order to prevent solutions that don’t work in practice.
Huizendveld suggested leaving alone areas that are stable. You don’t need to improve something that does not get changed, he argued.
InfoQ interviewed Marijn Huizendveld about dealing with technical debt.
InfoQ: How much technical debt do we typically see in software development?
Marijn Huizendveld: There was research done by Stripe Inc. in September 2018, described in the research paper The Developer Coefficient, which stated that “given an average workweek of 41.1 hours, developers spend circa 13.5 hours on technical debt, and including bad code it’s 17.3 hours.”
If you consider that the remaining 23.8 hours also have to cover all the meetings and ceremonies that relate to planning the work to be done, you could infer that there is really not a lot of time left for actual development. Even if the numbers reported by developers on the impact of technical debt are exaggerated by 50%, it’s still a sizable amount.
InfoQ: How can we measure success in dealing with technical debt?
Huizendveld: There are a bunch of indicators:
- Team happiness should improve: track how people feel with a simple smiley rating, and use the answers to figure out if you’re on the right track. If people are frustrated because of the toil due to technical debt, then you know what to do. If people are happy because of the reduction of this toil, then you know doing more of it will bring even greater happiness.
- Unplanned work should go down: there is a reason why a shortcut is called a shortcut: it is not a thorough solution. In some cases, unplanned work is the consequence of shortcuts having a meeting with reality. This hinders the throughput of the team, because planned work has to be halted to deal with the issues occurring in production.
- Mean time to bug resolution should drop: there is a large category of technical debt that could be classified as technical debt in work processes. It’s the debt that hinders making changes to the system in the quickest way possible. For example, manual, error-prone steps to get the software into the hands of the users. When this category of technical debt is reduced, fixes are in production orders of magnitude quicker. On top of that, there is the fact that a lack of technical debt in the business logic makes it easier to address bugs, because the system behaves more predictably.
- And finally, your time to market should shorten: in the end, it is the cumulative effect of all of these small improvements that makes the big improvement a reality. Faster time to market is the compound effect of happy teams that can focus on planned work without distractions, in a system that works for them, not against them.
MMS • Daniel Dominguez
Article originally posted on InfoQ. Visit InfoQ
Elon Musk announced that xAI would make its AI chatbot Grok open source, and now the release is accessible on GitHub and Hugging Face. This move enables researchers and developers to expand upon the model, influencing how xAI evolves Grok in the face of competition from tech giants like OpenAI, Meta, Google, Microsoft, and others. This milestone marks a significant turn in the field of AI, allowing other developers and experts in the field to access Grok’s code and related data for analysis and development.
The release of Grok as open source is a bold step that opens up new opportunities in AI research and development. Previously, industry-leading models like Mistral AI’s Mixtral and Meta’s Llama 2 dominated the open AI research landscape. However, Grok stands out for its sheer size: at 314 billion parameters, it is more than four times larger than Llama 2’s 70 billion.
This massive size suggests promising possibilities in terms of model accuracy and interaction capability. Grok’s weights, which are essential for its operation, are available for download, enabling developers to experiment with its structure and behavior.
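For those who want to experiment, the weights can be fetched from Hugging Face. A minimal download sketch, assuming the xai-org/grok-1 repository id and the standard huggingface_hub CLI; note that the checkpoint is on the order of 300 GB, so provision disk space accordingly:

pip install huggingface_hub
huggingface-cli download xai-org/grok-1 --local-dir grok-1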
@Gradio shared an X post covering the essentials of xAI’s Grok-1 release:
Now that Grok-1 is open-sourced, it’s time we learn more about the model. All things essential about xAI’s Grok-1 release:
- 314B params
- 8x33B MoE
- 25% of weights active on a token
- Base model
- (Little) better than Llama 2 & GPT-3.5
- Apache 2
- Built in JAX & Rust
- 8-bit weights

The 25% figure follows from the mixture-of-experts design: only two of the eight experts are routed to for any given token, so roughly a quarter of the total weights participate in each forward pass.
Elon Musk’s decision to take an open-source approach with Grok responds to the growing demand for transparency and collaboration in the field of AI. By sharing the code and data, Musk not only fosters innovation but also promotes accountability and public evaluation of the model.
Seeking an alternative to OpenAI and Google, Musk launched xAI with the aim of developing what he described as an AI focused on maximizing truth-seeking capabilities.
Open-source software, like Grok, offers a range of benefits for both developers and the community at large. Firstly, it allows for greater transparency and auditing, contributing to the trust and reliability of the software. Additionally, it fosters collaboration and knowledge sharing among developers worldwide, which can accelerate the pace of innovation.