Cloudflare Workers Introduces connect() API to Create TCP Sockets

Renato Losio

Article originally posted on InfoQ.

During the recent Developer Week, Cloudflare announced a Workers API to create outbound TCP sockets. The new socket API allows developers to connect directly from a Worker to TCP-based infrastructure, including databases.

Available as a Runtime API, the connect() function returns a TCP socket that allows developers to read and write data as long as the connection remains open. Workers could already interact with HTTP endpoints and other Cloudflare services, but the vast majority of databases require clients to connect by opening a direct TCP socket. Brendan Irvine-Broque, product manager at Cloudflare, and Matt Silverlock, director of product at Cloudflare, explain:

With Workers, we aim to support standard APIs that are supported across browsers and non-browser environments wherever possible, (…) but for TCP sockets, we faced a challenge — there was no clear shared standard across runtimes. We’ve tried to incorporate the best elements of existing APIs and proposals, and intend to contribute back to future standards.

Last autumn, Cloudflare, together with Vercel and Shopify, started WinterCG, a new community group focused on the interoperable implementation of standardized web APIs in non-browser, JavaScript-based development environments.

The new API is accessed by importing the connect() function from cloudflare:sockets. One common use case is creating a connection to a database, for example:

import { Client } from "pg";

export interface Env {
  DB: string;
}

export default {
  async fetch(
    request: Request,
    env: Env,
    ctx: ExecutionContext
  ): Promise<Response> {
    const client = new Client(env.DB);
    await client.connect();
    const result = await client.query({
      text: "SELECT * from customers",
    });
    console.log(JSON.stringify(result.rows));
    const resp = Response.json(result.rows);
    // Close the database connection, but don't block returning the response
    ctx.waitUntil(client.end());
    return resp;
  },
};

Source: https://blog.cloudflare.com/workers-tcp-socket-api-connect-databases/
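
For comparison, here is a minimal sketch of using connect() directly, based on the API described above; the server address, port, and payload are hypothetical placeholders:

import { connect } from "cloudflare:sockets";

export default {
  async fetch(request: Request): Promise<Response> {
    // Open an outbound TCP socket to a (hypothetical) plain-text service
    const socket = connect("tcp-service.example.com:4242");

    // Write a request over the socket's writable stream
    const writer = socket.writable.getWriter();
    await writer.write(new TextEncoder().encode("PING\r\n"));
    writer.releaseLock();

    // Stream whatever the server sends back as the HTTP response body
    return new Response(socket.readable, {
      headers: { "Content-Type": "text/plain" },
    });
  },
};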

While pg, the JavaScript database driver for PostgreSQL, is already supported, the MySQL drivers mysql and mysql2 are not yet. Irvine-Broque and Silverlock warn:

A new connection is created for every request. This is one of the biggest current challenges of connecting to databases from serverless functions, across all platforms (…) we’re already working on simpler approaches to connection pooling for the most popular databases.

The content delivery network expects to add more features in the future, including support for inbound TCP and UDP connections, as requested by some developers, as well as application protocols based on QUIC.

The connect() API was not the only new feature announced during the Developer Week 2023: Cloudflare introduced Secrets Store, a solution for managing application secrets securely, improvements to D1, Cloudflare’s serverless database, and consumer concurrency for the messaging service Queues. Furthermore, Cloudflare announced database integrations for Neon, PlanetScale, and Supabase on Workers. Karl Horky, founder at UpLeveled, tweets:

No proxy like Neon or other serverless/edge providers, you just connect normally over TCP. This sounds great, potentially way bigger than the other recent edge database announcements.

Each open TCP socket counts toward the maximum number of connections that can be simultaneously open in Workers. Furthermore, TCP connections cannot be created on port 25, so Workers cannot send email directly to SMTP mail servers.



AI, ML & Data News Roundup: Generative Fill, Copilot, Aria, and Brain Chips

Daniel Dominguez

Article originally posted on InfoQ.

The most recent update, covering the week starting May 22nd, 2023, encompasses the latest progress and announcements in the fields of data science, machine learning, and artificial intelligence. This week, the focus is on Adobe, Microsoft, Opera, and the University of Lausanne.

Adobe Has Introduced a New Feature Called Generative Fill in Photoshop

A new generative AI tool for Adobe Photoshop will enable users to swiftly extend photos and add or delete objects using text prompts. Generative Fill is now available in beta, but Adobe claims that a full version of it for Photoshop will come later this year.

Generative Fill utilizes its analysis of the surrounding elements and textures in an image to intelligently generate new pixels and seamlessly integrate them into the existing composition.

This produces visually pleasing and lifelike results, eliminating the need for manual reconstruction and saving users valuable time and effort. An impressive application of Generative Fill is its capability to remove unwanted elements from images. Users can conveniently select distracting objects in a photo and leverage Generative Fill to create visually coherent replacements.

This feature is especially beneficial to photographers, designers, and content creators as it streamlines the editing process while maintaining the overall integrity of the composition.

Microsoft Is Bringing the Power of AI to Windows 11

At Microsoft’s Build 2023 conference, several exciting announcements were made.

Windows Copilot was introduced, making Windows 11 the first PC platform to provide centralized AI assistance. This AI-powered feature aims to help users easily accomplish tasks and increase productivity.

Additionally, Bing Chat plugins are being extended to Windows, allowing developers to integrate their applications within Windows Copilot. This integration opens up new possibilities for better customer service and enhanced engagement with native Windows apps.

Another significant development is the introduction of a Hybrid AI loop, which supports AI development across platforms. This innovation enables seamless integration from Azure to client devices, with added silicon support from industry leaders such as AMD, Intel, Nvidia, and Qualcomm.

To further support developers, Microsoft is launching Dev Home, a resource designed to enhance their productivity on Windows.

Lastly, Microsoft is bringing new AI features and immersive experiences to the Microsoft Store on Windows, offering users even more value and convenience.

Opera Reveals Aria, the Integrated AI Browser Feature

Opera introduced Aria, its new browser AI, offering users access to a leading generative AI service at no cost. According to Opera, Aria is seamlessly integrated into the browser, revolutionizing the browsing experience.

Built on Opera’s Composer infrastructure, Aria leverages OpenAI’s GPT technology and incorporates additional functionalities, including real-time web results. Aria acts as both a web expert and a browser companion, allowing collaboration with AI while searching for information, generating text or code, or obtaining answers to product-related queries. Opera’s Composer infrastructure is designed for seamless expansion.

Aria can connect with multiple AI models and, in the future, integrate additional capabilities such as search services from various Opera partners.

Swiss Researchers Utilize AI to Reconstruct Spinal Cord

Swiss researchers used artificial intelligence to help Gert-Jan Oskam, who had been paralyzed from the waist down since a motorcycle accident in 2011, regain control over his lower body.

In a recent breakthrough, scientists developed a “digital bridge” that connects Oskam’s brain and spinal cord, effectively bypassing the damaged areas.

Through the use of an AI thought decoder, the researchers were able to capture Oskam’s thoughts and translate them into spinal cord stimulation, thereby restoring voluntary movement. Notably, Oskam has demonstrated signs of neurological recovery, as he is now capable of walking even when the implant is deactivated. This remarkable progress signifies a significant advancement in the field of spinal cord injury rehabilitation.



AWS Adds Multi-AZ with Standby Support to OpenSearch Service

Renato Losio

Article originally posted on InfoQ.

OpenSearch Service recently introduced support for Multi-AZ with Standby, a new deployment option for the search and analytics engine that provides 99.99% availability and better performance for business-critical workloads.

With Multi-AZ with Standby, OpenSearch Service reserves nodes in one of the AZs as standby, making deployments resilient to potential infrastructure failures and simplifying configuration and management. Prashant Agrawal, senior search specialist solutions architect, and Rohin Bhargava, senior product manager, explain the advantages of the new option:

When an issue arises, such as a node becoming unresponsive, OpenSearch Service recovers by recreating the missing shards (data), causing a potentially large movement of data in the domain. This data movement increases resource usage on the cluster, which can impact performance. If the cluster is not sized properly, it can experience degraded availability, which defeats the purpose of provisioning the cluster across three Availability Zones.

According to the cloud provider, the new configuration option improves availability to 99.99% and ensures that domains follow recommended best practices, simplifying configuration and management.

OpenSearch Service is a managed option to deploy OpenSearch clusters, supporting OpenSearch and legacy Elasticsearch OSS up to Elasticsearch 7.10. Multi-AZ with Standby requires a domain running OpenSearch 1.3 or above, deployed across three Availability Zones with three (or a multiple of three) data nodes. Furthermore, only GP3- or SSD-backed instances and a subset of instance types are currently supported. While the service distributes the nodes and data copies across three AZs, Agrawal and Bhargava warn:

During normal operations, the standby nodes don’t receive any search requests. The two active Availability Zones respond to all search requests. However, data is replicated to these standby nodes to ensure you have a full copy of the data in each Availability Zone at all times.

OpenSearch Service still supports multi-AZ without standby, offering 99.9% availability and lower costs as all the cluster nodes can serve read requests.
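
For illustration, here is a minimal sketch of creating a domain with the new option via the AWS SDK for JavaScript v3; the field names, instance type, and version string are assumptions based on the requirements described above, not taken from the announcement:

import { OpenSearchClient, CreateDomainCommand } from "@aws-sdk/client-opensearch";

const client = new OpenSearchClient({ region: "us-east-1" });

// Hypothetical domain: OpenSearch 1.3+, three AZs, a multiple of three data nodes
await client.send(
  new CreateDomainCommand({
    DomainName: "my-domain",
    EngineVersion: "OpenSearch_1.3",
    ClusterConfig: {
      InstanceType: "r6g.large.search",
      InstanceCount: 3,
      ZoneAwarenessEnabled: true,
      ZoneAwarenessConfig: { AvailabilityZoneCount: 3 },
      MultiAZWithStandbyEnabled: true, // the new standby option (assumed flag name)
    },
  })
);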

The new capability of the successor of Amazon Elasticsearch Service rotates the standby AZ every 30 minutes to ensure that the system is running and ready to respond to changes, with the AZ Rotation Metrics exposing the status of the cluster, showing active reads and active writes.

Multi-AZ with Standby is not the only feature recently introduced for the managed search and analytics engine: AWS recently announced the GA of the controversial OpenSearch Serverless option, OpenSearch Ingestion, a serverless data collector, and Security Analytics. The new failover option is currently available in most, but not all, AWS regions where OpenSearch Service is supported.



Presentation: Tidy First?

Kent Beck

Article originally posted on InfoQ.

Transcript

Beck: Some years ago, I took a look back at my career to that point, and tried to make sense of all these disparate weird things that I’d done, whether it was patterns, or JUnit, or TDD, or XP, like what does all of that have in common? It took me a while and I finally came up with a personal mission statement. I said, what I do is help geeks feel safe in the world. That’s about the smallest statement I can make that encompasses all of the things that I’ve done in software development. What I’m talking about is a particular aspect of helping geeks feel safe in the world. It’s a question that comes up all the time for us. I’m working with some crummy code, how much time should I spend cleaning it up versus just working with it? This is a fraught question. It’s a complicated question. There’s no stock answer that makes much sense. There are people who will tell you things that don’t make sense about how you ought to approach that question. It’s a source of a lot of conflict. A lot of conflict between people who are making software design decisions, between business people and software people. It’s a source of that feeling of lack of safety. That’s why I’ve been working on this particular project, which is around software design.

Updating Structured Design

In 2005, I was invited to be on a panel to celebrate the 25th anniversary of the publishing of this book, “Structured Design,” by Ed Yourdon and Larry Constantine. This was actually my college textbook, from way back then. Twenty-some years later, I was invited to be on a panel with Ed, and Larry, and a few other people to talk about the impact this book had had. I thought, it’s about time I read the book. I started reading it. The more I read, the more excited I got. At that point, I’d developed a lot of software. I was talking a lot about software development. In this book are Newton’s laws of motion for software development. These are the basic forces that shape how we develop software, and nobody was talking about it.

We’re getting ready for the panel. I read the book cover to cover, really excited, even the parts about the debate between assembly language and these newfangled higher-level languages, and what are the tradeoffs between them, and designs using paper tape. I said, I would like to update this material. This was in 2005, 17 years ago. I did the math. I actually did it twice; the first time, it came out as only seven years. I realized that couldn’t possibly be correct. I did it again and realized I had been working on this project for 17 years. At the conference, I was able to have breakfast with Ed, who’s now passed, and Larry, and just had a delightful time because they’d been classmates at MIT. They’d done a whole bunch of work together. I have it inscribed by these two jokers. The first one says, “Don’t believe anything you read in this book, Ed Yourdon.” The second one says, “…including the above, Larry Constantine.”

I had the laws of motion of software development, as described here. The two key concepts are coupling and cohesion. This is the book that introduced those words. What I’d noticed, even back then, was that those words had drifted very far from their original meanings. They’d come to mean lots of different things, but “Structured Design” gives a very precise definition, which I will go through with you, about what coupling means and why it’s important. I thought, I’m going to update this material. If you go back, you’ll find a talk called Responsive Design that I gave at some hotel in San Francisco; that was my first attempt to get this across. I had some stuff right, but I had some stuff badly wrong. In preparation for today’s talk, I was talking with my oldest, who’s now a staff engineer in software at One Medical. I said, I’ve spent 17 years figuring out how to explain cohesion. That’s really the sticking point in all of this. That concept, the definition is very clear, but trying to explain it in a way that people can take it on board and understand it and make good use of it, that part I just went over and over again. The only way to learn how to explain something well, is to explain it badly, many times. It means that some of my friends when I sit down and offer to buy them a beer, just leave, because they know it’s going to be another one of those bad explanations. Eventually, I get enough feedback. Now, finally, I feel like I’m ready to explain this, at least to a greater degree than I was before.

Three-and-a-half years ago, I decided, ok, I’m finally ready to write the book. I sat down to write the book. The first sentence that I typed was, software design is an exercise in human relationships. Exactly. I said, what does that mean? Why did it come out of my fingers? How am I going to explain this? This is one of the beauties of writing for me is I say things that I don’t know that I think. If you can stop pressing tweet right after you do that, that’s a really good skill to learn. I’d written that and I’m thinking, what in the world does that mean? I believe it. It feels true in my gut, but really, what all does that mean? What I’m going to do, I’m just finishing the first draft of the first third of the book. The secret about book writing, you’re going to think, this topic is way too small to fit into a book. Then you’re going to start writing it, and then you’re going to realize, no, this topic is way too big to fit in a book. You’re going to cut it down. You’re going to think, that topic’s way too small to fit in a book. Then you’re going to write some more and you realize, no, that’s way too big to fit in a book, and you’ll cut it. The third time, you have about a book’s worth of stuff. I’ve gone through that process, and I’ve drafted the last chapter of the first book.

Overview of Software Design

The topic of software design for me, as an exercise in relationships, divides nicely into three stages. What I’m going to do is give you an overview. If you want to know more about the specific stuff I’ve been writing, look for tidyfirst.substack.com. I’m doing an experiment in geek incentives. I thought, if I did this as a paid newsletter, then I would have the motivation to keep writing. Because what slows me down as a writer is not how slow I write, it’s how often I don’t write at all. It turns out that people giving me $7 a month is a social obligation enough that I’m going to keep typing. I have the first book about drafted. The book is called “Tidy First?” for reasons that I’m about to explain. I’m going to give you an overview. Then if you want to dive in more, that book is there. I’m going to continue writing the two subsequent books in the same place, so if you’re interested you can go there.

Software design, what are we talking about? There’s a basic loop in software development. Somebody has some idea, ok, the software does some stuff, and we want it to do some new stuff. We want to add a widget. Somebody has the idea, and now somebody has to change the behavior of the program: it used to calculate one thing, and now there’s a new option, so we calculate some new stuff. This is conveniently a loop. We got an idea. We change the behavior of the program, and that gives us more ideas for what else we could do with the program, and all is well. Software is this magic stuff because it scales so well, unlike any other commercial endeavor. It scales to more of the people on the planet, to more of the planet, which brings with it obligations. It brings some obligations. It also brings with it scale, which is one of the exciting things about software to me, is I can take the output of my brain and spread the effects of that over wider areas.

We got this basic loop where we got ideas. We change the behavior of the system, that gives us more ideas. We change the behavior of the system, and away we go. Everybody is happy. Because we all know that underneath here, if we just keep doing this loop, idea to behavior to idea to behavior, we’re going to start going slower, and bugs are going to come up, and programmers will get frustrated and leave. The new programmer is going to be even slower than the old programmers were. The structure of this system has a profound effect on how quickly we can go through the loop up above. As developers, sometimes we think, here’s an idea, but before I just change the behavior, I’m going to change the structure first. Then I’m going to change the behavior and it’ll be so much easier after I’ve improved the structure that I win already.

That is exactly the tidy-first workflow, where you say, I have to change some messy code. I’m going to tidy it first. Then I’m going to change it. If I add up the structure changes and the behavior changes made after the structure changes, and that takes less time than just going in and changing the messy code and leaving things even worse for the future, I win. It’s a dilemma. Sometimes it is. Sometimes it isn’t. Sometimes you change the behavior, and then, “This is so ugly. Now if only it was designed like this, it would be easy.” We’ve got this loop going on. One of the really magical things about software is sometimes the structure itself generates ideas for how to change the behavior. Now that it’s easy to add a new one of these things, why don’t we just add a bunch of them. You start doing the things that are easy to implement, because you’ve made them easy to implement.
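
One way to state the tradeoff Beck describes here, in my notation rather than his: tidying first pays off whenever

cost(tidying) + cost(behavior change on tidied code) < cost(behavior change on messy code)

with the tidied structure often paying further dividends on later changes.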

The first lesson for me in software design is this hard split between behavior changes and structure changes. I want to make a very clear distinction between those two. Which hat am I wearing? I’m always wearing one hat or the other, just because the sunscreen and stuff. Which of those hats am I wearing? If I’m ever wearing them both at once, I’m making a mistake. I’m going to try and make this presentation practical as much as I can. If you wanted to act on this insight, that we should split behavior and structure changes, start making your commits, one or the other, but not both. Just try that for a week and see how it goes. Try it out maybe with your team, see how that goes. For lots of interesting reasons, structure changes can be treated much more lightly than behavior changes, because structure changes tend to be reversible. You extract some function, you inline the function. If you change the numbers you report to the IRS, changing them back is a little squirrelly.

Waiters and Changers

The loop is more complicated, and it becomes even more complicated. Here’s where I’m going to finally get to this: software design is an exercise in human relationships. Because we’ve got two kinds of people; I call them waiters and changers. I have the waiters up here. The waiters maybe had the idea for what the software does next, but they can’t do anything to change it. They have to wait, patiently or impatiently, foot tapping a little, for the software to change. There’s a different kind of people in this picture, called the changers. The changers, these are the people mostly sitting here, who know what to do to go in and change the behavior of the software. Already, we’ve got conflict, because you have waiters who want to see that next behavior change, they want the next feature. You got the changers, though, who know that if we just leave the structure to deteriorate, things are going to get worse. It’s going to be less fun to work on, and more bugs, and more annoying. We’re going to get further behind. We have a misalignment of incentives: changers wanting to invest in the structure; waiters who, in the short term, don’t even see the structure. It doesn’t affect their daily life. They just want that feature as quickly as possible. We come to the first relationship. This waiter-changer relationship is fraught because of the divergence of the incentives, and the different vocabulary, different value systems, different wardrobes. Although I’m working on that. The third book in the series is going to be about using software design to encourage positive waiter-changer relationships.

We can zoom in now. There’s another set of relationships that software design either contributes to or inhibits. That’s the relationships between the changers. If we have a bunch of changers, and they’re all related to each other, and somebody is producing an API that somebody else is consuming, and they want to change the API, or they want to change the implementation in some way that would cause a change for somebody else. Now, all of a sudden, you’ve got, again, a divergence in incentives, where one person’s best interest is to make the change, and another person’s best interest is for the change not to be made, or not yet, or not in exactly that way. The second book is about exactly this set of relationships. We all have a greater alignment of value systems, and vocabulary, and clothing choices. Yet, there is divergence of incentives. Oftentimes, the things that hang teams up, it’s not, this was refactored in in this way, and so technically, it doesn’t work anymore. It’s more, things were refactored in this way by this person who was acting like a jerk. Then you have real problems. Which is why I say software design is an exercise in human relationships. I realized, as soon as I typed that, that people were going to freak out. People who don’t have a lot of confidence in their abilities in human relationships. This is just what it is. Software design has a critical role to play in changer-changer relationships. Software design skills applied in a certain way have a critical role to play to keep these relationships strong, to mend these relationships when they’ve been frayed, and to keep everybody moving forward.

Tidy First

I told you I started out with this big topic and thought it was too small, and then I chopped it, and then I chopped it some more. Where I finally got to this book, “Tidy First,” is really focused on individual programmers, or pairs, or a mob. It’s all the same thing. The question that comes up 20 times a day for everybody who is touching code is, this code is messy. Changing it is going to be harder than it needs to be. Should I tidy first? That’s the basic question; it comes up over and over again. You’re going to get some dogmatic answers, “Of course, you always tidy first.” Because that’s a simple answer I don’t have to think about anymore. I think probably that’s the explanation of it. Or you also get the, should I tidy first? Absolutely not. Tidying is a waste of time. Why would you do that? We have waiters screaming for the next feature, get it done as quickly as possible. Don’t bother wiping the knife between cutting meat because salmonella happens outside of the restaurant. It’s kind of like that, but that’s that dogmatic answer. What I discovered when I looked at this tiny little grain of sand question, should I tidy first? The answer of course is it depends. What it depends on is more or less all of software design. All of the factors that play into software design at the largest scale, also play into this question of, “I got this code, it’s a little bit messy. I have to change it. Should I tidy first?” That’s what this first book is about.

We have this question, we have some messy code, should we tidy it first? It depends. What does it depend on? We’ve got these dogmatic answers: always and never. They don’t make any sense. They don’t make any sense for particular economic reasons. I was not somebody who naturally understood money. I have friends who are traders, and they have a real gut feel for money. It took me a long time to get to that same place where I’m comfortable with money. There are laws about the effects of money, Newton’s laws of motion for money. They really do exist. I didn’t know them. Once I learned them, I took a different look at software development. I’m going to talk about two of the laws of money that affect software development, and these two conflict. Somebody once said something like: may you live long enough that people start doing, unironically, the dumb things you did when you were young. Because that’s what’s happening to me now.

The good news is, I don’t have to invent any new topics. I can just go into XP Explained, open up a random chapter. Give a talk based on that chapter, and people will go, “How’d you learn that?” It’s really the same thing. This, I hear people now explaining to me patiently, “We have to make all these design decisions. If we just made them at the beginning of the project, all the rest of the project would go much more smoothly.” It is everything I can do not to just go full, get off my lawn, grumpy old man. I used to hate it when, as a young engineer, some old engineer would go, I’ve been doing this since before you were born, kid. It feels so good to say that. Waterfall is back. Controlling time, scope, cost, and quality is back. Just the whole thing. You’re already smarter than most of the people out there, just guaranteed by not being pulled in by these discussions. Comprehensive documentation, just watch me go into orbit.

Design Upfront

My point was about design upfront. There are perfectly good economic reasons why design upfront is a bad idea. I had to learn about discounted cash flows, and really internalize that knowledge, and then design upfront just makes no sense, economically. I’m not saying what we do all boils down to economics. We are engaged in a profoundly human activity that pushes us to the limits of our abilities to relate to other human beings. If it doesn’t make economic sense, it doesn’t matter how well you get along. That’s why, for me, the foundation of this is, how can we make software projects that make better economic sense? We got the time value of money. What that means is, if you tell me about how much we’re going to spend, or how much we’re going to make, I have to ask you when. Here’s why. If we have time going along this axis, and we spend some money here in order to make some money there, the magnitude of these cash flows, we can’t evaluate them just by comparing the sizes. Because, this money that we spend, looked at from today, it’s actually a little bit smaller. Future money gets smaller when looked at from the present. If we look at this revenue from the future, it’s going to have longer to decay. Yesterday, I blew my mind, I realized, discounted cash flows is just half-life as applied to money. You got some money in the future, and then you want to look at it today, it just gets smaller as it gets closer.

The problem with a design upfront project, is you’re spending all this money now, which doesn’t get discounted very much, in order to make a bunch of money at some distant future date. That’s going to be discounted, much more substantially. It can look like it’s a really profitable project, and turn out to actually be a disaster. For example, if we have a project, this is the upfront project, where we’re going to design, design, design, and then, everything’s going to be fantastic and we’re going to make a whole lot of money. If we can transform this project into this project, we’ve made progress. I came up with a great phrase as I was preparing for this: spending less and spending later are exactly equivalent. In this second version of the project here, because we moved our spending out into the future, we’re already making more money. It’s not just about the magnitude, it’s about the timing of the expenses and the revenue. The problem with design upfront isn’t that we’re going to make a bunch of decisions on speculation, which are going to turn out to be wrong. Then we have to carry the burden of all these decisions along with this, or we have to get rid of them and remake them over again. All that’s true, and not the point. The point is, we’re spending too much money too soon. If we can defer some of those expenses until later, we’ve created economic value. The purpose of the style of software design that I’m advocating here, is to make money sooner and with greater certainty, and to spend money later, and with less certainty. It’s not just about, we’re going to make more. Because the absolute magnitudes aren’t so important as the timing of the software design decisions that we make.
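
To make the discounting argument concrete, here is a minimal sketch in TypeScript, assuming a flat 10% annual discount rate (the rate and amounts are illustrative, not from the talk):

// Present value of a cash flow spent or earned `years` from now
const presentValue = (cashFlow: number, rate: number, years: number): number =>
  cashFlow / Math.pow(1 + rate, years);

presentValue(100_000, 0.1, 0); // $100,000 spent today costs 100,000 in present value
presentValue(100_000, 0.1, 3); // the same spend deferred 3 years ≈ 75,131 today

Deferring the expense is worth roughly $25,000 in this example, which is why spending later is economically equivalent to spending less.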

Optionality

Time value of money tells you, spend money later, earn money sooner. There’s another force at work though because we work in an area with great uncertainty, and we don’t know what we’re going to ask our software to do next. If we can increase the optionality of the software, then we have also created economic value. If I have software and it can go this direction, this direction, or this direction, and this one makes a lot of money, and these two only make a little bit of money. When I get to this decision point here, I’m going to say, now I can see I’ll make more money here. If I don’t have this option to go in this direction, this software is worth a whole lot less. The counterbalancing force to discounted cash flows is optionality. These two come into conflict, because the optionality we create today, we haven’t exercised it. We’ve spent money. If only we had pulled the software apart this way, so it’s easy to replace this thing with other things, that’s optionality. We’re spending money today to make more money later. Discounted cash flows pulls us in one direction. Optionality pulls us in a different direction. We haven’t even gotten to coupling yet, no wonder this is hard.

Coupling

Let me talk about coupling. Software systems are constructed out of elements. I just say elements generically, because it doesn’t matter to me. My oldest, when they were learning how to program, came to me after a while and said, architecture, design, coding, isn’t it all design? It is. When I talk about design, I just talk about elements. The elements can be itsy-bitsy, like expressions in a statement, or they can be gigantic, like services in some mesh. You have elements, and the elements are related to each other. A thing that happens frequently in software is I go to change this element, and I realize, “If I change this, I have to change this and this too.” What just happened? What we thought was going to be a cheap change just became more expensive. If that’s as far as it went, it’d be annoying, but it wouldn’t be disastrous. It gets disastrous because those ripples continue to flow further out. This is the observation made in the “Structured Design” book, that there were certain systems which were cheap to change, this is the early days of IBM, and there were other systems that were really expensive to change. The difference between the two was that in the expensive systems, the elements of the system transmitted change to each other. They were coupled.

The definition of coupling: if I have two elements, E1 and E2, and a specific change that I want to make, some delta, the elements are coupled if changing element one implies I have to change element two also. That’s the definition of coupling. Colloquially, people will use the word coupling to mean all kinds of different things. “This service calls that other service so they’re coupled.” We can talk about that relationship and it’s a different one. This is a very specific one. Coupling is a very specific relationship that says, if I change this element, I have to change that element too. If I change the name of this function, I have to go to all the callers and change them because they’re coupled with respect to changes to the name of the function. They aren’t coupled with respect to the formatting of it. If I go put a comment in the middle of the call function, nobody cares. They’re not coupled with respect to that change. They are coupled with respect to changes to the name. Why does this matter? It’s because of these ripples, and you get these jackpot effects. I’m a huge power law nerd. I love finding power law distributions. It’s a jackpot situation. If you go and make changes to the behavior of the system that seem about the same size, and you look at the distribution of how much work each of them is going to be, most of them will be about the same size, half of them will be twice as much work, and half of those will be twice as much work again. Then way over here is the one that caused the CTO to quit. We live in a natural world in software, a natural world we haven’t been particularly aware of until now. That’s what’s going on.
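
Written out in rough notation (mine, not the speaker’s), elements E1 and E2 are coupled with respect to a specific change delta when

change(E1, delta) => change(E2, delta)

that is, making that change to E1 forces a corresponding change to E2. The same pair of elements may be uncoupled with respect to a different delta, as the function-name versus formatting example shows.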

I call what follows Constantine’s Equivalence. What Ed and Larry observed in “Structured Design” is that the cost of software approximately equals the cost of change. That is, we don’t really care about the “initial development” of the software, because if it’s at all successful, it’s going to live for a very long time. We’re going to spend almost all the money changing the software, not on that initial tiny slice of an initial development. We can refine this further and say, the cost of change is approximately equal to the cost of the big changes. This is this power law, long tail distribution, where if we add up the handful of really expensive changes down here on the long tail of it, and we compare the cost of those together with the cost of all the rest of the cheap changes, most of the cost is going to be in these big jackpot changes. The cost of those big changes is really the cost of the coupling of the system.

We can say that the cost of the system approximately equals the coupling of the system. Plus, we’re software designers. You can decouple stuff, but it costs money. The cost of the system is approximately equal to the coupling in the system plus the cost of the decoupling that we do. Now we’re in a tradeoff space, where we can say how much this coupling cost us, how much this decoupling cost us. Can we get into some sweet spot between the two? That’s the primary message of this work for me, is that we all have to make this tradeoff between coupling and decoupling. If we just ignore decoupling, if we ignore software design, we’re going to have more of these jackpot changes. The problem with jackpot changes is they destroy trust between the waiters and the changers, because the waiters don’t want to wait. They said, “You’ve added 14 widgets, each one of them has taken a week, and this widget is taking me 8 months to add, like, are you idiots?” No, we’re just working in a natural system. If we say, to fix that, we’re going to refactor everything, now the waiters are like, ok, and then what do I get at the end of all this refactoring? You say, exactly the system you have today. That’s not a relationship positive move.
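
Summarizing the chain of approximations above in one place (again, my notation):

cost(software) ≈ cost(change) ≈ cost(big changes) ≈ cost(coupling)

and, once we design deliberately,

cost(software) ≈ cost(coupling) + cost(decoupling)

which is the tradeoff space the rest of the talk explores.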

Conclusion

To bring it back, full circle, we all have the option of doing software design in ways that enhance our relationships with ourselves, with our immediate peers, and with people with different perspectives. If there’s one message I would invite you to take away, it’s that you can always make big changes in small, safe steps. If there’s one skill to master, it is that. The more you can make your big changes in small, safe steps, the less interruption you have in your relationships with other people. The more it looks like those features just come out, even though you know that, under the water, you’re continuing to evolve the structure as you go along.



Presentation: DevSusOps – Bringing Sustainability Concerns to Development and Operations

Adrian Cockcroft

Article originally posted on InfoQ.

Transcript

Cockcroft: My name is Adrian Cockcroft. I’m going to talk about adding sustainability concerns to development and operations. I’m calling this DevSusOps.

Why Sustainability Matters

Why does sustainability matter? You may have various different thoughts about this. I think I’ve got most of them here. The definition of sustainability is really around leaving the world habitable for future generations. That’s the essence of the United Nations definition of sustainability. The reason it matters to most companies is regulatory compliance. Particularly in Europe, the regulators are saying, you’ve got to measure and report your carbon footprint and other sustainability risks. In the U.S., we’re moving in that same direction, and around the world, wherever you are, you’ve got regulation saying you’re going to have to figure out the sustainability posture of your organization. Then there are market transition risks. If you think of companies that own gas stations, or make parts for gas-driven cars, that market is transitioning to electric cars. That’s a market transition. If you own a coal-fired power station, you’re in a market transition. It’s any time that the market is changing as a result of the transition to a more sustainable future. You have a market transition risk if you’re selling products that are affected by that market transition. Then there are physical risks to business assets. If you own property at sea level, operate in areas where it’s getting hotter outside, or try to move freight down rivers that are drying up, there are all kinds of physical risks that you might have. Sea level rise is one that’s affecting the coast. There are a lot of weather- and high-temperature-related physical risks to business assets from climate change.

The other thing that you may have is products which are branded in some way to be green. Given that market positioning, you need to stand behind them, walk the talk, and actually build green products in a green way. One thing that you can’t really underestimate, I think, is employee enthusiasm. Many of us have children that come home from school and say, what are you doing to save the world so that I have something to live in? That’s a big motivator. I’m a little older, and my kids are grown up. It really matters, what are we doing for the future. There’s a lot of personal investment in this, way beyond what you’d expect for just a cost optimization exercise. People get much more invested and excited about a carbon optimization exercise. As we move into a more sustainable future, we’re actually starting to see costs come down, either now or as future cost reductions. The cost of renewable energy is now lower than the cost of fossil-based electricity. We’ve got current payback right now. The cost of operation of an electric car is less than the operation of a gas car now, although the upfront price is a little more. Over time, we’ll see electric cars being cheaper than gas-powered cars even to buy in the beginning. Then, for some organizations who’ve got a bad history around climate change, like fossil fuel companies, they’re concerned about the social license to operate. Are people still going to want to go and work there, want to buy products from them, if they’ve got a bad reputation around sustainability?

What We Can Do About Sustainability

What can we do about it as developers and operations people? On the development side, we can optimize our code, choose faster languages and runtimes. Pick more efficient algorithms. Use faster implementations of those algorithms; simdjson, for example, is a super-fast way of processing JSON. Reduce the amount of logging we do. Then, reduce retries and work amplification by getting our retry algorithms worked out. That reduces the overall workload. That’s on the developer side, the way you set up your application. Then, on the operations side, whether you’re a DevOps engineer doing development and operations, running what you wrote, or whether there’s a separate ops team, there are more operational concerns. You want to run the code you’ve built at a higher utilization. That might require tuning the code so that it can run at higher utilization, but you want to run at higher utilization. You want to use a lot of automation for things like autoscaling. You want to relax any over-specified requirements. You may push back on things that look either expensive or high carbon and ask, do we really need to retain data forever? You might want to archive and delete data sooner, deduplicate data. Then the thing that you don’t normally think about is, choose times and locations carefully. If you run something at midday, there’s a lot of excess solar capacity, the grid is very green. If you run stuff at midnight, then you’ve got a lot more fossil fuel in the mix. That’s really the time-based thing. Then, location. If you run things in France, there’s lots of nuclear energy; it’s a very clean grid from a point of view of carbon emissions. If you run things in Singapore, there’s a lot of carbon in the grid there, so you end up with a very high carbon location. You go pick those things. Those are things that we’ll talk a little about later, but are not the normal concerns.

Almost all of these things are directionally the same as saving dollars. If you just make your system more efficient, you’re saving money, as well as saving carbon. I say almost all cases, because there are a few situations; say, some regions which have lower carbon may cost a little more than other regions, if you look at a cloud provider. Migrating a workload there, you might spend a little bit more but reduce carbon. That’s one of the few corner cases that you might see. The main thing is, even though directionally saving money saves carbon, the actual amount of carbon per dollar varies a lot. You have to understand, whether you’re saving a dollar of compute, or a dollar of storage, or a dollar of a service, what the carbon impact of that is.

How to Measure Carbon

I’m going to spend most of this talk focusing on how to measure carbon, and see if I can figure out how to get the right mental models into your heads about how to think about it, the terminology, the way these things operate, so you can figure out in your own work, how to measure carbon, how to work in this space. We’re used to reporting and optimizing throughput, latency, utilization, capacity. We’ve started adding cost optimization to those kinds of things. Carbon is really just another metric for monitoring tools to report. We’re starting to see a few monitoring tools with carbon as a metric within them. What we’re really seeing is regulators and market pressure causing companies to have to report carbon and decarbonize their products and services. These are two separate things. Reporting carbon is something that the board has to do a report and sign off, and auditors come in. Every quarter, you do a financial statement, you’re going to have to put carbon in that financial statement alongside the financial numbers in the future. Those are audited numbers. What you need to do to do that is a little different than the information you need to decarbonize products and services.

Economic Model for Carbon

Do you know how much carbon your company emits for data processing and hosting per year? Most people don’t really know. How can we get to an answer? What you may know is your company budget. What’s your IT budget? Most people have some idea, whether it’s a million dollars, $10 million, or $100 million, or whatever. It’s somewhere in that ballpark. What you can do is say, for every million dollars you spend on IT, or whatever you define as data processing and hosting, it’s 159 metric tons of CO2. That’s the economic model, just a rule of thumb you can apply. If we’ve got a $10 million budget, 10 × 159 is roughly 1,600, so we’re probably doing 1,000 to 2,000 metric tons of carbon. You’re in that range. Where does that number come from? If you go and search for carbon factors, you’ll find places like climatiq.io, where you can go on the website and search. You’ll find that there is a standard number provided by the EPA. This is a government-provided number for 2020. If you are doing your carbon report for 2020, you go to the EPA number, and you find it’s 0.159 kilograms per dollar, which is the same as 159 tons per million dollars. You’ll also notice this is calculated from 2016 data, which is a bit worrying; it’s a little out of date. Also, it includes the entire supply chain, cradle to shelf, meaning all of the materials used to make it as well as the energy used to operate it, all the way through. This is quite an interesting number. It’s a very average number for the entire industry. If you have nothing else, you use this number.

The economic model for carbon: basically, you take your financial report, break down the spend categories, find the right emissions factor for each, multiply and accumulate. The good thing about this is that everybody uses these models. Most of the carbon numbers you see out there that people are reporting are largely based on economic models. It’s just the way the world is. It gives you a big picture, approximate view. You can always get the input data. The industry standard emissions factors are available, some of them openly; there are some where you might license them to get more specific ones. Auditors can trace the calculations easily. It does estimate the entire life cycle. That’s the good. The bad is it’s a really coarse model. Some of the factors are based on old data, and in fast-moving areas like IT, they may not be that accurate. It can be hard to actually pick the right factor; this is one of the hardest parts: how do you categorize things and pick the right factor? It’s not trivial. Then there’s this other problem: if you spend more, let’s say you buy some green product because it’s greener, but you spent more on it, and you don’t have a different factor for it, then your reported carbon will go up just because you’re spending more. Effectively, carbon optimizations have no effect. The ugly thing here is, if you use an economic model one year and then come up with a more accurate measure the next year, it could be quite a lot higher.
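
As a sketch of the multiply-and-accumulate step, with made-up spend categories and factors (only the 0.159 kg/dollar figure comes from the EPA table mentioned above):

type SpendCategory = { name: string; spendUsd: number; kgCo2ePerUsd: number };

const categories: SpendCategory[] = [
  { name: "Data processing and hosting", spendUsd: 10_000_000, kgCo2ePerUsd: 0.159 },
  { name: "Business travel", spendUsd: 500_000, kgCo2ePerUsd: 1.0 }, // hypothetical factor
];

// Multiply each category's spend by its emissions factor, accumulate,
// and convert kilograms to metric tons
const totalTons =
  categories.reduce((sum, c) => sum + c.spendUsd * c.kgCo2ePerUsd, 0) / 1000;

console.log(totalTons); // 2090 metric tons CO2e for these illustrative numbers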

Recommendation here: start with spend when no other data is available, and use economic models to figure out where to focus efforts. If you look at your entire business, you’re going to see, what do you work on? Are you working for a SaaS provider or a bank, where almost all your costs are people and IT and office buildings? In that case, maybe most of your carbon footprint is IT. I did a talk with Starbucks, earlier this year or late last year; most of their carbon footprint is dairy, it’s milk. Their IT component is a very small piece. If they can use a little bit more IT to convince people to use oat milk instead of dairy, that actually makes a bigger difference: a small percentage change in Starbucks’ end users from dairy milk to oat milk will actually make more difference than their entire IT budget. You have to think about what the real carbon footprint of your company is. Don’t just focus on the IT component.

Scopes of Carbon

I’m going to now explain something called the scopes of carbon with some slides I made while I was at AWS. The first thing is the fuel you consume. You count whoever owns the fuel when it’s being burnt. You buy gas, you put it in your car, you drive it around, that you’re now doing scope 1 emissions for whatever you put in the car. Anything you burn in your fireplace or gas used to heat your home, anything used for cooking, that’s scope 1. The way to get rid of scope 1 is to electrify everything, and then figure out how to get renewable electricity. Scope 2 is really the energy used. This is the energy where it’s consumed, but you’re not burning the fuel yourself. Somebody else burnt the fuel, sent you electricity. You have the grid mix, which is the mix of power, which is either renewable or fossil fuel. You could break it down. Those are the two main categories. What you want is carbon emitting and non-carbon emitting. Nuclear is good, because it doesn’t emit carbon. Nuclear isn’t really renewable but it counts as zero carbon. That’s the way you look at the grid mix. Then you look at your location, so your house, you’ve got a heat pump, solar panels, batteries, electric car. You’ve managed to convert that house to be all electric. Then if you can run the house off the batteries and the solar panels, you’re basically getting your scope 2 to be very low carbon. Then, if you’re doing cooking, you should switch to induction and electric cooking, because also gas ranges cause pollution and things like that. Some of the benefits aren’t just carbon, there’s other things that make it better to move off of fossil fuels.

Scope 3 is where it gets really complicated; it depends an enormous amount on what business you’re in: it’s your supply chain. The point here is that, say you had part of your business that was emitting a lot of carbon, and you said, I’m just going to buy that as a service from somebody else so I don’t have to count that as my carbon. That’s part of your scope 3, so you can’t get out of your carbon emissions by just offshoring them and pushing them out to suppliers and saying, I don’t own this thing. Anything in the supply chain that you don’t own counts, and the carbon your consumers emit to use your product can also count. In some cases, the inventory, the things you own, count as scope 3 as well. Think of this as pretty complex and extremely dependent on what business you’re in.

If we look back at data centers, really the only scope 1 you’ll see in a data center environment is a backup generator. They run diesel, mostly. What we’re doing to decarbonize those is put in larger battery packs, so that diesel is really a secondary backup, doesn’t get used very often. Also, move to biodiesel and fuel cells and other things like that, which can be fed by cleaner energy sources. It’s possible to work on that. It’s a tiny component of data center energy. Backup generators really don’t run that often. Scope 2 is this mix of electricity sources coming into the building. For a data center, scope 3 will be the building itself, the racks coming in. Anything used to deliver the components to physically put up the building, that’s all going to be your scope 3. If you look inside the data center, you can see there’s cooling, power distribution, heat, there’s a lot more going on there. Ultimately, your suppliers need to report scope 1, 2, and 3 to you, and then you need to report scope 1, 2, and 3 to your consumers. This is all defined by the Greenhouse Gas Protocol. You can have many happy hours reading this website to figure out for your particular business, what you have to do.

Process Models for Carbon

We’ve talked so far about the economic models, just using financial data to feed the carbon data. If we want to get a bit more accurate, what we need is a process model, where the units are the materials you’re using, or the kilowatt-hours of energy you’re using. In the process model, you measure the materials, find the right emissions factor, multiply and accumulate. For example, diesel, gasoline, and jet fuel will have different emissions factors, even though they’re all measured in gallons, or kilograms, or whatever. This is called a life cycle analysis (LCA). That’s what the cradle-to-shelf component of the metric I highlighted earlier was talking about. What you really want to do is look at manufacture and delivery, which is your supply chain; energy, which is your use phase; and then, what do you do once you’re done with it? How do you dispose of it? Do you recycle it? What’s the energy consumed there as well? The good thing about these models: they exist. It’s a well-defined methodology; LCA has standards around it, and people are trained in it. You can hire or consult with people who can generate these models. They’re reasonably detailed and accurate. Particularly for the use phase, energy data is fairly easy to get. When you do an optimization, it really does reduce the reported carbon, so you’re in the right area. Auditors can trace those calculations.
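
Here is a minimal sketch of a process model for a server fleet, showing how the three life cycle phases combine; every number is an illustrative assumption, not a real factor:

const servers = 100;
const embodiedKgPerServer = 1_000;   // manufacture and delivery (cradle to shelf)
const endOfLifeKgPerServer = 50;     // disposal/recycling
const lifetimeYears = 4;             // amortize embodied and end-of-life carbon
const kwhPerServerYear = 2_000;      // measured use-phase energy
const gridKgCo2ePerKwh = 0.4;        // emissions factor from the local grid mix

const annualKgCo2e =
  (servers * (embodiedKgPerServer + endOfLifeKgPerServer)) / lifetimeYears +
  servers * kwhPerServerYear * gridKgCo2ePerKwh;

console.log(annualKgCo2e / 1000); // ≈ 106 metric tons/year under these assumptions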

The bad thing is that the models are really averages for a business process. This is not going to tell you the actual carbon emitted by one machine in your data center running one workload; it’s going to give you an average for the whole thing. It’s still quite a lot of work to build an LCA model, and it may be more expensive than you’d want to spend to hire the right people to do it. The supply chain and recycling data is particularly hard to get. Quite often, people build these models and just do the energy part. They don’t think about the scope 3 part, or they just don’t get to it because it is hard, and then they just report carbon. The ugly part here is some people focus only on energy use and report, we’re green because we’re 100% green energy, and then get surprised when told, but all of that hardware you’re using emitted an enormous amount of carbon when it was being made, and you have to count that too. What you should do is use the economic models to tell you where most of your carbon is, then build process models for those. Start with energy, but don’t forget the rest of the life cycle.

Measuring Carbon Emitted by a Workload – How Hard Can It Be?

How hard can it be to figure out how much carbon is coming out of a workload? Here’s a zoomed-in look at that data center. You get grid mix from your local utility bill, which you usually get a month or so after you’ve used the electricity. That’s annoying, because it means you don’t know what today’s energy mix is, you just know what last month’s was. Then there are power purchase agreements, or renewable energy credit purchases, which basically adjust your grid mix. Then you have power usage effectiveness (PUE), which accounts for losses and cooling overhead as the power that comes into the building is delivered to the rack. The machine may use a kilowatt, but there might be 10% extra energy that went to cooling and distribution, so 1.1 kilowatts had to come into the building to provide that energy. Scope 2 carbon is going to be the power mix, multiplied by the PUE, multiplied by how much capacity you use, and then the emissions factor per capacity: the amount of CPU or storage you used. Then you’d have a different emissions factor for CPUs and storage of different types, networking, whatever.
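
A rough sketch of that arithmetic, with every number invented for illustration; real per-capacity emissions factors would come from your provider or a measured dataset:

// Scope 2 sketch: grid mix x PUE x capacity x emissions factor per capacity.
// Every value below is invented for illustration.
const gridMixKgCO2ePerKwh = 0.35; // from last month's utility bill
const pue = 1.1;                  // 10% cooling and distribution overhead
const vcpuHoursUsed = 5000;       // capacity the workload consumed
const kwhPerVcpuHour = 0.004;     // assumed energy per unit of capacity

const rackKwh = vcpuHoursUsed * kwhPerVcpuHour; // energy used at the machine
const facilityKwh = rackKwh * pue;              // energy into the building
const scope2KgCO2e = facilityKwh * gridMixKgCO2ePerKwh;

console.log(`Scope 2: ${scope2KgCO2e.toFixed(2)} kg CO2e`);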

A few problems here. First of all, these utility bills are going to be a month or more late. The further you get to the developing world, the harder it is to get this data, and the more delayed it’s likely to be. Then, I mentioned power purchase agreements. These are contracts to build and consume power generation capacity. Amazon has over 15 gigawatts of these. Basically, Amazon contracts with somebody else to say, we’re going to give you some money and you’re going to build a wind farm or a solar farm. We’re going to buy the energy from that farm at a certain price, usually lower than the typical price that you’d get if you just went to PG&E, and say PG&E is whatever it is, 20-something cents per kilowatt hour. You could get much less than that if you build your own solar. It saves money, and you’re funding the creation of a new solar, wind, or battery plant that would not otherwise exist by entering into this contract. This is good, because it’s what’s driving a lot of the rollout of solar, wind, and battery around the world.

If you fund building your own solar array or wind farm, and you’re just selling that capacity to PG&E, or to whoever comes by, one of the things you can do is sell the renewable attribute separately: you can charge a little surcharge that says, this is renewable energy and I’m going to allocate it to somebody. Companies can go and say, I want to buy some renewable energy credits, and each credit only gets used once. Once everyone’s bought all the credits for all of the solar that’s kicking around in the market, you’re out. There’s only a certain amount of this capacity available, because it’s the open market for otherwise unclaimed renewable power. It’s not quite as good as a PPA, but it’s still good. You’re still funding somebody with a bit of extra money for the fact that they built a solar farm, and it makes it more profitable. It means that, ultimately, more solar farms get built. It’s just not as direct as a PPA. What typically happens is companies do as many PPAs as they can. Once they’ve got all of those in place, and they’ve done everything they can possibly do there, if they have a bit more money they’ll buy a few RECs on the side. For the large cloud vendors it’s a much smaller proportion than the PPAs. You’re buying a few RECs just to top things up and help things out.

The grid mix itself is another problem: it’s not constant. It’s going to change every month. Whenever you look at a carbon number, you have to ask, when was that carbon number? There isn’t just a carbon number; you have to say, was that in what year, what month, even down to what hour? Because now we’re starting to see hourly, 24 by 7 grid mix data coming from some suppliers in some parts of the world. This is going to take a long time to get around everywhere. Google Cloud Platform really started diving in on this. Azure are working on it. AWS have not said anything about it yet, but are hopefully working towards it. There’s also a company called FlexiDAO working in this area; you could go look at what they’re doing. They’re working with GCP, and Azure, and other people. One of the problems here is that the Cloud Carbon Footprint tool doesn’t include PPAs, it really only runs off the public grid mix. It can’t know how much you get from a PPA, because that’s private information that isn’t shared. It’s all difficult.

If we look at PUE, it’s not well standardized. The different architectures of different data centers mean that you can come up with numbers that really aren’t comparable. Like, do you have a centralized power distribution system or a distributed one? Are you delivering high voltage AC or DC all the way to the racks? Do you have your batteries in every rack or centralized? How you do that actually makes a difference to the way PUE is calculated. There’s lots of discussion about that in this blog post. If you look at the capacity, dedicated capacity is relatively easy to account for in a data center environment. Once you get into the cloud, it’s really hard to figure out how much you’re using of a shared service: network equipment, slices of instances, virtual machines, really tricky. You can get roughly in the right ballpark, and those are the factors we’ve got, but you can’t really know exactly what you’re doing. Then there’s this other problem. If you’re operating a multi-tenant service and want to report the carbon to your customers, then you have the problem of figuring out how much capacity to allocate to each customer, and how much is overhead that you should own yourself. All the cloud vendors are basically having to solve this problem for their customers, and all of the SaaS providers running on the cloud are then having to do it downstream.
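
There is no single standard method here. One plausible approach, sketched below as an assumption rather than any vendor’s documented practice, is to allocate a shared service’s emissions to tenants in proportion to measured usage, keeping the idle overhead on the operator’s own books:

// Hypothetical allocation of a shared service's carbon between tenants.
// The service total, overhead share, and usage figures are all invented.
const totalServiceKgCO2e = 1000;
const overheadShare = 0.2; // idle capacity and control plane, kept by the operator
const tenantUsage: Record<string, number> = { tenantA: 600, tenantB: 400 };

const allocatable = totalServiceKgCO2e * (1 - overheadShare);
const totalUsage = Object.values(tenantUsage).reduce((a, b) => a + b, 0);

// Each tenant receives a share proportional to its measured usage.
const perTenant = Object.fromEntries(
  Object.entries(tenantUsage).map(([tenant, used]) => [
    tenant,
    (used / totalUsage) * allocatable,
  ])
);

console.log(perTenant); // { tenantA: 480, tenantB: 320 }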

Then, we get to the emissions factor per capacity. You need to know how much power every instance type, storage class, or service uses. It depends on utilization and overheads. It’s actually very difficult to figure this out. You can get some averages. There’s some data for some systems. The cloud providers roll all this up, and some provide some estimates. All of these things are estimates that are roughly in the right ballpark, and that’s probably what you’ll get. I wouldn’t worry about it too much: if an instance on one cloud provider looks vaguely similar to one on another cloud provider, I’d just use the same number for the emissions factor.

Carbon Audit Reports

If you’re going to report carbon, and it’s going to be reviewed by the board and the auditors and all this stuff, you really need to be using the cloud vendor reports. AWS, Azure, and Google all have those kinds of reports available. If you’re on-prem, you probably need to use economic models, or build your own models. Auditable data has to be traceable and defensible, but, as I mentioned, it’s too coarse and it arrives too late to be useful for optimization.

Methodologies for Reporting Carbon, Location vs. Market Based

Then on top of that, there are two different ways of reporting carbon. I’m going to try and explain the difference between location and market-based metrics. Location based looks at a particular building and asks, what is happening at this building? There’s the grid mix coming in. If I use one extra kilowatt hour of energy, how much additional carbon is emitted supplying me that extra kilowatt hour? That’s the idea for location based. It comes up with higher numbers for carbon than the market-based method, and it reflects what’s physically happening. The Google hourly, 24/7 data is based on this. The problem with it is that, the way it’s defined, it does not take into account the power purchase agreements, which the cloud vendors are spending enormous amounts of money on. What’s used in that case is a market-based method that says all of the electricity in a market is the same. You push electrons into the grid, they pop out of the grid. If I have 100 megawatts I’m putting into the grid from this power station here, I can use it over there, and it’s fine. It just flows through the grid. I want to take into account the fact that I am generating this power, and it’s my power. I’ve got the PPA or the REC in place. The market-based method includes the PPAs and the RECs, and you effectively are creating a custom grid mix for your data centers. It’s usually averaged over a month.
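
A much simplified sketch of why the two methods diverge, with invented figures: location based charges every kilowatt hour at the local grid’s intensity, while market based nets PPA and REC volumes off first.

// Simplified location- vs. market-based comparison; all figures invented.
const monthlyKwh = 10_000_000; // data center consumption for the month
const ppaKwh = 7_000_000;      // generation covered by PPAs in the same grid
const recKwh = 1_000_000;      // additional RECs purchased
const gridKgCO2ePerKwh = 0.4;  // local grid mix emissions factor

// Location based: every kWh is charged at the local grid intensity.
const locationBasedKg = monthlyKwh * gridKgCO2ePerKwh;

// Market based: PPAs and RECs create a custom grid mix, so only the
// uncovered remainder is charged at the grid factor.
const uncoveredKwh = Math.max(monthlyKwh - ppaKwh - recKwh, 0);
const marketBasedKg = uncoveredKwh * gridKgCO2ePerKwh;

console.log({ locationBasedKg, marketBasedKg }); // 4,000,000 vs. 800,000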

The reason this matters is that your AWS and Azure data is basically regional market based. They take into account PPA generation and RECs that are within the same grid, so that electricity flows together. Google has had a claim since 2017 that they generate more energy than they use. They were taking that on a global basis, which was a bit dodgy really, because they were basically saying, we generate more power than we use over the year. They were over-generating extra power in the USA and Europe and saying that’s good, but in Singapore you’re still relying on a lot of coal-powered and oil-powered fossil fuel generation. They were saying that on average, across the whole world, it made sense. It doesn’t really. It makes sense if you do it at the regional market level where the grid is connected. Then Google moved to this more current, location-based data with their 24 by 7 API work. You can’t compare the numbers between AWS, Azure, and Google because of this. The Google data is more useful for tuning work, and their API is really the most useful API if you’re trying to build some tooling on this right now. Over time, as the utility grid decarbonizes, it matters less. If you’re in France, where it’s almost a completely carbon-free grid, it really doesn’t make much difference. It’ll make a bigger difference if you’re in Asia.

What You Can Do Today

What can you do today? For workload optimization, we need directional and proportional guidance. The Cloud Carbon Footprint tool is open source, uses billing data as input, and maintains a set of reasonable estimates or guesses for carbon factors. Your mileage will vary as to the actual data you get out of that. I wouldn’t put it in an auditable report, but it’s useful for tuning. The Green Software Foundation has come up with the Software Carbon Intensity standard. This is a model for reporting software impacts per business operation, like grams of carbon per transaction, or per page view, or whatever. They’re defining that standard; it’s worth going to look at it. AWS has the well-architected pillar for sustainability, which I helped write and get out about a year ago. It’s the guidance on how to optimize development and operations for carbon. There’s some good advice in there.
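
The SCI has roughly the shape of operational energy times grid intensity, plus embodied emissions, divided by a functional unit; the sketch below follows that shape with invented numbers, so consult the published specification for the exact definitions.

// SCI shape: ((E * I) + M) / R. All values invented for illustration.
const E = 120;       // operational energy over the window, kWh
const I = 0.35;      // grid carbon intensity, kg CO2e per kWh
const M = 15;        // embodied hardware emissions amortized to the window, kg
const R = 1_000_000; // functional unit: transactions served in the window

const sciKgPerTransaction = (E * I + M) / R;
console.log(`${(sciKgPerTransaction * 1000).toFixed(3)} g CO2e per transaction`);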

Looking to the Future

Where’s all this going to be in a few years’ time? I think most monitoring tools are going to report carbon as the data becomes more available. Eventually, all the cloud providers are going to have detailed metrics. Google’s got detailed metrics now. Microsoft has some if you know where to look. AWS really doesn’t have detailed metrics at this point, but I think we’re all going to get there. The European and U.S. cloud regions are pretty close to zero carbon now, if you take the PPAs into account. There’s a lot of generation offsetting the energy used. Everyone has the same problems. The Asian regions are probably going to get to zero carbon in the next few years. It’s just regionally been very difficult to get solar and wind projects in place.

Questions and Answers

Synodinos: One of the questions has to do with resources that people can use to follow up. You mentioned some websites. Is there any literature, like how we have the definitive books for performance, for scaling? Is there any book about this topic yet?

Cockcroft: I think the best place to look right now is the well-architected guide that AWS put out, and more recently, Microsoft has also put out a well-architected guide for Azure. The Microsoft one is based off of the Green Software Foundation recommendations, which are very similar. These are all very close. People are basically looking at the same things. There are some references to some of the ways to think about it. There’s obviously stuff that’s specific to particular environments; some cloud vendors, like AWS, have Graviton processors that use less power. Also, I recently noticed that AWS is now listing the regions that have 95% or better clean energy, which are basically, as I said, Europe and the U.S.; both had better than 95% for 2021. It’s been improving since then. The Asia regions are not included in that list. If you just go to Amazon’s website and search for sustainability, there’s a section there about cloud.

If you’re worried about the energy use of your system, you should be worrying about any energy use you have in Asian regions, or outside Europe and the U.S. There are the African and Arabian regions, which are not on that list at this point. Over time, the number of regions that are 95% or better green is going to go up. By 2025, the goal is to have everything at 100%. That’s what AWS is doing. You can go get data off their website. The Green Software Foundation is collecting useful data and making sense of it. Then I’m just looking around for really efficient libraries. Things like simdjson are a good reference there. The Zstandard compression algorithm saves you about 30% on stored capacity. It’s more efficient on reading data back, decompressing in particular. There’s a sort algorithm that Google came up with recently and posted, that’s built a little bit like simdjson. It’s just the fastest possible implementation of Quicksort or whatever, if you want to sort things. Whatever languages you run in your environment, go and find the default libraries in your language, and maybe look for some more efficient ones for the key things that you use a lot.

Synodinos: To some extent, for these things to matter, there needs to be more of an organizational change towards practices like that. I believe a question that frequently comes up when we talk about sustainability is, how can an engineer or a technical leader either in their team or as part of a larger organization, convince that organization to adopt more sustainable practices? Have you seen any ways that can help with that transition?

Cockcroft: You’re going to have management saying, we need to figure out how to measure and optimize this because we got to report it to the Street as part of our quarterly numbers. That’s driving it from one end. There’s individuals who are interested, bottom-up, and a bunch of people are going to be building data lakes, and trying to gather this data together. That’s a workload which is starting to appear. We’re in the middle of the AWS re:Invent conference, and there’s a lot of talks there on sustainability. If you go to YouTube and look at AWS Events, and just search for sustainability, most of them start with SUS something, sort of SUS302, and things like that. There are some good talks. Some of them are companies talking about how they’re measuring their carbon. Then there’s some other recommendations in there. A lot more this year than there was last year. Last year, we had four or five sustainability talks. We had a fairly basic track that I helped organize. I’m no longer working at Amazon, but the track has two or three times as much content in it. There’s a good body of content there. If you look at similar events, from the Google Cloud, and Microsoft, all of the major cloud providers now have a lot of examples and content around sustainability as part of their deployments. The efficiency and the ability to buy green energy is really driving this, and the cloud providers are much more efficient than on-prem. You have to do a lot of work on-prem to get close. That work generally hasn’t happened.

Synodinos: Do you think cloud providers could open or move operations to countries where clean energy was more available, such as Africa and South America, where the availability of solar and wind energy is greater?

Cockcroft: Yes, they’re doing that. What they’re doing is they’re putting in solar, wind, and batteries. AWS recently launched a region in Spain, but it took quite a long time to get the region up and running. They announced it two or three years ago, and they immediately put in solar and wind. The energy projects went in and came online before the region came online. The same thing happened in Melbourne, Australia. The Melbourne region has a lot of local solar, and the Sydney region in Australia doesn’t; it’s got a lot more coal. It’s being done jointly: as the cloud providers are planning their regions and rolling out around the world, it takes a while to build a building, and it takes a while to put up a wind farm. They’re doing them all in parallel and all going as fast as they can.

Amazon is a little different, because it’s actually solving for the whole of Amazon, not just for AWS. AWS is a minority of Amazon’s energy use. They use a lot more energy in the delivery and the warehouses and that side of the business than they do with AWS. Whereas Google and Microsoft are primarily online businesses, most of their carbon footprint is from IT. If you look at the totals, Amazon is like about four times the carbon footprint of Microsoft and Google. That isn’t AWS, that’s primarily the rest of the business. They’re solving for the big picture. This is one of the things to think about, don’t solve for your little piece of the picture, because you may be suboptimizing something outside. You’ve got to take the scope 3 big picture view of like, what is the entire environment? How do you solve for less carbon in the atmosphere for everybody, rather than making the numbers go down for a particular workload somewhere.

Synodinos: Do you think there are other perspectives than cloud architecture? Thinking about sustainability, also in terms of waste, maybe.

Cockcroft: Energy use is the one that’s really driving carbon, which is driving a lot of climate change, climate crisis. There’s also water. AWS just did an announcement this week about a water commitment, something like clean water by a certain date everywhere in the world, 100% clean. A lot of the AWS regions, they take in wastewater that is not suitable for drinking and irrigation, they use it for cooling. Then they process it before they put it back, and it goes straight into the fields as agricultural water that’s cleaned up. There’s examples like that, where they do use a lot of water in data centers. There’s another one, I think it’s Backblaze. They have a data center that’s on a barge in a river, and they take water in from the river and they put slightly warmer water back into the river. As the river flows past, they just warm it up a little bit. Pat Patterson was telling me about that. Then there’s other examples like that. That’s water.

Then you’ve got plastics, trying to reduce the amount of plastics, the amount of garbage that’s generated. They call it the circular economy. You want to have zero to landfill. There are some sites, like the new Amazon headquarters, that are set up to be zero to landfill. The Climate Pledge Arena in Seattle, part of their setup is to be zero to landfill. It requires quite a lot of work to get rid of all of the things that normally just get trashed. People are working on those things. They’re not affecting sea level rise through that, but they’re affecting the amount of garbage in the sea and things like that. You can think about the whole big picture.

The other thing that’s important is really more about a just transition. The benefits of modern technology and the wealth of the world tend to go to a few. The problems caused by climate change and pollution tend to fall disproportionately on people who are underprivileged. There is this very specific effort around making this a just transition. We’re building a nice big power plant, but we’re going to put it in the poor neighborhood, because the rich neighborhood can lobby to not have it there. Those kinds of things are happening around the world. I wouldn’t say we’ve stopped doing those things, but it’s getting more visible, and there’s more pushback on them. That’s the other part of sustainability: making it a just transition so that everybody gets a better world in the future, rather than it being disproportionately for the more wealthy people in the developed nations.

Synodinos: Recently, I came across a New York Times article that was talking about something called Tree Equity, and about how rich neighborhoods actually have lower temperature during heat waves, because they can afford more trees, people are watering gardens, they have plants.

Cockcroft: There’s a whole lot of greening cities where there’s a lot of concrete and it gets very hot. If you plant trees along the roads, and get rid of the cars, it actually lowers the temperature of the city, like your air conditioning doesn’t have to work as well. Trees are enormously good air conditioning systems, it turns out. They absorb the energy. They keep it cool. They keep the humidity up. A lot of examples there as well.

Synodinos: Talking about the big picture, do these cloud footprint audits clash with AI foundation models that may use a lot of power during training?

Cockcroft: You have to take those into account. If you’re doing very large AI modeling, you’re training for Siri, or Alexa, or Google Voice, you’re running some very big, heavyweight jobs. Facebook runs some very big jobs too. The large companies that are doing a lot of image classification or video classification are burning a lot of compute power, and they are buying clean energy to run it, and trying to use more energy efficient systems. There are more specialized processors, like Trainium or the Google Tensor processor, that are more efficient than some of the standard off-the-shelf GPUs that were really designed for gaming and supercomputer type applications. We’re seeing more specific processor types, which give you more compute for the power consumed. The supercomputer world has a green benchmark: the 500 greenest supercomputers, ranked by the amount of compute they get per unit of energy. There’s some focus on it, and it really applies a lot; big AI systems definitely look like that. It’s a concern. The people that are running those big systems generally are also concerned about the energy and are taking care of it pretty well.



Podcast: Continuing the Culture and Methods Trends Conversation

MMS Founder
MMS Ben Linders

Article originally posted on InfoQ. Visit InfoQ

Shane Hastie: Hey folks, QCon New York is returning to Brooklyn this June 13 to 15. Learn from over 80 senior software leaders at early adopter companies as they share their firsthand experiences implementing emerging trends and best practices. From practical techniques to pitfalls to avoid, you’ll gain valuable insights to help you make better decisions and ensure you adopt the right organizational patterns and practices, whether you are a senior software developer, architect, or team lead.

QCon New York features over 75 technical talks across 15 tracks covering MLOps, software architectures, resilient security, staff plus engineering, and more to help you level up on what’s next. Learn more at qconnewyork.com. We hope to see you there.

Good day folks. This is Shane Hastie for the InfoQ Engineering Culture podcast. Today we are taking a slightly different slot. We recently recorded the Culture & Methods trends report and due to a technical glitch, we lost all of the good opinions and points of view from my colleague Ben Linders.

Introduction [01:17]

So Ben and I are sitting down today and we are just going to chat. We’re going to talk about what we see as the trends and we’ll let the conversation go where it goes. Ben, great to see you. It’s been far too long.

Ben Linders: Thank you Shane. I was hoping to meet up at QCon. But unfortunately I couldn’t make it this year, but for sure I’m going to be back next year and hope to see you in person again.

Shane Hastie: Indeed. I don’t feel too sorry for you. I gathered that you were scuba diving in Indonesia, so it wasn’t exactly a hard loss.

Ben Linders: No. That was great.

Shane Hastie: So Ben, at the high level, what are the big things that are happening? What are you seeing in the work that you’re out there doing, in the InfoQ articles that you’re pulling together, and in the news that you’re writing? Because you are one of our most prolific news writers and article contributors and editors in the Culture & Methods tracks. What’s happening?

Psychological safety across the whole industry has been impacted over the last year [02:13]

Ben Linders: I think there’s a couple of topics and trends that are happening in there. One of them being psychological safety, which is getting more and more attention. And I’m very happy to see that because I think it’s an important topic which has taken a hit due to the pandemic and everything that happened after the pandemic. So a lot of people are feeling less safe in their daily work, have more or less lost their trust in the company. Also due to the layoffs that are happening right now. Not just the layoffs, but the way that they’re happening. So psychological safety, safety in general, has taken a hit, and this is hurting teams all over the world. So it’s an important topic to work on.

Another thing I still see is retrospectives and people trying to get more benefit out of the retrospective. And while retrospectives have been around a long time, I’m seeing a revival of people looking into better ways to do them and making them more productive. So there’s a couple of trends that I’m seeing, next to stuff like working on the culture in an organization, which is also very important. And finding ways to set up and improve hybrid, distributed, remote teams, which tend to be the norm right now in organizations.

Shane Hastie: Some pretty big points there. So let’s dig into psychological safety. It is a buzzword, and it’s a really important element. But I will say that I see it in many organizations almost as lip service. I don’t see as much as I would like of that genuine blameless culture where it is safe to take a risk, where I can make a mistake and know that my colleagues are going to support me rather than attack me. Where when something goes wrong, the manager or the leader or the VP or whomever doesn’t turn around and say, who did this? They turn around and say, what can we learn from this? How can we prevent this? We’ve got a long way to go, I think, to get there.

Ben Linders: Yeah, and that’s my main worry. I think we were on the right track. If I look back at 2019, 2020, that was also when this topic started to get a lot of attention and teams were working on it. It took a hit in 2020 and the years past that. So I think the topic itself has only become more important right now, given the situation that we are in. So there’s an even higher need for teams and organizations in general to develop this.

And I agree with you when you talk about the lip service, because this really has to come from the inside out. This is something that has to be a true belief to make it work in organizations. You can’t just come in and do a couple of meetings and then expect people to be safe. It’s a very complex behavior. There are a lot of aspects in there that have to fit together to make this work, and the only way that you can make them fit together is if you really, truly believe in it. At least that’s my opinion on this.

Shane Hastie: For our listeners who are in positions of influence, the technical influence, the technical leaders, how do they create that? How do they live that psychologically safe space?

Ways to improve psychological safety in your sphere of influence [05:17]

Ben Linders: Well, I think one of the things that they can do is start small with it. Start within your own group, your own team, or some of the meetings that you do yourself. And when you are convinced that this safety is important and you really want to live the safety, start with just a small group, and start working on it and building on what you have already, instead of trying to initiate some kind of grand, scaled-up safety program, because that’s not going to work anyway. So explore what you have, build on that, and pull out any successes that you have due to psychological safety.

If you see, okay, here’s a couple of learnings that we wouldn’t have had if we didn’t have that safety: pull that out, celebrate it, and show people that this is what’s really happening in your company. And I think it’s also a topic that needs continuous attention. One of the ways that I found can help to work on this topic is to use things like coaching or gamification to work on certain aspects of psychological safety. Try to explore what’s happening there in the company, more or less as a way to make it safe to talk about psychological safety, to talk about some of the parts in there, to work on some of the parts in there, and to step by step improve that safety within your own team, within your own group. And then look for ways to expand it, or to spark others to work on this. I think that’s the way to approach this.

Shane Hastie: Work within your circle of influence. And I would add to that honesty and vulnerability. So if I’m a team leader and I do slip and something’s gone wrong and I turn around and say, “How could this have happened?”, acknowledging that no, that was not appropriate. I’m a human being as well. I make mistakes and I’m trying to get better at making this safe environment. So the vulnerability of the leader, showing that even if it’s not a muscle they’ve built yet, they’re working towards it, creates that safety within the team, and then you can work beyond that.

Ben Linders: Yeah, and I think this can vary, because this also depends on what kind of behavior, what kind of culture is there in the company. Is it allowed for somebody who’s in a leadership role, whether that’s management leadership or technical leadership or an architect role, to say, “Hey, I fucked up, so we can learn from that”? Is that okay in the company? In a lot of companies that will still be a challenge.

Shane Hastie: I come back to the Gandhi quote, “Be the change that you want to see in the world.”

Ben Linders: Exactly, exactly. Be an example.

Shane Hastie: Having given people the high bar to go and do this, and honestly, I know from working with many organizations and from my own personal situation, it takes time. And it’s an amazing thing to see when a team has that psychological safety as the foundation and they’ve become that high performing, effective team. But one of the tools for moving towards high performance is retrospectives. And you mentioned people bringing in new ways, better ways, of running retrospectives. This obviously is a topic that is close to your heart. What does a great retrospective look like?

Ways to have great retrospectives [08:32]

Ben Linders: Well, I think a great retrospective is something that more or less flows naturally, that has a lot of interaction among the people who are there in that retrospective, whether they’re in the same room or online. But there’s a lot of interaction which is inclusive for everybody who’s there, which includes everybody’s ideas; everybody’s opinion is in there. And the facilitator guides the team to understanding one or more issues at hand, one or more issues that they would like to work on, sensing the energy of the group and leading this towards maybe just one or two improvement actions, one or two steps that the team wants to take and truly wants to do, to improve. So this is a flow process. It’s not just a meeting; it’s not something that you can do by rote. It’s something where, as a facilitator of the team, you go along with the flow, looking for what can we learn, what can we get out of that, and what can we improve.

And making it a flow means that this heavily depends on whoever is facilitating that retrospective: making it work for the whole team, making this happen together with the team, sensing what’s happening there in the group in the room, and working with that to get the best possible outcome. And I know that sounds very abstract to people who just want to have the five steps and go in there and do the exercises. But I think in the end, it’s much more about facilitation: having those tools, those different exercises, of course, but also how you facilitate the meeting to get true benefit out of it.

Shane Hastie: What would some examples of those tools, those exercises be?

Ben Linders: Some of the examples are to start, for instance, with a safety check to see how people feel in the room; having them do a vote on that and reflecting back to the team on the safety in the room. There could be tools in there to explore the problem at hand, which could mean using some kind of metaphor. One of the popular ones, for instance, is the sailboat, but there are many others. But using a metaphor to bring out ideas. Looking at, okay, what’s the most important thing that we really want to improve? Using some kind of dot voting or consensus to align on just one or two topics, and then agreeing on what you want to do next, and making what you want to do next as small and as specific as possible.

So there are different kinds of exercises around, and there are plenty of exercises that you can use. So it’s not just one exercise, it’s actually a combination of exercises that you use to get a good result out of that retrospective.

Shane Hastie: I think you and I sit in the space of retrospectives are such an important part of team evolution, team formation and so forth. But for those who perhaps are not as aware of these practices, why should somebody, a team leader or a team member encourage the retrospective? How is it different from, I don’t know, the post-implementation review that we are also familiar with?

Retrospectives are not the only improvement tool for teams [11:30]

Ben Linders: Well, this is actually something where I would wonder whether you should really impose it, or maybe even encourage it. What you’re looking for is a way for teams to deal with the issues that they are facing. And it could be a retrospective meeting, which is the way that’s now been popularized in Agile. But basically it can be any point in time where you step back from the situation, take a little bit of distance and look at, “Okay, what’s happening here? What can we learn from this and how can we better deal with the situation that we are facing right now?”

So this is much more about a mindset of, okay, we’re not going to accept the situation as it is; we as a group, as a team, are in control, and if there’s something that’s bothering us, we want to work on that. And taking the time for that, rather than saying “you should be doing retrospectives.”

And what you see in more mature teams actually is that they still do the retrospectives, but they also do a lot of improvement and reflection sessions, putting a stop somewhere halfway through a meeting, like, “Hey, what’s happening in here and can we learn from that?”

They do it in a much different way, but what you’re actually seeing there are things like the lean approach at Toyota, somebody pulling an andon cord: “Hey, we’re having a problem here, we should work on that.”

So it’s a different kind of thinking that you would like to encourage. If you want to breed it, retrospectives can help you to create this culture; and again, do it small in the beginning, do it within your team, within that cell of safety that you have already. But in the end it’s much more than just doing the meetings. It’s going to happen much more naturally in highly mature teams. That’s what I see.

Shane Hastie: Yeah, retrospectives are incredibly valuable in terms of creating that environment that Senge talks about, the learning organization. What else makes a good culture, and how do we encourage great culture?

Ways to encourage great culture [13:19]

Ben Linders: I think one of the ways to encourage a great culture is to first align on what great would mean for your team or your organization. So what are the kinds of things that you would like to see in your culture? What are the kinds of things that are important? One of the exercises that I often do with teams when I start working with them on culture is to give them a couple of cards that I pull from a deck of culture cards and just ask them to pull out cards. Those are cards with just one word, which for them is important. Which they will say, yeah, this is something that I would like to see in my team. This is some kind of behavior, some kind of mindset, that’s important for accomplishing our goals. Because if you want to create a great culture, I think it’s important to align on the important things that you really want to see in there.

And this can be different things for different teams, for different organizations. It’s much more important to have some alignment and then to see, “Okay, this is something that we want to improve, let’s work on that,” than trying to define a big culture program and trying to implement that.

Shane Hastie: Again, we’re in “be the change that you want to see” territory.

The importance and value of focus [14:29]

Ben Linders: Exactly. And the other thing, and this is something that I see in retrospectives and in a lot of the things I do when I work with teams, is focus. You want to explore the issues, so you want to go broad initially to get an overall picture, an overall view, to start. But then you want to narrow it down to, “Okay, what is it that we’re going to work on right now? What’s most important to us right now?” Which is actually, when you think about it, pretty agile. You’re doing the same stuff with your team as you would like to do for your customer. What’s the most important value that we want to deliver? Okay, let’s make sure that we work on that stuff. Let’s make sure that we have those user stories prioritized to deliver this to our customer.

The same way of thinking applies to your team. What’s the most important thing that we’re struggling with right now? What is it that we would like to see different tomorrow or next week, and how can we make that change possible? Because again, it’s about focusing that energy. And I think what’s important in there is just to start working; there’s no right or wrong. And this is where I see a lot of agile coaches and scrum masters who are struggling, because they have an idea like, this is what’s important for this team, this is what they should be doing, this is what they should be struggling with. And I think in the end it’s much more important what the team feels that they’re struggling with than what the scrum master thinks or what the agile coach thinks the team should be improving.

I tend to say the team is always right, even if they’re wrong, because they’re going to figure out along the way that they improved the wrong thing, and they’re going to find out what’s the right thing to improve. But even if they start working on what turns out to be the wrong thing, it’s still going to be a learning experience. As long as you keep it small enough, you’re not wasting a lot of time.

Avoiding hybrid hell [16:06]

Shane Hastie: That gives us a bit of a segue: the team. Our teams today are hybrid and remote, and the unit of value delivery for most organizations now is a team. It’s not the single hero, it’s not the individual contributor, it’s the work of a team that brings great products to life in the world. And I’ve certainly seen, with the shift through the pandemic and beyond, that shift into remote and now into hybrid, where sometimes I would look at hybrid and say it’s hybrid hell.

People don’t want to commute to the office to spend seven hours on Zoom calls. The stats, the research, are saying most knowledge workers want to get together in person occasionally, but they want to be able to work remotely most of the time. The specific numbers are different, and different people have different contexts, of course. But how do we make this hybrid stuff and the remote work? How do we retain the great teamwork when we start working in this remote and hybrid way?

Ben Linders: It starts with laying the foundations for your teams, which is mostly in the relationships and people getting to know each other. And what I see as a difference here is that traditionally, these are things that were done, for instance in a project, by organizing one or two kickoff days: getting everybody together, working on, “Okay, what is this project about?” and also, “What is this team about?” So trying to build this team culture in a couple of days, and then the project takes off. If you look at remote and hybrid teams, you need to take a different approach. First of all, half-day or even longer sessions online are simply not going to work. They exhaust people; it’s going to be too much. So you need to take this much more gradually and pick out different aspects over a longer period to build up that team.

That makes it a little bit more challenging. It’s easier to plan just two days at the start of your project, build your team: “Okay. Now we’re a team, now it’s all going to happen.” That’s not the way that it works remotely. So team building is going to be a continuous aspect of your organization, of your team, not something you can just do at the start and then you’re done. So you have to think about activities that you can do for this team building, and do them along the way. Create things in your environment that support this. I see things like virtual coffee machines, or chat channels that people keep open: separate chat channels that are not about the technical work, where people can share anything they would like to share. That’s the kind of stuff that you can think about.

People say, we start off a meeting, but we always use the first five to 10 minutes, whatever suits us, to just have some chat instead of diving into the topic immediately. And we’re not going to end after an hour, we’re going to end after 45 or 50 minutes max, because people are going to be exhausted in that meeting and they need time to recover if they have a next meeting. And preferably you don’t want to have back-to-back meetings. Change your meetings from having a meeting and discussing stuff to co-creating sessions where you’re working together on something. And it could be as easy as working together in a Google Doc to create an outline, or crafting a document, or setting up a statement that you want to use for your team. That could work already, but make it into an exercise where people are actively participating instead of just discussing stuff.

Remote and asynchronous teams can be very effective [19:36]

The big advantage, by the way, is that you don’t need to make notes; you’ve got notes already at the end of the meeting. So it’ll also save you time. These are the kinds of things that you can do. And be aware that building your team, building your remote and hybrid team, is going to take longer, but if you do it well, it can be an even stronger team. I often refer to the example of InfoQ and how we are spread out. Let’s be honest, we’ve seen each other two times, three times in person, in 10 years.

Shane Hastie: Yes. I think that’s it. We’ve worked together for 10 years.

Ben Linders: Exactly. And I would say that we are a team; we know how each other thinks about stuff. I know if I send you a proposal, I know that I’ve got a fair chance to get it accepted or not, and I know what’s going to be the struggle in there. We know what’s important. We’ve built up a relationship in our team, and we’ve only seen each other three times during those 10 years. So I think we’re the living example that you can build teams while working remotely on different sides of the globe. It can be done. If you take the time for it, you can do it.

Shane Hastie: And interestingly, one of the trends that we identified was the asynchronous element of the work. And if I think of the way that we as the Culture & Methods team on InfoQ work, it is largely asynchronous. We’re communicating using the asynchronous tools. Email is most common, but we also use Slack. And the fact that we are time zone offset by 12 hours or more doesn’t matter. We’ve got a clear understanding of what I would almost call an informal SLA between us, in terms of how long we take to respond.

Ben Linders: Shared purpose.

Shane Hastie: Shared purpose, common goals. And then we have the opportunities for synchronous work like this. We have a few of these calls a year, and every now and then, as you say, I think it’s been three times in the 10 years that we’ve worked together, we actually get together in person. And when we do that, it’s fantastic. Sitting down and sharing food and sharing the same physical space, there’s something special about it. But we are an effective team.

Ben Linders: Exactly. And this is something that has been built over time. Because you work on this relationship, you share your thoughts, you share your ideas, like, this is something I’m wondering about. I share what my thoughts are and you respond to that, and we both feel safe enough to say how we feel about stuff. And the safety, for me, starts from trust by default. So instead of saying I need to build up trust with people within the institution, I start by assuming that trust is there. I assume that we’re working on a common goal. I just share my thoughts, I share my ideas, and I feel safe enough to say how I feel about stuff. And I trust that the other person is also there to do a good job.

And if you start from that assumption, and this works when working together with the InfoQ editors and team, but also with the authors that I work with and the interviews that I do, basically, that trust is actually there; 99 out of 100 times it works out fine, probably even more. And it saves you a lot of work and worry to just assume that the trust is there and start working from that.

Supporting remote participants in hybrid working sessions [22:39]

Shane Hastie: So yeah, teamwork can and does work effectively in that remote environment and in the hybrid one. The hybrid space, I will say, is often harder than the fully remote, because one of the hardest things is when you’ve got a group of people coming together in person and some, maybe two or three or more, are not there with them, and they’re trying to connect, often through poor technology or low bandwidth.

In my experience, that quickly devolves into a bit of an us and them situation, unfortunately.

Ben Linders: Yes

Shane Hastie: Now, I’ve got some ideas about things that we can do in teams to try and overcome some of that. My general advice is don’t do it at all: have everyone remote or everyone in person. But if you have to do something like that where you’ve got a few people remote, then make sure that the remote people have an in-person buddy, someone who is there not just to participate on their own behalf, but who is tasked with making sure that their buddy’s voice is heard consistently.

Ben Linders: So it’s much more facilitation role.

Shane Hastie: Much, much more facilitation, and it’s harder. These hybrid sessions, in my experience, are a lot harder to do. I had a request recently to actually teach a class like that, and I refused. We just couldn’t do it. I would rather do the whole learning experience remotely, because we’ve figured out how to do those well now. There are the tools, the technology; we understand how to build a good remote learning experience, and we understand how to build a good in-person learning experience, but trying to do a hybrid one? Just don’t do it.

Ben Linders: Yes. I had the same request for a workshop at a conference, and I came up with the same solution. Like, okay, that’s not going to work. Either we’re going to go fully remote or fully on site. And since fully on site was not an option, because a couple of people were remote, we went fully remote. So I actually did the workshop from my hotel room, went to lunch and saw a couple of people who were attending my workshop during lunch, then we all went back to our rooms again and did it again, fully remotely. That’s the way it works. It doesn’t work hybrid.

Shane Hastie: Yeah, absolutely. So what have you got coming up? What are the interesting things in your pipeline? Because this is going to go out probably sometime in May. What are the hints that we can give our listeners, who should watch out for the articles that Ben’s got in his pipeline?

Articles that Ben has coming out soon [25:19]

Ben Linders: If you look at the articles, there’s stuff coming up on sustainability, which is also a very important topic. I see more and more people who are working on this. There are initiatives on standardization, and a lot of support from open source on sustainability as well. So there are going to be a couple of articles on that.

There’s stuff ongoing on culture change, partly Agile, partly DevOps, where people are looking at, okay, it’s not just the technical stuff that we are changing; if you want to have a true adoption of Agile, and also of DevOps techniques, we also need to work on the culture to make that succeed. So that’s the kind of thing that’s ongoing. I think these are the main topics right now.

I’m trying to solicit some articles on leadership. Hopefully, something on leading without blame. So there are different topics developing in the leadership area too, things which make it different from a lot of the stuff that we are doing already, or a lot of the stuff that’s known already.

By the way, there’s a big gap between what’s known and what we’re doing. There are a lot of things which we know but which have not been generally adopted in the industry. So you’re looking for a way to bring out those golden nuggets to the people, like, “Hey, this is something that can be done, and by the way, this is how people are doing it. This is something we can learn from.”

That’s the kind of stuff that I think that really provides value to the InfoQ readers. If you can show them how others have been doing it and what they tried out, what they learned and where they failed. Also, where it went wrong and how they responded to that. So that’s the kind of stuff that I’d like to bring out.

Shane Hastie: Well Ben, it’s been great to be together synchronously, have this great conversation. Really just want to acknowledge the great work that you do on InfoQ. The articles that you’re bringing. The news that you’re writing, and just to say thank you and encourage our audience to follow Ben Linders. He’s got good stuff. Thanks so much.

Ben Linders: Thank you.



Curiosity and Self-Awareness are Must-Haves for Handling Conflict

MMS Founder
MMS Ben Linders

Article originally posted on InfoQ. Visit InfoQ

When you’re in a team, collaborating with others, it’s crucial to embrace diverse opinions and dissent to get to the best solutions; according to Marion Løken, you need to have good conflicts. Conflicts have a bad reputation, but with curiosity you can harvest more positive outcomes and build trust and psychological safety. Self-awareness of your emotions and reactions can help prevent saying or doing something that you might regret later.

Marion Løken spoke about handling conflicts in teams at NDC Oslo 2023.

Good conflicts foster trust and shield both individuals and the entire team. Løken gave four telltale signs to spot good conflicts:

  • They focus on topics rather than targeting people
  • They prioritise the success of the team over individual interests
  • They get addressed promptly
  • They are solution-oriented

Building trust is crucial, Løken said; it’s always better to have a foundation of trust when entering a conflict. Investing time and effort in building trust with your team members is never wasted, she mentioned.

To deal with conflicts, Løken suggested treating curiosity as a superpower. Curiosity shifts the conflict from a focus on being right to finding common ground, and fosters positive emotions:

It’s important to exercise “being curious” like a muscle. Make it a habit by practising it frequently, starting with low-stakes situations, and gradually increasing the complexity.

You can facilitate conflicts better by ensuring a fair process, Løken said. Such a process should include creating an environment where all voices are heard, and decisions are made collectively:

If you establish a fair process right from the start and stay focused on the desired outcome, things are unlikely to go wrong.

Løken mentioned that she has learned that conflicts can actually test and nurture psychological safety. It’s like a litmus test for gauging how safe the environment is. When you dare to share an unconventional thought with your colleagues, their reactions become the telltale signs, Løken said. It either confirms that it’s safe to take interpersonal risks and build trust, or it reveals the opposite, she concluded.

InfoQ interviewed Marion Løken about handling conflicts in teams.

InfoQ: What’s your view on conflicts in teams?

Marion Løken: I think of conflicts as a health metric! If there’s too little conflict, you might be facing a “groupthink” scenario that stifles innovation. On the other hand, too much conflict may indicate the need to clarify roles, goals, and expectations to ensure productivity. So, finding the right balance is key to a successful and harmonious team dynamic!

InfoQ: Any tips for finding that balance?

Løken: I’m not sure about finding the perfect balance, but here’s a helpful indicator to consider. Pay attention to the number and nature of topics discussed in a team retrospective. In my experience, if not much comes up, the pressure for conformity is usually the problem.

When it comes to dealing with conflict, it’s important to recognize that approaches can vary greatly across different cultures. Some cultures embrace direct and open conflict, while others prefer a more non-confrontational approach, handling conflicts behind closed doors. Therefore, finding the right balance and approach requires adaptation to cultural norms.

As with many cultural aspects, expectations on good conflicts need to come from the top, and need to be reinforced every time it happens. And the top is not only the team lead; it is any figure of authority: tech lead, product manager, senior developer, etc.

InfoQ: What suggestions do you have for good conflict resolution?

Løken: One thing is to cultivate self-awareness. Being aware of your own emotions and reactions is key. Just taking a short moment to pause and reflect on your feelings can help prevent saying or doing something that you might regret later.

It’s important to acknowledge your biases. We all possess biases that can lead to erroneous beliefs and harmful choices. It’s natural to trust individuals who resemble us more, so a helpful exercise is to envision how you would have responded if the conflict involved someone you have immense trust in.

As with any soft skill, it’s normal to make mistakes along the way. Stay humble and be willing to apologise sincerely when things go wrong. Apologising genuinely not only shows accountability, but also helps in building trust, two pillars of psychological safety.

InfoQ: How can we turn a not-so-good conflict into a good one?

Løken: That’s a tough one. To break a bad dynamic, you need to make some adjustments. Is the conflict’s goal unclear, or is there someone with a lot at stake who needs coaching to contribute positively to the resolution? Taking a short break where people can reconnect and focus on their commonalities is a great idea. How about an ice cream break?



Why the Document Model Is More Cost-Efficient Than RDBMS – The New Stack

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

(Sponsored post, contributed by MongoDB.)

A proper document data model mirrors objects that the application uses, and stores data using the same structures already defined in the app’s code.

A relational database management system (RDBMS) is great at answering random questions. In fact, that is why it was invented. A normalized data model represents the lowest common denominator for data. It is agnostic to all access patterns and optimized for none.

The mission of the IBM System R team, creators of arguably the first RDBMS, was to enable users to query their data without having to write complex code requiring detailed knowledge of how their data is physically stored. Edgar Codd, inventor of the relational model, made this claim in the opening line of his famous paper, “A Relational Model of Data for Large Shared Data Banks”:

Future users of large data banks must be protected from having to know how the data is organized in the machine.

The need to support online analytical processing (OLAP) workloads drove this reasoning. Users sometimes need to ask new questions or run complex reports on their data. Before the RDBMS existed, this required software engineering skills and a significant time investment to write the code required to query data stored in a legacy hierarchical management system (HMS). RDBMS increased the velocity of information availability, promising accelerated growth and reduced time to market for new solutions.

The cost of this data flexibility, however, was significant. Critics of the RDBMS quickly pointed out that the time complexity, that is, the time required to query a normalized data model, was very high compared to HMS. As such, it was probably unsuitable for the high-velocity online transaction processing (OLTP) workloads that consume 90% of IT infrastructure. Codd himself recognized the tradeoffs, referring to the time complexity of normalization in the same paper:

“If the strong redundancies in the named set are directly reflected in strong redundancies in the stored set (or if other strong redundancies are introduced into the stored set), then, generally speaking, extra storage space and update time are consumed with a potential drop in query time for some queries and in load on the central processing units.”

This would probably have killed the RDBMS before the concept went beyond prototype if not for Moore’s law. As processor efficiency increased, the perceived cost of the RDBMS decreased. Running OLTP workloads on top of normalized data eventually became feasible from a total cost of ownership (TCO) perspective, and from 1980 to 1985, RDBMS platforms were crowned as the preferred solution for most new enterprise workloads.

As it turns out, Moore’s law is actually a financial equation rather than a physical law. As long as the market will bear the cost of doubling transistor density every two years, it remains valid.

Unfortunately for RDBMS technology, that ceased to be the case around 2013, when the cost of moving to 5-nanometer fabrication for server CPUs proved to be a near-insurmountable barrier to demand. The mobile market adopted 5nm technology as a loss leader, recouping the cost through years of subscription services associated with the mobile device.

However, there was no subscription revenue driver in the server processing space. As a result, manufacturers have been unable to ramp up 5nm CPU production and per-core server CPU performance has been flattening for almost a decade.

Last February, AMD announced that it is decreasing 5nm wafer inventory indefinitely in response to weak demand for server CPUs due to high cost. The reality is that server CPU efficiency might not see another order-of-magnitude improvement without a generational technology shift, which could take years to bring to market.

All this is happening while storage cost continues to plummet. Normalized data models used by RDBMS solutions rely on cheap CPU cycles to enable efficient solutions. NoSQL solutions rely on efficient data models to minimize the amount of CPU required to execute common queries. Oftentimes this is accomplished by denormalizing the data, essentially trading CPU for storage. NoSQL solutions become more and more attractive as CPU efficiency flattens while storage costs continue to fall.

The gap between RDBMS and NoSQL has been widening for almost a decade. Fortune 10 companies like Amazon have run the numbers and gone all-in with a NoSQL-first development strategy for all mission-critical services.

A common objection from customers before they try a NoSQL database like MongoDB Atlas is that their developers already know how to use RDBMS, so it is easy for them to “stay the course.” Believe me when I say that nothing is easier than storing your data the way your application actually uses it.

A proper document data model mirrors the objects that the application uses. It stores data using the same data structures already defined in the application code, in containers that mimic the way the data is actually processed. There is no abstraction layer between the application and the physical storage, and no added time complexity in the query. The result is less CPU time spent processing the queries that matter.
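To make that concrete, here is a minimal sketch, in TypeScript with the MongoDB Node.js driver, of what storing data the way the application uses it can look like. The Order type, the connection string, and the shop database and orders collection names are illustrative assumptions, not taken from the article:

import { MongoClient } from "mongodb";

// An Order document has the same shape as the object the application
// already works with; line items are embedded rather than joined in
// from separate tables, trading cheap storage for saved CPU per query.
interface Order {
  _id: string;
  customerId: string;
  placedAt: Date;
  items: { sku: string; quantity: number; unitPrice: number }[];
}

// Fetching an order is a single-document read; a normalized schema
// would reassemble the same object with a multi-table join on every request.
async function findOrder(orderId: string): Promise<Order | null> {
  const client = new MongoClient("mongodb://localhost:27017");
  try {
    await client.connect();
    return await client
      .db("shop")
      .collection<Order>("orders")
      .findOne({ _id: orderId });
  } finally {
    await client.close();
  }
}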

One might say this sounds a bit like hard-coding data structures into storage like the HMS systems of yesteryear. So what about those OLAP queries that RDBMS was designed to support?

MongoDB has always invested in APIs that allow users to run the ad hoc queries required by common enterprise workloads. The recent addition of an SQL-92 compatible API means that Atlas users can now run the business reports they need with the same tooling they have always used, connecting to MongoDB Atlas via ODBC (Open Database Connectivity) just like any other RDBMS platform.

Complex SQL queries are expensive, and running them at high velocity means hooking up a firehose to the capex budget. NoSQL databases avoid this problem by optimizing the data model for the high-velocity queries, the ones that matter. The impact of this design choice is felt when running OLAP queries, which will always be less efficient when executed on denormalized data.

The good news is that nobody really cares if the daily report that used to take five seconds now takes 10; it only runs once a day. Similarly, the data analyst or support engineer running an ad hoc query to answer a question will never notice whether a result comes back in 10 milliseconds or 100. The fact is, OLAP query performance almost never matters; we just need to be able to get answers.

MongoDB leverages the document data model and the Atlas Developer Data Platform to provide high OLTP performance while also supporting the vast majority of OLAP workloads.



Applying Test-Driven Development in the Cloud

MMS Founder
MMS Ben Linders

Article originally posted on InfoQ. Visit InfoQ

In the cloud, application development can be treated end-to-end, together with the accompanying infrastructure. This makes it possible to apply test-driven development (TDD) and refactoring to the full application, which can bring down maintenance costs.

Michal Svoboda will give a talk about test-driven development of cloud apps at XP 2023. This conference is held June 13-16 in Amsterdam, the Netherlands.

Cloud applications can be developed and deployed along with their accompanying infrastructure as one coherent piece of code. According to Svoboda, removing the “infrastructure” as a separate element allows us to apply agile engineering techniques such as TDD and refactoring on the scope of the whole app, including its cloud resources.

The latency and asynchronous nature of the cloud can be a problem: waiting for resources to provision or for timeouts to elapse obstructs the rapid TDD cycle. Svoboda suggests switching to an incremental update model, i.e. not destroying resources at the end of each test, and doing a clean deployment only when integrating:

Test speedup techniques were drawn from our TDD bag of tricks. Tactically using state-based tests or testing only modified parts of the code would be a few examples. It pays to remember that hurdles in testing provide useful feedback to the whole development cycle. This feedback made us weigh carefully our architecture and procedural choices.

According to Svoboda, TDD brings down app maintenance costs, which are by far the largest part of software TCO. Using TDD, it is easy to add features or refactor anywhere, be it your own code or your use of cloud resources, even years later.

InfoQ interviewed Michal Svoboda about cloud development using TDD.

InfoQ: How has the cloud impacted the way we provision infrastructure?

Michal Svoboda: Through APIs, cloud resources can be created and destroyed in a fully automated way. (Strictly speaking, this isn’t just the cloud. Cloud providers just make this function extremely accessible.) We don’t need to think about “infrastructure”, as in servers and networks that exist independently of applications. “Infrastructure” no longer requires a special approach.

On top of the classic infrastructure, the cloud provides single-specialty services, such as storage, functions, or streams. Many cloud apps don’t just run in the cloud, they consist of the cloud.

InfoQ: How do you do test-driven development for cloud applications?

Svoboda: TDD of cloud apps is similar to TDD of other apps. Instead of calling constructors and functions to create objects in memory, we call APIs to create resources in the cloud. An “arrange, act, assert” test for a stream resource is illustrated in pseudo-code below:

[Test that stream can be written and read]

  1. Deploy stream and read/write roles
  2. Put data to stream using writer role
  3. Poll stream using reader role, asserting the correct object is received or timeout
  4. Remove stream and roles

This is a very simple functional test. State-based tests can be performed using API calls that “query configuration” of resources. More complex setups of resources can be tested using the same principle.

As per TDD, we write the test first, let it fail, and follow with the implementation. Importantly, we listen to the feedback and let any difficulties in testing drive our development. Our technology, architecture, and procedural choices are based on ease of testing.
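For illustration only, the pseudo-code above could be expressed in TypeScript roughly as follows, using Node’s built-in test runner. The StreamOps interface and all of its method names are hypothetical placeholders for whichever SDK calls a given cloud provider exposes; they are not a real API and not the speaker’s code:

import assert from "node:assert";
import { test } from "node:test";

// Hypothetical, provider-neutral interface; every name below is a
// placeholder, not an actual cloud SDK.
interface StreamOps {
  deploy(name: string): Promise<{ writerRole: string; readerRole: string }>;
  put(role: string, record: unknown): Promise<void>;
  poll(role: string, timeoutMs: number): Promise<unknown>; // rejects on timeout
  destroy(name: string): Promise<void>;
}

export function streamRoundTripTest(ops: StreamOps): void {
  test("stream can be written and read", async () => {
    // 1. Arrange: deploy the stream and its read/write roles in the cloud.
    const roles = await ops.deploy("tdd-demo-stream");
    try {
      // 2. Act: put data to the stream using the writer role.
      const record = { orderId: "42" };
      await ops.put(roles.writerRole, record);

      // 3. Assert: poll with the reader role; the correct object must
      //    arrive before the timeout elapses.
      const received = await ops.poll(roles.readerRole, 30_000);
      assert.deepStrictEqual(received, record);
    } finally {
      // 4. Remove the stream and roles. Under the incremental-update
      //    model mentioned earlier, this teardown is skipped between
      //    runs and a clean deployment happens only when integrating.
      await ops.destroy("tdd-demo-stream");
    }
  });
}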

InfoQ: What challenges have you faced and how did you deal with them?

Svoboda: Available tooling was a problem. For this TDD-on-cloud approach to work well, resource deployment code must be a first-class citizen in the programming language of choice. Contemporary tools provide command-line interfaces over a model in their own languages, in a “cloud Makefile” fashion. Because these tools follow the “separate infrastructure” paradigm, it can get cumbersome to communicate with them. This was also great feedback early in our development, and it steered our tooling and provider decisions.
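As a hedged illustration of what first-class deployment code can look like (the interview does not name specific tools), infrastructure-as-code libraries such as Pulumi let cloud resources be declared directly in TypeScript. The Kinesis stream below is our example, assuming the @pulumi/aws package, not the speaker’s setup:

import * as aws from "@pulumi/aws";

// A resource declaration is an ordinary constructor call in the host
// language, so application and test code can create, inspect, and tear
// down cloud resources without shelling out to a separate CLI model.
const stream = new aws.kinesis.Stream("tdd-demo-stream", {
  shardCount: 1,
  retentionPeriod: 24, // hours
});

export const streamName = stream.name;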

InfoQ: Besides lower costs, what benefits did you get from doing TDD for cloud apps?

Svoboda: The tests make it possible to account for edge cases. The applications are stable and we know what to expect. We even worked out a few rough edges with our cloud provider!

Because our approach made it very easy to set up and test resources, we benefited in the prototyping phase as well. The cloud is a complex environment and we failed more times than I can count due to programming errors and wrong functionality assumptions. Using this approach, we failed fast.

Many important questions were pragmatically answered early in the development. What technologies will we use? How will we deploy and operate our application? How will we manage long-term state and sensitive data?

InfoQ: What’s your advice to people who want to try out TDD for their cloud application?

Svoboda: Start slow with a smaller-scale project first. There are a few things that will need bootstrapping before the first test can pass. Get used to the mechanics. Make sure to refactor aggressively. Learn from the feedback. Good luck!



Azure Container Storage Now in Public Preview

MMS Founder
MMS Steef-Jan Wiggers

Article originally posted on InfoQ. Visit InfoQ

Microsoft recently announced the public preview of Azure Container Storage, a volume management service built natively for containers.

Azure Container Storage provides a consistent management experience across different storage offerings, including a managed option (backed by Azure Elastic SAN), Azure Disks, and ephemeral disks on container services – intended to simplify the deployment of persistent volumes. Previously, customers had to use individual container storage interface (CSI) drivers to offer cloud storage for containers, causing various operational issues regarding application availability, performance, cost, usability, and stability.

With Azure Container Storage, customers can now easily create and manage block storage volumes for production-scale stateful container applications and run them on Kubernetes, with a consistent experience across environments. The service is also optimized for stateful workloads on Azure Kubernetes Service (AKS) clusters: it accelerates the deployment of stateful containers with persistent volumes and reduces pod failover time through fast attach/detach.

In an Azure blog post, the authors explain that the service aligns with open-source container-native storage approaches:

Azure Container Storage runs microservices-based storage controllers in Kubernetes to abstract the storage management layer from pods and backing storage, enabling portability across Kubernetes nodes and the ability to mount different storage options.

Azure Container Storage components include:

  • A storage pool: A collection of storage resources grouped and presented as a unified storage entity for your AKS cluster.
  • A data services layer: Responsible for replication, encryption, and other add-on functionality absent in the underlying storage provider.
  • A protocol layer: Exposing provisioned volumes to application pods via the NVMe-oF protocol.

Source: https://azure.microsoft.com/en-us/blog/transforming-containerized-applications-with-azure-container-storage-now-in-preview/

Some of the other key benefits of the service besides the consistent management experience are:

  • Cost optimization: Azure Container Storage allows for dynamic sharing of provisioned resources on a Storage Pool, optimizing storage utilization and price-performance, resulting in a projected 20% total cost of ownership (TCO) saving when running a stateful Kubernetes cluster on Azure with AKS.
  • Easy scaling: Azure Container Storage can scale storage quickly to match an application’s needs, with optimized latencies for persistent volume (PV) creation and increased scalability limits. A larger number of PVs can be attached to a pod, even on small AKS nodes, giving more flexibility in designing application architectures without storage limitations.
  • Integration with Kubernetes: Azure Container Storage offers seamless integration with Kubernetes, allowing users to leverage familiar kubectl commands for deployment, management, and automation of volume management flows while also providing Azure native user experience support through Azure Portal, CLI, and PowerShell.

Leandro Carvalho, a cloud solution architect – Support for Mission Critical at Microsoft, tweeted:

#Azure Container Storage is a game-changer for stateful apps on #AKS.

In addition, Dr. Ian McDonald, an EMEA cloud solution architect director, tweeted:

Great to see – making storage available to containers that can scale more quickly and solve issues such as high IOPS needed for small storage.

More details on Azure Container Storage are available on the documentation landing page. Additionally, the preview is available in the East US, West Europe, West US 2, and West US 3 regions; customers can sign up by completing a short survey.

Lastly, the pricing for Azure Container Storage comprises two components: the cost of the underlying storage the customer uses and a service fee for Azure Container Storage orchestration. The details of the pricing are available on the pricing page.
