Optimize AI Workloads: Google Cloud’s Tips and Tricks

MMS Founder
MMS Claudio Masolo

Article originally posted on InfoQ. Visit InfoQ

Google Cloud has announced a suite of new tools and features designed to help organizations reduce costs and improve efficiency of AI workloads across their cloud infrastructure. The announcement comes as enterprises increasingly seek ways to optimize spending on AI initiatives while maintaining performance and scalability.

The new features focus on three key areas: compute resource optimization, specialized hardware acceleration, and intelligent workload scheduling. These improvements aim to address one of the primary challenges enterprises face when deploying AI at scale—balancing innovation with cost management.

In the announcement, Google Cloud’s VP of AI Products said:

Organizations are increasingly looking for ways to optimize their AI costs without sacrificing performance or capability. These new features directly address that need by providing more efficient ways to run machine learning training and inference.

Google Cloud’s approach begins with strategic platform selection. Organizations now have multiple options ranging from fully-managed services to highly customizable solutions. Vertex AI offers a unified, fully managed AI development platform that eliminates infrastructure management concerns, while Cloud Run with GPU support provides a scalable inference service option. For long-running tasks, Cloud Batch combined with Spot Instances can significantly reduce costs. Organizations with existing Kubernetes expertise may benefit from Google Kubernetes Engine (GKE), while those requiring maximum control can utilize Google Compute Engine.

A key recommendation focuses on optimizing container performance. When working with inference containers in environments like GKE or Cloud Run, Google advises keeping containers lightweight by externally storing models using Cloud Storage with FUSE, Filestore, or shared read-only persistent disks. This approach dramatically reduces container startup times and improves scaling efficiency—critical factors in managing both performance and costs.
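As a rough illustration of that pattern, the sketch below shows an inference process reading its model from a bucket mounted as a local file system rather than from the container image itself. The mount path, environment variable, and file name are hypothetical, and pickle stands in for whatever serialization format your framework actually uses:

```python
import os
import pickle  # stand-in for a real model format/loader

# Hypothetical mount point: a Cloud Storage bucket exposed as a local
# file system via Cloud Storage FUSE (for example, a GKE volume or a
# Cloud Run volume mount), so the model never has to be baked into the image.
MODEL_DIR = os.environ.get("MODEL_DIR", "/mnt/models")

def load_model(name: str):
    """Load a model from the FUSE-mounted bucket at startup.

    Keeping weights out of the container image keeps the image small,
    which shortens pulls, cold starts, and scale-out times."""
    path = os.path.join(MODEL_DIR, name)
    with open(path, "rb") as f:
        return pickle.load(f)

model = load_model("classifier.pkl")  # hypothetical file name
```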

Storage selection emerges as another critical factor in optimization. Google Cloud recommends Filestore for smaller AI workloads, Cloud Storage for object storage at any scale, and Cloud Storage FUSE for mounting storage buckets as a file system. For workloads requiring lower latency, Parallelstore provides sub-millisecond access times, while Hyperdisk ML delivers high-performance storage specifically engineered for serving tasks.

To prevent costly delays in resource acquisition, Google Cloud emphasizes the importance of Dynamic Workload Scheduler and Future Reservations. These tools secure needed cloud resources in advance, guaranteeing availability when required while optimizing the procurement process for popular hardware components.

The final strategy addresses deployment efficiency through custom disk images. Rather than repeatedly configuring operating systems, GPU drivers, and AI frameworks from scratch, organizations can create and maintain custom disk images that allow new, fully-configured workers to deploy in seconds rather than hours.
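A minimal sketch of that approach using the google-cloud-compute Python client, assuming a worker VM's boot disk has already been configured with drivers and frameworks; the project, zone, and resource names are placeholders:

```python
from google.cloud import compute_v1

def create_worker_image(project: str, zone: str, source_disk: str, image_name: str) -> None:
    """Capture a fully configured worker boot disk (OS, GPU drivers,
    AI frameworks) as a reusable custom image."""
    image = compute_v1.Image(
        name=image_name,
        source_disk=f"projects/{project}/zones/{zone}/disks/{source_disk}",
        family="ai-worker",  # new VMs can reference the family to get the latest image
    )
    client = compute_v1.ImagesClient()
    operation = client.insert(project=project, image_resource=image)
    operation.result()  # block until the image is ready to use

# Placeholder values for illustration only.
create_worker_image("my-project", "us-central1-a", "configured-worker-disk", "ai-worker-v1")
```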

AI cost management has become increasingly critical across industries. In response to the growing demand for more efficient and cost-effective AI infrastructure, both AWS and Microsoft Azure have also ramped up their efforts to support enterprise AI workloads. AWS has introduced new cost-aware tools within its SageMaker platform, including Managed Spot Training and model monitoring capabilities to help users optimize both performance and budget. Similarly, Azure is enhancing its AI offering through Azure Machine Learning with features like intelligent autoscaling, reserved capacity pricing, and seamless integration with Azure Kubernetes Service (AKS) for better workload orchestration.

Like Google Cloud, both AWS and Azure are emphasizing hybrid flexibility, storage optimization, and GPU acceleration to give enterprises more control over how they scale and spend. This convergence signals a competitive push across cloud providers to address the pressing challenge of AI cost management while still empowering innovation at scale.



Alliancebernstein L.P. Sells 5,503 Shares of MongoDB, Inc. (NASDAQ:MDB) – MarketBeat

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Alliancebernstein L.P. lessened its position in shares of MongoDB, Inc. (NASDAQ:MDB) by 5.9% in the 4th quarter, according to its most recent filing with the Securities & Exchange Commission. The fund owned 87,001 shares of the company’s stock after selling 5,503 shares during the quarter. Alliancebernstein L.P. owned approximately 0.12% of MongoDB worth $20,255,000 at the end of the most recent quarter.

Several other large investors have also modified their holdings of the business. Norges Bank acquired a new stake in shares of MongoDB in the fourth quarter worth $189,584,000. Raymond James Financial Inc. acquired a new position in shares of MongoDB in the 4th quarter worth approximately $90,478,000. Amundi raised its holdings in shares of MongoDB by 86.2% during the fourth quarter. Amundi now owns 693,740 shares of the company’s stock worth $172,519,000 after purchasing an additional 321,186 shares during the period. Assenagon Asset Management S.A. grew its position in shares of MongoDB by 11,057.0% during the 4th quarter. Assenagon Asset Management S.A. now owns 296,889 shares of the company’s stock valued at $69,119,000 after buying an additional 294,228 shares during the last quarter. Finally, Pictet Asset Management Holding SA boosted its position in MongoDB by 69.1% during the 4th quarter. Pictet Asset Management Holding SA now owns 356,964 shares of the company’s stock valued at $83,105,000 after purchasing an additional 145,854 shares during the period. 89.29% of the stock is currently owned by institutional investors and hedge funds.

MongoDB Stock Performance

Shares of MDB stock traded up $25.49 during trading hours on Wednesday, reaching $171.34. The company had a trading volume of 3,611,865 shares, compared to its average volume of 1,790,746. The firm has a market cap of $13.91 billion, a PE ratio of -62.53 and a beta of 1.49. MongoDB, Inc. has a one year low of $140.78 and a one year high of $387.19. The business has a fifty day moving average price of $228.92 and a 200-day moving average price of $259.09.

MongoDB (NASDAQ:MDB) last announced its quarterly earnings data on Wednesday, March 5th. The company reported $0.19 EPS for the quarter, missing the consensus estimate of $0.64 by ($0.45). MongoDB had a negative net margin of 10.46% and a negative return on equity of 12.22%. The business had revenue of $548.40 million during the quarter, compared to analysts’ expectations of $519.65 million. During the same quarter in the previous year, the firm earned $0.86 earnings per share. Equities research analysts forecast that MongoDB, Inc. will post -1.78 EPS for the current year.

Insider Activity at MongoDB

In other news, CAO Thomas Bull sold 301 shares of the company’s stock in a transaction dated Wednesday, April 2nd. The shares were sold at an average price of $173.25, for a total transaction of $52,148.25. Following the transaction, the chief accounting officer now owns 14,598 shares in the company, valued at $2,529,103.50. This represents a 2.02% decrease in their ownership of the stock. The transaction was disclosed in a legal filing with the SEC. Also, insider Cedric Pech sold 1,690 shares of the company’s stock in a transaction that occurred on Wednesday, April 2nd. The shares were sold at an average price of $173.26, for a total transaction of $292,809.40. Following the transaction, the insider now owns 57,634 shares in the company, valued at $9,985,666.84. The trade was a 2.85% decrease in their position. Insiders sold 58,060 shares of company stock worth $13,461,875 in the last 90 days. 3.60% of the stock is owned by corporate insiders.

Wall Street Analysts Forecast Growth

MDB has been the topic of a number of research analyst reports. Citigroup lowered their price objective on shares of MongoDB from $430.00 to $330.00 and set a “buy” rating on the stock in a research note on Tuesday, April 1st. Daiwa America upgraded MongoDB to a “strong-buy” rating in a research note on Tuesday, April 1st. JMP Securities reissued a “market outperform” rating and issued a $380.00 target price on shares of MongoDB in a research note on Wednesday, December 11th. Oppenheimer reduced their price target on shares of MongoDB from $400.00 to $330.00 and set an “outperform” rating on the stock in a research report on Thursday, March 6th. Finally, Needham & Company LLC reduced their price objective on MongoDB from $415.00 to $270.00 and set a “buy” rating on the stock in a report on Thursday, March 6th. Seven equities research analysts have rated the stock with a hold rating, twenty-four have given a buy rating and one has assigned a strong buy rating to the stock. According to MarketBeat.com, MongoDB has a consensus rating of “Moderate Buy” and an average target price of $312.84.

Read Our Latest Analysis on MDB

MongoDB Company Profile


MongoDB, Inc., together with its subsidiaries, provides a general purpose database platform worldwide. The company provides MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.

Further Reading

Institutional Ownership by Quarter for MongoDB (NASDAQ:MDB)


Article originally posted on mongodb google news. Visit mongodb google news



Redis Launches Vector Sets and a New Tool for Semantic Caching of LLM Responses

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

Redis, the company behind the eponymous in-memory key-value database, mostly made news in recent months because of its license change, which resulted in the launch of the Valkey project. Now, Redis is hoping to change the conversation a bit with the launch of two new AI-centric products ahead of the launch of Redis 8 on May 1. The first of these is a new caching tool, LangCache, which allows developers to bring large language model (LLM) response caching to their applications. The second is the launch of a new data type, vector sets, for…
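For a sense of the developer experience, here is a rough sketch of how vector sets might be exercised from the standard redis-py client, assuming the VADD and VSIM command shapes described in the Redis 8 previews; the key, element names, and toy three-dimensional vectors are illustrative, not a definitive API reference:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Add three-dimensional embeddings to a vector set (toy values).
# Assumed syntax: VADD key VALUES <dim> <v1> ... <vdim> <element>
r.execute_command("VADD", "docs", "VALUES", "3", "0.1", "0.9", "0.2", "doc:1")
r.execute_command("VADD", "docs", "VALUES", "3", "0.8", "0.1", "0.1", "doc:2")

# Retrieve the elements most similar to a query vector.
# Assumed syntax: VSIM key VALUES <dim> <v1> ... <vdim> COUNT <n>
hits = r.execute_command("VSIM", "docs", "VALUES", "3", "0.1", "0.8", "0.3", "COUNT", "2")
print(hits)  # e.g. [b'doc:1', b'doc:2']
```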



Why Use a NoSQL Database for AI? There Are Many Great Reasons – The New Stack

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts


NoSQL databases play a key role in facilitating AI adoption. A flexible platform with memory, persistence and traceability is needed to power AI agents.



With AI increasingly becoming table stakes for organizations, let’s dig into the role NoSQL databases play in facilitating AI adoption, and why a flexible developer data platform with memory, persistence and traceability is needed to power AI agents.

Starting With the Basics on NoSQL

NoSQL databases, short for “Not only SQL,” were developed to address modern data storage and scalability needs that traditional relational databases struggle with.

Unlike relational databases, which were designed to minimize data duplication and scale vertically, NoSQL databases use flexible data models such as key-value, document, column, time series and graph formats to accommodate web, mobile and IoT applications. These databases operate as primary content stores, allowing flexible data access and high availability through horizontal scaling across distributed systems.

Organizations choose NoSQL for its ability to support dynamic, real-time and personalized user experiences, adapting quickly to changing application requirements. NoSQL databases, particularly document-oriented ones, use the JSON format, enabling agile development without rigid schemas.

Additionally, modern NoSQL systems incorporate relational database features, including ACID (atomicity, consistency, isolation and durability) transactions and SQL-like querying, while maintaining scalability, high availability and efficiency. This convergence of relational and NoSQL capabilities simplifies database management, making NoSQL the preferred choice for modern, flexible cloud computing and distributed data applications.
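To make that convergence concrete, here is a minimal sketch using the Couchbase Python SDK (the document platform this sponsored post highlights later): it upserts a schemaless JSON document, then reads it back with a SQL-like (SQL++) query. Connection details, bucket name, and credentials are placeholders:

```python
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

# Placeholder connection details.
cluster = Cluster(
    "couchbase://localhost",
    ClusterOptions(PasswordAuthenticator("user", "password")),
)
collection = cluster.bucket("retail").default_collection()

# A flexible JSON document: no rigid schema to migrate when fields change.
collection.upsert("user::1001", {
    "name": "Ada",
    "preferences": ["databases", "ai"],
    "last_seen": "2025-04-08",
})

# SQL-like querying (SQL++) over the same document data.
rows = cluster.query(
    "SELECT r.name, r.preferences FROM retail AS r "
    "WHERE ANY p IN r.preferences SATISFIES p = 'ai' END"
)
for row in rows:
    print(row)
```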

AI Agents Are Operational Applications

AI agents, which automate traditional software and human workflows, require real-time data access for task execution and to support reasoning.

Unlike traditional analytical databases, which are often relational, highly structured and process data in delayed batches, operational databases enable low-latency, high-frequency read and write operations, which are essential for AI-driven applications. In the retail industry, for instance, AI agents can use diverse operational data such as user profiles, inventory, promotions, product vector embeddings and more for powerful semantic search.

To function effectively, agents must integrate multiple data formats, engage with models, cache conversations and maintain those interaction histories. The database needs to support high-velocity workloads, ensuring AI agents remain responsive and scalable.
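As a toy illustration of semantic search over operational data, the following self-contained sketch ranks product records by cosine similarity between embeddings; in a real system the embeddings would come from a model and live in the operational database alongside profiles, inventory, and promotions:

```python
import numpy as np

# Toy operational records with precomputed embeddings (illustrative values).
products = {
    "p1": {"name": "trail running shoes", "embedding": np.array([0.9, 0.1, 0.2])},
    "p2": {"name": "espresso machine",    "embedding": np.array([0.1, 0.8, 0.3])},
}

def semantic_search(query_embedding: np.ndarray, top_k: int = 1):
    """Rank products by cosine similarity to the query embedding."""
    def score(doc):
        e = doc["embedding"]
        return float(query_embedding @ e /
                     (np.linalg.norm(query_embedding) * np.linalg.norm(e)))
    ranked = sorted(products.values(), key=score, reverse=True)
    return ranked[:top_k]

print(semantic_search(np.array([0.85, 0.2, 0.1]))[0]["name"])  # trail running shoes
```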

AI Needs Access to a Variety of Data in a Flexible Way

AI agents require fast data access and a diverse range of data to operate effectively, especially in real-time decision-making scenarios. They need both structured data (such as databases and spreadsheets) and unstructured data (such as text, images and audio) to generate powerful insights and responses. The ability to quickly pull relevant data enables AI to produce responses that are the most contextually relevant to the user and make predictions with minimal latency.

Additionally, real-time data sharing through APIs and functions allows AI systems to integrate seamlessly with other platforms, ensuring up-to-date information flow and facilitating dynamic, automated decision-making. Without rapid access to varied data sources, AI agents risk providing outdated, incomplete or inaccurate responses, limiting their effectiveness, whether supporting internal or customer-facing applications.

Multiagent AI Systems Need To Work Together

In enterprise environments, multiagent AI systems can efficiently handle dynamic workloads and deliver prompt responses but will need real-time performance and scalability. By collaborating through distributed shared memory, these agents can swiftly access and update shared data, enhancing coordination and reducing communication overhead. Implementing low-latency, event-driven synchronization mechanisms ensures that agents remain aligned and can react promptly to changes, thereby maintaining system coherence and responsiveness.

Techniques such as array-based queuing locks can be employed to manage access to shared resources, minimizing contention and ensuring fairness among agents. Additionally, communication protocols like the message passing interface facilitate efficient data exchange and synchronization across distributed systems. Collectively, these strategies enable multiagent AI systems to operate effectively in complex, large-scale enterprise settings.
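For readers unfamiliar with array-based queuing locks, here is a minimal Python sketch of the idea (an Anderson-style lock): each waiter spins on its own array slot, which preserves FIFO fairness and limits contention on shared state. It is a teaching sketch under stated assumptions, not production code:

```python
import itertools
import threading

class ArrayQueueLock:
    """Sketch of an array-based queuing (Anderson-style) lock.

    Each waiter spins on its own array slot, so waiters do not hammer a
    single shared flag, and hand-off is strictly FIFO. Illustrative only:
    capacity must be >= the number of threads that may contend, and
    CPython's GIL makes a real spinlock pointless in Python."""

    def __init__(self, capacity: int):
        self.flags = [False] * capacity
        self.flags[0] = True               # slot 0 starts as "unlocked"
        self.capacity = capacity
        self._ticket = itertools.count()   # thread-safe enough in CPython
        self._slot = threading.local()     # remember each thread's slot

    def acquire(self) -> None:
        slot = next(self._ticket) % self.capacity
        self._slot.value = slot
        while not self.flags[slot]:        # spin on a private slot
            pass

    def release(self) -> None:
        slot = self._slot.value
        self.flags[slot] = False
        self.flags[(slot + 1) % self.capacity] = True  # hand off to next waiter
```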

Memory and Persistence Together

Maintaining short-term, long-term, procedural and shared memory is critical for AI agents to ensure contextual awareness, continuity and efficiency in decision-making. Short-term memory (caching) allows AI to rapidly retrieve recent interactions and computations, reducing redundant processing and improving responsiveness. Long-term memory (persistence) ensures AI agents retain historical context, enabling them to learn from past interactions and refine their outputs over time.

Having both in a unified platform streamlines performance, as agents can seamlessly transition between fast temporary access and deep retained knowledge. Additionally, AI agents need structured storage for critical information such as API definitions, function calls and prompts, allowing them to interact efficiently with data, execute the correct actions and ensure consistency across different sessions. By integrating these memory types, AI systems can provide more intelligent, context-aware and adaptive interactions while optimizing computational efficiency.
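A minimal sketch of that pairing: an in-process LRU cache for short-term memory backed by a durable store for long-term memory, with SQLite standing in for whatever persistent database the platform provides (class, table, and file names are illustrative):

```python
import sqlite3
from collections import OrderedDict

class AgentMemory:
    """Sketch of unified agent memory: a small LRU cache for short-term
    recall backed by a persistent store for long-term recall."""

    def __init__(self, path: str = "memory.db", cache_size: int = 128):
        self.cache = OrderedDict()
        self.cache_size = cache_size
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT)")

    def remember(self, key: str, value: str) -> None:
        self.db.execute("INSERT OR REPLACE INTO memory VALUES (?, ?)", (key, value))
        self.db.commit()
        self._cache_put(key, value)

    def recall(self, key: str):
        if key in self.cache:                 # short-term: fast path
            self.cache.move_to_end(key)
            return self.cache[key]
        row = self.db.execute("SELECT value FROM memory WHERE key = ?", (key,)).fetchone()
        if row:                               # long-term: durable path
            self._cache_put(key, row[0])
            return row[0]
        return None

    def _cache_put(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)    # evict least recently used
```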

Governance and Traceability

Governance and traceability are essential for AI agents, particularly in enterprise environments where compliance, accountability and safe AI behavior are critical. Organizations must ensure that AI-driven decisions are transparent, auditable and explainable to meet regulatory requirements, mitigate risks and build trust in AI systems. Traceability allows enterprises to monitor how AI models reach conclusions, making it possible to detect biases, errors or security vulnerabilities.

By implementing robust governance frameworks, businesses can enforce ethical AI use, prevent unauthorized access or misuse, and maintain consistency in decision-making. Additionally, enterprises need auditable logs of AI interactions, ensuring that every decision can be reviewed, verified and improved over time. Without proper governance and traceability, AI systems may pose compliance risks, erode trust and fail to align with business objectives and legal standards.
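A bare-bones sketch of the traceability half: writing one structured, append-only audit record per AI decision so it can be reviewed and verified later. The field names and log path are illustrative, not a prescribed schema:

```python
import json
import time
import uuid

def audit_record(agent_id: str, decision: str, inputs: dict, model: str) -> str:
    """Append one traceable, reviewable record per AI decision."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "agent_id": agent_id,
        "model": model,
        "inputs": inputs,
        "decision": decision,
    }
    line = json.dumps(record, sort_keys=True)
    with open("ai_audit.log", "a") as f:   # append-only log for later review
        f.write(line + "\n")
    return record["id"]

# Illustrative usage.
audit_record("agent-7", "approve_refund", {"order": "o-123", "amount": 42.0}, "example-model-v1")
```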

The Challenge of Point Solutions

Reliable and unified data architectures are key to successful AI projects. Using multiple database and data cache systems for AI agents creates significant challenges by complicating data access, hindering collaboration, disrupting memory integration, limiting flexibility, increasing operational expenses and undermining governance. Organizations that deploy multiple single-purpose database solutions also introduce data sprawl, risk and complexity, making it difficult to effectively use AI, minimize AI confusion, trace the source of AI hallucinations and debug incorrect variables.

Data complexity is AI’s enemy because AI is imprecise to begin with. Using AI within a complex, multi-database architecture produces unreliable results because the risk of feeding AI models inconsistent or incorrect data is too high.

AI agents require fast, seamless access to diverse data for real-time decisions, but drawing data from disparate systems introduces inefficiencies, backtracing issues and delays. Collaboration falters as multiagent systems face compatibility issues, slowing communication and coordination. Memory management suffers from fragmentation, breaking the continuity needed for contextual awareness and performance. Flexibility is curtailed, delaying adaptation to new needs or features, while governance and compliance become harder to enforce due to inconsistent monitoring and traceability.

By simplifying the data management activities that surround AI, a unified, multipurpose database resolves these issues, enabling reliable, scalable and compliant AI operations.

A NoSQL Data Platform To Support Agentic AI 

Tens of thousands of organizations have adopted NoSQL, making it their choice for modern applications. AI agents are the next logical step on that path to be supported by fast and flexible NoSQL data.

To run critical applications, many enterprises choose Couchbase to improve resiliency, performance and stability while reducing risk, data sprawl and total cost of ownership. Couchbase is the developer data platform that powers critical applications in our AI world. Find out more about how Couchbase Capella and AI services help organizations accelerate the development of agentic AI applications. Start using Capella today for free and sign up for the private preview of Capella AI Services.




Kafka 4.0: KRaft Simplifies Architecture

MMS Founder
MMS Steef-Jan Wiggers

Article originally posted on InfoQ. Visit InfoQ

Apache Kafka has reached a significant milestone with the release of version 4.0, a major update that introduces a host of new features and improvements, most notably default operation in KRaft mode, which, according to Confluent’s documentation, eliminates the dependency on Apache ZooKeeper.

For over a decade, ZooKeeper has served as the backbone of Kafka, and the community has expressed gratitude for its contributions. However, the move to KRaft by default in Kafka 4.0 streamlines deployment and management by removing the need to maintain a separate ZooKeeper ensemble.

(Source: Confluent documentation)

Lalit Moharana, an AWS Community Builder, posted on LinkedIn:

ZooKeeper is stepping aside as Apache Kafka adopts KRaft with the upcoming Kafka 4.0 release, marking the end of a 14-year partnership. This shift simplifies Kafka’s architecture by ditching the separate ZooKeeper system, boosting scalability, and paving the way for a self-sufficient future – all thanks to KRaft’s Raft protocol magic.

In addition:

Why the Change? ZooKeeper’s overhead and limits (think 100,000+ partitions) couldn’t keep up with Kafka’s growth. And:

KRaft Benefits: One system, millions of partitions, faster recovery – Kafka’s ready to soar!

Beyond the architectural shift, Kafka 4.0 brings the general availability of KIP-848, which introduces a next-generation consumer group protocol. This new protocol is designed to dramatically improve rebalance performance, reducing downtime and latency for consumer groups, especially in large-scale environments. By minimizing “stop-the-world” rebalances, Kafka aims to provide a more stable and responsive data streaming experience. The new protocol is enabled by default on the server side, with consumers needing to opt in by setting group.protocol=consumer.
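Client-side, opting in is a configuration change. A rough sketch with the confluent-kafka Python client, assuming a client build recent enough to support KIP-848; the broker address, group, and topic names are placeholders:

```python
from confluent_kafka import Consumer

# Opt a consumer in to the next-generation (KIP-848) group protocol.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "orders-processors",        # placeholder group
    "group.protocol": "consumer",           # new protocol; "classic" is the old default
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])              # placeholder topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(msg.error())
            continue
        print(msg.topic(), msg.partition(), msg.value())
finally:
    consumer.close()
```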

In a Hacker News thread, a respondent commented:

One thing I immediately noticed after switching from SNS/SQS to Kafka was its speed. Messages seem to get sent/received almost immediately.

Furthermore, Kafka 4.0 offers early access to Queues for Kafka (KIP-932). This feature introduces the concept of “share groups” to enable cooperative consumption using regular Kafka topics, effectively allowing Kafka to support traditional queue semantics. While not a direct addition of a “queue” data structure, this enhancement expands Kafka’s versatility, making it suitable for a broader range of messaging use cases, particularly those requiring point-to-point messaging patterns akin to durable shared subscriptions.

In a LinkedIn post, Govindan Gopalan, an AI & Data Engineering Leader at IBM, concluded:

Early queue support (KIP-932) introduces point-to-point messaging, expanding Kafka’s use cases beyond traditional publish-subscribe workflows.

This major release marks a significant step forward in platform modernization. As part of its evolution, Kafka 4.0 has removed APIs deprecated for at least 12 months. Furthermore, it updates the minimum Java requirements, with Kafka Clients and Kafka Streams now requiring Java 11, and Kafka Brokers, Connect, and Tools requiring Java 17. This move encourages the adoption of newer Java features and aligns Kafka with more current technology stacks. The release also updates the minimum supported client and broker versions (KIP-896) and defines new baseline requirements for supported upgrade paths, as detailed in KIP-1124.



Presentation: Thriving Through Change: Leading Through Uncertainty

MMS Founder
MMS Jennifer Davis

Article originally posted on InfoQ. Visit InfoQ

Transcript

Davis: I’m going to be focusing on the positive elements of this. I do not want to talk in depth about change. There has been a lot of change that has occurred over the last few years: whether it’s from COVID, whether it’s from industry changes that have led to things that cause fear and anxiety in the workplaces. There is also positive change, promotions, and all of those kinds of things, or reorgs that are negative, or can be sometimes negative, and you don’t really know what’s happening. I’m here to talk to you about thriving through change, being prepared for change. I’m Jennifer. I am an engineering manager at Google. I’m also an author. I’m a builder of communities. I value connection and making opportunities for people to connect, like we have here with all of our unconferences and our talks to inspire and inform.

Context

Just like any team, my team, we want to make more value faster. That’s not surprising. The key thing is, it’s not for one individual or one function to move faster. Within DevRel, we have to think about all the cars or whatever other thing that we’re building. We have to be part of that process. We have to think for our team, but also for the larger org, and be aware of everything that’s in progress of being built. We have to think about, so how do we maximize the value we bring? I’m in DevRel engineering. What does that mean? We build code. We write code. Our value is in code that our users find valuable. It’s not just writing code. That’s not helpful. Just like any other company, we’re not just writing code for the fun, unless we are. There’s a core value to that code that we’re writing.

We think about the things that we want to minimize, and that’s building the wrong things, spending a lot of effort writing code for products that are never going to ship. Or, having a lot of things in progress and never delivering them. Changing up our context all the time. Working on technical debt. How many people have big missions that work on technical debt? I don’t like those ones, because the existence of a software artifact means it inherently has technical debt, and unless it is valuable to my users, I don’t really care. Unless it’s security, then I have to fix it. I don’t know if you have this experience, but for our team, if we write code, people are going to copy it and paste it straight into their production, without maybe examining it or understanding the context. Because it’s Google code, of course, it’s perfect.

If it’s a security issue, I want to fix it. We also want to not repeat the same problems over and again, and not learn from those mistakes. We want to not have instances of samples where they don’t follow established practices, so it ends up causing more technical debt.

One of the things I really love about my job is that we get to hold the banner of the zeroth customer. What does that mean? We sit right there and think about, what are teams really building? How are they building? What’s the context that they’re building them in? It’s just so thrilling to be thinking about reliability, operability, sustainability, all of the abilities, and then bringing that into the world and giving it to people so that they can copy and paste it, and try it out and learn. One of the examples I’m sharing here is the Avocano solutions. Avocano is a dynamic website, like, how would you build a dynamic website using Google Cloud Services? We talk through all the different constraints that we have and what choices we were making. Is it the one way? No. There are multiple ways.

Actually, we provide different solutions on how you build a dynamic website. It’s a core solution that we think is important for people to understand. Just like solutions, and there’s more than one way to do it, everything I’m going to share, there is more than one way to team. I’ve tried to focus on some of the things that are important from my context and my team, focused on DevRel engineering. There are different components, and you may not find that my paths are your paths. I’m hoping that you can take some things from it.

I’m going to talk a little bit about change. Last year, after years of looking at our organization, we started a transformation. It’s just started. We recognized the ever-increasing challenge of what our org as DevRel was trying to do, and specifically DevRel engineering. How many different APIs, how many products launching? How can we actually deliver samples in a meaningful way? The previous org, you have dedicated product teams. This is a combination of product and engineering that would map over to a DevRel team. DevRel engineering teams had multiple products that they were responsible for. My product areas were serverless, DevOps, orchestration. Then from that, we have to figure out, what’s the highest priority? Every single one of those product teams are not talking to each other. We’re the interface. When you think about DevOps teams and all the things, that’s what we were. There were multiple DevRel engineering teams responsible for different sets of samples.

Collectively, we had virtual teams where we would share the burden of the platform management. How do we write samples in these four main languages? What do we test? What’s our infrastructure? This is a process and set of processes and tools that have evolved over 10 years. There’s a lot of cruft in it and a lot of friction. Ownership of samples all sat in the DevRel engineering team, a team that went from 100 people to 50 people, to 23 people, to 14 people. Last year, they announced a reorg, and my team became 11 people.

The mission is to build a lightweight platform that enables more stewardship of samples so that the product teams, external contributors, and all of DevRel, including tech writing, can own and drive sample contribution. Not to apply some standard SLO across all samples, because there’s different types of samples. There’s how-tos. There’s concepts. There’s API SDKs where people really need it to be perfect and it does exactly the right things, because there’s nothing like going and looking up something and it does/doesn’t work. We need a bigger set of samples, but we cannot manage it the way we have been managing it. That’s the change, or where we want to go.

As I was planning out this talk, I started with one context. As the reorg hit, I was like, I’ve got to change up how I talk about this because everything’s changing. Yet, my team has been prepared to navigate this change because of the things that I’m going to talk about. They were ready and empowered, and they have that autonomy. Other folks have talked about DORA. DORA is this research that’s been happening for over nine years now. Effectively, there’s a set of metrics and a set of capabilities. It’s a choose your adventure. What capabilities are you trying to drive up to increase the metrics that matter to your organization so that you can have a high-performing team? It’s a way to categorize and evaluate how well your team is performing. It’s not meant to say, we’re better than this team and that team. High-performing has a lot of feels to it, but the goal is to create context that help people deliver.

Embrace Functional Leadership

There are four areas I want to talk about, and the things that I have used and leveraged to make change be something that, yes, sometimes it sucks, but we have the power to enable teams to navigate change. We don’t get to control all the changes. As much as we like to have cabs and change boards limiting what happens in production, we don’t control all change. Let’s talk about functional leadership. I want to talk about leadership, actually, because really words are hard, and we use the same word to mean different things at different times. When I say leadership, you might not be thinking about it the same way, and that’s ok. What is leadership? Often, it’s defined by someone’s following you, so then you’re the leader. That’s not very valuable. Sometimes the manager is the leader, and sometimes they’re not. Some people lead by example, and some people lead by coaching, and both of these are completely valid approaches.

In complex environments, leadership is about enabling people to come together to figure out the problems and share all the context, figure out what’s ambiguous, what’s possible. We need lots of different perspectives to enable strong choices. Not right choices, but strong choices, because generally there is no right answer. There’s lots of wrong ones, but there’s not one right answer. We need to help people to understand. That’s what a leader is. Mary Parker Follett was an American management consultant who did a lot, she pioneered a lot in terms of organizational theory and organizational management. She has some really amazing writing out there. I highly encourage. It’s very under-read, but she’s quoted a lot. Her concepts on functional leadership are amazing to me. I felt like, am I inventing some new way of doing things at one point when I first became a manager?

Then I started reading her, and I’m like, no, we discovered this hundreds of years ago, or 100 years ago, and we’re just not talking about it. She said, “Leadership is not defined by the exercise of power, but by the capacity to increase the sense of power among those led”. What does that even mean? It means we need to focus on the individuals first, and we need to empower everyone to be a leader. Everyone can be a leader, but then who’s following? That’s not the point of leadership. Some people talk about situational leadership, and people stepping up into leadership as needed, and that’s part of it too. Every one of us can be a leader. I approach a new team. What do I do? The first thing is, I do nothing to change anything, because there’s already enough change happening. Figure out the roles and responsibilities that everyone already thinks they have. Understand, I’m not going to make assumptions, no matter what anyone says to me, any other manager, any other person, that person can’t do this, or that person can’t do that, because I believe in my heart, everyone can be a leader.

I find out, I’m building these relationships, what are the motivations, goals, and worries that every individual has? What is their context? What are they hoping and dreaming of? What do they want out of this? It doesn’t matter if they’re only here for the money, or if they’re altruistic and they want to improve the world.

All that matters is I understand what their context is, what their capabilities are. Maybe there’s some context that is impeding them that I can help with, and maybe it’s just something I need to be able to recognize, and not change the conversation to be something they can’t do right now. I keep building up this knowledge about them. I see them in action. I understand more, what do they bring to the workplace? What are they bringing to the team? What are the different strengths and weaknesses everybody has? Where can I have opportunities to empower them to grow, to be more? How do I see them in these different lights? I can watch over time as these things change, and I can help them connect them with the people and the opportunities that best suit them.

I need to build trust. A lot of people know, we’re going to talk about values. None of that matters. Why are we talking about values again? Everybody wants to talk about values. It’s important. Even if you’re talking to a team that hates talking about values, it’s important. It’s important to talk about what the company’s culture and those values are, but it’s important to talk about as individuals. As a manager, I am the first one to step to the plate to be vulnerable and share. The three core values I have inform how I want to be present in the world and present for my teams. Authenticity, I want to commit who I am, who I’m presenting to you, this is me. This is what I believe, and this is what I value. If I say I’m going to do something, I’m going to do it. I’m going to be kind. I am going to be generous and thoughtful. I am going to treat every interaction to the best of my ability with kindness. I expect and want that out of the people I work with and the teams that I work with.

Sometimes kindness is about giving people feedback that they don’t really know that they need it. It’s like crucial feedback. Kindness is not the same thing as niceness and not telling someone, “I know that you want to do this thing, but you know how you did it this way”, and having those conversations, because feedback is a gift. I value trust. When people talk to me, when I talk to them, I do not assume that I have your trust. I will give you trust, but I am not going to assume you’re giving it to me until you’re ready. I’m your manager, I’m not going to assume anything. When you’re ready, you can tell me, and it’s good. I value your trust. I’m going to work hard to not break that trust.

All of these things are going to build together. The conversations that you can have by talking about your values are tremendous, because everybody has different things that they hold true to themselves. Helping people be authentic to themselves means they bring their best selves and their best perspectives to build the best things.

Functional leadership is about delegation. I care that everyone can achieve this goal of being a leader if they want to. I want to foster that capability, so I’m going to identify ways that are going to make it possible. By doing this, I can respond to change. Energies ebb and flow. People need time off. I don’t want any single points of failure within my organization, where now we’re going to have a fire drill, because nobody knows this, or we’re going to go and call this person who’s on leave. No. I want everyone to be enabled and empowered. I want to be able to take time off. I don’t want to have to check my phone. I don’t want to have to check my email. I want to make that commitment to the people who report to me. I delegate leadership. I also clearly define the roles and responsibilities, because I want people to understand what decisions do they get to make, what kind of autonomy do they have. If they need help, I’m going to coach them.

Enable Healthy Conflict

A key challenge we have is in conflict. Healthy conflict is an important part of team cohesion and finding great outcomes. What is healthy conflict? It’s having an argument that fosters creativity and personal development and builds stronger bonds. If we don’t address unhealthy conflict, we can cause more pain to be borne by the people who already have so much stacked up against them. In an intrateam conflict, building healthy conflict. You can recognize when you have it, when you have open communication, people are giving constructive feedback in a timely manner, and folks are open to other people’s ideas. They’re not immediately dismissing them. It’s so crucial. I’m emphasizing this so much because I’ve dealt with these challenges, and it’s really hard to preserve trust and foster psychological safety if we don’t address when someone’s being contemptuous or when somebody is being disrespectful.

Now, interteam conflict, that may not be something you have to worry so much on. In your context, it might be something where it helps us provide team cohesion, because we can be like, “They’re over there, they’re whatever. It’s all their fault”. It helps build bonds internally, but within DevRel, it’s not effective. We cannot do that. We have to work with so many product teams, with so many other core functionalities, tech writing, engineering teams across the org, security, OSPO. We cannot be an us and them. When you’re trying to achieve larger, big-scale impacts, you are literally forcing your virtual team to not be effective because you have caused a problem by turning into an us versus them. I don’t recommend that.

The core piece of this, and I tell folks about this, is that you have to keep a lot of things in your mind that may not feel congruent, but at the core, you don’t have to like how someone leads. We might say, in short, I don’t like them. I don’t care. Let’s not be sloppy. Really, you don’t like their decisions or the way they’re approaching this. You have to respect them. As a leader, if you find that you have people on a team, and that doesn’t matter if it’s a small team, a larger team, and you have people that are showing contempt and not respecting each other, that is when you have to make corrections if they don’t solve themselves. Because you’re not going to have an effective team otherwise. Too often, we’re too afraid to say something because then aren’t you like, are you not respecting people’s opinions or decisions? No. It’s ok to disagree, but you have to commit. You have to move forward.

The first step is maybe I talk to them and say, I get it, but what can you agree on? What are the things that you can agree on? It’s ok if it’s just that you both showed up. You both care passionately because that’s what usually causes the biggest conflict is when someone is like, I care passionately about it, and it needs to be this way, and I am right, and you’re wrong, and your idea sucks. In some cultures, you have a culture where it’s ok to speak in that language, and it’s completely acceptable. You know your context. In some cultures, you have to call it out and address the concerns that are coming up from this. It’s ok. It’s ok for people to have different opinions. The way to navigate that is to have clear roles and responsibilities. I said it before, I’ll say it again, who is responsible? Yes, everyone can be a leader, but not everyone all the time. There is a clear, articulated set of roles and responsibilities that line up and set the context so people know when they need to disagree but commit.

One of the other ways I navigate this, and there might be another term for this. I just want to share this. It’s establishing a common work item vocabulary. What does this mean? We have OKRs. Then we have projects that map out to those OKRs, and have impacts and business decisions. If we’re all doing work our own way, yes, everyone gets to choose how we do the work, but how we document how we do the work needs to be consistent. That way, we can improve the way that we deliver results, because if we look at the tree of work that maps out to the OKRs and the epics or the projects, and we map this down, down to the tasks, I know at any point in time as an IC, I can go look at my tree of work and know what my work maps to. I know how my work is changing and impacting the org. I know, before I even start to do the work, what is the value of that work? Am I achieving something that matters? That’s a core part of being happy. There’s five metrics of happiness, and autonomy is one, but doing work that matters.

The org could say, we’re changing priorities, but you have the record of what you did and what you accomplished. Connect the day-to-day work to the strategic, larger-level projects. As a manager and a leader, this also does something cool for me, and that means I can look across what the team is doing and make sure that all the projects are level-appropriate, and the scope of work is appropriate. I can search our internal tools and say, what’s the tree of work? Who’s getting what opportunities? Am I being fair and equitable and empowering people to take ownership? Are people having leadership opportunities, because everything has a scope and a set objective. It also provides transparency and accountability so people can see what everybody is doing, because we’re all talking in the same language. You can see, that person hasn’t done da-da-da. They’re not doing this or that. Are they on a big project? Are they the only one working on something? You have that visibility. It decreases the opportunities for people to have conflict about things that really don’t matter. They matter, but there’s ways to navigate them.

Other questions that I think about in helping people understand where they fit in to the bigger picture are these seven questions. The opportunities to measure and the tools that you have to engage and create these visualizations will vary across the org. Every individual in your team and across your org needs to know what’s going on, what’s the state of our work? What needs attention right now? What’s urgent and important? Where is my place in this? Am I actually contributing to any objectives that matter? What is meaningful to me? How am I having that meaning at work? How do I know what good looks like? How do I know when I’m done? What’s the state of my team? How healthy is my team?

Establish Metrics that Matter

We’re talking about stuff that’s measurements. Let’s talk about metrics that matter. Back to DORA. Again, nine years of research, lots of context, lots of community, lots of people have input these things. It’s distilled down into these four key metrics: deployment frequency and lead time, which measure throughput, and change failure rate and time to restore service, which measure stability. Back to me, because this is about my team, that’s data. I can have that context in my mind, but I need to actually think about, what am I doing now? I got a new team. What am I supposed to measure? The first thing is, figure out what the problem is. I’m sharing some metrics. This is all open source. All of our samples are open source. There’s no secrets here. We have 12,963 samples. We have 7,288 distinct use cases. What does that mean? There’s an intent for a sample. What are you trying to show? That’s what those use cases are.

Our goal is to at least have the four core languages have the sample, and that’s Python, Node, Go, and Java. There are additional languages that we try to support. Not every engineer knows every language. There are 11,320 files. There are 118 repositories that this covers. That was just our first assessment. Then we discovered, but some samples are just in documentation. They’re not in GitHub. We’re not even looking at the full picture. What do I do with this? I started talking to the team, and you’ll see that these are similar. I literally copied and pasted this from my notes in the doc I share with the team. These are very similar to some of the DORA metrics, but this is the start. This is not the finish. Every team is going to have their own special context of how they apply metrics, and what they gather, and what they measure.

For us, thinking about system green, it’s, we tested, and it’s the whole system and set of samples, and what that state is. If we know that early, when we’re building and validating and testing, that’s less expensive than, at the end, and discovering that we’ve launched that, and now the customers are finding the problem. Because, ideally, customers don’t find your problems. Again, we’re DevRel, so we are the zeroth customer, so if we are finding the problem, that’s a symptom of a larger issue. Another measurement, time to ship. That sounds obvious. It’s like in GitHub, you shipped it. Actually, no. Again, code is only as valuable to us, not if we finish writing it, but if people are using it. It has to be in docs. The measurement is from the point of the PR, to the point that it’s merged into docs. It’s available for people to go ahead and click, copy it direct from documentation. Rollbacks are, someone submitted a sample, we validated it, it looks good.

Then we discover, no, this is terrible, we got to roll back. You wouldn’t think, how could that happen with samples? It happens more often than you think, because the context is lost with you having different people doing different things and modifying changes that you wouldn’t imagine could happen. Because if you think about the context of all the samples, we’re validating platforms, there’s a lot of little axes of constraints, whether it’s version of language, version of runtimes, different third-party packages, it gets messy. Then, release cadence, thinking about, how often are we adding to our samples catalog? All of these felt like, this is a starter point. These are metrics I can measure right now, and we can see how we’re doing. Then we can make incremental change and see how we’re adding value to the organization, while also making change and improving processes, reducing how hard it is to add samples.

I don’t get to just decide, here’s the samples, this is the samples, and here’s the metrics we’re going to measure. You have to get buy-in. You have to get your leadership to agree, yes, this is valuable. Sometimes your leadership will say, so here is the metric we’re going to measure you against. Result, 50% of technical debt. Then you say, here is actually what we’re going to measure, because I don’t care about technical debt. I do, for certain values of technical debt, but what’s the value of me updating a node version of a language, of a sample? How do I know my customers value that sample at all? How do I know that we even have the right set of samples? Those are all things I really care about. Except I’ll resolve all the security issues, I can do that.

Craft Supportive Environments

Let’s craft some supportive environments. First, take care of yourself first, because leadership is hard. We don’t talk enough about how hard it is to be an effective leader. When we’re doing it in an environment that encourages a generative culture, where we encourage people to talk and connect, and we don’t have hierarchies. We’re often dealing with a lot of emotional baggage that people are carrying, and their trauma, and it can be hard. It’s not a right or wrong situation often. We have to be ok with things being 80% good. Everyone is different and unique and special. What helps you, you need to know. I have gone through periods of really stressful work environments and had burnouts, and been too afraid to know how to speak about it or to deal with it. The things that people talk about, it’s like do this, do that. No. You have to find out what works for you.

For me, I know that I am having a problem if I’m forgetting to do a daily walk, because walking is where I process a lot. Walking is how I feel connected to my emotions, where my body is feeling. I also have gotten into fiber arts and crochet because it’s all math and creating three-dimensional objects, which is fantastic. The things that help you are going to change, depending on where you are at. Take care of you first.

Let’s talk about teams, because that’s too emotional and frou-frou, but no, it’s really important. Think about the boundaries that you’re setting. The first step is being explicit and repeating over and again. This is an example of one of the agenda descriptions of my team meeting. It tells people explicitly every time, the goal of this meeting is that we are building together shared information, open communication. We’re going to discuss things. It’s not just a status meeting. We have async standups for that. We’re not doing just status. We’re together establishing team norms. We’re going to align on a shared goal and a mission. We’re going to share feedback. We are intentionally going to share feedback. When someone shares what they’re working on, we’re going to talk about it. You’re going to have opportunities to help, and we’re going to connect together. We’re going to have team rituals. Every team should have a set of rituals. I just introduced these to my new team in the last few months. We start our meetings with music. It gives people an opportunity.

Since we’re a distributed team, we don’t have water cooler moments. We don’t have time to share little bits and pieces all the time. This gives us an opportunity for people to share, here’s a set of music I like. We kick off meetings with a team temperature check. That gives people an opportunity to express support if they need it. It’s not all rainbows and unicorns in DevRel. There are challenging times. People get to talk about how they’re feeling, and be heard, and that matters. We end the meeting with kudos. That gives everybody the opportunity to practice giving feedback and to receive feedback. Instead of waiting to just do it quarterly or once a year at the end of the year, it means that we are thinking about and expressing what was the impact of how you did something, and please do more of it like that, because I appreciated it. That’s just the job. We get to practice accepting the feedback. It’s hard sometimes.

Another ritual that we don’t talk about a lot, but that we do, is goodbyes. I don’t own my people. I am sad when they leave. The opportunity to give them appreciation, to reflect, to share this in a team setting, is the most amazing gift. We do not do this enough to say goodbye, and we wish you well. Because the industry is so small, we are going to connect with these people again and again. The opportunities are massive. Showing the people who are on the team that you valued these people who are leaving, sets the context and increases that value for themselves as well.

Also, play. Play is awesome. Games, especially role-playing games, give us the opportunity to practice where our work can very much be connected to our identity. I encourage people to not have that sense of identity tied to their work, because what you work, what you do, is different than who you are. I recognize that often, one of the key problems with conflict is that you are questioning my identity when you question my work, but we’re not actually. Role-playing helps us to build up team connectiveness. It’s ok to practice failure at communication. You can do bad things in your game and it’s ok. Everything is cool. Also, playing helps you do really awesome things with your work.

This is part of my team that is in Vegas right now at the Next Conference. This is one of the cool apps that we built. It’s a meta-inception of building a train and building an app. It’s an app that builds and connects and validates, is your app going to work based on the services that you select? The train will move or not. You can grab the code and play with it yourself if you want to take a look. Play can lead to creativity and fun and team cohesion in a way that you wouldn’t expect you can inspire people.

Minimize the human toil. Let’s maximize what the machines do. Don’t maximize what the machines do and give all the toil to the humans. That’s my explicit boundary about certain things. How we could think about ways to automate some stuff. Automate context. We use GitHub Actions. This is a great utility, a feature of GitHub, if you are on GitHub. There are similar capabilities in GitLab CI/CD that you can leverage. We automate tagging PRs with their context, so we can identify who would be the best person to actually take them on. We also automate linting, because not everybody remembers to lint their code, and this gives them fast feedback that they’ve got a problem. They’re not waiting for a review to find out that there’s actually a problem before they can even get their code merged. We’re also looking at, how do we take our standards of samples and how we write samples? Because the goal of our samples is not just working code. We want to teach something. To be effective, we have to think about the concepts and how we’re applying them. We’re working towards, how could we potentially automate this process too, and check to make sure that samples follow our guidelines?

We want to encourage cross-team projects, because cross-team projects are another way to facilitate leadership opportunities and grow people. Also, when you get teams that are made up of different specialists, you end up achieving really amazing things. For one of the key solutions that I shared, the Avocano solution, my team built all the code, and we wrote the tutorial. We influenced the writing of the script for the demo video. We worked with all these different teams; if any of us had tried to do it alone, it wouldn’t have been as nice.

Now we have this nice, polished solution that can meet people where they’re at, depending where they are in their learning journey. That’s enabled and empowered by cross-team projects. Include training and education intentionally in planning. I always slice off the top 20%, and I say, this is going to be some kind of training or education. I factor it in in different ways. One of those ways is friction logging. Friction logging is a way to evaluate, to be customer zero, and determine, is this a good experience? If I didn’t know anything, or if I had this context, what would I experience? We provide that feedback to the product teams. We share and talk through it together, so it’s not in isolation, where I’m just seeing what I know. Now I have a shared context. Here’s what Cloud Build does. Here’s what Cloud Run does. I have that together with my team. We write down what decisions we made and why. Sometimes we work on really cool things, and then they get deprioritized in favor of something else.

These decision records actually help us write down context, so we don’t have to keep working on something. We can come back to it. For example, Emblem is a multi-product app that shows how to do something; now that there’s a new feature in Cloud Run with Cloud Deploy, we could refactor it. We have our decision records that would allow us to do this, and that tell us why we chose what we did at the time.

Have a show and tell. This is also a learning experience. It does not need to be something that’s like, here’s my polished demo, and now I’m going to present to everybody. It does give people the opportunity to practice presenting, but, more importantly, it gives people the opportunity to practice sharing what they’re doing. What did I learn today? Your company may have policies around open-source contributions, so before you encourage your team to do open source, make sure you check your OSPO policies and make sure it’s ok. One of the things about working in open source, and the Kubernetes projects are amazing at this, is it sets context for how to work with other teams. It empowers people and provides opportunities for leadership, where there are dynamics and a lot of feelings and personalities involved, but it’s outside of your job. It’s a separate context. It helps and supports people. I encourage people to contribute to open source. All of this continuous learning matters to you as well.

Conferences like QCon give us the opportunity to connect with one another and talk about all these different problems we’re facing. I’ve had so many amazing conversations that I just want to write about because I’m so inspired. One thing I have gotten immense value from, and want to share, is Ruth Malan’s technical leadership training. It is not, here, let me tell you how to do your job. Instead, it is a conversation. “That sounds terrible. I already talk to enough people. Why would I want to do that?” It’s leaders across different industries at different levels, CEOs, CTOs. You get this opportunity to talk to and connect with people who are working on different problems, but sometimes the same problems, and see things from a different perspective. Ruth provides context and a common language, but it’s not a, let me tell you how to do this. It’s a conversation. I also recommend Lara Hogan’s management and leadership training.

A lot of her material you can get and work through on your own time. She provides lots of examples of how to have hard conversations. I would also like to encourage folks to fill out the DORA survey, because your experience and how you solve problems matter. By filling out the survey, you help us validate and continue to evolve the ways we increase software performance: how can we measure, and how can we improve?

Recap

I’ve talked about quite a few things. Embracing functional leadership. Everyone can be empowered to be a leader. Enable healthy conflict, and watch for patterns of contempt or dismissiveness that can impact how your team performs. Establish metrics that matter to you and to your team and your org. Craft those supportive environments that are going to build and nurture and create sustainable paths where humans are doing valuable, impactful work that’s not burning them out.

See more presentations with transcripts



Top 9 AI News and Stock Ratings Today – Insider Monkey

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Artificial intelligence is the greatest investment opportunity of our lifetime. The time to invest in groundbreaking AI is now, and this stock is a steal!

My #1 AI stock pick delivered solid gains since the beginning of 2025 while popular AI stocks like NVDA and AVGO lost around 25%.

The numbers speak for themselves: while giants of the AI world bleed, our AI pick delivers, showcasing the power of our research and the immense opportunity waiting to be seized.

The whispers are turning into roars.

Artificial intelligence isn’t science fiction anymore.

It’s the revolution reshaping every industry on the planet.

From driverless cars to medical breakthroughs, AI is on the cusp of a global explosion, and savvy investors stand to reap the rewards.

Here’s why this is the prime moment to jump on the AI bandwagon:

Exponential Growth on the Horizon: Forget linear growth – AI is poised for a hockey stick trajectory.

Imagine every sector, from healthcare to finance, infused with superhuman intelligence.

We’re talking disease prediction, hyper-personalized marketing, and automated logistics that streamline everything.

This isn’t a maybe – it’s an inevitability.

Early investors will be the ones positioned to ride the wave of this technological tsunami.

Ground Floor Opportunity: Remember the early days of the internet?

Those who saw the potential of tech giants back then are sitting pretty today.

AI is at a similar inflection point.

We’re not talking about established players – we’re talking about nimble startups with groundbreaking ideas and the potential to become the next Google or Amazon.

This is your chance to get in before the rockets take off!

Disruption is the New Name of the Game: Let’s face it, complacency breeds stagnation.

AI is the ultimate disruptor, and it’s shaking the foundations of traditional industries.

The companies that embrace AI will thrive, while the dinosaurs clinging to outdated methods will be left in the dust.

As an investor, you want to be on the side of the winners, and AI is the winning ticket.

The Talent Pool is Overflowing: The world’s brightest minds are flocking to AI.

From computer scientists to mathematicians, the next generation of innovators is pouring its energy into this field.

This influx of talent guarantees a constant stream of groundbreaking ideas and rapid advancements.

By investing in AI, you’re essentially backing the future.

The future is powered by artificial intelligence, and the time to invest is NOW.

Don’t be a spectator in this technological revolution.

Dive into the AI gold rush and watch your portfolio soar alongside the brightest minds of our generation.

This isn’t just about making money – it’s about being part of the future.

So, buckle up and get ready for the ride of your investment life!

Act Now and Unlock a Potential 10,000% Return: This AI Stock is a Diamond in the Rough (But Our Help is Key!)

The AI revolution is upon us, and savvy investors stand to make a fortune.

But with so many choices, how do you find the hidden gem – the company poised for explosive growth?

That’s where our expertise comes in.

We’ve got the answer, but there’s a twist…

Imagine an AI company so groundbreaking, so far ahead of the curve, that even if its stock price quadrupled today, it would still be considered ridiculously cheap.

That’s the potential you’re looking at. This isn’t just about a decent return – we’re talking about a 10,000% gain over the next decade!

Our research team has identified a hidden gem – an AI company with cutting-edge technology, massive potential, and a current stock price that screams opportunity.

This company boasts the most advanced technology in the AI sector, putting them leagues ahead of competitors.

It’s like having a race car on a go-kart track.

They have a strong possibility of cornering entire markets, becoming the undisputed leader in their field.

Here’s the catch (it’s a good one): To uncover this sleeping giant, you’ll need our exclusive intel.

We want to make sure none of our valued readers miss out on this groundbreaking opportunity!

That’s why we’re slashing the price of our Premium Readership Newsletter by a whopping 70%.

For a ridiculously low price of just $29.99, you can unlock a year’s worth of in-depth investment research and exclusive insights – that’s less than a single restaurant meal!

Here’s why this is a deal you can’t afford to pass up:

• Access to our Detailed Report on this Game-Changing AI Stock: Our in-depth report dives deep into our #1 AI stock’s groundbreaking technology and massive growth potential.

• 11 New Issues of Our Premium Readership Newsletter: You will also receive 11 new issues and at least one new stock pick per month from our monthly newsletter’s portfolio over the next 12 months. These stocks are handpicked by our research director, Dr. Inan Dogan.

• One free upcoming issue of our 70+ page Quarterly Newsletter: A value of $149

• Bonus Reports: Premium access to members-only fund manager video interviews

• Ad-Free Browsing: Enjoy a year of investment research free from distracting banner and pop-up ads, allowing you to focus on uncovering the next big opportunity.

• 30-Day Money-Back Guarantee: If you’re not absolutely satisfied with our service, we’ll provide a full refund within 30 days, no questions asked.

Space is Limited! Only 1000 spots are available for this exclusive offer. Don’t let this chance slip away – subscribe to our Premium Readership Newsletter today and unlock the potential for a life-changing investment.

Here’s what to do next:

1. Head over to our website and subscribe to our Premium Readership Newsletter for just $29.99.

2. Enjoy a year of ad-free browsing, exclusive access to our in-depth report on the revolutionary AI company, and the upcoming issues of our Premium Readership Newsletter over the next 12 months.

3. Sit back, relax, and know that you’re backed by our ironclad 30-day money-back guarantee.

Don’t miss out on this incredible opportunity! Subscribe now and take control of your AI investment future!

No worries about auto-renewals! Our 30-Day Money-Back Guarantee applies whether you’re joining us for the first time or renewing your subscription a year later!

Article originally posted on mongodb google news. Visit mongodb google news



How Meta is Using a New Metric for Developers: Diff Authoring Time

MMS Founder
MMS Craig Risi

Article originally posted on InfoQ. Visit InfoQ

Tracking developer productivity metrics is essential for understanding and improving the efficiency of software development workflows. In fast-paced engineering environments, small inefficiencies can accumulate, impacting overall delivery timelines and code quality. By leveraging precise metrics, organizations can identify bottlenecks, assess the impact of new tools, and make data-driven decisions to enhance developer experience. 

Now we can add another new metric to help track the development process better: Diff Authoring Time (DAT). DAT is a new metric developed by engineers at Meta to measure the duration required for developers to submit changes, known as “diffs,” to the codebase, which they shared in a recent Meta Tech Podcast. By tracking the time from the initiation of a code change to its submission, DAT offers insights into the efficiency of the development process and helps identify areas for improvement.

Implementing DAT involves integrating a privacy-aware telemetry system with version control systems, integrated development environments (IDEs), and operating systems. This setup allows for the precise measurement of the time developers spend authoring code changes without compromising privacy. The data collected through DAT enables Meta to conduct rigorous experiments aimed at enhancing developer productivity.
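As a purely conceptual sketch of what such telemetry might compute (Meta has not published the exact formula; the event names and timestamps here are invented for illustration), DAT can be thought of as the elapsed time between the first authoring event attributed to a diff and its submission:

```python
from datetime import datetime

# Hypothetical event stream for one diff, as (timestamp, event) pairs.
events = [
    (datetime(2025, 3, 1, 9, 0), "first_edit_on_diff"),
    (datetime(2025, 3, 1, 9, 40), "local_test_run"),
    (datetime(2025, 3, 1, 11, 15), "diff_submitted"),
]

start = min(t for t, e in events if e == "first_edit_on_diff")
end = max(t for t, e in events if e == "diff_submitted")
dat_minutes = (end - start).total_seconds() / 60
print(f"Diff Authoring Time: {dat_minutes:.0f} minutes")
```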

For instance, DAT has been instrumental in evaluating the impact of introducing a type-safe mocking framework in Hack, leading to a 14% improvement in authoring time. Additionally, the development of automatic memoization in the React compiler resulted in a 33% improvement, and efforts to promote code sharing have saved thousands of DAT hours annually, achieving over a 50% improvement.

The significance of DAT lies in its ability to provide a precise yet comprehensive measure of development productivity, facilitating data-driven decisions to enhance engineering efficiency. By aligning internal development workflows with an experiment-driven culture, DAT supports continuous improvement in software engineering practices.

As highlighted in the Meta Tech Podcast, engineers Sarita and Moritz discuss the challenges of measuring productivity, the implementation of DAT, and the new capabilities it unlocks for developers. Their insights underscore the importance of accurate productivity metrics in fostering an environment of continuous improvement within Meta’s engineering teams.

In summary, Diff Authoring Time serves as a tool for Meta to assess and enhance developer productivity, enabling the company to make informed decisions that streamline workflows and improve the overall efficiency of its engineering processes.




Presentation: Unleashing Llama’s Potential: CPU-based Fine-tuning

MMS Founder
MMS Anil Rajput Rema Hariharan

Article originally posted on InfoQ. Visit InfoQ

Transcript

Rajput: I come from the hardware background, and we want to optimize. We run benchmarks, and you know benchmarks are of limited use. I always wanted to understand what customers are doing and what their environments are running. One of the things I found at QCon in the 2018 and 2019 timeframe was that the number one hottest topic was Java. I think 70% to 80% of attendees were Java enterprise people, and they were solving related problems. Underneath, most of the deployment was on CPUs. That’s my experience.

Then, suddenly, COVID happened. At the time I was at Intel, and from Intel I moved to AMD. Now I’m joining back QCon, and this is what has happened since then: CPU has gone tiny and everything is GPU. The whole conference is no longer about Java and those products; it’s all about LLMs. I’m also trying to change with that. The interesting topic we bring you is LLMs, in particular Llama, running on the CPU. Hold your questions on the GPU part; we plan to talk about CPUs.

How many folks are aware of CPU architecture? The reason I wanted to check is that many of the optimizations and discussions we want to have here are about software-hardware synchronization. When we talk about performance, we hear from many customers: we were on-prem, or we are going into the cloud, and our goals are to save 10% or 20% of TCO, or to reduce latency, or other things. Sure, a lot of that is in the architecture of the application, but you can actually get significant performance improvements just from understanding the underlying hardware and leveraging it to the best. We want to show you a particular example of how you can be aware of the hardware your workload will be deployed on, and design or architect for it. Both roles are involved: deployment, because some of the decisions are made at deployment time, and others even when you’re writing the code or application.

Hardware Focused Platform Features

We’re not talking about the GPU, or CPU-plus-GPU interactions, because those analyses become quite different, including how the data flows between them. It’s mostly a Llama model, or that class of model, being deployed on a CPU platform. Let me share a couple of components, and I’ll talk about each one of them: CPUs, cores, simultaneous multi-threading (or in Intel’s case, Hyper-Threading), and the different kinds of caches. On caches in particular, I’ll show later a chiplet architecture versus a unified one. That’s a big change that has happened in the last three or four years of deployments.

Of course, there is the memory. When we talk memory, there are two things: memory capacity and memory bandwidth. Memory latency too, but you don’t have to worry that much about that part. It’s usually memory capacity and bandwidth. Let me start with the CPU side. I wanted to show you one CPU versus two CPUs. The reason it’s good to know about this is that many platforms have two CPUs, what we call dual socket, and that becomes two NUMA nodes. Unless you have to spread the application across both CPUs, which means cross-socket communication, you want to avoid that. The only time you need it is when you have a large database that needs the memory capacity of both sockets; then it is good.

Otherwise, you want to keep each process and its memory local. If it is a 1P platform, you don’t need to worry, but if it is a 2P platform, you want to be a little bit aware. The I/O side also becomes more interesting: which CPU is the I/O device sitting on? It’s usually much better to keep your process on that socket than on the other one. We are not going into the I/O area here. If you also work with I/O disks or network cards, they’re usually attached to one of the sockets, and how that processing works is a separate talk, actually. Just be aware: is the platform you’re using 1P or 2P?

Let me go into a little more detail. When we talk about the CPU, typically you would see a lot of cores, and within a core you would have the L1 and L2 caches, and then the L3 cache underneath. Then of course, you have the memory and I/O or NIC cards. That is a typical design. Within the core, you have the SMT threads, L1 cache, and L2 cache. The SMT threads at the top are the one place where you need to think: do you need to worry about the SMT part?

The only thing you need to think about is that it gives you twice the number of threads. When you are designing the thread pool size from the application side, let’s say on an N-core system with SMT on, you want to make sure your thread pool or settings are not hardcoded; they should check how many vCPUs are available and set themselves accordingly. I have seen that kind of mistake, even in MongoDB and similar systems, where people hardcode the core count at 32 or 16. Suddenly, when you try to deploy them in a scale-up model, they don’t scale, and then you find that the programmer actually hardcoded it.
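As an illustration of that advice, here is a minimal Python sketch (the pool sizing is the point; the workload itself is omitted) that sizes a thread pool from the vCPUs actually visible to the process instead of a hardcoded constant:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Prefer the CPUs this process is actually allowed to use (respects
# taskset/cgroup limits); fall back to the total count elsewhere.
try:
    available_vcpus = len(os.sched_getaffinity(0))  # Linux only
except AttributeError:
    available_vcpus = os.cpu_count() or 1

# Size the pool from the environment rather than hardcoding 16 or 32.
pool = ThreadPoolExecutor(max_workers=available_vcpus)
print(f"thread pool sized to {available_vcpus} workers")
```

With SMT on, this picks up the doubled thread count automatically; with SMT off, or on a smaller instance, it scales down without a code change.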

Now, let me show you, because I talked about the chiplet architecture, the difference between a unified L3 and a chiplet architecture. Most Intel Xeon systems have been unified; GNR, coming next, will be their first chiplet design. Anything before has been unified: all the cores within that socket see the same L3. On the chiplet side, we create a group of cores associated with its own L3 in each chiplet. That is the chiplet architecture. It has many benefits at the hardware level, for yield, but it also has benefits on the software side. Let me show you what benefit it gives you. I like to show it this way, with a little more clarity on the chiplet. One of the things you would see is that the L3 associated with each chiplet cannot consume the whole memory bandwidth. On a unified L3, a noisy application on a few cores could actually consume the whole memory bandwidth, the rest of the programs may not have much bandwidth left, and latency suddenly gets poor when the noisy neighbor comes in.

One of the benefits of the chiplet architecture is that each chiplet cannot consume the full memory bandwidth. As an example, on a platform with 400 GB/s of memory bandwidth, one chiplet can’t do more than 40 to 60. Number one, it protects you from the noisy neighbor scenario. Number two, we have a setting in the BIOS called NPS, NUMA Per Socket partitioning. You can set up to four. In that case, it actually divides your memory channels into four groups. Most clouds are running with NPS1 because they don’t want to manage the scheduling of memory. When you are running on-prem, you could actually create four clean NUMA nodes, and it will give you the memory bandwidth.

One of the benefits you would see in this scenario: let’s say you wanted to deploy an application on two chiplets, so two L3s, and you have another application in another NUMA node. What you would see is that they are not colliding on the same memory bandwidth from the channels, and each application gets its own L3. You are able to deploy applications with consistent performance, where they’re not consuming each other’s memory or memory bandwidth, and their L3s aren’t clashing. These are the benefits. A unified L3 does give the benefit that if you need to exchange data among the L3s or applications, that part is faster. Other than that, most of the chiplet architecture’s benefits outweigh it. That’s the reason you will see GNR also going down a very similar chiplet path. As the caches are growing and the number of cores is increasing, you cannot have everything on one unified L3. It’s just a limitation in the architecture once you have a huge number of cores.

Focus: Software and Synchronization (AI Landscape and LLMs)

Rema will talk, with regards to LLMs and Llama, about the role of SMT, simultaneous multi-threading, and what you need to think about to leverage it best. The same on the core side: when the number of cores keeps increasing, how do you want to leverage them, and what do you need to be aware of? Caches play a very important role, and I can tell you from EDA tools and other areas that you can get 20% or 30% improvement. When you’re dividing your problem, in the LLM space and elsewhere, the performance difference between fitting in the cache and not fitting is not 4% or 5%; it’s like 20% or 30% as soon as you start fitting in versus spilling out. For a particular architecture, especially in high-frequency trading, tooling, or other latency- or throughput-sensitive work, you have profiling tools where you can see: is it fitting in the cache, how much is missing, what do I need to adjust? Those are the kinds of specific optimizations you get from different tools.

Then there is the memory capacity and bandwidth part. With LLMs, as Rema will show you later, it matters when you are compute bound versus memory bandwidth bound, what kinds of decisions you can make, and memory capacity to that extent. I just wanted to give you these details first because Rema will use them extensively in her talk: this is my analysis, I’m memory bandwidth bound here, and I’m trying to fit in the cache or in a chiplet. Just to give you a quick idea: even though LLMs dominate the conversation, if you look at the bigger picture of AI, they are actually a pretty tiny and very complex piece, just for reference. I’m sure you are all aware of where LLMs and ChatGPT fit, and with regard to the timelines, this part is increasing and changing exponentially. We are in exciting times with this change.

Llama

Hariharan: Let’s Llama. Excited to be talking about this Llama, who is right now sitting on top of the Andes and everything. Let’s talk about the real thing that we are all interested in. Why Llama in the first place? Why are we not talking about other models? There are so many GPT models that we are all aware of. One of the main reasons to talk about Llama, and why we use it for benchmarking and workload analysis, is that it is a small model. Particularly when we are talking about running on a CPU, this is one of the smaller models. It comes in multiple sizes, but we still have the small model available.

Relative to other GPT models, this is a smaller one. Not only that, I think we are all aware that Llama was actually trained on publicly available data and it’s fully open source. That’s something that protects us in whatever usage we have. A lot of our customers like to train it further for their own select areas. That keeps them more protected, because it was trained on publicly available data. There’s nothing to worry about in terms of lawsuits and such. That’s why: it’s a small model, it’s open source, and it can be trained.

Let’s look at what our Llama does in action. How does it actually function here? There are two phases of Llama: the prefill phase and the decoding phase. What happens in the prefill phase? Prefill phase, you type something, and whatever you type, basically the model is loaded, and your input data, which is in this case ‘Computer science is’ is what you’re typing, and it is going to predict what’s the next word or next token. The whole model is loaded and the whole model actually works on everything that you have typed. In this case it’s just three words, but you could be sending a whole book there. You could be typing a whole book and putting it as the input data. It processes the entire data you have submitted and then it produces the very first token. Now it need not be one token, it could be a probability distribution over a bunch of tokens. Without loss of generality, let’s just say it’s one token that comes out of it. This portion of the work is extremely compute intensive. We call this the prefill phase. That’s the first phase of it.

The second phase is the decoding phase. What happens is it takes the previous token, the KV cache that was built, and whatever else was created, and produces the next token, and then the next token, and the next, and so on, until it reaches the end of sentence. This is the decoding phase, and here you are actually loading the model over and over again. I said the model is small, but it’s still not small enough to fit into our caches. You really have to pull the model from memory, loading portions of it many times over. When you’re doing that, there’s a lot of memory bandwidth involved in this second phase. The decoding phase is highly memory bandwidth intensive.

Basically, what happens in the prefill phase is tokenization, embedding, encoding, and everything. In the decode phase, you are actually going to iterate through the whole thing over and over, either deterministically or probabilistically. Like I said, each token produced is actually a probability distribution over a bunch of tokens. You can set configuration parameters where you are just going to be greedy and pick the most probable one and move forward. That’s the fast way to do it, but there are other ways. Without loss of generality, let’s just say that these are the two stages of the model and we’re taking one token at a time and moving forward.
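To make the two phases concrete, here is a minimal schematic in Python; `model_forward` and `sample` are hypothetical stand-ins for a real Llama forward pass and token sampler, not any particular library’s API:

```python
def generate(prompt_tokens, max_new_tokens, model_forward, sample, eos_id):
    # Prefill: one compute-heavy pass over the whole prompt builds the
    # KV cache and yields the first output token.
    logits, kv_cache = model_forward(prompt_tokens, kv_cache=None)
    token = sample(logits)
    output = [token]

    # Decode: one token per step; every step streams the weights from
    # memory again, which is why this phase is memory bandwidth bound.
    while token != eos_id and len(output) < max_new_tokens:
        logits, kv_cache = model_forward([token], kv_cache=kv_cache)
        token = sample(logits)
        output.append(token)
    return output
```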

Now let’s look at the Llama internals. When I say internals: you’re driving a car, and you want to look at the inside of the car. You have the engine, you have the transmission, all these things. If you have to make sure that your car is functioning well, you have to make sure you have a good engine, a good transmission, and so on. Let’s look at the internals here. What we see is that when Llama is running, there’s going to be a lot of matrix multiplication. Matrix multiplication is one of the key operations that happens here. Dot products are another. Scaling and softmax computation. Weighted sums. Last but not least, passing through multiple layers and aggregating everything. These are the primitives that go into the Llama internals.

In order to get the best performance, what we really need to do is optimize these primitives. That happens through the BLAS and MKL libraries that have been written. But we also need to optimize them for a given hardware platform. Some of it is common optimization; the other part is specific to a particular piece of hardware. That’s something we have to be aware of: what is the latest software that optimizes these primitives, and what are the latest libraries that optimize them for the particular hardware you are running on?
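As a toy illustration of those primitives, here is single-head scaled dot-product attention in plain NumPy; production stacks hand these same operations to optimized kernels (BLAS, oneDNN, ZenDNN) tuned for the target hardware:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Matrix multiply, scale, softmax, then a weighted sum."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # matmul + scaling
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax
    return weights @ V                              # weighted sum

Q = np.random.rand(4, 64)
K = np.random.rand(4, 64)
V = np.random.rand(4, 64)
out = scaled_dot_product_attention(Q, K, V)         # shape (4, 64)
```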

Next, let’s talk about metrics. Metrics are clearly something that comes from the user; a user decides what the metrics are and what is important to them. Let’s look at it. I’m showing you pretty much the same diagram that we showed before, but laid out slightly differently. What happens in any LLM, not just Llama: you give the input. If you have typed into ChatGPT, which all of us have, you wait for some time, especially if the input is long. It goes blink, blink, blink. I don’t know whether my words are lost in the ether or what happened. That is the time the initial prefill phase is working on it. Then at the end of it comes the first token. Something is happening there, and I’m happy.

Then after that, all the tokens follow. Sometimes, when the output is large, you can see more coming as you are trying to read. It doesn’t all appear in one shot; all these tokens are coming slowly. A token is not exactly a word, but tokens are usually converted to words. I won’t go into the details of that at this point. What are the metrics here? The time it took from when I gave my input to when I started seeing something appear on my screen: that’s the time to first token. I’m reiterating it just for completeness of the talk. Then all these tokens appear, all the way to the end. The total latency is something that’s very important. I care whether I got my entire response or not. The flip side of that is throughput: tokens generated divided by latency, basically. Throughput is something that’s super important for all of us. We are ready to wait a little bit longer if we are really asking it to write an essay. If I’m just asking a yes or no question, you better be quick. I’m going to pretty much say the same things.

Basically, throughput is the real performance. Throughput actually marks how the system is being used. TTFT is something that’s super important; it is a result of that compute intensive phase. TTFT can be improved quite a bit by specialized hardware. For example, AMX is used for this, and that definitely helps reduce the TTFT. GPUs also reduce TTFT. Throughput, on the other hand, is mostly controlled by the memory bandwidth, because the model is loaded over and over. The larger your output gets, as it is pumping out more tokens, the more the throughput is controlled by the memory bandwidth.
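A back-of-the-envelope calculation shows why decode throughput tracks memory bandwidth. If every generated token has to stream roughly the full set of weights from memory, bandwidth divided by model size gives a ceiling on tokens per second. All the numbers below are illustrative assumptions, not measurements from this talk:

```python
model_params = 8e9        # assumed 8B-parameter model
bytes_per_param = 2       # BF16/FP16 weights
mem_bw = 400e9            # assumed ~400 GB/s per socket

bytes_per_token = model_params * bytes_per_param  # ~16 GB streamed per token
tokens_per_sec = mem_bw / bytes_per_token         # ~25 tokens/s ceiling
print(f"batch-1 decode ceiling ≈ {tokens_per_sec:.0f} tokens/s")
```

Batching raises this ceiling, because the same streamed weights are reused across every prompt in the batch; that is exactly the batch-size effect discussed later.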

Deployment Models – How Are Llamas Deployed?

Now let’s talk about the deployment models. I just thought it is important to show where CPUs fit in, whether you’re using GPUs or just CPUs. Smaller models in particular can run very well on CPUs alone. Even when you’re running on GPUs, a CPU is involved; a CPU is connected to the GPUs. That setup typically suits larger models and allows for mixed precision, better parallelism, and all that. Talking about GPUs, you can see deployments like these as well: basically a network of GPUs. What is important in this case is how these GPUs are connected. GPUs have to be connected through NVLink or InfiniBand. They need fast connections between all the GPUs, and between the CPU and GPU as well.

Typically, CPU connects to a GPU through the PCIe. It can get even more complex. You can have not just a network of GPUs, but even the inputs can be fed in a different way. You can have audio input, video input, and they’re processed differently, fed into the model, and then there are layers that are being handled by different sets of GPUs. You can make this as complex as you want to. We’re not going to get into all these complexities. Life is difficult even if you take a simple case. Let’s stick to it.

Llama Parameters

Let’s get into the details. First, let’s get familiar with some of the jargon that we use. It’s not exactly jargon, we’re all familiar with it, but let’s get it on the board here. The main three parameters that we’ll talk about are input tokens, output tokens, and batch size. Those are three things that you will hear whenever you look at any benchmark publication or workload details, not just for Llama, but for LLMs in general. Input tokens: clearly, that’s what you’re typing on the screen. That’s the input you’re providing. A paragraph, a quick set of prompts, whatever. Output tokens are what it produces. Tokens, again, are not the words that you see, but tokens are related to the words that you see. What is batch size? Batch size can go all the way from one to whatever you want. What does it look like here? Basically, if you give just one thing at a time, you give one prompt, wait for the response, then give the next prompt. You can say, tell me a story. Llama tells you a story.

Then, tell me a scary story. It tells you another one, and so on. You can also give multiple prompts at the same time; the slide shows an example with batch size equal to 4. All of them will be processed together. What can also happen is that some prompts finish earlier than others, so the parallelism doesn’t stay the same throughout. There is also work on things like dynamic batch sizes and so on. We’ll not get into all those details right now; we’ll keep it simple. We’ll assume that if I say batch size equals 4, a batch of 4 is given, a batch output of 4 is produced, then the next 4, and so on. What is a Llama instance? That’s the first thing I put there. A Llama instance is the Llama program that you’re running. You can run multiple instances on the same system; you don’t have to run just one. We will talk about how things scale and so on. Each instance is an instantiation of the Llama program.

Selecting the Right Software Frameworks

Let’s talk about selecting the right software frameworks. First, everything started with PyTorch. That’s the base framework that we started with. It has good community support, but it doesn’t have anything special for any particular hardware; it’s not optimized. You can consider it the baseline. Then came TPP, which was created by Intel. A lot of what you see in TPP is optimized more for Intel, but given that, it still works pretty well on AMD as well. We get a good gain going from the baseline to using TPP.

Then came IPEX. IPEX actually incorporates TPP right within; it was built on top of TPP. That, again, was done by Intel, and it also benefits Intel a little bit more than it benefits AMD. Last but not least is our favorite thing, which is ZenDNN. How many of you have used ZenDNN? It was recently released. The thing about ZenDNN is that it builds on top of what is already there, obviously, and it gives a good boost to performance when you run it on AMD hardware in particular. Let’s look at some numbers, since I said this one is better than that one and so on.

What I’m plotting here is various software optimizations: baseline, TPP, IPEX, and finally Zen, with the baseline marked as equal to 1. You can see going from baseline to TPP is more than a factor of 2. These are all performance numbers based on our hardware only; there are no competitive benchmarking numbers presented in this talk. Going from baseline to IPEX, you get nearly a 3x. Then with Zen, you get even more of a boost above where IPEX is. One thing I have to say: with Zen, the advantage that you get will keep increasing as the batch size increases. It is actually optimized to benefit from the higher core counts that we have, number one.

Secondly, from the large L3 caches that we have as well. There’s a lot of code refactoring and a lot of optimization that went into it, and the benefits actually increase. As you can see with the two batch sizes that I’ve shown, 1 and 16, it shows a 10% advantage over IPEX at the start, and then it goes a little higher. I know I’ve not added those graphs, but the benefit does increase as the batch size increases.
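For reference, AMD’s ZenDNN plugin for PyTorch is exposed as a `torch.compile` backend; a minimal sketch of its use follows (the toy model is a placeholder, and the API should be checked against the current zentorch documentation):

```python
import torch
import zentorch  # registers the "zentorch" backend with torch.compile

# Placeholder model; in practice this would be the Llama module.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
).eval()

with torch.no_grad():
    compiled = torch.compile(model, backend="zentorch")
    y = compiled(torch.randn(16, 4096))  # first call triggers compilation
```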

Hardware Features and How They Affect Performance Metrics

Let’s come to the core of this talk. Here you have various hardware features, and the question is, how are they going to affect my performance metrics? How do we use all of them optimally? Let’s first talk about cores. In this graph, what I’m showing is how Llama scales. I’m using a single instance of Llama at sizes of 16, 32, 64, and 128 cores. You can see I’ve gone up eight times in size from the leftmost to the rightmost. That’s a factor of 8, but the amount of gain that I got is less than 50%. The software does not scale. There are multiple reasons for it; we can get into that.

Basically, the performance that you get with size 16 seems to be mostly good enough. Maybe I should run multiples of 16 rather than one large instantiation of the same thing. What I plotted in the previous graph was basically throughput. The throughput doesn’t scale a whole lot, and as somebody who’s trying to get the most out of the system, throughput is the main thing I’m interested in. The user is also interested in the TTFT. The TTFT does benefit when you make the instance larger, though not a whole lot. As you can see, going from 16 to 128, it dropped by about 20%. So the parallelism and CPU capacity that you’re throwing in does benefit TTFT somewhat.

Moral of the story: additional cores offer only incremental value. TTFT also benefits. The reason these two have to be taken together is that there could be a requirement on the TTFT when you’re working with a customer: I want my first token to appear within so many milliseconds or seconds, whatever it may be. You have to bear that in mind when you’re asking, can I make the instance really small and have a whole lot of them? There is another consideration as well: when you have too many instances, each instance is going to consume memory. I’ll come to that later. You may not have that much memory to work with.

Next, let’s talk about SMT, simultaneous multi-threading. You have a core, and on each core there are two sibling hardware threads operating. Are you going to get benefit from using the SMT thread? Let’s take a look. The blue lines here show you the performance improvement; I’m only plotting the improvement, not raw numbers. Here I’m running a single instance of size 16, with nothing else running on the system. Remember, my CPU has 128 cores. These are all run on our Turin system. We have 128 cores, but I’m using only 16 of them. That means the background is very quiet. Nobody else is using the memory bandwidth, so bandwidth-wise we are not constrained at all.

The only constraint here comes from the core itself; it’s CPU bound. What happens there is that you get a good boost by using the SMT thread. If you actually run it twice, with and without the SMT thread, you can see that you get a good boost. The orange line, on the other hand, is the kind of boost that you will see when you’re running everything: all the 16-core instances together. What happens then is that you’re actually constrained by the memory bandwidth. Your memory bandwidth becomes the constraint, and really there is no advantage or disadvantage. As you can see, the percentage is in single digits; the statistical variation is what you’re seeing there, nothing else.

Moral of the story again: SMT does not hurt, even in the fully loaded case, but it is going to give you a lot of benefit if the background is quiet. That’s particularly important because, let’s say you’re running on a cloud, on AWS or one of these platforms, you shouldn’t assume everybody else is running a Llama. You take an instance of size 16 and you run there; most likely everybody else is quiet or doing very little. You will get that benefit, so use your SMT there.

This is the most important thing; you will see a big difference here: memory bandwidth. What is the role of memory bandwidth? What I did was this: on a Turin system, we have memory running at 6,000 megatransfers per second. I clocked it down to 4,800, a 20% reduction in bandwidth. The question was, how much is that going to affect the overall performance? Remember I told you that the prefill phase is affected mostly by the CPU, and the decoding phase by the memory bandwidth. When you clock the memory down, that’s a substantial difference, and here you can see that role play out. There are two things that I plotted here.

The first is a single instance, the dark brown one. When I just run a single instance, it doesn’t matter whether my memory bandwidth is 6,000 or 4,800; I have plenty for a single instance. When I run all the instances, that is when you can see the memory bandwidth really hitting hard, and performance gets affected very badly. Basically, the moral of the story here is: use as much bandwidth as you can get. If the cloud is going to constrain the amount of bandwidth you get, it’s worth paying for extra bandwidth if you have to. I know that how much bandwidth each instance gets can be controlled.

Next, let’s talk about the role of caches here. I don’t have a graph, but I can talk through this. Caching is important. Remember, we are constrained on memory bandwidth; if we can get the data from caches, it’s better. However, you cannot fit the whole model into cache. What really happens is your model gets loaded over and over again. Just by the nature of this particular workload, it’s a use-and-throw model. That means you’re going to load the weights, use them to compute something, and that’s it. You’re not going to reuse them. The only way you can reuse them is if you use a higher batch size: if you’re going to process 64 prompts together, all 64 will be using the same weights to do the computation.

Otherwise, if you’re just using a batch size of 1, it is a use-and-throw model, and you can actually see a very large L3 miss rate. Using a higher batch size is crucial in order to increase cache reuse. Earlier I talked about what happens as you scale a particular instance; now I’m going to talk about what happens when you change the number of instances: as you use more instances, how does your performance scale? The two bars that you see there: the blue one, which I’m using as the basis, is running just a single 16-core instance. The orange ones are running 16 of the 16-core instances, 16 of them in parallel.

If everything were ideal, the height of the orange bar would have been 16; it would have been 16 times the performance. But no, other constraints come into play. Your memory bandwidth is a big constraint. You don’t get 16x, but you get nearly 10x to 12x performance overall compared to the single instance that has no memory bandwidth constraint. This is for different situations: chat is short input, short output. Essay is short input, long output. Summary is very long input, short output. Translation is long on both. In all the cases, you’re getting at least a 10x improvement against the baseline. Basically, running parallel instances is the way to go, pretty much.

I talked about batches a lot. I said use higher batch sizes. What is our return? Throughput-wise, look at the return that we are getting. As the batch size increases, going from 1 to 128, I got more than 128x of performance. The reason is I’m getting much higher L3 hit rates. I’m getting things from the cache instead of going to the memory. I’ve reduced my latency. I’ve made my CPU much more performant here. I’m using my cores a lot more. I’m getting more than 128x return when I’m using 128 batch size. You don’t get anything for free. The place where it hurts is the TTFT. Your TTFT actually goes up also. That’s not a good thing. That makes sense.

If I’m working on 20 projects at the same time, everybody is going to be complaining. All my customers are going to be complaining that I’m not giving them their solution, even though I’m working day and night. But that’s what matters to us as users of computers: we want to use our machines day and night and get the maximum throughput. After one month, everybody will get their answer when I’m working on 20 projects; the next day, probably not. That’s what we are seeing. It comes at the cost of TTFT. The TTFT does grow as you run more in parallel.

These are various things that I already said. Do not use larger instances; use more instances if you can. To harvest performance, use larger batch sizes. The whole thing is going to be a balancing act, for the most part, between TTFT and overall memory needs. I said memory needs, and I want to get into that now. TTFT is a requirement placed on you by a customer. Memory needs are something that grows as the number of parallel instances increases, and as the batch size increases as well. Some formulae: fundamentally, the memory need comes from three different factors. Number one, the model itself. The larger the model, the more you’re going to load. If you have an 8-billion-parameter model, each parameter is going to take 2 bytes. The next thing that adds to the memory is the activations.

Last but not least, the KV cache as well. The KV cache keeps growing as you’re processing more tokens. The total requirement comes from all three of them together. Basically, if you’re running multiple instances, let’s say I computed my memory need as 41 gigabytes per instance, then it’s 41 times 32 if I start 32 instances in parallel. I said 32 instances because, typically on AMD systems, we like to keep an instance on one CCD, the Core Complex Die, which has 8 cores. We have 32 of them, so 8 times 32 is our whole system. That comes to 1.3 terabytes, which is really close to the total amount of memory I have on the system, 1.5 terabytes. This is where, when you have too many instances, you start seeing swapping. We urge you to do this calculation to get a good idea of how to get the maximum out of the system. You don’t want swapping. Swapping is not a good thing.
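Here is that back-of-the-envelope calculation as a small script. The shape parameters and the activation allowance are assumptions for a Llama-3-8B-class model, so the totals will differ from the 41 GB figure in the talk; the structure of the estimate (weights + KV cache + activations, times instance count) is the point:

```python
GB = 1024**3
params, bytes_per_param = 8e9, 2            # 8B parameters in BF16
layers, kv_heads, head_dim = 32, 8, 128     # assumed model shape
batch, seq_len = 16, 4096                   # serving configuration

weights_gb = params * bytes_per_param / GB
# K and V per token per layer, 2 bytes per element:
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2
kv_gb = kv_bytes_per_token * seq_len * batch / GB
activations_gb = 4.0                        # rough, framework-dependent allowance

per_instance_gb = weights_gb + kv_gb + activations_gb
instances = 32                              # one per 8-core CCD, as in the talk
print(f"per instance ≈ {per_instance_gb:.1f} GB, "
      f"total ≈ {per_instance_gb * instances / 1024:.2f} TB")
```

If the total approaches the installed memory, reduce the instance count, the batch size, or the maximum sequence length before the system starts swapping.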

I know I did a back-of-the-envelope calculation there. That calculation is, most of the time, a slight underestimate. It depends on which framework you are using. With ZenDNN, it’s pretty close in most cases, as you can see. In the case of IPEX, it was using even more memory. I have seen this go the other way as well, not for Llama but for some other use cases that we have run, where ZenDNN will take more memory, and so on. I’m not making a general statement about this here. The point is, you have to be aware that this is only a back-of-the-envelope calculation, and you have to look at what your framework is actually using.

Whether you’re using ZenDNN or IPEX, just check how much total memory your instance is going to use. One more thing I want to say: free floating versus dedicated. Please pin your instances. The case I’ve shown is probably the worst case; it happened in at least one run. It doesn’t always happen, but when you pin, each instance runs on a different set of cores, and you get the returns proportionately. When you don’t pin, there is no telling: pretty much all of them may run on the same bunch of cores, or they will be context switching back and forth. Either way, you pay the penalty. It’s not usually going to be as bad as this, but it can be.
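One way to pin from a launcher script is to set the CPU affinity of each child process to its own CCD’s cores before it starts; `numactl --physcpubind` achieves the same from the shell. The serving command below (`run_llama.py`) is a hypothetical placeholder:

```python
import os
import subprocess

CORES_PER_CCD = 8
NUM_INSTANCES = 4  # launch one instance on each of the first four CCDs

procs = []
for ccd in range(NUM_INSTANCES):
    cores = set(range(ccd * CORES_PER_CCD, (ccd + 1) * CORES_PER_CCD))
    procs.append(subprocess.Popen(
        ["python", "run_llama.py", "--port", str(8000 + ccd)],
        # Pin the child to its CCD's cores before it execs.
        preexec_fn=lambda c=cores: os.sched_setaffinity(0, c),
    ))

for p in procs:
    p.wait()
```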

Summary

Recommendations for optimization: the initial part of the run is core bound, and the second part is memory bound, so get more memory bandwidth if you can. Parallelism helps. Use the best software for your hardware; for Zen, we definitely recommend zentorch. These things will evolve with time, so do your due diligence and homework to identify the best software for your case. Pin instances as much as possible.

Questions and Answers

Participant 1: How are you capturing some of these metrics? What specific metrics and what tools are you using for the metric calculation of observability?

Hariharan: We know how many tokens we are sending. Typically, when we run Llama, we know our input, output tokens. The output tokens is the total number of tokens that are produced by the model, and we know how long it took to run. That’s what we use to compute the throughput. Again, for TTFT, what we do is, for any particular input token size, we set the output token equal to 1, and run it and estimate what the TTFT is going to be. Typically, we know that that is something that the user is actually simply waiting for.

Participant 1: How about CPU and other physical metrics, especially on cloud providers? Are you using hardware counters? How are you measuring swapping?

Hariharan: I have not run it on the cloud yet. Running it on bare metal, we have our regular tools to measure the utilization and also the counters and everything. We have our own software, and general-purpose software as well.

See more presentations with transcripts



Podcast: Balancing Coupling in Software Design with Vlad Khononov

MMS Founder
MMS Vlad Khononov

Article originally posted on InfoQ. Visit InfoQ

Transcript

Thomas Betts: Hello and welcome to another episode of the InfoQ Podcast. Today I’m joined by Vlad Khononov. Vlad is a software engineer with extensive industry experience working for companies large and small in roles ranging from webmaster to chief architect. His core areas of expertise include software architecture, distributed systems, and domain-driven design. He’s a consultant, trainer, speaker, and the author of Learning Domain-Driven Design. But today we’re going to be talking about the ideas in Vlad’s latest book, Balancing Coupling in Software Design. Vlad, welcome to the InfoQ Podcast.

Vlad Khononov: Hey Thomas. Thank you so much for having me.

Balance coupling is the goal, not no coupling [01:07]

Thomas Betts: So the title of your book, Balancing Coupling, and I think a lot of architects and engineers are familiar with the idea of wanting low coupling, we want to have our systems loosely coupled. But as your book points out, that’s really an oversimplification that we don’t want to have no coupling, we need to have a balanced coupling. So can you explain why that’s an oversimplified idea to say, we just want loose coupling everywhere?

Vlad Khononov: Yes. So by the way, loose coupling is okay. What I’m really afraid of is people saying, let’s decouple things. Let’s have completely independent components in our system. That’s problematic, because if you ask yourself, what is a system? What makes a system? The answer is: a system is a set of components working together to achieve some overarching goal. Now, in order to achieve that goal, it’s not enough to have those components; they have to work together. Those interactions are what makes the value of the whole system greater than the sum of its components, the sum of its parts. And those interactions are what we usually call coupling. If you look that word up in a dictionary, coupled means connected.

So to make the system work, we need coupling. Now, of course, too much of a good thing is going to be bad. We need water; any living organism that we know of on this planet needs water to survive. However, if you drink too much water, well, guess what’s going to happen? Nothing good. Same with coupling. We cannot eliminate it, because just as in the case of water, the system is not going to survive. So we need to find that “just right” amount of coupling that will keep the system alive and allow it to achieve that overarching goal.

Thomas Betts: I like the idea that if we add too much water, maybe that’s how we get to the big ball of mud, where everything is completely connected. And we can’t see where there should be good separations between those couplings; you can’t see the modules that should be there, the ones that make the system understandable. And I know part of it is that we want to get to modules small enough that we can understand, work with, and evolve over time, without having to handle the entire big ball of mud, if you will.

If the outcome can only be discovered by action and observation, it indicates a complex system [03:35]

Thomas Betts: So that coupling itself, that’s not the problem. The problem really is the complexity. And I think people sometimes conflate the two: if I have a highly coupled system where everything’s talking to everything, that’s causing the complexity. Can you distinguish how coupling and complexity are not always the same thing, how one isn’t always bad?

Vlad Khononov: Yes. That’s a great point. The thing is, when we are designing a system, we need to find that “just right” amount of coupling to make it work. And if you go overboard, as you said, we’ll end up with that monster we usually call the “big ball of mud”. And that pretty much describes what we are afraid of: complexity. I guess anyone with a few years of experience in software engineering has had the experience of working on a big-ball-of-mud project. Maybe it works, but nobody has the courage to modify it, because you don’t know what’s going to happen following that change: whether it’s going to break now, or break a week after it was deployed to production. And what is going to break? That relationship between an action and its outcome is my preferred way of describing complexity.

If you’re working on a system and you want to do something, and you know exactly what’s going to happen, that’s not complexity. If you can ask someone, and some other external expert knows what’s going to happen, that’s not complexity either. However, if the only way to find out the outcome of the thing you want to do is to do it and then observe what happens, then you’re dealing with a system that is complex, and that means that the design of that system makes those interactions much harder than we as people can fathom. We have our cognitive limits, our cognitive abilities, if you look at studies, they’re not looking good by the way. And it means that the design of that system exceeds our cognitive abilities, it’s hard for us to understand what’s going on there. Of course, it has something to do with coupling. However, it’s not because of coupling, but because of misdesigned coupling.

Thomas Betts: Yes. And I think your book talks about the idea of sharing too much knowledge, that coupling is where knowledge is being transferred. So there’s the idea of cognitive load being exceeded: the knowledge I have to have in order to troubleshoot this bug is, I have to understand everything. Well, I can’t understand everything and remember it all, so I’m just going to try and recreate it. And in order to recreate it, I have to have the full integration stack, right? I have to have everything running, be able to debug all the way through. And the flip side of that is somebody wants to have that experience because they’re used to the big monolith, the big ball of mud. They’re like, “I don’t understand it, so I’m going to just see what happens”.

Once they’re working in microservices, they get to, “Well, I can’t actually step through the code once I send the request to the other service, so how do I know what happens?” How do you help people get into the mindset that you’re making it better, but it’s a paradigm shift: you can’t just run everything, and the benefit is you don’t have to know about it once it goes past that boundary?

Three dimensions of coupling [07:23]

Vlad Khononov: Yes. And that’s the thing about coupling, we are way too used to oversimplifying it. As in, “Hey, coupling is bad. Let’s eliminate all the coupling, that’s how we get modular software systems”. However, if you look at what happens when you connect any two components, when you couple any two components in a system, what happens beneath the surface? Then you’ll see that coupling is not that simple; it’s not one-dimensional. Actually, it manifests itself in three dimensions. As you mentioned, first of all, we have that knowledge sharing. You have two components working together. How are they going to work together? How are they going to communicate with each other? How are they going to understand each other? They need to exchange, to share, that knowledge.

Then we have the dimension of distance. If you have two objects in the same file, the distance between the source code of the two objects is short. However, if those two objects belong to different microservices, then you have different code bases, different projects, different repositories, maybe even different teams. Suddenly the distance grows much bigger. Why is that important? Well, the longer the distance traveled by the knowledge, the sooner it’ll cause that cognitive overload. And we’ll say, “Hey, that’s complexity. We need to decouple things”. So distance is a very important factor when designing coupling.

And the third dimension is the dimension of time, of volatility. Why do we care? We want to be able to change the system. We want to change its components, their behavior. Maybe we will modify existing functionalities, maybe we’ll add new ones. For that, we want to make sure that the coupling is just right. However, if that change is not going to happen, maybe because the component is part of a legacy system, or maybe the business is not interested in investing any effort in that specific area, then the effect of coupling is going to be much lower. So we’d better prioritize our efforts on other parts with higher volatility.

Distance and knowledge sharing are intertwined [09:49]

Thomas Betts: So I want to talk about that distance part first. I think that’s a new way of thinking about the problem, because I think we can all relate to: I’m going to separate this into microservices and that’ll solve my problem. But go back to the combination of how much knowledge is being shared and how far away it is. If I have all the code in my monolith, then the distance between the code is pretty low, right? I can change all the code all at once, but that also leads to a lot of complexity, because I might not be able to easily see what code I need to change because there’s too much of it.

Now, if I take the microservices approach, I can say, I only need to change this. There’s only so much code to look at, I can understand it. But if I make a change here, I also need to make a change in this upstream or downstream service; they have to know that I’m making a change. Then you’re saying that’s where the knowledge comes in, that the knowledge being shared is what couples them tightly. Is that a good explanation of what you’re trying to say?

Vlad Khononov: Yes, yes. That’s where complexity gets complex. Essentially, we have two types of complexity when working on any system. First, let’s say that you’re working on one of its components, and it is a small big ball of mud, let’s call it a small ball of mud. Then we could say that the local complexity of that component is high. We don’t understand how it works, and if we want to change something, we don’t know what’s going to happen. Now, there is another type of complexity, and that’s global complexity. This one is about the interactions at a higher level of abstraction. Say we have our component and other components of that system, and they’re integrated in a way that makes it hard to predict what changing one of the components is going to do, whether it’s going to require simultaneous changes in other components. That’s global complexity.

The difference between the two, as you mentioned, is distance. Way back when the microservices hype started, people wanted to decouple things by increasing the distance, because previously we had all the knowledge concentrated in a monolith, let’s call it the old-school monolith. Everything in one physical boundary. Back then, decoupling involved extracting functionalities into microservices, so we increased the distance. However, way too many projects focused just on that, on increasing the distance. They were not focused enough on, “Hey, what is that knowledge that is going to travel that increased distance?” And that’s how many companies ended up transforming their old-school monoliths into new, shiny distributed monoliths. They traded local complexity for global complexity.

Coupling is only a problem if a component is volatile [13:04]

Thomas Betts: And that only becomes a problem when that third dimension, volatility, rears its head. Because as long as those two things don’t change, the fact that they share knowledge over a long distance shouldn’t matter. But if one of them has to make a change that affects the other one, now you’ve got the distributed-ball-of-mud problem: everything in two different services has to change. You actually made the problem worse by going to microservices. So that’s where all three factors have to be considered, correct?

Vlad Khononov: Yes, exactly. And that’s funny, because all those companies that tried doing that, of course, didn’t decompose their whole systems on the very first day of that microservices endeavor. No, they started with a small proof of concept, and that proof of concept was successful. So they said, “Hey, let’s go on. Let’s proceed and apply the same decomposition logic everywhere else”. Now, the difference is that a POC is usually done on something that is not business critical; its volatility is low. So you are kind of safe introducing complexity there. The mistake was taking those less business-critical components, extracting them, and thinking the same result would be achieved with other components of the system. And of course, once you step into that distributed big ball of mud area, suddenly microservices became evil and people started praising monoliths.

Thomas Betts: Right. We didn’t understand what we were doing, we didn’t understand what we were trying to accomplish. We thought the problem was “everything’s too close, we’ll solve it by just moving it apart”. But you have to factor in: how is the knowledge changing? How is the volatility affected? Because yes, that first one might work; it doesn’t matter if things are close together in one monolith or separate. If there’s no volatility, if things aren’t changing, it doesn’t matter where it lives.

But then you get to something we’re going to be making changes to really quickly. That was the other thing people said: if we go to microservices, we can make changes really quickly. And maybe they do make more changes faster, but they run into all these issues where separate teams in separate modules and separate microservices are trying to change things all at once. That leads back to, we still have to have all this communication, or we have this major integration step that you weren’t ready for, because you did the thing wrong. When you move to microservices, you have to consider all three factors. What is changing? And if I know it’s going to change, what do I do differently? Because obviously we still want to break those things up. But how do I say, this is going to be a volatile module, it’s core business, it’s going to be evolving. What’s the solution then? Because I want to be able to change it.

Distance affects where code lives as well as the lifecycle to maintain related components [16:22]

Vlad Khononov: Yes. That dimension of space, distance, is very tricky, and what makes it even trickier is that it has, let’s call them, sub-dimensions. First we have the physical distance between source code. The greater that distance gets, the harder it is going to be to modify the two components simultaneously. That’s one thing. We have another force that works in the opposing direction, and that’s lifecycle coupling. The closer things are, the more related their lifecycles: they will be developed, tested, and deployed together, if you have components implemented in the same physical boundary, for example.

As you go toward the other end, you are reducing those lifecycle dependencies. And then we have socio-technical factors: are those two components implemented by the same team, or do we have to coordinate the change across multiple teams? Suddenly the distance can grow even larger, and the lifecycle coupling will be reduced even further. So distance is super important, but as you mentioned, what makes it all, let’s call it painful, is the knowledge that is going to travel that distance.

Thomas Betts: Right. So if I know that this thing is going to be changing, in some ways those changes affect the knowledge that is being shared, right? If I’m adding new features and functionality, that means there’s more knowledge in this module. And if I have to communicate those changes, that’s the challenge. So is the trade-off: if I’m going to have more volatility in this module, I have to reduce the knowledge that’s being shared, reduce the integration strength of how tightly those two things are coupled? Is that a matter of defining good API boundaries, for example?

Vlad Khononov: Yes. So we have to manage the knowledge that we are sharing across the boundaries; we have to make it explicit. Now, the thing about knowledge is, as you said, the more knowledge we’re sharing, the more cascading changes will follow, because the more knowledge we share, the higher the chances that a piece of that shared knowledge will change, and then we’ll have to communicate that change to the other, coupled component.

Four levels for measuring coupling [19:10]

Vlad Khononov: Now, how do we evaluate knowledge? What units should be used to measure knowledge? That’s a tricky question. It’s tricky, and I’m not sure we have an answer for it. However, what we do have is a methodology from the ’70s called structured design. In it there was a model for measuring, or for evaluating, interdependencies between components of a system, called module coupling. That model had six levels, and they were focused on the needs of systems that were written in those days. But essentially those levels describe different types of knowledge that can be exchanged across components’ boundaries.

In my model, the balanced coupling model, I adapted module coupling and changed its name to integration strength. I had to change the name because the levels of the model are completely different; again, they have to be accessible to people working on modern systems. I reduced the levels to four basic types of knowledge to make them easier to remember. And if you need finer-grained details, you can use a different model from a different era, called connascence, to measure the degrees of those types of knowledge.

Intrusive coupling [20:47]

Vlad Khononov: So here are the basic four types of knowledge, from highest to lowest. First of all is intrusive coupling. Say you have a component with a public interface that should be used for integration; however, you say, “Okay, that’s fine, but I have a better way. I will go to your database directly, pick whatever I need, maybe modify it”. In other words, intrusive coupling is all about using private interfaces for integration.

Once you introduce that dependency on private interfaces, you basically have a dependency on implementation details, so any change can potentially break the integration. With intrusive coupling, you have to assume that all knowledge is shared.
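
To make that concrete, here is a minimal, self-contained Python sketch; the component, table, and function names are all hypothetical. A reporting function bypasses the owning component’s public interface and queries its private database directly, so any internal schema change can silently break the integration:

```python
import sqlite3

def setup_orders_component(conn: sqlite3.Connection) -> None:
    # Private schema of a hypothetical Orders component. Its public
    # interface (not shown here) is the supported way to integrate.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO orders (status) VALUES ('PENDING'), ('SHIPPED')")

def count_pending_orders_intrusively(conn: sqlite3.Connection) -> int:
    # Intrusive coupling: this reporting code depends on private table and
    # column names. If the Orders team renames `status` or changes its
    # encoding, this query breaks with no warning.
    row = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status = 'PENDING'"
    ).fetchone()
    return row[0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    setup_orders_component(conn)
    print(count_pending_orders_intrusively(conn))  # 1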

Thomas Betts: Right. That’s the classic, if you have a microservice, you own your own database. And no one else is allowed to go there, they have to go through this boundary. And I like that you’re calling back to, these are papers written 50 years ago. And no one was talking about microservices there, no one was talking about having several databases, but it’s still the same idea; if I can structure this so that in order for this to go through, it has to go through this module. That’s why C++ evolved to have object-oriented design to say, “I have this class and it has behavior, and here’s public and private data”. And that’s what you’re talking about, if you can just get all the way through, there’s no point in having that public versus private interface.

Vlad Khononov: Yes. Yes. It’s funny, if you look at one of the books from that period, one that I particularly like is called Composite/Structured Design by Glenford Myers. And if you ignore the publishing date, it sounds like he is talking about the problems we’re facing today. It’s crazy. It’s crazy.

Thomas Betts: What’s the next level after that intrusive coupling?

Functional coupling [22:45]

Vlad Khononov: Yes. So after intrusive coupling, we have functional coupling. And here we’re sharing the knowledge of functional requirements. We’re shifting from how the component is implemented to what that component implements, what is that business functionality. Again, that’s quite a high amount of knowledge to share, because if you share that kind of knowledge, then probably any change in the business requirements is going to affect both of the coupled components, so they will change together.
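
A minimal sketch of what that can look like in code; the business rule and all names here are hypothetical. Two components independently encode the same functional requirement, so a change to the requirement forces both to change together:

```python
# Hypothetical business rule shared by two components: "orders over $100
# ship for free". The rule itself is the knowledge being shared.

# Checkout component: applies the rule when pricing an order.
def shipping_cost(order_total: float) -> float:
    return 0.0 if order_total > 100 else 7.99

# Marketing component: applies the same rule when rendering a banner.
# If the business moves the threshold to $150, both functions must change
# simultaneously; that is functional coupling.
def free_shipping_banner(order_total: float) -> str:
    if order_total > 100:
        return "You qualify for free shipping!"
    return f"Spend ${100 - order_total:.2f} more for free shipping."
```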

Model coupling [23:22]

Vlad Khononov: Next, we have model coupling, which means we have two components that are using the same model of the business domain. Now, DDD people will get it right away. But the idea is when we are developing a software system, we cannot encode all the knowledge about its business domain, it’s not possible. If you are building a medical system, you’re not going to become a doctor, right? Instead, what we are doing is we’re building a model of that business domain that focuses only on the areas that are relevant for that actual system. Now, once you have two components based on the same model, then if you have an insight into that business domain and you want to improve your model, then guess what? Both of them will have to change simultaneously. So that’s model coupling.
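
Here is a small Python illustration of that situation, with a hypothetical domain model: two components are built on the same model of the business domain, so refining the model, say, splitting a single address field into billing and shipping addresses, forces both components to change at once:

```python
from dataclasses import dataclass

# A shared model of the business domain. Both components below are built
# on it, which is model coupling: improve the model and both must change.
@dataclass
class Customer:
    id: int
    name: str
    address: str  # a modeling insight might split this into two addresses

# Billing component, built on the shared model...
def billing_label(customer: Customer) -> str:
    return f"{customer.name}\n{customer.address}"

# ...and the shipping component, built on the very same model.
def shipping_label(customer: Customer) -> str:
    return f"Deliver to: {customer.name}, {customer.address}"
```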

Contract coupling [24:17]

And the lowest level is contract coupling. Here we have an integration contract; you can think of it as a model of a model, one that encapsulates all the other types of knowledge. It doesn’t let any knowledge of the implementation model outside of the boundary, which means you can evolve the implementation without affecting the integration contract. You’re not letting any knowledge of functional requirements across the boundary, and of course, you’re protecting your implementation details.
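
And a sketch of contract coupling, again with hypothetical names: the component exposes a small, stable contract object, a model of its internal model, so the internals can evolve freely without the contract changing:

```python
from dataclasses import dataclass

# Internal model: private to the component and free to evolve.
@dataclass
class _OrderRecord:
    id: int
    status_code: int      # internal encoding: 0 = pending, 1 = shipped
    internal_notes: str

# Integration contract: a model of the model. It exposes only what
# consumers need, in stable terms, and hides the internal encoding.
@dataclass(frozen=True)
class OrderStatusContract:
    order_id: int
    status: str           # "pending" or "shipped"

def to_contract(record: _OrderRecord) -> OrderStatusContract:
    # The translation step is the boundary: internal changes stop here.
    status = "shipped" if record.status_code == 1 else "pending"
    return OrderStatusContract(order_id=record.id, status=status)
```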

Examples of the four types of coupling [24:51]

Thomas Betts: Right. So just to echo that back. You said DDD people will get this right away. Say I have a new invoice coming in that I want to pay; maybe I have an expense management system where somebody says, “Here’s a new thing to pay, I’m going to submit it to the expense management system”, and it has to go through an approval process to say, yes, it’s approved. Then all the way at the end we have our accounts payable person who’s going to log in and say, “Oh, I need to go pay this invoice, I have to pay the vendor”, right? There’s an invoice that flows all the way through the system, but if you say, “I need to know upfront how it’s going to get paid at the end, all the accounting details”, it’s tightly coupled.

If you think about it from the perspective of who’s doing the work, you might have the invoice request that starts in expense management, and then the paid invoice. The words sound the same, but the ubiquitous language says, in this domain, this is what this means. I work on accounting systems, so whether you’re in accounts payable or accounts receivable, we both have invoices, but they’re exactly the opposite: am I going to pay someone, or is someone going to pay me? And so ubiquitous language helps us reduce the cognitive load, because I know in this space I’m only talking about this part of the workflow, because it’s satisfying this person, this role, doing their job.

And so that maps to the levels of coupling you’re talking about. Contract coupling says, I’m going to hand off from here, to the next, to the next, and I don’t have to know what’s going to happen a week from now, because once it exceeds my boundary, I’m done with it. And intrusive coupling is, they’re all editing the same database record and everybody knows about all the details. And somewhere in between is, I have to know that there’s this next workflow of pay the invoice versus submit the invoice, and everybody knows about those things. Is that a good example of how to see those different layers?

Vlad Khononov: Yes, absolutely. Absolutely. There are so many creative ways to introduce intrusive coupling, such interesting death-defying stunts we can pull. For example, maybe you’re not introducing a direct dependency, but you rely on some undocumented behavior; that’s intrusive coupling. Or maybe you’re working in, let’s say, an object-oriented code base, and a component that you are interacting with returns you an array or a list of objects, and then you can go ahead and modify it. And because it’s a reference type, that’s going to affect the internals of that component. So that’s another creative example of intrusive coupling. By the way, a reader of the book sent that one to me, and I was like, “Oh, why didn’t I think of it when I was writing the book? It’s such a great example”.
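
That leaked-reference stunt is easy to reproduce in Python, where lists are reference types too. A minimal sketch with hypothetical names, including the usual defensive-copy fix:

```python
class Catalog:
    def __init__(self) -> None:
        self._products = ["book", "lamp"]

    def products_leaky(self) -> list[str]:
        # Leaky: returns a reference to internal state. A caller that
        # mutates the returned list mutates the catalog's internals,
        # intrusive coupling through an undocumented back door.
        return self._products

    def products(self) -> list[str]:
        # Safer: return a copy, so internal state cannot be modified
        # from outside the component's boundary.
        return list(self._products)

catalog = Catalog()
catalog.products_leaky().append("oops")  # silently corrupts internals
print(catalog.products())                # ['book', 'lamp', 'oops']
```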

Modularity is the opposite of complexity [28:01]

Thomas Betts: Yes. Well, I think what you’re describing is the difference between local and global complexity, right? We think about these problems in terms of microservices, I’m going to separate big modules out. But the same problems occur within our code base, because even if you’re working in a monolith, you can structure your code... This is where the book talked about modular monoliths. You can set up your code so that, even if it’s stored in one repository, it’s easier to understand. And that gets to, this class doesn’t have to know about the 900 other classes in the project, I only know about the 10 that are close to me.

Vlad Khononov: Yes. Exactly. And by the way, it brings us back to the topic of complexity, or rather the opposite of complexity. If we define complexity as a weak, unpredictable relationship between an action and its outcome, then modularity is the opposite: a very strong relationship between an action and its outcome. So if we want to design a modular system, we want to be able to know what we have to change, that’s one thing. And the second thing is, once we make the change, we know what’s going to happen. That, I would say, is the idea of modularity.

Modular monoliths can reduce complexity [29:19]

Vlad Khononov: Now, how can we do it? How can we achieve what you described? Let’s say that you have a monolith; it can be a big ball of mud, but it can also be a modular monolith. The thing is, the core ideas are the same. You can increase the distance without stepping across the monolith’s physical boundary: you can introduce distance in the form of modules within it. You can put related things together. Because let’s say you have one boundary with lots of unrelated things. And how can we define unrelated things? Things that are not sharing knowledge between them.

If unrelated things are located close to each other, that increases the cognitive load of finding what we have to change, right? So we can reduce the cognitive load by grouping related things, those components that have to share knowledge, into logical groups, logical modules. And that’s how we can achieve modular monoliths, which is, by the way, in my opinion, the first step toward decomposing a system into microservices, because it’s way easier to fix a mistake while you are still in the same physical boundary.
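
As a rough single-file illustration of that grouping, with entirely hypothetical names: in a real repository each section below would be its own package, and only the small public interfaces would cross module boundaries:

```python
# A single-file sketch of a modular monolith layout (hypothetical names).
# In a real code base each section would be its own package:
#
#   ecommerce/
#       orders/        <- components that share the order model
#       billing/       <- components that share the billing model
#
# Related things live together; only small public interfaces cross over.

# --- ecommerce/orders (module-internal knowledge stays here) ------------
_ORDERS: dict[int, str] = {}

def place_order(order_id: int) -> None:
    _ORDERS[order_id] = "pending"        # storage details are private

def get_order_status(order_id: int) -> str:
    return _ORDERS[order_id]             # public interface of the module

# --- ecommerce/billing (talks to orders via its public interface only) --
def bill_order(order_id: int) -> str:
    status = get_order_status(order_id)  # no peeking at _ORDERS directly
    return f"Billing order {order_id} (status: {status})"

place_order(42)
print(bill_order(42))  # Billing order 42 (status: pending)
```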

Thomas Betts: Right. You’re keeping the distance a little bit shorter; you’re separating it logically into separate namespaces, different directory structures, but you’re not making a network call, right?

Vlad Khononov: Exactly.

Thomas Betts: That’s definitely increasing the distance. You’re not necessarily handing over to another team. You might be, but maybe it’s still the same team just saying, “Hey, I want to be able to think about this problem right now, and I don’t want to have to think about these other problems”, and so let me just split the code. But that requires you, as the architect designing this, to ask, “What makes sense? What do I move around? Where am I having trouble understanding it because there’s too much going on, too much local complexity? Let’s look for that and figure out how to increase the distance a little bit, so that the knowledge being shared stays within the things that are close”. And you start checking: have I introduced distance without reducing the knowledge? What you’re trying to do is have the knowledge transfer, that integration strength, go down when you’re adding distance, right?

If shared knowledge is appropriately high, then balance it with distance [31:45]

Vlad Khononov: Yes. Yes, absolutely. We always want to reduce integration strength; we always want to minimize the knowledge. But if you’re familiar with the business domain, you kind of know: hey, here I need to use the same model of the business domain, here we have closely related business functionalities. So no matter how much you want to reduce it to the minimum, you can’t. You have to remain at that level of, let’s say, functional coupling. Once you observe that level of knowledge being shared, you have to take it into consideration and balance it with another dimension, which is distance. Don’t spread those things apart, because otherwise that’s going to mean cognitive load and, as a result, complexity.

Thomas Betts: Right. And again, this is where the volatility comes into play. So if I’m focused on going from our big ball of mud to a more organized modular monolith, I can look at: where are we seeing lots of changes? Where is the business evolving a lot, and where is it not? And then I can focus. Say we’re going to pull one service out because we actually have scaling needs; we need to make sure this part of the system can grow to 10 times the size, but the rest of it doesn’t need to scale up as much. For those decisions you can look at, well, what’s volatile? And then if you pull it out of the monolith, you ask, “I’m adding the distance, but have I reduced the knowledge to a safer coupling level?” If I’ve kept that high integration strength, where you still know about my private methods and how to call my database even though I pulled you out, then I haven’t actually done anything to solve the volatility problem, right?

Evaluating volatility requires understanding the business domain [33:35]

Vlad Khononov: And volatility, initially it sounds simple, the simplest dimension of the three. Oh my god, it’s not. It’s tricky, because to truly predict the rate of change of a component, it’s not enough to look at your experience, or at the source code. We can differentiate between essential volatility and accidental volatility, or accidental involatility. Accidental volatility can come from the design of the system: things are changing just because that’s the way the system is designed. And accidental involatility can happen too. Let’s say you have an area of the system that the business wants to optimize, but it is designed in such a way that people are afraid to touch it, and as a result the business is afraid to modify it as well. So to truly evaluate volatility, you have to understand the business domain. You have to analyze the business strategy, what differentiates that system from its competitors. Again, DDD people are thinking about core subdomains right now.

Thomas Betts: Yes.

Vlad Khononov: And once you identify those areas based on their strategic value to the company, then you can really start thinking about the volatility levels desired by the business.

Thomas Betts: You mentioned things happen internally and externally. So the business might say, we want to pursue this new business venture, or, this was an MVP and the MVP has taken off; we want to make sure it’s a product we can sell to more people, but we need to make changes to it. So there are business drivers that can change the code, but there are also internal things, like I just need to make sure my code is on the latest version of whatever, so that it’s not sitting there getting obsolete and missing security patches. So some of it is, the system’s just going to evolve over time; even the legacy code needs to be kept up to some standards. And then there’s the, no, we want to make big changes because the business is asking us to, right? So the architect has to factor in all of those things, as well as, I think you mentioned, the socio-technical aspects, right? Who is going to do the work? All of this comes into play; it’s not always just one simple solution. You can’t just go to loose coupling, right?

Balancing the three dimensions of coupling [36:13]

Vlad Khononov: Yes. It’s complicated. I’m not going to say that it’s complex, but it’s complicated. The good news is that once you truly understand the dynamics of system design, it doesn’t really matter what level of abstraction you’re working on. The underlying rules are going to be the same; whether it’s methods within an object or microservices in a distributed system, the underlying ideas are the same. If you have a large amount of knowledge being shared, balance it by minimizing the distance. If you’re not sharing much knowledge, you can increase the distance. So it’s one of the two: either knowledge is high and the distance is low, or vice versa, the distance is high but the knowledge is low. Or things are not going to change, that is, volatility is low, which can balance the other two altogether.

Thomas Betts: Right. So if you just looked at strength and distance, a lot of knowledge being shared over a long distance looks bad. But if it’s never going to change, you don’t care. If it does change, then it’s not balanced. On the flip side, if it’s going to change a lot, then you need to think about the relationship between the integration strength and the distance. If there’s not much knowledge being shared over a long distance, that’s okay; or if there’s a lot of knowledge shared over a small distance, that’s okay. So you can have one but not both, if things are changing. But if things aren’t changing, you don’t care.
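
That rule of thumb is simple enough to state as a tiny executable sketch. To be clear, this is a paraphrase of the discussion above, not a formula from Vlad’s book: if volatility is low, any combination of strength and distance is tolerable; if volatility is high, high integration strength and high distance must not occur together.

```python
def is_balanced(strength_high: bool, distance_high: bool, volatile: bool) -> bool:
    # Stable components tolerate any combination of strength and distance.
    if not volatile:
        return True
    # Volatile components must not share a lot of knowledge (high strength)
    # across a large distance at the same time.
    return not (strength_high and distance_high)

# A volatile pair sharing lots of knowledge over a long distance is the
# distributed big ball of mud:
print(is_balanced(strength_high=True, distance_high=True, volatile=True))   # False
print(is_balanced(strength_high=True, distance_high=False, volatile=True))  # True
print(is_balanced(strength_high=True, distance_high=True, volatile=False))  # True
```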

Vlad Khononov: Yes. And of course, things that are not changing today, maybe something is going to change on the business side tomorrow. And as an architect you have to be aware of that change and its implications for the design. The classic example here is: I am integrating a legacy system, nobody is going to change it, so I can just go ahead and grab whatever I need from its database; that’s fine. Another classic example, again under DDD influence, is some functionality that is not business critical, but that you have to implement, which in the DDD lexicon is usually called a supporting subdomain. Usually supporting subdomains are going to be much less volatile than core subdomains. However, the business strategy might change, and suddenly that supporting subdomain will evolve into a core one. Suddenly there is a big strategy change that should be reflected in the design of the system. So it’s three dimensions working together, and whether you end up with modularity or complexity depends on how you’re balancing those forces.

Thomas Betts: Right. And I think you got to the last point I wanted to get to: we can design for today based on what we know, but six months or six years from now, those things might shift because of things we can’t predict right now. And if you try to design for that future state, you’re always going to make some mistakes, but you want to set yourself up for success. So do the small things first. If that means reorganizing your code so it’s a little easier to understand, that seems like a benefit; don’t jump to, I have to have all microservices.

And I liked how you talked about how this can be applied at the system level, or the component level, or the code level. I think you described this as the fractal approach: no matter how far you keep zooming in, the same problem exists at all these different layers of the system. So that coupling and balance is something you have to look at in different parts of your system, whether inside a microservice or at the entire system level, depending on what you’re trying to solve for at different times, right?

Vlad Khononov: Yes. And that’s, by the way, why I’m saying that if you pick up a book from the ’70s, like that book I mentioned, Composite/Structured Design, it looks way too familiar. The problems they’re facing, the problems they’re describing, and the solutions they’re applying are going to be quite familiar once you get past the terms that are used there, because those terms are based on languages like FORTRAN and COBOL. Yes, you need some time, some cognitive effort, to understand what they mean. But the underlying ideas are the same; it’s just a different level of abstraction that was popular back then. Not popular, that’s all they had back then.

Wrapping up [40:57]

Thomas Betts: So if listeners want to follow up with you or learn more about your balanced coupling model, do you have any recommendations of where they can go next?

Vlad Khononov: Yes. So on the social media side, I am most active on LinkedIn at the moment. I have accounts on other social networks like Bluesky, Twitter, et cetera, but right now LinkedIn is my preferred network. At the moment I’m working on a website called Coupling.dev, so by the time you’re listening to this, I hope it is already live and you can go there and learn some stuff about coupling.

Thomas Betts: Well, Vlad Khononov, I want to thank you again for being on the InfoQ Podcast.

Vlad Khononov: Thank you so much, Thomas. It’s an honor and a pleasure being here.

Thomas Betts: And listeners, we hope you’ll join us again soon for a future episode.
