Month: November 2024
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
Healthcare of Ontario Pension Plan Trust Fund bought a new position in MongoDB, Inc. (NASDAQ:MDB – Free Report) in the third quarter, according to its most recent Form 13F filing with the SEC. The fund bought 94,824 shares of the company’s stock, valued at approximately $25,636,000. Healthcare of Ontario Pension Plan Trust Fund owned about 0.13% of MongoDB as of its most recent filing with the SEC.
A number of other hedge funds have also bought and sold shares of MDB. Jennison Associates LLC increased its position in shares of MongoDB by 23.6% during the 3rd quarter. Jennison Associates LLC now owns 3,102,024 shares of the company’s stock valued at $838,632,000 after purchasing an additional 592,038 shares during the last quarter. Swedbank AB increased its position in shares of MongoDB by 156.3% during the 2nd quarter. Swedbank AB now owns 656,993 shares of the company’s stock valued at $164,222,000 after purchasing an additional 400,705 shares during the last quarter. Westfield Capital Management Co. LP increased its position in shares of MongoDB by 1.5% during the 3rd quarter. Westfield Capital Management Co. LP now owns 496,248 shares of the company’s stock valued at $134,161,000 after purchasing an additional 7,526 shares during the last quarter. Thrivent Financial for Lutherans increased its position in shares of MongoDB by 1,098.1% during the 2nd quarter. Thrivent Financial for Lutherans now owns 424,402 shares of the company’s stock valued at $106,084,000 after purchasing an additional 388,979 shares during the last quarter. Finally, Blair William & Co. IL increased its holdings in MongoDB by 16.4% in the 2nd quarter. Blair William & Co. IL now owns 315,830 shares of the company’s stock worth $78,945,000 after acquiring an additional 44,608 shares in the last quarter. Hedge funds and other institutional investors own 89.29% of the company’s stock.
Analyst Upgrades and Downgrades
Several equities research analysts recently issued reports on MDB shares. Citigroup increased their target price on shares of MongoDB from $350.00 to $400.00 and gave the stock a “buy” rating in a report on Tuesday, September 3rd. Truist Financial increased their target price on shares of MongoDB from $300.00 to $320.00 and gave the stock a “buy” rating in a report on Friday, August 30th. DA Davidson increased their target price on shares of MongoDB from $330.00 to $340.00 and gave the stock a “buy” rating in a report on Friday, October 11th. Wells Fargo & Company increased their target price on shares of MongoDB from $300.00 to $350.00 and gave the stock an “overweight” rating in a report on Friday, August 30th. Finally, Mizuho increased their target price on shares of MongoDB from $250.00 to $275.00 and gave the stock a “neutral” rating in a report on Friday, August 30th. One analyst has rated the stock with a sell rating, five have issued a hold rating, nineteen have issued a buy rating and one has assigned a strong buy rating to the company. Based on data from MarketBeat.com, the stock currently has an average rating of “Moderate Buy” and an average price target of $340.29.
MongoDB Price Performance
MDB traded up $0.32 on Friday, reaching $324.92. The company had a trading volume of 49,481 shares, compared to its average volume of 1,448,312. The business has a 50-day simple moving average of $282.65 and a 200-day simple moving average of $272.02. MongoDB, Inc. has a 52-week low of $212.74 and a 52-week high of $509.62. The company has a quick ratio of 5.03, a current ratio of 5.03 and a debt-to-equity ratio of 0.84.
MongoDB (NASDAQ:MDB – Get Free Report) last issued its quarterly earnings results on Thursday, August 29th. The company reported $0.70 earnings per share (EPS) for the quarter, topping analysts’ consensus estimates of $0.49 by $0.21. The company had revenue of $478.11 million during the quarter, compared to the consensus estimate of $465.03 million. MongoDB had a negative return on equity of 15.06% and a negative net margin of 12.08%. MongoDB’s revenue was up 12.8% on a year-over-year basis. During the same period last year, the company earned ($0.63) earnings per share. As a group, research analysts predict that MongoDB, Inc. will post -2.37 earnings per share for the current year.
Insider Buying and Selling
In other MongoDB news, CRO Cedric Pech sold 302 shares of the stock in a transaction on Wednesday, October 2nd. The stock was sold at an average price of $256.25, for a total value of $77,387.50. Following the sale, the executive now owns 33,440 shares in the company, valued at $8,569,000. This trade represents a 0.90% decrease in their ownership of the stock. The transaction was disclosed in a filing with the SEC. Also, Director Dwight A. Merriman sold 2,000 shares of the stock in a transaction on Monday, November 25th. The stock was sold at an average price of $349.17, for a total value of $698,340.00. Following the completion of the sale, the director now owns 1,124,006 shares in the company, valued at approximately $392,469,175.02. The trade was a 0.18% decrease in their position. Insiders sold a total of 26,600 shares of company stock worth $7,442,189 over the last three months. 3.60% of the stock is owned by insiders.
MongoDB Profile
MongoDB, Inc., together with its subsidiaries, provides a general-purpose database platform worldwide. The company provides MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.
Article originally posted on mongodb google news. Visit mongodb google news
MMS • RSS
Posted on nosqlgooglealerts. Visit nosqlgooglealerts
Picture this: It’s Double 11 (Singles’ Day), China’s colossal online shopping frenzy, and millions are hitting ‘buy’ simultaneously. Behind the scenes, a distributed database called OceanBase is handling the massive traffic spike, ensuring every transaction proceeds without a hitch.
OceanBase is not just built to handle peak shopping traffic; it was specifically designed to overcome the limitations of traditional monolithic databases. With its distributed architecture, OceanBase ensures ultra-low-latency transactions, high throughput, and seamless scalability — essential for industries that need to handle large-scale data and real-time applications. Its ability to process millions of transactions per second, along with its robust fault tolerance and elastic scaling capabilities, has made it an emerging innovative force for businesses that demand reliability even during peak loads.
From crisis to creation
OceanBase’s origin story is one of necessity and innovation. Rewind to 2013: Alipay, already a leader in the payments world, faced a scaling crisis as its monolithic Oracle database struggled to handle the rising volume of transactions.
However, the technical challenges extended beyond just the database. “The entire technical infrastructure was a massive fabric under extreme stress,” explains OceanBase.
Recognizing that throwing money at the database and hardware was not sustainable, Alipay developed a fully distributed database, OceanBase, from the ground up to resolve performance bottlenecks and scale effortlessly.
In 2019, OceanBase made history by becoming the first distributed database to top the TPC-C benchmark, a ranking traditionally dominated by monolithic systems, scoring 60 million tpmC on standard x86 servers and outperforming Oracle. It then shattered its own record in 2020 with 707 million tpmC.
Open source, enterprise-ready
Today, OceanBase serves customers from a wide range of industries, from banks and insurers to telecom operators and retailers. Its open-source approach has been a driving force behind its growing popularity, making it accessible to developers worldwide and accelerating collaboration within the community.
“Open source is essential for expanding our global reach,” OceanBase says. Now, the open-source version of OceanBase supports technology exploration and collaboration and accelerates further innovations, while the enterprise version focuses on advanced security features, catering to the rigorous demands of financial institutions and customers from other sectors.
What sets OceanBase apart is its ability to deliver seamless, unified capabilities for mission-critical workloads, ensuring consistency and efficiency across transactional, analytical, and AI-driven operations.
Its hybrid TP and AP architecture ensures high performance for transactional and analytical workloads, eliminating the need for separate systems. With multi-model integration, OceanBase supports relational, JSON, key-value, and more, enabling businesses to handle diverse data types efficiently within a single database. Furthermore, its vector hybrid search capabilities empower AI-driven workloads, bringing generative AI and recommendation system applications into a unified database environment.
Beyond its unified design and performance, OceanBase is also secure, cost-efficient, and flexible. Security is paramount for financial institutions. OceanBase offers transparent data encryption, granular access controls, and comprehensive audit trails backed by SOC2 and PCI DSS certifications.
Additionally, OceanBase caters to a wide range of deployment scenarios, whether fully on-premises, cloud-native, or somewhere in between. This multi-cloud approach lets businesses tailor their technical stack to meet specific needs while avoiding vendor lock-in.
Database for AI and hybrid workloads
OceanBase’s roadmap is designed to meet the growing complexity of modern data-driven workloads. Key upcoming features include enhanced Hybrid Transactional/Analytical Processing (HTAP) for real-time insights, a high-performance NoSQL KV store offering a reliable alternative to HBase, and vector hybrid search to support demanding generative AI and recommendation workloads efficiently.
As organizations increasingly rely on AI and hybrid workloads, the role of the database is more critical than ever. OceanBase is not just adapting to this shift but actively driving innovation, ensuring that businesses can seamlessly manage transactional, analytical, and AI-driven operations within a unified platform.
Image credit: iStockphoto/jullasart somdok
MMS • Aditya Kulkarni
Article originally posted on InfoQ. Visit InfoQ
Thoughtworks recently published their Technology Radar Volume 31, providing an opinionated guide to the current technology landscape.
As per the Technology Radar, Generative AI and Large Language Models (LLMs) dominate, with a focus on their responsible use in software development. AI-powered coding tools are evolving, necessitating a balance between AI assistance and human expertise.
Rust is gaining prominence in systems programming, with many new tools being written in it. WebAssembly (WASM) 1.0’s support by major browsers is opening new possibilities for cross-platform development. The report also notes rapid growth in the ecosystem of tools supporting language models, including guardrails, evaluation frameworks, and vector databases.
In the Techniques quadrant, notable items in the Adopt ring include 1% canary releases, component testing, continuous deployment, and retrieval-augmented generation (RAG). The Radar stresses the need to balance AI innovation with proven engineering practices, maintaining crucial software development techniques like unit testing and architectural fitness functions.
For Platforms, the Radar highlights tools like Databricks Unity Catalog, FastChat, and GCP Vertex AI Agent Builder in the Trial ring. It also assesses emerging platforms such as Azure AI Search, large vision model platforms such as V7, Nvidia Deepstream SDK and Roboflow, along with SpinKube. This quadrant highlights the rapid growth in tools supporting language models, including those for guardrails, evaluations, agent building, and vector databases, indicating a significant shift towards AI-centric platform development.
The Tools section underscores the importance of having a robust toolkit that combines AI capabilities with reliable software development utilities. The Radar recommends adopting Bruno, K9s, and visual regression testing tools like BackstopJS. It suggests trialing AWS Control Tower, ClickHouse, and pgvector, among others, reflecting a focus on cloud management, data processing, and AI-related database technologies.
For Languages and Frameworks, dbt and Testcontainers are recommended for adoption. The Trial ring includes CAP, CARLA, and LlamaIndex, reflecting the growing interest in AI and machine learning frameworks.
The Technology Radar also highlighted the growing interest in small language models (SLMs) as an alternative to large language models (LLMs) for certain applications, noting their potential for better performance in specific contexts and their ability to run on edge devices. This edition drew a parallel between the current rapid growth of AI technologies and the explosive expansion of the JavaScript ecosystem around 2015.
Overall, Technology Radar Vol 31 reflects a technology landscape heavily influenced by AI and machine learning advancements, while also emphasizing the continued importance of solid software engineering practices. Created by Thoughtworks’ Technology Advisory Board and published twice a year, the Technology Radar offers developers, architects, and technology leaders guidance on which technologies to adopt, trial, assess, or approach with caution as they navigate the rapidly evolving tech ecosystem.
The Thoughtworks Technology Radar is available in two formats for readers: an interactive online version accessible through the website, and a downloadable PDF document.
MMS • Steef-Jan Wiggers
Article originally posted on InfoQ. Visit InfoQ
AWS recently announced a new integration between AWS Amplify Hosting and Amazon Simple Storage Service (S3), enabling users to quickly deploy static websites from S3. This integration streamlines the hosting process, allowing developers to deploy static sites stored in S3 and deliver content over AWS’s global content delivery network (CDN) with just a few clicks, according to the company.
AWS Amplify Hosting, a fully managed hosting solution for static sites, now offers users an efficient method to publish websites using S3. The integration leverages Amazon CloudFront as the underlying CDN to provide fast, reliable access to website content worldwide. Amplify Hosting handles custom domain setup, SSL configuration, URL redirects, and deployment through a globally available CDN, ensuring optimal performance and security for hosted sites.
Setting up a static website using this new integration begins with an S3 bucket. Users can configure their S3 bucket to store website content, then link it with Amplify Hosting through the S3 console. From there, a new “Create Amplify app” option in the Static Website Hosting section guides users directly to Amplify, where they can configure app details like the application name and branch name. Once saved, Amplify instantly deploys the site, making it accessible on the web in seconds. Subsequent updates to the site content in S3 can be quickly published by selecting the “Deploy updates” button in the Amplify console, keeping the process seamless and efficient.
(Source: AWS News blog post)
This integration benefits developers by simplifying deployments, enabling rapid updates, and eliminating the need for complex configuration. For developers looking for programmatic deployment, the AWS Command Line Interface (CLI) offers an alternative way to deploy updates by specifying parameters like APP_ID and BRANCH_NAME.
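For teams that prefer scripting over the console, a minimal sketch of such a programmatic deployment might look like the following, shown here with boto3 rather than the CLI. The app ID, branch name, and bucket are placeholders, and the sourceUrlType value is an assumption based on the bucket-prefix integration rather than a confirmed parameter name.

```python
# Sketch only: trigger an Amplify deployment of static content stored in S3.
# APP_ID, BRANCH_NAME, and the bucket URL are placeholders; sourceUrlType is an
# assumption for the bucket-prefix flow and may differ from the actual API.
import boto3

amplify = boto3.client("amplify")

APP_ID = "d1example12345"      # placeholder Amplify app ID
BRANCH_NAME = "main"           # placeholder branch name

response = amplify.start_deployment(
    appId=APP_ID,
    branchName=BRANCH_NAME,
    sourceUrl="s3://my-static-site-bucket/",  # placeholder bucket holding the site
    sourceUrlType="BUCKET_PREFIX",            # assumption: deploy the whole prefix
)
print(response)  # contains the deployment job details
```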
Alternatively, according to a respondent on a Reddit thread, users could opt for Cloudflare:
If your webpage is static, you might consider using Cloudflare – it would probably be cheaper than the AWS solution.
Or using S3 and GitLab CI, according to a tweet by DrInTech:
Hello everyone! I just completed a project to host a static portfolio website, leveraging a highly accessible and secure architecture. And the best part? It costs only about $0.014 per month!
Lastly, the Amplify Hosting integration with Amazon S3 is available in all AWS Regions where Amplify Hosting is offered; pricing details for S3 and Amplify Hosting can be found on their respective pricing pages.
Podcast: Trends in Engineering Leadership: Observability, Agile Backlash, and Building Autonomous Teams
MMS • Chris Cooney
Article originally posted on InfoQ. Visit InfoQ
Transcript
Shane Hastie: Good day, folks. This is Shane Hastie for the InfoQ Engineering Culture Podcast. Today I’m sitting down across many miles with Chris Cooney. Chris, welcome. Thanks for taking the time to talk to us today.
Introductions [01:03]
Chris Cooney: Thank you very much, Shane. I’m very excited to be here, and indeed across many miles. I think it’s not quite the antipodes, right, but it’s very, very close to the antipodes: an island off New Zealand is the antipode of the UK, but we are about as far away as it gets. The wonders of the internet, I suppose.
Shane Hastie: Pretty much so, and I think the time offset is 13 hours today. My normal starting point is who is Chris?
Chris Cooney: That’s usually the question. So hello, I’m Chris. I am the Head of Developer Relations for a company called Coralogix. Coralogix is a full stack observability platform processing data without indexing in stream. We are based in several different countries. I am based in the UK, as you can probably tell from my accent. I have spent the past 11, almost 12 years now as a software engineer. I started out as a Java engineer straight out of university, and then quickly got into front-end engineering, didn’t like that very much and moved into SRE and DevOps, and that’s really where I started to enjoy myself. And over the past several years, I’ve moved into engineering leadership and got to see organizations grow and change and how certain decisions affect people and teams.
And now more recently, as the Head of Developer Relations for Coralogix, I get to really enjoy going out to conferences, meeting people, but also I get a lot of research time to find out about what happens to companies when they employ observability. And I get to also understand the trends in the market in a way that I never would’ve been able to see before as a software engineer, because I get to go meet hundreds and hundreds of people every month, and they all give me their views and insights. And so, I get to collect all those together and that’s what makes me very excited to talk on this podcast today about the various different topics that got on in the industry.
Shane Hastie: So let’s dig into what are some of those trends? What are some of the things that you are seeing in your conversation with engineering organizations?
The backlash against “Agile” [02:49]
Chris Cooney: Yes. When I started out, admittedly 11, 12 years ago is a while, but it’s not that long ago really. I remember when I started out in the first company I worked in, we had an Agile consultant come in. And they came in and they explained to me the principles of agility and so on and so forth, so gave me the rundown of how it all works and how it should work and how it shouldn’t work and so on. We were all very skeptical, and over the years I’ve got to see agility become this massive thing. And I sat in boardrooms with very senior executives in very large companies listening to Agile Manifesto ideas and things like that. And it’s been really interesting to see that gel in. And now we’re seeing this reverse trend of people almost emotionally pushing back against not necessarily the core tenets of Agile, but just the word. We’ve heard it so many times, there’s a certain amount of fatigue around it. That’s one trend.
The value of observability [03:40]
The other trend I’m seeing technically is this move around observability. Obviously, I spend most of my time talking about observability now. It used to be this thing that you have to have so when things have gone wrong or to stop things from going wrong. And there is this big trend now of organizations moving towards less to do with what’s going wrong. It’s a broader question, like, “Where are we as a company? How many dev hours did we put into this thing? How does that factor in the mean times to recovery reduction, that kind of thing?” They’re much broader questions now, blurring in business measures, technical measures, and lots more people measures.
I’ll give you a great example. Measuring rage clicks on an interface is like a thing now and measuring the emotionality with which somebody clicks a button. It’s a fascinating, I think it’s like a nice microcosm of what’s going on in the industry. Our measurements are getting much more abstract. And what that’s doing to people, what it’s doing to engineering teams, it’s fascinating. So there’s lots and lots and lots.
And then, obviously there’s the technical trends moving around AI and ML and things like that and what’s doing to people and the uncertainty around that and also the excitement. It’s a pretty interesting time.
Shane Hastie: So let’s dig into one of those areas in terms of the people measurements. So what can we measure about people through building observability into our software?
The evolution of what can be observed [04:59]
Chris Cooney: That’s a really interesting topic. So I think it’s better to contextualize, begin with okay, we started out, it was basically CPU, memory, disk, network, the big four. And then, we started to get a bit clever and looked at things like latency and response sizes, data exchanged over a server and so forth. And then, as we built up, we started to look at things like we’ve got some marketing metrics in there, so bounce rates, how long somebody stays on a page and that kind of thing.
Now we’re looking at the next sort of tier, so the next level of abstraction up, which is more like, did the user have a good experience on the website, and what does that mean? So you see web vitals are starting to break into this area, things like when was the meaningful moment that a user saw the content they wanted to see? Not first ping, not first load, load template. The user went to this page, they wanted to see a product page. How long was it, not just how long did the page take to load before they saw all the meaningful information they needed? And that’s an amalgamation of lots and lots of different signals and metrics.
I’ve been talking recently about this distinction between a signal and an insight. So my taxonomy, the way I usually slice it, is a signal is a very specific technical measurement of something: latency, page load time, bytes exchanged, that kind of thing. And an insight is an amalgamation of lots of different signals to produce one useful thing, and my litmus test for an insight is that you can take it to your non-technical boss and they will understand it. They will understand what you’re talking about. When I say to my non-technical boss, “My insight is this user had a really bad experience loading the product page. It took five seconds for the product to appear, and they couldn’t buy the thing. They couldn’t work out where to do it”. That would be a combination of various different measures around where they clicked on the page, how long the HTML ping took, how long the actual network speed was to the machine, and so on.
So that’s what I’m talking about with the people experience metrics. It’s fascinating in that respect, and this new level now, which is directly answering business questions. It’s almost like we’ve built scaffolding up over the years, deeply technical. When someone would say, “Did that person have a good experience?” And we’d say, “Well, the page latency was this, and the HTTP response was 200, which is good, but then the page load time was really slow”. But now we just say yes or no because of X, Y and Z. And so, that’s where we’re going to I think. And this is all about that trend of observability moving into that business space, is taking much more broad encompassing measurements and a much higher level of abstraction. And that’s what I mean when I said to more people metrics as a general term.
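To make the signal-versus-insight distinction concrete, here is a small illustrative sketch. The field names and thresholds are invented for the example, not taken from any particular product.

```python
# Illustrative only: combine raw signals into a single "insight" a non-technical
# stakeholder can read. Thresholds and field names are invented for the example.
from dataclasses import dataclass

@dataclass
class PageSignals:
    largest_contentful_paint_ms: float  # when the meaningful content appeared
    http_status: int
    rage_clicks: int                    # rapid repeated clicks on one element

def experience_insight(s: PageSignals) -> str:
    reasons = []
    if s.http_status >= 400:
        reasons.append(f"the page failed to load (HTTP {s.http_status})")
    if s.largest_contentful_paint_ms > 2500:
        reasons.append(f"the product took {s.largest_contentful_paint_ms / 1000:.1f}s to appear")
    if s.rage_clicks >= 3:
        reasons.append("the user rage-clicked, suggesting they could not find the action")
    if not reasons:
        return "The user had a good experience on this page."
    return "The user had a bad experience: " + "; ".join(reasons) + "."

print(experience_insight(PageSignals(5200, 200, 4)))
```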
Shane Hastie: So what happens when an organization embraces this? When not just the technical team, but the product teams, when the whole organization is looking at this and using this to perhaps make decisions about what they should be building?
Making sense of observations [07:47]
Chris Cooney: Yes. There are two things here in my opinion. One is there’s a technical barrier, which is making the information literally available in some way. So putting a query engine in front of your data; what’s an obvious one? Putting Kibana in front of OpenSearch is the most common example. It’s a way to query your data. Putting a SQL query engine in front of your database is a good example. So just doing that is the technical part. And that is not easy, by the way. That is a certain level of scale. Technically, it is really hard to make high performance queries for hundreds, potentially thousands of users querying concurrently. That’s not easy.
Let’s assume that’s out of the way and the organization’s work that out. The next challenge is, “Well, how do we make it so that users can get the questions they need answered, answered quickly without specialist knowledge?” And we’re not there yet. Obviously AI makes a lot of very big promises about natural language query. It’s something that we’ve built into the platform in Coralogix ourselves. It works great. It works really, really well. And I think what we have to do now is work out how do we make it as easy as possible to get access to that information?
Let’s assume all those barriers are out of the way, and an organization has achieved that. And I saw something similar to this when I was a Principal Engineer at Sainsbury’s when we started to surface, it’s an adjacent example, but still relevant, introduction of SLOs and SLIs into the teams. So where before if I went to one team and said, “How has your operational success been this month?” They would say, “Well, we’ve had a million requests and we serviced them all in under 200 milliseconds”. Okay. I don’t know what that means. Is 200 milliseconds good? Is that terrible? What does that mean? We’d go to another team and they’d say, “Well, our error rate is down to 0.5%” Well, brilliant. But last month it was 1%. The month before that it was 0.1% or something.
When we introduced SLOs and SLIs into teams, we could see across all of them, “Hey, you breached your error budget. You have not breached your error budget”. And suddenly, there was a universal language around operational performance. And the same thing happens when you surface the data. You create a universal language around cross-cutting insights across different people.
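As a rough illustration of that universal language, the sketch below expresses two teams’ very different raw numbers as consumption of a shared error budget; the SLO targets and traffic figures are made up.

```python
# A small sketch of the "universal language" an SLO gives you: every team reports
# against the same error budget, regardless of raw request counts. Numbers are
# illustrative only.
def error_budget_report(slo_target: float, total_requests: int, failed_requests: int) -> str:
    budget = total_requests * (1 - slo_target)          # allowed failures this period
    consumed = failed_requests / budget if budget else float("inf")
    status = "breached" if consumed > 1 else "within budget"
    return f"SLO {slo_target:.2%}: used {consumed:.0%} of the error budget ({status})"

# Two teams with very different traffic become directly comparable:
print(error_budget_report(0.999, 1_000_000, 600))   # team A: 60% of budget used
print(error_budget_report(0.995, 40_000, 450))      # team B: 225% of budget used
```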
Now, what does that do to people? Well, one, it shines spotlights in places that some people may not want them shined there, but it does do that. That’s what the universal language does. It’s not enough just to have the data. You have to have effective access to it. You have to have effective ownership of it. And doing that surfaces conversations that would be initially quite painful. There are lots of people, especially in sufficiently large organizations, that have been kind of just getting by by flying under the radar, and it does make that quite challenging.
The other thing that it does, some people, it makes them feel very vulnerable because they feel like KPIs. They’re not. We’re not measuring their performance on whether they miss their error budget. When I was the principal engineer there, no one would get fired. We’d sit down and go, “Hey, you missed your error budget. What can we do here? What’s wrong? What are the barriers?” But it actually made some people feel very nervous and very uncomfortable with it and they didn’t like it. Other people thrived and loved it. It became a target. “How much can we beat our budget by this month? How low can we get it?”
Metrics create behaviors [10:53]
So the two things I would say, the big sweeping changes in behavior, it’s that famous phrase, “Build me a metric and I’ll show you a behavior”. So if you measure somebody, human behavior is what they call a type two chaotic system.
By measuring it, you change it. And it’s crazy in the first place. So as soon as you introduce those metrics, you have to be extremely cognizant of what happens to dynamics between teams and within teams. Teams become competitive. Teams begin to look at other teams and wonder, “How the hell are they doing that? How is their error budget so low? What’s going on?” Other teams maybe in an effort to improve their metrics artificially will start to lower their deployment frequency and scrutinize every single thing. So while their operational metrics look amazing, their delivery is actually getting worse, and all these various different things that go on. So that competitiveness driven by uncertainty and vulnerability is a big thing that happens across teams.
The other thing that I found is that the really great leaders, the really brilliant leaders love it. Oh, in fact, all leadership love it. All leadership love higher visibility. The great leaders see that higher visibility and go, “Amazing. Now I can help. Now I can actually get involved in some of these conversations that would’ve been challenging before”.
The slightly more, let’s say worrying leaders will see this as a rod with which to beat the engineers. And that is something that you have to be extremely careful of. Surfacing metrics and being very forthright about the truth and being kind of righteous about it all is great and it’s probably the best way to be. But the consequence is that a lot of people can be treated not very well if you have the wrong type of leadership in place, who see these measurements as a way of forcing different behaviors.
And so, it all has to be done in good faith. It all has to be done based on the premise that everybody is doing their best. And if you don’t start from that premise, it doesn’t matter how good your measurements are, you’re going to be in trouble. Those are the learnings that I took from when I rolled it out and some of the things that I saw across an organization. It was largely very positive though. It just took a bit of growing pains to get through.
Shane Hastie: So digging into the psychological safety that we’ve heard about and known about for a couple of decades now.
Chris Cooney: Yes. Yes.
Shane Hastie: We’re not getting it right.
Enabling psychological safety is still a challenge [12:59]
Chris Cooney: No, no. And I think my experience when I first got into reading about it was Google’s Project Aristotle and things like that, maybe. And my first attempt at educating an organization on psychological safety: they had this extremely long, extremely detailed incident management review, where if something goes wrong, then they have, we’re talking like a 200-person, sometimes several-day, I think on the low end it was like five, six hours, deep review of everything. Everyone bickers and argues at each other and points fingers at each other. And there’s enormous documents produced, it’s filed away, and nobody ever looks at it ever again because who wants to read those things? It’s just a historical text about bickering between teams.
And what I started to do is I said, “Well, why don’t we trial like a more of a blameless post-mortem method? Let’s just give that a go and we’ll see what happens”. So the first time I did it, the meeting went from, they said the last meeting before them was about six hours. We did it in about 45 minutes. I started the meeting by giving a five-minute briefing of why this post-mortem has to be blameless. The aviation industry and the learnings that came from that around if you hide mistakes, they only get worse. We have to create an environment where you’re okay to surface mistakes. Just that five-minute primer and then about a 40-ish-minute conversation. And we had a document that was more thorough, more detailed, more fact-based, and more honest than any incident review that I ever read before that.
So rolling that out across organizations was really, really fun. But then, I saw it go the other way, where they’d start saying, “Well, it’s psychologically safe”. And it turned into this almost hippie-loving thing, where nobody’s done anything wrong. There is no such thing as a mistake. And no, that’s not the point. The point is that we all make mistakes, not that they don’t exist. And we don’t point blame in a malicious way, but we can attribute a mistake to somebody. You just can’t do it by… And the language in some of these post-mortem documents that I was reading was so indirect. “The system post a software change began to fail, blah, blah, blah, blah, blah”. Because they’re desperately trying not to name anybody or name any teams or say that an action occurred. It was almost like the system was just running along and then the vibrations from the universe just knocked it out of whack.
And actually, when you got into it, one of the team pushed a code change. It’s like, “No. Team A pushed a code change. Five minutes later there was a memory leak issue that caused this outage”. And that’s not blaming anybody, that’s just stating the fact in a causal way.
So the thing I learned with that is whenever you are teaching about blameless post-mortem psychological safety, it’s crucial that you don’t lose the relationship between cause and effect. You have to show cause A, effect B, cause B, effect C, and so on. Everything has to be linked in that way in my opinion. Because that forces them to say, “Well, yes. We did push this code change, and yes, it looks like it did cause this”.
That will be the thing I think where most organizations get tripped up, is they really go all in on psychological safety. “Cool, we’re going to do everything psychologically safe. Everyone’s going to love it”. And they throw the baby out with the bath water as it were. And they miss the point, which is to get to the bottom of an issue quickly, not to avoid hurting anybody’s feelings, which is a mistake that people often make I think, especially in large organizations.
Shane Hastie: Circling back around to one of the comments you made earlier on. The agile backlash, what’s going on there?
Exploring the agile backlash [16:25]
Chris Cooney: I often try to talk about larger trends rather than my own experience, purely because anecdotal experience is only useful as an anecdote. So this is an anecdote, but I think it’s a good indication of what’s going on more broadly. When I was starting out, I was a mid-level Java engineer, and this was when agility was really starting to get a hold in some of these larger companies and they started to understand the value of it. And what happened was we were all on the Agile principles. We were regularly reading the Agile Manifesto.
We had a coach called Simon Burchill who was and is absolutely fantastic, completely, deeply, deeply understands the methodology and the point of agility without getting lost in the miasma of various different frameworks and planning poker cards and all the rest of it. And he was wonderful at it, and I was very, very fortunate to study under him in that respect because it gave me a really good, almost pure perspective of agile before all of the other stuff started to come in.
So what happened to me was that we were delivering work, and if we went even a week over budget or a week overdue, the organization would say, “Well, isn’t agile supposed to speed things up?” And it’s like, “Well, no, not really. It’s more that we had a working product six weeks ago, eight weeks ago, and you chose not to go live with it”. Which is fine, but that’s what you get with the agile process. You get working software much earlier, which gives you the opportunity to go live if you get creative with how you can productionize it or turn it into a product.
So that was the first thing, I think. One of the seeds of the backlash is a fundamental misunderstanding about what Agile is supposed to be doing for you. It’s not to get things done faster, it’s to just incrementally deliver working software so you have a feedback loop and a conversation that’s going on constantly. And an empirical learning cycle is occurring, so you’re constantly improving the software, not build everything, test everything, deploy it, and find out it’s wrong. That’s one.
The other thing I will say is what I see on Twitter a lot now, or X they call it these days, is the Agile Industrial Complex, which is a phrase that I’ve seen batted around a lot, which is essentially organizations just selling Scrum certifications or various different things that don’t hold that much value. That’s not to say all Scrum certifications are useless. I did one, it was years and years ago, I forget the name of the chap now. It was fantastic. He gave a really, really great insight into Scrum, for example, why it’s useful, why it’s great, times when it may be painful, times when some of its practices can be dropped, the freedom you’ve got within the Scrum guide.
One of the things that he said to me that always stuck with me, this is just an example of a good insight that came from an Agile certification was he said, “It’s a Scrum guide, not the Scrum Bible. But it’s a guide. The whole point is to give you an idea. You’re on a journey, and the guide is there to help you along that journey. It is not there to be read like a holy text”. And I loved that insight. It really stuck with me and it definitely informed how I went out and applied those principles later on. So there is a bit of a backlash against those kinds of Agile certifications because as is the case with almost any service, a lot of it’s good, a lot of it’s bad. And the bad ones are pretty bad.
And then, the third thing I will say is that an enormous amount of power was given to Agile coaches early on. They were almost like the high priests and they were sort of put into very, very senior positions in an organization. And like I said, there are some great Agile coaches. I’ve had the absolute privilege of working with some, and there were some really bad ones, as there are great software engineers and bad software engineers, great leaders and poor leaders and so on.
The problem is that those coaches were advising very powerful people in organizations. And if you’re giving bad advice to very powerful people, the impact of that advice is enormous. We know how to deal with a bad software engineering team. We know how to deal with somebody that doesn’t want to write tests. As a software function, we get that. We understand how to work around that and solve that problem. Sometimes it’s interpersonal, sometimes it’s technical, whatever it is, we know how to fix it.
We have not yet figured out this sort of grand vizier problem of there is somebody there giving advice to the king who doesn’t really understand what they’re talking about, and the king’s just taking them at their word. And that’s what happened with Agile. And that I think is one of the worst things that we could have done was start to take the word of people as if they are these experts in Agile and blah, blah, blah. It’s ultimately software delivery. That’s what we’re trying to do. We’re trying to deliver working software. And if you’re going to give advice, you’d really better deeply understand delivery of working software before you go and about interpersonal things and that kind of stuff.
So those are the three things I think that have driven the backlash. And now there’s just this fatigue around the word Agile. Like I say, I had the benefit of going to conferences and I’ve seen the word Agile. When I first started talking, it was everywhere. You couldn’t go to a conference where the word Agile wasn’t there, and now it is less and less prevalent and people start talking more about things like continuous delivery, just to avoid saying the word Agile. Because the fatigue is more around the word than it is around the principles.
And the last thing I’ll say is there is no backlash against the principles. The principles are here to stay. It’s just software engineering now. What would’ve been Agile 10 years ago is just how to build working software now. It’s so deeply ingrained in how we think that we think we’re pushing back against Agile. We’re not. We’re pushing back against a few words. The core principles are part of software engineering now, and they’re here to stay for a very long time, I suspect.
Shane Hastie: How do we get teams aligned around a common goal and give them the autonomy that we know is necessary for motivation?
Make it easy to make good decisions [21:53]
Chris Cooney: Yes. I have just submitted a talk to KubeCon on this. And I won’t say anything just at risk of jeopardizing our submission, but the broad idea is this. Let’s say I was in a position, I had like 20 something teams, and the wider organization was hundreds of teams. And we had a big problem, which was every single team had been raised on this idea of, “You pick your tools, you run with it. You want to use AWS, you want to use GCP, you want to use Azure? Whatever you want to use”.
And then after a while, obviously the bills started to roll in and we started to see that actually this is a rather expensive way of running an organization. And we started to think, “Well, can we consolidate?” So we said, “Yes, we can consolidate”. And a working group went off, picked a tool, bought it, and then went to the teams and said, “Thou shalt use this, and nobody listened”. And then, we kind of went back to the drawing board and they said, “Well, how do we do this?” And I said, “This tool was never picked by them. They don’t understand it, they don’t get it. And they’re stacking up migrating to this tool against all of the deliverables they’re responsible for”. So how do you make it so that teams have the freedom and autonomy to make effective decisions, meaningful decisions about their software, but it’s done in a way that there is a golden path in place such that they’re all roughly moving in the same direction?
The project we started to build out within Sainsbury’s was completely re-platforming the entire organization. It’s still going on now. It’s still happening now. But hundreds and hundreds of developers have been migrating onto this platform. It was a team I was part of. When it started, I was from Manchester in the UK, and we originally called it the Manchester PaaS, Platform as a Service. I don’t know if you know this, but the bumblebee is one of the symbols of Manchester. It had a little bumblebee in the UI. It was great. We loved it. And we built it using Kubernetes. We built it using Jenkins for CI/CD, purely because Jenkins was big in the office at the time. It isn’t anymore. Now it’s GitHub Actions.
And what we said was, “Every team in Manchester, every single resource has to be tagged so we know who owns what. Every single time there’s a deployment, we need some way of seeing what it was and what went into it”. And sometimes some periods of the year are extremely busy and extremely serious, and you have to do additional change notifications in different systems. For a grocer, the Christmas period is exactly that: Sainsbury’s does an enormous amount of trade between, let’s say, November and January. So during that period, every single team has to raise additional change requests, but they’re doing 30, 40 commits a day, so they can’t be expected to fill out those forms every single time. So I wondered if we could automate that for them.
And what I realized was, “Okay, this platform is going to make the horrible stuff easy and it’s going to make it almost invisible; not completely invisible because they still have to know what’s going on, but it has to make it almost invisible”. And by making the horrible stuff easy, we incentivize them to use the platform in the way that it’s intended. So we did that and we onboarded everybody in a couple of weeks, and it took no push whatsoever.
We had product owners coming to us and saying one team just started, they’d started the very first sprint. The goal of their first sprint was to have a working API and a working UI. The team produced, just by using our platform, because we made a lot of this stuff easy. So we had dashboard generation, we had alert generation, we had metric generation because we were using Kubernetes and we were using Istio. We got a ton of HTTP service metrics off the bat. Tracing was built in there.
So in their sprint review at the end of the two weeks, they built this feature. Cool. “Oh, by the way, we’ve done all of this”. And it was an enormous amount of dashboards and things like that. “Oh, by the way, the infrastructure is completely scalable. It’s totally, it’s multi-AZ failover. There’s no productionizing. It’s already production ready”. The plan was to go live in months. They went live in weeks after that. It changed the conversation and that was when things really started to capitalize and have ended up in the new project now, which is across the entire organization.
The reason why I told that story is because you have to have a give and take. If you try and do it like an edict, a top-down edict, your best people will leave and your worst people will try and work through it. Because the best people want to be able to make decisions and have autonomy. They want to have kind of sense of ownership of what they’re building. Skin in the game is often the phrase that it’s banded around.
And so, how do you give engineers the autonomy? You build a platform, you make it highly configurable, highly self-serviced. You automate all the painful bits of the organization, for example, compliance of change request notifications and data retention policies and all that. You automate that to the hilt so that all they have to do is declare some config and repository and it just happens for them. And then, you make it so the golden path, the right path, is the easy path. And that’s it. That’s the end of the conversation. If you can do that, if you can deliver that, you are in a great space.
If you try to do it as a top-down edict, you will feel a lot of pain and your best people will probably leave you. If you do it as a collaborative effort so that everybody’s on the same golden path, every time they make a decision, the easy decision is the right one, it’s hard work to go against the right decision. Then you’ll incentivize the right behavior. And if you make some painful parts of their life easy, you’ve got the carrot, you’ve got the stick, you’re in a good place. That’s how I like to do it. I like to incentivize the behavior and let them choose.
Shane Hastie: Thank you so much. There’s some great stuff there, a lot of really insightful ideas. If people want to continue the conversation, where do they find you?
Chris Cooney: If you open up LinkedIn and type Chris Cooney, I’ve been reliably told that I am the second person in the list. I’m working hard for number one, but we’ll get there. If you look for Chris Cooney, if I don’t come up, Chris Cooney, Coralogix, Chris Cooney Observability, anything like that, and I will come up. And I’m more than happy to answer any questions. On LinkedIn is usually where I’m most active, especially for work-related topics.
Shane Hastie: Cool. Chris, thank you so much.
Chris Cooney: My pleasure. Thank you very much for having me.
MMS • Aditya Kulkarni
Article originally posted on InfoQ. Visit InfoQ
Recently, AWS CodeBuild introduced support for managed GitLab self-hosted runners, advancing its continuous integration and continuous delivery (CI/CD) capabilities. This new feature allows customers to configure their CodeBuild projects to receive and execute GitLab CI/CD job events directly on CodeBuild’s ephemeral hosts.
The integration offers several key benefits including Native AWS Integration, Compute Flexibility and Global Availability. GitLab jobs can now seamlessly integrate with AWS services, leveraging features such as IAM, AWS Secrets Manager, AWS CloudTrail, and Amazon VPC. This integration enhances security and convenience for the users.
Furthermore, customers gain access to all compute platforms offered by CodeBuild, including Lambda functions, GPU-enhanced instances, and Arm-based instances. This flexibility allows for optimized resource allocation based on specific job requirements. The integration is available in all regions where CodeBuild is offered.
To implement this feature, users need to set up webhooks in their CodeBuild projects and update their GitLab CI YAML files to utilize self-managed runners hosted on CodeBuild machines.
The setup process involves connecting CodeBuild to GitLab using OAuth, which requires additional permissions such as create_runner and manage_runner.

It’s important to note that CodeBuild will only process GitLab CI/CD pipeline job events if a webhook has filter groups containing the WORKFLOW_JOB_QUEUED event filter. The buildspec in CodeBuild projects will be ignored unless buildspec-override:true is added as a label, as CodeBuild overrides it to set up the self-managed runner.
When a GitLab CI/CD pipeline run occurs, CodeBuild receives the job events through the webhook and starts a build to run an ephemeral GitLab runner for each job in the pipeline. Once the job is completed, the runner and associated build process are immediately terminated.
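As a rough sketch of the webhook requirement described above, the following uses boto3 to create a CodeBuild webhook whose filter group matches WORKFLOW_JOB_QUEUED events. The project name is a placeholder, and the OAuth connection to GitLab is assumed to already be in place; in practice this is typically configured once via the console or infrastructure as code.

```python
# Minimal sketch (not the official setup flow): ensure the CodeBuild project's
# webhook has a filter group for WORKFLOW_JOB_QUEUED so GitLab job events trigger
# runner builds. The project name is a placeholder.
import boto3

codebuild = boto3.client("codebuild")

codebuild.create_webhook(
    projectName="gitlab-runner-project",   # placeholder CodeBuild project
    filterGroups=[[
        {"type": "EVENT", "pattern": "WORKFLOW_JOB_QUEUED"},
    ]],
)
```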
As an aside, GitLab has been in the news since earlier this year as it planned to introduce CI Steps, which are reusable and composable pieces of a job that can be referenced in pipeline configurations. These steps will be integrated into the CI/CD Catalog, allowing users to publish, unpublish, search, and consume steps similarly to how they use components.
Moreover, GitLab is working on providing users with better visibility into component usage across various project pipelines. This will help users identify outdated versions and take prompt corrective actions, promoting better version control and project alignment.
AWS CodeBuild has been in the news as well, as it added support for macOS builds. Engineers can build artifacts on managed Apple M2 instances that run on macOS 14 Sonoma. A few weeks ago, AWS CodeBuild enabled customers to configure automatic retries for their builds, reducing manual intervention upon build failures. It has also added support for building Windows Docker images in reserved fleets.
Such developments demonstrate the ongoing evolution of CI/CD tools and practices, with a focus on improving integration, flexibility, and ease of use for DevOps teams.
MMS • Noémi Ványi, Simona Pencea
Article originally posted on InfoQ. Visit InfoQ
Transcript
Pencea: I bet you’re wondering what this picture is doing at a tech conference. These are two German academics. They started to build a dictionary, but they actually became famous because, along the way, they collected a lot of folk stories. The reason they are here is partly because they were my idols when I was a child. I thought there was nothing better to do than to listen to stories and collect them. My family still makes fun of me because I ended up in tech after that. The way I see it, it’s not such a big difference. Basically, we still collect folk stories in tech, but we don’t call them folk stories, we call them best practices. Or we go to conferences to learn about them, basically to learn how other people screwed up, so that we don’t do the same. After we collect all these stories, we put them all together. We call them developer experience, and we try to improve that. This brings us to the talk that we have, improving developer experience using automated data CI/CD pipelines. My name is Simona Pencea. I am a software engineer at Xata.
Ványi: I’m Noémi Ványi. I’m also a software engineer at the backend team of Xata. Together, we will be walking through the data developer experience improvements we’ve been focused on recently.
Pencea: We have two topics on the agenda. The first one, testing with separate data branches, covers the idea that when you create a PR, you maybe want to test your PR using a separate data environment that contains potentially a separate database. The second one, zero downtime migrations, covers the idea that we want to improve the developer experience when merging changes that include schema changes, without having any downtime. Basically, zero downtime migrations. For that, we developed an open-source tool called pgroll. Going through the first one, I will be covering several topics. Basically, I will start by going through the code development flow that we focused on. The testing improvements that we had in mind. How we ensured we have data available in those new testing environments. How we ensured that data is safe to use.
Code Development Workflow
This is probably very familiar to all of you. It’s basically what I do every day. When I’m developing, I’m starting with the local dev. I have my local dev data. It’s fake data. I’m trying to create a good local dev dataset when I’m testing my stuff. I’m trying to think about corner cases, and cover them all. The moment I’m happy with what I have in my local dev, I’m using the dev environment. This is an environment that is shared between us developers. It’s located in the cloud, and it also has a dataset. This is the dev dataset. This is also fake data, but it’s crowdfunded from all the developers that use this environment. There is a chance that you find something that it’s not in the local dev. Once everything is tested, my PR is approved. I’m merging it. I reach staging.
In staging, there is another dataset which is closer to the real life, basically, because it’s from beta testing or from demos and so on. It’s not the real thing. The real thing is only in prod, and this is the production data. This is basically the final test. The moment my code reaches prod, it may fail, even though I did my best to try with everything else. In my mind, I would like to get my hands on the production dataset somehow without breaking anything, if possible, to test it before I reach production, so that I minimize the chance of bugs.
Data Testing Improvements – Using Production Data
This is what led to this. Can we use production data to do testing with it? We’ve all received those emails sometimes, that say, test email, and I wasn’t a test user. Production data would bring a lot of value when used for testing. If we go through the pros, the main thing is, it’s real data. It’s what real users created. It’s basically the most valuable data we have. It’s also large. It’s probably the largest dataset you have, if we don’t count load test generated data and so on. It’s fast in the way that you don’t have to write a script or populate a database. It’s already there, you can use it. There are cons to this. There are privacy issues. It’s production data: there’s private information, private health information. I probably don’t even have permission from my users to use the data for testing. Or, am I storing it in the right storage? Is this storage with the right settings so that I’m not breaking GDPR or some other privacy laws.
Privacy issues are a big con. The second thing, as you can see, is that large is also a con, because a large dataset does not mean a complete dataset. Normally, most users will use your product in the most common way, and then you’ll have some outliers which give you the weird bugs and so on. Having a large dataset while testing may prevent you from seeing those corner cases, because the common cases dominate. Refreshing takes time because of the size. Basically, if somebody changes the data with another PR or something, you need to refresh everything, and then it takes longer than if you have a small subset. Also, because of another PR, you can run into data incompatibility. Basically, you can get into a state where your test breaks, but it’s not because of your PR. It’s because something changed, and now it’s not compatible anymore.
If we look at the cons, they basically fall into two categories. The first is related to data privacy. The second is related to the size of the dataset. That gives us our requirements. The first one would be, we would like to use production data, but in a safe way, and, if possible, fast. Since we want to do a CI/CD pipeline, let’s make it automated. I don’t want to run a script by hand or something. Let’s have the full experience. Let’s start with the automated part. It’s very hard to cover all the ways software developers work. What we did first was to target a simplification, considering GitHub as the standard workflow, because the majority of developers use GitHub. One of the things GitHub gives you is a notification when a PR gets created. Our idea was, we can use that notification, we can hook into it. Then we can create what we call a database branch, which is basically a separate database, but with the same schema as the source branch, when a GitHub PR gets created. Then, after creation, you can copy the data into it. Having this in place would give you the automation part of the workflow.
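To make the idea concrete, here is a minimal sketch of what such a branch could look like if it is implemented as a separate schema inside the same Postgres instance (the speakers mention later in the Q&A that this is how it works internally). The schema name, table, and row limit are illustrative, not Xata's actual implementation.

```sql
-- Hypothetical branch created when GitHub notifies us that a PR was opened.
-- A "database branch" here is just a separate schema with the same table shape.
CREATE SCHEMA pr_42;

-- Clone the table definition (columns, defaults, indexes) without the data.
CREATE TABLE pr_42.users (LIKE public.users INCLUDING ALL);

-- Copy a bounded subset of production data into the branch; subsetting and
-- anonymization are discussed in the rest of the talk.
INSERT INTO pr_42.users
SELECT *
FROM public.users
LIMIT 10000;
```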
Let’s see how we could use the production data. We said we want to have a fast copy and also have it complete. I’ll explain what that means. Copying takes time. There is no way around it. You copy data, it takes time. You can hack around it. You can have a preemptive copy: you copy the data before anyone needs it, so when they need it, it’s already there. Preemptive copying means I will just have a lot of datasets lying around, just in case somebody uses one, and then I have to keep everything in sync. That didn’t really fly with us. We could do Copy on Write, which basically means you copy at the last minute, before data is actually used, so before that, all the pointers point to the old data. The problem with using Copy on Write for this specific case is that it did not leave us any point at which we could change the data to make it safe. If I Copy on Write, it’s basically the same data. I will just change the row that I’m editing, but the rest of it stays the same.
For instance, if I want to anonymize an email or something, I will not be able to do it with Copy on Write. Then, you have the boring option, which is, basically, you don’t copy all the data, you just copy a part of it. This is what we went for, even though it was boring. Let’s look at the second thing. We wanted to have a complete dataset. I’ll take a step back and consider the case of a relational database where you have links as a data type. Having a complete dataset means all the links will be resolved inside this dataset. If I copy all the data, that’s obviously a complete dataset, but if I copy a subset, there is no guarantee it will be complete unless I make it so. The problem with building a complete dataset by following the links is that it sounds like an NP-complete problem, and that’s because it is an NP-complete problem. If I wanted to copy a subset of a bigger dataset with exactly a certain size, I would actually need to find all the subsets that respect that rule, and then select the best one. That would take a lot of time. In our case, we did not want the best dataset that has exactly the size we have in mind. We were happy with something around that size. In that case, we can just go with the first subset we can construct that is complete, with all the links resolved, and roughly the right size.
Data Copy (Deep Dive)
The problem with constructing this complete subset is, where do we start? How do we advance? How do we know we got to the end? The "where do we start" part is solvable if we think about the relationships between the tables as a graph, and then we apply a topological sort on it. We list the tables based on their degrees of independence. In this example, t7 is the most independent, then we have t1, t2, t3, and we can see that if we remove these two, the degrees of independence for t2 and t3 immediately increase because the links are gone. We have something like that. Then we go further up. Here we have the special case of a cycle, because a table can point back, through links, to a table that points to it. In this case, we can break the cycle, because going back we see the only way to reach this cycle is through t5.
Basically, we need to first reach t5 and then t6. This is what I call the anatomy of the schema. We can see this is the order in which we need to go through the tables when we collect records. To answer the other two questions, things get a bit more complicated, because the schema is not enough. The schema tells you what’s possible, but a link doesn’t have to be filled in unless you have a constraint. Usually, a link can also be empty. If you run into a link that points to nothing, that doesn’t mean you should stop. You need to go and exhaustively find the next potential record to add to the set. Basically, if you imagine it in 3D, you need to project this static analysis onto the individual rows. The thing that you cannot see through the static analysis is that you can have several records from one table pointing to the same record in another table. The first one will take everything with it, and the second one will bring nothing.
Then you might be tempted to stop searching, because you think, I didn’t make any progress, so the set must be complete, which is not true. You need to exhaustively look until the end of the set. These are just a few of the things that need to be kept in the back of your mind when building something like this. We need to always allow a full cycle before determining that no progress was made. When we select the next record, we should consider that it might already have been brought into the set, and we shouldn’t necessarily stop at that point.
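As a rough illustration of the starting point, the table-to-table link graph can be read straight out of the Postgres catalogs; the topological sort and cycle breaking described above would then run in application code on these edges. This is a sketch, not the actual implementation:

```sql
-- Foreign-key edges between tables: each row reads as "child references parent".
-- These edges form the graph that gets topologically sorted before copying.
SELECT
    con.conrelid::regclass  AS child_table,
    con.confrelid::regclass AS parent_table
FROM pg_constraint con
WHERE con.contype = 'f'    -- 'f' = foreign-key constraints
ORDER BY parent_table, child_table;
```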
We talked at the beginning about how we want to have this production data but have it be safe to use. This is the last step, which we are still working on. It is a bit more fluffy. The problem with masking the data is that, for some fields, you know exactly what they are. If it’s an email, then, sure, it’s private data. What if it’s free text, then what? If it’s free text, you don’t know what’s inside. The assumption is it could be private data. The approach here was to provide several possibilities for how to redact data and allow the user to choose, because the user has the context and they should be able to select based on the use case. The thing with, for instance, a full redaction or a partial redaction is that, sure, you can apply it, but it will break your aggregations.
For instance, say I have an aggregation by username, like my Gmail address, and I want to know how many items are assigned to my email address. If I redact the username so it becomes something like **@gmail, then I get aggregations across any Gmail address that has items in my table. The most complete option would be a full transformation. The problem with a full transformation is that it takes up a lot of memory, because you need to keep the mapping between the initial item and the changed item. Depending on the use case, you might not need this, because it’s more complex to maintain. Of course, if there is a field that has sensitive data and you don’t need it for your specific test case, you can just remove it. The problem with removing a field is that it basically means you’re changing the schema, so you’re doing a migration, and that normally causes issues. In our case, we have a solution for migrations, so you can feel free to use it.
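To make the trade-off concrete, here is a hedged sketch of both approaches on an email column (table and column names are made up): partial redaction hides the user but collapses per-user aggregations, while a deterministic transformation keeps per-user aggregations intact, at the cost of maintaining a mapping or using a one-way hash as below.

```sql
-- Partial redaction: every Gmail user collapses into the same "**@gmail.com"-style
-- value, so counts grouped by it are no longer per-user.
SELECT '**' || substring(email FROM '@.*$') AS redacted_email,
       count(*) AS item_count
FROM items
GROUP BY redacted_email;

-- Deterministic transformation: the real address is hidden, but each user still
-- maps to a stable fake value, so per-user aggregations survive.
SELECT md5(email) || '@example.invalid' AS pseudonymized_email,
       count(*) AS item_count
FROM items
GROUP BY pseudonymized_email;
```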
Zero Downtime Migrations
Ványi: In this section of the presentation, I would like to talk about what we mean by zero downtime, what challenges we face when migrating the data layer, and the expand-contract pattern and how we implemented it in PostgreSQL. What do I mean when I say zero downtime? It sounds so nice. Obviously, downtime cannot be zero because of physics, but the user can perceive it as zero. Users can usually tolerate around 20 milliseconds of latency. Here I’m talking about planned maintenance, not service outages. Unfortunately, we rarely have any control over service outages, but we can always plan for our application updates.
Challenges of Data Migrations
Let’s look at the challenges we might face during these data migrations. Some migrations require locking, unfortunately. These can be table-level read/write locks, meaning no one can access the table: they cannot read, they cannot write. For high-availability applications, that is unacceptable. There are other migrations that rely on less restrictive locks. Those are a bit better, and we can live with them, but they are still something we want to avoid. Also, when there is a data change, we obviously have to update the application as well, and the new instance has to start and run at basically the same time as the old application. This means that the database we are using has to be in two states at the same time. Because there are two application versions interacting with our database, we must make sure, for example, if we introduce a new constraint, that it is enforced both on the existing records and on the new data.
Based on these challenges, we can come up with a list of requirements. The database must serve both the old schema and the new schema to the application, because we are running the old application and the new application at the same time. Schema changes are not allowed to block database clients, meaning we cannot allow our applications to be blocked because someone is updating the schema. The data integrity must be preserved. For example, if we introduce a new data constraint, it must be enforced on the old records as well. When we have different schema versions live at the same time, they cannot interfere with each other. For example, when the old application is interacting with the database, we cannot yet enforce the new constraints, because it would break the old application. Finally, as we are interacting with two application versions at the same time, we must make sure that the data is still consistent.
Expand-Contract Pattern
The expand-contract pattern can help us with this. It can minimize downtime during these data migrations. It consists of three phases. The first phase is expand. This is the phase when we add new changes to our schema. We expand the schema. The next step is migrate. That is when we start our new application version. Maybe test it. Maybe we feel lucky, we don’t test it at all. At this point, we can also shut down the old application version. Finally, we contract. This is the third and last phase. We remove the unused and the old parts from the schema. This comes with several benefits.
In this case, the changes do not block the client applications, because we only add new things to the existing schema. The database has to support the new application version and, at the same time, the old application version, so the database is both forward and backward compatible with the application versions. Let’s look at a very simple example, renaming a column. It means that we have to create the new column, basically with the new name, and copy over the contents of the old column. Then we migrate our application and delete the column with the old name. It’s very straightforward. We can deploy this change using, for example, blue-green deployments. Here, the old application is still live, interacting with our table through the old view. At the same time, we can deploy our new application version, which interacts with the same table through another, new view. Then, once we see that everything is passing, we can shut down the old application and remove the old view, and everything works out fine.
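A hedged sketch of the rename example in plain SQL (illustrative only, not the code pgroll generates): each application version reads through its own view, so the rename never breaks the version that is still live. The table, column, and schema names are made up.

```sql
-- Expand: add the column with the new name and backfill it from the old one.
ALTER TABLE users ADD COLUMN full_name text;
UPDATE users SET full_name = name;
-- (In practice a trigger keeps name and full_name in sync during the overlap.)

-- One view per application version, both over the same physical table.
CREATE SCHEMA app_v1;
CREATE SCHEMA app_v2;
CREATE VIEW app_v1.users AS SELECT id, name FROM public.users;       -- old name
CREATE VIEW app_v2.users AS SELECT id, full_name FROM public.users;  -- new name

-- Contract: once the old application is gone, drop its view and the old column.
DROP VIEW app_v1.users;
ALTER TABLE public.users DROP COLUMN name;
```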
Implementation
Let’s see how we implemented this in Postgres. First, I would like to explain why we chose PostgreSQL in the first place. Postgres is well known and open source. It’s been developed for 30 years now. The DDL statements are transactional, meaning, if one of these statements fails, it can be rolled back easily. It mostly relies on row-level locking. Unfortunately, there are a few operations that take read/write locks, but we can usually work around those. For example, if you are adding a non-volatile default value, the table is not rewritten. Instead, the value is added to the metadata of the table, and the old records are only rewritten when the whole record is updated. It doesn’t work like this all the time.
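As a small illustration of the non-volatile default case (available since PostgreSQL 11; the table and column here are hypothetical):

```sql
-- Adding a column with a constant default does not rewrite the table:
-- the default is stored in the catalog and returned for old rows on read.
ALTER TABLE users ADD COLUMN plan text NOT NULL DEFAULT 'free';

-- A volatile default (e.g. random(), clock_timestamp()) cannot take this
-- shortcut and would force a full table rewrite instead:
-- ALTER TABLE users ADD COLUMN token float8 DEFAULT random();
```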
Let’s look at the building blocks that Postgres provides. We are going to use three: DDL statements, obviously, to alter the schema; views, to expose the different schema versions to the different application versions; and triggers and functions, to migrate the old data and, on failure, to roll back the migration. Let’s look at a slightly more complex example. We have an existing column, and we want to add a NOT NULL constraint to it. It seems simple, but it can be tricky, because Postgres does a table scan: it locks the table, and no one can update or read it while it goes through all of the records and checks whether any of the existing records violate the NOT NULL constraint. If it finds a record that violates the constraint, the statement returns an error. We can work around it. If we add NOT VALID to the constraint, the table scan is skipped. Here we add the new column, set the NOT NULL constraint on it, and mark it NOT VALID, so we are not blocking the database clients.
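The underlying Postgres trick looks roughly like this (a minimal sketch on a hypothetical users table; pgroll's expand step additionally duplicates the column and wires up triggers, as described next):

```sql
-- NOT VALID skips checking the existing rows, so no long-running scan holds a
-- lock on the table; the constraint is only enforced for new writes.
ALTER TABLE users
    ADD CONSTRAINT users_description_not_null
    CHECK (description IS NOT NULL) NOT VALID;
```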
We also create triggers that move the old values from the old column to the new one. It is possible that some of the old records don’t have values yet, and in this case, we need to add some default value, or any backfill value we want, and then we migrate our app. We need to complete the migration, obviously. We need to clean up the triggers and the view we added, and remove the old column, so the applications interact with the table directly. Also, we must remember to remove NOT VALID from the constraint, that is, to validate it. We can do that because the migration moved the old values over, all of the new values are there, and every record satisfies the constraint.
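Completing the migration by hand would look roughly like the following (pgroll's complete command does the equivalent cleanup for you); this sketch continues the hypothetical constraint from above:

```sql
-- Validate the constraint: this scans the table, but only takes a lock that
-- still allows reads and writes while it runs.
ALTER TABLE users VALIDATE CONSTRAINT users_description_not_null;

-- Optionally promote it to a real NOT NULL and drop the helper constraint;
-- PostgreSQL 12+ can use the validated CHECK to skip another full scan.
ALTER TABLE users ALTER COLUMN description SET NOT NULL;
ALTER TABLE users DROP CONSTRAINT users_description_not_null;
```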
Doing all of this by hand every time seemed quite tedious, and that’s why we created pgroll. It’s an open-source command line tool, and it is written in Go, so you can also use it as a library. It is used to manage safe and reversible migrations using the expand-contract pattern. I would like to walk you through how to use it. pgroll runs against a Postgres instance, so you need one running somewhere. After you have installed and initialized it, you can start creating your migrations. You define migrations using JSON files; I will show you an example. Once you have your migration, you run the start command. pgroll creates a new view, and your new application can interact with the table through it. You can test it. Then you can also shut down your old application. You run the complete command, and pgroll removes all of the leftover views and triggers for you. This is the JSON example I was just talking about.
Let’s say that we have a users table with an ID field, a name field, and a description, and we want to make sure that the description is always there, so we put a NOT NULL constraint on it. In this case, you have to define a name for the migration; it will become the name of the view, or rather the schema, in Postgres. We define a list of operations. We are altering a column. The table is, obviously, named users. For the description field, we no longer allow null values in the column. This is the interesting part: the up migration. It contains what to do when we are migrating the old values. In this case, it means that if the description is missing, we add the text “description for” followed by the name. If the data is there, we just move it to the new column. The down migration defines what to do when there is an error and we want to roll back. In this case, we keep the old value, meaning, if the value was missing, it stays null, and if there was something, we keep it.
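Reconstructed from the talk, the migration file looks roughly like this; the operation and field names follow pgroll's documented JSON format, but treat the details as approximate:

```json
{
  "name": "user_description_set_nullable",
  "operations": [
    {
      "alter_column": {
        "table": "users",
        "column": "description",
        "nullable": false,
        "up": "(SELECT CASE WHEN description IS NULL THEN 'description for ' || name ELSE description END)",
        "down": "description"
      }
    }
  ]
}
```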
Here is the start command. Let’s see in psql what just happened. We have a users table with these three columns, but you can see that pgroll added a new column. Remember, there is a migration ongoing right now. In the old description column, there are records that do not yet satisfy the constraint. In the new description column, the backfill value is already there for us to use. We can inspect what schemas are in the database. We can notice that there is this create_users_table, which is the old schema version. The new one is user_description_set_nullable, which is the name of the migration we just provided in our JSON. Let’s try to insert some values into this table. We are inserting two rows. The first one shows how the new application version behaves: the description is not empty. With the second record, we are mimicking what the old application does: here the description is NULL. Let’s say that we succeeded. We can try to query this table.
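Roughly, that part of the psql session looks like the following sketch; the versioned schema names are the ones mentioned in the talk, though in practice they may carry a prefix, and the id column is assumed to be auto-generated:

```sql
-- The new application writes through the new schema version (constraint enforced).
SET search_path TO 'user_description_set_nullable';
INSERT INTO users (name, description) VALUES ('Alice', 'this is Alice');

-- The old application writes through the old schema version, without a description.
SET search_path TO 'create_users_table';
INSERT INTO users (name) VALUES ('Bob');

-- Query through the old version...
SELECT name, description FROM users;

-- ...and through the new one.
SET search_path TO 'user_description_set_nullable';
SELECT name, description FROM users;
```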
From the old app’s point of view, we set the search path to the old schema version and run the query to inspect what happened after we inserted these values. This is what we get back. The description for Alice is “this is Alice”, and for Bob it’s NULL, because the old application doesn’t enforce the constraint. Let’s change the search path to the new schema version and perform the same query. Here we can see that we have the description for Alice, and notice that Bob also has a description: it is the default description from the up migration we provided in the JSON file. Then we complete the migration using the complete command, and we can see that the old schema is cleaned up. The intermediary column is removed as well, along with the triggers and functions. Check out pgroll. It’s open source. It takes care of mostly everything. There is no need to manually create new views, functions, or new columns. After you complete your migrations, it cleans up after itself. It is still under development, so there are a few missing features, for example, a few migration types. We do not yet support adding comments, unfortunately, or batched migrations.
Takeaways
Pencea: What we presented so far were bits and pieces of the puzzle we want to build: the CI/CD data pipeline. What we imagined when we created this was: somebody creates a PR. Then a test environment, with a test database containing valid data that’s also safe to use, gets created for them. Then the tests are run, everything is validated, and the PR is merged. Then it goes through the pipeline, and nobody has to worry about migrations, because the changes can be applied safely.
Ványi: The migrations are done without downtime. If your pull request is merged, it goes through the testing pipeline, and if everything passes, that’s great: we can clean up after ourselves and remove the old schema. If there is a test failure or something is not working correctly, we can roll back at any time, because the old schema is still kept around, just in case. As we said, there is still some work left for us, but we already have some building blocks that you can integrate into your CI/CD pipeline. You can create a test database on the fly using GitHub notifications, and fill it with safe and relevant data to test with. You can create schema changes and merge them back and forth without worrying about data migrations. You can deploy and roll back your application without any downtime.
Questions and Answers
Participant 1: Does pgroll keep metadata about every migration, whether it is started, ongoing, or finished?
Ványi: Yes, there is a migrations table. Also, you can obviously store your migration files in Git if you want to version-control them, but pgroll has its own bookkeeping for past migrations.
Participant 2: For the copying of the data from production, was that for local tests, local dev, or the dev? How did you control costs around constantly copying that data, standing up databases, and tearing them back down?
Pencea: It’s usually for something that sits in the cloud, so not for the local dev.
Participant 2: How did you control cost if you’re constantly standing up a near production size database?
Pencea: What we use internally is data branching. We don’t start a new instance every time. We have a separate schema inside a bigger database. Also, what we offer right now is copying 10k of data, which is not much in terms of storage. We figured it should be enough for testing purposes.
Participant 3: I saw in your JSON file that you can do migrations that pgroll knows about like, is set nullable to false? Can you also do pure SQL migrations?
Ványi: Yes. We don’t yet support every migration. If there is anything missing, you can always work around it by using raw SQL migrations. In that case, you can shoot yourself in the foot, because, for example, in the case of NOT NULL, we take care of skipping the table scan for you. When you are writing your own raw SQL migration, you have to be careful not to block your table and the database access.
Participant 4: It’s always amazed me that these databases don’t do safer actions for these very common use cases. Have you ever talked to the Postgres project on improving the actual experience of just adding a new column, or something? It should be pretty simple.
Ványi: We’ve been trying to have conversations about it, but it is a very mature project, and it is somewhat hard to change such a fundamental part of the database. Constraints are basic building blocks for Postgres, and it’s not easy to just make them safer. There is always some story behind it.
Pencea: I think developer experience was not necessarily something that people were concerned about, up until recently. I feel like sometimes it was actually the opposite, if it was harder, you looked cooler, or you looked like a hacker. It wasn’t exactly something that people would optimize for. I think it’s something that everybody should work towards, because now everybody has an ergonomic chair or something, and nobody questions that, but we should work towards the same thing about developer experience, because it’s ergonomics in the end.
Participant 5: In a company, assuming they are adopting pgroll, all these scripts can grow in number, so at some point you have to apply all of them, I suppose, in order. Is there any sequence number, any indication of how to apply them? Because some of them might be serial, and some of them can be parallelized. Is there any plan to give direction on the execution? I’ve seen there is a number in the script file name. Are you following that as a sequence number, or, when you develop your batching feature, could you add a sequence number inside?
Ványi: Do we follow some sequence number when we are running migrations?
Yes and no. pgroll maintains its own table for bookkeeping, where it knows what the last migration was and what is coming next. The number in the file name is not only for pgroll, but also for us.
Participant 6: When you have very breaking migrations using pgroll, let’s say you need to rename a column or even change its type, you basically create a new column and then copy over the data. How do you deal with very large tables, say, millions of rows? Because you could end up having some performance issues when copying these large amounts of data.
Ványi: How do we deal with tables that are basically big? How do we make sure that it doesn’t impact the performance of the database?
For example, in case of moving the values to the new column, we are creating triggers that move the data in batches. It’s not like everything is copied in one go, and you cannot really use your Postgres database because it is busy copying the old data. We try to minimize and distribute the load on the database.
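A hedged sketch of what batched backfilling can look like in plain SQL (not necessarily pgroll's exact mechanism): the duplicated column, called description_new here for illustration, is filled one slice at a time, so no single statement touches the whole table at once.

```sql
-- Backfill the new column in small batches; the application (or a function)
-- repeats this statement until it reports zero updated rows.
UPDATE users
SET    description_new = COALESCE(description, 'description for ' || name)
WHERE  id IN (
    SELECT id
    FROM   users
    WHERE  description_new IS NULL
    LIMIT  1000
);
```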
Participant 7: I know you use small batches to copy the records from the existing column to the new column. Only once you have copied all the records do you remove the old column. There is a cost to that.
MMS • Ben Linders
Article originally posted on InfoQ. Visit InfoQ
DORA can help to drive sustainable change, depending on how it is used by teams and the way it is supported in a company. According to Carlo Beschi, getting good data for the DORA keys can be challenging. Teams can use DORA reports for continuous improvement by analysing the data and taking actions.
Carlo Beschi spoke about using DORA for sustainable improvement at Agile Cambridge.
Doing DORA surveys in your company can help you reflect on how you are doing software delivery and operation as Beschi explained in Experiences from Doing DORA Surveys Internally in Software Companies. The way you design and run the surveys, and how you analyze the results, largely impact the benefits that you can get out of them.
Treatwell’s first DORA implementation in 2020 focused on getting DORA metrics from the tools. They set up a team that sits between their Platform Engineering team and their “delivery teams” (aka product teams, aka stream-aligned teams), called CDA – the Continuous Delivery Acceleration team. Half of its time is invested in making other developers’ and teams’ lives better, and the other half goes into getting DORA metrics from the tools:
We get halfway there, as we manage to get deployment frequency and lead time for changes for almost all of our services running in production, and when the team starts to dig into “change failure rate”, Covid kicks in and the company is sold.
DORA can help to drive sustainable change, but it depends on the people who lead and contribute to it, and how they approach it, as Beschi learned. DORA is just a tool, a framework, that you can use to:
- Lightweight assess your teams and organisation
- Play back the results, inspire reflection and action
- Check again a few months / one year later, maybe with the same assessment, to see if / how much “the needle has moved”
Beschi mentioned that teams use the DORA reports as part of their continuous improvement. The debrief about the report is not too different from a team retrospective, one that brings in this perspective and information, and from which the team defines a set of actions, that are then listed, prioritised, and executed.
He has seen benefits from using DORA in terms of aligning people on “this is what building and running good software nowadays looks like”, and “this is the way the best in the industry work, and a standard we aim for”. Beschi suggested focusing the conversation on the capabilities, much more than on the DORA measures:
I’ve had some good conversations, in small groups and teams, starting from the DORA definition of a capability. The sense of “industry standard” helped move away from “I think this” and “you think that”.
Beschi mentioned the advice and recommendations from the DORA community on “let the teams decide, let the teams pick, let the teams define their own ambition and pace, in terms of improvement”. This helps in keeping the change sustainable, he stated.
When it comes to meeting the expectations of senior stakeholders, if your CTO is the sponsor of a DORA initiative there might be “pushback” on teams making their own decisions, along with expectations about the “return on investment” of doing the survey, aiming to have more things change, quicker, Beschi added.
A proper implementation of DORA is far from trivial, Beschi argued. The most effective ones rely on a combination of data gathered automatically from your system alongside qualitative data gathered by surveying (in a scientific way) your developers. Getting good data quickly from the systems is easier said than done.
When it comes to getting data from your systems for the four DORA keys, while there has been some good progress in the tooling available (both open and commercial) it still requires effort to integrate any of them in your own ecosystem. The quality of your data is critical.
Start-ups and scale-ups are not necessarily very disciplined when it comes to consistent usage of their incident management processes – and this greatly impacts the accuracy of your “change failure rate” and “response time” measures, Beschi mentioned.
Beschi mentioned several resources for companies that are interested in using DORA:
- The DORA website, where you can self-serve all DORA key assets and find the State of DevOps reports
- The DORA community has a mailing list and bi-weekly video calls
- The Accelerate book
In the community you will find a group of passionate and experienced practitioners, very open, sharing their stories “from the trenches” and very willing to onboard others, Beschi concluded.
.NET Aspire 9.0 Now Generally Available: Enhanced AWS & Azure Integration and More Improvements
MMS • Robert Krzaczynski
Article originally posted on InfoQ. Visit InfoQ
.NET Aspire 9.0 is now generally available, following the earlier release of version 9.0 Release Candidate 1 (RC1). This release brings several features aimed at improving cloud-native application development on both AWS and Azure. It supports .NET 8 (LTS) and .NET 9 (STS).
A key update in Aspire 9.0 is the integration of AWS CDK, enabling developers to define and manage AWS resources such as DynamoDB tables, S3 buckets, and Cognito user pools directly within their Aspire projects. This integration simplifies the process of provisioning cloud resources by embedding infrastructure as code into the same environment used for developing the application itself. These resources are automatically deployed to an AWS account, and the references are included seamlessly within the application.
Azure integration has been upgraded in Aspire 9.0. It now offers preview support for Azure Functions, making it easier for developers to build serverless applications. Additionally, there are more configuration options for Azure Container Apps, giving developers better control over their cloud resources. Aspire 9.0 also introduces Microsoft Entra ID for authentication in Azure PostgreSQL and Azure Redis, boosting security and simplifying identity management.
In addition to cloud integrations, Aspire 9.0 introduces a self-contained SDK that eliminates the need for additional .NET workloads during project setup. This change addresses the issues faced by developers in previous versions, where managing different .NET versions could lead to conflicts or versioning problems.
Aspire Dashboard also receives several improvements in this release. It is now fully mobile-responsive, allowing users to manage their resources on various devices. Features like starting, stopping, and restarting individual resources are now available, giving developers finer control over their applications without restarting the entire environment. The dashboard provides better insights into the health of resources, including improved health check functionality that helps monitor application stability.
Furthermore, telemetry and monitoring have been enhanced with expanded filtering options and multi-instance tracking, enabling better debugging in complex application environments. The new support for OpenTelemetry Protocol also allows developers to collect both client-side and server-side telemetry data for more comprehensive performance monitoring.
Lastly, resource orchestration has been improved with new commands like WaitFor and WaitForCompletion, which help manage resource dependencies by ensuring that services are fully initialized before dependent services are started. This is useful for applications with intricate dependencies, ensuring smoother deployments and more reliable application performance.
Community feedback highlights how much Aspire’s development experience has been appreciated. One Reddit user noted:
It is super convenient, and I am a big fan of Aspire and how far it has come in such a short time.
Full release details and upgrade instructions are available in the .NET Aspire documentation.