Presentation: How to Ship Updates to 40+ Apps Every Week With Nx

MMS Founder
MMS Santosh Yadav

Article originally posted on InfoQ. Visit InfoQ

Transcript

Yadav: When I was asked to come here and give a talk, I was thinking about how to share the journey we have been through, because at Celonis we went through a lot of trouble to get where we are right now, and about the tools we used along the way to help us deliver faster. Nx is one of the most important tools in our ecosystem, which is why I mentioned it in the talk title as well. We'll talk about Nx too. Let's see what we are going to cover.

First, I want to show you what our application looks like. This is our application. If you look at the nav bar, each item in the nav bar is actually a separate application which we load inside our shell. Even inside the shell, there can be multiple components which we can combine together to create a new dashboard for our end users. It means there are different teams building these applications. This is where we are right now.

Problem Statement

How did we start? I just want to show you what the problem statement was, what the problem or the issue was which we were trying to resolve, and how we ended up here. This was our old approach. I was speaking to some of my friends, and we were talking about the same issue: we have multiple repositories, and we are thinking about moving all the code into a single monorepo, but we are not able to do it, or we are struggling because we know there might be challenges. This is where we were three years ago. We had separate apps with separate repositories. We used to load each app using URL routing. It's not like the SPA, module federation, or micro-frontends which we know today, because in the past few years, tools have added more capabilities.

For example, webpack came with support for module federation, which was not there earlier. Everyone was solving module federation in some different way, just not in the right way. This is another issue which we had. We had close to 40 different repositories, and then we used to build that code. We are using GitHub Actions. We used to build the code and push it into an artifact store or database, because that was the only way to load the application. We used to push the entire build into the database and then load it on the frontend. The only problem is we were doing it X times: the same process, the same thing, just 30 times. Of course, it costs a lot of money. The other issue which we had was, of course, our design system.

Which company doesn't have a design system? The first thing which a company decides to do is, let's have a design system. We don't have a product, but we should have a design system. This was another issue which we had. Now this became a problem. We had a design system, but different applications started using different versions of the design system, because of time: some teams started pushing back, we don't have frontend developers, or we don't have time to upgrade it right now. This was, of course, a big pain. How should we do it? This caused another issue.

Some of our customers were actually seeing the same app, but as soon as they moved to a different application, or a different part of the application, they saw a different design system. Think of a dark theme and a light theme, just as an example. Or think about a different text box: one customer is seeing one kind of text box and another is seeing a different one.

What were the issues we went through? Page reloads, for example. With HTML5, everyone knows the experience should be smooth. As soon as I click on a URL, there should not be a page refresh. That's the expectation of today's users. This is not the early '90s or 2000, where you click on a URL and wait for an hour for the page to download. That is a thing of the past. Our users were facing this issue: every page, every app reloads the entire thing. Bundle size, of course: we could not tree shake anything, and there was no lazy loading, so there was a huge bundle which we used to download.

Then, when we have to upgrade Angular, or any other framework which you are using in your enterprise (we are using Angular), it took too much effort because we had to do it 30 times, plus our reusables and design system. Maintaining multiple versions of shared libraries and the design system became a pain, because we could not move ahead and adopt the new things which are available in Angular or any other ecosystem, because it's always about backward compatibility.

Everyone knows backward compatibility is not really a thing. It's just a compromise. It's a compromise we make: ok, we have to support this, and that's why we are still stuck here. As I said, we had 30-plus apps and we used to deploy them separately. We had to sync our design system, which we saw in the previous slide. That was, again, very difficult, because if your releases are not synchronized, even for a few seconds or a few minutes, you will see different UIs.

What Is Nx?

Then came Nx. We started adopting Nx almost three years back. Let's see what Nx is. It's a build tool. It's an open-source build tool, which is available for everyone. You can just start using it for free. There's no cost needed. It also supports monorepos. The monorepo is just an extra thing which you get; the main thing is that it's a build tool. It provides a build cache for tasks like build and test. As of today, one thing which we all are doing is building the same code again and again. Nx takes another approach. The founders are actually from Google. Everyone knows Google has different tools.

If you have a colleague from Google, you keep hearing about, we had this tool and we had that tool, and how crazy it was. These people used to work on the Angular team. They took the idea from Bazel, because Google uses it a lot, and they built Nx based on it. They launched it for Angular first, and now it's platform and technology independent. As I said, it's framework and technology agnostic. You can use it for anything. It's plugin based, so you can bring your own framework. If there is no support for an existing technology, you can just add it. Or if you have a homegrown framework which you built on your own, you can bring it in as a plugin as part of Nx, and you start getting all the features which Nx offers.

For example, the build cache. It supports all the major frameworks out of the box: Angular, React, Vue. On top of that, it supports micro-frontends. If you want to do micro-frontends with React or Angular, it's just easy. I'll show you the commands. It also supports backend technologies. They have support for .NET, Java, and Spring. They have support for Python. They also added support for Gradle recently. As I said, the support is wide.

Celonis Codebase

This is our codebase as of today. We have 2 million lines of code. We have close to 40-plus applications. We have 200 projects. Why are there more projects than applications? Because we also have libraries. We try to split our code into smaller chunks using libraries, so that's why we have close to 200 projects. Then, more than 40 teams are contributing to this codebase. We get close to 100 PRs per day on average; there are some days where we get more. With module federation, this is what we do today. We are no longer loading those applications via URL routing; the Angular application loads them natively. We have multiple applications here. The shell app is something which just renders your nav bar.

Then you can access any of the apps. It just feels like a single page application. There is no reload. We can do tree shaking. We can do code splitting. We can also reduce the need to share our design system across the applications, because now we have to do it only once. These are some tasks which we run for each and every PR. Of course, we do a build. Once you write your code, the first thing which you do is build your project. Then we run unit tests. We use Jest. We also have Cypress component tests. Then we run them on CI as well. Before we merge a PR, we also run end-to-end tests. We are using Playwright for writing our end-to-end tests, or user journeys.

Then, let's see how to start using module federation with Angular. You can just use this command, nx generate. For any framework, you will find a generate command: you say nx generate, then the plugin for that framework. You can, for example, replace Angular with React, and you get your module federated app or micro-frontend app for your React application. These remotes are actually applications which will be loaded when you route through your URLs. For example, home, about, blogs: these can be different URLs which we have. They are actually different applications. It means three teams can work on three different applications but, at the end, they will be loaded together.
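As a rough sketch, generating a host with three remotes looks roughly like this (the project names are illustrative, and the exact plugin package and options depend on your Nx version):

    nx generate @nx/angular:host shell --remotes=home,about,blogs
    # the React equivalent swaps in the React plugin
    nx generate @nx/react:host shell --remotes=home,about,blogs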

Feature Flags

We use feature flags a lot, because when we started migrating all of the codebase, it became a mess. A lot of teams started pushing their code into a single codebase. We were coming from different teams, and different teams had their own ways of writing code. We had feature flags for the backend; that was something which was taken care of. On the frontend, we were seeing a lot of errors, so we thought of creating a feature flag framework for our frontend applications. This is what it feels like without feature flags. I've seen this meme many times. It always says, this is fine. We believe this is not fine. If your organization is always on fire, this is not fine for anyone. You should not be monitoring your systems 24/7 just because you did a release. This is where we started. Of course, we had a lot of fires.

Then we decided we would have our own feature flag framework for frontend applications. This is what we used to think before we had feature flags: ok, backend, frontend, we will merge it, then everything goes fine, we'll do a release, and everyone is happy. This is not the reality. It looks good on paper but, in reality, this is what happens once you merge your code: everything just collapses. We started with this. We started creating our frontend feature flags to address it. We now have the ability to ship a feature based on a user or based on a cluster. We can also define what percentage of users or customers we want to ship a feature to. Or we can ship a specific build. We generally try to avoid this; it is something we use for our POCs.

Let's say you want to do a POC for a particular customer. We can say, just use this build. That customer will do its POC, and if they're happy with it, we can go ahead and write the code properly. For example, we still have to write tests; we have to write user journey tests. This is just for the POC. We can also combine all of the above. We ended up with this. We started seeing that there are fewer bugs, because the bugs are isolated behind a feature flag. We also have the ability to roll back a feature flag if anything goes wrong. We don't have to roll back the entire release, which was the case earlier. Now we are shipping features with more confidence, which is what we need.

Before you ask me which feature flag solution we are using: I'm not here to sell anything. We decided to build our own. How? Again, Nx comes into the picture. Because Nx, as I said, is plugin based, you can build anything and just create it as a plugin. You get everything out of the box. It feels native; it feels like you are still working with Nx. This is the command. You can just say nx add and the name of the new plugin, and you can define where you want to put that plugin. For our feature flag solution, we use a lot of YAML files, and we added all the code to read those YAML files as part of our plugin. It's available for everyone.
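A minimal sketch of creating such a workspace plugin (package and plugin names here are illustrative, and the exact generator names depend on the Nx version):

    # add the plugin tooling to the workspace
    nx add @nx/plugin
    # generate a local plugin and choose where it lives in the repo
    nx generate @nx/plugin:plugin tools/feature-flags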

One thing which you have to focus on, in case you are creating a custom solution, is developer experience. Otherwise, no one will use it. We also added the ability to enable and disable flags: developers can just raise a PR to enable or disable a feature flag. We also added some checks so that no one can disable a flag which is already being used without anyone knowing about it. For example, your release manager or your team lead has to approve it.

Otherwise, someone might just do it by mistake. We also have a dashboard where you can see which features are enabled and in which environment. Our developers can see that too. We also have a weekly alert: in case there is a feature flag which is GA now and available for everyone, we send a weekly alert so developers can go ahead and remove those feature flags. This is fine, because we know where the fire is, and we can just roll it back.

Proof of Concepts

Of course, when you have a monorepo, the other problem which we have seen is that a lot of teams are not fans of monorepos, because they think they're being restricted from doing anything. This is where we came up with an idea: what if teams want to do a proof of concept? Recently, there were a few teams which said, we want to come into the monorepo, but the problem is our code is a POC. We don't want to write tests, because we also have checks. I think most of you might have checks for test coverage: you should have 80%, or 90%, or whatever. I don't know why we keep it, but it's useful, just to see the numbers.

Then we said, let's give you a way to start creating POCs, and we will not be a blocker for you anymore. In Angular, you can just say, I'll define a new remote, and that's it: a new application is created. They can just do it. Another issue is that most enterprises have their own way of creating applications. They may need some customization: I want to create an application, but I need some extra files to be created when I create this application. Nx offers you that. Nx offers you a way to customize how your projects will be created. For example, in our use case, whenever we create an Angular application, we also add the ability to write component tests. What we did is take the functionality from Nx, add all of this into a single bundle or a single plugin, and give it to our developers.

That way, whenever you create a new application, you also get component tests out of the box. It can be Cypress, or Playwright, or anything which you like. Or let's say you want to create some extra files, for example a Dockerfile, or maybe something related to your deployment which is mandatory for each and every app. You can customize the way your applications are created by using generators. This is called an Nx Generator. As I said, you can also create files, and you can define the files wherever you want. Generally, we use a files folder where you can put all the template files.

For example, as I said, a Dockerfile, or any other files which you need for configuration. You can pass values to them as parameters. It uses a templating syntax called EJS; I'm not sure how many people are aware of EJS. EJS is used to replace variables in the actual file. Here, I'm talking about the actual files, not temporary files: the actual files which will be written to disk. You can do all of this with the help of an Nx Generator. This is what we do whenever someone creates a new application: we just add some things out of the box.
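As a rough sketch, a workspace generator that wraps the stock Angular application generator and then writes extra files from EJS templates might look something like this (the schema, file layout, and option names are illustrative, and exact generator options vary by Nx version):

    import { Tree, generateFiles, formatFiles, joinPathFragments } from '@nx/devkit';
    import { applicationGenerator } from '@nx/angular/generators';

    // Hypothetical options for this example
    interface CompanyAppSchema {
      name: string;
    }

    export default async function companyAppGenerator(tree: Tree, options: CompanyAppSchema) {
      // Reuse the stock Nx Angular application generator first
      await applicationGenerator(tree, {
        name: options.name,
        directory: `apps/${options.name}`,
      });

      // Then add company-specific files (Dockerfile, deployment config, component test setup)
      // from EJS templates stored in a local ./files folder
      generateFiles(tree, joinPathFragments(__dirname, 'files'), `apps/${options.name}`, {
        name: options.name,
        tmpl: '', // strips the __tmpl__ suffix from template file names
      });

      await formatFiles(tree);
    }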

Maintaining a Large Codebase

When it comes to maintaining a large codebase, because now we have 2 million lines of code in a single repository, there are a few things which we have to take care of. For example, refactoring. We do a lot of refactoring because we got the legacy code. I'm sure everyone loves legacy code, because you love to hate it. Then, we keep doing deprecations. This is one thing I think we are doing well: as soon as we see some old code, we start deprecating it if it's not used. Then, migration. Over the period of time, we have migrated multiple apps into our monorepo.

We still support this, just in case anyone wants to migrate their code to our monorepo. It took us time; it took us close to two years. Now we are at the stage where I think we have only one app which is still outside our monorepo. This is not going to happen in a day, but you have to start someday. Then, adding linters and tools. This is very important for any project. You need to have linters today, and you may need to add more tools tomorrow; especially in the JavaScript ecosystem, there is a new tool every hour, I think. Then, helping team members. This is very important in case you are responsible for managing your monorepo. If you end up doing this, initially you will spend a lot of time on it.

Most of the time, you'll be helping your new developers onboard into the monorepo. Documentation, again, is critical, because if you don't do it, more developers will rely on you, which you don't want; it will take your time away. Then, the ability to upgrade the framework for everyone. We use Angular, but it can be React or Vue, whatever framework you use. This is what comes under maintaining our monorepo. How do we do this? For example, Nx offers something called nx graph. If I run nx graph, I get this view, where I can see all the applications, all the projects.

I can figure out which library is dependent on which app. If I want to refactor something, I can just check whether it is being used or not by using the nx graph. Or if some refactoring is required, I can look at this graph and say, probably this UI library should not be used in home, it should be used in blogs. Then you can just refactor your code. It helps a lot during refactoring and during deprecations as well.

Now, talking about migrations. As I said, you may have to migrate a lot of code to your monorepo once you start, because all the code lives in different repositories. Nx offers you a command called nx import, where you define your source repository and your destination, and it will migrate your code along with your Git history. This command just came in the last release. For the past years, we had been doing it manually; we did it for more than 30 repositories by hand. The same thing is now available as part of Nx. You can just run this command and do everything automatically. We deploy our documentation on Backstage.

This is what we do, so everyone is aware of where the documentation is. We use Slack for all communications, for any new initiatives or deprecations which we are announcing. We have a dedicated Slack channel, so in case developers have any questions, they can ask on this channel. It actually improves knowledge sharing as well, because if someone already knows something, we don't have to jump in and say, this is how you should do it. It removed a lot of dependency on us, the core team. Education is important.

We started doing a lot of workshops initially when we moved to the monorepo, just to give developers the confidence that we are not taking anything from them. We are actually giving them more control over their codebase, and we are just here to support them. We started educating. We did multiple workshops, and whenever we add a new tool, we do a workshop. That's very important.

Tools

As I said, every other hour you are getting a new tool. What should you use? Which tool should you add? It's true that introducing a tool into a codebase is very time consuming. You may end up spending two or three days just figuring out how to make the tool work. At the same time, sometimes adding a tool is easy, but maintaining it is hard. Because as soon as you add it, there is a new tool available the next hour which is much more powerful than this one. Now you are stuck maintaining this tool, because there are no upgrades, and most of your code is already using it, so you cannot move away from it.

At the end of the day, you have to maintain this code, or maintain this tool. Nx makes it easy to introduce a new tool and to maintain it. Let's see how. Nx offers you support out of the box for the popular tools, for example, Cypress and Playwright. Playwright is now a go-to tool for writing end-to-end tests. I'm not sure about other ecosystems, but it's widely used in the JavaScript ecosystem. Anyone who starts a new project probably now goes for Playwright, but there was a time when many people were going with Cypress. With Nx, it's just a command, and then you can start using the tool. You don't even have to invest time configuring it. You just start using it. That's what I'm talking about.

For unit tests, it gives you Jest and Vitest out of the box. You can just add them and start using them; no time is needed to configure the tools. What about upgrades? Nx offers you something called migrate. With the migrate command, you can migrate everything to the latest version. For example, if you're using React and you want to move to the new React version, you can just say nx migrate latest, and it will migrate your React version. Same for Angular. This is what we do now. We don't invest a lot of time doing manual upgrades; we just use nx migrate, and our code gets migrated to the new version. It works for all the frameworks and all the technologies which are supported by Nx, but you can also do it for your own plugins.
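In practice, this is roughly a two-step command (shown here as a sketch; the flags are from the standard Nx CLI):

    # update package.json and generate a migrations file
    nx migrate latest
    # apply the generated code migrations across the workspace
    nx migrate --run-migrations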

For example, let's say you end up writing a new plugin for your own company and you want to push some updates. You can just write a migration, and this migration tooling will automate the migration for your codebase, so your developers don't even have to worry about what's happening. Of course, you have to make sure that you test it properly before shipping.

Demo

I'll show you a small demo, because everything we saw so far was a picture. Only believe it when you see something running, otherwise, don't. This is what your nx graph looks like whenever you run nx graph, and you can click on Show all projects. Then you can hover over any project and see how it is connected: how it's being used, which application is dependent on which application. For example, for shell, you see dotted lines. Dotted lines mean lazy loading: the projects are not directly related, but they are related.

For example, for Home and UI, it says there is a direct dependency. You can figure out all of this from the nx graph. It also gives you the ability to see tasks, tasks like build or lint. Let's say you make a code change; you can figure out which tasks will run after your code change. Which builds will be running? Which applications will be affected? Everything you can figure out from this nx graph. This is free, so you don't have to pay. I'm just saying this is one of the best features which I have seen that is available for free. Let me show you the build. I talked about caching. Let's run a build, nx run home:build. I'll just do a production build. It's running the build. This line is important: it says, 1 read from cache. Now, one thing about monorepos: people think, I have 40 projects.

Whenever I make changes, all my 40 projects will be built. Monorepos actually have a bad name for this. I have done .NET, so I know. We used to have so many projects, and we would build the same code again and again, but not with Nx. Nx knows your dependency graph, so it can figure out what needs to be built again and what can be read from the cache. They do it really well. Here we can see one read from cache, because I already built it before. It just retrieved the same build from the cache. Now let's say 40 teams are working on 40 different apps, and one team makes changes to its own app. Then the other 39 apps are not built again, because Nx knows from the dependency graph that those applications are not affected, so it doesn't have to build anything.

If I try to build it again, the next time it will just retrieve everything from the cache. Now it's faster than before: it says it took 3 seconds, where earlier it was 10 seconds. This is what Nx offers you out of the box. Caching is available for your builds, your tests, your component tests, your end-to-end tests, anything. All the tasks can be cached. This is caching.
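Roughly, the demo boils down to running the same target twice (the project name home is from the demo; the exact output wording depends on the Nx version):

    # first run executes the build and writes the result to the cache
    nx run home:build --configuration=production
    # an unchanged second run is served from the cache and finishes in seconds
    nx run home:build --configuration=production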

CI/CD

Of course, with CI/CD, there is always one person in your team asking for faster builds. I was one of them. We use GitHub Actions with Nx, which gives us superpowers. How do we do it? We use larger runners on GitHub Actions, on our own machines. We used to use GitHub-provided machines, but that was too expensive for us, so we have moved to our own machines now. We use Merge Queue to run end-to-end tests. I'll talk about Merge Queue, because it is an amazing feature from GitHub; this one is only available for enterprises. And we cache builds for faster builds and tests, which we saw locally. What we saw was on the local machine; I'll show you how we do it on CI. Let's talk about Merge Queue and user journey tests first.

One thing about user journey tests is that they are an excellent way to avoid bugs. Everyone knows this, because you are testing a real user flow: you actually log in and click on a button to process something. We also know that if you try running user journeys on every PR, it will be very expensive, because we are interacting with a real database, and it may take a lot of time to complete your build. We also know that running them across multiple branches is another issue, because a branch soon goes out of sync with the main branch once main already has newer changes.

Running the user journey tests again on an old branch is pointless, because it doesn't have the latest changes, which means there is a chance that you may introduce errors. This is where Merge Queue comes in; it was introduced by GitHub. Let's see how it works. Let's say there are four PRs (pull requests) in your pipeline, and PR 4 fails, so it's removed from your queue. The other three, PR 1, PR 2, and PR 3, will be sent to your Merge Queue. Merge Queue is a feature provided by GitHub which you can enable from your settings. You can define how many PRs you want to consider for the Merge Queue. We do 10: ten PRs will be pushed to the Merge Queue at once. You can change this. Because we have 100 PRs per day, we found that 10 is a good number for our average.

In your case, if you get more PRs, you can just increase the number of PRs which you want to push into the Merge Queue. Once PRs go to the Merge Queue, this is how it works. GitHub will create a new branch from the first PR, with main as the base branch. Then it will rebase the changes from PR 1 onto this new branch, but it will not do anything else yet; the branch is created, that's it. Then it creates another branch for PR 1 plus PR 2. Now the PR 1 branch is the base, and PR 2's changes are merged into this branch, so it has the latest code. Same with PR 3: it creates a branch for PR 1, PR 2, and PR 3, takes the PR 1 plus PR 2 branch as the base, and merges PR 3's changes into it.

After this, it will run all the tasks which are available on your CI/CD: for example, build, test, component tests, plus user journey tests. Whenever you are running user journey tests, you are running them on the latest code, not on old code which is out of sync. Yes, it reduces the number of errors you have.

Before I get to affected builds, I want to give some stats on how we are doing today. With 2 million lines of code and 200 projects, as of today, our average time for each PR is 12 minutes. For an entire rebuild, it's 30 minutes. It's all possible because we make use of affected builds. Nx knows what has been affected, and this is what it does internally. For example, Lib1 and Lib2 are used by five different applications. You push new code which affects library 1, which in turn affects App1 and App3. What we do is run only the affected tasks: we say, run affected and do build, lint, test. That's it. We retrieve the cache from an S3 bucket.
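On CI, that is roughly a single command (a sketch; the base and head refs you compare against depend on your pipeline setup):

    # run build, lint and test only for projects affected by this change
    nx affected -t build lint test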

As of today, we are using an S3 bucket to push our cache and then retrieve it whenever there is a change. You can build this yourself, or, if you have the money, there is a paid solution by Nx called Nx Cloud. Then you can remove this custom setup; you don't have to do it on your own. Nx Cloud can take care of everything for you. It can even do cache distribution, and I'm talking about cache distribution on your CI pipeline as well as on your developers' machines. Your developers can get the latest build which is available in the cache, and they don't have to build a single thing. It's very powerful, especially if you are onboarding new developers. They can join your team and, on day one, within one hour, they are running your code without building anything, because everything is already built.

As soon as they make changes, they are building only their own code and not everything. If you want to explore Nx Cloud, just go to nx.dev, and you will find a link for Nx Cloud. As of today, we are not using Nx Cloud because it was too expensive for us and not a good fit, but if you have a big organization, it may be worth a look. As I said, Nx Cloud works for everyone. It's not only for frontend or backend: any technology, any framework. This is an example from our live code. We have our design system. When I tried to run it for the first time, it took 48 seconds. The next run took us 0.72 seconds, not even a second. This is a crazy amount of time which we save every time we build something. Our developers are saving a lot of time. They are drinking less coffee.

Release Strategy

The last thing is our release strategy. One thing at Celonis is that we love our weekends. I'm sure everyone loves their weekend, but we really care about it. Our release strategy is actually built around that: we don't want to work on weekends. This is what we do. We have 40-plus apps, so we know that releasing is risky, so we don't do Friday releases, because it's not fun going home and then working on Saturday and Sunday to fix bugs. What we do today is create a new release candidate every Monday morning. Then we ask teams to run their tests. It's a journey: there are teams who have automated tests, and there are teams who don't, who test manually or in whatever way they can, or they just say, ok, it's fine. You should not do that, but, yes, that might be a possibility. They execute their tests, automated or manual.

If everything goes fine, we deploy by Wednesday or Thursday. Wednesday is our deadline: we ask every team to finish their tests by Wednesday, or worst case, Thursday. If something goes wrong, we say, no release this week. Because we are already at Thursday, if we do a release, it means our weekend is destroyed, and we don't like that. We really care about our weekends, so we cancel the release, and we come back on Monday and see if it can go ahead and we can do a deployment. If everything goes green, we just deploy, either Thursday or Friday based on when we release, then go home and monitor it until Monday. Everyone is happy. Then we do this again the next week.

Of course, there are some manual interventions required here. This is where we want to be. Every company has a vision, every person has a vision, and we also have a vision. This is what we want to do: we want to create a release candidate every day. If CI/CD is green, we want to deploy to production. That's it. If something goes wrong, we want to cancel the deployment and do it the next day. Renato accidentally mentioned 40 releases per week; we at least want to do five releases a week. That's our goal. Probably we will be there one day. We are probably very close to that, but it will take us some time.

Questions and Answers

Participant 1: I have a question about the end-to-end tests, which, as I understand, you call user journey tests. How do you debug them in this huge setup with 40 teams? Let's say a test is red. How do I understand the root cause? It can be a problematic red.

Yadav: Playwright actually has a very good way to debug tests. We use Playwright, and it comes with a debug command. You can just pass --debug, and whichever application is giving us an error, you can debug that particular application. You don't have to debug 40 applications. We also have insights: whenever we run tests, we push the success and failure data to Datadog, and we display it in our GitHub summary. So the developer knows which test is failing. They don't have to stare into the void and wonder what's going wrong; they know, this is the application, and this is what I have to debug.
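For example, debugging a single spec with Playwright's inspector looks roughly like this (the spec path is illustrative):

    npx playwright test tests/checkout.spec.ts --debug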

Participant 2: I was wondering if you also integrate backend systems into this monorepo, or if it was a conscious decision not to do so.

Yadav: Nx does support it. As I said, you can actually bring in your backend, like .NET Core. I think it supports Spring, as well as Maven, and now they have added support for Gradle as well. You can bring in whatever framework or technology you want. We are not using it because I don't think that's a good use case for us. I think more teams will be happy with the current setup, where they own the backend, and the frontend is owned by a single team.

Participant 3: How do you handle major framework updates or, for example, design system updates? In the diagram, you showed that you try to do a release every day. I can imagine that with many breaking changes, this is not how it can work; you need more time to test and make sure it's still working.

Yadav: We actually recommend that every developer writes their own tests; it's not another team writing the tests. That's one thing. About the upgrades, this is what we do. We have the ability to push a specific build. For example, the Angular 14 upgrade was a really big upgrade for us, because it was the first one we were doing after Angular 13, and there were some breaking changes. We realized very early that there were breaking changes, and we wanted to play it safe. What we did, with a feature flag, is start loading the Angular 14 build only for some customers and see how it went. We rolled it out initially for our internal customers, our own users.

Then we ran it for a week and saw, ok, everything is fine, everything is good. Then we rolled it out to 20% of the users and monitored it again for a week. Then 50%, and now we will go to 100%. This is very safe; we don't get any unexpected issues. With the design system, we do it weekly. The design system is owned by another team, so they make all the changes. They also do it on a Monday, so they get enough time, four or five days, to test their changes and make them stable before the next release goes out.

Participant 4: You explained the weekly release. How do you handle hotfixes with so many teams?

Yadav: Of course, there will be hotfixes; we cannot avoid them. There will be some code which goes out by mistake in a release. We try to catch any issues on the release candidate before it goes to production. In case there is anything which needs to be hotfixed, teams generally create a PR against the last release. Then we create a new hotfix. It's all automated: you just need to create a new release candidate from the last build we had and push a new build again. The good thing is, with this setup, we don't have to roll back the entire release.



Cloudflare Launches Media Transformations: Optimizing Short-Form Video

MMS Founder
MMS Steef-Jan Wiggers

Article originally posted on InfoQ. Visit InfoQ

To streamline video optimization for the explosion of short-form content, Cloudflare has launched Media Transformations, a new service that extends its Image Transformations capabilities to short-form video files, regardless of their storage location, eliminating the need for complex video pipelines.

With the service, the company aims to simplify video optimization for users with large volumes of short video content, such as AI-generated videos, e-commerce product videos, and social media clips.

Traditionally, Cloudflare Stream offered a managed video pipeline, but Media Transformations addresses the challenge of migrating existing video files. By allowing users to optimize videos directly from their existing storage, like Cloudflare R2 or S3, Cloudflare aims to reduce friction and streamline workflows.

(Source: Cloudflare blog post)

Media Transformations enables users to apply various optimizations through URL-based parameters. Because the transformations are driven by URL parameters, they lend themselves to automation and integration, allowing dynamic video adjustments without complex code changes, simplifying workflows and ensuring optimized video delivery across platforms and devices.

The key features of the service include:

  • Format Conversion: Outputting videos as optimized MP4 files.
  • Frame Extraction: Generating still images from video frames.
  • Video Clipping: Trimming videos with specified start times and durations.
  • Resizing and Cropping: Adjusting video dimensions with “fit,” “height,” and “width” parameters.
  • Audio Removal: Stripping audio from video outputs.
  • Spritesheet Generation: Creating images with multiple frames.

The service is accessible to any website already using Image Transformations, and new zones can be enabled through the Cloudflare dashboard. The URL structure for Media Transformations mirrors Image Transformations, using the /cdn-cgi/media/ endpoint.
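As an illustration of that pattern, a transformation URL might look roughly like the following (the host, parameters, and source path are hypothetical and only meant to show the shape of the /cdn-cgi/media/ endpoint):

    https://example.com/cdn-cgi/media/width=640,height=360,fit=contain/https://example.com/videos/clip.mp4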

Initial limitations include a 40MB file size cap and support for MP4 files with h.264 encoding.  Users like Philipp Tsipman, founder of CamcorderAI, quickly pointed out the initial limitations, tweeting:

I really wish the media transforms were much more generous. The example you gave would actually fail right now because QuickTime records .mov files. And they are BIG!

Cloudflare plans to adjust input limits based on user feedback and introduce origin caching (Cloudflare stores frequently accessed original videos closer to its servers, reducing the need to fetch them repeatedly from the source).

Internally, Media Transformations leverages the same On-the-Fly Encoder (OTFE) platform Stream Live uses, ensuring efficient video processing. Cloudflare aims to unify Images and Media Transformations to simplify the developer experience further.

In addition to the Cloudflare offering, alternatives are available for video optimization, such as Cloudinary, ImageKit, and Gumlet, which have comprehensive features for format conversion, resizing, and compression. Other cloud providers, such as Google Cloud Platform, offer various cloud services, including video processing and delivery solutions. While not solely focused on video transformation, these provide the building blocks for creating custom solutions.

Lastly, Cloudflare highlights use cases such as optimizing product videos for e-commerce, creating social media snippets, and generating thumbnails. The service is currently in beta and free for all users until Q3 2025, after which it will adopt a pricing model similar to Image Transformations.



MongoDB Inc.: Will Its Diversification & Expansion of Atlas Platform Help Achieve the Set Goal?

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news



Article: Beyond Trends: A Practical Guide to Choosing the Right Message Broker

MMS Founder
MMS Nehme Bilal

Article originally posted on InfoQ. Visit InfoQ

Key Takeaways

  • Message brokers can be broadly categorized as either stream-based or queue-based, each offering unique strengths and trade-offs.
  • Messages in a stream are managed using offsets, allowing consumers to efficiently commit large batches in a single network call and replay messages by rewinding the offset. In contrast, queues have limited batching support and typically do not allow message replay, as messages are removed once consumed.
  • Streams rely on rigid physical partitions for scaling, which creates challenges in handling poison pills and limits their ability to dynamically auto-scale consumers with fluctuating traffic. Queues, such as Amazon SQS and FIFO SQS, use low-cardinality logical partitions (that are ordered), enabling seamless auto-scaling and effective isolation of poison pills.
  • Streams are ideal for data replication scenarios because they enable efficient batching and are generally less susceptible to poison pills.
  • When batch replication is not required, queues like Amazon SQS or FIFO SQS are often the better choice, as they support auto-scaling, isolate poison pills, and provide FIFO ordering when needed.
  • Combining streams and queues allows organizations to standardize on a single stream solution for producing messages while giving consumers the flexibility to either consume directly from the stream or route messages to a queue based on the messaging pattern.

Messaging solutions play a vital role in modern distributed systems. They enable reliable communication, support asynchronous processing, and provide loose coupling between components. Additionally, they improve application availability and help protect systems from traffic spikes. The available options range from stream-based to queue-based services, each offering unique strengths and trade-offs.

In my experience working with various engineering teams, selecting a message broker is generally not approached with a clear methodology. Decisions are often influenced by trends, personal preference, or the ease of access to a particular technology, rather than the specific needs of an application. However, selecting the right broker should focus on aligning its key characteristics with the application's requirements; this is the central focus of this article.

We will examine two of the most popular messaging solutions: Apache Kafka (stream-based) and Amazon SQS (queue-based), which are also the main message brokers we use at EarnIn. By discussing how their characteristics align (or don’t) with common messaging patterns, this article aims to provide insights that will help you make more informed decisions. With this understanding, you’ll be better equipped to evaluate other messaging scenarios and brokers, ultimately choosing the one that best suits your application’s needs.

Message Brokers

In this section, we will examine popular message brokers and compare their key characteristics. By understanding these differences, we can evaluate which brokers are best suited for common messaging patterns in modern applications. While this article does not provide an in-depth description of each broker, readers unfamiliar with these technologies are encouraged to refer to their official documentation for more detailed information.

Amazon SQS (Simple Queue Service)

Amazon SQS is a fully managed message queue service that simplifies communication between decoupled components in distributed systems. It ensures reliable message delivery while abstracting complexities such as infrastructure management, scalability, and error handling. Below are some of the key properties of Amazon SQS.

Message Lifecycle Management: In SQS, the message lifecycle is managed either individually or in small batches of up to 10 messages. Each message can be received, processed, deleted, or even delayed based on the application’s needs. Typically, an application receives a message, processes it, and then deletes it from the queue, which ensures that messages are reliably processed.
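A minimal sketch of that receive-process-delete loop, using the AWS SDK for JavaScript v3 (the queue URL and the processMessage function are placeholders):

    import { SQSClient, ReceiveMessageCommand, DeleteMessageCommand } from '@aws-sdk/client-sqs';

    const client = new SQSClient({ region: 'us-east-1' });
    const queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/orders'; // placeholder

    async function poll(processMessage: (body: string) => Promise<void>) {
      // Receive up to 10 messages, using long polling to reduce empty responses
      const { Messages = [] } = await client.send(new ReceiveMessageCommand({
        QueueUrl: queueUrl,
        MaxNumberOfMessages: 10,
        WaitTimeSeconds: 20,
      }));

      for (const message of Messages) {
        await processMessage(message.Body ?? '');
        // Delete only after successful processing, so failures are redelivered
        await client.send(new DeleteMessageCommand({
          QueueUrl: queueUrl,
          ReceiptHandle: message.ReceiptHandle,
        }));
      }
    }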

Best-effort Ordering: Standard SQS queues deliver messages in the order they were sent but do not guarantee strict ordering, particularly during retries or parallel consumption. This allows for higher throughput when strict message order isn’t necessary. For use cases that require strict ordering, FIFO SQS (First-In-First-Out) can be used to ensure that messages are processed in a certain order (more on FIFO SQS below).

Built-in Dead Letter Queue (DLQ): SQS includes built-in support for Dead Letter Queues (DLQs), which help isolate unprocessable messages.

Write and Read Throughput: SQS supports effectively unlimited read and write throughput, which makes it well-suited for high-volume applications where the ability to handle large message traffic efficiently is essential.

Autoscaling Consumers: SQS supports auto-scaling compute resources (such as AWS Lambda, EC2, or ECS services) based on the number of messages in the queue (see official documentation). Consumers can dynamically scale to handle increased traffic and scale back down when the load decreases. This auto-scaling capability ensures that applications can process varying workloads without manual intervention, which is invaluable for managing unpredictable traffic patterns.

Pub-Sub Support: SQS does not natively support pub-sub, as it is designed for point-to-point messaging where each message is consumed by a single receiver. However, you can achieve a pub-sub architecture by integrating SQS with Amazon Simple Notification Service (SNS). SNS allows messages to be published to a topic, which can then fan out to multiple SQS queues subscribed to that topic. This enables multiple consumers to receive and process the same message independently, effectively implementing a pub-sub system using AWS services.

Amazon FIFO SQS

FIFO SQS extends the capabilities of Standard SQS by guaranteeing strict message ordering within logical partitions called message groups. It is ideal for workflows that require the sequential processing of related events, such as user-specific notifications, financial transactions, or any scenario where maintaining the exact order of messages is crucial. Below are some of the key properties of FIFO SQS.

Message Grouping as Logical Partitions: In FIFO SQS, each message has a MessageGroupId, which is used to define logical partitions within the queue. A message group allows messages that share the same MessageGroupId to be processed sequentially. This ensures that the order of messages within a particular group is strictly maintained, while messages belonging to different message groups can be processed in parallel by different consumers. For example, imagine a scenario where each user’s messages need to be processed in order (e.g., a sequence of notifications or actions triggered by a user).

By assigning each user a unique MessageGroupId, SQS ensures that all messages related to a specific user are processed sequentially, regardless of when the messages are added to the queue. Messages from other users (with different MessageGroupIds) can be processed in parallel, maintaining efficient throughput without affecting the order for any individual user. This is a major benefit for FIFO SQS in comparison to standard SQS or stream based message brokers such as Apache Kafka and Amazon Kinesis.
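For example, publishing user-scoped events to a FIFO queue with the user ID as the MessageGroupId might look like this (the queue URL and payload are illustrative; a real producer would typically rely on content-based deduplication or a stable deduplication ID):

    import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

    const client = new SQSClient({ region: 'us-east-1' });

    async function publishUserEvent(userId: string, event: object) {
      await client.send(new SendMessageCommand({
        QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/user-events.fifo', // placeholder
        MessageBody: JSON.stringify(event),
        MessageGroupId: userId, // ordering is preserved per user
        MessageDeduplicationId: `${userId}-${Date.now()}`, // or enable content-based deduplication
      }));
    }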

Dead Letter Queue (DLQ): FIFO SQS provides built-in support for Dead Letter Queues (DLQs), but their use requires careful consideration as they can disrupt the strict ordering of messages within a message group. For example, if two messages – message1 and message2 – belong to the same MessageGroupId (e.g., groupA), and message1 fails and is moved to the DLQ, message2 could still be successfully processed. This breaks the intended message order within the group, defeating the primary purpose of FIFO processing.

Poison Pills Isolation: When a DLQ is not used, FIFO SQS will continue retrying the delivery of a failed message indefinitely. While this ensures strict message ordering, it can also create a bottleneck, blocking the processing of all subsequent messages within the same message group until the failed message is successfully processed or deleted.

Messages that repeatedly fail to process are known as poison pills. In some messaging systems, poison pills can block an entire queue or shard, preventing any subsequent messages from being processed. However, in FIFO SQS, the impact is limited to the specific message group (logical partition) the message belongs to. This isolation significantly mitigates broader failures, provided message groups are thoughtfully designed.

To minimize disruption, it’s crucial to choose the MessageGroupId in a way that keeps logical partitions small while ensuring that ordered messages remain within the same partition. For example, in a multi-user application, using a user ID as the MessageGroupId ensures that failures only affect that specific user’s messages. Similarly, in an e-commerce application, using an order ID as the MessageGroupId ensures that a failed order message does not impact orders from other customers.

To illustrate the impact of this isolation, consider a poison pill scenario:

  • Without isolation (or shard-level isolation), a poison pill could block all orders in an entire region (e.g., all Amazon.com orders in a country).
  • With FIFO SQS isolation, only a single user’s order would be affected, while others continue processing as expected.

Thus, poison pill isolation is a highly impactful feature of FIFO SQS, significantly improving fault tolerance in distributed messaging systems.

Throughput: FIFO SQS has a default throughput limit of 300 messages per second. However, by enabling high-throughput mode, this can be increased to 9,000 messages per second. Achieving this high throughput requires careful design of message groups to ensure sufficient parallelism.

Autoscaling Consumers: Similar to Standard SQS, FIFO SQS supports auto-scaling compute resources based on the number of messages in the queue. While FIFO SQS scalability is not truly unlimited, it is influenced by the number of message groups (logical partitions), which can be designed to be very high (e.g. a message group per user).

Pub-Sub Support: Just like with Standard SQS, pub-sub can be achieved by pairing FIFO SQS with SNS, which offers support for FIFO topics.

Apache Kafka

Apache Kafka is an open-source, distributed streaming platform designed for real-time event streaming and high-throughput applications. Unlike traditional message queues like SQS, Kafka operates as a stream-based platform where messages are consumed based on offsets. In Kafka, consumers track their progress by moving their offset forward (or backward for replay), allowing multiple messages to be committed at once. This offset-based approach is a key distinction between Kafka and traditional message queues, where each message is processed and acknowledged independently. Below are some of Kafka’s key properties.

Physical Partitions (shards): Kafka topics are divided into physical partitions (also known as shards) at the time of topic creation. Each partition maintains its own offset and manages message ordering independently. While partitions can be added, this may disrupt ordering and requires careful handling. On the other hand, reducing partitions is even more complex and generally avoided, as it affects data distribution and consumer load balancing. Because partitioning affects scalability and performance, it should be carefully planned from the start.

Pub-Sub Support: Kafka supports a publish-subscribe model natively. This allows multiple consumer groups to independently process the same topic, enabling different applications or services to consume the same data without interfering with each other. Each consumer group gets its own view of the topic, allowing for flexible scaling of both producers and consumers.

High Throughput and Batch Processing: Kafka is optimized for high-throughput use cases, enabling the efficient processing of large volumes of data. Consumers can process large batches of messages, minimizing the number of reads and writes to Kafka. For instance, a consumer can process up to 10,000 messages, save them to a database in a single operation, and then commit the offset in one step, significantly reducing overhead. This is a key differentiator of streams from queues where messages are managed individually or in small batches.
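As a sketch of this batch-oriented consumption using the kafkajs client (broker addresses, topic, and the saveBatchToDatabase function are placeholders):

    import { Kafka } from 'kafkajs';

    const kafka = new Kafka({ clientId: 'replicator', brokers: ['localhost:9092'] }); // placeholders
    const consumer = kafka.consumer({ groupId: 'replication-group' });

    async function run(saveBatchToDatabase: (records: string[]) => Promise<void>) {
      await consumer.connect();
      await consumer.subscribe({ topic: 'orders', fromBeginning: false });

      await consumer.run({
        eachBatch: async ({ batch, resolveOffset, heartbeat }) => {
          // Persist the whole batch in one database operation ...
          await saveBatchToDatabase(batch.messages.map((m) => m.value?.toString() ?? ''));
          // ... then mark the messages as processed; the resolved offsets are
          // committed together rather than with one network call per message
          for (const message of batch.messages) {
            resolveOffset(message.offset);
          }
          await heartbeat();
        },
      });
    }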

Replay Capability: Kafka retains messages for a configurable retention period (default is 7 days), allowing consumers to rewind and replay messages. This is particularly useful for debugging, reprocessing historical data, or recovering from application errors. Consumers can process data at their own pace and retry messages if necessary, making Kafka an excellent choice for use cases that require durability and fault tolerance.

Handling Poison Pills: In Kafka, poison pills can block the entire physical partition they reside in, delaying the processing of all subsequent messages within that partition. This can have serious consequences on an application. For example, in an e-commerce application where each region’s orders are processed through a dedicated Kafka shard, a single poison pill could block all orders for that region, leading to significant business disruptions. This limitation highlights a key drawback of strict physical partitioning compared to logical partitioning available in queues such as FIFO SQS, where failures are isolated within smaller message groups rather than affecting an entire shard.

If strict ordering is not required, using a Dead Letter Queue can help mitigate the impact by isolating poison pills, preventing them from blocking further message processing.

Autoscaling Limitations: Kafka’s scaling is constrained by its partition model, where each shard (partition) maintains strict ordering and can be processed by only one compute node at a time. This means that adding more compute nodes than the number of partitions does not improve throughput, as the extra nodes will remain idle. As a result, Kafka does not pair well with auto-scaling consumers, since the number of active consumers is effectively limited by the number of partitions. This makes Kafka less flexible in dynamic scaling scenarios compared to messaging systems like FIFO SQS, where logical partitioning allows for more granular consumer scaling.

Comparison of Messaging Brokers

Feature | Standard SQS | FIFO SQS | Apache Kafka
Message Retention | Up to 14 days | Up to 14 days | Configurable (default: 7 days)
Pub-Sub Support | via SNS | via SNS | Native via consumer groups
Message Ordering | Best-effort ordering | Guaranteed within a message group | Guaranteed within a physical partition (shard)
Batch Processing | Supports batches of up to 10 messages | Supports batches of up to 10 messages | Efficient large-batch commits
Write Throughput | Effectively unlimited | 300 messages/second per message group | Scalable via physical partitions (millions of messages/second achievable)
Read Throughput | Unlimited | 300 messages/second per message group | Scalable via physical partitions (millions of messages/second achievable)
DLQ Support | Built-in | Built-in but can disrupt ordering | Supported via connectors but can disrupt ordering of a physical partition
Poison Pill Isolation | Isolated to individual messages | Isolated to message groups | Can block an entire physical partition
Replay Capability | Not supported | Not supported | Supported with offset rewinding
Autoscaling Consumers | Unlimited | Limited by the number of message groups (i.e. nearly unlimited in practice) | Limited by the number of physical partitions (shards)

Messaging Patterns and Their Influence on Broker Selection

In distributed systems, messaging patterns define how services communicate and process information. Each pattern comes with unique requirements, such as ordering, scalability, error handling, or parallelism, which guide the selection of an appropriate message broker. This discussion focuses on three common messaging patterns: Command Pattern, Event-Carried State Transfer (ECST), and Event Notification Pattern, and examines how their characteristics align with the capabilities of popular brokers like Amazon SQS and Apache Kafka. This framework can also be applied to evaluate other messaging patterns and determine the best-fit message broker for specific use cases.

The Command Pattern

The Command Pattern is a design approach where requests or actions are encapsulated as standalone command objects. These commands are sent to a message broker for asynchronous processing, allowing the sender to continue operating without waiting for a response.

This pattern enhances reliability, as commands can be persisted and retried upon failure. It also improves the availability of the producer, enabling it to operate even when consumers are unavailable. Additionally, it helps protect consumers from traffic spikes, as they can process commands at their own pace.

Since command processing often involves complex business logic, database operations, and API calls, successful implementation requires reliability, parallel processing, auto-scaling, and effective handling of poison pills.

Key Characteristics

Multiple Sources, Single Destination: A command can be produced by one or more services but is typically consumed by a single service. Each command is usually processed only once, with multiple consumer nodes competing for commands. As a result, pub/sub support is unnecessary for commands.

High Throughput: Commands may be generated at a high rate by multiple producers, requiring the selected message broker to support high throughput with low latency. This ensures that producing commands does not become a bottleneck for upstream services.

Autoscaling Consumers: On the consumer side, command processing often involves time-consuming tasks such as database writes and external API calls. To prevent contention, parallel processing of commands is essential. The selected message broker should enable consumers to retrieve commands in parallel and process them independently, without being constrained by a small number of parallel workstreams (such as physical partitions). This allows for horizontal scaling to handle fluctuations in command throughput, adding consumers to meet peak demand and scaling back during low-activity periods to optimize resource usage.

Risk of Poison Pills: Command processing often involves complex workflows and network calls, increasing the likelihood of failures that can result in poison pills. To mitigate this, the message broker must support high cardinality poison pill isolation, ensuring that failed messages affect only a small subset of commands rather than disrupting the entire system. By isolating poison pills within distinct message groups or partitions, the system can maintain reliability and continue processing unaffected commands efficiently.

Broker Alignment

Given the requirements for parallel consumption, autoscaling, and poison pill isolation, Kafka is not well-suited for processing commands. As previously discussed, Kafka’s rigid number of physical partitions cannot be scaled dynamically. Furthermore, a poison pill can block an entire physical partition, potentially disrupting a large number of the application’s users.

If ordering is not a requirement, standard SQS is an excellent choice for consuming and processing commands. It supports parallel consumption with unlimited throughput, dynamic scaling, and the ability to isolate poison pills using a Dead Letter Queue (DLQ).

For scenarios where ordering is required and can be distributed across multiple logical partitions, FIFO SQS is the ideal solution. By strategically selecting the message group ID to create numerous small logical partitions, the system can achieve near-unlimited parallelism and throughput. Moreover, any poison pill will only affect a single logical partition (e.g., one user of the application), ensuring that its impact is isolated and minimal.
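
As a sketch of that approach, the snippet below uses boto3 to publish commands to a FIFO queue with one message group per user; the queue URL and payload are illustrative, not taken from the article.

```python
# Minimal sketch: per-user message groups on FIFO SQS give ordering per user,
# parallelism across users, and poison-pill isolation to a single group.
import json
import uuid
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/commands.fifo"  # placeholder

def send_command(user_id: str, command: dict) -> None:
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(command),
        MessageGroupId=user_id,                    # one small logical partition per user
        MessageDeduplicationId=str(uuid.uuid4()),  # or enable content-based deduplication
    )

send_command("user-42", {"type": "ChargeCard", "amount_cents": 1999})
```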

Event-carried State Transfer (ECST)

The Event-Carried State Transfer (ECST) pattern is a design approach used in distributed systems to enable data replication and decentralized processing. In this pattern, events act as the primary mechanism for transferring state changes between services or systems. Each event includes all the necessary information (state) required for other components to update their local state without relying on synchronous calls to the originating service.

By decoupling services and reducing the need for real-time communication, ECST enhances system resilience, allowing components to operate independently even when parts of the system are temporarily unavailable. Additionally, ECST alleviates the load on the source system by replicating data to where it is needed. Services can rely on their local state copies rather than making repeated API calls to the source. This pattern is particularly useful in event-driven architectures and scenarios where eventual consistency is acceptable.

Key Characteristics

Single Source, Multiple Destinations: In ECST, events are published by the owner of the state and consumed by multiple domains or services interested in replicating the state. This requires a message broker that supports the publish-subscribe (pub-sub) pattern.

Low Likelihood of Poison Pills: Since ECST involves minimal business logic and typically avoids API calls to other services, the risk of poison pills is negligible. As a result, the use of a Dead Letter Queue (DLQ) is generally unnecessary in this pattern.

Batch Processing: As a data-replication pattern, ECST benefits significantly from batch processing. Replicating data in large batches improves performance and reduces costs, especially when the target database supports bulk inserts in a single operation. A message broker that supports efficient large-batch commits, combined with a database optimized for batching, can dramatically enhance application performance.

Strict Ordering: Strict message ordering is often essential in ECST to ensure that the state of a domain entity is replicated in the correct sequence. This prevents older versions of an entity from overwriting newer ones. Ordering is particularly critical when events carry deltas (e.g., “set property X”), as out-of-order events cannot simply be discarded. A message broker that supports strict ordering can greatly simplify event consumption and ensure data integrity.
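
To illustrate with a made-up payload (not one from the article), an ECST event carries the full state plus a version, which lets a consumer upsert its local copy and discard stale full-state events; delta events cannot be discarded this way, which is why broker-level ordering matters.

```python
# Illustrative ECST event: full entity state plus a monotonically increasing version.
event = {
    "type": "CustomerUpdated",
    "entity_id": "cust-123",
    "version": 17,
    "state": {"name": "Ada Lovelace", "email": "ada@example.com", "tier": "gold"},
}

def apply_event(local, event):
    # Full-state events allow stale updates to be discarded safely.
    if local is None or event["version"] > local["version"]:
        return {"version": event["version"], **event["state"]}
    return local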

Broker Alignment

Given the requirements for pub-sub, strict ordering, and batch processing, along with the low likelihood of poison pills, Apache Kafka is a great fit for the ECST pattern.

Kafka allows consumers to process large batches of messages and commit offsets in a single operation. For example, 10,000 events can be processed, written to the database in a single batch (assuming the database supports it), and committed with one network call, making Kafka significantly more efficient than Amazon SQS in such scenarios. Furthermore, the minimal risk of poison pills eliminates the need for DLQs, simplifying error handling. In addition to its batching capabilities, Kafka’s partitioning mechanism enables increased throughput by distributing events across multiple shards.

However, if the target database does not support batching, writing data to the database may become the bottleneck, rendering Kafka’s batch-commit advantage less relevant. For such scenarios, funneling messages from Kafka into FIFO SQS or using FIFO SNS/SQS without Kafka can be more effective. As discussed earlier, FIFO SQS allows for fine-grained logical partitions, enabling parallel processing while maintaining message order. This design supports dynamic scaling by increasing the number of consumer nodes to handle traffic spikes, ensuring efficient processing even under heavy workloads.

Event Notification Pattern

The Event Notification Pattern enables services to notify other services of significant events occurring within a system. Notifications are lightweight and typically include just enough information (e.g., an identifier) to describe the event. To process a notification, consumers often need to fetch additional details from the source (and/or other services) by making API calls. Furthermore, consumers may need to make database updates, create commands or publish notifications for other systems to consume. This pattern promotes loose coupling and real-time responsiveness in distributed architectures. However, given the potential complexity of processing notifications (e.g. API calls, database updates and publishing events), scalability and robust error handling are essential considerations.
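
For contrast with ECST, here is a hedged sketch of notification handling; the payload, URL, and enrichment call are hypothetical, but they illustrate how a lightweight notification typically triggers a call back to the source for details.

```python
# Illustrative notification handling: the event carries only an identifier and the
# consumer fetches the details it needs before acting. URL and payload are made up.
import json
import urllib.request

notification = {"type": "OrderPlaced", "order_id": "ord-789"}  # lightweight payload

def handle(notification):
    url = f"https://orders.internal/api/orders/{notification['order_id']}"  # hypothetical API
    with urllib.request.urlopen(url) as resp:
        order = json.load(resp)
    # ...update a database, create a command, or publish a follow-up event...
    return order
```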

Key Characteristics

The characteristics of the Event Notification Pattern overlap significantly with those of the Command Pattern, especially when processing notifications involves complex and time-consuming tasks. In these scenarios, implementing this pattern requires support for parallel consumption, autoscaling consumers, and isolation of poison pills to ensure reliable and efficient processing. Moreover, the Event Notification Pattern also requires pub-sub support to facilitate one-to-many distribution of events.

There are cases when processing notifications involves simpler workflows, such as updating a database or publishing events to downstream systems. In such cases, the characteristics of this pattern align more closely with those of the ECST pattern.

It should also be noted that different consumers of the same notification may process notifications differently. It’s possible that one consumer needs to apply complex processing while another is performing very simple tasks that are unlikely to ever fail.

Broker Alignment

When the characteristics of the notifications consumer align with those of consuming commands, SQS (or FIFO SQS) is the obvious choice. However, if a consumer only needs to perform simple database updates, consuming notifications from Kafka may be more efficient because of the ability to process notifications in batches and Kafka’s ability to perform large batch commits.

The challenge with notifications is that it’s not always possible to predict consumption patterns in advance, which makes it difficult to choose between SNS and Kafka when producing notifications.

To gain more flexibility, at EarnIn we decided to use Kafka as the sole broker for publishing notifications. If a consumer requires SQS properties for consumption, it can funnel messages from Kafka to SQS using AWS EventBridge. If a consumer doesn’t require SQS properties, it can consume directly from Kafka and benefit from its efficient batching capabilities. Moreover, using Kafka instead of SNS for publishing notifications also gives consumers the ability to leverage Kafka’s replay capability, even when messages are funneled to SQS for consumption.

Furthermore, given that Kafka is also a good fit for the ECST pattern and that the Command Pattern doesn’t require pub-sub, we had no reason left to use SNS. This allowed us to standardize on Kafka as the sole pub-sub broker, which significantly simplifies our workflows. In fact, with all events flowing through Kafka, we were able to build tooling that replicates Kafka events to a data lake, which can be leveraged for debugging, analytics, replay/backfilling, and more.

Conclusion

Selecting the right message broker for your application requires understanding the characteristics of the available options and the messaging pattern you are using. Key factors to consider include traffic patterns, auto-scaling capabilities, tolerance to poison pills, batch processing needs, and ordering requirements.

While this article focused on Amazon SQS and Apache Kafka, the broader decision often comes down to choosing between a queue and a stream. However, it is also possible to leverage the strengths of both by combining them.

Standardizing on a single broker for producing events allows your company to focus on building tooling, replication, and observability for one system, reducing maintenance costs. Consumers can then route messages to the appropriate broker for consumption using services like EventBridge, ensuring flexibility while maintaining operational efficiency.

About the Author



8,861 Shares in MongoDB, Inc. (NASDAQ:MDB) Acquired by Connor Clark & Lunn …

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Connor Clark & Lunn Investment Management Ltd. purchased a new position in shares of MongoDB, Inc. (NASDAQ:MDB) in the 4th quarter, according to the company in its most recent disclosure with the Securities and Exchange Commission (SEC). The institutional investor purchased 8,861 shares of the company’s stock, valued at approximately $2,063,000.

Other large investors also recently bought and sold shares of the company. Hilltop National Bank grew its stake in shares of MongoDB by 47.2% during the 4th quarter. Hilltop National Bank now owns 131 shares of the company’s stock valued at $30,000 after buying an additional 42 shares during the period. Avestar Capital LLC lifted its stake in shares of MongoDB by 2.0% in the 4th quarter. Avestar Capital LLC now owns 2,165 shares of the company’s stock valued at $504,000 after purchasing an additional 42 shares in the last quarter. Aigen Investment Management LP grew its holdings in shares of MongoDB by 1.4% during the 4th quarter. Aigen Investment Management LP now owns 3,921 shares of the company’s stock worth $913,000 after purchasing an additional 55 shares during the period. Perigon Wealth Management LLC increased its position in MongoDB by 2.7% during the 4th quarter. Perigon Wealth Management LLC now owns 2,528 shares of the company’s stock worth $627,000 after purchasing an additional 66 shares in the last quarter. Finally, MetLife Investment Management LLC lifted its position in MongoDB by 1.6% during the third quarter. MetLife Investment Management LLC now owns 4,450 shares of the company’s stock valued at $1,203,000 after buying an additional 72 shares in the last quarter. Hedge funds and other institutional investors own 89.29% of the company’s stock.

Analyst Upgrades and Downgrades

Several research firms have recently issued reports on MDB. Wedbush dropped their price objective on shares of MongoDB from $360.00 to $300.00 and set an “outperform” rating for the company in a research report on Thursday, March 6th. KeyCorp downgraded MongoDB from a “strong-buy” rating to a “hold” rating in a report on Wednesday, March 5th. Rosenblatt Securities restated a “buy” rating and set a $350.00 price objective on shares of MongoDB in a research note on Tuesday, March 4th. Stifel Nicolaus lowered their target price on MongoDB from $425.00 to $340.00 and set a “buy” rating for the company in a research note on Thursday, March 6th. Finally, Oppenheimer reduced their price target on MongoDB from $400.00 to $330.00 and set an “outperform” rating on the stock in a research report on Thursday, March 6th. One analyst has rated the stock with a sell rating, seven have given a hold rating and twenty-three have assigned a buy rating to the company. Based on data from MarketBeat, the company has a consensus rating of “Moderate Buy” and an average price target of $319.87.


MongoDB Trading Down 2.3%

NASDAQ:MDB opened at $188.68 on Wednesday. The firm has a market capitalization of $14.05 billion, a PE ratio of -68.86 and a beta of 1.30. MongoDB, Inc. has a 52 week low of $173.13 and a 52 week high of $387.19. The stock has a 50 day moving average of $254.38 and a 200-day moving average of $271.46.

MongoDB (NASDAQ:MDB) last issued its quarterly earnings data on Wednesday, March 5th. The company reported $0.19 EPS for the quarter, missing analysts’ consensus estimates of $0.64 by ($0.45). The company had revenue of $548.40 million for the quarter, compared to the consensus estimate of $519.65 million. MongoDB had a negative return on equity of 12.22% and a negative net margin of 10.46%. During the same quarter in the previous year, the company earned $0.86 earnings per share. As a group, analysts predict that MongoDB, Inc. will post -1.78 EPS for the current fiscal year.

Insider Buying and Selling

In related news, CEO Dev Ittycheria sold 8,335 shares of the stock in a transaction that occurred on Friday, January 17th. The stock was sold at an average price of $254.86, for a total value of $2,124,258.10. Following the completion of the transaction, the chief executive officer now directly owns 217,294 shares of the company’s stock, valued at $55,379,548.84. This represents a 3.69% decrease in their position. The sale was disclosed in a filing with the SEC, which is available through the SEC website. Also, Director Dwight A. Merriman sold 3,000 shares of MongoDB stock in a transaction on Thursday, January 2nd. The shares were sold at an average price of $237.73, for a total value of $713,190.00. Following the sale, the director now directly owns 1,117,006 shares in the company, valued at $265,545,836.38. The trade was a 0.27% decrease in their position. The disclosure for this sale can be found here. Insiders have sold a total of 43,139 shares of company stock valued at $11,328,869 in the last quarter. Company insiders own 3.60% of the company’s stock.

MongoDB Profile


MongoDB, Inc., together with its subsidiaries, provides a general-purpose database platform worldwide. The company provides MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.

Further Reading

Want to see what other hedge funds are holding MDB? Visit HoldingsChannel.com to get the latest 13F filings and insider trades for MongoDB, Inc. (NASDAQ:MDB).

Institutional Ownership by Quarter for MongoDB (NASDAQ:MDB)

This instant news alert was generated by narrative science technology and financial data from MarketBeat in order to provide readers with the fastest and most accurate reporting. This story was reviewed by MarketBeat’s editorial team prior to publication. Please send any questions or comments about this story to contact@marketbeat.com.


Article originally posted on mongodb google news. Visit mongodb google news



Prominent Vector Database Market Trend for 2025: Increasing – openPR.com

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts


What market dynamics are playing a key role in accelerating the growth of the vector database market?
Growth of the vector database market in the coming years is expected to be driven by the rising adoption of cloud platforms. Such platforms comprise the operating system and server components in a data center connected to the internet, allowing both hardware and software products to exist and operate independently at scale. High-dimensional data can be stored and retrieved successfully using vector databases on cloud platforms, facilitating real-time analytics, deep learning, and custom content suggestions, thanks to the scalable and flexible infrastructure that cloud platforms provide. For example, Eurostat, an intergovernmental organization based in Europe, reported that in December 2023, 42.5% of EU businesses utilized cloud computing, mainly for e-mail system hosting and electronic file storage, a 4.2% rise from 2021. Consequently, the growing use of cloud platforms is fueling the advancement of the vector database market.

Get Your Vector Database Market Report Here:
https://www.thebusinessresearchcompany.com/report/vector-database-global-market-report

How will the growth rate of the vector database market shape industry trends by 2034?
The scale of the vector database market has expanded considerably in recent years. It is projected to escalate from $2.46 billion in 2024 to approximately $3.04 billion in 2025, a compound annual growth rate (CAGR) of 23.7%. This growth during the historical period can be credited to the influx of geospatial data, the surge in location-based services (LBS), increasing use in smart cities, the onset of 5G networks, and rising demand for real-time spatial analytics.

Predictions suggest a significant expansion of the vector database market in the upcoming years. By 2029, it is projected to reach a market value of $7.13 billion, at a compound annual growth rate of 23.7%. This growth in the forecast period can be traced back to the expansion of digital mapping, telecommunication network planning, GPS and satellite technology, environmental and natural resource management, as well as logistics and transportation planning. Notable trends to watch for in the forecast period include spatial analytics and business intelligence, alliances with cloud services, real-time geospatial applications, smart infrastructure advancement, and cross-industry collaboration.

Get Your Free Sample Now – Explore Exclusive Market Insights:
https://www.thebusinessresearchcompany.com/sample.aspx?id=13754&type=smp

What trends are poised to drive the future success of the vector database market?
Leading companies in the vector database market are increasingly integrating vector databases with technologies such as AI vector similarity search to improve generative AI and enhance developer productivity. AI vector similarity search is a process in which artificial intelligence (AI) algorithms are used to identify and find similar vectors in a dataset. As an illustration, Oracle Corporation, a computer technology company based in the USA, announced in September 2023 that it would introduce semantic search functionality using AI vectors into Oracle Database 23c. The AI Vector Search suite consists of a new vector data type, vector indexes, and vector search SQL operators that permit the Oracle Database to store images, documents, and other types of unstructured information as vectors and use these vectors to perform fast similarity queries. The integration of AI vector similarity search in Oracle Database 23c allows combined semantic and business data search, resulting in highly precise responses delivered quickly and securely. These new features also facilitate the use of Retrieval Augmented Generation (RAG), a generative AI technique that employs large language models (LLMs) and private corporate data to answer natural language questions.

Which primary segments of the vector database market are driving growth and industry transformations?
The vector database market covered in this report is segmented –

1) By Database Type: Relational Vector Databases, NoSQL Vector Databases, NewSQL Vector Databases
2) By Offering: Solutions, Services
3) By Deployment Mode: On-Premises, Cloud-Based
4) By Industry: Financial Services, Healthcare And Life Sciences, Retail And E-Commerce, Manufacturing, Telecommunications, Government And Public Sector, Energy And Utilities, Transportation And Logistics, Media And Entertainment, Other Industries

Subsegments:
1) By Relational Vector Databases: Traditional Relational Databases With Vector Support, Enhanced Query Capabilities
2) By NoSQL Vector Databases: Document-Based NoSQL Vector Databases, Key-Value NoSQL Vector Databases, Column-Family NoSQL Vector Databases
3) By NewSQL Vector Databases: Scalable NewSQL Solutions, Distributed NewSQL Systems

Unlock Exclusive Market Insights – Purchase Your Research Report Now!
https://www.thebusinessresearchcompany.com/purchaseoptions.aspx?id=13754

North America was the largest region in the vector database market in 2024 and is expected to be the fastest-growing region in the forecast period. The regions covered in the vector database market report are Asia-Pacific, Western Europe, Eastern Europe, North America, South America, Middle East, Africa.

Who are the influential players reshaping the vector database market landscape?
Major companies operating in the vector database market report are Google LLC (Alphabet Inc.), Microsoft Corporation, Amazon Web Services Inc., Alibaba Cloud, Elastic N.V., MongoDB Inc., FD Technologies PLC, DataStax Inc., Redis Ltd., Vector AI, GSI Technology Inc., Chroma DB, Vald, SingleStore, OpenSearch, Rockset Inc., PlanetScale, Kinetica DB Inc., Qdrant, ClickHouse Inc., Clarifai Inc., Pinecone Systems Inc., Vespa ai, Marqo AI, Activeloop, Zilliz, Milvus, S2Search Australia Pty Ltd., Weaviate

Customize Your Report – Get Tailored Market Insights!
https://www.thebusinessresearchcompany.com/customise?id=13754&type=smp

What Is Covered In The Vector Database Global Market Report?

•Market Size Forecast: Examine the vector database market size across key regions, countries, product categories, and applications.
•Segmentation Insights: Identify and classify subsegments within the vector database market for a structured understanding.
•Key Players Overview: Analyze major players in the vector database market, including their market value, share, and competitive positioning.
•Growth Trends Exploration: Assess individual growth patterns and future opportunities in the vector database market.
•Segment Contributions: Evaluate how different segments drive overall growth in the vector database market.
•Growth Factors: Highlight key drivers and opportunities influencing the expansion of the vector database market.
•Industry Challenges: Identify potential risks and obstacles affecting the vector database market.
•Competitive Landscape: Review strategic developments in the vector database market, including expansions, agreements, and new product launches.

Learn More About The Business Research Company
With over 15,000 reports from 27 industries covering 60+ geographies, The Business Research Company has built a reputation for offering comprehensive, data-rich research and insights. Armed with 1,500,000 datasets, in-depth secondary research, and unique insights from industry leaders, you can get the information you need to stay ahead.
Our flagship product, the Global Market Model (GMM), is a premier market intelligence platform delivering comprehensive and updated forecasts to support informed decision-making.

Contact Us:
The Business Research Company
Europe: +44 207 1930 708
Asia: +91 88972 63534
Americas: +1 315 623 0293
Email: info@tbrc.info

Follow Us On:
LinkedIn: https://in.linkedin.com/company/the-business-research-company
Twitter: https://twitter.com/tbrc_info
YouTube: https://www.youtube.com/channel/UC24_fI0rV8cR5DxlCpgmyFQ



This release was published on openPR.



Presentation: Building Your First Platform Team in a Fast Growing Startup

MMS Founder
MMS Jessica Andersson

Article originally posted on InfoQ. Visit InfoQ

Transcript

Andersson: Once upon a time, there was a startup. This startup was running code in production, because that’s how we get our product out. They have been running for a couple of years, and they started to identify the need for all the things DevOps. They knew that they wanted to keep developer focus on delivering software value, and they knew that they wanted to do all them things cloud. Recognize this story? I think this is common for a lot of people.

They realized that the solution was to invest into a base platform and platform engineering in order to solve the problem of CI/CD, runtime, and observability. It’s been evolving ever since. I’m here to tell you that you need a platform regardless of the size of your organization. I would even go as far as to say, you already have a platform, you just don’t know it yet. You need to treat your platform as a product. It’s no different than delivering the software product that your company is probably surviving on. Trust is a currency, and you need to treat it as such, because otherwise you will fail. Being a small company, the tradeoffs that you make will be the key to your success.

Background

I’m Jessica. I’m the product area lead for developer experience or engineering enablement at Kognic, the startup that I’m working at. I’m also a CNCF ambassador and a speaker at conferences. Before joining Kognic, I was a platform engineer working at another company and delivering platform as a service to more than 40 teams globally across the world. We were focusing mainly on Kubernetes as a platform and logging as a service. We had other things that we provided as well. I first heard of Kognic from a friend of mine who worked there when he sent me a job ad with a motivation, like, “You do cloud and Kubernetes. How about this one?” Who can resist that? It read, “Wanted, Head of DevOps”. Let’s give it a chance. I like my friend. He’s nice. I start reading the ad and I realize what they describe is something that actually sounds fairly similar to what I’m already doing.

I set up a meeting with our co-founder, Daniel, and we met up in a cafe looking like this, actually. We took a cup of coffee and we talked about it. I told him about all the things that I've been doing, trying to empower and enable different development teams to solve their needs more efficiently. He told me about their situation and how far they had gotten, that they had some software running, but they realized that they need to invest more into making it continue to run smoothly over time, and solve some of the common needs of how to operate it in production. I told him my other hot take that I don't dare to put in text, is that, I think a DevOps team is a glorified operations team, and I don't work with operations. There's nothing wrong with doing that, but I feel that I really enjoy being a platform team more because I can affect a lot of people and I can try to make their lives easier.

Empowering and Enabling Product Teams

As a platform team, making people’s lives easier, someone wrote on Twitter, “Being a platform engineer is the closest that I will probably ever be to become a 10x developer”. This was all the rage when I was looking for this job. We also know that by empowering and enabling our product teams, they can focus on the things that they care about a lot, which is delivering product value. We tried to look at the needs for this head of DevOps, the reason why this ad was out. What Daniel described the need of the company was that they needed to do quick iterations because they didn’t really know exactly what the product would end up being in the long run. This is a startup. You’re trying to figure out what is the perfect fit for my product. They wanted to do quick iterations. They also wanted to have a lot of flexibility in changing direction if they discovered this is not the way we want to go. They wanted to maintain the developer focus on delivering value. They didn’t want all the developers to understand everything about Kubernetes.

I don’t want them to have to do that either because there’s a lot of things to know. They also knew that they wanted to keep a cloud native modern tech stack. We saw on the hype scale that cloud native was right up there with generative AI. We also talked about developers and operations versus DevOps. I’ve done a talk only about this thing previously. I think the main thing is that when you have a situation where you have a developer and operations team as we had at my previous company and we transformed that into DevOps, we have several reasons of doing so. We discovered that having the developers focusing only on pushing out new features and code changes was very disconnected from operating them in production because we started to see a lot of issues such as failures in production. We got long time to get new things pushed out because operations were busy firefighting. They were pushing back when developers wanted to deploy more things.

Operations had a problem getting the feedback back to the developers and prioritize the fixes for solving the things failing in production. It was more or less just throwing things over the wall and hoping it all works out. It did not work out.

At my previous company, we did a large effort in trying to transform into DevOps and have all the product teams work with an end-to-end full application lifecycle workflow. When we talk about end-to-end, like only in the full application lifecycle, we can also talk about how we get there. We get there through having empowered product teams. If you haven’t read Empowered Product Teams by Marty Cagan, it’s still a really good book, and there’s a lot of great ideas in it. You don’t have to read all of it. There’s a lot of blog posts that summarize some of the main points, or talk to someone smart around you that actually read it. That also works for me. Check it out if you haven’t. Marty Cagan describes empowered product teams as being about ordinary people delivering extra-ordinary products. You want to take any product teams, empower them so that they can focus on delivering great products.

Difference between, as I mentioned, the developers pushing features and empowered product teams can be described as, product teams, they are cross-functional. They have all the functionality or all the skillsets that they need in order to deliver their part, their slice of the product. They might have product managers, they might have designers, engineering, whatnot, they need to deliver their slice. They are also measured by outcomes and not output. Output is, I did a thing, yes, me. Outcome is, I made an impact. There’s a difference in that. We want to optimize for making good outcomes rather than a lot of output. They’re also empowered to figure out the best way to solve the problems that they’ve been asked to solve. This is a quote from the blog post that describes exactly this. It says that solving problems in ways our customers love, yet work for our business. It’s not all about only focusing on making customers happy, about doing it in such a way that the business can succeed, because we’re all here to earn money.

Very much related to this is “Team Topologies” by Matthew Skelton and Manuel Pais. I will focus on it because I think this is strongly related and this is something that we looked at on how to structure our teams. Stream-aligned teams have also a slice of the cake. They have like a you build it, you run it thing. They have a segment of the business domain, and they’re responsible for that, end-to-end. Then you have the enabling team. I said, I offer engineering enablement. There’s a reason why it’s here. They work on trying to help and unblock the stream-aligned teams. They are there to figure out the capabilities that we need to make in order to improve the life of the stream-aligned teams.

Then we have the platform team, and they are supposed to build a compelling internal product to accelerate delivery by the stream-aligned teams. We’re here to empower and enable our stream-aligned teams to deliver good outcomes to create business value. As a small company and a small organization, I argue that you probably can’t afford to have both an enabling team and a platform team. In our case, we decided to combine these two. We decided that we should have both engineering enablement and platform engineering within the same function.

Given what I told you about empowered product teams and trying to focus on good outcomes, do you think we should have a DevOps team or a platform team? It’s a given. We’re going for the platform team. That’s why I’m here. Back to the coffee shop, me and Daniel, we talked for longer than we set off time. We’re both big talkers. In the end, we think that this might be something and we decided to give it a go. In June 2020, I joined Kognic with the mission of starting up a platform engineering team and try to solve the problem of empowering the product teams to deliver more value.

By the time we had the platform team hired because there was new hiring going on, the engineering team had grown to 31 people. Four of these were working in the platform team. That means that about 13% of our headcount were dedicated to platform and internal tooling. The reason why I tell you is not because 13 is a magic number, I just thought it could be nice to know, like put it in numbers. Tell us what you really were doing. Being a small company, this is 13% of the capacity, whatever you want to call it, that we had to deliver new value, and we thought that we could actually gain this in more empowerment.

Implicit Platform

We had a team. We could start with a platform. We didn’t start from scratch, because, as I said, we were running code in production. It turns out, if you’re running code in production, you already have an implicit platform. The first thing that I had to do was try to figure out what is already here, what do we have right now. This platform, it exists, but it’s less structured and less intentional. This is what happens when you don’t do an intentional effort in trying to build your platform. There were some really good things in place, but a lot of it had happened in the way of, we need something for this, let’s do it, and then go back to the things that I’m really supposed to be working on. Meaning that we had Kubernetes running, but it had not been upgraded since it was started. Yes, that’s true. We had other things that were working good enough to solve the immediate need, but maybe not good enough to empower the teams fully. We had a lot of things in place.

They were done with the best effort, good enough approach, and we needed to turn this around. I’m going to show you what we had in our implicit platform. This is not me telling you this is what you should run in any way, but I want you to have this in mind. We use Google Cloud Platform as our cloud provider. We use GKE, Kubernetes running on top of it for our runtime. We had CircleCI for our CI. We are writing code in TypeScript and Scala and Python. We had bash scripts for deploying to production. We also had InfluxDB and Grafana for observability. You can see that there’s nothing here about logs because I don’t think we work with logs at this point in time.

A Base Platform

What do I mean when I say platform? Because this is what we were hired to fix. This is where we started out and what we wanted to build. I define this as a base platform. This is, for me, a new term. Focus on solving the basic needs. We’re talking about CI/CD. You want to build, package, and distribute your code. You want to have it run somewhere in what’s equivalent to production for you. You want to have the observability to be able to know in case something goes wrong so that you can operate and maintain your applications. Without these three, it’s really hard to have something continuously deployed to production and keep working in production. I see those as the bare necessities of platform. With this concept in mind, we want to take our implicit platform and turn it into an intentional platform.

Our platform, it changed a tiny bit, not much. You can see it’s basically the same picture that I showed you before. We took Google Cloud and we introduced infrastructure as code to make sure that we have resources created in the same way and that it’s easy to upgrade when we want to make changes to how we utilize them. We improved the separation of concern for cloud resources. There was a lot of reusing the same database instances for staging and production and other things. We also took Kubernetes and we upgraded it and applied security patches, and then continuously maintain it. Kubernetes does several releases per year, so it’s a lot of years to keep up. CircleCI, we were running a monorepo where all the code was, for both TypeScript, Scala, and Python. We broke it apart and we reduced the build times a lot. We went from 40-plus minutes to less than 5 minutes.

We also introduced GitHub Actions in order to have smaller, more efficient jobs because our developers really felt those were easy to integrate with. We didn’t even provide it as a platform. It was just something they started using and then we adopted it. We removed InfluxDB and replaced it with OpenTelemetry and automagic instrumentation of applications. When I say automagic, I really mean automagic. If you haven’t looked at it, it’s as good as they say, and I didn’t believe it until we tried it. Then we removed the bash scripts for deployments and we introduced Argo CD and GitOps for better version controlling, and control and easier upgrades and rollbacks of applications.

Platform as a Product

How did we go about getting to this place? We treated our platform as a product. It’s an important part. I think the first thing you need to do is to understand your market. This is your empowered product teams. What do they need to do? How do they work today? What pain points do they have? You need to understand, what would be the good direction to go with this? You need to iterate and validate your solutions. You can’t just go away for a year and do something and come back and deploy it, and hope everyone is happy. Because you need to always constantly work together with the teams in order to make sure that you have something good. If you’re new to working with a product mindset, I can recommend looking at something that is called a Double Diamond, where you have two different phases. One is you go broad for problem discovery, and then you narrow down on a problem solution.

Then you go broad on a solution discovery and then narrow down on a solution decision, and then you iterate on that. When we look at our platform team, we do similar things as our empowered product teams, meaning that we try to be cross-functional. We try to figure out, what capabilities do we need in our team in order to deliver the things that we need? We are looking at cloud infrastructure. We need that skill. We also need containers and Kubernetes skills because there’s a lot to learn about those.

Observability is a big thing. It’s good to have that skill. Also, the enablement and the teaching. You need to teach your empowered product teams to adopt and work with these services that you provide. You need to be able to show them new ways of working. You need people that can actually both teach and communicate with other people. Communication is actually one of the bullets that we had in job ads for skills we’re looking for. You also need product management skill combined into this team. Obviously, since we’re doing product. If you want to learn more about working with product thinking for platform teams, I can recommend checking out this talk by Samantha Coffman. It was at KubeCon + CloudNativeCon EU in Paris. The recording is up on YouTube. Check it out. She does a really good description of it and she has really concrete examples of what it means to figure out what the real problem is rather than fixing the symptoms.

Finding the Right Problems

Talking about figuring out what the real problems are, remember what we wanted to achieve? We wanted to achieve quick iterations, flexibility to change direction, and maintain a developer focus on delivering product value. Given that, we wanted to figure out what is holding us back from achieving that. Let's start with understanding the market. Autonomous teams and empowered product teams, I have trouble separating those terms. Where does one end, where does another start? Autonomous teams is something that we have talked a lot about at Kognic. One thing that it says is that autonomous teams can deliver end-to-end value with minimum supervision. It's similar to empowered product teams: do all the things that you need to solve the problem, but also with the caveat of minimum supervision. They're in charge of defining the daily tasks and the work processes that they need. It's very similar.

If we think about autonomy and the free choice, we can’t force our product teams to use our platform because then we are removing the autonomy and the freedom to choose. As a platform team, it’s very important that we try to make it enticing and something they want to use. We can have things as our platform as a default setting, but maybe we can’t hinder them from choosing something else. To achieve that, we want to create a paved road for the developers. What does that even mean? What is paved road? We want to empower and enable our product teams or our stream-aligned teams to deliver valuable outcomes. For every decision and every action they need to take that doesn’t build towards that, we can view that as a cost that takes away from the outcomes that they are able to deliver. If we provide a paved road, something that is easy to follow, they don’t have to make a decision of, how do I want to deploy to production? They can follow the already paved road of that.

Then we can solve the basic needs of building, running, and operating applications. We allow our teams to focus on the things that make a difference, and we reduce that cost. This paved road should be effortless. It should not cost them energy to stay on the paved road because then your paved road is not that great. As a platform team, I mentioned Kubernetes has a lot of upgrades. How many of you do migrations continuously, because I feel like we are always in a migration from one thing to another? If we as a platform team make those migrations feel cumbersome or heavy to do, people would try to like, “Maybe I don’t have to migrate. Maybe I can do something else. Maybe I don’t have to follow this thing that platform is now forcing me to do again”. You need to make those things effortless so that people can maintain on the paved road without adding more energy.

Is autonomy limitless? Are there boundaries to what you’re allowed to do as an autonomous team? Who is responsible for setting those limits? If everyone decides to go for their own solution, it will be really hard to be a platform team. Like you say, this is how you should deploy your applications. Then the team goes like, I think I have a better way. There are two reasons for that: either your platform is shit or your culture is shit. Both of those is something that you should try to figure out as soon as possible so you can address it. I also think that there are occasions where autonomy is good, but like if you have people running around freely, just doing the things that they find really interesting, it will be very costly in the long run because it will happen that you have to do something. With a very diverse codebase, it’s super hard to handle that as a platform team and as an organization. The longer you wait to figure out where the limits for your autonomy goes, the harder it will be to address it once you decide to do it.

There are things that might be good to put as not optional for the paved road. When I talk about that, I usually think about compliance and I think about security. Everyone loves compliance and security. I’m sure you do because I do know that I do. A paved road or a platform is something that can really help you figure those things out. If you make it easy for the teams to be compliant and be secure by building innately into your platform, you can reduce the load for them to do so, and be able to focus on the things that they want to do, like valuable outcomes. I think there are situations where the paved road might not be optional and that you can build it into the platform in order to solve that.

Back to finding the right problem. If we build a paved road that enables quick iterations, flexibility change while allowing product teams to focus on product value while staying empowered, if we want to do that, then we need to figure out what is holding us back from doing so right now. We need to figure out what the right problems are. We knew that with our limited resources, being a small team, 31 people, 4 people in platform, we needed to figure out what we wanted to focus on and be intentional with what we invest to. We want to take our implicit platform, apply strategy in order to make it intentional. We wanted to reduce the pain points and the time sinks, and improve developer experience to increase our ability to deliver product value.

Problem Statements

I have some problem statements that we can use as a tool when asking ourselves what should we focus on. The first thing is like, teams are blocked from performing the tasks that they need to do. Maybe they have to wait for someone to help them in order to move forward. This is, of course, bad. I’m only listing bad things here. The second one could be like the task that the team performed takes a long time and they are hindered from moving on until it’s done. The third thing is the tasks teams perform are unreliable and prone to failure. I’m going to give you three examples of where this applied to us. The first one was DNS. DNS was not failing, but DNS was blocking. When our teams wanted to deploy a new service and they wanted to attach a DNS to it, they had to go and ask one or two people that can help them create a DNS record and give it back. They were blocked from moving on until they got that support.

Something that was taking a very long time, I mentioned before, we had a monorepository with a lot of long builds. You had to wait for the build to build your package so you could deploy to production. We had build times of over 40 minutes. This was taking a lot of time and hindering people from moving forward. When it comes to unreliable and failures, we had deploying to production with bash scripts. Because there was a lot of hidden functions within this bash script that was not clear, it was a black box to the developers, and it failed several times a week. It was becoming painful. The team members were not sure how to figure it out themselves. They couldn’t know for sure, even if it seemed to go fine, if it actually also got reproduced in production. It was prone to errors. It was unreliable.

This was something that they were not able to solve themselves. They were hindered from moving forward. We looked at these tasks and we figured out what we should focus on. Hint, we took all three of those and tried to look at. We tried to look at our implicit platform. We tried to figure out where can we streamline it, where can we upgrade it, and where can we improve it in order to remove those pain points and time sinks? When we have tackled how to solve those problems, we also need to figure out how to roll this out to the teams, and how we can get them started using it, and how can we gain adoption of a platform.

Trust is Currency

Which nicely leads me to the next section, which says, trust. Gaining adoption is closely related to the amount of trust your product teams have in you as a platform team. As eng says, trust is currency and you should treat it as such. You have to gain some currency before you can spend it. Credibility is a currency and it is what you earn and spend as a platform team. When we talk about trust, trust goes up and trust goes down. When I say up, I mean up in the organization. You have to keep your trust with leadership because they are the ones that decide to continue to invest into your platform team. If you look at the budget, you’re just cost. If you’re unlucky as well, you also get the cloud bill on your cost part of their budget, and then it looks like you’re very expensive. You need to build trust with your organization that you are actually improving things so that you can keep doing it. It’s already been talked about, and DORA metrics is something that you can work with in order to show some kind of improvement and show what value to deliver.

This link goes to the “Accelerate” book which is written by Dr. Nicole Forsgren, Jez Humble, and Gene Kim. If we think about DORA metrics, they are four metrics and they are focused on deployment frequency, how often do you get new things into production? Lead time for changes, how long does it take that you start working on something until it actually reaches an end user? Mean time to recovery, if you have a failure, how quickly do you recover? Change failure rate, how often does a change lead to failure? Those four metrics is something that you measure your empowered product teams on but can be a nice indicator on your effect as a platform team. If you think about down, you want to have trust with your product teams in order to gain adoption of your platform.

If they don’t trust you to solve their pain points, then you need to figure out why they don’t trust you and what you can do to change that. I would suggest starting with something easy but painful or time consuming. Make that work so much easier for them, and then go on to the next thing. Start small, start building up credibility, because when you have built some trust, people will start coming to you and then you will have an easier time understanding their point of view. For us, something that we did, the DNS thing, we introduced external DNS into Kubernetes which means that you can use Kubernetes configuration in order to allocate a DNS record. This was very easy for the developers to understand how to use and it was very quick for them to start using it as well, meaning that from one day to another basically, they were no longer blocked by anyone ever when wanting to change DNS.

Once you have tackled some of the small things, you can go on to the bigger things, and then you will probably spend some credits so you can earn them back again. Based on my experience, these are some of the things that you can do in order to earn or spend the credits. When we talk about earning credits, we can talk about removing pain points, and really anything that is painful for developers will do. As a platform team, it’s good to be approachable and helpful. You want people to reach out to you so you can learn about what they are doing. Something that we do for this, is that in Slack, we have a team channel that is team platform engineering in which we have a user group that is a goalkeeper, platform engineering goalkeeper. Teams know that they can ping this goalkeeper with questions regarding the platform and to get help to figure out how they can solve something in case something breaks and they need help understanding what went wrong, they can do that.

If they want help understanding how they can utilize some part of the platform, they can do that too. Be very approachable and helpful, and by approachable, I mean there are no stupid questions, we know this, and also make sure that they understand it. Be nice. Be kind. If someone comes to you with a question and you just go, here’s a link to the wiki, do you think they will ask you again? They will probably think, no, they don’t want to help. If you go, we wrote this part about it, but if there’s anything that’s unclear, please let me know and we can work on it together, that’s more approachable. That is something that makes people want to come back and ask again. You can still give them the link to the wiki, like, check the documentation, but you can do it in a kind way so people want to reach out again.

You want to be proactive. You want to be able to fix some things before people have to ask, especially if it’s something that you really should know, like something being broken in your platform. It would be nice if you know about it before some developers come and ask you, why is this thing not working anymore? You need to understand the team perspective: where do they come from? What do they know? What do they not know? What do they want to achieve? On the spending side, we have enforcing processes. Sometimes you can’t avoid it, like compliance, like security. It costs you credits. In my experience, teams really don’t like when you tell them you have to do this just because. Be mindful of how you enforce your processes.

Also, blocking processes. Empowered teams know that they are empowered, they know they’re supposed to be, and if you take that away from them by blocking them from being empowered, they’re not going to like it. Migrations, we can’t get away from them, but depending on how you perform them, they will cost you credits. They might even cost you credits even if you do them well. Assumptions, I know everyone working on my platform team is really smart and capable at what they’re doing. I also know that there are several times when we made assumptions about what the teams needed and how they wanted it to work, and we were wrong. It’s very easy to take your world view and project it on something where it doesn’t really fit. Make sure that you validate your assumptions and understand the team perspective in combination with your assumptions. Otherwise, you might be spending credits.

I want to show you what our trust credit score over time could look like, from June 2020 up until now. Please don’t mind the time warp. It was hard. I don’t have any numbers either, because we haven’t been tracking this, but it could look like this. On the y-axis, we have the trust credits. On the x-axis, we have the time. You can see that we took a really big drop in trust. This is when we were making assumptions, enforcing a process, and not talking to the teams to anchor the change or understand their perspective before migrating everyone to a new way of working. I’m talking about how we introduced Argo CD and GitOps instead of a bash script for deploying into Kubernetes.

For us, it was clear that everyone wants to work with GitOps, because you have it version controlled. It’s very nice. You have control always running. You can follow the trail of everything. It’s easy to do rollbacks and all the things. We knew this. This was clear. The whole industry is talking about how this is a good way of working, but we did not anchor it with the teams. We did not understand how they viewed working with the bash script and interacting with it. We took something away from them, we forced something on them, and we did not make them understand why.

In the long run, we actually managed to gain back some trust on this, because this change and the new process that we enforced proved itself time after time. In the long run, we gained more trust than we spent. I would rather not have that dip if I were doing it again, because I think it could have been avoided, and I think we could have mitigated it by spending more time understanding the developers and making sure the change was anchored with them before performing it. In the long run, it was a great investment.

Small Teams and Tradeoffs

Speaking of investment, as a small team, you will have to make tradeoffs. The link goes to the CNCF landscape. The CNCF landscape is a map of all the open-source projects that are under the Cloud Native Computing Foundation umbrella. I don’t know the number, but if you zoom in, it looks like this. There are a lot of project icons, and they are structured into different areas. Being a small team, you will not be able to use all these tools. You need to figure out what you need for your use case. You need to be mindful of what you take on, because if you take on too many things, it will be hard for you to maintain the speed, and it will be hard for you to adapt once you really have to work on the business value. Let’s say you’re working and you have some slack time, and so you go, “What should we do now? How about this cool thing over here that I found? I think this would be a really nice feature for our product teams. I think they would really love it. Let’s do that”.

Then you start. Then you make everyone start using it. Then you get a new version, and you have to migrate everyone. Then suddenly the business comes and says, we need you to fix this thing, because we need to add this capability into our platform. You go, but we are working on this nice-to-have thing over here. You have must-haves that you will not be able to address because suddenly you have filled all your time with nice-to-haves. Be mindful of what you take on. Be mindful that the open-source community is both a power and a risk, because there’s a risk of drowning in a lot of things, but there’s also the power of standing on the shoulders of other people and utilizing what they have already done. Have reservations, but make use of the things that you must have. Ask yourself, what can you live without? What do I not need?

For us, we realized we can live without a service mesh. A service mesh is a dedicated infrastructure layer for facilitating service-to-service communication within Kubernetes or other environments. It’s really nice. You can get these fancy maps where you can see the graph of services talking to each other and all the things. You can do network policies, all those things. Really nice, but not a must for us. We don’t need it. In a similar way, we don’t need mutual TLS between applications in our Kubernetes cluster, because that’s not the main concern for us right now. Caveat, I really love Backstage.io as a project, but we don’t need a developer portal. It can be extremely nice to have. It can solve many issues that you have, but as a small company, we don’t have those pain points that motivate people to start using Backstage.

We don’t need to invest into a developer portal. Design system: starting out, a design system is clear standards and component libraries that you can reuse for the frontend. Starting out, we did not want to invest into this because we didn’t see the need. Actually, in the last year, we have started to invest into a design system. It’s really valuable. We started out with the components that were most used throughout the application, and we started by standardizing those. Not every component is in the design system, but the ones that are, are used a lot, which is really nice for our frontend developers and our designers, who can collaborate on how they want the standard to work. Starting out, ask yourself, what things can you live without? What is on your nice-to-have list, but maybe not worth investing into?

Summary

With this knowledge, if you want to get started on your own paved road, what should your paved road contain? When you know what your business needs are, what your team needs are, how you can continuously build trust with your organization and your teams, and what tradeoffs you’re willing to make, then you’re ready to start paving your own road of empowered product teams. Do remember, you already have a platform. Start investing strategically into it. You should treat your platform as a product and unleash new capabilities. Trust is a currency, and you use it to gain adoption from your product teams. Tradeoffs are a key to success. Pick the right ones, and you can win again and again.

Questions and Answers

Participant 1: You started with your personal journey, and I appreciate that a lot. Forgive me for saying so. It seems to me like the deal was already in place when you joined the new company. You didn’t have to fight for a consensus. You didn’t have to convince the company to start building a platform team. Myself, I’m in a different situation. I’m still fighting that battle, and I am up against exactly the things you were saying: we are very small as a company, maybe we just need a bigger DevOps team. Based on my experience, based on my reading, it seems to me like to win these arguments, what one would have to do is small POCs that prove value, but you also need a little bit of help from the top down. I need to manage upwards. I need to manage downwards. I’m looking for some advice, basically.

Andersson: Eb was talking about change, how to drive change without having the authority. He talked about finding allies for your cause, and I think finding allies in the leadership is what you need to do. Maybe not someone directly in your line, if you have trouble convincing them. Find someone else in leadership that will listen to you and that can be an ally for you. Then we had a talk about bits and bots and something else around DevOps, and she talked about how the different team structures can look. I think she made a really good case for why a DevOps team is not the way to go. She had a really nice dumpster picture on the DevOps team and everything. Check that talk out if you haven’t. I think you can use that as a motivation for why a big DevOps team is not the solution.

Then I think, yes, it really helped to have our co-founder, Daniel, convinced before joining the company. He knew they needed to change something. He wasn’t sure how to approach it. Talking together, we could come to a shared vision of what that would look like, which was very useful.

Participant 2: Luckily, for us, we’re on the right path towards this, so we just started something like this. I’m trying to know, did you have something like a roadmap from the beginning? How long did it take you to achieve this?

Andersson: I’m a really bad product manager because I never had a roadmap that I successfully maintained, unfortunately. Luckily, no one really blamed me for it either. Mainly, it took a little over half a year from me joining before we actually had a team in place. There was a lot of recruitment and getting to know the company first. That’s where I spent the first part of the time. Then when the team joined, we started to remove the blockers, because there were some things that the teams were not able to do. Those were often quicker fixes, so we started with that. Within a year from June 2020, we had changed the DNS thing and we had changed the GitOps thing, but we had not started on the monorepo and the build times. Half of the things I’ve told you now happened basically within the first year. The second half spread out over the last three years, along with all the things that I did not mention, which also happened in the last three years.

Participant 3: If I’m a developer within your company, what does the platform look like? Is it a website? Is it documentation, API? If I want to deploy something on your platform as a developer, where do I go? How does it work on a day-to-day basis?

Andersson: As a developer wanting to use our platform, it’s back to the keynote thing: if you want to introduce Kanban for a team that never worked with Agile, or Jira, or one of these things, start simple, they used an Excel sheet. We used mainly GitHub repositories where it was like, this is how you can copy-paste code to get started. It’s not a fancy platform in any way. It’s more like, here’s where you put your Kubernetes configuration, here’s what you can copy-paste to get started, and here’s where you can clone a GitHub repository to get that base application. It’s a little bit crude still, but it’s still making things more streamlined. Boilerplates, we are currently working on rolling those out. It takes a while. Boilerplates are part of the fancy part of the platform. The bare necessities are just, make it run.



Google DeepMind Unveils Gemini Robotics

MMS Founder
MMS Daniel Dominguez

Article originally posted on InfoQ. Visit InfoQ

Google DeepMind has introduced Gemini Robotics, an advanced AI model designed to enhance robotics by integrating vision, language, and action. This innovation, based on the Gemini 2.0 framework, aims to make robots smarter and more capable, particularly in real-world settings.

One of the key features of Gemini Robotics is its embodied reasoning, which allows robots to understand and react to their environment in a more human-like way. This capability is crucial for robots to adapt quickly in dynamic and unpredictable environments. Gemini Robotics enables robots to perform a wider range of tasks with greater precision and adaptability, which are significant advancements in robotic dexterity.

Google DeepMind is also developing the next generation of humanoid robots in partnership with Apptronik; these robots have the potential to work alongside humans in various environments, including homes and offices. The concept of steerability is emphasized, referring to the responsiveness of robots to human commands and environmental changes, enhancing their versatility and ease of use.

Safety and ethics are top priorities, with measures such as collision avoidance and force limitation integrated into the AI models. The ASIMOV dataset, inspired by Isaac Asimov’s Three Laws of Robotics, aims to improve safety in robotic actions, ensuring robots operate ethically and safely around humans.

Comments from various sources reflect excitement and optimism, highlighting the model’s adaptability and generalization and calling it a step toward genuine usefulness in robotics, moving beyond mere automation.

Educator and business leader Patrick Egbunonu posted on X:

Imagine robots intuitively packing lunchboxes, handling delicate items, or assembling products efficiently—without extensive custom programming.

Others note its impressive dexterity and instruction-following, suggesting it could be a pivotal advancement. Web discussions, like those on Reddit, draw parallels to a ChatGPT moment for robotics, though some argue it needs broader consumer access to truly revolutionize the field. 

User ogMackBlack shared on Reddit:

The ChatGPT moment in robotics, to me at least, will be the moment regular people like us will be able to purchase them robots for personal use or have Gemini taking control of physical stuff autonomously at home via an app.

Google DeepMind’s work expands the capabilities of robotics technology, pushing its development forward. While experts recognize its potential to connect cognitive processing with physical action, some remain skeptical about its immediate real-world impact, especially when compared to high-profile demonstrations from competitors like Tesla’s Optimus.



Azure Database for MySQL Trigger for Azure Functions in Public Preview

MMS Founder
MMS Steef-Jan Wiggers

Article originally posted on InfoQ. Visit InfoQ

Microsoft has recently introduced a public preview of Azure Database for MySQL trigger for Azure Functions. With these triggers, developers can build solutions that track changes in MySQL tables and automatically trigger Azure Functions when rows are created, updated, or deleted.

Azure Functions is Microsoft’s serverless computing offering. It allows developers to build and run event-driven code without managing infrastructure. Within functions, triggers and bindings are defined. Triggers define how a function runs and can pass data into it, while bindings connect functions to other resources, allowing input and output data handling – a setup that enables flexibility without hardcoding access to services.

Azure Functions has several triggers, such as Queue, Timer, Event Grid, Cosmos DB, and Azure SQL. Microsoft has now introduced another one for Azure Database for MySQL in preview; its bindings monitor a user table for changes (inserts, updates) and invoke the function with the updated row data. The Azure Database for MySQL input and output bindings were already available in public preview earlier.

Sai Kondapalli, a program manager at Microsoft, writes in a Tech Community blog post:

Similar to the Azure Database for MySQL Input and Output bindings for Azure Functions, a connection string for the MySQL database is stored in the application settings of the Azure Function to trigger the function when a change is detected on the tables.

For the trigger to work, change tracking must be enabled on the existing Azure Database for MySQL table by altering its structure before the trigger bindings can be used in an Azure Function. For an employees table, the statement looks like this:

ALTER TABLE employees
ADD COLUMN az_func_updated_at TIMESTAMP 
DEFAULT CURRENT_TIMESTAMP 
ON UPDATE CURRENT_TIMESTAMP;

According to the documentation, the Azure MySQL trigger bindings use the “az_func_updated_at” column data to monitor the user table for changes. Based on the employees table, the C# function would look like this:

using System.Collections.Generic;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.MySql;
using Microsoft.Extensions.Logging; 

namespace EmployeeSample.Function
{
    public static class EmployeesTrigger
    {
        [FunctionName(nameof(EmployeesTrigger))]
        public static void Run(
            [MySqlTrigger("Employees", "MySqlConnectionString")]
            IReadOnlyList<MySqlChange> changes,
            ILogger logger)
        {
            foreach (MySqlChange change in changes)
            {
                Employee employee = change.Item;
                logger.LogInformation($"Change operation: {change.Operation}");
                logger.LogInformation($"EmployeeId: {employee.employeeId}, FirstName: {employee.FirstName}, LastName: {employee.LastName}, Company: {employee.Company}, Department: {employee.Department}, Role: {employee.Role}");
            }
        }
    }
}
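
The article does not show the Employee model class that the change payload is mapped to. As a minimal sketch (the property names are assumptions inferred from the logging statement above, not an official sample), it could look like this:

namespace EmployeeSample.Function
{
    // Hypothetical POCO mirroring the employees table; property names are
    // inferred from the logging statement in the trigger function above.
    public class Employee
    {
        public int employeeId { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string Company { get; set; }
        public string Department { get; set; }
        public string Role { get; set; }
    }
}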

With the Azure Database for MySQL trigger, developers can build solutions that enable real-time analytics by automatically updating dashboards and triggering alerts when new data arrives. It also allows automated workflows that integrate seamlessly with other Azure services for MySQL data processing. Additionally, it enhances compliance and auditing by monitoring sensitive tables for unauthorized changes and logging updates for security purposes.

While Azure Database for MySQL triggers for Azure Functions offer powerful automation capabilities, developers should consider:

  • Scalability: High-frequency updates may lead to function execution bottlenecks. Implementing batching or filtering logic can mitigate performance concerns (see the sketch after this list).
  • Supported Plans: The feature is currently only available on premium and dedicated Azure Function plans.
  • Compatibility: Ensure that the MySQL version used is compatible with Azure’s bindings and trigger mechanisms.
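
As a sketch of the filtering and batching ideas from the scalability point above (illustrative only: it reuses the changes and logger parameters of the Run method shown earlier, the department filter is a made-up example, and Select/Chunk require a using System.Linq directive on .NET 6 or later), the body of the trigger function could look like this:

foreach (MySqlChange change in changes)
{
    Employee employee = change.Item;

    // Hypothetical filter: only rows from one department lead to downstream work.
    if (employee.Department != "Engineering")
    {
        continue;
    }

    logger.LogInformation($"Processing change for {employee.FirstName} {employee.LastName}");
}

// Alternatively, hand rows to a downstream service in batches instead of one call per row.
foreach (Employee[] batch in changes.Select(c => c.Item).Chunk(100))
{
    // Process up to 100 rows per downstream call here.
}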

Microsoft’s investments in MySQL include bindings and triggers in Functions, as well as support for a newer MySQL version in the Azure Database for MySQL offering, along with resiliency, migration, and developer experience improvements, as announced at Ignite.

Lastly, developers can find examples of the Azure Database for MySQL Triggers in a GitHub repository.



Project Leyden Ships Third Option for Faster Application Start with JEP 483 in Java 24

MMS Founder
MMS Karsten Silz

Article originally posted on InfoQ. Visit InfoQ

In Java 24, JEP 483, Ahead-of-Time Class Loading & Linking, under the auspices of Project Leyden, starts Java applications like Spring PetClinic up to 40% faster without code changes or new application constraints. It needs a training run to build a cache file that ships with the application. With GraalVM Native Image and CRaC, applications start 95-99% faster but face more constraints. Since JVM initialization is very expensive, Leyden plans more improvements.

JEP 483 extends Java’s Class-Data Sharing (CDS). On every startup, the JVM processes the same Java classes from the application, libraries, and the JDK the same way. CDS stores the results of reading and parsing those classes in a read-only cache file. JEP 483 adds loaded and linked classes to that cache and calls it “AOT cache.”

The training run only records the AOT configuration; creating the AOT cache from that configuration is a separate step. This example uses a Java compiler benchmark picked by Leyden:

java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp JavacBenchApp.jar JavacBenchApp 50
java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp JavacBenchApp.jar

The AOT cache app.aot file is then ready to use:

java -XX:AOTCache=app.aot -cp JavacBenchApp.jar JavacBenchApp 50

On an Apple M1 MacBook Pro, the resulting 23 MByte AOT cache leads to a 26% faster startup. The more classes an application loads, the higher the potential speed-up from the AOT cache. That is why frameworks like Spring Boot may especially benefit from JEP 483.

Project Leyden may combine the two steps for the AOT cache creation in the future. The Quarkus framework already does that today.

The training run could be a production run, but should at least mirror production as much as possible. Using the AOT cache requires the same JDK version, operating system, CPU architecture (such as Intel x64 or ARM), class path, and Java module options as the training run, though additional classes can be used. JEP 483 cannot cache classes from user-defined class loaders and does not work with JVMTI agents that rewrite class files using ClassFileLoadHook or call the AddToBootstrapClassLoaderSearch or AddToSystemClassLoaderSearch APIs.
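
Besides the default auto mode, JEP 483 also defines an on value for -XX:AOTMode that makes the JVM report an error and exit if the cache cannot be used, instead of silently falling back to a normal start. Reusing the benchmark above, a strict run would look like this:

java -XX:AOTMode=on -XX:AOTCache=app.aot -cp JavacBenchApp.jar JavacBenchApp 50

That way, the environment mismatches listed above surface immediately, for example in a CI pipeline, rather than quietly costing the startup improvement.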

GraalVM Native Image is an AOT compiler that moves compilation and as much initialization as possible to build time. It produces native executables that start instantly, use less RAM, and are smaller and more secure. But these executables also have principal constraints that do not affect most applications, need longer build times, have a more expensive troubleshooting process, and require more configuration. GraalVM started in Oracle Labs, but its two Java compilers may join OpenJDK.

The OpenJDK project, Coordinated Restore at Checkpoint (CRaC), takes an application memory snapshot during a training run and uses it later, similar to how JEP 483 creates and uses the AOT cache. But unlike JEP 483, CRaC only runs on Linux and requires all files and network connections to be closed before taking a snapshot and then re-opened after restoring it. That’s why it needs support from the JDK and the Java framework. While most frameworks support CRaC, only two downstream distributions of OpenJDK, Azul and Bellsoft, do. And the CRaC memory snapshot may pose security risks, as it contains passwords and credentials in clear text and is susceptible to hacking attacks.

Introduced in June 2020, the goal of Project Leyden is “to improve the startup time, time to peak performance, and footprint of Java programs.” Initially, Leyden wanted to introduce the “concept of static images to the Java Platform,” such as from GraalVM Native Image, but after two years with no public activity, it instead pivoted to optimizing the JIT compiler. JEP 483 is the first shipped result of that pivot.

In an October 2024 blog post, Juergen Hoeller, senior staff engineer and Spring Framework project lead at Broadcom, spoke of a “strategic alignment with GraalVM and Project Leyden.” JEP 483 appears to prove that: Spring and Spring Boot are the only Java frameworks mentioned, and the Spring PetClinic sample application is one of the two examples. Oracle’s Per Minborg, consulting member of technical staff, Java Core Libraries, also gave a joint presentation with Spring team member Sébastien Deleuze from Broadcom in October 2024, where unreleased improvements reduced the PetClinic startup time even further.

InfoQ reached out to learn how some Java frameworks plan to support JEP 483. Here are their answers in alphabetical order of the framework name. Some answers were edited for brevity and clarity.

The Helidon team shared a blog post with benchmarks of JEP 483, CRaC, and GraalVM Native Image. It used an application in the two Helidon flavors: Helidon SE and Helidon MP. The GraalVM Native Image speed-up below uses Profile-Guided Optimization (PGO), which also requires a training run.

Application Type   JEP 483 Speed-Up   CRaC Speed-Up   GraalVM Native Image Speed-Up
Helidon SE         67%                95%             98%
Helidon MP         62%                98%             98%

Max Rydahl Andersen, Distinguished Engineer at Red Hat, Quarkus, and Sanne Grinovero, Quarkus founding engineer and senior principal software engineer at Red Hat, from Quarkus, said the following:

We’re glad to see Project Leyden progressing. Quarkus fully supports JEP 483 since it’s integrated into the Java VM. The biggest challenge is the training run, which can be complex – especially in containerized environments.

To simplify this, we’ve made it possible to “boot” Quarkus just before the first request and then package applications with the AOT cache. This follows a similar approach to our AppCDS support.

If your JVM supports it, you can try it with:

mvn package -DskipTests -Dquarkus.package.jar.appcds.enabled=true -Dquarkus.package.jar.appcds.use-aot=true

Then run:

cd target/quarkus-app/
java -XX:AOTCache=app.aot -jar quarkus-run.jar

This makes it easy to get the AOT cache, as long as you are aware of the limitations around the JDK, OS, and architecture.

This provides a noticeable boost in startup time. However, project Leyden is not complete yet, and we’re looking forward to several improvements which are not available yet.

As an example, early previews of Leyden had a significant tradeoff: While it started more efficiently, the memory consumption was also higher. And since Quarkus users care about memory, we didn’t want to recommend using it until such aspects were addressed. The Quarkus team is working very closely with the Red Hat engineers working on OpenJDK, so we are confident that such aspects are being addressed. In fact, memory consumption has already improved significantly compared to the early days, and more improvements are scheduled.

Support for custom class loaders is another big ticket on our wishlist. Speeding up classes loaded by the system class loader is great, as that accelerates the JDK initialization. But application code and Quarkus extensions are loaded by a custom class loader, so only a subset of the application currently benefits from Leyden. We’ll keep working both on our side and in collaboration with the OpenJDK team to push this further.

We’re also exploring ways to make it more practical for containerized environments, where a training run isn’t always a natural fit.

So yes, Quarkus supports Leyden and the AOT cache introduced in JEP 483, but we’re just at the beginning of a longer journey of improvements.

Sébastien Deleuze from Spring had the following to say:

The Spring team is excited that Java 24 exposes the first benefits of Project Leyden to the JVM ecosystem for wider consumption. The AOT Cache is going to supercharge CDS that is already supported by Spring Boot. We are looking forward to further evolution likely to come in future Java versions.
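
For readers who want to try the AOT cache with a Spring Boot application today, a minimal sketch could look like the following. It assumes Spring Boot's documented tools JAR mode for extracting the executable JAR and Spring Framework's spring.context.exit=onRefresh property, which ends a training run right after the application context has refreshed; my-app.jar and the extracted directory are placeholder names.

java -Djarmode=tools -jar my-app.jar extract --destination extracted
java -Dspring.context.exit=onRefresh -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -jar extracted/my-app.jar
java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -jar extracted/my-app.jar
java -XX:AOTCache=app.aot -jar extracted/my-app.jar

As with the earlier examples, the resulting cache is only usable on the same JDK version, operating system, CPU architecture, and class path as the training run.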

The Micronaut team has not responded to our request to provide a statement.
