Presentation: How GitHub Copilot Serves 400 Million Completion Requests a Day

David Cheney

Article originally posted on InfoQ.

Transcript

Cheney: GitHub Copilot is the largest LLM-powered code completion service in the world. We serve hundreds of millions of requests a day with an average response time of under 200 milliseconds. The story I’m going to cover in this talk is how we built this service.

I’m the cat on the internet with glasses. In real life, I’m just some guy that wears glasses. I’ve been at GitHub for nearly five years. I’ve worked on a bunch of backend components, which none of you know about, but you interact with every day. I’m currently the tech lead on copilot-proxy, which is the service that connects your IDEs to LLMs hosted in Azure.

What is GitHub Copilot?

GitHub is the largest social coding site in the world. We have over 100 million users. We’re very proud to call ourselves the home of all developers. The product I’m going to talk to you about is GitHub Copilot, specifically the code completion part of the product. That’s the bit that I work on. Copilot does many other things: chat, interactive refactoring, and so on. They broadly use the same architecture and infrastructure I’m going to talk to you about, but the details vary subtly. GitHub Copilot is available as an extension. You install it in your IDE. We support most of the major IDEs: VS Code, Visual Studio, obviously, the pantheon of IntelliJ IDEs, Neovim, and we recently announced support for Xcode. Pretty much you can get it wherever you want. We serve more than 400 million completion requests a day. That was when I pitched this talk; I had a look at the number, and it’s much higher than that these days.

We peak at about 8,000 requests a second during the window between the European afternoon and the U.S. work day. That’s our peak period. During that time, we see a mean response time of less than 200 milliseconds. Just in case there is one person who hasn’t seen GitHub Copilot in action, here’s a recording of me just working on some throwaway code. What you’ll see here is that we have the inbuilt IDE completions, those are the ones in the box, and Copilot’s completions, which are the gray ones, which we notionally call ghost text because it’s gray and ethereal. You can see as I go through here, every time that I stop typing or I pause, Copilot takes over. You can write a function, or a comment describing what you want, and Copilot will do its best to write it for you. It really likes patterns. As you see, it’s figured out the pattern of what I’m doing. We all know how to make prime numbers. You pretty much got the idea. That’s the product in action.

Building a Cloud Hosted Autocompletion Service

Let’s talk about the requirements of how we built the service that powers this on the backend, because the goal is interactive code completion in the IDE. In this respect, we’re competing with all of the other interactive autocomplete built into most IDEs. That’s your anything LSP powered, your Code Sense, your IntelliSense, all of that stuff. This is a pretty tall order because those things running locally on your machine don’t have to deal with network latency. They don’t have to deal with shared server resources. They don’t have to deal with the inevitable cloud outage. We’ve got a pretty high bar that’s been set for us. To be competitive, we need to do a bunch of things.

The first one is that we need to minimize latency before and between requests. We need to amortize any setup costs that we might incur because this is a network service. To a point, we need to avoid as much network latency as we can, because that’s overhead that our competitors sitting locally in IDEs don’t have to pay. The last one is that the length, and thus the time to generate, a code completion response is very much a function of the size of the request, which is completely variable. Rather than waiting for the whole response to be generated and then sending it back to the user, we work in a streaming mode. It doesn’t really matter how long the response is; we start streaming it back as soon as it starts. This is quite a useful property because it unlocks other optimizations.

I want to dig into this idea that connection setup is expensive. Because this is a network service, we use TCP. TCP uses the so-called 3-way handshake: SYN, SYN-ACK, ACK. On top of that, because this is the internet and it’s 2024, everything needs to be secured by TLS. TLS takes 5 to 7 additional legs to do its handshaking and negotiate keys in both directions. Some of these steps can be pipelined. A lot of work has gone into reducing these setup costs and overlapping the TLS handshake with the TCP handshake. These are great optimizations, but they’re not a panacea. There’s no way to drive this network cost down to zero. Because of that, you end up with about five or six round trips between you and the server and back again to make a new connection to a service.

The duration of each of those legs is highly correlated with distance. This graph that I shamelessly stole from the internet says about 50 milliseconds a leg, which is probably East Coast to West Coast time. Where I live, on the other side of an ocean, we see round trip times far in excess of 100 milliseconds. When you add all that up, doing five or six of those legs makes connection setup really expensive. You want to do it once and you want to keep the connection open for as long as possible.
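
To make that cost concrete, here is a minimal Go sketch that times the TCP connect and TLS handshake for a single request using the standard library’s httptrace hooks. The URL is just a placeholder; point it at whatever service you want to measure.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	var connStart, connDone, tlsStart, tlsDone time.Time

	// Hook the connection lifecycle so we can see where the time goes.
	trace := &httptrace.ClientTrace{
		ConnectStart:      func(network, addr string) { connStart = time.Now() },
		ConnectDone:       func(network, addr string, err error) { connDone = time.Now() },
		TLSHandshakeStart: func() { tlsStart = time.Now() },
		TLSHandshakeDone:  func(cs tls.ConnectionState, err error) { tlsDone = time.Now() },
	}

	// Placeholder endpoint; substitute any HTTPS URL you care about.
	req, err := http.NewRequest("GET", "https://example.com/", nil)
	if err != nil {
		panic(err)
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := http.DefaultTransport.RoundTrip(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	fmt.Printf("TCP connect:   %v\n", connDone.Sub(connStart))
	fmt.Printf("TLS handshake: %v\n", tlsDone.Sub(tlsStart))
}
```

Run it from a few different locations and the correlation with distance shows up immediately in both numbers.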

Evolution of GitHub Copilot

Those are the high-level requirements. Let’s take a little bit of a trip back in time and look at the history of Copilot as it evolved. Because when we started out, we had an extension in VS Code. To use it, the alpha users would go to OpenAI, get an account, get their key added to a special group, then go and plug that key into the IDE. This worked. It was great for an alpha product. It scaled to literally dozens of users. At that point, everyone got tired of being in the business of user management. OpenAI don’t want to be in the business of knowing who our users are. Frankly, we don’t want that either. What we want is a service provider relationship. That kind of thing is what you’re used to when you consume a cloud service. You get a key to access the server resources. Anytime someone uses that key, a bill is emitted. The who is allowed to do that under what circumstances is entirely your job as the product team. We’re left with this problem of, how do we manage this service key? Let’s talk about the wrong way to do it.

The wrong way to do it would be to encode the key somehow in the extension that we give to users, in a way that it can be extracted and used by the service but is invisible to casual or malicious onlookers. This is impossible. This is at best security by obscurity. It doesn’t work; just ask the Rabbit R1 folks. The solution that we arrived at is to build an authenticating proxy which sits in the middle of this network transaction. The name of the service is copilot-proxy, but that’s just an internal name, so I’m going to call it proxy for the rest of this talk. It was added shortly after our alpha release to move us from this period of user-provided keys to a more scalable authentication mechanism.

What does this workflow look like now? You install the extension in your IDE just as normal, and you authenticate to GitHub just as normal. That creates an OAuth relationship, where there’s an OAuth key which identifies that installation on that particular machine for that person who’s logged in at that time. The IDE can now use that OAuth credential to go to GitHub and exchange it for what we call a short-lived code completion token. The token is just like a train ticket: an authorization to use the service for a short period of time. It’s signed. When the request lands on the proxy, all we have to do is check that that signature is valid. If it’s good, we swap the key out for the actual API service key, forward it on, and stream back the results. We don’t need to do any further validation. This is important because it means that for every request we get, we don’t have to call out to an external authentication service. The short-lived token is the authentication.
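
A minimal sketch of that pattern, not GitHub’s actual implementation: the Proxy type and its verifyToken and serviceKey fields are illustrative names, and the signature check stands in for whatever token format the real service uses.

```go
package proxy

import (
	"net/http"
	"net/http/httputil"
	"strings"
)

// Proxy is a hypothetical authenticating proxy in the spirit described above.
type Proxy struct {
	serviceKey   string                 // the real upstream API key; never shipped to clients
	verifyToken  func(string) error     // local signature and expiry check for the short-lived token
	reverseProxy *httputil.ReverseProxy // forwards to the upstream model endpoint and streams back
}

func (p *Proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	token := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")

	// Validate the short-lived token locally; no per-request call out to an
	// external authentication service is needed.
	if err := p.verifyToken(token); err != nil {
		http.Error(w, "invalid or expired token", http.StatusUnauthorized)
		return
	}

	// Swap the client's token for the actual service key and forward onwards.
	r.Header.Set("Authorization", "Bearer "+p.serviceKey)
	p.reverseProxy.ServeHTTP(w, r)
}
```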

From the client’s point of view, nothing’s really changed in their world. They still think they’re talking to a model, and they still get the response as usual. This token’s got a lifetime on the order of minutes: 10, 20, 30 minutes. This is mainly to limit the liability if, say, it was stolen, which is highly unlikely. The much more likely and sad case is abuse, where we need the ability to shut down an account and therefore stop generating new tokens. That’s generally why the token has a short lifetime. In the background, the client knows the expiration time of the token it was given, and a couple of minutes before that, it kicks off a refresh cycle, gets a new token, swaps it out, and the world continues.
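
The client side of that can be as simple as a background loop. This is a sketch under the assumptions stated above (refresh a couple of minutes before expiry), with fetch and apply as hypothetical callbacks for exchanging the OAuth credential and swapping the new token into the request path.

```go
package client

import (
	"context"
	"time"
)

// completionToken is a hypothetical representation of the short-lived token.
type completionToken struct {
	Value     string
	ExpiresAt time.Time
}

// refreshLoop keeps a fresh token available: it wakes up a couple of minutes
// before the current token expires, exchanges the OAuth credential for a new
// one via fetch, and hands it to apply.
func refreshLoop(ctx context.Context, fetch func(context.Context) (completionToken, error), apply func(completionToken)) {
	tok, err := fetch(ctx)
	if err != nil {
		return
	}
	apply(tok)

	for {
		wait := time.Until(tok.ExpiresAt) - 2*time.Minute // illustrative margin
		if wait < 0 {
			wait = 0
		}
		select {
		case <-ctx.Done():
			return
		case <-time.After(wait):
		}

		next, err := fetch(ctx)
		if err != nil {
			time.Sleep(10 * time.Second) // back off briefly rather than spinning
			continue
		}
		tok = next
		apply(tok)
	}
}
```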

When Should Copilot Take Over?

We solved the access problem. That’s half the problem. Now to talk about another part of the problem. As a product design decision, we don’t have an autocomplete key. I remember in Eclipse, you would use command + space, things like that, to trigger the autocomplete or to trigger the refactoring tools. I didn’t use that in the example I showed. Whenever I stop typing, Copilot takes over. That creates the question of, when should it take over? When should we switch from user typing to Copilot typing? It’s not a straightforward problem. One of the solutions we could use is just a fixed timer. We hook the key presses, and after each key press, we start a timer. If that timer elapses without another key press, then we say, we’re ready to issue the request and move into completion mode.

This is good because that provides an upper bound on how long we wait, and that waiting is additional latency. It’s bad because it also provides a lower bound: we always wait at least this long before starting, even if that was the last key press the user was going to make. We could try something a bit more science-y and use a tiny prediction model to look at the stream of characters as they’re typed and predict, are they approaching the end of a word or are they in the middle of a word, and nudge that timer forward and back. We could just make a blind guess: any time there’s a key press, assume that’s it, no more input from the user, and always issue the completion request. In reality, we use a mixture of all these strategies.
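
As a rough illustration of the first strategy, the fixed timer, here is a small Go sketch. The delay value is just a tuning knob, and the real product mixes this with the other strategies above.

```go
package editor

import "time"

// debouncer implements the simplest trigger strategy: after each keystroke,
// restart a timer, and only issue a completion request if no further
// keystroke arrives before it fires. Not goroutine-safe; call it from the
// editor's input loop.
type debouncer struct {
	timer *time.Timer
	delay time.Duration
	fire  func() // issues the completion request
}

func newDebouncer(delay time.Duration, fire func()) *debouncer {
	return &debouncer{delay: delay, fire: fire}
}

// KeyPress is called on every keystroke.
func (d *debouncer) KeyPress() {
	if d.timer != nil {
		d.timer.Stop() // the user is still typing; cancel the pending trigger
	}
	d.timer = time.AfterFunc(d.delay, d.fire)
}
```

Every key press both arms a new trigger and cancels the previous pending one, which is exactly the upper-bound versus lower-bound trade-off described above.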

That leads us to the next problem, which is, despite all the work and tuning that went into this, around half of the requests that we issue are what we call typed through. Don’t forget, we’re doing autocomplete. If you continue to type after we’ve made a request, you’ve now diverged from the data we had and our request is out of date. We can’t use that result. We could try a few things to work around this. We could wait longer before issuing a request. That might reduce the number of times that we issue a request and then have to discard the result, but that additional waiting penalizes every user who had actually stopped and was waiting. Of course, if we wait too long, then users might think that Copilot is broken because it’s not saying anything to them anymore. Instead, what we’ve built is a system that allows us to cancel requests once they’ve been issued. Cancelling a request over HTTP is potentially novel. I don’t claim it to be unique, but it’s certainly the first time I’ve come across it in my career.

Canceling an HTTP Request

I want to spend a little bit of time digging into what it means to cancel an HTTP request. You’re at your web browser and you’ve decided you don’t want to wait for that page; how do you say, I want to stop, I want to cancel that? You press the stop button. You could close the browser tab. You could drop off the network. You could throw your laptop away. These are all very final actions. They imply that you’re canceling the request because you’re done. Either you’re done with using the site or you’re just frustrated and you’ve given up. It’s an act of finality. You don’t intend to make another request. Under the hood, they all have the same networking behavior: you reset the TCP stream, the connection that we talked about setting up on previous slides. That’s on the browser side. If we look on the server side, either at the application layer or in your web framework, this idea of cancellation is not something that is very common inside web frameworks.

If a user using your application on your site presses stop in the browser, or if they Control-C their cURL command, that translates into a TCP reset of the connection. On the other end, in your server code, when do you get to see that signal? When do you get to see that they’ve done that? The general times that you can spot that the TCP connection has been reset are either early in the request, when you’re reading the headers and the body, or later on, when you go to start writing your response back.

This is a really big problem for LLMs, because the initial inference before you generate the first token is the majority of the cost of the request. That happens before you produce any output. All that work is performed. We’ve done the inference. We’re ready to start streaming back tokens. Only then do we find that the user closed the socket and they’ve gone. As you saw, in our case, that’s about 45% of requests. Half of the time, we’d be performing that inference and then throwing the result on the floor, which is an enormous waste of money, time, and energy, which in AI terms are all the same thing.

If this situation wasn’t bad enough, it gets worse. Because cancellation in the HTTP world is the result of closing the connection. In our case, the reason we canceled that request is because we want to make another one straightaway. But to make that request straightaway, we don’t have a connection anymore. We have to pay those five or six round trips to set up a new TCP plus TLS connection. In this naive usage, cancellation occurs on every other request on average. That would mean users are constantly closing TCP connections to signal that they want to cancel, and then reestablishing them to make the next request. The latency of that far exceeds the cost of just letting the request we didn’t need run to completion and then ignoring it.

HTTP/2, and Its Importance

Most of what I said on the previous slides applies to HTTP/1, which has this one-request-per-connection model. As you’re reading on this slide, HTTP version numbers go above 1; they go up to 2 and 3. I’m going to spend a little bit of time talking about HTTP/2 and how it was very important to our solution. As a side note, copilot-proxy is written in Go because Go has a quite robust HTTP library. It gave us the HTTP/2 support and control that we needed for this part of the product. That’s why I’m here, rather than a Rustacean talking to you. This is mostly an implementation detail. HTTP/2 is more like SSH than good old HTTP/1 plus TLS. Like SSH, HTTP/2 is a tunneled connection. You have a single connection and multiple requests multiplexed on top of it. In both SSH and HTTP/2, they’re called streams. A single network connection can carry multiple streams, where each stream is a request. We use HTTP/2 between the client and the proxy because that allows us to create a connection once and reuse it over and over again.

Instead of resetting the TCP connection itself, you just reset the individual stream representing the request you made. The underlying connection stays open. We do the same between the proxy and our LLM model. Because the proxy is effectively concentrating requests, fanning them in from thousands of clients down onto a small set of connections to the LLM model, we use a connection pool, a bunch of connections to talk to the model. This is just to spread the load across multiple TCP streams, avoid networking issues, avoid head-of-line blocking, things like that. Just like on the client side, these connections between the proxy and our LLM model are established when we start the process and they live on, assuming there are no upstream problems.

They remain open for the lifetime of the process until we redeploy it, so minutes to hours to days, depending on when we choose to redeploy. By keeping these long-lived HTTP/2 connections open, we get additional benefits at the TCP layer. Basically, TCP has this trust thing: the longer a connection is open, the more it trusts it, and the more data it allows to be in flight before it has to be acknowledged. You get these nice, warmed-up TCP pipes that go all the way from the client through the proxy, up to the model and back.
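
On the client side of a connection like this, the Go standard library makes the reuse almost free. The sketch below is illustrative rather than the actual Copilot client (which lives in the IDE extensions): one long-lived transport multiplexes every request as an HTTP/2 stream, and canceling a request’s context resets only that stream. The URL is a placeholder.

```go
package main

import (
	"context"
	"net/http"
	"time"
)

func main() {
	// One long-lived client: the transport keeps the TLS+HTTP/2 connection
	// open and multiplexes each request as a stream on top of it.
	client := &http.Client{
		Transport: &http.Transport{
			ForceAttemptHTTP2:   true,
			MaxIdleConnsPerHost: 4,                // a small pool, as described above
			IdleConnTimeout:     90 * time.Second, // keep warmed-up connections around
		},
	}

	ctx, cancel := context.WithCancel(context.Background())
	req, err := http.NewRequestWithContext(ctx, "POST", "https://proxy.example.invalid/v1/completions", nil)
	if err != nil {
		return
	}

	go func() {
		// Canceling the context resets only this HTTP/2 stream; the
		// underlying TCP+TLS connection stays open for the next request.
		time.Sleep(50 * time.Millisecond)
		cancel()
	}()

	resp, err := client.Do(req)
	if err != nil {
		return // typically context.Canceled here
	}
	resp.Body.Close()
}
```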

This is not intended to be a tutorial on Go, but for those who do speak it socially, this is what basically every Go HTTP handler looks like. The key here is the req.Context object. Context is effectively a handle. It allows efficient transmission of cancellations, timeouts, and that kind of request-specific meta information. The important thing here is that the other end of this request context is effectively connected out into user land, to the user’s IDE. When, by continuing to type, they need to cancel a request, that stream reset causes this context object to be canceled. That makes it immediately visible to the HTTP server without having to wait until we get to the point of actually trying to write any data to the stream.

Of course, this context can be passed up and down the call stack and used by anything that wants to know whether it should stop early. We use it in the HTTP client: when we make the onward call to the model, we pass in that same context. A cancellation that happens in the IDE propagates to the proxy and into the model effectively immediately. This is all rather neatly expressed here, but it requires that all parties speak HTTP/2 natively.
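
The handler from the slide isn’t reproduced in this transcript, but the shape it describes looks roughly like the sketch below. The upstream URL is a placeholder; the important part is that r.Context() is canceled the instant the IDE resets its stream, and passing it into the onward request propagates that cancellation.

```go
package proxy

import (
	"io"
	"net/http"
)

func completionHandler(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context() // canceled the moment the client resets its HTTP/2 stream

	upstream, err := http.NewRequestWithContext(ctx, "POST",
		"https://model.example.invalid/v1/completions", r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	resp, err := http.DefaultClient.Do(upstream)
	if err != nil {
		// Includes the case where ctx was canceled while the call was in flight.
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	// Stream tokens back as they arrive; this copy also stops early on cancel.
	io.Copy(w, resp.Body)
}
```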

It turns out this wasn’t all beer and skittles. In practice, getting this end-to-end HTTP/2 turned out to be more difficult than we expected, despite HTTP/2 being nearly a decade old. General support for it was just not as good as it could be. For example, most load balancers are happy to speak HTTP/2 on the frontend but downgrade to HTTP/1 on the backend. This includes most of the major ALBs and NLBs you get from your cloud providers. At the time, it included all the CDN providers that were available to us. That fact alone was enough to make us take on this project. There were also other weird things we ran into.

At the time, and I don’t believe it’s been fixed yet, OpenAI was fronted with nginx. nginx just has an arbitrary limit of 100 requests per connection. After that, they just closed the connection. At the request rates that you saw, it doesn’t take long to chew through 100 requests, and then nginx will drop the connection and force you to reestablish it. That was just a buzz kill.

All of this is just a long-winded way of saying that the generic advice of, yes, just stick your app behind your cloud provider’s load balancer, it will be fine, didn’t work out for us out of the box. Something that did work out for us is GLB. GLB stands for GitHub Load Balancer. It was introduced eight years ago, and it’s one of the many things that have spun out of our engineering group. GLB is based on HAProxy under the hood. HAProxy turns out to be one of the very few open-source load balancers that offers just exquisite HTTP/2 control. I’ve never found anything like it. Not only does it speak HTTP/2 end-to-end, but it offers that control over the whole connection. GLB, the GitHub Load Balancer, sits in front of everything that you interact with on GitHub and actually owns the client connection. The client connects to GLB and GLB holds that connection open. When we redeploy our proxy pods, their connections are gracefully torn down and then reestablished for the new pods. GLB keeps the connection to the client open. They never see that we’ve done a redeploy. They’re never disconnected during that time.

GitHub Copilot’s Global Nature

With success and growth come yet more problems. We serve millions of users around the globe. We have Copilot users in all the major markets, where I live in APAC, Europe, Americas, EMEA, all over the world. There’s not a time that we’re not busy serving requests. This presents the problem that even though all this HTTP/2 stuff is really good, it still can’t change the speed of light. The round-trip time of just being able to send the bits of your request across an ocean or through a long geopolitical boundary or something like that, can easily exceed the actual mean time to process that request and send back the answer. This is another problem. The good news is that Azure, through its partnership with OpenAI, offers OpenAI models in effectively every region that Azure has. They’ve got dozens of regions around the world. This is great. We can put a model in Europe, we can put a model in Asia. We can put a model wherever we need one, wherever the users are. Now we have a few more problems to solve.

In terms of requirements, we want users to be routed to their “closest” proxy region. If that region is unhealthy, we want them to automatically be routed somewhere else so they continue to get service. The flip side is also true: having multiple regions around the world increases our capacity and our reliability. We no longer have all our eggs in one basket in one giant model somewhere, let’s just say in the U.S. By spreading them around the world, we’re never going to be in a situation where the whole service is down. To do this, we use another product that spun out of GitHub’s engineering team, called octoDNS. octoDNS, despite its name, is not actually a DNS server. It’s a configuration language to describe the DNS configuration that you want. It supports all the good things: arbitrary weightings, load balancing, splitting, sharing, health checks. It allows us to identify users in terms of the continent they’re in, the country.

Here in the United States, we can even work down to the state level sometimes. It gives us exquisite control to say, you over there, you should primarily be going to that instance. You over there, you should primarily be going to that instance, and to do a lot of testing to say, for a user who is roughly halfway between two instances, which is the best one to send them to so they have the lowest latency? On the flip side, each proxy instance is looking at the success rate of the requests that it handles. If that success rate drops below the SLO, those proxy instances use the standard health check endpoint pattern. They set their health check status to 500. The upstream DNS providers, which have been programmed with those health checks, notice that.

If a proxy instance starts seeing its success rate drop, it votes itself out of DNS. It goes and takes a little quiet time by itself. When it’s feeling better, when it’s back above the SLO, it raises the health check status and brings itself back into DNS. This is now mostly self-healing. It turns a regional outage, when we’re like, “All of Europe can’t do completions”, into just a reduction in capacity, because traffic is routed to other regions.
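
A sketch of that self-healing loop, with the 99% threshold and the success-rate bookkeeping purely illustrative; the real SLO, and the way octoDNS programs the upstream health checks, are GitHub-specific details the talk doesn’t spell out.

```go
package proxy

import (
	"net/http"
	"sync/atomic"
)

// health tracks whether this proxy instance currently considers itself fit
// to receive traffic, based on its own observed success rate.
type health struct {
	ok atomic.Bool
}

// observe is called periodically with the recent success rate.
func (h *health) observe(successRate float64) {
	h.ok.Store(successRate >= 0.99) // illustrative SLO threshold
}

// ServeHTTP is the health check endpoint the DNS layer polls.
func (h *health) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if h.ok.Load() {
		w.WriteHeader(http.StatusOK) // stay in DNS
		return
	}
	w.WriteHeader(http.StatusInternalServerError) // vote ourselves out of DNS
}
```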

One thing I’ll touch on briefly is a model we experimented with and then rejected because it just didn’t work out for us: the so-called point of presence model. You might have heard it called PoP. If you’re used to working with big CDNs, they will have points of presence. Imagine every one of these dots on this map is a data center they’re serving from. The idea is that users will connect, and do that expensive connection setup, as close to them as possible, and speed up that bit.

Then those CDN providers will cache that data, and if they need to, they can call back to the origin server. In our scenario, where I live in Asia, we might put a point of presence in Singapore. That’s a good, roughly equidistant place for most of Asia. A user in Japan would be attracted to that Singapore server. There’s a problem, because the model is actually still hosted back here on the West Coast. We have traffic that flows westward to Singapore only to turn around and go all the way back to the West Coast. The networking colloquialism for this is traffic tromboning. This is ok for CDN providers, because their goal is to cache as much of the information as possible, so they rarely call back to the origin server. Any kind of round-tripping or hairpinning of traffic isn’t really a problem for them.

For us doing code completions, every request goes back to a model. What we found after a lot of experimentation was that the idea of having many regions calling back to a few models just didn’t pay for itself. The latency wasn’t as good and it carried with it a very high operational burden. Every point of presence that you deploy is now a thing you have to monitor, and upgrade, and deploy to, and fix when it breaks. It just didn’t pay for us. We went with a much simpler model, which is simply: if there is a model in an Azure region, we colocate a proxy instance in that same Azure region, and we say that is the location that users’ traffic is sent to.

A Unique Vantage Point

We started out with a proxy whose job was to authenticate users’ requests and then mediate them towards an LLM. It turns out it’s very handy to be in the middle of all these requests. Some examples I’ll give of this: we obviously look at latency from the point of view of the client, but that’s a very fraught thing to do. I caution you, it’s ok to track that number, just don’t put it on a dashboard, because someone will be very incentivized to take the average of it, or something like that. You’re essentially averaging the experience of everybody on the internet, from somebody who lives way out bush on a satellite link to someone who lives next to the AMS-IX data center, and you’re effectively trying to take the average of all their experiences. What you get when you do that is effectively the belief that all your users live on a boat in the middle of the Atlantic Ocean.

This vantage point is also good, because while our upstream provider does give us lots of statistics, they’re really targeted at how they view running the service, their metrics. They have basic request counts and error rates and things like that, but they’re not really at the granularity we want. More fundamentally, the way that I think about it, to take food delivery as an example: you use the app, you request some food, and about 5 minutes later you get a notification saying, “Your food’s ready. We’ve finished cooking it. We’re just waiting for the driver to pick it up”. From the restaurant’s point of view, their job is done, they did it, their SLO, 5 minutes, done. It’s another 45 minutes before there’s a knock on the door with your food. You don’t care how quickly the restaurant prepared your food. What you care about is the total end-to-end time of the request. We handle that by defining in our SLOs that the point we are measuring at is our proxy. It’s ok for our service provider to have their own metrics, but we negotiate our SLOs at the point in the data plane where the proxy sits.
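
Measuring at the proxy is then just a matter of wrapping the handler. The talk doesn’t say which metrics stack GitHub uses; this sketch assumes the Prometheus Go client, with the metric name and the default buckets as illustrative choices.

```go
package proxy

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// requestDuration records end-to-end time as seen at the proxy, which is the
// point the SLO is defined against.
var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "proxy_request_duration_seconds",
	Help:    "End-to-end completion latency measured at the proxy.",
	Buckets: prometheus.DefBuckets,
})

// instrument wraps a handler and records how long the full request took,
// including streaming the response back to the client.
func instrument(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		requestDuration.Observe(time.Since(start).Seconds())
	})
}
```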

Dealing with a Heterogeneous Client Population

You saw that we support a variety of IDEs, and within each IDE, there is a flotilla of different client versions out there. Dealing with the long tail of client versions is the bane of my life. There is always a long tail. When we do a new client release, we’ll reach about 80% of the population within 24 to 36 hours. That last 20% will take until the heat death of the universe. I cannot understand why clients keep using such old software. The auto-update mechanisms are so pervasive and pernicious about getting you to update, I don’t quite understand how they can do this, but they do. What this means is that if we have a bug or we need to make a fix, we can’t do it in the client. It just takes too long, and we never reach the population that would make rolling out that fix worthwhile. The good news is that we have a service that sits in the middle, the proxy, where we can hopefully do a fix-up on the fly.

Over time, that fix will make it into the client versions and roll out to a sufficient population. An example of this: one day, out of the blue, we got a call from a model provider that said, you can’t send this particular parameter. It was something to do with log probabilities. You can’t send that because it’ll cause the model to crash, which is pretty bad because this is a poison pill. If a particular form of request will cause a model instance to crash, it will blow that one out of the water, and that request will be retried and it’ll blow the next one out of the water, and keep working its way down the line. We couldn’t fix it in the client because that wouldn’t be fast enough. Because we have a proxy in the middle, we can just mutate the request quietly on the way through, and that takes the pressure off our upstream provider to get a real fix so we can restore that functionality.
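
The fix-up itself can be a tiny piece of middleware. This sketch assumes a JSON request body and uses "logprobs" as a stand-in for whichever parameter actually had to be stripped; it is not the real copilot-proxy code.

```go
package proxy

import (
	"bytes"
	"encoding/json"
	"io"
	"net/http"
	"strconv"
)

// stripParam quietly removes one field from a JSON request body before the
// request is forwarded upstream, e.g. stripParam(r, "logprobs").
func stripParam(r *http.Request, field string) error {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		return err
	}
	r.Body.Close()

	var payload map[string]any
	if err := json.Unmarshal(body, &payload); err != nil {
		return err
	}
	delete(payload, field)

	fixed, err := json.Marshal(payload)
	if err != nil {
		return err
	}

	// Replace the body and keep the framing headers consistent.
	r.Body = io.NopCloser(bytes.NewReader(fixed))
	r.ContentLength = int64(len(fixed))
	r.Header.Set("Content-Length", strconv.Itoa(len(fixed)))
	return nil
}
```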

The last thing that we do is, when we have clients that are very old and we need to deprecate some API endpoint or something like that, rather than just letting them get weird 404 errors, we actually have a special status code which triggers logic in the client that puts up a giant modal dialog box. It asks them very politely, would they please just push the upgrade button?

There’s even more that we can do with this, because logically the proxy is transparent. Through all of these shenanigans, the clients still believe that they’re making a request to a model and getting a response. The rest is invisible to them. From the point of view of us in the middle routing requests, we can now split traffic across multiple models. Quite often, the capacity we receive in one region won’t all be in one unit. It might be spread across multiple units, especially if it arrives at different times. Being able to do traffic splits to combine all that together into one logical model is very handy. We can also do the opposite. We can mirror traffic. We can take a read-only tap of requests and send it to a new version of the model that we might be performance testing, or validating, or something like that.
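
A weighted split like that can be expressed in a few lines at the proxy layer. This is a simplified sketch, not the production routing logic; the two upstream URLs and the share are whatever the capacity situation dictates.

```go
package proxy

import (
	"math/rand"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// splitter presents two capacity pools as one logical model by sending a
// configurable share of requests to the secondary upstream.
func splitter(primary, secondary *url.URL, secondaryShare float64) http.Handler {
	a := httputil.NewSingleHostReverseProxy(primary)
	b := httputil.NewSingleHostReverseProxy(secondary)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if rand.Float64() < secondaryShare {
			b.ServeHTTP(w, r) // e.g. 30% of traffic to the newer capacity
			return
		}
		a.ServeHTTP(w, r)
	})
}
```

Mirroring works the same way in spirit, except the tapped copy’s response is discarded and the request body has to be duplicated before it is consumed.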

Then we can take these two ideas and mix and match them and stack them on top of each other and make A/B tests, experiments, all those kinds of things, all without involving the client. From the client’s point of view, it just thinks it’s talking to the same model it has yesterday and today.

Was It Worth the Engineering Effort?

This is the basic gist of how you build a low latency code completion system with the aim of competing with IDEs. I want to step back and ask: as an engineering effort, was this worth it? Did the engineering effort we put into this proxy system pay for itself? One way to look at this is, for low latency, you want to minimize hops. You certainly want to minimize the number of middlemen, the middleware, anything in that request path adding value but also adding latency. What if we just went straight to Azure instead, and had clients connect straight to Azure? This would have left authentication as the big problem, as well as observability. Those would have been real open questions. It would have been possible to teach Azure to understand GitHub’s OAuth token: the token that the IDE natively has from GitHub could be presented to Azure as an authentication method. I’m sure that would be possible. It would probably result in Azure building effectively what I’ve just described.

Certainly, if our roles were reversed and I was the Azure engineer, I would build this with an authentication layer in front of my service. If some customer is coming to me with a strange authentication mechanism, I’m going to build a layer which converts that into my real authentication mechanism. We would have probably ended up with exactly the same number of moving parts, just with more of them behind the curtain on the Azure side. Instead, by colocating proxy instances and model instances in the same Azure region, we have, to a large extent, ameliorated the cost of that extra hop. The intra-region traffic is not free, it’s not zero, but it’s pretty close to zero. It’s fairly constant in terms of the latency you see there. You can characterize it and effectively ignore it.

War Stories

I’m going to tell you a few more war stories from the life of this product, just to emphasize that the value of having this intermediary really paid for itself over and over. One day we upgraded to a new version of the model, which seemed to be very attracted to a particular token. It really liked emitting this token. It was some end-of-file marker, and it was something to do with a mistake in how it was trained that made it really like emitting this token. We can work around this in the request by saying: in your response, this very particular token, weight it down, negative affinity, we never want to see it. If we didn’t have an intermediary like the proxy to do that, we would have had to do it in the client. We would have had to do a client rollout, which would have been slow and ultimately would not have reached all the users.

Then the model would have been fixed and we’d have to do another client change to reverse what we just did. Instead, it was super easy to add this parameter to the request on the fly as it was on its way to the model. That solved the problem immediately and it gave us breathing room to figure out what had gone wrong with the model training and fix that without the Sword of Damocles hanging over our head.
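
The talk doesn’t name the exact parameter that was injected. As an illustration of the mechanism, OpenAI-style completion APIs expose a logit_bias map from token ID to a bias value, where a strongly negative value effectively bans the token. A sketch of adding that on the way through, with the token ID purely made up:

```go
package proxy

import "encoding/json"

// addTokenSuppression adds a strong negative bias for one token ID so the
// model essentially never emits it, e.g. addTokenSuppression(body, "50256").
// The "logit_bias" field name follows the OpenAI-style API; the real
// parameter and token used in production are not named in the talk.
func addTokenSuppression(body []byte, tokenID string) ([]byte, error) {
	var payload map[string]any
	if err := json.Unmarshal(body, &payload); err != nil {
		return nil, err
	}

	bias, _ := payload["logit_bias"].(map[string]any)
	if bias == nil {
		bias = map[string]any{}
	}
	bias[tokenID] = -100 // "never produce this token"
	payload["logit_bias"] = bias

	return json.Marshal(payload)
}
```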

Another story: one day I was looking at the distribution of cancellations. For a request that was cancelled, how long did it live until it was cancelled? There was this bizarre spike at effectively 1 millisecond, effectively immediately. It was saying that a lot of requests come from the clients and are immediately cancelled. As in, you read the request and then instantly afterwards the client says, I’m sorry, I didn’t mean to send that to you, let me take it back. The problem is, by that time we’ve already started the process of forwarding it to Azure and they’re mulling on it. We immediately send the request to Azure and then say to them, sorry, I didn’t mean to send that to you, may I please have it back? Cancellation frees up model resources quicker, but it’s not as cheap as simply not sending a request that we know we’re going to cancel.

It took us some time to figure out what exactly was happening in the client to cause this fast cancellation behavior, but because we had the proxy in the middle, we could add a little check just before we made the request to the model: has it actually been cancelled already? There were mechanisms in the HTTP library to ask that question. We saved ourselves making and then retracting that request. Another point about metrics: from the metrics that our upstream model provider gives us, we don’t get histograms, we don’t get distributions, we barely get averages. There would be no way we would have been able to spot this without our own observability at that client-proxy layer. If we didn’t have the proxy as an intermediary, we still could have had multiple models around the world.
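
In Go, that pre-flight check is essentially one line against the request context. A sketch, with forwardToModel standing in for the real upstream call:

```go
package proxy

import (
	"context"
	"net/http"
)

// forwardToModel stands in for the actual upstream call (placeholder).
func forwardToModel(ctx context.Context, w http.ResponseWriter, r *http.Request) {}

// maybeForward checks, just before paying for an upstream inference, whether
// the client has already canceled; if so, the request never reaches the model.
func maybeForward(w http.ResponseWriter, r *http.Request) {
	if err := r.Context().Err(); err != nil {
		return // already canceled "immediately after sending"
	}
	forwardToModel(r.Context(), w, r)
}
```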

As you saw, you can have OpenAI models in any Azure region you want. We would just not have a proxy in front of them. We probably would have used something like octoDNS to still do the geographic routing, but it would have left open the question of what do we do about health checks. When models are unhealthy or overloaded, how do we take them out of DNS? What we probably would have had to do is build some kind of thing that’s issuing synthetic requests or pinging the model or something like that, and then making calls to upstream DNS providers to manually thumbs up and thumbs down regions. HTTP/2 is critical to the Copilot latency story. Without cancellation, we’d make twice as many requests and waste half of them. It was surprisingly difficult to do with off-the-shelf tools.

At the time, CDNs didn’t support HTTP/2 on the backend. That was an absolute non-starter. Most cloud providers didn’t support HTTP/2 on the backend. If you want to do that, you have to terminate TLS yourself. For the first year of our product’s existence, TLS, the actual network connection, was terminated directly on the Kubernetes pod. You can imagine our security team were absolutely overjoyed with this situation. It also meant that every time we did a deploy, we were literally disconnecting everybody and they would have to reconnect, which goes against the principle that we want to establish these connections and keep them open for as long as possible.

GitHub’s Paved Path

This is very GitHub specific, but a lot of you work for medium to large-scale companies, and you probably have a tech stack that, at GitHub, we call the paved path. It is the blessed way, the way that you’re supposed to deploy applications inside the company. Everything behind GLB and everything managed by octoDNS fed into our compliance story. You can imagine, we’re selling this to large enterprise companies. You need to have your certifications. You need to have your SOC 2 tick in the box. Using these shared components really made that compliance story much easier. The auditors see that this is another GLB-hosted service, using all the regular stuff; not exactly a tick in the box, but it got us a long way towards solving our compliance story. The flip side is that because these are shared components, rather than every individual team knowing every detail of terminating TLS connections on pods hosted in Kubernetes clusters that they run themselves, we delegate that work to shared teams who are much better at it than we are.

Key Takeaways

This is the story of what made Copilot a success. It is possible that not all of you are building your own LLM-as-a-service service. Are there broader takeaways for the rest of you? The first one is: definitely use HTTP/2. It’s dope. I saw a presentation by the CTO of Fastly, and he viewed HTTP/2 as an intermediate step. He says HTTP/3 is the real standard, the one that they really wanted to make. From his position as a content delivery partner whose job is just to ship bits as fast as possible, I agree completely with that. Perhaps the advice is not so much use HTTP/2; the advice would probably be something like, use something better than HTTP/1. If you’re interested in learning more, look that up on YouTube: there’s a presentation by Geoff Huston talking about HTTP/2 from the point of view of application writers and clients, and how it totally disintermediates most of the SSL and VPN man-in-the-middle nonsense that we live with day to day on the current web.

The second one is a Bezos quote: if you’re gluing your product together from parts from off-the-shelf suppliers and your role in that is only supplying the silly putty and the glue, what are your customers paying you for? Where’s your moat? As an engineer, I understand very deeply the desire not to reinvent the wheel, so the challenge to you is: find the place where investing your limited engineering budget in a bespoke solution is going to give you a marketable return. In our case, it was writing an HTTP/2 proxy that accelerated one API call. We’re very lucky that copilot-proxy as a product is more or less done, and has been done for quite a long time, which is great because it gives our small team essentially 90% of our time to dedicate to the operational issues of running this service.

The last one is, if you care about latency, your cloud provider will try to sell you the siren song that they can solve your latency issues with their super-fast network backbone. That can be true up to a point, but remember the words of Montgomery Scott: you cannot change the laws of physics, despite what your engineering title is. If you want low latency, you have to bring your application closer to your users. In our case that was fairly straightforward, because code completion, at least in the request path, is essentially stateless. Your situation may not be as easy. Having multiple models around the globe turns SEV1 incidents into just SEV2 alerts. If a region is down or overloaded, traffic just flows somewhere else. Those users, instead of getting a busy signal, still get a service, albeit at a marginally higher latency. I’ve talked to a bunch of people, and I’ve said that we would fight holy wars over 20 milliseconds. The kind of latencies we’re talking about here are in the range of 50 to 100 milliseconds, so really not noticeable for the average user.




TikTok’s Native Cross-Platform UI Framework Lynx Goes Open Source

Sergio De Simone

Article originally posted on InfoQ.

TikTok parent ByteDance has open-sourced Lynx, a collection of frameworks and tools enabling the creation of native, cross-platform mobile apps using Web markup, CSS, and JavaScript. Lynx aims to deliver native performance thanks to its custom JavaScript engine and pixel-perfect UI rendering.

Lynx architect Xuan Huang explains that, inspired by Chromium, Flutter, and React Native, Lynx aims to deliver native experiences at scale and faster by addressing the growing complexity of diverse form factors and multiple-platform support which leads to “rebuild[ing] the same experiences multiple times, leading to wasted effort, siloed teams, and delayed time-to-market”.

Lynx followed a similar spirit—you can think of it as an “alternative Web tailored for app development”. It aims to honor the assets of web technologies while taking an opinionated approach, supporting web-like APIs and adding constraints and extensions with clear intent.

To achieve this goal, Lynx adopts markup and CSS to enable Web developers to use their skills for mobile app development. Lynx natively supports CSS animations and transitions, CSS selectors and variables for theming, and modern CSS visual effects like gradients, clipping, and masking.

On the other hand, Lynx departs from the usual single-threaded nature of the Web by statically enforcing a two-threaded model where the main thread is dedicated to privileged, synchronous, non-blocking tasks, while a background thread is used for user code. According to Huang, this approach is what enables Lynx to render the first app frame almost instantaneously, thus optimizing for the “time to first frame” (TTFF) metric, and to power highly responsive interfaces by efficiently handling high-priority events and gestures on the main thread.

Lynx is comprised of several components, including Lynx core engine; ReactLynx, a React-based frontend framework to create declarative UIs; Rspeedy, a bundler based on Rspack, a webpack-compatible bundler implemented in Rust to maximize performance; PrimJS, an optimized JavaScript engine; and Lynx DevTool, an Electron-based debugger. Lynx also offers a Web frontend to run Lynx apps in the browser.

In a Syntax FM podcast, ByteDance engineer Zack Jackson described Lynx as ByteDance’s own version of React, which powers all of their apps’ UI to maintain a unified architecture across diverse teams. In the official announcement, however, Huang emphasized that Lynx is not limited to React and other frontends will be open-sourced in the future.

Lynx is used for the Search panel of the TikTok apps and for TikTok Studio, the content creation and management app for TikTok creators, among other things.

Hacker News user suzakus, who described themselves as working at ByteDance, stated that some of the most important parts of the iOS and Android clients are written in C++ to be portable across platforms, while others are written in Kotlin or Swift, including most UI components. While this statement seems to reduce the prevalence of Lynx across ByteDance apps, it is compatible with Huang’s description.

As a side note, Huang explained on Hacker News why ByteDance decided to reuse the name of a popular command-line Web browser, which also happens to be the oldest web browser still maintained, for its project.

The Lynx project was originally named independently without thinking this far ahead. Since so much code and so many users already rely on it, we decided to stick with the name rather than change it just for open-sourcing.

As a final note, Huang stressed ByteDance will open-source more Lynx components, including additional UI components, the custom renderer, and other frontend frameworks, and that the framework will extend to other platforms such as Desktop, TV, and IoT devices.




Podcast: Simplify Your System By Challenging The Status-Quo And Learning From Other Ecosystems

Max Rydahl Andersen

Article originally posted on InfoQ.

Transcript

Olimpiu Pop: Hello everybody. I am Olimpiu Pop, a Java veteran and editor for InfoQ, but I typically write about data and machine learning. Tonight, we are together with Max Rydahl Andersen, talking about how we can complicate our lives as software developers, principals and other roles we might have, especially now in the era of AI. And let’s hear it from Max. Max, who are you, and what projects are you currently working on?

Max Rydahl Andersen: Hey, yes, I’m Max. Max Rydahl Andersen, as you perfectly pronounced.

Olimpiu Pop: Almost.

Max Rydahl Andersen: I worked in the software industry for, I don’t know, 25 years. 20 or so of them have been in professional open source. I started in JBoss, then Red Hat, and I began on Hibernate. I’ve done Seam and tooling. I spent 10 to 15 years in Eclipse tooling, and then IntelliJ tooling, VS Code, and that kind of thing. And then I was off for a few years to do other stuff and go. And then, I took a year break, which triggered a lot of the things we’ll talk about today, like realising… getting away from all the complexity and realising, man, it’s complex when you come back. And what I’ve lately been doing is I’m a co-lead on Quarkus, and my side project is called JBang. Both of them are about making it fun to develop Java again, so to speak.

The basics of Software Engineering are more crucial than ever in the Gen AI Age [01:53]

Olimpiu Pop: Okay, thank you for the introduction. That’s an awe-inspiring path you’ve walked, and it’s not even… you’re just getting started, right? So, nowadays, everything is about AI. If you put AI on a chocolate bar, it’ll become more popular. How does that change the way we used to build applications?

Max Rydahl Andersen: So that one is fun. On the one hand, I think what’s happening is fantastic, and when I saw it the first time, I was like, “Wait, what is this?”, and I saw a lot of opportunities in it. I’m pro-AI, but I’m anti the AI hype. This AI will enable many interesting ways to interact with IT and allow many people who couldn’t use IT before to utilise it better. And I’m not another person who thinks that “Oh, this AI will remove our jobs”, at least not in the near term, because as more people can use computers, the back end needs to do more. And I have this… I think I tweeted this out at some point. I said, “If AI disappeared tomorrow, the world would continue to work. But if Java disappeared, the world would stop”.

Because all this stuff that AI does is interacting with these systems out there. That’s the situation today; it might change in the future, but right now it’s very much the case. So I’m frustrated about all the hype that says you can do everything if you can just do AI. You need to understand systems thinking and engineering to do it well.

Olimpiu Pop: Let me summarise what I heard from you. Mainly, it’s a tool, another tool in our toolbox.

Max Rydahl Andersen: Yes.

Olimpiu Pop: But we still need the basics.

Max Rydahl Andersen: Yes.

Generative AI Will Complement Or Improve the Existing Tools, not replace them [03:59]

Olimpiu Pop: System thinking, critical thinking. And the interesting part is the fact that we have to be product developers now. It’s something that I have always preached about, but people are like, “Okay, I know Java, or I know JavaScript, and it’s more than enough”. Now, we need to understand everything between you, the customer, and the system, because that will be the thing that allows you to build the system that needs to be built. And then, obviously, you can just do some stuff with generative AI, but you still have to create the box or the guardrails for that.

Max Rydahl Andersen: Yes, to me, the AI is definitely a tool, but I compare it to … When SQL came around to access databases, it’s a thing that people use everywhere now in all different systems, all different ways to access data. It’s kind of been a standard way to get data. And LLMs or AI I think is this tool that yes, if you only think about prompt engineering, like you’ve had this chat view on it, then it’s definitely a tool you can use to learn stuff with. But once you start applying it, you also realize, “Oh, I can use this to get started writing my email. I can use it to write my code, I can use it to write my workflow. Oh, I can use it in doing stuff for me, which are boring tasks”.

Well, anyway, for me, this thing will enable people to do more. Therefore, it’s not AI that’s going to take the work away from you, it’s someone else who is using AI to do your job better. It’s just leveling up what we can do and enabling way more people to participate in software development, which is amazing but also super scary. Because before, there was a natural filter: people had all these good ideas but couldn’t execute, and they needed to go find someone who could turn them into code. That filter is no longer there. Over time, there’ll be AIs that can help people with crazy ideas get them implemented, and some of them will build amazing things.

But man, there’ll also be even more people creating stuff that is horrible and unmaintainable and performs really badly. And that’s why I keep saying that you still need to have a good understanding of how things work. And at least as far as I can see, the limits of LLMs lie in the fact that they’re not going to come up with new ideas, but they are going to help someone with good ideas to maybe get some other ideas, some “Hey, there’s something related to this”, and therefore execute faster. It’s a very interesting concept. Yes, and to me, I think it’s going to be a tool we use everywhere, but I’m not bullish on the “Oh, it’s going to remove all the developers”.

Because remember when mobile first came around: hey, now everyone is going to do mobile apps and desktop is no more. But if anything, yes, we all have mobile phones now, but we have even more desktop apps and even more powerful software that runs locally.

Olimpiu Pop: Yes, I agree with what you’re saying. Currently, if you look at mobile, tablet, and the laptop or PC or whatever, the whole ecosystem, you do different things on different devices. I don’t know, maybe I’ll just text you a plain message, but if I need to write a more comprehensive message, I’ll probably still pick the full keyboard to do it. And that’s my feeling also with the LLMs: they are very good at particular tasks, or, how I envision them, they are another entry point for your application. It’s a simple way of interacting, and it’s something that you have to build on top of, obviously, crunching other numbers and allowing us to keep up with all the information that comes.

Max Rydahl Andersen: Yes, 100%, and another analogy is this: remember when there was only assembly code, or even punch cards? Very few people could do it, but there was more and more IT; we got assemblers, then we got high-level languages. Every time, we expanded the number of people who could do this stuff. And every time we say, now it’s simpler, so we can do more with fewer people, but we’re still growing and growing and growing, and there’ll be more and more demand, and AI is literally just doing the same thing. At least that’s the way I see it.

Taking time off and experimenting with different ecosystems allows you to see things differently [08:27]

Olimpiu Pop: I agree with that. There are some things that they’ll be useful for, and I like it. For instance, one scenario that I’m currently looking at is using LLMs for configuring some complex IoT applications, because it’s a pain, and that will allow us to make things a lot easier and more secure. But obviously, the steps that validate those things are implemented separately. But given that we are now sure that we’ll have a job tomorrow, and we still need to write code, let’s go back to the other projects that you’re involved with. What we are discussing is that we, as an industry, as a craft, have a tendency to complicate things, when theoretically we should make things a lot simpler. And that’s something that I know you strongly believe in.

Max Rydahl Andersen: Yes, so one thing is, as I said, I spent 15, 20 years in the industry. And then, I took a year off, and I remember the first thing that… When I started, I said, I’m not going to use a computer for three months, or six months even. And on my first day I fell and broke my ankle, so I couldn’t do anything. So I actually had to use something, and I said, no, I’m not going to go and look at Java, let me try something else. I did Python in my old life, so to speak. And I went back to that, and I actually did some Home Assistant IoT stuff, and I realized, “Oh, this is actually fun”. There’s some pleasure in this non-type-safe world, but I was also like, man, it’s also complicated, but it was fun.

And then, I had a year off, and didn’t do much Java, but I heard about my friends at Red Hat who were looking at this project that looked at Java differently. But I was like, “Nope, I won’t touch it. I’ll hear what it does but won’t touch it”. So I came back a year later, and had to go and install Java, learn how to run the build tools, and do all that stuff again. And I was just, “Man, what have we created?” So one of those things was that I came back and worked on a project called Quarkus, and Quarkus came out of, “Hey, Java is dying”, or that was at least the theory, because, oh, we can’t run in the cloud. And we looked at why you can’t run in the cloud.

And that’s because it runs on these resource-constrained machines, and Java is kind of built for having access to many cores and a lot of memory. And we talked to a dedicated team about how you just make it faster: just make Java faster, it’ll be fine. But I said, well, the best way to make Java faster is to have fewer classes, because what takes time is every class coming up. If you do a Hello World or something else, it starts very fast and doesn’t use much memory. But even better was that we realised, “Hey, the same application runs”, and a lot of Jakarta EE, or Java EE at the time, Spring apps, WildFly, Seam, were basically built on booting up the VM, doing a lot of processing at runtime and then wiring things up automatically.

And we call this simple? Simple and powerful, but in reality, you’re basically doing the same thing again and again. And every little dynamic thing, every little small thing was cool and useful, but we were basically suffering death by a thousand paper cuts; all these little things just piled up. In Quarkus, the approach we take is: let’s just do everything at build time, or move everything to build time. And then, suddenly, when you do everything at build time, all the parsing and handling of configuration can be thrown away, leaving behind just the result. And then suddenly, you get a JVM starting up in milliseconds, or less than a second, and then you can apply native image on top and it gets even faster.

And that flipping things around, and not dragging in all the complexity we built over the last 25 years, was an eye-opener for me, because suddenly I was like, "Wait, yes, we should just do stuff at build time", and off we go. On top of that, this gave us good runtime performance, which is the main thing I think people know Quarkus for now. But what was also interesting was that, because we were so fast and because we do things at build time, we have all this information about the application, so we can do stuff at development time too. We could actually do hot reload, or live reload. You get, let's say, the Python or PHP experience, because you can just experiment, which is a thing we are not that used to in Java.

And that was an eye-opener for me — that suddenly we can do this. Before, my past tooling experience was, "Oh, I have to write an IDE plugin that does the live reload or handles this stuff". And we were fairly popular, but it wasn't until I actually went away from the tooling space that I realized, yes, we had 100,000 or whatever users per month on this stuff back in whatever year it was, but we had way more people using EAP and WildFly. I realized most people never got the benefit of that quick turnaround, because it required installing an IDE and a specific plugin, and, bringing it back to the runtime, that makes the runtime a bit more complicated. But for the user it's a much simpler experience. It just works. And that's a big thing for me at least. Yes.
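
To make the live reload idea concrete, here is a minimal sketch (not taken from the conversation; the package and class names are made up) of the kind of plain Java class Quarkus wires up at build time. With ./mvnw quarkus:dev running, editing the file and refreshing the browser picks up the change — no restart and no IDE plugin required.

    package org.acme;  // hypothetical package name

    import jakarta.ws.rs.GET;
    import jakarta.ws.rs.Path;

    // A trivial REST endpoint; Quarkus discovers and wires it at build time.
    @Path("/hello")
    public class GreetingResource {

        @GET
        public String hello() {
            return "Hello from Quarkus dev mode";
        }
    }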

The New Generation of Application Frameworks (Micronaut and Quarkus) Reenvision the Way Java Should Work, to Be Faster and to Improve the Developer Experience [14:16]

Olimpiu Pop: I remember a presentation that Holly Cummins gave a couple of years back — or months back, I don't remember exactly when. I resonated with her experience: she was mentioning that, like you, like me at a given point in time, she left Java behind and did something else, used another ecosystem. And the powerful thing was that you could see the results relatively quickly. For instance, in JavaScript, or HTML, or Node.js, or whatever, you just refresh and everything pops up. And then, looking at how Quarkus is approaching that — or let's not hold it only to Quarkus, let's call it the third generation of web frameworks. Because if you look at it, we used to have the heavyweights of enterprise Java back in the day, the EJBs and so on and so forth.

And then, you had Spring, which started as the lighter alternative, but now it feels more heavyweight — now Spring seems to be the heavyweight, the champion. And then you have the Davids coming up behind: the Quarkuses, the Micronauts, which are trying to do things differently, and it actually makes sense and is closer to what we started putting into the module system. So we take with us only what we actually need, and when we need it. And on top of that, it's also the ecosystem where you just write the tests, you save them, and behind the scenes the validation happens. So now it's something more fluent; for me it's closer to the continuous deployment mentality that's more popular nowadays.

Max Rydahl Andersen: Yes. Continuous testing, as we call it — we have it under the name of Developer Joy. You should feel like you can just experiment and try things out. And we talked about AI, for example: when we started doing this, we had the hot reload, we had the continuous testing. Then we added dev services, with this notion that if you haven't configured anything — let's say you added a Postgres dependency or a Kafka dependency or something, because in Quarkus we only add the stuff you need — we can derive that you want this, and if you haven't told us what you want to connect to, like Postgres, we'll just create one for you. This is where we use Testcontainers — we call it a dev service, which uses Testcontainers to start up either a Docker container or even a native image — and again, it lets you get started very quickly.
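
As a rough illustration (class and package names are made up), this is what that zero-configuration experience can look like from a test: with the quarkus-jdbc-postgresql extension on the classpath and no datasource URL configured, Dev Services starts a throwaway PostgreSQL container via Testcontainers and injects a ready-to-use datasource.

    package org.acme;

    import io.agroal.api.AgroalDataSource;
    import io.quarkus.test.junit.QuarkusTest;
    import jakarta.inject.Inject;
    import org.junit.jupiter.api.Assertions;
    import org.junit.jupiter.api.Test;

    @QuarkusTest
    class DevServicesSketchTest {

        @Inject
        AgroalDataSource dataSource;  // backed by the automatically started container

        @Test
        void canTalkToTheDatabase() throws Exception {
            try (var connection = dataSource.getConnection()) {
                Assertions.assertTrue(connection.isValid(2));
            }
        }
    }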

And the latest thing we've done here is actually in the AI space, just to talk about AI again. In Quarkus, with LangChain4j and that stuff enabled, it can again do all of this — hot reload, continuous testing, native image — but it also comes with a chat UI in the dev UI if you set up your AI service with OpenAI or Azure or whatever. So you don't even have to write code to try it out; you can try it out directly in the dev UI. And if you are more into local models using Ollama, we'll actually fetch that model for you and off you go. So it's the same kind of automatic setup that we can derive from what you have included and what you have configured — or rather, what you haven't configured.
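
For context, the Quarkus LangChain4j extension lets you declare an AI service as a plain interface; the sketch below is hypothetical and only meant to show the shape of it — the actual model (OpenAI, Azure, or a local Ollama model) comes from configuration, or from a dev service when nothing is configured.

    package org.acme;

    import dev.langchain4j.service.SystemMessage;
    import io.quarkiverse.langchain4j.RegisterAiService;

    // Hypothetical AI service: the single String argument is sent as the user message,
    // and the extension wires the configured chat model behind the interface.
    @RegisterAiService
    public interface ReleaseNoteWriter {

        @SystemMessage("You are a concise release-notes assistant.")
        String summarize(String commitMessages);
    }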

I've tried to explain it a bunch of times, and I'm probably failing at it again, because what I tend to see is that some people get it, but it's only when people actually try it out that they realize, "Oh, this magic happens". It is really fun to see that people who are used to EAP, Spring, or any other Java framework from the last 20 years expect it to be hard. So people don't even try out the dev mode, because they want to use the Maven build, and they're used to Maven build, run the jar, Maven build, run the jar. In the beginning there were people who were visibly upset on Twitter and wherever, saying, "Hey, you are doing this hot reload stuff, but I want to do test-first development".

And that's where continuous testing came from — we made it so you can do both test-first development and this exploration. My point is that most people who use Quarkus and see all this stuff get all the good stuff, but there are a lot of people who just use Quarkus as if it's any old system, and they never actually realize the power that's in Quarkus, because they use it the way they used to use any other complex setup.

Forget the Old Paradigms and Ways of Working, and Approach New Tools and Philosophies Open-Minded to Face a Revolution [18:58]

Olimpiu Pop: So we should come to it more open-minded and change the old paradigms, because one of the things that comes with more tenured software developers like you and me — you have above 20 years of experience in the field, I'm a bit under that, but more or less around 20 years — is that we are quite proud of the shortcuts we know in our IDE and the way we build this stuff.

Max Rydahl Andersen: Yes, we have what I call survivor bias: we had to do all these complicated things to use this stuff, and therefore you should do it too. And when someone comes up with a simpler solution, it is hard to realize, "Oh wait, this is actually better". And given the current situation and how fast-paced things are, I don't blame people for missing it, because there's just limited time to try things out. But it is something I'll keep reminding people about: just give it a try and see how it works. I've heard multiple times now from people who had this use case with some PHP and some JavaScript and a few other things, and it was getting to be a mess, and they tried to roll Quarkus in and suddenly realized, "Oh, we can actually do all the stuff we're doing there in Quarkus, with all the hot reload".

And that was because these were people who were not used to Java; they just tried using it the way they were used to in JavaScript and the like, and they were like, "Oh, I can actually do this". And they get all the benefits of the Java ecosystem — all the stuff is there and everyone is happy.

Olimpiu Pop: Yes, I agree with that, but there is always the fine print — like on medicine, it's very, very small and not everybody reads it. In my experience, it's very nice when everything happens magically, but there are those cases, outside the 99th percentile, when things are not working as you expect and you have to go into debug mode. All the simple stuff we are mentioning is actually very complex behind the scenes, but it was built so well that it allows us to build simple stuff. What happens in that particular case? Do we drop the tool and run back to the old one, or what's your experience there?

Max Rydahl Andersen: I can say what kind of mentality or approach we take in Quarkus, because there is machinery behind it: there's build-time processing, there's bytecode generation. As an end user you don't have to be aware of it, but sometimes you hit a wall and you get exposed to some of this stuff. So one thing we try to do — take exceptions, which have been around in Java since forever: for how many versions did Java just throw a NullPointerException with no explanation of the context? It was only added in '21 or '20, I can't remember — at some point they added the "NullPointerException because variable X is null", right? A simple little piece of context makes a world of difference. Imagine if that had been around 20 years ago. And we do the same thing in Quarkus.

Wherever we find an exception where we would just say "error", we try to add that little bit of context to help you. Similarly, if you run in dev mode — because that's a thing we can do with Quarkus: we can run in dev mode, so we can handle errors differently than in production mode. When an error occurs in production, we will kind of hide it, because you don't want to leak details through a running system. But in dev mode we know it's not running live, so we'll actually give you the exception. And in recent versions we even extract the source code so you can see on screen where the issue is, and you can click on the source and go over to it. Again, all of this without any IDE installed. We just try to enable people to solve issues with context, and in some cases we even explain: this is the kind of code you need to write, or look for these annotations.

And then we try to make sure it's easy to find all the debug tools that are there — enabling logging, that kind of thing. And the speed at which we can restart means that you might have an error, but you can try 50 different things in 50 seconds, where before you would do three or four things and it would take a whole day. That speed in itself enables much more debuggability — is that a word?

Olimpiu Pop: It is now. We just coined it so we have it on tape.

Max Rydahl Andersen: Yes.

Having Reliable Runtime Environments with Comprehensible Exception Handling Will Make Your Software Run Anywhere, Even on Devices Roaming the Ocean Floor [23:45]

Olimpiu Pop: Just thinking about what you mentioned about exceptions and the way Java handles them, I'm reminded of what James Gosling said a couple of years back in Antwerp, at Devoxx Belgium. He said everybody complains about why we have checked exceptions in Java — why must we put a catch everywhere and wrap everything in that? And then he explained that when he initially started building it, it was designed with embedded software in mind, and that embedded software was running on the floor of the ocean. He said it wasn't that easy to go and just press the reset button on those devices under the sea. So that was the reasoning, and I felt that that's how we usually do it.

It's like those pictures: this is how we as software developers thought it would be used, but this is how the users actually use it — and even Java ended up being used that way. Now my feeling is that it's moving at a different rhythm. The community is more involved, and with this twice-a-year cadence it's much easier. But how hard is it to maintain such a cadence? Because on the other side, when you're building the simple stuff, as you said, it might be a pain in the back.

Max Rydahl Andersen: Yes. Man, it was a turning point for Java when they moved to the six-month release cadence, because they got into a mode of "Oh, things can now evolve and change", which had been super frustrating in the early days — it took forever to get stuff going in Java. In JBoss, in Hibernate and in WildFly, we had to do tons of workarounds because of those limits; the JVM was just not moving. For example, the whole async IO story in Java came fairly late, and when it came, it was subpar compared to Netty and other ways of doing things. So Java has always been good because of a really stable OpenJDK and the ability to hook things in and rewire things, so the innovation could happen in the higher layers. Now we've got multi-line strings, we've got records, we've got streams, virtual threads — I'm missing a few.

All those are definitely helping, and in Quarkus we've picked up all of them. But we have to realize that the people we are selling to, the people who actually pay our salaries, are companies that cannot afford to update to the latest and greatest everywhere, especially at the JVM level, because that's considered part of the OS. So the funny thing is that people really want to stay back on an old version of Java, but they're fine upgrading to a newer version of the framework, because that's what they're used to. And that leaves us in a funny situation: right now our baseline is Java 17, but we recommend Java 21, and in a few months Java 25 is coming out. In normal cases, that would be fine.

But in Java 25 they've done a few things — not just in 25, I mean between 21 and 25 — like removing the Security Manager, which actually means that parts of the Java ecosystem just cannot run on 25 anymore. It's going to be fixed, but it's going to take some time. So even if we wanted to move our baseline up, we would still have parts of the Java ecosystem that just cannot run on 25. So even if we want to move to 25, we can't, because this stuff still has to keep working. But what we've done in Quarkus is that we basically do releases every month — multiple times a month, actually. We do a minor every month and several micros, depending on bug fixes and that kind of thing.

And that has allowed us to have fast iteration, but we also have a stable version, the LTS release. So we kind of do what is done in Java, this tip-and-tail system, but we are not doing what the OpenJDK team has been telling us to do, or pushing for last year, which was: have a release on the latest version of Java — a version of Quarkus with a baseline of 25 — and then just maintain an old one for 21, for example. That just requires too many resources to make happen. So instead, we survive by using an older version of Java in the framework itself while still supporting the newer stuff, like 21. Virtual threads were supported even when we were on Java 11 — we had support for Java 19 virtual threads. Java 14 records were supported before we moved up from Java 11.

Whatever the new specs are, we will have support for them, but we won't be able to use them ourselves inside Quarkus. That's the price we pay so we can be on the edge while still supporting the older — I'm not going to say older, but slower-moving — systems.

Legacy Systems Need to Be Updated Faster Than Before, to Avoid Being Exploited Using AI [29:05]

Olimpiu Pop: Those are the places that usually make the world spin. I worked in finance for an extended period of time, and I was surprised that people were still happy with software they had bought 10 or 15 years ago, because it was doing the job it had always done — it was just incredible.

Occasionally, when there was some kind of regulation change, they came and asked for an update, but mainly because they needed that feature. Otherwise, the things just keep working. I call them the plumbing systems: you don't care about the plumbing underneath your house as long as it's working as expected, but you'll definitely care when the water starts pouring out.

Max Rydahl Andersen: Yes, that's just the reality of things, right? We have all these versions of Quarkus that we know are in production and we help those users, but Quarkus is still, all things considered, a new kid on the block.

Olimpiu Pop: Yes. The new kid on the block.

Max Rydahl Andersen: So we still have the benefit of being able to move up, but we are introducing LTS and longer LTS releases, and over time the support will get longer. We've also been very deliberate about not guaranteeing new features for, let's say, Quarkus 3.2: we'll maintain it and do security fixes to a certain level, but all new development moves up, because it is a new world. We are encouraging developers and companies to, no matter what, build your infrastructure and your CI so you can do this continuous testing — then updating should not be a problem for you. We believe that costs less to develop and is cheaper to maintain. But eventually some project will say, "Hey, we can't keep updating, we'll just stay on this long-term", and then they also won't need any new features.

That's fine, we'll just keep maintaining that. The thing I find exciting over the last 20 years is that software companies, even the older legacy ones, are getting into the mindset of being able to update more aggressively, which is something. With the whole AI craze, I'm still waiting for — everyone talks about all the cool stuff AI can do, but if it can do that, it can also exploit CVEs way faster than anything else. So being able to upgrade is going to be a key thing.

Olimpiu Pop: Yes, definitely. But getting back to the initial point of our discussion about simple versus complex.

Max Rydahl Andersen: Yes.

Java Devs Fought So Many Battles with Complexity That They Don't Trust Love, Peace and a Quiet Life When They See It [31:30]

Olimpiu Pop: Now, we discussed the frameworks, and obviously the new generations are a lot simpler than what we used to have, and it seems that the simplicity keeps moving forward. But then, let's look at other ways of doing stuff — you are the father of JBang as well.

Max Rydahl Andersen: Yes.

Olimpiu Pop: JBang — my feeling is that it's a love-hate relationship. I know people who love it, and then there are the others who are like, "Oh, Max did this thing and it's a pain in the neck because of security, blah blah blah". Is JBang a simpler thing, or does it add complexity?

Max Rydahl Andersen: Well, the way I think about it, there are so many Java devs who've fought so many battles with complexity that they just can't trust peace, love, and joy when they see it, right? When it's peaceful and quiet, you just can't believe it's true — it has to be complex. The way I see it, JBang is just enough Java. You could say I've removed stuff that was often never needed — or rather, I haven't added stuff unless someone found a need for it. So JBang started, again, from being gone for a year and coming back and having to learn Java again. I was like, this is horrible, how can we do this to people? This was when Java 8 was the new thing, and I hadn't done real Java since Java 6.

So I had to learn streams and other stuff. I knew what they were, but I hadn't used them. So I said, hey, let's have a side project for a while. And I saw someone on our build system using Kotlin scripts — kscript, I think it's called. They were basically just writing Kotlin as small single-file scripts, because they could maintain them — the Java people knew how to do that. And I looked at the code and thought, this is just Java, why don't we make that work in Java? And that's where it started. When I built it, it was like, "Wait, I can now experiment". For those who don't know, the original idea of JBang was a single Java file where you can declare dependencies at the top, like //DEPS log4j, //DEPS whatever. And I was like, "Oh wait, I can now experiment. I can try new libraries in seconds".

Rather than starting the IDE, creating a new Java project, finding the dependency — I could literally just boot it up from that one file. And then Tako, who's my partner in crime on later versions of JBang, started adding automatic downloading of the JDK, so you didn't even have to have Java installed before starting with JBang. This whole question of having to pick which version of Java to download — it doesn't matter; you just get Adoptium or Temurin, the latest version. And now you could start with Java with just Notepad on the machine, like good old Java 20 years ago. Then the next thing was that it can actually install the VS Code Java editing support for you. So suddenly JBang was not just a way to explore new Java libraries.

But I could give it to a student, and I could install and enable Java anywhere — whether that was my grandma's PC, my new laptop, or some cloud IDE out there — because it was just this thin layer, and it just grew from there. JBang is nothing but a wrapper around javac and java, and JShell for those who want to do interactive stuff. And I've kept that principle. So the cool thing is that almost anything the Java team has come up with, I either already had in JBang or can assimilate into JBang. JBang works from Java 8 up to the latest early-access version. It could honestly work back on Java 6 or even older, but it starts being hard to compile stuff in that range. Fundamentally, it is a very thin layer on top.

And it's funny, this love-hate thing: some people love it, some people hate it, because it doesn't come with the default six nested directories like src/main/java/io/acme/green/xyz. I can just have one file, and it can be a small script or a full-on microservice written in Quarkus. I like it, and I'm biased, but-
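
For readers who haven't seen it, here is a minimal sketch of that single-file style (the file name and dependency coordinates are made up): save it as hello.java and run jbang hello.java — no project layout, no build file.

    ///usr/bin/env jbang "$0" "$@" ; exit $?
    //DEPS org.slf4j:slf4j-simple:2.0.13

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // The //DEPS line above pulls the library straight from Maven Central;
    // JBang compiles and runs this file on the fly.
    public class hello {

        private static final Logger LOG = LoggerFactory.getLogger(hello.class);

        public static void main(String... args) {
            LOG.info("Hello from a single-file Java script");
        }
    }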

All the Hacks and Tricks of Other Software Ecosystems like Node and Python Were Used as Inspiration to Make JBang the Easiest System to Bootstrap in the World [36:05]

Olimpiu Pop: That’s the magic of the world. We have to live with each other even if we have different opinions, right? So JBang would be a proper tool to just bootstrap yourself if you want to do Java development, as long as you have an internet connection more or less, and then just run it and then everything will just work out of the box.

Max Rydahl Andersen: I have this claim — in Quarkus, we claim that if you find that Quarkus is slower than Spring, that's a bug, because fundamentally, the way it's architected, it cannot be. And with JBang, I'd say that if you find any development environment in the world that is easier to install than this one, I consider that a bug. I'm a heavy user of Python, I'm a heavy user of Node.js — well, mostly for front-end development — and Go. And all those tricks and tools they use to do things, I've applied to JBang. So it becomes very easy to set up and it can self-bootstrap. But this is the fight I'm fighting: people say, JBang is not part of Java, so I can't use it. And I'm like, Maven is not part of Java, Gradle is not part of Java — you use those.

That's because people are like, that's what I'm used to. Look at Python: pip install is not part of — well, it is now, but pip started as something outside Python. The npm package manager was not part of the original Node.js setup. It's funny — people just go, no, this doesn't fit into the complexity of Java, and therefore it's kind of an outcast.

Olimpiu Pop: Yes, well that’s usually the way … there are the people that are pioneers. And then, if you look at the hype cycle as well, then you have the really early adopters, but most of the people are in the middle. They’re either late adopters or they are just waiting for the scale to be in the other direction.

Max Rydahl Andersen: Yes.

Olimpiu Pop: How did you build JBang so that things are safe but also simple? Because simple, they obviously are.

Max Rydahl Andersen: Well, that's the thing. We can still do more, but JBang itself uses the Maven resolver, so anything the Maven resolver does for security is there. JBang doesn't do anything that is outside normal Maven, so to speak. One thing I haven't enabled yet, but do want to get to, is verifying artifact signatures — but Maven today, and even Gradle, doesn't do that by default because it slows things down. Stuff that JBang does that is unrelated to Maven is, for example, that I can point JBang at a file on disk. Again, that's no different from what you can do with Java: you can run java on a single Java file and it will run what's on disk. So that's no less safe than Java.

But then I can use a URL — jbang <URL>. And people go, "Oh, so I can run random stuff from the internet" — but if you go look, JShell can actually also run URLs. People don't know that: you can point jshell at an https URL and there's no check whatsoever. In JBang, we have this notion of trusted URLs, so the first time you fetch something it will ask you: is this a source you trust? You say yes, and then it's fine.

Olimpiu Pop: So that means that you have to whitelist it more or less, right?

Max Rydahl Andersen: Yes. It's kind of like when you use VS Code and it opens a URL — it asks, do you trust this link? You say yes, and from that point on you can access it just fine. You can also trust a whole domain, that kind of thing. So I have that kind of notion. In that sense, I feel it's safer than even npm and pip and other stuff out there. And I'm sure there's stuff we might have missed, but I encourage people, if they see a security concern, to open an issue. As far as I know, we are not doing anything that isn't done elsewhere, and in the places where we could add protections, we have. This is the funny thing — people realize, what's it called, the supply chain attack.

You can do stuff like, "Hey, you can patch a Maven plugin" — if you somehow get that released to a repository you're building from, then boom, you have an attack vector. And people look at JBang and realize, "Oh, I can do any kind of stuff here". But that's only because it's so easy to see; it's just as easy in Maven or Gradle to do such things. It's like with a lot of other projects, I guess: reproducers get attached to an issue, and even I just look quickly and go, "Okay, I'll just run it" — but anyone could have put some tricky code in there. So I'm getting more and more cautious about that. The world of software is a dangerous place to live, let's just say.

Always Learn from Other Ecosystems and Improve the Way You Work. Use Inspiration from Everywhere [41:00]

Olimpiu Pop: Fair enough. One last thing: where do you think we as an industry are doing overly complex things when there are simple alternatives?

Max Rydahl Andersen: Well, again, for me — we touched on it a bit at the beginning, but it's something that's on my mind again and again — it's this thing that just because there is hype, people are attracted to it like moths to a flame, and they don't realize that, hey, there's actually already stuff out there that can do this. Combine the two instead and you'll get a better result. But at the same time, I'm also very much aware of applying it to myself: "Hey, I'm so used to how things work in Java — remember to go look at how it actually is over in Python land and see how it is". Because there is actually also simplicity over there; there's a reason why it's so big. So just as I say, hey, don't diss Java and the Java ecosystem — it can actually be simplified and be good.

There's still stuff to learn from that new world. There's complexity everywhere — and there's also some of it hidden in these new places coming up.

Olimpiu Pop: Just listening to you, one metaphor kept coming back to me — or maybe it's not a metaphor, more a comparison. Whenever I was working with JavaScript, my colleagues on the JavaScript side would come and say, "Check out this new thing in, I don't know, Nest.js or whatever other new stuff". And I'd say, "Yes, well, that's dependency injection, it's been in Java for ages" — or mappers, or ORMs like Hibernate. That has already existed for a long time. But my feeling is that this old dog, Java, which has been around forever, can also learn new stuff from the newer generations. And it's very nice to see this symbiosis in the ecosystem.

Max Rydahl Andersen: A hundred percent. The day we stop learning from others and improving, that's the day Java dies — and that's why I think Java is very much alive, because we keep learning and keep adapting. It's a different world, and we can definitely learn and improve, and we are. At least, I'm trying to do my best to help in that space.

Olimpiu Pop: Great. Thank you, Max, for the time. Thank you for sharing all these thoughts. Any last things that you would like to share?

Max Rydahl Andersen: Well, if you find some of this stuff interesting, definitely check out Quarkus and explore. Try it out at least. And yes, for JBang, anyone who’s doing Java, whether it’s with Quarkus or Spring or Micronaut or Python or Kotlin or Groovy and-

Olimpiu Pop: Whatever other things are there.

Max Rydahl Andersen: JBang actually has a place for you there, because … Give it a try. Tell me what you think, even if you don't like it. I learn from it every time.

Olimpiu Pop: Great. Thank you for taking the time, Max. Thank you, everybody. And just make sure you listen to the InfoQ podcast — we have great things to share.




Spring News Roundup: Milestone Releases of Boot, Security, Auth Server, GraphQL, Integration, AMQP

MMS Founder
MMS Michael Redlich

Article originally posted on InfoQ. Visit InfoQ

There was a flurry of activity in the Spring ecosystem during the week of March 17th, 2025, highlighting milestone releases of: Spring Boot, Spring Security, Spring Authorization Server, Spring for GraphQL, Spring Integration, Spring AMQP, Spring for Apache Kafka and Spring Web Services.

Spring Boot

The third milestone release of Spring Boot 3.5.0 delivers bug fixes, improvements in documentation, dependency upgrades and many new features such as: a new LLdapDockerComposeConnectionDetailsFactory class that adds ServiceConnection support for Light LDAP Implementation for Authentication; improved support for OpenTelemetry by correctly using the service.namespace service attribute; and improved support for Spring Batch with enhancements and new properties. More details on this release may be found in the release notes.

Similarly, the release of Spring Boot 3.4.4 and 3.3.10 (announced here and here, respectively) provides bug fixes, improvements in documentation, dependency upgrades and an important change where support for the Apache Portable Runtime (APR) for Tomcat is now disabled by default with applications running on JDK 24 and above to prevent the JDK from issuing warnings. Further details on these releases may be found in the release notes for version 3.4.4 and version 3.3.10.

Spring Framework

The release of Spring Framework 6.2.5 provides bug fixes, improvements in documentation, one dependency upgrade and new features such as: the comment() method, defined in the ServerResponse.SseBuilder interface, now allows an empty comment; and an instance of the FormHttpMessageConverter class should throw HttpMessageNotReadableException when the HTTP form data is invalid because it is a more specific exception that will allow developers to better target and react to invalid request payloads. More details on this release may be found in the release notes.
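
As a rough illustration of the SseBuilder change (a minimal sketch using WebMvc.fn functional endpoints; the class name and route are hypothetical), an empty comment can now be emitted, for example as a lightweight keep-alive before sending data:

    import org.springframework.web.servlet.function.RouterFunction;
    import org.springframework.web.servlet.function.RouterFunctions;
    import org.springframework.web.servlet.function.ServerResponse;

    public class SseCommentExample {

        // Hypothetical route that emits an empty SSE comment (a ":" line on the wire)
        // before a regular data event.
        public static RouterFunction<ServerResponse> routes() {
            return RouterFunctions.route()
                .GET("/ticks", request -> ServerResponse.sse(sse -> {
                    try {
                        sse.comment("");   // empty comment, allowed as of 6.2.5
                        sse.send("tick");  // regular data event
                        sse.complete();
                    } catch (Exception e) {
                        sse.error(e);
                    }
                }))
                .build();
        }
    }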

Spring Cloud

The release of Spring Cloud 2024.0.1, codenamed Mooregate, features bug fixes and notable updates to sub-projects: Spring Cloud Kubernetes 3.2.1; Spring Cloud Function 4.2.2; Spring Cloud OpenFeign 4.2.1; Spring Cloud Stream 4.2.1; and Spring Cloud Gateway 4.2.1. This release is based upon Spring Boot 3.4.3. Further details on this release may be found in the release notes.

Spring Security

The third milestone release of Spring Security 6.5.0 delivers bug fixes, dependency upgrades and new features such as: support for RFC 9068, JSON Web Token (JWT) Profile for OAuth 2.0 Access Tokens; deprecation of the ConfigAttribute interface as modern Spring Security APIs no longer share a common interface to represent configuration values; and support for automatic context-propagation with Micrometer. More details on this release may be found in the release notes.

Spring Authorization Server

The second milestone release of Spring Authorization Server 1.5.0 ships with bug fixes, dependency upgrades and new features such as: improvements to the JdbcOAuth2AuthorizationService class that define and use constants for the SQL parameter mapping values; and support for RFC 9126, OAuth 2.0 Pushed Authorization Requests. Further details on this release may be found in the release notes.

Spring for GraphQL

The first milestone release of Spring for GraphQL 1.4.0 provides dependency upgrades and new features such as: an alignment with the GraphQL over HTTP draft specification; and improved Federation support by upgrading to Apollo GraphQL Federation 5.3.0. More details on this release may be found in the release notes.

Spring Integration

The third milestone release of Spring Integration 6.5.0 delivers bug fixes, improvements in documentation, dependency upgrades and new features such as: enabling the LastModifiedFileListFilters class to discard files that have aged out; and removal of the deprecated getSendTimeout() and setSendTimeout() methods, previously defined in the PollerMetadata class. Further details on this release may be found in the release notes.

Spring Modulith

The third milestone release of Spring Modulith 1.4.0 delivers bug fixes, dependency upgrades and new features such as: integration tests using the @ApplicationModuleTest annotation may now consume bean instances of classes declared in test sources; and registration of the AssertablePublishedEvents interface in tests using the Spring Framework ApplicationContext interface if AssertJ is on the classpath. More details on this release may be found in the release notes.

Similarly, the release of Spring Modulith 1.3.4 and 1.2.10 provide dependency upgrades and a resolution to a severe performance regression in JavaPackage class when testing an instance of the Documenter class. Further details on these releases may be found in the release notes for version 1.3.4 and version 1.2.10.

Spring Batch

The release of Spring Batch 5.2.2 provides bug fixes, improvements in documentation, dependency upgrades and improvements such as: the addition of AOT hints in infrastructure artifacts and core listeners that were previously missing; and an improved ChunkProcessor interface as it is now annotated with the Java @FunctionalInterface. More details on this release may be found in the release notes.

Spring AMQP

The second milestone release of Spring AMQP 4.0.0 delivers bug fixes, dependency upgrades and new features such as: support for the AMQP 1.0 protocol on RabbitMQ with a new spring-rabbitmq-client module; and support for RPC in the new RabbitAmqpTemplate class. Further details on this release may be found in the release notes.

Spring for Apache Kafka

The first milestone release of Spring for Apache Kafka 4.0.0 delivers bug fixes, improvements in documentation, dependency upgrades and new features such as: a migration of all the former org.springframework.lang nullability annotations to the JSpecify-based null safety improvements; and improved performance of the acknowledge(int index) method and an override of the createRecordList() methods, defined in the KafkaMessageListenerContainer class. This version is compatible with Spring Framework 7.0.0-M3. More details on this release may be found in the release notes.

Spring for Apache Pulsar

The release of Spring for Apache Pulsar 1.2.4 and 1.1.10 provides notable respective dependency upgrades to: Spring Framework 6.2.4 and 6.1.18; Project Reactor 2024.0.4 and 2023.0.16; and Micrometer 1.14.5 and 1.13.12. These releases are included in Spring Boot 3.4.4 and 3.3.10, respectively. Further details on these releases may be found in the release notes for version 1.2.4 and version 1.1.10.

Spring Web Services

The first milestone release of Spring Web Services 4.1.0 delivers bug fixes, dependency upgrades and new features such as: a reinstatement of support for Apache Axiom as the recent release of Axiom 2.0.0 now supports Jakarta EE; and the deprecation of the WsConfigurerAdapter class as it is no longer needed due to the introduction of default methods. More details on this release may be found in the release notes.



MongoDB, Inc. (NASDAQ:MDB) Shares Acquired by Virtu Financial LLC – MarketBeat

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Virtu Financial LLC increased its stake in shares of MongoDB, Inc. (NASDAQ:MDB – Free Report) by 101.7% in the 4th quarter, according to the company in its most recent Form 13F filing with the SEC. The firm owned 20,202 shares of the company's stock after purchasing an additional 10,186 shares during the quarter. Virtu Financial LLC's holdings in MongoDB were worth $4,703,000 as of its most recent filing with the SEC.

Other institutional investors have also made changes to their positions in the company. B.O.S.S. Retirement Advisors LLC acquired a new stake in shares of MongoDB during the fourth quarter worth approximately $606,000. Geode Capital Management LLC lifted its stake in shares of MongoDB by 2.9% in the third quarter. Geode Capital Management LLC now owns 1,230,036 shares of the company’s stock worth $331,776,000 after buying an additional 34,814 shares during the last quarter. Charles Schwab Investment Management Inc. lifted its stake in shares of MongoDB by 2.8% in the third quarter. Charles Schwab Investment Management Inc. now owns 278,419 shares of the company’s stock worth $75,271,000 after buying an additional 7,575 shares during the last quarter. Union Bancaire Privee UBP SA acquired a new stake in shares of MongoDB in the fourth quarter worth $3,515,000. Finally, Nisa Investment Advisors LLC lifted its stake in shares of MongoDB by 428.0% in the fourth quarter. Nisa Investment Advisors LLC now owns 5,755 shares of the company’s stock worth $1,340,000 after buying an additional 4,665 shares during the last quarter. 89.29% of the stock is currently owned by institutional investors and hedge funds.

Insider Buying and Selling at MongoDB

In related news, CEO Dev Ittycheria sold 8,335 shares of the stock in a transaction that occurred on Wednesday, February 26th. The stock was sold at an average price of $267.48, for a total transaction of $2,229,445.80. Following the completion of the sale, the chief executive officer now directly owns 217,294 shares of the company’s stock, valued at approximately $58,121,799.12. This represents a 3.69 % decrease in their position. The transaction was disclosed in a document filed with the Securities & Exchange Commission, which can be accessed through this link. Also, Director Dwight A. Merriman sold 885 shares of the stock in a transaction that occurred on Tuesday, February 18th. The stock was sold at an average price of $292.05, for a total value of $258,464.25. Following the sale, the director now directly owns 83,845 shares of the company’s stock, valued at approximately $24,486,932.25. The trade was a 1.04 % decrease in their ownership of the stock. The disclosure for this sale can be found here. In the last three months, insiders have sold 43,139 shares of company stock worth $11,328,869. Company insiders own 3.60% of the company’s stock.

MongoDB Stock Performance

MongoDB stock traded up $3.24 during trading hours on Friday, reaching $192.54. 2,184,849 shares of the company’s stock traded hands, compared to its average volume of 1,701,319. MongoDB, Inc. has a fifty-two week low of $173.13 and a fifty-two week high of $387.19. The firm has a market capitalization of $14.34 billion, a price-to-earnings ratio of -70.27 and a beta of 1.30. The firm’s 50 day moving average is $250.97 and its 200-day moving average is $269.40.

MongoDB (NASDAQ:MDB – Get Free Report) last released its earnings results on Wednesday, March 5th. The company reported $0.19 earnings per share (EPS) for the quarter, missing analysts' consensus estimates of $0.64 by ($0.45). MongoDB had a negative net margin of 10.46% and a negative return on equity of 12.22%. The business had revenue of $548.40 million for the quarter, compared to analyst estimates of $519.65 million. During the same quarter in the previous year, the company posted $0.86 earnings per share. Analysts predict that MongoDB, Inc. will post -1.78 EPS for the current year.

Wall Street Analyst Weigh In

A number of analysts have recently weighed in on MDB shares. DA Davidson boosted their price objective on shares of MongoDB from $340.00 to $405.00 and gave the stock a “buy” rating in a research note on Tuesday, December 10th. The Goldman Sachs Group reduced their price objective on shares of MongoDB from $390.00 to $335.00 and set a “buy” rating for the company in a research note on Thursday, March 6th. Tigress Financial boosted their price objective on shares of MongoDB from $400.00 to $430.00 and gave the stock a “buy” rating in a research note on Wednesday, December 18th. Truist Financial dropped their target price on shares of MongoDB from $400.00 to $300.00 and set a “buy” rating on the stock in a research report on Thursday, March 6th. Finally, Robert W. Baird dropped their target price on shares of MongoDB from $390.00 to $300.00 and set an “outperform” rating on the stock in a research report on Thursday, March 6th. Seven research analysts have rated the stock with a hold rating and twenty-three have assigned a buy rating to the stock. According to data from MarketBeat, MongoDB presently has an average rating of “Moderate Buy” and a consensus price target of $320.70.


About MongoDB


MongoDB, Inc., together with its subsidiaries, provides a general purpose database platform worldwide. The company provides MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.


Institutional Ownership by Quarter for MongoDB (NASDAQ:MDB)


Article originally posted on mongodb google news. Visit mongodb google news



1,797 Shares in MongoDB, Inc. (NASDAQ:MDB) Purchased by Vinva Investment Management Ltd

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Vinva Investment Management Ltd purchased a new position in MongoDB, Inc. (NASDAQ:MDB – Free Report) during the 4th quarter, according to the company in its most recent 13F filing with the Securities & Exchange Commission. The fund purchased 1,797 shares of the company's stock, valued at approximately $420,000.

Other institutional investors also recently modified their holdings of the company. B.O.S.S. Retirement Advisors LLC acquired a new position in shares of MongoDB in the 4th quarter worth $606,000. Geode Capital Management LLC increased its stake in MongoDB by 2.9% in the 3rd quarter. Geode Capital Management LLC now owns 1,230,036 shares of the company’s stock worth $331,776,000 after purchasing an additional 34,814 shares in the last quarter. Charles Schwab Investment Management Inc. raised its holdings in MongoDB by 2.8% during the 3rd quarter. Charles Schwab Investment Management Inc. now owns 278,419 shares of the company’s stock worth $75,271,000 after buying an additional 7,575 shares during the period. Union Bancaire Privee UBP SA purchased a new stake in MongoDB during the fourth quarter valued at about $3,515,000. Finally, Nisa Investment Advisors LLC boosted its holdings in shares of MongoDB by 428.0% in the fourth quarter. Nisa Investment Advisors LLC now owns 5,755 shares of the company’s stock worth $1,340,000 after buying an additional 4,665 shares during the period. Institutional investors own 89.29% of the company’s stock.

Insiders Place Their Bets

In related news, CFO Michael Lawrence Gordon sold 1,245 shares of the stock in a transaction that occurred on Thursday, January 2nd. The shares were sold at an average price of $234.09, for a total value of $291,442.05. Following the sale, the chief financial officer now directly owns 79,062 shares in the company, valued at $18,507,623.58. This trade represents a 1.55 % decrease in their position. The sale was disclosed in a legal filing with the SEC, which is available at this link. Also, insider Cedric Pech sold 287 shares of the business’s stock in a transaction that occurred on Thursday, January 2nd. The shares were sold at an average price of $234.09, for a total transaction of $67,183.83. Following the transaction, the insider now owns 24,390 shares of the company’s stock, valued at approximately $5,709,455.10. This represents a 1.16 % decrease in their position. The disclosure for this sale can be found here. In the last quarter, insiders have sold 43,139 shares of company stock valued at $11,328,869. 3.60% of the stock is currently owned by corporate insiders.

Wall Street Analysts Forecast Growth

Several brokerages recently commented on MDB. Truist Financial reduced their price target on shares of MongoDB from $400.00 to $300.00 and set a “buy” rating for the company in a research report on Thursday, March 6th. Scotiabank reaffirmed a “sector perform” rating and issued a $240.00 target price (down previously from $275.00) on shares of MongoDB in a research report on Wednesday, March 5th. Wells Fargo & Company lowered MongoDB from an “overweight” rating to an “equal weight” rating and dropped their price target for the company from $365.00 to $225.00 in a research report on Thursday, March 6th. Morgan Stanley decreased their price objective on MongoDB from $350.00 to $315.00 and set an “overweight” rating for the company in a research report on Thursday, March 6th. Finally, KeyCorp downgraded MongoDB from a “strong-buy” rating to a “hold” rating in a report on Wednesday, March 5th. Seven equities research analysts have rated the stock with a hold rating and twenty-three have assigned a buy rating to the company. According to MarketBeat, the company has a consensus rating of “Moderate Buy” and a consensus target price of $320.70.


MongoDB Price Performance

Shares of NASDAQ:MDB traded up $3.24 during midday trading on Friday, reaching $192.54. The company’s stock had a trading volume of 2,184,849 shares, compared to its average volume of 1,701,319. The business’s fifty day simple moving average is $250.97 and its 200-day simple moving average is $269.55. MongoDB, Inc. has a fifty-two week low of $173.13 and a fifty-two week high of $387.19. The company has a market capitalization of $14.34 billion, a P/E ratio of -70.27 and a beta of 1.30.

MongoDB (NASDAQ:MDB – Get Free Report) last announced its earnings results on Wednesday, March 5th. The company reported $0.19 earnings per share for the quarter, missing analysts' consensus estimates of $0.64 by ($0.45). MongoDB had a negative return on equity of 12.22% and a negative net margin of 10.46%. The company had revenue of $548.40 million during the quarter, compared to the consensus estimate of $519.65 million. During the same period in the prior year, the firm posted $0.86 earnings per share. On average, analysts forecast that MongoDB, Inc. will post -1.78 earnings per share for the current fiscal year.

MongoDB Company Profile


MongoDB, Inc., together with its subsidiaries, provides a general purpose database platform worldwide. The company provides MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.


Institutional Ownership by Quarter for MongoDB (NASDAQ:MDB)


Article originally posted on mongodb google news. Visit mongodb google news



GPT-4o Code Completion Model Now Available in Public Preview for VS Code Copilot

MMS Founder
MMS Aditya Kulkarni

Article originally posted on InfoQ. Visit InfoQ

Recently, GPT-4o Copilot was introduced for Visual Studio Code (VS Code) users. This AI model is built upon the GPT-4o mini foundation and includes extensive training from over 275,000 high-quality public repositories across more than 30 widely used programming languages. The enhanced training is expected to provide more accurate and contextually relevant code suggestions with improved performance, boosting developer productivity and aiding the coding process.

This announcement was made through a changelog post on the GitHub blog. The GPT-4o Copilot model differentiates itself through some key enhancements. By utilizing a vast dataset of high-quality code, it offers more precise and contextually relevant code completions. Its architecture and training also enable faster and more efficient code suggestion generation. With training across over 30 programming languages, the model can support a wide variety of development projects.

To integrate GPT-4o Copilot into VS Code, users can access the Copilot menu in the VS Code title bar, select Configure Code Completions, and then choose Change Completions Model. Alternatively, users can open the Command Palette and select GitHub Copilot: Change Completions Model. Once in the model selection menu, users can choose the GPT-4o Copilot model from the available options.

For Copilot Business and Enterprise users, administrators must first enable Editor preview features within the Copilot policy settings on github.com to grant users access to the new model. Meanwhile, for Copilot Free users, using this model will count toward their 2,000 free monthly completions. The model will soon be available to Copilot users in all JetBrains Integrated Development Environments (IDEs), further expanding its reach across different platforms.

In JetBrains IDEs, users can click the icon in the status bar, navigate to the settings dialog box for Languages & Frameworks > GitHub Copilot, and select the preferred model from the dropdown menu.

Covering this announcement as a part of the Tech Insights 2025 Week 9 newsletter, Johan Sanneblad, CEO & co-founder at ToeknTek, said,

GitHub Copilot is quickly catching up to Cursor IDE which is already packed with custom models. From what I can see there are just two main features missing in Copilot: Prompt caching for performance and a local model for code merging. Once it gets those two features I think I would feel equally at home in Visual Studio Code + Copilot as with Cursor. And for all you Java and C# users, this is the update you have been waiting for. We finally have a good code completion model with good support for C++, C# and Java.

GitHub Copilot was also in the news as it introduced Next Edit Suggestions, which can predict and suggest logical edits based on the context of ongoing changes in the code. It can identify potential modifications across an entire file, offering suggestions for insertions, deletions, and replacements. Developers can navigate through these suggestions using the Tab key, streamlining the editing process and potentially saving significant time.

It’s important to note that switching the AI model does not affect the model used by Copilot Chat. The data collection and usage policy remains unchanged regardless of the chosen model, and the setting to enable or disable suggestions that match public code applies regardless of the chosen model.

User feedback is essential for refining and enhancing the GPT-4o Copilot model. Developers are encouraged to share their experiences to help improve the model’s performance and usability for all Copilot users.



How GitHub Leverages CodeQL for Security

MMS Founder
MMS Craig Risi

Article originally posted on InfoQ. Visit InfoQ

GitHub’s Product Security Engineering team secures the code behind GitHub by developing tools like CodeQL to detect and fix vulnerabilities at scale. They’ve shared insights into their approach so other organizations can learn how to use CodeQL to better protect their own codebases.

CodeQL enables automated security analyses by allowing users to query code in a way similar to querying a database. This method is more effective than simple text-based searches because it can follow how data moves through the code, spot insecure patterns, and detect vulnerabilities that wouldn't be obvious from the text alone. This provides a deeper understanding of code patterns and uncovers potential security issues.
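
To illustrate the difference, consider the hypothetical Java fragment below (not taken from GitHub's codebase). A text search for executeQuery only sees the last line, whereas dataflow analysis can connect the user-controlled parameter to the query it ends up in — the kind of pattern such analysis flags.

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class OrderLookup {

        // The user-controlled value travels through a local variable before reaching
        // the sink, so only taint tracking (not a plain text search) can tell that
        // this query is built from untrusted input.
        ResultSet findOrders(Connection connection, String customerIdFromRequest) throws Exception {
            String query = "SELECT * FROM orders WHERE customer_id = '" + customerIdFromRequest + "'";
            Statement statement = connection.createStatement();
            return statement.executeQuery(query); // potential SQL injection sink
        }
    }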

The team employs CodeQL in various ways to ensure the security of GitHub’s repositories. The standard configuration uses default and security-extended query suites, which are sufficient for the majority of the company’s repositories. This setup allows CodeQL to automatically review pull requests for security concerns.

For certain repositories, such as GitHub’s large Ruby monolith, additional measures are required. In these cases, the team uses a custom query pack tailored to specific security needs. Additionally, multi-repository variant analysis (MRVA) is used to conduct security audits and identify code patterns that warrant further investigation. Custom queries are written to detect potential vulnerabilities unique to GitHub’s codebase.

Initially, custom CodeQL queries were published directly within the repository. However, this approach presented several challenges, including the need to go through the production deployment process for each update, slower analysis times in CI, and issues caused by CodeQL CLI updates. To address these challenges, the team transitioned to publishing query packs in the GitHub Container Registry (GCR). This change streamlined the process, improved maintainability, and reduced friction when updating queries.

When developing a custom query pack, consideration is given to dependencies such as the ruby-all package. By extending classes from the default query suite, the team avoids unnecessary duplication while maintaining concise and effective queries. However, updates to the CodeQL library API can introduce breaking changes, potentially affecting query performance. To mitigate this risk, the team develops queries against the latest version of ruby-all but locks a specific version before release. This ensures that deployed queries run reliably without unexpected issues arising from unintended updates.

To maintain query stability, unit tests are written for each new query. These tests are integrated into the CI pipeline for the query pack repository, enabling early detection of potential issues before deployment. The release process involves several steps, including opening a pull request, writing unit tests, merging changes, incrementing the pack version, resolving dependencies, and publishing the updated query pack to GCR. This structured approach balances development flexibility with the need for stability.

The method of integrating the query pack into repositories depends on the organization’s deployment strategy. Rather than locking a specific version of the query pack in the CodeQL configuration file, GitHub’s security team opted to manage versioning through GCR. This approach allows repositories to automatically use the latest published version while providing the ability to quickly roll back changes if necessary.

One challenge encountered when publishing query packs in GCR was ensuring accessibility across multiple repositories within the organization. Several solutions were considered, including manually granting access permissions, using personal access tokens, and linking repositories to the package for inherited access permissions. The team ultimately implemented the linked repository approach, which efficiently managed permissions across multiple repositories without manual intervention.

GitHub’s security team writes a variety of custom queries to enhance security analysis. These queries focus on identifying high-risk APIs, enforcing secure coding practices, and detecting missing authorization controls in API endpoints. Some queries serve as educational tools rather than strict enforcement mechanisms, using lower severity levels to alert engineers without blocking deployments. This approach allows developers to assess security concerns while ensuring that the most critical vulnerabilities are addressed promptly.




Presentation: Rebuilding Prime Video UI with Rust and WebAssembly

MMS Founder
MMS Alexandru Ene

Article originally posted on InfoQ. Visit InfoQ

Transcript

Ene: We're going to talk about how we rebuilt the Prime Video UI for living room devices with Rust and WebAssembly, and the journey that got us there. I'm Alex. I've been a principal engineer with Amazon for about eight years. We've been working with Rust for a while in our tech stack for the clients; we've had our low-level UI engine in WebAssembly and Rust for that long. Previously I worked on video games, game engines, interactive applications like that. I have quite a bit of experience in interactive applications.

Content

I'll talk about challenges in this space, because living room devices are things like set top boxes, gaming consoles, streaming sticks, TVs. People don't usually develop UIs for these devices, and they come with their own special set of challenges, so we're going to go through those. Then I'll show you how our architecture for the Prime Video App looked before we rewrote everything in Rust. We had a dual tech stack with the business code in React and JavaScript, and then low-level bits of the engine in Rust and WebAssembly, with a bit of C++ in there as well. Then I'll show you some code with our new Rust UI SDK and how that looks, which is what we use right now in production. We're going to talk a little bit about how that code works with our existing low-level engine and how everything is organized. At the end, we're going to go a little bit into results and lessons learned.

Challenges In This Space

Living room devices, as I said, are gaming consoles, streaming sticks, set top boxes. They come with their own challenges, and some of them are obvious. There are huge performance differences. We're talking about a PlayStation 5 Pro, a super nice gaming console with lots of power, but also a USB-powered streaming stick. At Prime Video, we run the same application on all of these device types. Obviously, performance is really important for us. We can't quite have teams per device type, so one team that does set top boxes and another team that does gaming consoles, because then everything explodes. When you build a feature, you have to build it for everything. We were building things once and then deploying on all of these device categories. We don't deploy this application that I'm talking about on mobile devices; iOS and other mobile devices don't have this. This is just living room devices. Again, a huge range of performance. We're trying to write our code as optimally as possible.

Usually, high-performance code is code that you compile natively, let's say Rust compiled to native or C++ compiled to native, but that doesn't quite cut it in this space, and we'll see why. Another pain point is that differences in hardware capabilities between these devices are a pain. As SDK developers, we need to think a lot about what are reasonable fallbacks, so that application developers who write the app code and the app behavior don't need to think about every little hardware difference when they write that code. We try to have some reasonable defaults. That's not always possible, so we use patterns like feature flags to give them a bit more control. It's a fairly challenging thing.

Another thing is that we're trying to bring this application, as fast as possible and with as many features as possible, to every customer, but updating native code on these device types is really hard. Part of that is that most of these devices don't even have app stores. Maybe it goes up with a firmware update. That's a pain. It requires a manual process interacting with a third party that owns the platform. Even on platforms that do have an app store, if you try to update an app, it's quite a challenge as well. You need to wait. It's highly likely a manual process. We have this tension between code that we download over the air, like JavaScript, WebAssembly, and so on, which is fairly easy to update, and code that ships on the device, which is very fast but really hard to update. We want to have this fast iteration cycle. Updating the app in a short amount of time is huge for us.

Again, this is how the application looks today. I've been there eight years and we've changed it many times. I'm sure it's going to change again sometime, as happens with UIs. We've added things to it like channels, live events, all sorts of new features that weren't there in the starting version. Part of us being able to do that was this focus on updatability that we had from the very beginning. Most of this application was in a language like JavaScript, where we can change basically everything and add all of these features almost without needing to go and touch the low-level native code. I'll show you the architecture and how it looks.

Today, if a developer adds some code, changes a feature, fixes a bug, or does anything around the UI, that code goes through a full CI/CD pipeline, no manual testing whatsoever. We test on virtual devices like Linux and on physical devices, where we have a device farm. Once all of those tests pass, you get this new experience on your TV in your living room. That is way faster than a native app update for that platform.

Right now, you'll see it working and you'll see a couple of features. This is a bunch of test profiles I was making because I was testing stuff. We have stuff like layout animation, so the whole layout gets recalculated. This is the Rust app in production today. Layout animations are a thing that was previously impossible with JavaScript and React, and now they just work. When you see this thing getting bigger, all the things get reordered on the page. These are things that are only possible right now due to the performance of Rust. Almost instant page transitions are also things that weren't possible with TypeScript and React due to performance constraints. This is live today and this is how it looks, so you have an idea of what is going on in there. For people who are not UI engineers, or don't quite know, this is the slide that will teach you everything you need to know about UI programming.

Basically, every UI ever is a tree of nodes, and the job of a UI SDK is to manipulate this tree as fast as possible as a reaction to user inputs or to events that happen. You either change properties on nodes, maybe animate a value like a position, and then the UI engine needs to take care of updating this tree, creating new nodes and deleting old ones, depending on what the business logic code tells it to do. Those nodes could be view nodes that are maybe just a rectangle. Then text nodes are quite common, and image nodes, those types of things. Nothing too complicated. It's really annoying that it's a tree, but we're going to move on because we still have a tree; even in our Rust app we didn't innovate there. It's just important to have this mental model. We call this a scene tree in our UI engine; browsers call it a DOM, but it's basically the same thing everywhere.
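
To make that mental model concrete, here is a minimal, hedged Rust sketch of such a scene tree; the type and field names are assumptions for illustration, not the actual Prime Video engine types.

```rust
// A minimal sketch of a scene tree like the one described above.
// Type and field names are illustrative, not the real engine's.
struct NodeId(u64);

enum NodeKind {
    View { width: f32, height: f32 }, // e.g. just a rectangle
    Text { content: String },
    Image { url: String },
}

struct SceneNode {
    id: NodeId,
    kind: NodeKind,
    children: Vec<SceneNode>,
}

impl SceneNode {
    // Walk the tree and apply an update: the core operation a UI engine
    // performs in response to user input or other events.
    fn visit(&mut self, f: &mut dyn FnMut(&mut SceneNode)) {
        f(self);
        for child in &mut self.children {
            child.visit(f);
        }
    }
}

fn main() {
    let mut root = SceneNode {
        id: NodeId(0),
        kind: NodeKind::View { width: 1920.0, height: 1080.0 },
        children: vec![SceneNode {
            id: NodeId(1),
            kind: NodeKind::Text { content: "Hello".into() },
            children: Vec::new(),
        }],
    };
    // Count nodes as a stand-in for "react to an event by touching the tree".
    let mut count = 0;
    root.visit(&mut |_| count += 1);
    println!("scene tree has {count} nodes"); // prints 2
}
```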

High-Level Architecture

This is the high-level architecture before we rewrote everything in Rust. As you can see, we had already added Rust, I think two or three years ago; we already had it there for the low-level UI engine. There's a QCon talk about this journey. There's another part that says JavaScript here, but developers actually write TypeScript; that has the business logic code for the Prime Video App. This is the stuff we download. This is downloaded every time the application changes. This is what we output at the end of that full CI/CD pipeline. It's a bundle that has some WebAssembly-compiled Rust code and some JavaScript that came from TypeScript and got transpiled to JavaScript. It maybe changes once per day, sometimes more, sometimes less, depending on our pipelines and whether the tests pass or not, but it's updated very frequently on all of the device categories that I spoke about.

Then we have the stuff on device in our architecture. We’re trying to keep it as thin as possible because it’s really hard to update, so the less we touch this code, the better. It has a couple of virtual machines, some rendering backend, which mainly connects the higher-level stuff we download to things like OpenGL and other graphics APIs, networking. This is basically cURL. Some media APIs and storage and a bunch of other things, but they’re in C++. We deploy them on a device and they sit there almost untouched unless there’s some critical bug that needs to be fixed or some more tricky thing. This is how things work today.

Prime Video App, (Before) With React and WebAssembly

You might wonder, though, these are two separate virtual machines, so how do they actually work together? We're going to go through an example of how things worked before with this tech stack. The Prime Video App here takes high-level decisions, like what to show the user, maybe some carousels, maybe arranging some things on the screen. Let's say in this example it wants to show some image on your TV. The Prime Video App is built with React. We call it React-Livingroom because it's a version of React that we've changed and made usable for living room devices by paring down some features, simplifying them, and also writing a few reconcilers, because we have this application that works on this type of architecture but also in browsers, since some living room devices today have just an HTML5 browser and don't even have flash space big enough to hold our native C++ engine. We needed that device abstraction here. The Prime Video App says, I want to put an image node. It uses React-Livingroom as a UI SDK.

Through the device abstraction layer, we figure out, you have a WebAssembly VM available. At that point in time, instead of doing the actual work, it just encodes a message and puts it on a message bus. This is literally a command which says, create me an image node with an ID, with a URL where we download the image from, some properties with height and position, and the parent ID to put it in that scene tree. The WebAssembly VM has the engine, and this engine has low-level things that actually manage that scene tree that we talked about.

For example, the scene and resource manager will figure out, there's a new message. I have to create a node and put it in the tree. It's an image node, so it checks whether that image is available or not. It issues a download request. Maybe it animates some properties if necessary. Once the image is downloaded, it gets decoded and uploaded to GPU memory, and after that, the high-level renderer, from the scene tree that could be quite big, figures out what subset of nodes is visible on the screen and then issues commands to the C++ layer (that's the gray one) to draw pixels on the screen. At the end of it all, you'll have The Marvelous Mrs. Maisel image in there as it should be.
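
The command on the message bus can be pictured as a small tagged structure that one virtual machine encodes and the engine in the other decodes. Here is a rough sketch under that assumption; the Command enum, its field names, and the JSON encoding are all illustrative, not the real protocol.

```rust
// Illustrative sketch of the kind of command the JavaScript side put on the
// message bus; the shape, field names, and JSON encoding are assumptions,
// not the real Prime Video protocol.
enum Command {
    CreateImageNode {
        id: u64,
        parent_id: u64,
        url: String,
        width: f32,
        height: f32,
    },
    DeleteNode { id: u64 },
}

// Encode a command into a wire message that the engine running in the other
// virtual machine can decode and apply to the scene tree.
fn encode(cmd: &Command) -> String {
    match cmd {
        Command::CreateImageNode { id, parent_id, url, width, height } => format!(
            r#"{{"type":"createImageNode","id":{id},"parentId":{parent_id},"url":"{url}","width":{width},"height":{height}}}"#
        ),
        Command::DeleteNode { id } => format!(r#"{{"type":"deleteNode","id":{id}}}"#),
    }
}

fn main() {
    let cmd = Command::CreateImageNode {
        id: 42,
        parent_id: 1,
        url: "https://example.com/poster.jpg".into(),
        width: 320.0,
        height: 180.0,
    };
    println!("{}", encode(&cmd)); // this string is what travels over the bus
    println!("{}", encode(&Command::DeleteNode { id: 42 }));
}
```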

This is how it used to work. When we added Rust here, we had huge gains in animation fluidity and those types of things. However, things like input latency didn't quite improve. Input latency is basically the time it takes from when you press a button on your remote control, in our case, until the application responds to your input. That didn't improve much or at all because, basically, all of those decisions and all that business logic, what happens to the scene tree as a response to an input event, are in JavaScript. That's a fairly slow language, especially since some of this hardware can be as slow as dual-core devices with not even 1 gigahertz of CPU speed and not much memory.

Actually, those are medium devices. We have some that are even slower, so running JavaScript on those is time-consuming. We wanted to improve this input latency, and in the end we did, and we ended up with this architecture. The engine is more or less the same, except we added certain systems that are specific to this application, for example focus management; the layout engine is now part of the engine too. I didn't put it on this slide because it goes into that scene management. On top of it, we built a new Rust UI SDK that we then use to build the application. Everything is now in Rust. It's one single language. You don't even have the message bus to worry about. That wasn't even that slow anyway; it was almost instantaneous. The problem was JavaScript, so we don't have that anymore. We're actually not quite here yet because we are deploying this iteratively, page by page, because we wanted to get this out faster in front of customers, but we will get here early next year.

UI Code Using the New Rust UI SDK

This is going to be a bit intense, but here's some UI code with Rust, and this is actually working UI code that our UI SDK supports. I'm going to walk you through it because there are a few concepts here that I think are important. When it comes to UI programming, Rust isn't known for having lots of libraries; the ecosystem is not quite there. We had to build our own. We did use some ideas from Leptos, like the signals that I'm going to talk about, but this is how things look today. If you're familiar with React and SolidJS and those things, you'll see some familiar things here.

The first thing you might notice is that Composer macro over there, which gets attached to this function here that returns a composable. A composable is a reusable piece of tree, a hierarchy of nodes that we can plug in with other reusable bits and, basically, compose together. This is our way of reusing UI. The Composer macro actually doesn't do that much except generate boilerplate code that gives us some nicer functionality in that compose macro you see later down in the function. It allows us to have named arguments, as well as nicer error messages and optional arguments that might be missing from function calls.

This is a quality-of-life thing. Our developers don't need to specify every argument to these functions, like this hello function here that just takes a name as a string. In this case, the name is mandatory, but we can have optional arguments with default values that you don't need to specify. Also, you can specify arguments in any order as long as you name them, and we'll see that below. It's just a super nice quality-of-life thing. I wish Rust supported this out of the box for functions, but it doesn't, so this is where we are.
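
Since Rust has no named or optional function arguments, a macro has to generate something equivalent. One way to picture the kind of boilerplate such a macro could produce is a plain argument struct with defaults; this is only a guess at the shape, not what the Composer macro actually expands to.

```rust
// An illustrative stand-in for what a named/optional-argument macro might
// generate: an args struct with defaults, so call sites can name arguments
// in any order and omit the optional ones. Not the real macro expansion.
#[derive(Default)]
struct HelloArgs {
    name: String,             // mandatory in the real API; defaulted here for simplicity
    greeting: Option<String>, // optional argument with a fallback
}

fn hello(args: HelloArgs) -> String {
    let greeting = args.greeting.unwrap_or_else(|| "Hello".to_string());
    format!("{greeting} {}!!!", args.name)
}

fn main() {
    // Named, order-independent arguments; `greeting` falls back to "Hello".
    let msg = hello(HelloArgs { name: "Billy".into(), ..Default::default() });
    println!("{msg}");
}
```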

Then, this is the core principle of our UI SDK. It uses signals and effects. The signal is a special value, so this name here will shadow the string above. Basically, this name is a signal, and that means when it changes, it will trigger effects that use it. For example, when this name changes, it will execute the function in this memo, which is a special signal, and it creates a new hello message with the new value the name has been set to. It executes the function you see here. It formats it. It concatenates, and it will give something like Hello Billy, or whatever. Then hello message is a special signal that also will trigger effects. Here you see in this function, we use the hello message.

Whenever the hello message is updated, it will trigger this effect that we call here with create effect. This is very similar to how SolidJS works, or React, if you're familiar with those. This is quite important because it's also what helps UI engineers be productive in this framework without actually knowing much Rust. The core of our UI engine is signals, effects, and memos, which are special signals that only trigger effects if the value they got updated to is different from the previous value. By default, signals just trigger the effect anyway.

Then, we have this other macro here, which is the compose macro, and this does the heavy lifting. This is where you define what your UI hierarchy looks like. Here we have a row that then has children, which are label nodes. You see here the label has a text that is either a hardcoded value, the string with three exclamation marks, or it can take a signal that wraps a string. The first label here will be updated whenever hello message gets updated. Without the UI engineer doing anything, it just happens automatically: hello message gets updated, the label itself renders the new text, and it just works. If you're a UI engineer, this is the code you write. It's fairly easy to understand once you get the idea. Here we have some other examples; ChannelCard and MovieCard are just some other functions that allow you to pass parameters like a name and a main_texture, or a title, a main_texture, and so on.

Again, they could have optional parameters that you don’t see here. You can even put signals instead of those hardcoded values. It doesn’t quite matter, it’s just these will be siblings of those two labels. All the way down we have button with a text, that says Click. Then it has a few callbacks on select, on click, and stuff like that, that are functions that get triggered whenever those events happen in the UI engine. For example, whenever we select this button, we set a signal. That name gets set to a new name. This triggers a cascade of actions, hello message gets updated to hello new name. Then, the effects gets trigger because that’s a new value, so that thing will be printed.

Then the first label you see here will get updated to the new value. Lastly, this row has properties or modifiers, so we can modify the background color. In this case, it's just set to a hardcoded value of blue. However, we support signals being passed here as well. If you have a color that's a signal of a color, whenever that gets set, maybe on a timer or whatever, the color of the node just gets updated, and you pass it here exactly like we set this parameter. That's another powerful way we get behavior or effects as a consequence of business logic that happens in the UI code. This is what our UI engineers deal with, and it's quite high-level, and it's very similar to other UI engines, but it's in Rust this time.
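
To make the reactive flow concrete, here is a small, hand-rolled sketch of the signal and effect idea in plain Rust. It is not the Prime Video SDK and not Leptos; the Signal type and create_effect method are assumptions for illustration, and the only point is that setting a signal re-runs the effects that read from it, which is what keeps the label text in sync.

```rust
// A hand-rolled, minimal version of the signal/effect idea described above.
// It is not the Prime Video SDK (or Leptos); it only shows the reactive flow:
// setting a signal re-runs the effects that were registered against it.
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Clone)]
struct Signal<T: Clone> {
    value: Rc<RefCell<T>>,
    subscribers: Rc<RefCell<Vec<Box<dyn Fn(&T)>>>>,
}

impl<T: Clone> Signal<T> {
    fn new(value: T) -> Self {
        Signal {
            value: Rc::new(RefCell::new(value)),
            subscribers: Rc::new(RefCell::new(Vec::new())),
        }
    }

    fn get(&self) -> T {
        self.value.borrow().clone()
    }

    fn set(&self, new: T) {
        *self.value.borrow_mut() = new;
        // Notify every effect that depends on this signal.
        for f in self.subscribers.borrow().iter() {
            f(&*self.value.borrow());
        }
    }

    // Run `f` now, and again every time the signal changes: the "effect".
    fn create_effect(&self, f: impl Fn(&T) + 'static) {
        f(&*self.value.borrow());
        self.subscribers.borrow_mut().push(Box::new(f));
    }
}

fn main() {
    let name = Signal::new("world".to_string());

    // A memo-like derived signal, recomputed whenever `name` changes.
    let hello_message = Signal::new(format!("Hello {}", name.get()));
    {
        let hello_message = hello_message.clone();
        name.create_effect(move |n| hello_message.set(format!("Hello {n}")));
    }

    // An effect that reacts to the derived value, like the label text update.
    hello_message.create_effect(|msg| println!("label text is now: {msg}"));

    // Simulate the button's on_select callback setting the signal.
    name.set("Billy".to_string());
}
```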

When we run that compose macro, this is how the UI hierarchy will look in the scene tree. You have the row, and then it has a label. Labels are special because they're widgets. Composables can be built out of widgets, which are special composables our UI SDK provides to the engineers, or out of other composables that eventually are built out of widgets. Widgets are then built out of a series of components. This is fairly important because we use an entity component system under the hood. Components, for now, you can think of as structures without behavior, so just data without any behavior. The behavior comes from systems that operate on these components.

In this case, this label has a layout component that helps the layout system. A base component, let’s say maybe it has a position, a rotation, things like that. RenderInfo components, this is all the data you need to put the pixels on the screen for this widget once everything gets computed. A text component, this does text layout and things like that. Maybe a text cache component that is used to cache the text in the texture so we don’t draw it letter by letter.

The important bit is that widgets are special because they come as predefined composables from our UI SDK. Then, again, composables can be built out of other composables. This row has a few children here, but eventually it has to have widgets as the leaf nodes because those are the things that actually have the base behavior. Here maybe you have a button and another image, and the button has, all the way down, a focus component. This allows us to focus the button, and it gets automatically used by that system. The image component, again, just stores a URL and maybe the status, has this been downloaded, uploaded to GPU, and so on. It’s fairly simple. Basically, this architecture in our low-level engine is used to manage complexity in behavior. We’ll see a bit later how it works. Then we had another Movie Card in that example and, again, it eventually has to be built out of widgets.

Widgets are the things that our UI SDK provides to UI developers out of the box. They can be rows, columns, images, labels, stacks, rich text, which is a special text node that allows you to have images embedded and things like that, and row lists and column lists, which are scrollable either horizontally or vertically. I think we added grid recently because we needed it for something, but basically, we build this as we build the app. This is what we support now. I think button is another one that's supported out of the box that I somehow didn't put here. Each widget is an entity: it has an ID and a collection of components. Then, the lower-level engine uses systems to modify and update the components. ECS is this entity component system. It's a way to organize your code and manage complexity without paying that much in terms of performance. It's been used by game engines, not a lot, but for example, Overwatch used it.

As a piece of trivia, Thief, in 1998, was I think the first game that used it. It's a very simple idea: you have entities, which are just IDs that map to components. You have components, which are data without behavior. Then you have systems, which are basically functions that act on tuples of components. A system always acts on several things at a time, not on one thing at a time. It's a bit different from the other paradigms. It's really good for creating behavior, because if you want a certain behavior for an entity, you just add the component, and then the systems that need that component automatically just work because the component is there.

Here is how it might work in a game loop. For example, these systems are on the left side and then the components are on the right side. When I say components, you can basically imagine those as arrays and entity IDs as indices in those arrays. It’s a bit more complicated than that, but that’s basically it. Then the things on the left side with the yellow thing, those are systems, and they’re basically functions that operate on those arrays at the same time. Let’s say the resource management system needs to touch base components, image components, and read from them. This reading is with the white arrow, and it will write to RenderInfo components. For example, it will look where the image is, if it’s close to the screen, look at the base component. It looks at the image component that contains the URL. It checks the image status that will be there. Is it downloaded? Has it been uploaded to the GPU? If it has been decoded and uploaded to the GPU, we update the RenderInfo components so we can draw the thing later on the screen.

For this system, you need to have at least all three components on an entity. You can have more, but we just ignore them. We don't care. This system just looks at that slice of an object, which is the base components, the image components, and the RenderInfo components. You have to have all three of them. If you have only two without the third one, that entity just isn't touched by this system; the system does nothing to it. Then we have the layout system. Of course, this looks at a bunch of components and updates one at the end. It's quite complicated, but layout is complicated anyway. At least that complication and that complexity sits within a file or a function. You can tell from the signature that this reads from a million things and writes to one, but it is what it is. You can't quite build layout systems without touching all of those things. Maybe we have a text cache system that looks at text components and writes to a bunch of other things.

Again, you need to have all three of these on an entity for it to be updated by this system. All the way at the end, we have the rendering system, which reads from RenderInfo components. It doesn't write anywhere because it doesn't need to update any component. It will just call the functions in the C++ code in the renderer backend to put things on the screen. It just reads through this and then updates your screen with new pixels. It sounds complicated, but it's a very simple way to organize behavior. Organizing our low-level architecture like this has paid dividends, for reasons that we'll see a little bit later, not only for the new application but also for the old application, because they use the same low-level engine.
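
As a rough illustration of that organization, here is a minimal, self-contained ECS sketch in Rust. The component and system names mirror the vocabulary above, but the code is an illustrative toy under those assumptions, not the actual Prime Video engine.

```rust
// A minimal ECS sketch in the spirit described above: components are plain
// data in parallel arrays, entities are indices into those arrays, and a
// system is a function that only touches entities that have every component
// it needs. Names are illustrative, not the actual Prime Video engine.
#[derive(Clone, Copy)]
struct Base { x: f32, y: f32 }

#[derive(Clone)]
struct Image { url: String, uploaded_to_gpu: bool }

#[derive(Clone, Copy, Default, Debug)]
struct RenderInfo { texture_id: u32, x: f32, y: f32 }

#[derive(Default)]
struct World {
    // Option because not every entity has every component.
    base: Vec<Option<Base>>,
    image: Vec<Option<Image>>,
    render_info: Vec<Option<RenderInfo>>,
}

// "Resource management" system: reads Base + Image, writes RenderInfo,
// and skips any entity that is missing one of the three components.
fn resource_management_system(world: &mut World) {
    for i in 0..world.base.len() {
        if let (Some(base), Some(image), Some(render)) =
            (&world.base[i], &world.image[i], &mut world.render_info[i])
        {
            if image.uploaded_to_gpu {
                render.x = base.x;
                render.y = base.y;
                render.texture_id = 7; // pretend this GPU handle came from a cache
            }
        }
    }
}

fn main() {
    let mut world = World::default();
    // Entity 0: an image widget with all three components.
    world.base.push(Some(Base { x: 10.0, y: 20.0 }));
    world.image.push(Some(Image { url: "poster.jpg".into(), uploaded_to_gpu: true }));
    world.render_info.push(Some(RenderInfo::default()));

    resource_management_system(&mut world);
    println!("{:?}", world.render_info[0]); // RenderInfo now points at the uploaded texture
}
```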

Again, going back to the architecture, this is what we have, Prime Video App at the top. We’ve seen how developers write the UI with composables using our UI SDK. Then we’ve seen how the UI SDK uses widgets that then get updated by the systems, and have components that are defined in the low-level engine. This is again, downloaded. Every time we write some new code, it goes through a pipeline, it gets built to WebAssembly, and then we just execute it on your TV set top box, whatever you have in your living room. Then we have the low-level stuff that interacts with the device that we try to keep as small as possible. This is what we shipped, I think, end of August. It’s live today.

The Good Parts

Good parts. Developer productivity, actually, was great for us. Previously, when we rewrote the engine, we had a bunch of developers who knew C++ and switched to Rust, and we had good results there. In this case, we switched people who knew only JavaScript and TypeScript, and frameworks like React, to Rust with our Rust UI SDK, with no loss in productivity. This is both self-reported and backed by comparison: whenever we build a feature, we have other clients that don't use this, for example the mobile client or the web client and so on. When we were discussing some new features to be built on all of the clients, the Rust client was, I think, the second fastest in terms of estimates, behind web. Even the mobile developers had higher estimates than we did here. Also, we did this whole rewrite in a really short amount of time. We had to be productive. We built the UI SDK and a large part of the app quite fast.

The reason why I think this is true is because we did a lot of work on developer experience with those macros, which maybe look a bit shocking if you don't know UI programming, but actually felt very familiar to UI engineers. They could work with it right off the bat; they don't have to deal with much complexity in the borrow checker. Usually, in the UI code, you can clone things if necessary, or even use an Rc and things like that. You all know this is not super optimal. Yes, we came from JavaScript, so this is fine, I promise. The gnarly bits are down in the engine, and there we take a lot of care about data management and memory and so on. In the UI code, we can afford it easily, even on the lowest-end hardware. I have some slides where you'll see the impact of this.

Another thing in the SDK: as the SDK and engine team, we chose some constraints, and they helped us build a simpler UI SDK and ship it faster. For example, one constraint our UI engine has is that when you define a label or a widget or something like that, you cannot read properties from it unless you were the one setting those properties. It's impossible to read from the UI code where on the screen an element ends up after layout. You never know. You just put them in there. We calculate things in the engine, but you can't read things unless you were the one saying, this is your color, blue. Then you're like, yes, it's in my variable. I can read it, of course. Things like that you can't read. This vastly simplified our UI architecture, and we don't have to deal with a bunch of things, and slowness, because of it. It seems like a shocking thing. Maybe you need to know where you are on the screen. No, you don't, because we shipped it.

There was no need to know where you are on the screen, and there was no need to read a property that you haven’t set. There are certain cases where we do notify UI developers through callbacks where they can attach a function and get notified if something happens. It’s very rare. It happens usually in case of focus management and things like that. You will get a function call that you’re focused, you’re not focused anymore, and that works fine. Again, it’s a tradeoff. It has worked perfectly fine for us. That’s something that I think also has helped productivity. We only had one instance where developers asked to read a value of layout because they wanted something to grow, and maybe at 70% of the thing, they wanted something else to happen. Just use a timer and that was fixed.

Another good thing is that we shipped this iteratively. This was only possible because, in my view, we used entity component systems as the basis of our low-level engine. That low-level engine, with the systems and components it has, currently supports both JavaScript pages and Rust pages. By pages, I mean everything on the screen is in Rust or everything on the screen is in JavaScript. For example, we shipped the profiles page, which is the page where you select the profile. The collections page, that's the page right after you select the profile, where you see all of the categories, all of the movies and everything. The details page, which is where, once you choose something to watch, you can see maybe episodes or just more details about the movie, and press play. We still have to move the search page, settings, and a bunch of other smaller pages. Those are still in JavaScript. This is work in progress, so we're just moving them over. It's just a function of time. We only have 20 people for both the UI SDK and the application. It takes a bit to move everything. It's just time.

Another reason is that it's just work in progress; we think the approach was good. The entity component system managed perfectly fine with these two running side-by-side. I don't think we had one bug because of this. We only had to do some extra work to synchronize a bunch of state between these things, like the stack that you use to go back, the back stack and things like that, but it was worth it in the end. We got this out really fast. We actually first shipped the profiles page and then added the collections page and then the details page and then live and linear and whatnot. That's nice.

Another good part is, in my opinion, that we built tools as part of building this UI SDK. Because we built an SDK, we had to build tools. I think one winning move here was that it's really easy in our codebase to add a new tool, mostly because we use egui, which is a Rust immediate mode UI library. You see there how the resource manager just appears on top of the UI. This is something a developer built because he was debugging an issue where a texture wasn't loading and he was trying to figure out, how much memory do we have? Is this a memory thing? Did the resource manager maybe not do something right? It just made it very easy to build tools. We built tools in parallel with building the application and the UI SDK.

In reality, these are way below what you'd expect from browsers and things like that, but with literally 20% of the tools, you get 80% done. It's absolutely true. You mostly just need the basics. Of course, we have a debugger and things like that that just work, but these are UI-specific tools. We have layout inspectors and all sorts of other things, so you can figure out if you set the wrong property. Another cool thing, in my opinion: we built this, which is essentially a rewrite of the whole Prime Video App, and obviously we don't go for these things without a lot of data. One thing that really helped us make the point that this is worth it was a prototype that wasn't cheating, which we showed to leadership: this is how it feels on the device before what we did, and this is with this new thing.

Literally, features that were impossible before, like layout animations, are now super easy to do. You see here, things are growing, layout just works, it rearranges everything. Things appear and disappear. Again, this is a layout animation here. Of course, this is programmer art; it has nothing to do with designers. We're just showcasing capabilities on a device. As you can see, things are over icons and under; it's just a prototype, but it felt so much nicer and more responsive compared to what you could get on a device that it just convinced people instantly that it was worth the risk of building a UI in Rust and WebAssembly. Because even though we had added Rust and it was part of our tech stack, we were using it for low-level bits, but this showed us that we could take a risk and try to build some UI in it.

Here are some results. This is a really low-end device where input latency for the main page, the collections page, was as bad as 247 milliseconds, 250 milliseconds, horrible input latency; that's the orange color, in JavaScript. With Rust, in blue, 33 milliseconds, easy. Similarly, the details page, 440 milliseconds. This also includes layout time, because if you press a button as the page loads and we do layout, you might wait that much. This is the max. The device is very slow. Again, around 30 milliseconds with Rust, because layout animations mean we need to run layout as fast as an animation frame, which is usually 16 milliseconds, or 33 milliseconds at 30 FPS. It's way faster and way more responsive. Again, that line is basically flat. It was great. Other devices have been closer to those two lines, but I picked this example because it really showcases that even on the lowest-end device, you can get great results. The medium devices were like 70 milliseconds, and they went down to 16 or 33, but this is the worst of them all. We have that.

The Ugly Parts

Ugly parts. The WebAssembly System Interface is quite new. WebAssembly in general is quite new. We're part of the W3C community. We're working with them on features, things like that. There are certain things that are lacking. For example, we did add threads, but there are also things that happen in the ecosystem that break our code sometimes, because we use something that's not fully standardized in production for a while. One such example: Rust 1.82 recently enabled a feature by default for WebAssembly WASI builds that basically didn't work on the older WebAssembly virtual machines that we had in production. We basically now have a way to disable it, even if it's a new default, and things like that. It's worth it for us. That's something to think about.

Also, the WebAssembly System Interface keeps evolving and adding new features, and we're trying to be active as part of that effort as well. It requires engineering effort. We can't just take a dependency, specifically on WebAssembly, and be like, let's see where this ship goes. You need to get involved and help with feedback, with work on features, and so on. Another thing we found out is that panic-free code is really hard. Of course, exceptions should be for exceptional things, but that's not how people write JavaScript. When the code panics in our new Rust app, the whole app basically gets demolished; it crashes. You need to restart it from your TV menu. It's really annoying. Panics shouldn't quite happen, but it's very easy to cause a Rust panic: just access an array with the wrong index, you panic, game over. Then, that's it. If you're an engineer who only worked in JavaScript, maybe you're familiar with exceptions; you can try-catch somewhere.

Even if it's not ideal, you can catch the exception and maybe reset the customer to some nice position, close to where they were before or literally where they were before. That's impossible with our new app, which is really annoying. We, of course, use Clippy to ban unwraps and expects and those things. We ban unsafe code, except in one engine crate that has to interact with the lower-level bits. Again, it required a bit of education for our UI engineers to rely on this pattern of using the Result type from Rust and to get comfortable with the idea that there is no stack unwinding; in particular there is no stack unwinding in WebAssembly, which is tied to the first point. You can't even catch that in a panic handler. It just aborts the program. Again, this is a pretty big pain point for us.
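
A minimal sketch of what that pattern can look like in practice, assuming crate-level Clippy lints to ban unwrap and expect, and Result-returning lookups instead of direct indexing; the error type and function names are hypothetical.

```rust
// A sketch of the panic-avoidance pattern, assuming crate-level Clippy lints
// are used to ban unwrap/expect/indexing and that fallible lookups return a
// Result the caller must handle. Names here are hypothetical.
#![deny(clippy::unwrap_used, clippy::expect_used, clippy::indexing_slicing)]

#[derive(Debug)]
enum UiError {
    MissingNode(usize),
}

// Instead of `nodes[index]`, which would panic on a bad index and abort the
// whole app in WebAssembly (there is no stack unwinding to catch it), return
// a Result so the caller can recover.
fn node_name(nodes: &[String], index: usize) -> Result<&String, UiError> {
    nodes.get(index).ok_or(UiError::MissingNode(index))
}

fn main() {
    let nodes = vec!["row".to_string(), "label".to_string()];

    match node_name(&nodes, 5) {
        Ok(name) => println!("focused node: {name}"),
        // Recover gracefully rather than crashing the app on the TV.
        Err(UiError::MissingNode(i)) => println!("no node at index {i}, falling back"),
    }
}
```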

In the end, obviously we shipped, so we're happy. We almost never crash, but it required a bunch of work. This also generated a bunch of work for us because we were depending on some third-party libraries that were very happily panicking whenever you called some functions in a maybe not super correct way. Again, we would rather have Results instead of panics for those cases. It led to a bit of work there that we didn't quite expect. That's something to think about, especially in UI programming, or especially if you go, like we did, from JavaScript to Rust and WebAssembly.

The Bytecode Alliance

The Bytecode Alliance is a nonprofit organization we're part of; a bunch of companies are part of it, and it builds on open-source standards like WebAssembly and the WebAssembly System Interface. The WebAssembly Micro Runtime, which is the virtual machine we use, is built over there, as well as Wasmtime, which is another popular runtime, this one implemented in Rust; the WebAssembly Micro Runtime is C. It's a good place to look if you're interested in using Rust in production, and especially in using WebAssembly in production. In our case, it comes with Rust and everything.

Questions and Answers

Participant: You mentioned you don't use this for your web clients. Do you think that something like this could work using WebGL as the rendering target?

Ene: We did some comparisons on devices. There are a bunch of pain points. The first pain point is the devices where we do have to use a browser, because there's no space on the flash on some set top boxes. The problem is those run some version of WebKit that has no WebAssembly. That's the big hurdle for us there. It could be possible. We did some experiments and it worked, but you do lose a few things that browsers have that we don't. Today, it's not worth it for us because those devices have very few customers. They work fairly ok, even compared to the system UI. Even though they don't hit these numbers, it would be a significant amount of effort to get this SDK to work in a browser.

Right now, it's quite simple because it has one target, the one that has the native VM. It requires a bunch of functions from the native VM that we expose that aren't standard. Getting those in a browser would probably require piping them through JavaScript. Then you're like, what's going on? You might lose some performance and things like that. It's a bit of a tricky one, but we're keeping an eye on it.




Google Cloud Launches A4 VMs with NVIDIA Blackwell GPUs for AI Workloads

MMS Founder
MMS Steef-Jan Wiggers

Article originally posted on InfoQ. Visit InfoQ

Google Cloud has unveiled its new A4 virtual machines (VMs) in preview, powered by NVIDIA’s Blackwell B200 GPUs, to address the increasing demands of advanced artificial intelligence (AI) workloads. The offering aims to accelerate AI model training, fine-tuning, and inference by combining Google’s infrastructure with NVIDIA’s hardware.

The A4 VM features eight Blackwell GPUs interconnected via fifth-generation NVIDIA NVLink, providing a 2.25x increase in peak compute and high bandwidth memory (HBM) capacity compared to the previous generation A3 High VMs. This performance enhancement addresses the growing complexity of AI models, which require powerful accelerators and high-speed interconnects. Key features include enhanced networking, Google Kubernetes Engine (GKE) integration, Vertex AI accessibility, open software optimization, a hypercompute cluster, and flexible consumption models.

Thomas Kurian, CEO of Google Cloud, announced the launch on X, highlighting Google Cloud as the first cloud provider to bring the NVIDIA B200 GPUs to customers.

Blackwell has made its Google Cloud debut by launching our new A4 VMs powered by NVIDIA B200. We’re the first cloud provider to bring B200 to customers, and we can’t wait to see how this powerful platform accelerates your AI workloads.

Specifically, the A4 VMs utilize Google’s Titanium ML network adapter and NVIDIA ConnectX-7 NICs, delivering 3.2 Tbps of GPU-to-GPU traffic with RDMA over Converged Ethernet (RoCE). The Jupiter network fabric supports scaling to tens of thousands of GPUs with 13 Petabits/sec of bi-sectional bandwidth. Native integration with GKE, supporting up to 65,000 nodes per cluster, facilitates a robust AI platform. The VMs are accessible through Vertex AI, Google’s unified AI development platform, powered by the AI Hypercomputer architecture. Google is also collaborating with NVIDIA to optimize JAX and XLA for efficient collective communication and computation on GPUs.

Furthermore, a new hypercompute cluster system simplifies the deployment and management of large-scale AI workloads across thousands of A4 VMs. This system focuses on high performance through co-location, optimized resource scheduling with GKE and Slurm, reliability through self-healing capabilities, enhanced observability, and automated provisioning. Flexible consumption models provide optimized AI workload consumption, including the Dynamic Workload Scheduler with Flex Start and Calendar modes.

Sai Ruhul, an entrepreneur on X, highlighted analyst estimates that the Blackwell GPUs could be 10-100x faster than NVIDIA’s current Hopper/A100 GPUs for large transformer model workloads requiring multi-GPU scaling. This represents a significant leap in scale for accelerating “Trillion-Parameter AI” models.

In addition, Naeem Aslam, a CIO at Zaye Capital Markets, tweeted on X:

Google’s integration of NVIDIA Blackwell GPUs into its cloud with A4 VMs could enhance computational power for AI and data processing. This partnership is likely to increase demand for NVIDIA’s GPUs, boosting its position in cloud infrastructure markets.

Lastly, this release provides developers access to the latest NVIDIA Blackwell GPUs within Google Cloud’s infrastructure, offering substantial performance improvements for AI applications.

