Presentation: Optimizing Java Applications on Kubernetes: Beyond the Basics

Bruno Borges

Article originally posted on InfoQ.

Transcript

Borges: Who here is a Java developer? Who’s not a Java developer, but is here because of Kubernetes? We do have a few things that are applicable to any language, any runtime. I work for Microsoft now. I worked for Oracle before. I use a Mac, and I’m a Java developer at Microsoft. I work in the Java engineering team. We have our own JDK at Microsoft. We help internal teams at Microsoft optimize Java workloads. Can anybody give an example of where Java is used inside Microsoft? Azure has a lot of Java internally. In fact, there is a component called Azure control plane that does a lot of messaging Pub/Sub across data centers. We use Java in that infrastructure. Office? Not much Java behind Office, except maybe Elasticsearch, which is a Java technology for a few things. Anything else? PowerToys, not Java. Minecraft. Yes, people forget Minecraft is Java.

There is the non-Java version of it now, but the Java version has tens of millions of players around the world, and we have hundreds of thousands of JVM instances at Microsoft running Minecraft servers for the Java edition of Minecraft. Anything else? LinkedIn, yes. LinkedIn is the largest subsidiary at Microsoft, fully on Java, with hundreds of thousands of JVMs in production as well, even more than the Minecraft servers. Besides Minecraft, LinkedIn, Azure, we have a lot of big data. If you’re running Spark, if you’re running Hadoop, if you’re running Kafka, those things are written in Java. It’s the JVM runtime powering those things. Even if you’re not a Java developer, but you are interacting with those things, you are consuming Java technology at the end of the day. If you’re not a developer at all, but you are using, for some reason, Bing, you are using Java technologies behind the scenes, because we also have a lot of Java at Bing.

What is this talk about? We’re going to cover four things. We’re going to cover a little bit of the basics of optimizing Java workloads on Kubernetes. We’re going to look at the size of containers and the startup time. We’re going to look at JVM defaults and how ergonomics in the JVM play a significant role in your workloads. Then we’re going to talk about Kubernetes, how Kubernetes has some interesting things that may affect your Java workloads. Finally, this weird concept that I’m coming up with called A/B performance testing in production.

Size and Startup Time

Size and startup time. Who here is interested in reducing the size of your container images? What is the first reason why you want to reduce the size of container images? The download. I, and other folks I’ve seen online, say that size is important, but it’s not the most important part. Remember that you are running your infrastructure in a data center with high-speed networking between the VMs. Is the size really impacting the download of the image from your container registry to the VM where the container will run? If you can track the slowness of startup time back to the download, back to the speed of that, then, yes, reduce the size. If storage is getting expensive, then reduce the size. I believe that security is even more important.

It’s about reducing the attack surface. It’s about reducing components that are shipped in the image and may become an attack vector. It’s about reducing what goes in so that patching and updating that image will be faster, and it reduces basically all the dependencies of your system in production. For example, where do we have Log4j in production? We don’t use Log4j because we removed that dependency from everywhere, just to reduce the size. Yes, sure, but your primary goal was to reduce the attack surface. That’s an example. Easier to audit, as well. People dealing with SBOMs, supply chain security, component governance, all of that good stuff these days, that is, in my opinion, the primary reason to reduce the size of an image.

How to reduce the image? There are three main areas. David Delabassée at Oracle has presented this many times. He broke it down into these three areas: the base image layer, the application layer, and the runtime layer. For the base image layer, you can use slim versions of Linux distributions. You can use distroless images. Or you can even build your own Linux base image with whatever distribution you want to base it on. Alpine is a good option as well. I’ll talk more about that in the next slide. The second layer is the Java application. You should only put in the dependencies that your application really needs. It’s not just Java applications, but any application: Node packages, Python packages, anything that goes in, you should really be careful about what is going into the final image. There is a trick about the application layer.

You should actually break down the application layer into different layers: a layer of your container image with just the dependencies, and then a layer with your application code. Why is that? Caching. You’re going to cache the dependencies. If you’re not changing the dependencies, you’re not going to build that layer again, just the application layer. If you’re a Spring developer, Spring has a plugin for that, for example. Run as a non-root user. That’s quite important. Finally, the JVM runtime, or any tech runtime. If your tech runtime has the capability of shrinking down the runtime to something that only contains the bits required to run your application, then great. The JDK project, years ago, added that mechanism of modules. You can have a JVM runtime with only the modules of the JDK that are important to your application. If you really want to go the extra mile, with GraalVM you can build a Native Image.

Here are a few examples of size differences of images. You have Ubuntu and Debian Full. Ubuntu doesn’t have a slim version anymore. They just have Ubuntu Full, and it’s 78 megabytes, which compares with the slim version of Debian. If you really want to cut it down, then Alpine. Alpine has some interesting things that you have to consider. It uses the musl libc library, so some libraries may not be compatible. Luckily, the JDK is compatible with musl. There have been other issues in the past that got fixed, but who knows what else is there? Keep an eye on Alpine. It’s a good option, but make sure you test it. Also, it’s hard to get commercial support from cloud vendors for Alpine, so if you’re on Amazon, Google, or Azure, you may not get that support. Spring, the application layer: this is a classic Dockerfile for a Spring application. I have my JDK, interestingly enough, coming from Alpine, but that’s from the Spring Boot documentation. I have my fat JAR. I have the entrypoint, java -jar, my application. That’s great, but not the best. A better version of it would be to use the Spring user or create a custom user so you don’t run it as root.

Finally, an even better option is to have those dependencies in different layers, so when your application changes, only that layer gets rebuilt. That optimizes your build, optimizes the download of the image as well, and so on. If you want to automate all that, you can put it in your CI/CD. You can go to the Spring documentation, just search for Spring Boot Docker, and you’re going to get the Maven plugin to build that image for you. Last layer, the JVM or the language runtime, whatever stack you have. Here’s an example from modern Java, JDK 22: 334 megabytes extracted. When you create a Java custom runtime with only the bits that are needed for your application, the JVM is only 57 megabytes. If you really want to go native, you use GraalVM Native Image, and you get less than 10 megabytes in many applications.
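As a rough illustration of that module mechanism, here is a minimal, hypothetical module-info.java; the module name and its requirements are made up for the example, and a tool like jlink can then assemble a runtime image containing only these modules and their transitive dependencies.

    // module-info.java -- hypothetical application module; only the JDK modules
    // the application actually uses are declared, so the custom runtime stays small
    module com.example.orders {
        requires java.net.http;  // HTTP client for calling other services
        requires java.sql;       // JDBC access
        requires java.logging;   // java.util.logging
    }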

Let’s talk about startup time of the JVM; other languages may have capabilities like this too. The JDK has a capability called class data sharing. It’s basically a binary representation of all the libraries so that they get loaded into memory much faster. You can get startup time improvements of half, twice as fast. These are the JEPs that you should look into. JEP stands for JDK Enhancement Proposal. Search for those numbers, or just for class data sharing, and you’re going to find great ideas. Some future projects on startup time that are happening in the OpenJDK world are Project Leyden, led by Oracle, and Project CRaC, led by Azul Systems. CRaC stands for checkpoint/restore. Here are some benchmarks that Oracle ran for Leyden on Spring Boot.

You can see the blue bar is the traditional default for JDK 22, and then you have all the way down to Spring Boot AOT plus premain, which is a cached version. You do a training run. You train the execution, and you get the caching of that execution, so the next time you start, it’s a lot faster. You go from 1.2 seconds down to 0.5 seconds, so a significant improvement there. Then you have checkpoint/restore, which allows you to go from nearly 4 seconds for this application down to 38 milliseconds. It’s a significant improvement in startup time. Of course, this is a checkpoint/restore technology. The framework, the library, the runtime, they have to be aware of checkpoints, so you have the application in a snapshottable state. Then you can take that, put it on disk, and restore it next time. Keep an eye on those projects. They will make significant changes in the Java ecosystem.

JVM Defaults

Let’s go into part 1, JVM defaults. The JVM has something called default ergonomics, and almost every language runtime stack has defaults. I like to say there’s always a premature optimization there, because the defaults tend to be a little bit conservative. They tend to work for most applications, but that is, in essence, an optimization by itself: the JVM has to set how much memory will be used for the heap, how many threads will be used for the JIT compiler, and all of that based on signals from the environment. Let’s do a quick puzzle with JVM ergonomics. Here I have some memory puzzles. Let’s look at puzzle1. I’m going to run a Java application. Let’s go to processors first. I have a Java application, quite simple, public static void main, that tells me how many processors the JVM can see. Let’s run puzzle1. Puzzle1, we run this thing. I’m on my local machine, so we’ll just do docker run, java, and the application. This is a Mac with 10 processors. If I run this command, how many processors will the JVM see? It sees only eight. Why? Because I’m running Docker Desktop, and I configured Docker Desktop to only allow eight processors. That was a tricky puzzle. It goes to show you how settings in the environment will affect the JVM, period.
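For reference, the puzzle program is roughly this, a minimal sketch (the class name here is made up):

    // Prints the number of processors the JVM thinks it has; inside a container
    // this reflects cgroup CPU settings, not the physical core count of the host
    public class AvailableProcessors {
        public static void main(String[] args) {
            System.out.println(Runtime.getRuntime().availableProcessors());
        }
    }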

Let’s look at puzzle2. In puzzle2, I’m setting two CPUs for the container. If I run this thing, it’s not really a puzzle, I will get two processors. Easy. What if I have a value that’s not a natural number, like a decimal? Let’s look at puzzle3: 1.2 CPUs, or in the Kubernetes world, 1200 millicores. How many processors does the JVM see here? Two, because the JVM will round up. If it has anything above 1000 millicores, it’s 2 processors. If it’s 2100 millicores, 3 processors, and so on. Let’s go to memory. Memory is tricky. I have a program here; it will find out which garbage collector is running in the JVM. There’s a lot of code here, because this code is actually compatible with lots of versions of the JVM. In recent versions, it’s a lot easier, but I wanted to have something compatible with older versions. Let’s run this program, cat puzzle1. I have one CPU with 500 megs of memory. Which garbage collector will the JVM select? There are five options in the JVM these days. And how much memory will be set for the heap? It’s going to be Serial, and a 25% heap.
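The version-compatible program in the talk is longer, but on a modern JVM a minimal sketch of the same idea could look like this; the selected collector is inferred from the garbage collector MXBean names (for example, "G1 Young Generation" for G1, or "Copy" and "MarkSweepCompact" for Serial):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class WhichGC {
        public static void main(String[] args) {
            // Each active collector exposes an MXBean; its name reveals which GC was selected
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println(gc.getName());
            }
            // Also print the max heap the ergonomics picked, in megabytes
            System.out.println("Max heap (MB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
        }
    }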

That goes to show that if you don’t tune the JVM properly, you’re going to get a really bad heap configuration. Why is that? I’ll show you later. Let’s go to the next puzzle. I won’t spend too much time here with puzzles. We have two CPUs and 2 gig of memory. Let’s go to puzzle2: you get 25%. I want to do a quick change here, because I think I made a mistake on my puzzle. Let’s run this puzzle again. Let’s give it 2 gig. Let’s go back to puzzle1: one CPU, 2 gig. Which garbage collector, and how much heap? You get 25% and Serial. Here is a little puzzle. This one is two CPUs with 1792 megabytes: which garbage collector? You will get G1. If you reduce it by 1 megabyte, then you get Serial, because of that 1 megabyte. This logic is inside the source code of the JVM. That’s the threshold. It goes to show how complicated these things are. I remember where I missed on puzzle1. Puzzle1 was wrong on one thing. Let’s do 200 meg. With 200 meg, the heap size will not be 25%, it will be around 50%.

This is the math of the JVM, at least; other language runtimes may have different algorithms. The default heap size for any environment with less than 256 megabytes will be 50%. That’s the default heap size. Then you have a pretty much stable line of 127 megabytes up to 512 megabytes. Above that, the heap size is set to 25%. If you’re just running your application in the cloud and not configuring the heap size, this is what you get. Most people actually do configure it, and generally they don’t have to worry about this that much, but if you are not tuning the JVM at a minimum, you should be concerned. This comes from a time when the JVM was designed for environments that were shared with different processes. In the container world, the JVM should be set to take advantage of as many resources as are available in that environment, but we have to inform the JVM manually: JVM, you actually have access to all those resources. The defaults of the JVM have not been enhanced since. There are lots of projects happening now. Microsoft is involved in some of them, Google is involved, and Oracle is involved in enhancing ergonomics and defaults of the JVM for container environments. At the end of the story, don’t just java -jar your application, otherwise you’re going to be wasting resources, and that’s money.
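As a rough sketch of the rule just described (an approximation of the behavior from the talk, not the actual JVM source, which varies by version):

    public class DefaultHeapRule {
        // Approximate default max-heap rule, in megabytes of container memory
        static long defaultMaxHeapMb(long containerMemoryMb) {
            if (containerMemoryMb < 256) {
                return containerMemoryMb / 2;   // ~50% for very small environments
            } else if (containerMemoryMb <= 512) {
                return 127;                     // roughly a flat line around 127 MB
            } else {
                return containerMemoryMb / 4;   // ~25% above 512 MB
            }
        }

        public static void main(String[] args) {
            System.out.println(defaultMaxHeapMb(200));  // ~100 MB, about 50%
            System.out.println(defaultMaxHeapMb(2048)); // ~512 MB, about 25%
        }
    }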

Garbage collectors. There are lots of garbage collectors in the JVM. You want to be aware of them. There is actually one extra garbage collector I did not put in the list. It’s called the Epsilon GC. The Epsilon GC is a garbage collector that does not collect anything. It’s great for benchmarking applications where you want to eliminate the GC from the equation, and you just want to benchmark your application’s performance without the behavior of the GC. Does it help for production, traditional business applications? Not much. For compiler engineers and JVM engineers, it’s really helpful. When you are running things in the cloud, you have to keep in mind that no matter how much memory you give to an application in a container, there are certain areas of memory that the JVM or the workload will consume in the same amount regardless. That’s the metaspace, the code cache, things like that, which the JVM will need regardless of how many objects you have in the heap. That’s why, when you have two containers that are considered small, they are using pretty much the same amount of non-heap memory no matter what.
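A quick way to see that split at runtime is the standard memory MXBean; a minimal sketch:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class MemoryAreas {
        public static void main(String[] args) {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            // Non-heap covers metaspace, compressed class space, and the code cache;
            // it stays roughly the same whether the container is small or large
            System.out.println("Non-heap used (MB): "
                    + mem.getNonHeapMemoryUsage().getUsed() / (1024 * 1024));
            System.out.println("Heap committed (MB): "
                    + mem.getHeapMemoryUsage().getCommitted() / (1024 * 1024));
        }
    }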

Then the heap is different. You have to keep that in mind for a lot of the things we’re going to talk about next. Here’s how you configure the JVM. We do provide some recommendations: set the heap to 75% of the memory limit, usually. If you want to make things a lot easier for yourself, you can use memory calculators; you have Buildpacks in the Paketo project. Paketo has a memory calculator for building the container image of Java workloads. They have Buildpacks for other languages as well. Those Buildpacks usually come with optimizations for containers that the runtime most often doesn’t have. For example, here we have how the heap gets calculated automatically for you, so you don’t have to set -Xmx and other things. It goes on with other areas that may be important to tune for your application eventually. Check out the Paketo Buildpacks. Here you have the Java example.
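A simple way to verify what those settings actually gave you is to print the heap the JVM ended up with; a minimal sketch, assuming the container was started with something like -XX:MaxRAMPercentage=75.0:

    public class HeapCheck {
        public static void main(String[] args) {
            long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
            // Inside a 1 GiB container with -XX:MaxRAMPercentage=75.0 this should report
            // roughly 768 MB (minus a small reserve), instead of the ~256 MB default
            System.out.println("Max heap (MB): " + maxHeapMb);
        }
    }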

Kubernetes (YAML Land)

Part 2, Kubernetes, or as I like to call it, YAML land. Anybody here familiar with the vertical pod autoscaler? Who is familiar with the horizontal pod autoscaler? The horizontal pod autoscaler is great. It works for almost everything. It’s the classic throw-money-at-the-problem: you just put in more computing power and scale out your application. When I see keynotes, people say, we have 200 billion transactions in our system. OK, tell me how many CPUs you have. Nobody says that. Tell me how many cores are running behind the scenes. How many cores per transaction are you actually spending? That is the true scaling of your system. From a business perspective, 200 billion transactions are great. I get a lot of revenue, but what’s your margin? Nobody will tell you the CPU per transaction, because that will give people an idea of the margin. For us engineers, thinking about cost and about scaling, that is actually important. The horizontal pod autoscaler is great. You can scale out based on rules, but it’s not a silver bullet. I’ll give you an example.

There is a story of this company in Latin America. They were complaining about Java workloads. They were saying, this is slow, it takes too much memory, too much CPU. We have to have 20 replicas of the microservice because it doesn’t scale well. I was like, if you’re running 20 replicas of a JVM workload, there’s something fundamentally wrong in your understanding of the JVM runtime and how it behaves. If you don’t give enough resources to the JVM, what happens? The parts of the JVM runtime that the developers don’t touch, like garbage collectors and JIT compilers, will suffer. If those are suffering, yes, scaling out is a great solution, but it’s not the most effective solution in many cases. Then this company went ahead and migrated to Rust. A great alternative to Java, but it required six months of migration work. The funny thing is, the performance issue could have been solved in a day just by properly understanding the JVM, tuning the JVM, and redistributing the resources in their cluster. No, they chose the hard route because they actually wanted to code in Rust, because it’s fun. I respect that.

The vertical pod autoscaler is an interesting technology in Kubernetes, where especially now with version 1.27+, you have something called InPlacePodVerticalScaling. It allows the pod to increase the amount of resources of a container without restarting the container. It’s important that the runtime running inside understands that more resources were given, and the runtime has to be able to take advantage of that. The JVM today still doesn’t have that capability fully, but it’s in the works.

An interesting thing related to the vertical pod autoscaler is something that Google offers on their GitHub, and that I can use on Azure as well, called kube-startup-cpu-boost. It allows a container to have access to more resources up to a certain time, up to a certain policy, up to a certain rule that you put down in your 1500 lines of YAML; the magic will work, and you’re going to have a JVM that, for example, has more access to CPU and memory to start up, but then you can actually reduce CPU and stabilize. Because the JVM will do significant work in its first few hours, when the system is being hit and the JIT compiler is working and optimizing the code just in time. Then after a while, the CPU usage doesn’t change much. It actually goes down. That’s when you figure out, this is how much time I can give that workload the CPU boost. Search for kube-startup-cpu-boost. You’re going to have a nice experience.

What else do we have here? Java on Kubernetes. What is the main issue that people here who are running, monitoring, or deploying Java on Kubernetes have? What is your main issue? CPU throttling is a major problem. You have an application that isn’t given enough CPU, but there is a GC in there. There’s a JIT compiler in there. There are certain elements of your runtime stack that will require CPU time beyond what your business application is doing. I’ll give you an example. Let’s walk through it, in case you don’t understand CPU throttling and how it impacts the JVM, and even other runtimes with garbage collectors like .NET, Node, and Go; they have garbage collectors, maybe not at the same scale as the JVM, but it does have an impact. Let’s say you set the CPU limit to 1000 millicores, and that’s in a system with a CFS period of 100 milliseconds, by default. The application has access to 1000 millicores of CPU time for every 100 milliseconds. There are six HTTP requests at the load balancer, nginx, or whatever.

Those requests are sent to the CPU to be processed by the JVM when they come in, and they are processed across the four processors in the node. Why is that? Because the 1000 millicores are about CPU time, not about how many processors I have access to. When those threads ran, each request consumed 200 millicores. In total, they consumed 800 millicores. Those 800 millicores were consumed within 20 milliseconds. Now I only have 200 millicores left for the remaining 80 milliseconds, until the next period when I can use 1000 millicores again. What happens in those 20 milliseconds? The garbage collector has to work, remove objects from the heap, clean up memory. The GC work totaled 200 millicores. Now I have to wait 60 milliseconds until I can process another request. That is your CPU throttling. When you understand that, you understand that you want your application and your runtime stack to be very clear on how many resources are needed to perform all the tasks in that flow. Now new requests come in, and that’s your latency going up by at least 60 milliseconds for those two requests.
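As a toy illustration of that budget math (the numbers are the ones from the example above; the period and quota correspond to the standard CFS cfs_period_us and cfs_quota_us concepts):

    public class ThrottleBudget {
        public static void main(String[] args) {
            int quotaMillicores = 1000;      // CPU limit: 1000 millicores per period
            int periodMs = 100;              // default CFS period: 100 ms
            int perRequestMillicores = 200;  // cost of each request in the example
            int requestsProcessed = 4;       // four run in parallel on the node's four processors
            int gcMillicores = 200;          // garbage collection work in the same period

            int consumed = requestsProcessed * perRequestMillicores + gcMillicores;
            int remaining = quotaMillicores - consumed;
            System.out.println("Budget left this period: " + remaining
                    + " millicores of the " + periodMs + " ms period");
            // Once the quota is exhausted, every runnable thread is throttled until the next
            // period starts, and that waiting shows up directly as added request latency
        }
    }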

We covered this already: up to 1000 millicores is one processor, 1001 to 2000 millicores is two processors, and so on. There is this little flag in the JVM that you can use for whatever reason, called ActiveProcessorCount. Why is it an interesting flag? Because you can say, my application has a 1500-millicore limit, but I’m going to set the processor count to three. That will tell the JVM that it has access to three processors, so it will size thread pools internally according to this number. How many threads can I process at the same time? Three? Sure, I’ll size my thread pools based on that. If you have a strongly I/O-bound application, this is where you may want to use it despite having a smaller CPU limit. Most microservices on Kubernetes are I/O bound, because it’s just network: receiving requests from the load balancer, sending queries to the database, calling another REST API, receiving another REST API call, it’s just network I/O.
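To see why that flag matters, here is a minimal sketch of the kind of pool sizing most libraries do internally; with -XX:ActiveProcessorCount=3, availableProcessors() reports 3 even under a 1500-millicore limit:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolSizing {
        public static void main(String[] args) {
            // Thread pools, fork-join pools, and many frameworks size themselves from this value
            int cpus = Runtime.getRuntime().availableProcessors();
            ExecutorService workers = Executors.newFixedThreadPool(cpus);
            System.out.println("Sized worker pool to " + cpus + " threads");
            workers.shutdown();
        }
    }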

At Microsoft, we came up with this flow of recommendations, or better starting points, for Kubernetes. It depends on how much you have in terms of CPU limits. I have found customers saying, no, our pods in Kubernetes can only have up to 2000 millicores. I find that crazy, but there are cases like that. If that is the case, here’s some guidance on where to get started: start with this instead of the defaults in the JVM. Then, from this point, start observing your workload and fine-tune the JVM to your needs. Always have a goal. Is your goal throughput? Is your goal latency? Is your goal cost, resource utilization? Always have a goal in mind when tuning the JVM, because there are garbage collectors for latency. There are garbage collectors for throughput. There are garbage collectors that are good in general, for everything. There are garbage collectors that are good for memory. You always have to keep in mind that you have a goal to address.

Resource redistribution. Remember when I mentioned the case of the customer with 20 replicas, 30 replicas, just growing for the same workload? I did a benchmark years ago when I started all this research, and I came up with this chart where I put a Java workload on Kubernetes and ran four scenarios. The green line and the green bar represent six replicas of the workload. You can see that the latency, my p99, is really bad. It’s not even good at p99. Then it gets even worse at p99.9. The throughput is ok, 2.5 million requests. Then I thought, what if I reduced the replicas and gave each one more CPU and memory? Technically, I’m still using the same amount of memory and CPU. I’m scheduling the same amount, but I’m reducing the number of replicas so that my language runtime has more opportunity to behave properly. The throughput went up, and the latency improved. There is a correlation between resource allocation and runtime performance.

Then I thought, what if I could save money as well? What if I could reduce resources now that I have figured out that resource redistribution helps? I came up with two replicas, the blue line, two replicas with two CPUs. I went down from six CPUs to four, from 6 gig to 4 gig total. I improved the performance. My throughput is better than originally. My latency is better than originally. It’s not the best, but it’s better. It costs less. That goes to show that if you have systems with dozens of replicas, you have an opportunity today to do a resource redistribution on your Kubernetes cluster. This is applicable to any language. I heard the same story from .NET folks, from Go folks, and so on.

I’ll give you some examples in practice. This is looking at an Azure Kubernetes cluster. This is a cluster of six VMs, six nodes. I deployed the same workload about 18 times, and it cost me $840 a month to run this scenario. What should I do first? I’m going to merge a few pods, and that should give me better performance on those pods, on those nodes. Then I’m going to continue to apply this rollout to more of them. Instead of having lots of replicas on the same nodes, I’m going to have just one replica per node. If you are into the idea of writing a Kubernetes operator that does this thing magically for you, please be my guest. I would love to talk with Kubernetes operator experts.

Then, finally, this is performing better than originally. Maybe I should go even further. I’m going to increase my node pool to taller VMs, increase the resource limits of those pods a little bit, and have only three now. I’m still interested in resiliency and all of that good stuff, but now I have standby CPU and memory. The cost is still the same because I’m using the same type of VM, just with more CPU and memory on that type of VM. The cost is still the same, but now I have spare resources for more workloads. From a cloud vendor perspective, it’s great. You’re still paying the same. From your perspective, you can do more. That is the beauty of this approach.

A/B Performance Testing

To finalize, we’re going to get into a land of unproven practices. I’m still hoping that somebody will help me prove this thing. I have this concept called A/B performance testing. We see A/B testing all the time, mostly for features in a system. What if we could do A/B testing for production performance as well? What if we could go and say, I’m going to have a load balancer, and I’m going to route the load to different instances of my application, the same application, but configured differently. This instance here uses garbage collector A. This instance here runs garbage collector B. This instance here has higher resource limits. This instance here has lower memory limits, and so on. Then you also start considering, I’m going to have smaller JVMs with more replicas, horizontal scale, or I’m going to have taller JVMs with fewer replicas on a taller system. You can do that easily on Kubernetes. Here’s an example of how I can do the 2by2, 2by3, 3by2, 6by1. I put that behind nginx, and I do round robin. I tried the least-connections pattern for nginx. It’s tricky. It really depends on your workload.

For benchmark purposes, I used round robin. The other scenario, which is actually a lot easier to use for benchmarking, is garbage collector configuration and tuning. I have deployments of the same application, but with default JVM ergonomics, G1 GC, and Parallel GC. That’s how I configured it. This one is actually interesting with least connections, because you want to see which GCs work really fast and don’t pause the application too much, so you can process more connections. For my benchmark purposes, I still used round robin. When you combine all that, you will have something like this, at least on Azure. Here I have the Azure dashboard. You can see here the inspect endpoint, which is just a JSON path; it returns me a serialized JSON. And the prime factor endpoint, which I used to emulate a CPU-bound workload. Then we can see the roles. In the roles, that’s where I can see 2by2, 2by3, and then I can see which one is performing better.

This is the last 24 hours. I can make it a bit bigger. Here I have the other profiles. I don’t want to spend too much time here, because this was just to generate data to show you. If we compare these cases here, it won’t clearly show that this one is better than that one. It won’t be clear because, first of all, my use case is definitely not like yours. You have to understand your scenario. I’m showing you what you can put into practice so that you can observe in production how things behave differently.

I’ll give you the bread and butter, where I do a live thing, at least. Here, I have the 2by2 and 2by4 running, and I’m going to trigger this kubectl exec to get into the pod. I have a container in a cluster, and I just access that container in this bottom-right shell. I’m inside a container in this cluster, and now I’m going to run the same test as I did before, and it’s going to be prime factor. I’m going to trigger this thing against nginx. nginx will route it to those different deployments, different topologies. I have 20 threads and 40 connections. Let’s go to the dashboard, and let’s actually use the live metrics, which is a nice, fancy thing for doing live demos. Here I have the requests coming in on the right side, the aggregated request rate and everything from this application. This application is deployed with the same instrumentation key. It’s different deployments, but it’s the same container image. I’m giving it, of course, different environment variable names.

When I go to role, I can see the roles of the deployments. I’m going to show you later how I define that in the code, but it’s just a parameter for Azure Monitor. Who’s performing better? Again, hypothetical. Here I have all the pods, 2by2, 2by2, 6by1, 6by1, 6by1. For this case here, it doesn’t really matter who’s performing better, because I haven’t hit the peak performance need for this workload. You can see the CPU usage is not above 50%, so it’s pretty good. It goes to show you how you can compare in production different topologies and different GCs and different JVM tuning flags, different parameters. That will give you a much better opportunity to evaluate production performance load with your application still working just fine. Sometimes I have issues with doing performance testing in the lab, just because it doesn’t mimic exactly production customer workload. This is an opportunity for the customer themselves to do that test.

What Have We Learned?

The main takeaways: reduce the size of container images, but think primarily about security, not about size itself, unless size is the problem. Track down whether downloading the image onto the node is a problem. We have 1-gigabit speeds, and your container registry is in the same data center as your Kubernetes cluster. Does it really matter? If it does, yes, reduce the size. Security should be the primary focus, in my opinion. Startup time: optimize the JVM for startup time. There are lots of technologies and features in the JVM these days that can make your application fly in terms of startup time. CDS, class data sharing, is the main thing for most versions. If you’re on Java 11, Java 17, Java 21, class data sharing is available in those versions. Take advantage of that. Evaluate Project CRaC and Project Leyden as you think about modernization in the near future. JVM tuning: understand your runtime.

For any language, understand your runtime defaults, understand your runtime capabilities, and take advantage of them. Observe as much as possible: observe memory, CPU, garbage collection, JIT compilation. All of those things can be measured in production. It’s fairly easy. Understand the impact of resource constraints on your runtime stack, and make sure that you are giving the runtime enough to behave properly. Horizontal scaling is not a silver bullet. It’s just throwing money at the problem. Take advantage of vertical scaling as well. Finally, A/B performance tuning in production. It’s going to be the next big thing after AI. Consider that as well, especially in staging. If you have a nice staging or pre-production environment, that’s a great opportunity. If you’re interested in what else Microsoft is doing for Java, go visit developer.microsoft.com/java.

Questions & Answers

Participant 1: I’m curious if Microsoft is looking to add CRaC support to Microsoft’s distribution of OpenJDK.

Borges: We are researching that. We are working with internal teams. I actually just emailed a follow-up on my conversation with the teams to see the status of their research on which projects they want to test CRaC with. CRaC is a nice feature that Azul is working on. There is one little thing that can complicate matters, especially if you’re not using a standard framework like Spring, which is that it does require code to be aware of the checkpoint/restore flow: I’m going to checkpoint, I’m going to restore. You have to shut down a lot of things. You have to shut down thread pools, database connections, in-flight transactions, and all of that, before you can take a checkpoint snapshot. Then, when you restore, you have to start those objects again. Spring has that implemented for you, and I think Quarkus is working on it in the meantime. If the framework has the capability, great. We know how creative enterprise customers are at coming up with their own frameworks in-house. CRaC will require at least that the framework team builds that capability into the application framework. We are looking into it.
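As a minimal sketch of what that checkpoint awareness looks like at the code level, assuming the org.crac API that CRaC-enabled JDKs and frameworks build on (the thread pool here just stands in for any live state such as connections or in-flight work):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.crac.Context;
    import org.crac.Core;
    import org.crac.Resource;

    public class CheckpointAwarePool implements Resource {
        private ExecutorService pool = Executors.newFixedThreadPool(8);

        public CheckpointAwarePool() {
            // Register so the runtime calls us around checkpoint and restore
            Core.getGlobalContext().register(this);
        }

        @Override
        public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
            pool.shutdown(); // quiesce threads and in-flight work before the snapshot
        }

        @Override
        public void afterRestore(Context<? extends Resource> context) throws Exception {
            pool = Executors.newFixedThreadPool(8); // rebuild the state after restore
        }
    }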

Participant 2: About the current JSRs that are in progress, which one do you think will most affect Java performance in production?

Borges: Not JSRs specifically. JSR stands for Java Specification Request. There are no JSRs specifically for enhancing the JVM for these problems. There are projects and conversations in place by Google, Microsoft, and Oracle to make the heap of the JVM dynamic, growing and shrinking as needed. That capability will allow the JVM to take advantage of things like InPlacePodVerticalScaling in particular. Oracle is working on ZGC. That’s the other problem: because the JVM has lots of garbage collectors, it’s up to the garbage collector, not the JVM, to define memory areas and how to manage them. Oracle is working on adaptive heap sizing for ZGC. Google has done some work on the G1 GC. We at Microsoft are looking into the Serial GC for that idea.
