Presentation: Unblocked by Design

Todd Montgomery

Article originally posted on InfoQ.

Transcript

Montgomery: I’m Todd Montgomery. This is Unblocked by Design. I’ve been around doing network protocols and networking for a very long time, exceedingly long, since the early ’90s. Most of my work recently has been in the trading community: exchanges, brokerages, trading firms, things like that, so in finance. In general, I’m more of a high-performance type of person. I tend to look at systems that have to be incredibly performant because of their SLAs. That’s not all; I look at other things as well.

Outline

What we’re going to talk about is synchronous and asynchronous designs: the idea of sequential operation, how that impacts things like performance, and, hopefully, a few takeaways of things you can do if you’re looking to improve performance in this area. We’ll talk about the illusion of sequentiality. All of our systems provide this illusion of the sequential nature of how they work. I think it all boils down to exactly what you do while waiting, and hopefully you’ll have some takeaways.

Wording

First, a little bit about the wording here. When we talk about sequential or synchronous or blocking, we’re talking about the idea that you do some operation and you cannot continue to do other things until it has finished. This is more exaggerated when you go across an asynchronous boundary. It could be a network. It could be sending data from one thread to another thread, or a number of different things. A lot of these things make it more obvious, as opposed to asynchronous or non-blocking types of designs, where you do something, then you go off and do something else, and then you come back and process the result or the response.

What Is Sync?

I’ll use the idea of a request and a response as an example throughout this, because it’s easy to talk about. With sync, or synchronous, you send a request, there’s some processing of it, and optionally you get a response, even if the response is simply an acknowledgment that the request has completed. It doesn’t always have to involve a response, but there might be some blocking operation that happens until it is completed. A normal function call works like this. If it’s a sequential operation and there’s not really anything else to do at that time, that’s perfectly fine. If there are other things that need to be done now, or work that could be done elsewhere, that’s a lost opportunity.
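To make that concrete, here is a minimal sketch in Java. The names are hypothetical and the sleep is a stand-in for a real round trip; the point is only the shape of a blocking call:

// A synchronous request: the calling thread can do nothing else
// until the response (or acknowledgment) comes back.
final class SyncExample {
    static String sendRequest(String request) throws InterruptedException {
        Thread.sleep(10); // stand-in for the network round-trip time
        return "response to " + request;
    }

    public static void main(String[] args) throws InterruptedException {
        String response = sendRequest("request-0"); // blocks for the full round trip
        System.out.println(response); // only now can the thread continue
    }
}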

What Is Async?

Async is more about the idea of initiating an operation, having some processing of it, and then waiting for a response. This could be across threads, cores, nodes, storage, all kinds of different things where there is this opportunity to do things while you’re waiting for the next step to complete. The idea of async is really: what do you do while waiting? That’s a very big part of this. Just as an aside, when we talk about event driven, we’re talking about the idea that on the processing side, you will see a request come in; we’ll denote that as OnRequest. On the requesting side, when a response comes in, you would have OnResponse, or OnComplete, or something like that. We’ll use these terms a couple of times throughout this.
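The asynchronous counterpart, again as a hedged sketch (CompletableFuture is just one way to express it in Java): the request is initiated, an OnResponse handler is registered, and the calling thread is free to do other work while waiting.

import java.util.concurrent.CompletableFuture;

// An asynchronous request: initiate, register OnResponse, keep working.
final class AsyncExample {
    static CompletableFuture<String> sendRequest(String request) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(10); // stand-in for the round trip
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "response to " + request;
        });
    }

    public static void main(String[] args) {
        CompletableFuture<Void> done = sendRequest("request-0")
            .thenAccept(response -> System.out.println("OnResponse: " + response));
        System.out.println("doing other work while waiting"); // the key question of async
        done.join(); // demo only: don't let the JVM exit before the response arrives
    }
}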

Illusion of Sequentiality

All of our systems provide this illusion of sequentiality, this program order of operation that we really hang our hat on as developers. We can simplify our lives with this illusion, but be prepared: it is an illusion. That’s because a compiler can reorder, runtimes can reorder, CPUs can reorder. Everything is happening in parallel, not just concurrently, but in parallel, in all different parts of a system, operating systems as well as other things. It may not be fastest to just do step one, step two, step three. It may be faster to do steps one and two at the same time, or to do step two before step one, because of other things that can be optimized. By imposing order on that, we can make some assumptions about the state of things as we move along. Ordering has to be imposed. This is done by things in the CPU such as the load/store buffers, which give you the ability to store things to memory, or to load them, asynchronously. Our CPUs are all asynchronous.

Storage is exactly the same way: different levels of caching give us the ability for multiple things to be optimized along that path. OSs do the same thing with virtual memory and caches. Even our libraries do this, with the ideas of promises and futures. The key is the wait. All of this provides us with the illusion that it’s ok to wait. It can be, but it can also have a price, because the operating system can de-schedule you. When you’re waiting for something and not doing any other work, the operating system is going to take your time slice. It’s also a lost opportunity to do work that is not reliant on what you’re waiting for. In some applications that’s perfectly fine; in others it’s not. And locks and signaling in that path do not come for free; they impose some constraints.

Locks and Signaling

Let’s talk a little bit about that. Locks and signaling introduce serialization, and serialization limits speed-up. If you look at Amdahl’s law, what it’s really saying is that the amount of serialization you have in your system dictates how much speed-up you get by throwing machines or processors at it. As you can tell from the graph, if you’re not familiar with Amdahl’s law, which I hope you are, it limits your scaling. Even something as small as 5% serialization within a process, and that’s a small percentage compared to most systems, can reduce that scaling dramatically, so that you don’t gain much as you keep throwing processors at it.
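To put numbers on that (my illustrative numbers, not the speaker’s slide): Amdahl’s law gives speedup(N) = 1 / (s + (1 - s)/N) for a serial fraction s, so with s = 5% the speedup can never exceed 20x. A quick Java sketch:

// Amdahl's law: even 5% serialization caps speedup at 1/0.05 = 20x.
final class Amdahl {
    static double speedup(double serialFraction, int processors) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 8, 32, 128, 1024}) {
            System.out.printf("N=%4d -> speedup %.1fx%n", n, speedup(0.05, n));
        }
        // roughly: 1.0x, 5.9x, 12.6x, 17.4x, 19.6x — the curve flattens out
    }
}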

That’s only part of the issue. It also introduces a coherence penalty. If you want to see a coherence penalty in action, think of a meeting with five people in it, and how hard it is to get everyone to agree and understand each other, and make sure that everyone knows what is being talked about. This is coherence. It is a penalty attached to getting every entity on the same page and understanding everything. When you add in a coherence penalty, it turns out that Amdahl was an optimist. The speed-up actually starts to decrease, because the coherence penalty starts to add up and becomes the dominant factor. It’s not simply that you have to reduce the amount of serialization; you also have to realize that there’s a coherence cost. Locks and signaling have a lot of coherence, and so this limits scaling. One thing to realize is that by adding locks and signaling, you are, in effect, limiting your scaling to some degree. It goes even further than that: more threads, more contention, more coherence, less efficient operation. This isn’t always the case, but it often is.
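The "Amdahl was an optimist" observation is what Neil Gunther formalized as the Universal Scalability Law, which adds a coherence term to the contention term. A sketch with illustrative coefficients (σ for contention, κ for coherence; the values here are made up for the demo):

// Universal Scalability Law: speedup(N) = N / (1 + σ(N-1) + κN(N-1)).
final class Usl {
    static double speedup(int n, double contention, double coherence) {
        return n / (1.0 + contention * (n - 1) + coherence * n * (n - 1));
    }

    public static void main(String[] args) {
        for (int n : new int[] {8, 16, 32, 64, 128}) {
            System.out.printf("N=%4d -> speedup %.1fx%n", n, speedup(n, 0.05, 0.001));
        }
        // roughly: 5.7x, 8.0x, 9.0x, 7.8x, 5.4x — past the peak (about N = 31 here),
        // adding processors makes things slower, not just flatter.
    }
}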

Synchronous Requests and Responses

There is actually more to think about. The reason why I’m going through a lot of this is so that you have some background for thinking about this from a slightly different perspective. I’ve had a lot of time to think about it, and from systems that I’ve worked on, I’ve distilled down some things. I always have to set the stage by saying here are some of the things that limit you, and here’s how bad it is, but there are things we can do. Let’s take a look. First, synchronous requests and responses. You have three different requests. You send one, you wait for the response; you send another, you wait for the response; and you send a third, and you wait for the response. That may be how your logic has to work. Just realize that the throughput, how many requests you can do, is limited by that round-trip time. Not by the processing; it’s limited by how fast you can actually send a request and get a response.

If you want to take a look at how our technology has grown: response time in systems does not get faster very quickly. In fact, we’ve very much stagnated on response time. You can look at clock speed in CPUs, for example. If you look at network bandwidth, storage capacity, memory capacity, and somewhat CPU cores, although those haven’t grown as much, their accumulated improvements over time have grown far more than improvements in response time. From a throughput perspective, we are limited. If you take a look at it from a networking perspective, just trying to get data across, this stop-and-wait operation of sending a piece of data, waiting for a response, sending another piece of data, waiting for a response, is limited by the round-trip time. You can definitely calculate it: you take the length of your data and you divide it by the round-trip time. That’s it. That’s as fast as you’re going to go. Notice that you can only increase the data length, or you can decrease the round-trip time. That’s it. You have nothing else to play with.
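Here is that limit worked through with illustrative numbers: 1 KiB messages over a 1 ms round trip can never exceed about 1 MB/s, no matter how fat the link is.

// Stop-and-wait throughput: at most one message per round trip.
final class StopAndWait {
    public static void main(String[] args) {
        double dataLengthBytes = 1024.0; // one message per round trip
        double rttSeconds = 0.001;       // 1 ms round trip
        System.out.printf("max throughput: %.0f bytes/s%n", dataLengthBytes / rttSeconds);
        // ~1,024,000 bytes/s. The only levers: bigger messages, shorter round trips.
    }
}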

You’d rather have something a little bit faster. A fire hydrant is a good example. A fire hydrant has a certain diameter that has a relationship to how much water it can push out, as opposed to a garden hose. Our networks are exactly the same. It doesn’t matter if it’s a network; it doesn’t matter if it’s the bandwidth on a single chip between cores; all of them have the same thing, which is the bandwidth-delay product. The bandwidth is how much you can put in at a point in time; that’s how big the pipe is. The delay is how long the pipe is, in other words, the time it takes to traverse. The bandwidth-delay product is the number of bytes that can be in transit at one time. Notice, you have a couple of different things to play with here. To maximize it, you not only have to have quick request-response, request-response, you also have to have multiple pieces of data outstanding at a time, that N right there. How big is N? That’s a whole different conversation we can have, and there’s some good stuff in there. Just realize that you want N to be more than just 1. When it’s 1, you’re waiting on round trips.
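The bandwidth-delay product with illustrative numbers shows how big N has to be to keep the pipe full:

// Bandwidth-delay product: bytes that can be in flight at one time.
final class BandwidthDelay {
    public static void main(String[] args) {
        double bandwidthBitsPerSec = 10e9; // a 10 Gbit/s link
        double rttSeconds = 0.001;         // 1 ms round trip
        double bdpBytes = bandwidthBitsPerSec * rttSeconds / 8; // 1.25 MB in flight
        double messageBytes = 1024.0;
        System.out.printf("bytes in flight to fill the pipe: %.0f%n", bdpBytes);
        System.out.printf("outstanding 1 KiB messages needed: N = %.0f%n", bdpBytes / messageBytes);
        // N ≈ 1221. With N = 1 (stop-and-wait) you use under 0.1% of the link.
    }
}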

More Requests

The key here, while something is processing or you’re waiting, is to do something, and that’s one of the takeaways I want you to think about. Otherwise it’s a lost opportunity. What can you do while waiting, and how can you make that more efficient? The short answer is: while waiting, do other work. Having the ability to actually do other stuff is great. The first thing is sending more requests, as we saw. The question then is, how do you distinguish between the requests? You have to correlate them. You have to be able to identify each individual request and each individual response and match them up. That correlation gives rise to things which are a little bit more interesting. The ordering of them starts to become very relevant. You need to figure out how to handle things that arrive out of order. You can reorder them; you’re really just looking at the relationship between a request and a response and matching them up, and they can be reordered in any way you want, to make things simple. It does raise an interesting question: what happens if you get something that you can’t make sense of? Is it invalid? Do you drop it? Do you ignore it? Say you’ve sent request 0, and you get a response for 1. At this point, you’re not sure exactly what the response for 1 is. That’s handling the unexpected.
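A sketch of that correlation in Java (hypothetical names, with the transport call stubbed out): each request gets an id, responses are matched back regardless of arrival order, and an unmatched id is exactly the "can’t make sense of it" case.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Correlating requests and responses so they can complete out of order.
final class Correlator {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, Consumer<String>> pending = new ConcurrentHashMap<>();

    long sendRequest(String request, Consumer<String> onResponse) {
        long id = nextId.getAndIncrement();
        pending.put(id, onResponse);
        // transport.send(id, request); // hypothetical transport call
        return id;
    }

    void onResponse(long correlationId, String response) {
        Consumer<String> handler = pending.remove(correlationId);
        if (handler != null) {
            handler.accept(response);
        } else {
            // The unexpected: a response for a request we don't know about.
            // Drop it? Log it? Treat it as a protocol error? Decide explicitly.
        }
    }
}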

When you introduce async into a system, where you’re doing things and going off and doing other stuff, you have to figure out how to handle the unexpected, because that’s what a lot of network protocols actually consist of. How you handle the unexpected is very important. There are lots of things we could talk about here; I just want to mention that errors are events. There’s no real difference. An event can be a success; it can also be an error. You should think about errors and handling the unexpected as if they were events that just crop up in your system.
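One way to take that advice literally, sketched with Java 21 sealed interfaces and records (the names are mine, not from the talk): successes and errors flow through one dispatch path.

// Errors are events: model them as just another message in the stream.
final class Events {
    sealed interface Event {
        record Response(long id, String body) implements Event {}
        record Error(long id, String reason, boolean retryable) implements Event {}
    }

    static void onEvent(Event event) {
        switch (event) { // one dispatch path for success and failure
            case Event.Response r -> System.out.println("OnResponse " + r.id() + ": " + r.body());
            case Event.Error e -> System.out.println("OnError " + e.id() + " (retryable=" + e.retryable() + "): " + e.reason());
        }
    }

    public static void main(String[] args) {
        onEvent(new Event.Response(0, "ok"));
        onEvent(new Event.Error(1, "validator overloaded", true));
    }
}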

Units of Work

The second thing to think about is the unit of work. When we think about this from a normal threads perspective, where we’re just doing sequential processing of data, the work is done between the system calls. Take that same example I talked about, a request and then a response: if you think about it from getting a request in, doing some work, and then sending a response, it’s really the work done between system calls. System call to receive data, system call to send data. The time between these system calls may have a high variance. On the server side, this isn’t all that complicated. It’s when you start to think about it from the other side, where I do some work, I then wait, then I get a response, that the time between them becomes highly varying, which may or may not be a concern, but it is something to realize.

Async Duty Cycle

When you turn it around and look at it from an asynchronous perspective, the first thing you should ask is: what is the work I can do in between? Now it’s not simply just between system calls. It’s easier to think about this as a duty cycle, in other words, a single cycle of work. That should be your first-class concern. I think the easiest way to think about any of this is to look at an example in pseudocode. This is an async duty cycle. It looks like a lot of the duty cycles that I have written, seen written, and helped write: you’re basically sitting in a loop while you’re running. You usually have some mechanism to terminate it. You usually poll inputs. By polling, I mean going to see if there’s anything to do, and if not, you simply return and go to the next step. You poll for input. You check timeouts. You process pending actions. The more complicated work is less in the polling of the inputs and handling them; it’s more in the checking for timeouts and processing pending actions. Those are a little bit more complex. Then at the end, you might idle waiting for something to do. Or you might just say, ok, I’m going to sleep for a millisecond, and come right back. You have a little bit of flexibility here in terms of idling, waiting for something to do.
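Here is what such a duty cycle might look like as a Java sketch. The names are illustrative; agent-style frameworks (Agrona’s Agent, for example) use the same shape, where each pass returns a work count and you only idle when a full cycle found nothing to do.

// A single cycle of work, repeated: poll, check timeouts, process, idle.
final class DutyCycle implements Runnable {
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            int workCount = 0;
            workCount += pollInputs();            // non-blocking: returns 0 if nothing arrived
            workCount += checkTimeouts();         // retries, expirations, heartbeats
            workCount += processPendingActions(); // continuations of earlier steps
            if (workCount == 0) {
                idle(); // nothing to do this cycle: back off
            }
        }
    }

    void stop() { running = false; }

    private int pollInputs() { return 0; }            // stub: drain a queue or socket
    private int checkTimeouts() { return 0; }         // stub
    private int processPendingActions() { return 0; } // stub
    private void idle() { Thread.onSpinWait(); }      // or park/sleep, depending on latency goals
}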

Making Progress

The key here is that your units of work should always be making progress towards your goal. Once you break things down, which is where all the complexity comes into play, you start to realize that the idea of making progress, thinking about things as steps like you would normally do when just listing out logic, is the same. The difference is that you have to think about it as more discrete steps, as opposed to everything being wrapped up together. To give an example, I’ve taken a very simple case. Let’s say you’re on a server, and you get a request in. The first step is to validate the user. If that’s successful, you then process the request. If it’s not successful, you send an error back. Then once you’re done, you send a response. That response may be something as simple as, ok, I processed your request, or it could be that you generate a response. If you turn that into the asynchronous version, you can think about the request as being an event, like OnRequest. The next thing you would do is request that validation. I’ve made this deliberately a little bit more complicated: the validating of the user, in a lot of sequential logic, is another blocking operation. That’s the actual operation we want to look at from an asynchronous perspective.

Let’s say that we have to request that validation externally. You have to send away for it to be validated and come back. This might be from a Secure Enclave, or it could be from another library, or it could be something else, a totally separate system. The key is that you have to wait at that point. You go off and you process other things. Other requests that don’t depend on this step can be processed, other pending actions. There could be additional input for other requests, more requests, other OnRequests that come in. At some point, you’re going to get a response from that validation: it might be positive, it might be negative. Let’s assume for a moment that it’s positive. You would then process the request at that point. That could spawn other stuff, and you send the response. What I wanted to point out here is the lost opportunity. If you simply get a request, validate the user, and just go to sleep in between, that’s less efficient. It’s lost opportunity. You want to see how you would break it down, and that’s where having a duty cycle comes into play. That duty cycle helps you look at this and do other stuff, breaking the work down into states and steps. In the sequential version on the left, you had an implicit set of states that the request went through: request received, request being validated, request validated ok, processing request, sending a response. On the right, those states are now explicit in a lot of cases. Think about it from that perspective: you’ve got those states; it’s just a question of how you manage them.
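Those now-explicit states might be sketched like this (hypothetical names; the validator and response calls are stubbed out). Note that onRequest returns immediately after kicking off validation, so the duty cycle keeps serving other requests in the meantime:

// The sequential version's implicit states, made explicit.
final class RequestHandler {
    enum State { RECEIVED, VALIDATING, PROCESSING, RESPONDING, DONE, FAILED }

    private State state = State.RECEIVED;

    void onRequest(String request) {
        state = State.VALIDATING;
        // requestValidation(request); // hypothetical: send off to the external validator
        // ...then return to the duty cycle; other work proceeds while we wait.
    }

    void onValidateResponse(boolean ok) {
        if (ok) {
            state = State.PROCESSING;
            // process the request, possibly spawning further async steps, then:
            state = State.RESPONDING;
            // sendResponse(...); // hypothetical
            state = State.DONE;
        } else {
            state = State.FAILED;
            // sendError(...); // hypothetical
        }
    }
}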

Retry and Errors as Events

One of the more complicated things to handle here is the idea of: ok, that didn’t work and I have to try it again. Retrying logic is one of the things that makes some of these asynchronous duty cycles much more complicated. Transient errors, backpressure, and load are just some of the conditions you might treat as transient and then try again. Let’s look at this from a slightly different perspective and expand it a little. On a request, you request to validate. You wait, and it’s ok. You process the request, you send a response. That’s the happy path. The not-so-happy path is not the case where the validation comes back and you say, no, can’t do that. It’s where you get an error that basically says, ok, retry.

This does not add that much complexity if you’re tracking it as a set of state changes. You get the request, and as we see on the right there, you request to validate and you wait, the same as before. On validate, if the OnValidateError indicates that it’s not a permanent error, not somebody putting in bad credentials, but, let’s say, the validation system was overloaded, please wait and retry, then you wait some period of time. That is not any more complex than waiting for the response itself; you’re simply waiting for a timeout. The key is that you would then request validation again. You can add things like a maximum number of retries, and things like that. It doesn’t really make things more complicated. This may hide just underneath you in the sequential case, but there it’s just lost opportunity. This is what I mean by making progress: making progress in some form every step of the way. Again, one size does not fit all. I don’t want you to think that one size fits all here. It does not.
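Retry, sketched as just more state (hypothetical names again): a transient OnValidateError arms a deadline, the duty cycle’s timeout check fires it, and a retry cap keeps it bounded.

// Retry on transient validation errors, driven from the duty cycle.
final class RetryingHandler {
    private static final int MAX_RETRIES = 3;

    private int retries = 0;
    private long retryDeadlineNanos = Long.MAX_VALUE; // no retry scheduled

    void onValidateError(boolean retryable) {
        if (retryable && retries < MAX_RETRIES) {
            retries++;
            retryDeadlineNanos = System.nanoTime() + retries * 10_000_000L; // linear backoff
        } else {
            // permanent error (bad credentials) or out of retries:
            // sendError(...); // hypothetical
        }
    }

    // Called from the duty cycle's checkTimeouts() step.
    void checkTimeouts(long nowNanos) {
        if (nowNanos >= retryDeadlineNanos) {
            retryDeadlineNanos = Long.MAX_VALUE;
            // requestValidation(...); // hypothetical: try again
        }
    }
}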

Takeaways

The takeaway here is the opportunity you have when you’re waiting for something external to happen. If you think about it from an asynchronous perspective, we may think a lot of the time that it’s complicated, but it’s not. It’s: what do you do while waiting? Sometimes it’s an easy question to answer, and it leads us down interesting paths. Sometimes it doesn’t. Not all systems need to be async, but there are a lot of systems that could really benefit from being asynchronous and from being thought about that way.

Questions and Answers

Printezis: You did talk about the duty cycle and how you would write it. In reality, how often would a developer actually write that, instead of using a framework that will do most of the work for them?

Montgomery: I think most of the time, developers use frameworks and patterns that are already in existence; they don’t write the duty cycle. I think that’s perfectly fine. I think that also makes it easy to miss the idea of what a unit of work is. To tie that to one of the questions that was asked about the actor model, reactive programming, patterns and antipatterns: what I’ve seen repeatedly, when using any of those, is that the idea of a unit of work is lost. What creeps in is that the unit of work now contains additional, basically blocking, operations. Validation is one that I used here, because I’ve seen multiple times the pattern of: I got a piece of work in, and the first thing I’m going to do is go and block waiting for an external validation, but I’m using the actor model in this framework, it’s supposed to be efficient, and I can’t figure out why it’s slow. I think the frameworks do a really good job of providing a good model, but you still have to have that concern about what the unit of work is. Does that unit of work have additional steps that can be broken down within that framework? There’s nothing wrong with using those frameworks, but to get the most out of them, you still have to come back to the idea of: what is this unit of work? Break it down further. It’s hard. None of this is easy. I just want to get that across. I’m not trying to wave my hand and say this is all easy, or that we should just be more diligent or rigorous. It’s difficult. Programming is difficult in general. This just makes it a little bit harder.

Printezis: I agree. At Twitter, we use Finagle, which is an asynchronous RPC framework that most of our services use to communicate. Sometimes the Finagle team has to go and very carefully tell other developers: you should not really do blocking calls inside the critical parts of Finagle. That’s not the point. You schedule them using Finagle, you don’t block, because if you block all the Finagle threads, that’s not a good idea. We haven’t eliminated most of those. None of this stuff is easy.

Any recommendations out of this, actor model, patterns, antipatterns? Would you like to elaborate more?

Montgomery: I am a fan of the actor model. If you look at the systems that I have out in open source and have worked on, they use queues a lot: the idea of communication, and then processing that is done off of it. I don’t want to say that’s exactly the actor model, but I think that model is easier, at least for me, to think about. That might be because of my background with protocols and packets on the wire, where units of work are baked in a lot. I have very much an affinity for things that make the concept of a unit of work front and center. The actor model does that. Having said that, things like reactive programming, especially the Rx style, have a lot of benefit on the composition side. I always encourage people to look at that, whether it ends up making sense to them or not, as you have to look at various things and see what works for you. I think reactive programming has a lot going for it. That’s why I was involved in things like RSocket, reactive socket, and stuff like that. I think those have a lot of very good things in them.

Beyond that, for patterns and antipatterns, I think learning queuing theory pays for itself. That may sound intimidating, but it’s not. Most of it is fairly easy to absorb at a high enough level that you can see far enough to help systems. Just like learning basic data structures, we should teach a little bit more about queuing theory and the ideas behind it. Getting an intuition for how queues work and some of the theory behind them goes a huge way when looking at real-life systems. At least it has for me, and I do encourage people to look at that. Beyond that, for technologies and frameworks, I think by spending your time looking at what is behind a framework, in other words the concepts, you do much better than just looking at how to use the framework. Using it may be front and center, because that’s what you want to do, but go deeper. Go deeper into: what is it built on? Why does it work this way? Why doesn’t it work this other way? By asking those questions, I think you’ll learn a tremendous amount.

Printezis: The networking part of the industry has solved these problems with TCP, UDP, HTTP/3. What has prevented us from solving this in an industry-wide manner at the application level? How much time do we have?

Montgomery: The way that I think of it, because I’m coming from that side: I spent so much time early in my career learning protocols, learning how protocols were designed, and designing protocols. From my perspective, it is a lesson I learned early, and it had a big influence on me. When I look back at why we haven’t applied a lot of that to applications, it’s because, just as CPUs provide you with program order, and compilers reorder with the guarantee that the critical path you trace through your program in your mind still functions step one, step two, step three, this illusion of sequentiality that we base our mental models on has given us the idea that it’s ok not to be concerned about it. At the networking level, you don’t have any way to not be concerned about it, especially if you want to make it efficient. I think as performance becomes a little bit more important, because of, in effect, climate change, we’re starting to see that performance is something people take into consideration for reasons beyond just the trading community, for example. We’ll start to see some of this revisited, because there are good lessons; they just need to be brought into more of the application space. At least that’s my thought.

Printezis: Any preference for an actor model framework, Erlang, Elixir, or Akka, something else?

Montgomery: Personally, I like Erlang and Elixir from the standpoint of the mental model. Some of that has to do with the fact that as I was learning Erlang, I got to talk to Joe Armstrong, and got to really sit down and have some good conversations with him. It was not surprising to me, after reading his dissertation and a lot of the other work, that it was clearly so close to where I came from, from the networking perspective and everything else. There was so much good there. I haven’t actually used Elixir beyond just playing around with it, but Erlang I’ve written a few things in, especially recently. I really do like the idioms of Erlang. From an aesthetic perspective, and I know it’s odd, I do prefer it.

Akka is something I’m also familiar with, but I haven’t used it in any bigger system. I’ve used Go and Rust and a few others that have pieces of the same things, and it is really nice to see those. It’s very much a personal choice. The Erlang and Elixir thing is simply something that I’ve had the opportunity to use heavily off and on over the last several years, and really do like, but it’s not for everyone. I think keeping an open mind and trying things out on your own is very valuable. If possible, I suggest looking at what speaks to you. Whenever you use a framework or a language, there’s always that thing of: this works, but it’s a little clunky, and this isn’t great. Everything is pretty much bad all over; it doesn’t matter. I find that if you like everything about something, you probably haven’t used it enough. I do encourage people to take a look at Erlang, but that doesn’t necessarily mean you should do that and avoid other stuff. You should try everything out and see what speaks to you.

Printezis: I’ve always been fascinated by Erlang. I don’t want to spend more time on this, but I’ve always been fascinated because I’m a garbage collection person, and it has a very interesting memory management model: you get thread-local GC basically assured by the language, by the way it structures the objects. That’s been fascinating to me.

Project Loom is basically supposed to introduce fibers in Java, which are very lightweight threads. The idea is that you can run thousands of them, and they’re not going to fill up your memory, because not all of them will have a full stack. Because they’re so lightweight, you can run thousands of them and get the best of both worlds: if one of them starts doing some synchronous I/O, another one will be scheduled very quickly. Any thoughts on Java fibers?

Montgomery: Yes, I do. I’m hopeful. I’ve been down this road a couple of times, where the idea of, let’s just have lighter-weight threads, has come up a few times. What tends to happen is we think: this is hidden from me, so I won’t take care of it, or I won’t think about it until it becomes an issue. I don’t think that’s really where we should spend some of that time. I don’t see it as a panacea where all of a sudden the coherence penalty and the serialization go away, because those are inherent in a lot of those designs. It would be very interesting to see how this gets applied to some systems that I’ve seen. I’ve seen systems with 600 threads running on 2 cores, and they’re just painful. It’s not because of the application design, except for the fact that the threads are just interfering with one another. Lightweight threads don’t help that; they can make it worse. It’ll be interesting to see how things go. I’m holding my breath, in essence, to see how that stuff comes out.

Some of the things that have come out of Project Loom that have impacted the JVM are great, though, because there are certain things that I and others have looked at for many years and thought: this should just be better; this is clearly just bad. They have improved a number of those things. I think that’s great. That’s awesome. I’m just not exactly sold on the direction.

Printezis: I’m also fascinated to see where it’s going to find usage and where it’s going to improve things.

One of the most challenging aspects of doing something like an asynchronous design, where you send requests and then get the responses later, is actually error reporting and error tracking. If you have a linear piece of code, you know: ok, it failed here, so I know what’s going on. If you have an exception in the middle of this request, sometimes it’s challenging to associate it with what was going on. Any thoughts on that?

Montgomery: A lot of code that I’ve seen has a big block with a try and a catch of, like, IOException, with a whole bunch of I/O happening inside it; that sequential logic has the same problem. In my mind, it’s about context. It’s really: what was the operation? If it’s an event that comes in, you can handle it just like an event. You might think about state change. I think that’s an easier way to deal with some exceptions in big blocks as well: break it down, and look at it in a different way. In my mind, it makes sense to think of errors as events, which is something I’ve harped on for a number of years now. When you look at systems, errors should be raised at a higher level and handled a little bit better, in context. It doesn’t mean you handle them locally; it means you handle them with the context that they share. It is hard. One of the things that does make this a little bit easier, in my mind, are things like Rx and other patterns where an error arrives as an event that you can deal with slightly separately, which forces you to have a little bit more context for it.
