Presentation: How Rust Views Tradeoffs



Klabnik: I’m Steve, and this is a talk called “How Rust Views Tradeoffs.” I am on the Rust core team. I’m currently in between jobs. I used to work at Mozilla on Rust, but have recently left and I don’t know exactly what I’m doing yet. We’ll see what happens, but I’m still going to be heavily involved in Rust development even though it’s not my day job anymore.

This is a little overview of what I’m going to be talking about today. This is the “Choose the right language for the job” track. I wanted to talk about how Rust views tradeoffs and what would make you choose Rust or not, and also just give you some ideas for when you think about tradeoffs. We’re going to start off by talking about what a tradeoff even is in the first place, and then we’re going to talk about Rust’s “bending the curve” philosophy, which is how Rust approaches thinking about tradeoffs. We’ve got to talk about that first, and then we’re going to go into a small digression about how things get designed in the first place and this concept of values. Then lastly, we’re going to go over four different case studies in areas where Rust has had to choose some tradeoff, why we picked the thing that we did, and the thing that we didn’t do.

Briefly, before I get into the rest of the talk, I want to thank Bryan Cantrill. A large part of this talk is based on a framework that he came up with. There’s this talk called “Platform as a Reflection of Values” from Node Summit 2017, and it’s about the story of Joyent and Node.js and io.js and all that stuff. He’s the one that really got me thinking about a lot of the stuff that ends up being in this talk, so I wanted to make sure to give him credit. You should watch his talk, it’s good.

What Is A Tradeoff?

Before we talk about tradeoffs, it’s important that we all agree on what a tradeoff actually is. I am an amateur philosophy enthusiast, and one of the hardest problems when communicating with others is making sure that you’re not talking past each other. You have to agree on what words mean by using words, which is complicated. We’re going to get into what tradeoffs are before we talk about the specific tradeoffs in Rust.

Everyone always does this, and it’s boring and dumb, and I apologize, but the dictionary defines tradeoff as, and this is a little more interesting, I swear, “a balance achieved between two desirable but incompatible features; a compromise.” Now, the first interesting thing, I think, is the example sentence: “a trade-off between objectivity and relevance.” That’s an interesting example of a tradeoff: you can be objective or relevant, because you have to be subjective to apply to reality. I’m not sure I agree with that, but the reason I decided to put this on the slide is not because of what the dictionary says, that’s such a dumb meme, but because of the thing right below it on the webpage if you Google this, which I thought was really interesting: this use-over-time graph. I was like, “Wait a minute, just after 1950 is the first time we thought of the concept of tradeoffs?” I went into a deep rabbit hole. You know how it is: you’re supposed to be working on your slides, and instead you’re like, “I need to look up the linguistics of the word tradeoff.” It turns out the reason the graph looks like that, when you could see it, is that obviously the concept of a tradeoff did not start in the early 1960s, but it used to be written as two words, “trade off,” and we only started putting them together into one word around the 1960s. That’s why the graph looks like this. Obviously, people used the words “trade off” and had these discussions earlier; I just thought that was interesting. That’s why the dictionary thing is still up there: language changes over time, and it is cool.

I got a bachelor’s in computer science, and one of the things that gets beaten into you, at least in the States, is “There are tradeoffs and you have to deal with them.” Here are three examples of classic tradeoffs in computer science or programming that you might have dealt with before. One of the biggest ones is space versus time; if you google “computer science tradeoff,” everyone’s like, “That’s the space-time tradeoff.” It’s the vast majority of things on the web, apparently. The basic idea is that you can either make something fast or you can make it small, and these two things are at odds with each other, so that’s a common situation when you’re designing data structures or talking about the network. The second one is throughput versus latency; these are two different measures of network activity. Throughput is how much stuff you can do in a given span of time, and latency is how long it takes you to do the thing. These are two things where often you can get good at one or the other.

Then, finally, a big classic one: dynamic types versus static types. I wrote Ruby versus Haskell because I was trying to think of the most trolly way possible to frame this dichotomy. I actually have a ruby tattooed on my body; I used to do Ruby before Rust, and I was a very enthusiastic dynamic typing person. For the last couple of years I’ve been going around saying, “I have a ruby tattooed on my body. That’s how much I love Ruby.” Then my friend Sean Griffin had his first-born daughter, and he named her Ruby. He one-upped me: “You think a tattoo is the biggest commitment?” I was like, “Fine, Sean. You win.”

Dynamic versus static typing and tradeoffs. These tradeoffs are great and complicated, and they’re a core aspect of our discipline, and they’re also core to another aspect of our discipline, which is arguing on the Internet. People argue over which side of these tradeoffs is the right thing to choose, and in some ways that’s dumb, but I also think that tradeoffs are the core of what we do. If we think about programming as an engineering task, fundamentally, you have to make choices between things sometimes. That’s important.

Bending the Curve

Rust has this interesting approach to tradeoffs that we call “bending the curve.” This is an attitude that got instilled in the Rust developers fairly early on. I’m not really sure who started thinking about things this way, but it’s how we approach the problem of “You have these two things. Which do you pick?” Let’s talk about that a little bit. When I was making these slides, I felt very clever, because I looked at these three tradeoffs and I was like, “Wait a minute, one of these tradeoffs is actually different from the other two.” I don’t know if you have any ideas about which one might be the different one here, but I think I made a slide. Since I don’t have my laptop, it’s totally blank, so we’re going to have a little bit of fun. Did I guess my next slide correctly? Yes.

Throughput versus latency. This is often a function of physics, but that doesn’t necessarily mean it’s always a tradeoff. You can sometimes have systems that have more throughput and less latency than a different system; it’s not always inherently a tradeoff. At some point physics and wires and stuff I don’t fully understand comes into play, but this is an interesting entry point into the idea that these things don’t always have to be at odds with one another, even when they’re usually presented as a tradeoff. If we think about the other ones, this also becomes true.

More dictionary definitions, I’m sorry, bear with me briefly. Instead of the dictionary, I’ll quote what Wikipedia has to say: “a situational decision that involves diminishing or losing one quality,” or whatever, “in return for gains in other aspects.” Basically, one thing goes down, and another thing goes up. If you think about this a little more generally, this gets weird with these things I presented to you as tradeoffs. A lot of times, something that’s smaller is actually faster. If you do high-performance computing and you can fit your computation into your L1 cache, it becomes massively faster. A common instance of what normally might be a space-versus-time tradeoff in compiled languages is: do you inline a function or not? The argument there is that if you inline the function everywhere, your binary gets larger, so that’s a size issue, but you get faster. Whereas if you do dynamic dispatch instead, your binary is smaller, because you only have one copy of the code of that function, but you end up being a little bit slower because you have to actually do the dispatch.
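To make that inline-versus-dispatch point concrete, here’s a minimal Rust sketch (the function names are made up for illustration): the generic version gets a separate, inlinable copy per concrete type, while the `dyn` version shares one copy of the code behind a vtable.

```rust
use std::fmt::Display;

// Static dispatch: the compiler generates a separate copy of this
// function for every concrete `T` it's called with, and each copy
// can be inlined at the call site (bigger binary, direct calls).
fn describe_static<T: Display>(value: T) -> String {
    format!("value: {}", value)
}

// Dynamic dispatch: there is exactly one copy of this function, and
// the `Display` method is looked up through a vtable at runtime
// (smaller binary, one pointer indirection per call).
fn describe_dynamic(value: &dyn Display) -> String {
    format!("value: {}", value)
}

fn main() {
    // Both strategies produce the same result; only the code-size /
    // call-speed tradeoff differs.
    assert_eq!(describe_static(42), "value: 42");
    assert_eq!(describe_dynamic(&42), "value: 42");
}
```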

In other situations, like in this idea with the cache, these two things aren’t at odds and in fact, smaller is often faster. This is also true, for example, if you’ve ever had to download jillions of gigabytes of JavaScript to load a webpage. You’re like, “Man, if this JavaScript was smaller, my page would load faster.” A lot of stuff around the web is about making things smaller so they can be faster. Sometimes this is not actually a true dichotomy.

Dynamic versus static typing. The non-trolly resolution to this is gradual typing: you can have things where you start off a little statically and get more dynamic, or vice-versa. Has anyone ever heard people describe dynamic languages as “unityped” languages before? There’s an argument that all languages are statically typed, because types are only a compile-time construct, and languages that are dynamically typed just have one type, and everything is that one type, so, “Ha-ha. You still have static types.” But that’s totally useless and only good for making other people mad.

Gradual typing is a better example of how static versus dynamic isn’t a real dichotomy. In fact, even in most languages that give you static types, there’s some facility for dynamic types, and we’re seeing an increase of this in dynamic languages. Python has this thing called mypy, and type annotations are in the language now, which lets you annotate just the functions that you want to be annotated with types, and your stuff gets checked and all that goodness. It’s not exactly a pure dichotomy tradeoff.

Then, finally, as I already mentioned, with throughput versus latency you can sometimes do better on both. That doesn’t always mean you can, but these are actually different measurements: throughput is about amount over time, and latency is about the distance to an individual thing. It’s weird to even argue that they’re against each other, because they’re just fundamentally not measuring the same thing. So with this idea that tradeoffs don’t have to purely be one thing goes up and another goes down: if you have two options, like this or that, and you see my amazing presentation drawing skills here, get ready, you might think that you can choose something that’s here or something that’s there and you have to pick one or the other, but in reality, this is more of a gradient, and so you could choose this instead.

It’s a little bit closer to one than the other, but bending the curve is about the idea of “What if you could pick this instead?” This is the best I could do with a curve, I’m sorry. I used to write all my slides in HTML and CSS and stuff, but I’m really bad at CSS, so I don’t know why I kept doing that to myself, and then PDF export is terrible, and this conference loves PDFs for your slides, so I just did Google Slides and it’s fine. Even with just the idea of grabbing the middle and drawing it up towards a different thing, we can do other stuff with tradeoffs than just look at two different options. We can get unique solutions by thinking out of the box a little bit. I hope that was an enterprise-y enough sentence for you all.

The way that we often think about tradeoffs, as I mentioned earlier, is one thing increases, another thing decreases. This is more commonly known as a zero-sum game, if you do any game theory or economic theory, or so Wikipedia tells me. A zero-sum game basically means that when you add up everyone’s scores, they end up being zero: for me to win, you need to lose, and vice versa, although of course, I’m going to be the one winning, that’s the idea. The problem is that when you think this way, you start believing that other people inherently have to lose for you to win. It turns out that outside of economic theory and game theory, which are theories, this isn’t actually true very often. In most situations, including programming language design, you can design things in a different way, as a win-win game. This is a game in which everyone wins, or everyone loses. We like to focus on the everyone-wins part.

It’s not really inherently about you must lose so that I may win. It’s about trying to figure out, “Is there a way that we can both win?” A win-win strategy means that you try to figure out how that goes. This idea of bending the curve is fundamentally about, when we look at tradeoffs, trying to figure out a way that we can have both things at the same time. Bending the curve really boils down to “Is this tradeoff fundamental?” Sometimes it’s absolutely true that there are situations in which someone has to lose for the other person to win, but a lot of times we get too obsessed with that idea, and we apply it where it doesn’t have to apply. Can we be a little creative and instead have everyone win? This works way more often than you might think.

Design Is about Values

Before we get into the case studies about what Rust actually did, given this approach of trying to find win-win solutions instead of “you need to lose, and I need to win,” we need to talk about the game that we’re actually playing. In order to talk about that, we have to think about the concept of design in the abstract: “What is the task?” When I’m the person in charge of designing a language or designing a system, what is the thing I’m even trying to do? What is that activity? If you know any architects, you may ask yourself, “What is their job, really?”

This is the part that largely relies on Bryan’s work. Fundamentally, design is about values. When you’re thinking about a system and you’re thinking about building it, you need to understand what is valuable to you, and beyond that, you really need to think about it in a more fine-grained way than just “What do I care about?” You need to think about what your core values are. That is, what is the stuff you are absolutely not willing to give up on? What is the hill that you will definitely die on, for sure? Then there are also secondary values: things that would be nice to have, but if you don’t get them, it’ll be ok. This is often a little more complicated, because a lot of people think that you can never compromise on anything, and I definitely am that person sometimes, but a lot of the creative process actually sits in this space: when you’re willing to give a little, what do you get back for it? You would think that having a lot of core values is very useful, and they are useful in some situations, but they’re not as useful as things you’re actually willing to trade away for something else. Having a lot of secondary values is also pretty good: what stuff do you want to have, but don’t necessarily need to have?

Then, finally, what stuff do you just not care about whatsoever? Identifying this is really important too because it means that if there’s something that you do really want and you have a thing you’re willing to give away, it’s really easy to get that thing if you can figure out how to do that particular tradeoff. Being explicit about the things that you don’t care about can be just as important as caring about the things you actually do care about.

Let’s talk about Rust’s core values when it comes to designing things. Now, I will say that I am not on the language design team anymore, it’s complicated, I’ll get into the history a little bit later. This is what I see; please take this as my personal interpretation. Rust is designed by a lot of people, and so I’m not saying that they necessarily 100% agree with me. That’s another funny part about design: you get to argue about things with lots of people. When I look at Rust’s core values, I see these three things as being what Rust cares about a lot. I mention that they’re in a particular order because the funny thing about core values is you also need to decide, “If these things come into conflict with each other, what do I actually pick?”

The thing that Rust cares about above all else is memory safety, and there are historical reasons for this, largely because Rust’s whole idea is “What if we could make a C++ that’s actually memory safe?” If you were to give up memory safety, it’d be, “What are we doing here? The whole point of the enterprise is wrong.” Rust also really cares about speed, but Rust cares about safety more than speed; this is why I said they’re in order. Historically speaking, these two things are at odds. If there’s a situation in which we need things to be safe but we have to give up a little bit of speed, we will do it, but because speed is still a core value, we will try our damnedest to find some other way to make that happen. Every once in a while there’s a situation where that’s not actually possible.
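As a tiny sketch of how Rust gets safety without giving up speed, here’s the kind of aliasing bug the borrow checker rejects at compile time rather than at runtime; the commented-out line is the one the compiler refuses (this is an illustrative example, not from the talk):

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // shared borrow into the vector's storage

    // v.push(4); // rejected at compile time: a push may reallocate
    //            // the buffer and leave `first` dangling, which in
    //            // C++ would be a silent use-after-free

    println!("first = {}", first);

    // Once `first` is no longer used, mutation is allowed again.
    // The check happens entirely at compile time, so the running
    // program pays no cost for it.
    v.push(4);
    assert_eq!(v, vec![1, 2, 3, 4]);
}
```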

Then finally, I put productivity here, which is a little bit of, I don’t want to say a weasel word exactly, but it’s a little hard to define what productivity means. Rust cares a lot about being a language that is useful to people. You’ll see this expressed differently in the things Rust doesn’t care about later, but basically Rust wants to make decisions that will make it be a thing where you want to use Rust. That sentence is terrible, I’m sorry. Programmers need to use a language, and Rust is a language that wants to be used; there are some languages that don’t want you to use them, surprisingly enough. We’ll get into that, and that’s totally fine, it’s not a judgment about values, it’s about your judgment of your own values. These are the core things that Rust really cares about.

Rust’s secondary values are the things that we would like to have but are willing to give up a lot of the time. The first is ergonomics: in order to achieve safety and speed, Rust has some stuff that makes it a little harder to use. I’m getting ahead of myself, but we’ll give up that ease-of-use sometimes to achieve those other goals, though we still would really like it to be as easy to use as possible. Another one, and this is unfortunate if you’ve ever used Rust, I’m sure you’re not surprised that compile times are a secondary value. The Rust compiler is slow, it’s a problem, we are working on it, but we care more about the final speed of binaries than we do about making the compiler fast. We will always make your program faster even if it makes your compile time slower, and that’s just what it is. That said, after this talk I’m about to post the proposal for this year’s Rust roadmap, and one of the major features of it is “How are we going to make the compiler faster?” We do care about this, and we want to get it done, but we give it up maybe a little more than we should sometimes.

Then this is interesting: correctness is a secondary value. What I mean by this is that Rust really cares that your programs are right, but we’re not willing to go full dependent types, proof assistant, make sure your code is proven right. It should be right, but we’re not going to make you prove that it’s right. The reason it’s a secondary value is that we’re willing to give up a little bit of correctness sometimes in exchange for, for example, ergonomics. Proof assistants are really hard to use, and I don’t expect that many of you in this room have even used one, let alone are comfortable using one. You have to give up a little bit of those correctness tools in order to achieve ergonomics and productivity.

Things that Rust does not care about. I think this first one might be a surprise to a lot of you, but it’s actually in the name: blazing a new trail in programming language theory. The name “Rust” evokes a lot of different things, and there’s actually no one reason why Rust is named “Rust.” The guy who made it originally, Graydon [Hoare], used to just come up with a new reason whenever someone would ask him. There are six different reasons out there, but one of them is that Rust’s programming language theory technology is 2000-to-2006-era technology. It just happens that most of the languages we use today were made in 1995. Rust seems like this really big conceptual leap forward, but if you talk to somebody who’s trying to get their PhD in PLT, they’re going to be like, “Rust is real boring.” A lot of the tech that Rust is built on is actually pretty old, and so Rust is not trying to be a research language. We’re not trying to push programming language theory forward, mostly. Some of the people on the language team might disagree with me a little bit, they have PhDs, it’s cool. We do some new stuff, but it’s not a thing we’re trying to do as a goal.

Secondly, worse is better. Rust is not interested in just throwing something out there, hoping it’s good enough, and iterating. We spend a lot of time trying to get things right, so on the New Jersey versus MIT side of things, we are more on the MIT side: Rust will spend a lot of time iterating on features until they are good, and we are not willing to just throw stuff out there. The way that you can see this is in our standard library. Rust has a very small standard library, and that’s because we’re not totally sure that we have great libraries for some things yet. We’re not going to put an XML parser in the standard library unless we think we’ve got a great XML parsing library, because the standard library is where libraries go to die, and that’s no fun for anyone. We tend more towards the get-it-right side than the throw-something-out-there side of things.

Then, a last one, which is interesting for a systems language: supporting certain kinds of old hardware. A specific example of this is two’s complement versus one’s complement integers. If you’re a C or C++ programmer, you may know, I hope, that the representation of signed integers is implementation-defined, and that leads to all sorts of fun shenanigans. That’s because C and C++ were developed in an era where a lot of hardware had different implementations of integers, and so an implementation is allowed to pick one’s complement, two’s complement, or sign magnitude for integers. We basically said, “Listen, literally all the hardware that gets made today uses two’s complement integers, so we’re just going to assume you have two’s complement integers, and you can use a different language if you are programming a machine from the ’70s.”

This hardware-support point is so true that there’s actually a paper right now; the feature freeze for C++ 2020 just passed, but the next iteration, C++ 23, might also declare that it only supports two’s complement hardware, because it turns out that it’s been a long time since anyone’s made one’s complement machines, except for one company, and everyone’s like, “Come on.” Anyway, we’re willing to eschew hardware support for certain kinds of old things; we don’t have those kinds of integer undefined behaviors because we’re willing to just say it’s two’s complement, and that’s fine. That’s a tradeoff that we are willing to make. Those are some examples of our values.
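Because Rust guarantees two’s complement, you can rely on its bit-level identities directly; a small sketch checking a few of them:

```rust
fn main() {
    // In two's complement, -1 is "all bits set."
    assert_eq!(-1i8 as u8, 0b1111_1111);

    // Negation is "flip the bits, then add one."
    let x: i8 = 5;
    assert_eq!(-x, (!x).wrapping_add(1));

    // The asymmetric range falls out of the representation:
    // i8 spans -128..=127, so negating i8::MIN wraps back to itself.
    assert_eq!(i8::MIN, -128);
    assert_eq!(i8::MIN.wrapping_neg(), i8::MIN);
}
```

None of this is portable in C, where a one’s complement or sign-magnitude implementation is (for now) still allowed to behave differently.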

A little bit more about values and design: an interesting thing is that it’s not just you who has values in the system you’re trying to build, it’s also your users. They have a certain set of values for the things they’re trying to accomplish. As a programmer, I think it behooves us to think about not just the values that we hold, but also the values that the people who use our software hold, and, as a programmer, you should use tools that align with your values. I really like programming languages and learning new ones, but there are some that I have seen where I’m like, “You know what? This language is not for me, so I’m just not going to use it.” I’m not going to denigrate any languages by naming them, but it’s true that I would be unhappy if I had to program in some languages, and that’s because they value different things than I value, and that’s totally chill. There are other people who have different values than me; they can use those languages, and they are super happy. That’s literally why we have different languages, it’s fine. I’ve had frustrations with tools where I was forced to use something, and I was like, “Man, this tool sucks,” and then I realized it wasn’t that the tool sucked, it’s that it cared about different things than I cared about. That weirdly made me more okay with using the tool, because I was able to just be like, “I understand why this friction is happening,” and it made that job easier.

In general, those kinds of mismatches can cause those problems. I find that a lot of programmers arguing on the Internet about whether something is great, or terrible, or awesome, or horrible, really comes down to that person having a certain set of values for the things they create, while they’re talking about a thing that cares about completely different things; that’s where a lot of arguments happen. For more on that, watch Bryan’s talk, it’s great.

When should you use Rust? Before we talk about the specific tradeoffs, I figured I would put up some examples of when Rust might make sense for you. If you find these values to be true in the software you write, you may want to use Rust; if not, don’t, it’s cool, there are lots of great languages. I think that Rust is ideal whenever you need something that’s both reliable and performant. Yes, performant is a word, I don’t care what you say. Language changes over time, deal with it. I’ve had a lot of bad arguments on the internet, I’m really sorry, that’s really shaped my worldview in many ways. There are people who care if you use the word performant, and they will get mad at you, and I’m expecting tweets about it later. Performance is important, reliability is important; when you need those two things, you might want to look to Rust.

It’s interesting, because a lot of people are like, “Well, when wouldn’t I care about reliability and performance?” Let’s be serious, think about some systems you’ve built; there have been a lot of them that were not reliable or performant. There are times when you are willing to trade away those things, and that’s totally cool. A lot of the “rewrite it in Rust” meme comes from places that built a system that was not necessarily reliable or performant, then got to scale and realized, “Oh my God, we need reliability and performance,” and they rewrote a portion of it in Rust and were happy. That’s a really great strategy for managing these kinds of tradeoffs, and I’ll talk a little more about that later. Sometimes you need these things, sometimes you don’t, and that’s cool, and yes, as I mentioned with the rewrite stuff, sometimes you don’t need these things immediately. We’ll be here, it’s cool, go write stuff in other things.

Case Study: BDFL Vs Design by Committee

Let’s talk about some case studies. The first couple of case studies are going to be about the design of Rust itself, tradeoffs in that design, and the way we approached the design process, and then I’ll get more specific; we’ll talk about threading models at the end. This is going from broad to concrete. BDFL versus design by committee: this tradeoff involves who is building your system and who gets to make the calls, who’s the decider? One model is the BDFL model, the benevolent dictator for life. They rule over their project with hopefully a velvet fist, not an iron fist, and I hope I’m not mixing too many metaphors. They need to be benevolent or else you’ve just got a dictator, and that’s bad, but if they’re benevolent and generous, it’s probably good. A lot of people like this model, and a lot of programming languages are designed this way.

The other option is “design by committee,” where a bunch of people who are not invested in the system make the decisions. There’s this quote, which I forgot about when I was looking at these slides: “A camel is a horse designed by committee.” I don’t think it’s really fair to camels; I also have a Perl camel tattoo. But when you look up the definition for this, a lot of people think, “Oh, if something’s designed by multiple people, then that also has chances to really go awry.” We have these two options: we let one person make all the decisions, and if they make a bad decision, we’re totally screwed; or we let a lot of people make decisions, and when they make bad decisions, we’re totally screwed. Which one is actually better? How can we do things differently?

Rust never truly had a BDFL, but we went from “one person makes decisions” to lots of people making decisions over time as the project developed. Originally, Rust was a side project by Graydon [Hoare]; he got to decide everything because he was the only person working on it. That’s just what happens: you start a project, you’re in charge. He was always extremely up-front that he was not the BDFL, which made a lot of people go, “That’s a great sign in a BDFL.” Eventually, he gave up his power to a bunch of other people, at which point even more people wanted him to be the BDFL, because they were like, “You’re willing to give it up now, you’re going to be great,” and he was like, “This whole thing makes me uncomfortable. No.” We developed the Rust core team, and so that became a small group of people whose job was to make decisions about the language.

Eventually, we ran into issues of scale; I believe that’s my next slide. We transitioned from having a group of people to having a group of groups of people. Now, the Rust project has our core team, which I’m a member of, but we also have what used to be called sub-teams and are now just called teams. For example, I’m a member of the documentation team as well as the core team; I’m on both, and there are some people who are on only one team. The idea is that all of the teams are actually equally in charge. The Rust core team is more of a tie-breaking organization at this point than a hierarchical thing, but that’s also complicated and weird; we don’t really vote, so we don’t have ties to break, but it’s fine. The important part is that Rust used to have one person in charge, then it had six people in charge, and now it has about 100 people in charge. We’ve changed a lot as this has worked out.

The reason this happened is basically scale. As the project grew, we ran into limits. I was on the core team when it was the only team, and the problem was that it was our job to decide on things. Every week we had a meeting, and we would decide on all the things we needed to decide on, and by that I mean there’d be this big giant list. We’d get through some of them, and the next week there’d be even more added onto the list, and so it just started to grow, and people became frustrated that the core team was becoming a bottleneck. Members of the core team were frustrated, because not every one of those decisions was relevant to every member of the team. If you wanted to talk about variance in our type system, well, I would just read Reddit while those meetings happened; I wasn’t paying attention, but I still had to vote because I was on the team, so that’s weird and dumb. Then, when I really wanted to talk about whether we should choose British or American English for our documentation standards, that was my jam, and the people that have PhDs in type theory were like, “Yes. Whatever Steve says, that’s fine.”

It took so long to get through all these decisions that people would be like, “I’ve been waiting a month for you all to make a decision on my pull request, what’s going on?” We’d be like, “Sorry. We’ve got a lot of stuff to decide.” In order to scale, we decided to make more teams than just the core team. That was just a creative solution to this problem, and it’s been helpful, but it comes with new problems of its own, because now when you have 100 people and 15 teams, they all have to coordinate. I would like to announce that we’re about to make our governance team, which is basically a teams team. Its job is to figure out where the coordination issues between the teams are and help the team structure work; it’s a team-making team, programmers love recursion.

This also brings up one of the problems with BDFL versus design-by-committee that people mention: the BDFL has a grand vision that he toils over like an artist or whatever, and design-by-committee has no taste. One of the problems when you move to multiple people is that you lose this cohesion unless you’re explicit about your design values. We all have to agree on what the principles are that we use to make these decisions. That’s something we’ve been getting a little bit better at: communicating to each other how we make these decisions and why, and dealing with those problems. I don’t want to say that having 100 people run your language is a panacea, because it is not, but it definitely has helped with the bottleneck of having documentation people decide on language features.

Case Study: Stability without Stagnation

I had an argument on the Internet with somebody recently about what stability meant. They’re like, “You added a new API this release, so it’s not stable, stable means unchanging.” I’m like, “Oh God.” Stability means things don’t change, but if you never change, then you’re also not growing. Growth requires some amount of change. You want to make sure that you’re stable enough that your users aren’t dealing with “We changed everything, now your code doesn’t compile.” It’s “Enjoy the new feature,” versus “Sorry, we can’t fix that bug because it’s relied on in production by this large company.” This is a tradeoff that you have to deal with: we want to be able to have change, but we also don’t want it to affect people that don’t want it, opt-in change. We don’t actually think that these two things are inherently at odds.

There’s this blog post called “Stability as a Deliverable.” I’m going to have a couple of little citations from it, but if you want to look it up, it’s on the Rust blog, there’s the URL. I’m sure you won’t type out that URL in the two seconds it takes me to describe this, but you can just Google for it on the Rust blog. This lays out our plan and our approach to stability, and I’m not going to get too deep into the weeds, but basically, stability doesn’t mean that Rust will stop evolving; we want to release new versions of Rust frequently and regularly. In order for people to be able to upgrade to these new versions, it has to be painless to do so. Our responsibility is to make sure that you never dread upgrading Rust. If your code compiles on Rust 1.0, it should compile on Rust 1.x without problems. This is like all the rhetoric around continuous integration and continuous deployment: if you deploy twice a year, you fear deploy week, but if you start deploying every week, you get better at it. If you deploy often, you will be better at deploying, so let’s do that.

We approached this with the language. If we release the language often, then we will do a better job at making sure that we don’t break stuff, because it’s not once a year that we check in with our users to see if we broke all their stuff. This is what we do; we actually copied browsers. Basically, we land stuff on master behind feature flags, and then every six weeks, master gets promoted to Beta, and the previous Beta becomes Stable. If you’ve ever used Chrome or Firefox, you have probably seen this model: every six weeks, your browser’s like, “Hey, a new version of the browser came out.” We did the same thing with Rust, and that lets us do these releases, but things don’t get off of Nightly, they don’t get into a release, until they’re explicitly marked as stable. What that lets us do is experiment with new features on Nightly and actually ship the code and put it into the compiler, but that won’t affect stable users, because you’re not allowed to use Nightly features when you’re on Stable.

This lets you, as a user of Rust, if you want to be involved and try out new features and get goodies while they’re still cooking, do that by downloading a Nightly version of the compiler, trying it out, and giving us feedback. If you don’t want to deal with that, because that’s a pain, then you can use Stable and never have to worry about it, and Stable becomes really easy to update and all those kinds of things. This says what I just said, I’m not going to read slides to you.

What’s the tradeoff here? The thing with bending the curve is, when you introduce a third thing into your this-or-that, you’re also probably introducing a fourth thing; you’re giving up something there too. I don’t want to always say this means you get everything; this process is a lot of work for us. We have a team for that, it’s called the release team, and also the infrastructure team; they both deal with this problem. We had to put two teams together to work on it, that’s the tradeoff.

We also invested a lot in continuous integration because we needed to be testing. We actually periodically download every open source package in Rust and try to compile it with the next version of the compiler to double check we’re not breaking your stuff. That’s really cool; it also means Mozilla’s paying a bunch of money for some service, so thank them. We developed a lot of bots: bors is our continuous integration bot, and it makes sure that everything passes the test suite before it lands. This also means that bors is always number one on our contributors list because he merges every single pull request. I’ve got lots of funny stories about that, but there’s no time, so you can ask me about that later, but bots are awesome. Basically, this is one of those “our users versus us” tradeoffs: we’re willing to put in effort to make things easier for our users, and that is a tradeoff that we will almost always take. It is a tradeoff, and we pay the price for you.

Case Study: Acceptable Levels of Complexity

There are two different kinds of complexity: there is inherent complexity, and there’s incidental complexity. Inherent complexity is just, “It’s actually complicated,” and incidental complexity is, “You made it complicated when you didn’t have to make it complicated.” Separating out these two things is important, because you can’t always make inherent complexity go away, it’s right there in the word, but incidental complexity is the thing that you can fight, because it’s about you accidentally making things more complicated than you needed to; that’s a skill, I think, to work on. What’s interesting is that something can be inherently complex for one design, but incidentally complex for another design. That values list that you picked earlier can often determine if something is inherently or incidentally complex.

Here’s what I mean by that. Alan Perlis is this guy, I don’t actually know what he did other than write witty stuff about programming, to be honest, but he has a thing called “Epigrams on Programming,” and I found several of them that I think are interesting to Rust: “It is easier to write an incorrect program than understand a correct one.” “A programming language is low level when its programs require attention to the irrelevant.” That one was my favorite, and then finally, “To understand a program you must become both the machine and the program.” He wrote these in the late ’80s, I believe, I’m not totally sure, but I think these all apply to Rust. What I mean by that is that Rust does want to help you write correct software and Rust does want you to write fast software. In order to do that, we expose a lot more error handling than many languages do, because a lot of stuff can go wrong when you’re writing programs, as it turns out. The network can die in the middle of a connection, your user can type in something that doesn’t make sense, all sorts of errors happen. We expose those errors where many other languages just hide them away. This happens at the language design level because we have a type called Result that is returned from fallible operations. We don’t have exceptions; a lot of languages hide a lot of stuff in exceptions, which is where you get that “catch everything and re-throw” pattern: “Yes, whatever. I have no idea what exceptions this is throwing, so I’m just going to catch them all and re-throw. Somebody else can deal with it somewhere else.” That’s not great for correctness, but it is easy to do.

We’ve introduced stuff like the question mark operator to help reduce the complexity, but it’s still always going to be there, because we want you to be able to handle errors and that’s important. That’s a way in which our design has made something inherently complex. Languages that care less about correctness are able to just say, “Yep, throw us a bunch of random crap,” and that’s fine, and it becomes much easier to use. They’re able to get rid of that stuff, and so it’s not inherent for them.
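As a rough sketch of what that looks like in practice (`parse_port` is a made-up helper for illustration, not something from the talk): a fallible operation returns a `Result`, and the `?` operator forwards the error to the caller instead of an exception silently propagating.

```rust
use std::num::ParseIntError;

// A fallible operation returns Result instead of throwing an exception.
// `?` unwraps the Ok value or returns the error early to the caller.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let n: u16 = s.trim().parse()?;
    Ok(n)
}

fn main() {
    // The caller is forced to acknowledge that parsing can fail.
    match parse_port("8080") {
        Ok(p) => println!("port: {}", p),
        Err(e) => println!("invalid port: {}", e),
    }
}
```

The compiler won’t let you forget the error case; that is the inherent complexity the talk is describing, made explicit in the types.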

One way that Rust does safety and speed together, achieving those two values at the same time, is by having a really great static type system, because types are checked at compile time, so they’re free. Remember what I said about long compile times earlier; they’re not actually free, but at runtime they’re free. That’s cool, but if you’ve ever used a really strong static type system, you know they’re complicated, and that means as a user, Rust is a little more complicated for you to use, but the benefit you get out of that is programs that are really fast. That means we have this inherent complexity to achieve our goals, and these things actually matter. If your goal is not to have safety and speed at the same time, but to only be fast or only be safe, then you don’t need these complicated type systems, and things become a lot easier for your users. That’s not inherently complex anymore.
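A minimal sketch of that idea (`largest` is an invented example): the generic function below is type-checked once at compile time and specialized per concrete type, so there is no runtime dispatch to pay for.

```rust
// A generic function: checked once at compile time, monomorphized into
// a separate concrete copy for each type it's used with, so the
// abstraction costs nothing at runtime.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut best = *items.first()?;
    for &item in &items[1..] {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

fn main() {
    println!("{:?}", largest(&[3, 7, 2]));  // works for integers
    println!("{:?}", largest(&[1.5, 0.5])); // and for floats
}
```

The trait bounds (`PartialOrd + Copy`) are the complexity the user pays; the specialized machine code is the speed they get back.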

Case Study: Green Vs System Threads

Last case study before I go away: green versus system threads. This is the most complicated, actual concrete case study I have for you here. There are these two different models of threading; there are more of them as well, but for the purposes of this talk, only these two exist. I’m not going to get into the details that much, but basically, system threads are provided by your operating system. They’re an abstraction for running code: you say, “Hey, OS, please run some more code at the same time.” And it goes, “Cool.” It doesn’t actually run at the same time, but that’s a whole separate story. Green threads are an API that’s offered by your runtime. This is a programming language saying, “Hey, I have this mechanism for running code at the same time.” And you’re like, “Cool, I’ll use that.” Sometimes this is called N-to-M threading, because you have N system threads that are running M green threads, and sometimes system threads are called one-to-one threading, because one thread in your program is one operating system thread. These terms are also incredibly loose, and you can argue about them a lot on the internet if you want to. You can argue about a lot of things on the internet if you want to.

Some of the tradeoffs involved in picking these things: system threads require that you call into the kernel, the kernel API, and they have a generally fixed stack size. Yes, you can tweak it; this is a slide, I’m not putting every last detail on here, but you get 8 megabytes by default on x86-64 Linux. Green threads, however, because they’re run by your program, have no system calls. That’s cool, no overhead to call into the kernel, and they have a lot smaller stack size. For example, a goroutine currently has 8 kilobytes of stack; it used to be even smaller, they found out that was too small, so they made it a little bigger. From this set of tradeoffs, it looks like you always want green threads. Why would you ever use system threads? These are just better.
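As an illustrative sketch of "yes, you can tweak it" (the 64 KB figure is an arbitrary choice for the example), Rust's standard library lets you override the default stack size per system thread via `thread::Builder`:

```rust
use std::thread;

// Spawn a system thread with an explicit, smaller-than-default stack.
// The default is large (commonly 8 MB on x86-64 Linux); here we ask
// for 64 KB, which is plenty for this tiny workload.
fn sum_on_small_stack() -> u64 {
    let handle = thread::Builder::new()
        .name("worker".into())
        .stack_size(64 * 1024)
        .spawn(|| (0..1000u64).sum::<u64>())
        .expect("failed to spawn thread");
    handle.join().expect("worker panicked")
}

fn main() {
    println!("sum = {}", sum_on_small_stack());
}
```

This is still a system thread, so spawning it does call into the kernel; the stack size is configurable, but the syscall overhead is not.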

As I mentioned earlier about the way Rust development changed, sometimes your values change over time. Rust was pre-1.0 for five whole years. Originally, even though Rust had the same design goals and values, it expressed them very differently. Originally, Rust was actually much more similar to Erlang than C++, just a little weird; it was awesome, but weird. It provided only green threads, because that’s what Erlang does, and as the previous slide showed you, obviously you’d pick green threads in every situation. Over time, Rust got lower and lower level, and we were able to commit more to our performance goals by doing shenanigans.

We had to reevaluate this choice. This was such a contentious change, there were people threatening to fork the language over it, actually. That’s another story I don’t have time for. The argument goes like this: “You’re supposed to be a systems programming language, but you don’t provide access to the operating system’s API? What does that even mean?” And we’re like, “Yes, that makes sense.” Then there’s also this downside of green threads: because you have these small, separate stacks, if you want to call into C, you have to switch to the regular operating system stack. That has a cost, and that cost is totally at odds with our previously stated performance goal as well. We tried to bend the curve, and we failed.

What if we had a unified API that let you pick? Do you get green threads or do you get system threads? Whichever one you want. You’re writing code one way; they’re just threading models. You spin up a new thread; it doesn’t need to be a green thread, a system thread is fine, let’s do both. We had this “libnative” versus “libgreen,” and you’d pick the one you wanted in your program, and people would write libraries that were abstracted, that didn’t care about the threading models. You just get to do whatever you want, everything will be wonderful. The problem is that this gave you the downsides of both and the advantages of neither. It turns out that our green threads weren’t very lightweight; they were actually pretty heavy. There are some other things, I have some lists here, I’m not going to read all of this to you, but basically, some things only made sense for one model and not the other model, yet both models had to support both things. That was awkward. It was a problem with I/O, because only some stuff worked properly across both things, or there were just implementation issues.

Embedding Rust: if you want to write Rust in an embedded system, you’d have to say, “I never support the green threading runtime,” but the whole point was that you’re supposed to be agnostic. It’s like, “How does that go?” Then finally, we committed to maintaining both things; we had to be good enough to maintain both of them, and that’s a really big burden. We eventually decided to kill it, and that was bad, but we realized that we were able to commit to some values more than others, so the answers were different. There are other languages that only give you green threads. That is awesome for them; they have different values than we do. You should take time as you’re designing a system to check back in with yourself and say, “Hey, have my values changed since I originally made this decision? Maybe the decision I made was a bad one and I need to reevaluate it.”

Now, we only have system threads in the standard library, because when you have no runtime, it means that you can write your own runtime and include it if you want. There are two different packages, one called Rayon and one called Tokio, and they are both ways of doing green threads for different kinds of workloads. I don’t have time to get into it; we can talk about it afterwards if you’d like. There are tradeoffs here as well: for example, now you have to know about Rayon and/or Tokio, and you have to pick the right one to use, and that’s complicated. Then finally, what happens if people made six packages to do this instead of two? There are some downsides, but I don’t have any more time.
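A hedged sketch of what the standard library's system-thread baseline looks like (`sum_of_squares` is a made-up example; Rayon and Tokio would each express this kind of fan-out/fan-in differently, and usually better for their respective workloads):

```rust
use std::sync::mpsc;
use std::thread;

// Fan out work across plain system threads and collect the results
// over a channel. This is all std provides; green-thread-style
// runtimes are layered on top as ordinary packages.
fn sum_of_squares(n: i32) -> i32 {
    let (tx, rx) = mpsc::channel();
    let workers: Vec<_> = (0..n)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || tx.send(id * id).expect("receiver dropped"))
        })
        .collect();
    drop(tx); // drop the original sender so the channel can close
    for w in workers {
        w.join().expect("worker panicked");
    }
    rx.iter().sum()
}

fn main() {
    println!("sum of squares = {}", sum_of_squares(4));
}
```

Because nothing here depends on a built-in runtime, a library like Rayon can replace the hand-rolled fan-out with a work-stealing pool without the language itself having to choose a threading model.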

With that, thank you so much for coming to my talk. Three things to take away from this: tradeoffs are an inherent aspect of our domain, but if you think outside the box, you can sometimes have your cake and eat it too, and you should use Rust when you really care about something being both robust and fast.


Presentation: When and How to Win With New Programming Languages

MMS Founder


Welsh: Andrea has already introduced me but that’s the social media, the website if you want to tweet me or read my kind of deranged thoughts. Thanks for coming along to my talk. Let’s get going.


What is this talk about? The main point I want to make is that you should consider using more programming languages or maybe I should say you should consider changing programming languages, perhaps more frequently than you currently do.

One of the things I believe is that programming languages are, over the long term, the biggest driver of productivity that we have. I think it’s bigger than process, bigger than architecture, bigger than libraries, but it takes a long time to see those returns, and I’ll make a case for languages on that front. If we are going to adopt new languages, we’ve got to understand the forces against adoption, what stops us from adopting languages as quickly as we might like, and understand how and when we should be adopting languages. Those are the main points I’m making in this talk.

The Lawyer Hypothesis

Let’s go back a depressingly long time, back to when I was at university for the first time. Back in Australia, at the University of Western Australia, I studied engineering. For some reason I’m not really clear on, we had this ongoing feud with the lawyers; we had T-shirts with this quote by William Shakespeare, “The first thing we do, let’s kill all the lawyers.” Every year, we would have this big battle against the lawyers, which involved water fights, some tugs of war, and just general mayhem. Every year, we would absolutely destroy the lawyers, which was helped by the fact that we had many more students than they did and, probably, the gender bias in engineering gave us a bit of a physical edge.

I grew up thinking that lawyers were scoundrels; that was my first introduction to them. As I met a few lawyers outside of university, I understood that, actually, lawyers are not just lizards walking around wearing human skins. Well, a bit of a strong statement: they’re mostly not lizards wearing human skin. Another thing about lawyers I learned is that they do get paid a lot, and programmers get paid a lot, but there’s something interesting about lawyers, which is that at the top, when you have the people getting these CEO-level salaries, these lawyers actually do law, whereas the people at the top of software companies don’t do software, they don’t write code. That’s interesting to me, because I believe that the value of software is much higher than the value of law. I don’t think that’s really controversial, because the richest people in the world are all coming out of software companies. These companies have huge market capitalizations; with computing companies like Apple, the huge market capitalization makes it clear that the value software can deliver is much higher than the value law can deliver.

It seems that the productivity of an individual lawyer, the ability to create value, is much higher than that of the individual developer. Then the question becomes, why is that the case? Is it just the fact that their lizard-like metabolism allows them to work 24/7, or is something else going on? Let’s look at productivity. What determines productivity in this industry? Something which I hope is not controversial is that humans don’t scale; this is the law of software. Humans make mistakes, humans get things wrong, but the thing that does scale is automation, and that is why software is so valuable. We can take tasks that lots of people do, run them on one computer, and run them efficiently. If we’re looking to scale, we need to look at automation.

The only way we’re going to get productivity at scale is by not writing code. Ultimately, we have to not write some code, so all of your process, your domain-driven design and all that, the whole point of it is to find the right code to write so you don’t write the wrong code. Other places where we can get productivity are by using code that’s already written: reusing libraries, reusing that web server, that web framework, or React, or whatever it is that works for you.

Another place we can avoid writing code is the language. What language features give us are things we don’t have to write ourselves. For example, in assembler you have to worry about register allocation; in most languages, that’s something you never have to worry about, it’s done for you, written for you by the compiler. So languages allow us to not write code. This is fundamentally about abstraction: being able to say more concisely something that covers a large domain.

When we look at the facilities we get for abstraction, ultimately, languages give us much more power than libraries, because libraries are constrained by the host language. If you write a library in C, for example, you always have to talk about memory management. Maybe it’s not actually relevant to the domain, but it’s going to be in there; you can’t get around that. But if you have a language that gives you some kind of automatic memory management, whether that’s Rust’s approach with affine types or garbage collection, that’s a concern you don’t have to discuss. That’s something you can only achieve in a language; you can’t achieve it in a library.

What Is A Language For?

I want to look at what we can do with languages. When we understand what a language does, we’ll understand where we can adopt them. When I think of languages, I think of them delivering two main things. One is this idea of language as a tool that allows us to control the machine; it’s the interface by which we access the machine. Can we talk about types that the CPU understands? Can we talk about floating point, IEEE floating point? Can we talk about machine [inaudible 00:08:15] integers? Some languages allow us to talk about these things and some languages don’t.

JavaScript, for example, is a language that doesn’t have any distinction between integers and floating point. You have to hope that the compiler can figure out what type you’re actually using and do some arithmetic with it. That’s something that you can’t control in JavaScript. Another example might be SIMD instructions, vectorizing code. On the JVM, you can’t explicitly talk about vectorizing code at this point in time. The just-in-time compiler, the JIT, will recognize some patterns and vectorize them, but you can’t explicitly tell it, “Hey, I’d like you to vectorize this.” You just have to know the patterns it’s looking for, or you have to hope that it’s going to recognize the code and vectorize it.

The machine is more than just a CPU; it’s a whole system that you run on, and so one of the things that matters is accessing the operating system. Can you access the operating system? Do you have a structured way to do so? The JVM gives you a kind of abstract view of the operating system, which can be annoying when you can’t get access to the features you want; some of the non-blocking stuff is not as good as it could be, for example. The operating system is also sometimes not really the machine’s operating system; sometimes it’s something like the web browser, a platform that happens to run JavaScript, as an example, and so on.

Another case where we see a sort of platform, and maybe don’t think of it as a platform, is when we have frameworks. On a lot of systems, for example on mobile, the thing that you’re accessing is partly the operating system, but it’s also the UI kit, the user interface tools that are provided by Apple’s UIKit; I can’t remember what it’s called on Android. These frameworks are things that you also use languages for.

Another way of looking at languages, which is somewhat orthogonal, is as a notation for expressing thought, or a notation for expressing solutions. How concisely can you express whatever it is you’re interested in, while avoiding the irrelevant details? Memory allocation was the example I used earlier: is allocation relevant? If it is, can you talk about it? If it’s not, can you avoid talking about it? In many cases it’s not relevant, so you don’t want to talk about it.

What are the first-class values in your language? What are the things you can return from functions? What are the things you can pass to functions? What are the things you can compose programs out of? Languages without first-class functions are a great example; C is an example there. You can’t return a function; you can return a function pointer, but it’s not quite the same thing. You can’t really pass functions around in the same way, you don’t have a notion of closures, we could say. You can’t compose functions together in the same way that you can in functional programming languages.
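A small sketch of what first-class functions buy you, using Rust closures (`make_adder` and `apply_twice` are invented names for illustration): a returned closure captures its environment, which a bare C function pointer cannot do.

```rust
// Returns a closure that captures `n` from its environment.
// A C function pointer has no environment to capture.
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

// Functions are ordinary values: they can be passed in and composed.
fn apply_twice(f: impl Fn(i32) -> i32, x: i32) -> i32 {
    f(f(x))
}

fn main() {
    let add5 = make_adder(5);
    println!("{}", apply_twice(add5, 10)); // 10 + 5 + 5
}
```

The composition in `apply_twice` is the kind of thing that is awkward to express when functions are not first-class values.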

Another example is generic types: can you express that some code is abstracted over any type? They’re often used in containers, but not only in containers; in the functional programming world we have lots of things where we abstract over types, expressing error handling or concurrency, all by using generic types. That’s something you can do in the majority of statically typed languages but can’t do in Go, for example; the only generic types you get are the things that they built in. That limits what you can talk about: in Go, you have this repetitive error handling pattern that comes up again and again, and you can’t abstract over it because you don’t have any notion of generic types.
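As a hedged sketch of that kind of abstraction (`collect_ok` is an invented helper; Rust's standard library has similar machinery built in), a generic type lets you write the "check the error, bail out" pattern once, for any item type and any error type, instead of repeating it at every call site:

```rust
// Written once, generic over any item type T and error type E:
// collect successes, or short-circuit on the first error.
fn collect_ok<T, E>(items: Vec<Result<T, E>>) -> Result<Vec<T>, E> {
    let mut out = Vec::with_capacity(items.len());
    for item in items {
        out.push(item?); // the first Err ends the whole loop
    }
    Ok(out)
}

fn main() {
    let good: Vec<Result<i32, String>> = vec![Ok(1), Ok(2)];
    println!("{:?}", collect_ok(good));
}
```

Without generics, that loop body, and its error check, gets re-written for every concrete pair of types, which is exactly the repetition the talk is pointing at.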

Another example is automatic resource management. Do you have to manually open and close files, or will files be closed for you, and if they’re closed for you, at what point are they closed? This is really looking into the future a little bit; we’re used to memory being managed for us, but memory is just one type of resource, and there are lots of other resources we might want to manage: files, sockets, whatever. Most languages don’t provide any facility for managing these. I think Rust is where we’re first seeing these ideas come into the mainstream. Those are the different views of languages, controlling the machine versus expressing thought: in terms of the platform your language runs on, what can you use, and then what concerns can you discuss in your program, or what concerns are you forced to discuss by the language?
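A small sketch of that idea in Rust, where the `Drop` trait releases a resource deterministically at the end of its scope (`Connection` is a made-up stand-in for a file, socket, or lock; the `closed` flag just makes the release observable):

```rust
use std::cell::Cell;
use std::rc::Rc;

// A stand-in resource. The `closed` flag lets us observe when it is
// released; a real resource would close a file descriptor or socket.
struct Connection {
    closed: Rc<Cell<bool>>,
}

impl Drop for Connection {
    fn drop(&mut self) {
        // Runs deterministically at end of scope, not at a GC pause,
        // so "close" can never be forgotten.
        self.closed.set(true);
    }
}

fn main() {
    let closed = Rc::new(Cell::new(false));
    {
        let _conn = Connection { closed: Rc::clone(&closed) };
        assert!(!closed.get()); // still open inside the scope
    } // `drop` runs here automatically
    assert!(closed.get()); // resource was released for us
    println!("connection closed automatically");
}
```

The point at which the resource is released is fixed by the scope structure of the program, which answers the talk's "at what point are they closed?" question precisely.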

I just want to emphasize that when I talk about the machine, I’m not necessarily talking about something running on a physical CPU; it can be abstracted away, like the browser. We can also narrow down this idea of what the machine is that we have access to. It can be quite narrow, and we call these things DSLs, Domain Specific Languages. A common example of a DSL is a style sheet, Cascading Style Sheets, CSS. The machine that you access there is the layout engine in the browser, and you can control that in certain ways, and you can’t talk about anything outside of that. You can talk about layout, animations, and no more.

Domain-specific languages, or as I sometimes call them, little languages, are a very important class of languages we need to consider. We normally think of the big languages we discussed earlier, Go, Scala, Java, whatever, but there are other languages as well. One of the things which I think is interesting is that, generally, it’s the notion of the machine which determines if the language is going to be used. For example, JavaScript: who would use JavaScript if it wasn’t for running in the browser? Well, some of you might, but I don’t think many people would, because I don’t think it is a very good language. Maybe I’ll just say there are better alternatives.

Of course, if you’re running stuff in the browser, you don’t have too many options. At this point in time, until WebAssembly is more ubiquitous, you have to at least compile into JavaScript some way. This kind of machine is driving adoption, but it’s really the notation aspects which give us the upper bound on productivity. There’s a real tension here, because we want languages that are productive, but it’s the access to the machine that is driving short-term value, so a bit of tension there.

Where Are Languages?

I want to talk more about languages and where we find them; I’ve talked about this a little bit already, but let’s go into a little more detail. We’ve talked about the big languages, Python, Objective-C, Swift and all that. Lots of those, but little languages, I think, are more interesting, and there are more of them than we expect. Here are some examples. CSS, which I mentioned earlier; CUDA, for controlling GPUs, a terrible thing, but it exists; configuration, which we often don’t think of as being a language, but it’s already getting that way; Excel, spreadsheets, probably the world’s most popular programming language; marketing automation, very much outside what many people here are familiar with. That last one is often controlling things like sending emails or sending tweets: when this person takes this action, send them an email, “Hey, I noticed you abandoned your cart, would you like to come back and buy something?” There are millions of those.

There are many more little languages in use than there are big languages, and they’re much more ubiquitous, and we don’t really think of them when we think of programming languages, and we should. Look at the frameworks that we use: Spring, if you’re using it, my condolences, Kubernetes, all these kinds of things. They’re kind of frameworks, but they’re really libraries, and we program them via configuration files, some kind of JSON, XML, or YAML, or whatever. We call these things configuration and think of them as an afterthought, but a lot of them are actually languages; they just don’t realize they’re languages. We’re seeing that in these crazy little DevOps languages. I’ve used a bunch of them. Ansible was one of them; that was a lovely thing where I had a programming language embedded in YAML, and I had to work around the YAML syntax to actually get the programming language interpreted correctly.

I understand with Kubernetes there are similar things going on; I’ve never used it myself, and I understand I should be thankful. All these kinds of things have language-like facilities. They need means of abstraction, but they ignore the fact that they’re languages and suffer for it. Another example is CSS; probably most people here don’t do front-end stuff, but Cascading Style Sheets have just added this notion of variables. They’re really not variables, they’re bindings that don’t change, if you like. They’re probably 10 years, I would say about 20 years, too late, but this ship has sailed, because they didn’t realize they were building a language. They thought it was a static thing, and really, it always needed means of abstraction, and people have been adding those means of abstraction through other means. There are a bunch of these little things: Less was one of them, Sass was another, generating CSS, and now people just write it in JavaScript, because you need to treat it like a language. They didn’t realize that at the time they were creating it, so they didn’t build any facilities for abstraction into the language, and now people have to retrofit them in.

We’re seeing the same thing in the DevOps world, with these crazy languages that generate YAML or whatever, because the original developers didn’t realize they were building a language; they didn’t build in any forms of abstraction, and now you’ve got a mess of YAML and little crazy languages built on top of that. What we see here are languages, languages everywhere. We have things we think of as programming languages, but actually, we’re in a world full of languages, or things that could be languages if we look at them the right way. Framework configuration, for example, is a language; that’s something most people run into. But there are many products as well which are actually languages; Excel is a great example.

When Are Languages Adopted?

The question becomes, if you buy my hypothesis that language is the root of productivity: when are languages adopted? What are the forces that allow a language to gain adoption? If we understand this, then we can understand how we can adopt languages ourselves.

The big languages generally give access to a compelling new platform or machine. JavaScript, like I said: I don’t think anyone would use JavaScript if it wasn’t for the browser. Objective-C: what would you use Objective-C for if it wasn’t for iOS? A few developers here. These are languages which I think have a number of flaws, and I’m not sure people would use them if it wasn’t for the fact that you have to, and you don’t even have to use Objective-C anymore; you can use Swift now on iOS.

Keep in mind that the platform is not just an operating system or a device and so on. It may be a framework. Ruby, at the time, got popular because it was seen as the way to access Rails, and at the time Rails came out, it was seen as one of the few sane ways of creating websites or web services. Maybe not everybody here was around at the genesis of Rails, but it was the early days of the web: you had CGI and Perl scripts, or you could write enormous piles of Java nonsense in XML, and Rails was seen as one of the few sane ways of actually producing a website. It was compelling to people: I want to produce a website in a reasonable amount of time, I’d better use Ruby. So the notion of a platform is not restricted to what we typically think of as a device that performs the computation.

Another reason why languages get adopted is because they allow us to respect some legacy code, to continue to maintain our investment in legacy, while giving us a better notation. I think legacy is underappreciated in our industry. Why are we still using Unix? Can we not do better than Unix? We sure can, but we’ve got legacy there, a huge amount of legacy tied up in Unix. Why did Windows win rather than OS/2, for example, if anyone was around for that? Because OS/2 wasn’t going to run any of the existing software and Windows was, and so on throughout computing history. Respecting and maintaining legacy has been one of the driving forces of our industry; we don’t give it enough credit. If you can offer something which allows people to continue using their investment in legacy and at the same time has a dramatically better notation, then you might see adoption.

Here, we have Scala versus Java. When Scala came out, we were still back in the Java 6 days, and it was seen as a vastly superior way of accessing the JVM. If Scala came out now, I’m not sure we would see the same adoption, because Java has, unbelievably to some of us who lived through the Java 6 years, actually started innovating and changing, which is great. I’m not sure the advantages of Scala would seem so impressive in this new context. On Android, it’s still Java 6 days, so alternative JVM languages are seeing some adoption there; I understand they’re quite popular in the Android field, though I don’t really work there myself.

Swift versus Objective-C: a very conscious decision there to allow you to keep working with iOS. Rust was introduced, basically, to do everything you were doing in C, to access the operating system and all that systems code, but in a much better way. Perhaps the final one is TypeScript versus JavaScript. TypeScript just adds a type system on top of JavaScript; you barely have to change any code. There are small changes to your code, but you get a type system. It has been gaining adoption, and really just this year it seems to be taking off.

There is another force here, but I think it’s a much weaker force, and that is cultural fit: certain communities have different values, as do different languages. If you want to run something on the BEAM, the Erlang virtual machine, you can use Erlang itself, which is fairly unpopular. Then there’s this language called Elixir which runs on BEAM. I wouldn’t say it’s a hugely popular language, but it comes from the Ruby community. It has, I believe, a Ruby-like syntax, and it seems to be popular with people coming from Ruby who are looking for a language that has a more interesting platform. Ruby doesn’t have very much support for concurrency and the Erlang virtual machine obviously does, so it’s seeing some adoption there.

The final one, I’m going to say, is Go versus every other compiled language. Go is an interesting one to me. I don’t really understand why people use it; maybe some of the Go programmers can tell me. My hypothesis is that it’s culturally acceptable to a bunch of different people. If you’re coming from a systems world, then Go is enough like C: you don’t have to learn new things, but you don’t have to do all the C stuff. And I think if you’re coming from Python or Ruby, then you can write your code much the same way you would in those languages, but you’ve got a structural type system and you get some advantages. I think there’s a cultural aspect as well as a better-notation aspect, but I’m not sure, because I’m not much in the Go community myself.


Let’s look at how this story has played out in my own career. Racket is a dialect of Scheme; Scheme is sort of a Lisp dialect. It’s a wonderful language and I started writing it when I was at university. I continued writing it in a consultancy I started with a friend from university. We sold Racket programs; we got people to pay money for them, which seems extraordinary now. Ultimately, though, we couldn’t expand the company as far as we wanted to, because we couldn’t find clients who were prepared to put up with that craziness.

The reason I see for Racket not getting adoption, lovely as it is: it didn’t offer a compelling new machine or platform; it just let you do the same things you could do in any other language. It didn’t really have any respect for legacy either; you had to write everything yourself. This is one of the things that got us: we had to write our own web frameworks, our own database drivers, and our own everything. It was just too much work, particularly as a consultancy where you may be working on different systems all the time. A lot of factors went into this, but I think those are the relevant ones.

After doing Racket for a while, we switched to Scala, and we’ve been much more successful with Scala. We started with Scala when it was really in its infancy, and it has been taking off; now there’s quite a big community and quite a vibrant one. What I see as driving adoption of Scala was two things. One was Spark: at the time, you had Hadoop, which is awful, and you had Spark, and Spark was 100 times faster. Pages and pages of Hadoop code, hundreds of lines, could be turned into three or four lines of Spark. That was compelling to people, and Spark is one of the main drivers of Scala adoption.

The other angle was the respect for legacy: the fact that you could take your existing Java code, your existing JVM software, and just run it, with Scala being a better Java. That’s how many people bootstrapped into Scala: they started writing Java without semicolons, and then over time they learned about the new features that Scala brings. The final thing I should say is that it was culturally acceptable for me to use Scala, because I came from a functional programming background and Scala is a functional programming language, so I could use Scala without feeling like it was too distasteful. There’s a cultural aspect there.

Little Languages

What about the other little languages? This is really where I think most of the languages are. What are the forces here? I think it’s generally the same forces, but little languages are much easier to adopt because the risk is much lower. Some of them you have to adopt to access a platform, like CUDA and CSS; some of them you can choose whether to use. There’s generally much less investment in them. When people pick up, say, one of the CSS generators, they don’t often think of it as something they’re going to be stuck with forever. You feel you can probably get away with changing it if you need to, because the amount of code you have in CSS is maybe not that large; you don’t generally have tens of thousands of lines. It’s not such an investment, so people are even more ready to adopt.

We’ve been talking about languages from the point of view of the programmer using a language, but I also want to mention that languages themselves can be a competitive advantage for a business. One example is this idea of marketing automation. Marketers want to do things: they want to send email, they want to send tweets or Instagram posts, and if you can give them automation, they’ll benefit the same way that we benefit from automation. They actually do programming, although they don’t realize it, when they set up these little flows where, when triggers happen, things like emails get sent. You can build companies around this, and you can access a new market.

Airtable is a startup, probably exiting the startup phase now. It’s a programmable spreadsheet, kind of a combination of spreadsheet and database. It’s a very flexible tool, and I think it’s something we might be seeing more of: software gaining more programmability over time, getting closer to what the people in this audience do.

Let’s talk about Cloudflare. I guess most people know Cloudflare, and I think they’re building a lot of products based on this idea of giving you programmability. They just announced something today about web filters, a language for web filters, and they’ve had something earlier there, computing at the edge with WebAssembly. Providing programmability is a thing that can be a real advantage to a company. There are many more of these little languages out there than the big languages: frameworks are really libraries that we program.

How Are Languages Adopted?

What are the contexts in which you can adopt languages? If you’re looking at a big language, which is what we normally think of as a language, my basic thought is: as a consultancy, forget about it. You might get lucky and choose a language which is gaining adoption, but I don’t think a consultancy can drive language adoption. This has been my experience; you have to work with what the market is prepared to accept. Most people are not willing for you to add risk to their company by using a language that they are not comfortable with. That was our experience with Racket: I got to the point where people would say, “No, you can’t use that crazy language.” That’s quite understandable.

In a consultancy context, you have to work with the market. Maybe you get into a market as the language is growing around it, but it’s not something you can count on. However, if you are a product company, then you can do whatever the heck you want. My advice here is: if you’re going to choose a language, you need to go all in on it. There’s a high cost of failure, but you do have some advantages. One is that around new languages there are a bunch of really enthusiastic early adopters, and they will love to work in your language and you can hire them very easily. They will move for the opportunity or maybe accept lower pay; I’ve seen that with Haskell. Scala, I remember, has now swung the other way completely, where you can’t hire Scala developers anywhere, but if you get in on a new language, you’ll find people who are really excited to use it, and they’ll come to your company and help you recruit.

That pool is going to dry up eventually, so what I suggest very strongly, and I think all companies should consider this, is creating a remote-first culture; the companies I’ve been involved in have been remote-first. There are so many developers out there in lots of places that are not London or New York or San Francisco, and they will happily work for a company that can accommodate them, particularly if you’re using a language they like. There are downsides here: if you go all in on a language, you’ve got to be prepared for the burden you’re taking on.

You might end up building and maintaining a bunch of libraries: the database drivers, or the libraries for AWS or Google Cloud or whatever other cloud providers. Those libraries might not exist; you might end up having to write and maintain them, and that’s a pain, but it does give you status within the community, which is another thing you really need when adopting a new language. Maintain your community presence so that the people who want to work with these new languages can find you; you’ve got to be visible there.

The other problem is that you’re creating legacy. One thing I’ve noticed when people start using new languages, particularly when the community is small and the idioms are not established yet, is that they don’t know the best way to use the language and they make lots of mistakes. We made lots of mistakes in Scala, and there are many frameworks from the early days that people don’t really use anymore because we now consider them bad designs; we had to learn that the hard way. Be prepared that you will make mistakes and will have to fix them up. That’s probably the case with anything, but depending on when you adopt, there may be more mistakes to be made.

The advice here is to start small, maybe with your best teams; you’ve got to demonstrate success early. People get nervous. You’re taking on a risk here, and you’ve got to demonstrate that you can mitigate that risk. Choose a relatively isolated part of the system to demonstrate that you can implement it, or re-implement it, in this new language. Once you’ve got that, then you can spread out amongst the rest of the company, normally a team at a time, telling people what the best practices are and getting that new culture, those new ways of doing things, to spread across the organization.

As a consultant who’s done a lot of these, I’d say consider getting external mentors in. We do a lot of work with people who are adopting Scala, and they tell us it really helps them. We have perspective that they don’t have, and we also have time that they don’t have, to help people switch to this new way of doing things. There are external experts, so maybe consider getting them in; I think a lot of companies underinvest in training, and they’re making a mistake. With a little language, on the other hand, the risk is lower, so I just say, “Go for it. Have fun, it’s great.” Low cost of failure.

For the language that you’re adopting: firstly, is it simple enough? A lot of languages have an appearance of simplicity but, as Rich Hickey’s talks put it, simple is not easy, and they end up being real messes. You have these DevOps DSLs that I talked about earlier; they appear on the surface to be simple, but when you actually get into them and try to use them at a moderate size, they are very difficult to reason about. That’s what real simplicity is: can you reason about this? What is going to happen when this bit of code is used? Have people got a good design here, or have they just bunged together whatever came into their head the first time?

There’s another point: is this going to be used enough to actually make it worthwhile to overcome that learning curve? There’s a one-time cost you pay for learning, and you then get the benefit of that learning every time. Are you actually going to use this or not? If it’s just some peripheral part of your system that people look at once every six months, for example, then maybe it’s not worth doing, because you won’t get the benefit from adopting the language.

The final thing is that when you introduce a new boundary into the system, going from code to a configuration file or a framework or whatever it is, you’re introducing difficulties of reasoning, because it’s very difficult to reason across boundaries. You see this in microservices, for example, where your protocol ends up being something you just agree on and it’s very hard to understand what’s going on in the complete system. It’s the same thing with a DSL, particularly an external DSL rather than one embedded in a host language: it’s difficult to reason about what goes across the boundary. Be careful about where you’re introducing these new boundaries and ask yourself, do you want to reason about this?

I have a colleague working in the U.S., and I probably can’t tell you the customer, who has spent this week and most of last week just dealing with configuration files, trying to understand what the actual configuration is that this system is running, because it doesn’t seem to be doing what it should be doing. That’s an example of this problem: trying to reason across the boundary, about how the configuration files actually get assembled and what the system is currently running with. That’s most of what I’ve got to say.

Conclusions: I think languages are really powerful tools, and I don’t think we give them enough credit. Languages can allow us to be much more productive; they can open up new market segments, either externally, in terms of what we’re providing for customers, or internally, where they give us novel ways of doing things that can be radically more efficient than what we’re used to. Adoption is risky and the conditions have to be right. I’ve gone through some of the conditions I’ve seen in my experience and some suggestions, but consider adding the idea of languages to your toolbox, along with everything else discussed at this conference.

We’ve talked about continuous integration, deployment, DevOps, domain-driven design, but people very rarely talk about languages. I think that’s something you should consider adding to your toolbox. It’s really powerful if you can leverage it correctly.

Questions and Answers

Participant 1: How important do you think opinionatedness is in languages? For example, one thing that may have helped Go was that it had gofmt: there’s just one way of doing it. And TypeScript is opinionated JavaScript.

Welsh: I think there are a bunch of things where your opinion is unimportant, like the formatting of code. Who cares? Maybe some people do care, but I don’t think any sane person cares about the formatting of code. I’ve got some OCD colleagues who like to rearrange code all the time. If a language takes away a bunch of decisions you don’t need to make, I think that’s great. That’s one aspect of being opinionated: decisions that you really don’t have to care about, get rid of them, give them to a tool.

I think the other aspect of opinion is guiding you down a certain form of language design. We see that problem in Scala, where perhaps it’s not opinionated enough, and people end up doing all sorts of crazy things that aren’t working so well for them. I think having a clear, consistent paradigm is important. But then, in defense of Scala, I don’t think it would have had the adoption it had if it wasn’t so easy to go from Java into Scala. Without that gentle slope, you couldn’t then get down the sort of functional route where I think most people end up.

Opinions, I believe, are important, but if you’re particularly opinionated, that can be a hindrance to adoption. We can see that in Scala getting adopted instead of Haskell. Haskell is a much cleaner, pure functional programming language, and many of the warts in Scala don’t exist there, but because it doesn’t respect legacy, it doesn’t have the same kind of adoption. So you’ve got to be careful there.

Participant 2: What do you think of the future of Scala, especially in the face of all of this innovation in Java, functional features, etc., being added?

Welsh: First question, are you a Scala programmer?

Participant 2: Yes.

Welsh: I’m heavily invested in Scala. I think Scala 3, Dotty, is going to be great; it’s going to get rid of a load of inconsistencies that are in the language and give us a much cleaner way of writing the kind of code I want to write. I also don’t believe that what Java is doing is a serious threat to Scala, because I don’t think it gives us the features that I want to use. However, I think you have to go quite a way down the functional programming path to understand a lot of the concepts and a lot of the benefits you get from it. When people begin functional programming, they tend to talk about first-class functions and higher-order functions and immutable data structures; these are very surface-level things.

I don’t think that’s wrong; I think that’s just the way that people learn, and there’s no problem with it. My experience has been that as I go further, I care more about reasoning about code, and about things like higher-kinded types that allow me to reason about code, which is much harder to explain, because you need to adopt a very different mindset and do a lot of learning to reach the point where these concepts are meaningful. In summary, I think Scala 3 is great, and it’s going to allow Scala developers to do much better things. I don’t know that it’s going to be particularly compelling to Java programmers, because I’m not sure they’re necessarily in the mindset or paradigm where those features are compelling to them, but we will see.

Moderator: You talked about configuration and small languages, and I’m wondering, because this is what happens to me: I see configuration files and I think, this looks like a language to me. Why should I stop myself from going and implementing an interpreted language on top of these configuration files? Because I don’t want my co-workers to hate me.

Welsh: No one would hate you, Andrea [Moderator], it’s impossible. I think we should be treating configuration as language, and I think we can do some really interesting things there. One thing out there in the world is this language called Dhall, which is not Turing complete. It’s a configuration language in which you can’t write an infinite loop; I think that’s fantastic. I have a customer I’m talking to at the moment (actually I’m talking to the lawyers, which is terrible, because I think their lawyers are reptiles) who needs something like a configuration language; the system is configured for each enterprise.

If you can have a configuration language that doesn’t allow some sales engineer to set everything into an infinite loop, that’s fantastic. There’s a real benefit to be had from taking the idea of configuration as a language seriously. And there are, of course, the usual benefits of abstraction: these configurations can be enormous, and you want to re-use them across different clients, particularly when you’ve got parent organizations with many clients under them. There are lots of benefits to abstraction in configuration. I think we really should be treating configuration as a language, applying the same discipline we do to big languages, and perhaps we should be looking at things like this non-Turing-complete, guaranteed-to-terminate sort of language.
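The re-use point can be made concrete. Here is a tiny sketch of “abstraction in configuration”: generating per-client configuration from a shared base instead of copy-pasting it. The field names (`retries`, `timeout_s`) and client name are made up for illustration:

```python
def base_config(client):
    # Shared settings every client inherits -- this is the "abstraction".
    return {"client": client, "retries": 3, "timeout_s": 30}

def enterprise_config(client, **overrides):
    # Per-client variation expressed as overrides, not as a copied file.
    cfg = base_config(client)
    cfg.update(overrides)
    return cfg

print(enterprise_config("acme", timeout_s=60))
```

The same idea is what languages like Dhall give you natively: shared definitions plus per-client overrides, with a guarantee of termination on top.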

Participant 3: What are your thoughts on evaluating and re-evaluating choices that you’ve made as a team for certain languages? For example, I work in an analytics team where, for the last three years, we’ve had the same discussion every year: are we using R, or Python, or both? It keeps coming back. Do you have any thoughts on the matter?

Welsh: If we use the big/little language distinction, then changing a big language has a higher cost and a higher risk to it. That’s probably the case you’re in with R and Python, using both. You have to ask what benefits you’re going to get from changing. Normally, it has to be a fairly dramatic improvement, or you have to lower the cost of switching to the extent that it’s not such a big deal to change between the two. I think there is some interoperability between Python and R; maybe that’s sufficient.

I would just say: look at the costs, retraining and unfamiliarity with code, perhaps, and ask what benefits you’re going to get in return. Access to different tools, perhaps; I think R has a better statistics toolbox and Python probably a better machine learning toolbox. Then, if you can push down the cost, maybe by developing training internally so people can learn R or Python or whatever it is you need, or through the way you organize your team so that there isn’t just a Python silo or an R silo, then maybe you can benefit from both of them. I think that’s possible, but there are definitely risks associated, and we need to be cognizant of that.


Comparing Classifiers: Decision Trees, K-NN & Naive Bayes

A myriad of options exist for classification. In general, there isn’t a single “best” option for every situation. That said, three popular classification methods— Decision Trees, k-NN & Naive Bayes—can be tweaked for practically every situation.


Naive Bayes and K-NN are both examples of supervised learning (where the data comes already labeled). Decision trees are easy to use for small numbers of classes. If you’re trying to decide between the three, your best option is to take all three for a test drive on your data and see which produces the best results.

If you’re new to classification, a decision tree is probably your best starting point. It will give you a clear visual, and it’s ideal for getting a grasp on what classification is actually doing. K-NN comes in a close second; although the math behind it is a little daunting, you can still create a visual of the nearest-neighbor process to understand what’s happening. Finally, you’ll want to dig into Naive Bayes. The math is complex, but the result is a process that’s highly accurate and fast, especially when you’re dealing with big data.
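That nearest-neighbor process can be sketched in a few lines of plain Python. This is a toy illustration on made-up 2-D points (two clusters, two labels), not a production implementation:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of ((x, y), label) pairs."""
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((8, 8), "blue"), ((8, 9), "blue"), ((9, 8), "blue")]

print(knn_predict(train, (2, 2)))  # query near the red cluster
print(knn_predict(train, (9, 9)))  # query near the blue cluster
```

The sort over all training points on every query is also why K-NN slows down as the dataset grows, a point that comes up again below.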

Where Bayes Excels

1. Naive Bayes is a linear classifier while K-NN is not, and it tends to be faster when applied to big data. In comparison, K-NN is usually slower for large amounts of data because of the distance calculations required for each new prediction. If speed is important, choose Naive Bayes over K-NN.

2. In general, Naive Bayes is highly accurate when applied to big data. Don’t discount K-NN when it comes to accuracy, though; as the value of k in K-NN increases, the error rate decreases until it approaches that of the ideal Bayes classifier (for k→∞).

3. Naive Bayes offers you two hyperparameters to tune for smoothing: alpha and beta. A hyperparameter is a parameter that is set before learning and tuned on held-out data to optimize performance. In comparison, K-NN only has one option for tuning: k, the number of neighbors.

4. Naive Bayes is not affected by the curse of dimensionality and large feature sets, while K-NN has problems with both.

5. For tasks like robotics and computer vision, Bayes outperforms decision trees.

Where K-nn Excels

1. If violations of the conditional-independence assumption will seriously hurt classification, you’ll want to choose K-NN over Naive Bayes. Naive Bayes can also suffer from the zero-probability problem: when a particular attribute’s conditional probability is estimated as zero, Naive Bayes will completely fail to produce a valid prediction. This can be fixed using a Laplacian estimator, but K-NN could end up being the easier choice.

2. Naive Bayes works well only if the decision boundary is linear, elliptic, or parabolic. Otherwise, choose K-NN.

3. The ideal Bayes classifier requires that you know the underlying probability distributions for the categories; all other classifiers are compared against this ideal. Unless you know those probabilities and probability density functions, using the ideal Bayes classifier is unrealistic. In comparison, K-NN doesn’t require that you know anything about the underlying probability distributions.

4. K-NN doesn’t require any training: you just load the dataset and off it runs. On the other hand, Naive Bayes does require a training step.

5. K-NN (and Naive Bayes) outperform decision trees when it comes to rare occurrences. For example, if you’re classifying types of cancer in the general population, many cancers are quite rare. A decision tree will almost certainly prune those important classes out of your model. If you have any rare occurrences, avoid using decision trees.
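The zero-probability problem from point 1, and the Laplacian (additive smoothing) fix, show up directly in how the conditional probabilities are estimated. The counts below are hypothetical, and `alpha` is the smoothing hyperparameter mentioned earlier:

```python
def smoothed_prob(count, total, n_values, alpha=1.0):
    """Additive (Laplace) smoothing of P(value | class).
    With alpha=0, a value never seen in training gets probability 0,
    which zeroes out the whole Naive Bayes product; alpha>0 avoids that."""
    return (count + alpha) / (total + alpha * n_values)

# Suppose an attribute takes 3 possible values and we saw 10 training
# samples of a class, but one particular value never occurred (count=0):
print(smoothed_prob(0, 10, 3, alpha=0))    # 0.0 -- kills the prediction
print(smoothed_prob(0, 10, 3, alpha=1.0))  # small but non-zero
```

Note that the smoothed estimates still sum to 1 across the attribute’s values, so they remain a valid probability distribution.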

Where Decision Trees Excel

Image: Decision tree for a mortgage lender.

1. Of the three methods, decision trees are the easiest to explain and understand. Most people understand hierarchical trees, and the availability of a clear diagram can help you communicate your results. Conversely, the mathematics underlying Bayes’ theorem can be very challenging for the layperson to understand. K-NN sits somewhere in the middle: you can reduce the K-NN process to an intuitive graphic, even if the underlying mechanism is probably beyond a layperson’s level of understanding.

2. Decision trees have easy-to-use features to identify the most significant dimensions, handle missing values, and deal with outliers.

3. Although over-fitting is a major problem with decision trees, the issue can (at least in theory) be avoided by using boosted trees or random forests. In many situations, boosting or random forests can result in trees outperforming either Bayes or K-NN. The downside to those add-ons is that they add a layer of complexity to the task and detract from the method’s major advantage, which is its simplicity.

More branches on a tree lead to a greater chance of over-fitting. Therefore, decision trees work best for a small number of classes. For example, the image above only results in two classes: proceed, or do not proceed.

4. Unlike Bayes and K-NN, decision trees can work directly from a table of data, without any prior design work.

5. If you don’t know your classifiers, a decision tree will choose those classifiers for you from a data table. Naive Bayes requires you to know your classifiers in advance. 
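Point 4’s “works directly from a table of data” is easy to demonstrate with the simplest possible decision tree: a one-level stump that scans a raw table for the best single split. The mortgage-style numbers below are invented for illustration:

```python
def best_stump(rows):
    """Find the (feature, threshold) split that best separates labels.
    `rows` is a plain table: a list of (features_tuple, label) pairs.
    This is a one-level 'decision stump', the simplest decision tree."""
    best = None
    for f in range(len(rows[0][0])):
        for feats, _ in rows:           # try each observed value as a threshold
            t = feats[f]
            left = [lab for fs, lab in rows if fs[f] <= t]
            right = [lab for fs, lab in rows if fs[f] > t]
            def errs(side):
                # misclassifications if each side predicts its majority label
                return len(side) - max(side.count(l) for l in set(side)) if side else 0
            score = errs(left) + errs(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best  # (misclassified, feature_index, threshold)

# A tiny mortgage-style table: (income, debt) -> proceed?
table = [((50, 40), "no"), ((60, 35), "no"),
         ((90, 10), "yes"), ((95, 5), "yes")]
print(best_stump(table))  # splits on feature 0 (income) at 60, zero errors
```

A full decision-tree learner just applies this search recursively to each side of the split, which is why no prior design work on the table is needed.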




Presentation: F# Code I Love



Don Syme is a language designer, compiler hacker, researcher at Microsoft Mobile Tools and a contributor to F#. He works with users and open source communities to make better programming technologies, and, through that, make people more productive and happier.

About the conference

Code Mesh LDN, the Alternative Programming Conference, focuses on promoting useful non-mainstream technologies to the software industry. The underlying theme is “the right tool for the job”, as opposed to automatically choosing the tool at hand.


Mini book: The Organisational Dynamics Review


I’ve been guiding what you might call organizational “transformations” for the last decade and although this started with agile, I soon came to believe that isolating the transformation success of a company to only the tech teams or only the PMO processes or only customer service, etc. was short-sighted. The biggest and ironically the most undervalued element needing transformation was always the leadership and culture in these changes.

This publication is to start conversations about these topics which are so essential but so hard to wrap our minds around.

So I’m recruiting a group of fellow evidence hunters to help me provide a selection of interesting articles and critiques to you, hopefully regularly over the year. The first goal is to summarise and introduce you to blogs, articles, or academic papers that make us think and the second goal is to help us to apply some critical thinking and see if we can act on these within our real workplaces.

I picked a good controversial title to kick off with, I hope you enjoy the summaries and reviews.



C# 8 Pattern Matching Enhancements

MMS Founder

C# 7 laid the groundwork for pattern matching, but a lot of features had to be left on the cutting room floor. With the extra development time C# 8 is getting, many of these are being picked up.

Positional Pattern Matching

Consider this rather verbose pattern using C# 7 syntax.

case Rectangle r when r.Length == 10 && r.Width == 10: return "Found 10x10 rectangle";

By leveraging the deconstructor feature, the new positional pattern match makes the feature much less verbose.

case Rectangle (10, 10): return "Found 10x10 rectangle";

This feature will also be supported with anonymous tuples; this is referred to as a “tuple pattern”. You can see an example of this in Mads’ article Do more with patterns in C# 8.0.

Property Pattern Matching

The positional pattern is concise, but it only works if you happen to have a suitable Deconstruct method. When you don’t, you can use a property pattern instead.

case Rectangle { Width: 10 }: return "Found a rectangle with a width of 10";

Support for indexed properties is being considered as well, but the specifics have not been determined.

Deconstructor Improvements

Another idea being considered under the Open LDM Issues in Pattern-Matching ticket is allowing multiple Deconstruct methods with the same number of parameters. In addition to having different types, the parameters must be named differently.

ITuple Pattern Matching

The ITuple interface, introduced in .NET 4.7.1 and .NET Core 2.0, raises several questions in C# 8. The basic idea is that if an object implements this interface, then it can participate in pattern matching. Three scenarios are under consideration regarding when this goes into effect.

if (x is ITuple(3, 4)) // (1) permitted?
if (x is object(3, 4)) // (2) permitted?
if (x is SomeTypeThatImplementsITuple(3, 4)) // (3) permitted?

A related question is, if a class implements ITuple and there is also a Deconstruct extension method, which takes priority? Ideally, they would return the same values, but a tie-breaker is needed when that’s not the case.


Presentation: Debuggable Deep Learning

MMS Founder

Singh: My name is Avesh [Singh], this is Mantas [Matelis]. We’re both software engineers at Cardiogram, as Mike said and we were working on a deep neural network to predict cardiovascular diseases using wearable devices like the Apple Watch. Along the way, we’ve learned a lot about how to build and debug deep neural networks, and we wanted to share some of that knowledge with you today. Personally, I find it helpful to look through the slides during the talk, and in this slide actually we’ve put some code, so, if you’d like to follow along, Mantas and I both just tweeted out the slides, you can find our Twitter handles right here, just look on our feeds and you can find these slides in PDF format.

We’ve titled this talk Debuggable Deep Learning, but is that an oxymoron? Deep learning is often seen as a black box: you take a piece of input data X, apply several thousand matrix multiplications, and out comes a prediction. This prediction isn’t easily explained, so this poses an obvious problem for machine learning practitioners: how do you construct and debug a model? In this talk, we’re going to walk you through some techniques that we’ve used to demystify the behavior of a DNN. We don’t promise explainability, but the point we want to drive home is that constructing a DNN architecture is not alchemy, it’s engineering.

This presentation is split into two parts. First, we’re going to talk about coming up with an initial architecture, to do this, you must understand your problem and your data. This is also going to introduce you to Cardiogram’s data set and model, which is going to be essential to understanding the second part of the talk which is on debugging techniques. After that, we’re going to talk you through these debugging methods that we’ve used to identify and fix problems in our DNN. Let’s get started.

Overview of Cardiogram Data

Cardiogram is a mobile app for iOS and Android. Who here has an Apple Watch, or Garmin, or Wear OS device? Great, please download our app, we want more users, more labels. A lot of us track our heart rates when we’re running or biking, but within your heart rate data, you can also see your REM cycles, or how your sleep is disrupted by alcohol. You can quantify the anger you feel when you’re stuck in traffic on the 101, or your anxiety during a job interview. Your heart says a lot about you, and we’re using this data to detect signs of disease.

About 500,000 people use the Cardiogram app daily, we’ve surveyed many of these users in order to come up with a data set of diagnoses. The conditions we’re most interested in are these chronic cardiovascular diseases, diabetes, sleep apnea, and high blood pressure. These conditions form our labels, we’re trying to predict whether a user has diabetes, sleep apnea, or high blood pressure. For each of these conditions, as you can see here, the number of positive labels is in the tens of thousands, and the total labels are in the hundreds of thousands. Our data set is restricted to Apple Watch users, we get both heart rate and step count information from the watch. The step count is intended to provide some context, so, a high heart rate is more expected if the step count is also high, but as a side effect, it also provides a measure of how active the user is.

We get a user’s heart rate and step count at various time intervals. These time intervals are not always consistent, so, if a user takes her watch off to go to sleep, then there’ll be no heart rate readings for eight hours. To account for these gaps, we encode this delta time channel, which is DT here, which stores the amount of time since the previous reading. Let’s say at the first time step, the user has a heart rate of 76 beats per minute; five seconds later, that rises to 78, and after that we get a step count reading, and so on. This is the input to our model: it’s a 2D array, shown here.
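A minimal sketch of the delta-time encoding just described, assuming hypothetical readings of (timestamp in seconds, heart rate, step count); the function name and exact channel layout are illustrative, not Cardiogram’s actual code:

```python
import numpy as np

def encode_with_delta_time(readings):
    """Turn (timestamp, heart_rate, step_count) tuples into a 2D input
    array with one row per reading and a delta-time channel storing
    seconds elapsed since the previous reading (0 for the first)."""
    rows = []
    prev_t = None
    for t, hr, steps in readings:
        dt = 0.0 if prev_t is None else t - prev_t
        rows.append([hr, steps, dt])
        prev_t = t
    return np.array(rows)

# The example from the talk: 76 bpm, then 78 bpm five seconds later,
# then a step reading a minute after that.
readings = [(0, 76, 0), (5, 78, 0), (65, 78, 12)]
x = encode_with_delta_time(readings)
```

The delta-time channel is what lets the model see an eight-hour gap (watch taken off overnight) as different from a five-second gap, without resampling the series.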

You might be thinking, this data comes with a lot of challenges. The three that we enumerate here is, one, as I mentioned, readings are taken at an irregular sampling frequency. Number two, the data is also very low dimensional, it includes just the three input channels: heart rate, step count, and delta time. Finally, the data streams are arbitrarily long, some users have been using their watch for years, while others only have data for the past day. Next, I’ll pass it off to Mantas [Matelis], to discuss some solutions to these problems, and to present some ideas around model architecture.

Building an Architecture

Matelis: You understand your problem, you know the characteristics of the data, it’s time to build an architecture. If you’re using an existing well-researched domain like image recognition or speech recognition, use existing architectures, maybe even existing weights, the less that you build yourself, the less you have to debug, and the easier life will be. If you’re in more of an unresearched space, like we are, you have a lot more work to do, so the broad advice that I can give is, start simple and look for incremental gains.

I’m going to go over one of the architectures that we ourselves have built and debugged. You don’t need to understand this entirely, but a general picture of what we’re working with will help some of the later slides. Our input at the bottom is the sensory data that Avesh mentioned a few slides ago, and our output at the top is the risk scores for the four conditions that we try and predict, atrial fibrillation, sleep apnea, diabetes, and high blood pressure.

At a high level, the architecture consists of feeding in the time series of wearable data into several temporal convolution layers, several LSTM layers, a final convolution layer, and then the outputs that correspond to the four risks scores. There are of course alternative data representations that lead to alternative architectures that we can try. Instead of feeding in the raw data inputs directly, we can group the data into hour long intervals, and manually generate features for things that we deem might be important for the neural network in order to successfully discover the patterns in our data, so things like total step count, heart rate percentiles over the hour, indicator variables, so things like, is the user sleeping at the time? Do we think the user was working out?

Then from this data representation, we can use a much simpler CNN architecture, something a little more off-the-shelf, because in this representation the data is much higher dimensional, but the sampling is also fixed. You get one of these per hour, whereas, in the other case, you get a heart rate reading maybe every five seconds, maybe every five minutes; it’s a lot more complicated. This does of course reduce the granularity of the data from being an individual observation of a heart rate or a step count, but it helps to mitigate these issues, and it allows us to use a much simpler architecture.

This concept is closely related to the debuggability of DNNs: interpretability. Interpretability is just, given an input, what made the output be the output. There are a lot of complicated papers about how to make really complicated deep neural networks interpretable, but I’m going to approach it from a simpler angle here. If you can make your model architecture simpler, do so. In our case, we have some disparate data sources: we have the sensor data that we’ve been talking about, but we also have things like age, sex, and BMI. If we’re able to build separate models out of both of these pieces of information, and then combine them later on, it becomes a lot easier to debug the model. You can tell which part of the model is failing, you can see exactly how each model contributes to the accuracy of the final model, and it’s really easy to run ablation studies: take out parts of the model and see, well, how much worse do we do?

Debug Your Model

We have an architecture, we understand our data, it’s time to debug the model. There are two classes of breakages in DNNs. One is the really common “my model isn’t training” or “my model doesn’t work at all”: maybe your loss is NaN, or your AUC is 0.5, something along those lines. There’s also the more insidious “my model doesn’t do as well as I think it could”, and we’re going to talk about both of these. First, you’re often not certain whether a model is broken or whether you just have a hard problem. What kind of results can we expect? It’s hard to say, so try to find a baseline with a simple and obvious model that you trust.

In our case, for example, we could take those hourly interval feature vectors that we built and talked about a few slides ago, and throw the means and standard deviations of these into logistic regression, and that would be a really simple baseline that we could compare the results of a DNN against. If it turned out that the DNN did worse than this baseline, well, of course that’s a pretty clear indication that the DNN is fundamentally broken, or something along the way didn’t go as planned, whereas, if the model does better, great, you did a good job applying deep learning. If the model does about the same as logistic regression, or linear regression, that means that there’s no value above and beyond the very simple feature engineering that you did, and so you should take a step back and think differently about the problem.
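To make the baseline comparison concrete without any framework dependencies, here is a rank-based AUC helper plus two made-up score vectors standing in for the baseline and the DNN; the numbers and names are illustrative only:

```python
import numpy as np

def auc(labels, scores):
    """AUC via pairwise ranking: the probability that a random positive
    example is scored above a random negative one (ties count half)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical scores on the same tuning set.
labels = [0, 0, 1, 1]
baseline_scores = [0.2, 0.4, 0.3, 0.8]  # e.g. logistic regression on hourly features
dnn_scores = [0.1, 0.2, 0.7, 0.9]
```

If `auc(labels, dnn_scores)` fails to beat `auc(labels, baseline_scores)`, suspect a broken model or pipeline before concluding the problem is just hard.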

Another really important way to make sure that your model isn’t entirely broken is to verify your input generation data pipeline. A lot of the time you have this complex series of functions that takes a series of CSVs and turns it into arrays that go into a .predict, or a .fit, or a .train, and it’s really easy to make a mistake along the way. By making sure that the input to the model is what you expect, you can eliminate a whole class of bugs.

Example errors that we’ve made here include training on the wrong label, filtering many more user weeks than we expected, leading to a lot less training data, leading to worse model performance, and then insidious things, like at some point we had different pieces of code that had different understandings of what the ordering of conditions were, so some piece of code thought it was atrial fibrillation, diabetes, sleep apnea, another thought it was diabetes, sleep apnea, atrial fibrillation, and that made the model look a lot worse than it actually was. It’s hard to catch these sorts of things because these conditions are correlated with each other. The easiest way is just to write unit tests and make sure that what you put in is what you’re expecting.
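A minimal sketch of the kind of unit test that catches the condition-ordering bug described above, assuming a hypothetical shared CONDITIONS list that every pipeline stage imports instead of hard-coding its own ordering:

```python
# Hypothetical single source of truth for label ordering.
CONDITIONS = ["atrial_fibrillation", "diabetes", "sleep_apnea", "high_blood_pressure"]

def make_label_vector(user_conditions):
    """Build a multi-label vector in the shared CONDITIONS order."""
    return [1 if c in user_conditions else 0 for c in CONDITIONS]

def test_label_ordering():
    # A diabetes-only user must light up exactly index 1, nothing else.
    assert make_label_vector({"diabetes"}) == [0, 1, 0, 0]
    assert make_label_vector(set()) == [0, 0, 0, 0]
```

Because the conditions are correlated, a silent ordering mismatch only degrades metrics rather than zeroing them out, which is exactly why a test like this is worth the two minutes it takes to write.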

The next thing to try, to make sure your model isn’t entirely broken, is to make sure that you’re able to overfit on a small data set. Of course, overfit is normally a bad word, we don’t want to overfit, but, if you turn off regularization, dropout, batch norm, all that, you should be able to overfit on a small portion of your data set, maybe a few percent, maybe a bit more. If you’re able to do this, you eliminate another class of errors. There’s no normal loss curve for this, but in classification, you should expect to be able to get to around 0.99 AUC, or better.
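A toy stand-in for this sanity check: a tiny, linearly separable sample and a bare logistic model with no regularizer, which should fit its training set essentially perfectly. With a real DNN you would instead fit a few percent of your data with dropout and batch norm disabled; everything below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# A tiny "few percent" sample, deliberately separable.
X = rng.normal(size=(16, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = (X @ w_true > 0).astype(float)

# Plain full-batch gradient descent on logistic loss, no regularization.
w = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)

p = 1.0 / (1.0 + np.exp(-(X @ w)))
train_acc = ((p > 0.5) == y.astype(bool)).mean()
```

If even this kind of unconstrained fit on a small subset fails, the problem is in the pipeline, the architecture, or the learning rate, not in the difficulty of the task.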

If you can’t, it could mean a number of things are going wrong. Maybe your model architecture isn’t what you expect: maybe you’re doing some funny slicing and you accidentally dropped most of your input, and you’re training on very low dimensional data. Maybe your input pipeline is broken and your unit tests didn’t catch it, or alternatively, your learning rate is just really far from what it should be. It could also be that your model just doesn’t have enough capacity, so make it wider, maybe make it deeper; that’s also an important sign.

Examining Outputs

We’ve gone through some of the debugging techniques that help with a model that’s doing really poorly, but there are a lot more techniques to help figure out why a model isn’t doing as well as it could be doing, one of these is examining outputs. Telling a machine learning practitioner to examine their outputs is like telling someone to eat their vegetables. You do have to do it, it’s unfortunate, but you do. Here’s an example of an aggregate analysis that we ran on a DNN architecture, here, with this DNN, we initialized the LSTM state with some user metadata, like their age, sex, and BMI. And we wanted to understand the extent to which the model is just regressing over this metadata, and just using the metadata to compute its predictions, ignoring the sensor data that we’re providing it as well.

We graphed the DNN predictions alongside the logistic regression predictions. The DNN here takes in one week of user data at a time, in this graph, each dot is one week of user data. The answer to our initial question is no, the graph is not particularly linear. Clearly, the DNN is using extra signal above and beyond just the age, sex, and BMI, but there’s something else striking about this graph. It’s made up of vertical lines, and each line is actually formed by a single user with multiple weeks’ worth of data, so their LR prediction is unchanging because their age, sex, and BMI don’t change, but the DNN prediction varies over the weeks. Sometimes this ranges from 0.1 to 0.7, so, this is actually a really useful piece of information, it tells us that there’s some sort of improvement that can be made over and above just averaging our DNN predictions, which is what we were doing in the past. Perhaps our filtering isn’t strong enough and we were including weeks of user data where the user had worn their watch for a few hours, and the DNN wasn’t able to pick up enough of the signal. Alternatively, the DNN predictions were actually accurate, because these are sleep apnea predictions and sleep apnea can be sporadic. After a night of drinking, you probably have a lot more apnea events than otherwise.

In addition to an aggregate analysis like that, it’s really useful to take a look at examples of wins and losses. Sort your tuning set by absolute error, and take a look at a handful of examples where the prediction is really far from the label, and look in both directions: look for false positives and look for false negatives. From these examples, come up with a hypothesis or a pattern of what’s causing these errors, and then take a look at the wins of the model and make sure this pattern doesn’t apply.

You want to find a pattern that explains why your DNN isn’t working in certain cases; next, you can stratify your input set into cases that exhibit the pattern and those that don’t, and take a look at the accuracy metrics here, you should find that the pattern does explain some sort of breakage in the model. This technique isn’t specific to deep learning per se, recently, we were debugging a logistic regression model on hand engineered features, and we discovered a few loss patterns here. First, users who work out a lot during the week, throw off our estimations of daytime heart rate standard deviation. Second, users who travel a lot don’t fit our assumptions about sleep time and wake time, and time zones.
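A small numpy sketch of this workflow: sort a (made-up) tuning set by absolute error to surface the worst losses, then stratify by a hypothesized pattern and compare error across the strata. All names and numbers here are illustrative:

```python
import numpy as np

# Hypothetical tuning-set labels and predictions.
labels = np.array([1, 0, 1, 0, 1, 0])
preds = np.array([0.2, 0.9, 0.8, 0.1, 0.95, 0.4])

# Worst absolute errors first: read these examples by hand,
# in both directions (false positives and false negatives).
err = np.abs(preds - labels)
worst_first = np.argsort(-err)

# Stratify by a hypothesized pattern (here: "user works out a lot")
# and compare a simple error metric between the two strata.
works_out_a_lot = np.array([True, True, False, False, False, False])
pattern_err = err[works_out_a_lot].mean()
other_err = err[~works_out_a_lot].mean()
```

If `pattern_err` is clearly higher than `other_err`, the hypothesized pattern really does explain a chunk of the model’s breakage and is worth fixing.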

Predicting Synthetic Outputs

Next, I’m going to talk about one more technique that we call predicting synthetic outputs. As a first step in evaluating a model architecture to see if it’s suitable, we trained the DNN to predict a synthetic task using the heart rate and step count data, the task is just a deterministic function of the data. I’ll give you an example, we applied this when coming up with an architecture to predict sleep apnea. From existing literature, we knew that a feature standard deviation of daytime heart rate minus standard deviation of nighttime heart rate was particularly predictive of sleep apnea. We trained the DNN to predict this, but in order for a DNN to be able to predict this with low mean absolute error, it has to have at least a few properties.
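A minimal numpy sketch of such a synthetic label, std of daytime heart rate minus std of nighttime heart rate; the fixed 8:00–22:00 day/night boundary is an assumption for illustration, not Cardiogram’s definition:

```python
import numpy as np

def synthetic_label(timestamps, heart_rates):
    """Deterministic target: std(daytime HR) - std(nighttime HR).
    'Daytime' is naively 8:00-22:00 here (an illustrative assumption)."""
    hours = (np.asarray(timestamps) // 3600) % 24
    hr = np.asarray(heart_rates, dtype=float)
    day = hr[(hours >= 8) & (hours < 22)]
    night = hr[(hours < 8) | (hours >= 22)]
    return day.std() - night.std()
```

Because the label is a pure function of the input, a DNN that cannot drive its error down on this task is missing a prerequisite capability (telling day from night, or remembering across days), regardless of how the real supervised task would go.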

First, it has to be able to distinguish day from night, this is not particularly obvious, in the past, we had the slide that showed this delta time channel that advanced in seconds. It’s possible that the DNN won’t know when daytime is, when nighttime is. The DNN also has to be able to remember data from several days in the past, this is a common problem with LSTMs, sometimes they can’t. This is a good sanity check to make sure that your architecture at least has the capability of learning what it has to learn.

In the past, we’ve also used this kind of synthetic task as a form of semi supervised training. As Avesh mentioned, we have over 500,000 daily users, so we have a lot more unlabeled heart rate data than we have labeled heart rate data. We can construct synthetic labels from the unlabeled data, train the network on these, and then use the learned weights, apart from the last layer, as the initial values of supervised training over the labels that we do have. Next, I’ll pass it off to Avesh [Singh], to talk about some more debugging techniques.

Visualizing Activations

Singh: Let’s talk about a pretty simple idea, which is visualizing your model’s activations. This is the architecture slide that Mantas presented a few minutes ago, and we’re going to be examining the outputs of the convolutional layer. Actually, there are three convolutional layers, so we’re going to be looking at the output of the last one. Oftentimes you’ll see CNNs convolving over images; in our architecture, the input is not an image, it’s time series data. Our temporal convolutional layers are learning functions to apply to pieces of time series data.

Let’s start by understanding a single neuron here. Each individual neuron in this layer takes as input a time series of data, and it applies some function, which returns another time series. In this diagram is a convolutional neuron with width four. It applies the transformation to its inputs that’s shown here: multiplied by a vector of learned weights W, added to a bias term B, and passed through a non-linearity F, then out comes H, the hidden output of this neuron. The activation function we use here is a rectified linear unit, or ReLU. I was talking about that with Mike, and he joked that ReLU is basically a pretentious name for max, and that’s what it is. We’ve graphed the ReLU, aka the max function, here, and we’re going to be visualizing the output H.

We obtain the output from this neuron for every time step; it applies a convolution with stride one. We have 128 such neurons in this layer, so, ultimately, we’re going to end up with a matrix that looks like this. The rows here are neurons, and each row shows the activation of a single neuron on each time step of data. What we’re hoping we’ll notice is that some cells have semantic properties, cells that light up when a user is sleeping, or working out, or anxious. Or maybe we won’t, because, after all, the neurons form a distributed representation, and graphing each neuron’s output individually may be meaningless. I want to make sure this presentation is useful to your work, so, I’m going to actually show some code here. A warning: there is some code ahead. This code uses Keras in TensorFlow.

The code should be easy to follow along, even if you’re used to PyTorch, like everything Python, it’s very readable. It’s actually very simple, layer output function here is a Keras function, it takes the input of layer zero and produces the output of the selected layer. We run this function on the actual input data, and we get back layer output, which is the time series output of each neuron, and that’s it. We got this idea from a Google brain paper that’s published at the Distill link below. If you’re interested in using this technique, I’d recommend that you take a look at that paper, it’s really cool and it’s very interactive, like all Distill papers.

We ran this code on the third convolutional layer of our model on one week of user data, and we visualized the results in this graph. The shades of blue here show the value of the activations, so smaller values are light blue, larger values are dark blue. Question for you guys: do you notice anything strange about these activations?

Participant 1: No change with time.

Singh: Exactly, they don’t change with time. We would call these dead neurons, they output the same value regardless of their input. Why is this happening? We thought that this might have something to do with our activation function. Remember, we’re using a ReLU activation shown on the left, if we take the derivative of the ReLU, we get a piecewise function shown on the right. Notice that when the input is less than zero, the derivative is zero, the values will not be updated in gradient descent. Perhaps B is very negative in this function, causing the input to F to always be less than zero. That doesn’t really sound right, because if that were the case, then every neuron in this layer would output zero.
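As a concrete (toy) version of this check, dead neurons can be flagged programmatically instead of by eye: in the neurons-by-timesteps activation matrix, a dead neuron is simply a row with zero variance. The sketch below is illustrative numpy, not Cardiogram’s code:

```python
import numpy as np

def find_dead_neurons(activations, tol=1e-8):
    """activations: (num_neurons, num_timesteps) array of layer outputs.
    A neuron is 'dead' if its output never varies across time steps."""
    return np.where(activations.std(axis=1) < tol)[0]

acts = np.array([
    [0.3, 0.3, 0.3],   # constant output: dead (stuck at its bias term)
    [0.0, 0.0, 0.0],   # stuck at zero: also dead
    [0.1, 0.9, 0.4],   # varies with the input: alive
])
```

Run on the real activation matrix, this turns "the graph looks flat" into a count you can track across training runs.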

What’s more likely happening is that one of the earlier convolutional layers is always outputting zero, so each neuron in this layer just takes on the value of its bias term, because X is just zero, and we can use TensorBoard to verify this is what’s happening. If we were to plot the values of the first convolutional layer prior to the activation, we’d see a histogram like this. Notice that the pre-activation outputs here are all very negative. After passing through the ReLU, they’ll be set to zero, and then after that W times X will be zero, and W times X plus B will just take on the value of the bias term.

Full disclosure, this histogram isn’t actually from our model; we didn’t need to use TensorBoard to debug this problem, because it turns out this is a very common problem that many of you have probably heard of, and there’s also a common solution to it. We can make sure that the gradient always has a non-zero value by using a leaky ReLU. This function has a value of X when X is greater than zero, like the ReLU, but has a small fraction of X when X is less than or equal to zero, so even when the input is very negative, the gradient is still propagated and the weights will still update.
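A quick numpy sketch of the two activation functions and their derivatives, showing why the leaky variant keeps learning; alpha = 0.01 is a common but arbitrary choice:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope for x <= 0 instead of clamping to zero.
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # The derivative is never zero, so very negative pre-activations
    # still receive a (small) gradient and can recover.
    return np.where(x > 0, 1.0, alpha)
```

With the plain ReLU, the gradient at x = -5 is exactly zero and the neuron stays dead; with the leaky version it is alpha, so gradient descent can still move the weights.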

We tried using the leaky ReLU for the convolutional layers, and, as you can see here, the activations for each cell now vary throughout time, but you’ll notice that there are chunks of time here when most cells output zero values. These actually correspond with the times when the user put their watch into workout mode, which means that the Apple Watch is going to take a reading every five seconds, rather than every five minutes. This suggested our convolutional layers can’t really handle these variable time scales. One potential solution that we’ve thought of is basically to take advantage of the fact that the Apple Watch operates in two timescales, either every five minutes or every five seconds: we could process inputs of different time scales separately, and then merge the results prior to the final layer. We actually haven’t tried this yet, so I can’t tell you if this worked.


Instead, let’s talk about a different problem, this is a problem that our DNN suffered from. We call it amnesia, and I’m going to tell you how we created a metric to quantify the issue. Recall that our input is one week of user data. It consists of 4,096 heart rate or step count readings. It’s important that our DNN be able to track long term dependencies, for example, when we’re predicting diabetes, we care a lot about heart rate recovery time. This is the amount of time it takes to get back to your resting heart rate after a workout. In order to compute this, the DNN must be able to store and retrieve the time when you ended your workout. We want to answer this question, is the DNN able to learn long term dependencies, or does it have amnesia?

We can find an answer to this question using gradient analysis. Let’s examine the gradient of the output with respect to the input at each time step. In our architecture, a prediction is output at every time step and they’re later aggregated into a single score, so let’s look at the very last output. If the DNN has amnesia, we expect that the first time step has only a minuscule contribution to the final output, whereas the last time step should have a huge contribution. In other words, the gradient of the output with respect to time step 4,095 will be much greater than the gradient of the output with respect to the first time step. How much greater is much greater? This is really context specific. In our case, it would be fine but not ideal if the last time step is 10 times as important as the first, but, if the last time step is a billion times more important, then we have a clear problem in our architecture.

This is the idea; the question is, how do we compute these gradients? Once again, warning, there’s some code ahead. This function is a bit more involved than the last function, but don’t be intimidated, I’ll walk through it line by line. We’re going to be writing a function, gradient of output with respect to input, and it’s going to compute the gradient of the last time step with respect to each of the input time steps and each of the input channels, those being heart rate and step count.

Computing the gradient sounds complicated, but we can use TensorFlow to do the heavy lifting. TensorFlow has a built-in gradient function that’s of course used in weight updates. In Backprop, we update the trainable parameters by using the gradient of the loss with respect to the learned parameters, and here we use the same function to save the gradient of the output with respect to the input.

This function takes a model as input, and we also need to provide some data, as the gradient is only a function until we actually run it on some data. The first thing we do is take the output at the last time step, which is 4,095. Also, let’s only look at the first output task, which is diabetes, just for simplicity. That means that the output tensor here is going to be a vector of length num users.

We don’t actually care about the gradients per user, we care about the average gradient across users, so we just take the mean value across all users to get output tensor sum. This is going to be a scalar; it’s the sum of the last output value for each user. Now let’s figure out how the input affects this value: let’s take the gradient of output tensor sum with respect to the inputs. Inputs here is a 3D tensor of shape num users by num time steps by num input channels. We’re differentiating a scalar with respect to a 3D tensor, so our result will be a 3D tensor.

Here we want to average over all users, so we take the mean across axis zero, resulting in a tensor of shape num time steps by num input channels. For example, the gradient tensor at [10, 0] is going to be the derivative of the last output with respect to the 10th input heart rate, so we’re almost there. We just convert this to a Keras function and execute the function on the provided data, returning the resulting gradient. I hope that makes sense; if you’d like to take a closer look offline, we did tweet out the slides, and this is also a slightly abbreviated version of the code. You can find the full code at this tiny URL, which leads to a git gist; take a look and feel free to steal it. We used this code to compute the gradients of our output with respect to each time step of our input, and we’ve graphed that here.
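For intuition, the same amnesia measurement can be sketched framework-agnostically with finite differences on a toy "model" that deliberately forgets early inputs; everything below is illustrative, not the code from the linked gist:

```python
import numpy as np

def output_fn(x):
    """Toy 'model' over a (num_timesteps,) input: an exponentially
    decaying weighted sum, which deliberately forgets early inputs."""
    t = np.arange(len(x))
    weights = 0.5 ** (len(x) - 1 - t)
    return float(weights @ x)

def grad_wrt_input(f, x, eps=1e-6):
    """Central finite-difference gradient of the scalar output
    with respect to each input time step."""
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        xp = x.copy(); xp[i] += eps
        xm = x.copy(); xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

x = np.ones(8)
g = grad_wrt_input(output_fn, x)
# g decays exponentially toward early time steps: the amnesia signature.
```

Plotting `g` on a log scale reproduces the shape of the graph from the talk: a sharp peak at the output’s time step and a steep fall-off everywhere else.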

One important note before we dive into this, is that our LSTM layers are bi-directional, meaning they receive their input in order and in reverse order. For this reason, we plot the gradient of the output at time step 2,048, the midpoint with respect to each time step. On the X axis here is the time step of the input we’re taking the gradient with respect to, and on the Y axis is the value of the gradient. For example, the gradient of the output at time step 2,048 with respect to time step about 2,048, is about 0.001. You’ll notice that the Y axis here is a log scale, so, if you were to compare the time step 2,048 with time step 2,500, you’ll see that the gradient has dropped by a factor of a million. The input at time steps far from 2,048 has pretty much zero effect on the output. We would say this architecture definitely suffers from amnesia so we’ve answered that question.

Actually, a few months after we produced this graph, we reran this analysis on a newer architecture, and we found that it no longer has amnesia. The inputs at each time step now have roughly the same impact on the output. How did we fix this? Well, we actually made a number of changes to the architecture during this time. The most likely fix is that we’re no longer running average pooling over the time series output of the LSTM layer, and instead we use the LSTM’s output at the last time step that has input data. This problem may actually have been fixed by some other change; let this be a lesson to make incremental changes to your model and to measure their effects following the scientific method. This is especially important if you plan on giving a talk on debuggable deep learning.


To summarize, we walked you through the first steps in creating a DNN architecture: understanding your problem and your data. We talked about model debugging, examining your outputs, predicting synthetic outputs, amnesia, and visualizing activations. Before we take questions, I can't help but include a plug, which you've heard in every talk I'm sure, which is that we are hiring. The ML team at Cardiogram only has three people right now, and we're looking for ML engineers with some prior research experience, so if you're interested in using our data to build models to predict cardiovascular disease, please shoot us an email.

Questions and Answers

Participant 2: Regarding getting rid of the issue that your first input doesn't matter for the last output, did you solve it with a hierarchical architecture, or by just going convolutional for all the temporal layers?

Singh: The architecture we ended up using for that had convolutional inputs, and then we had recurrent layers on top. What we think solved this is that the recurrent layers, instead of outputting at every time step, only output at the last time step that has input data. Remember, our inputs are 4,096, but we may not actually have 4,096 readings from the user, so if we have, like, 3,000 readings, then we just take the LSTM output at time step 3,000 as the model prediction. That's a good question. It makes sense that when we're using average pooling, the impact of the output at a particular time step will be very local to that time step, so this solution makes sense, but, like I said, we haven't scientifically proven that that was the reason.

Participant 3: The [inaudible 00:29:10] plot. Can you explain that? The one where you compared the models, logistic against DNN.

Matelis: This is a scatter plot of logistic regression predictions and DNN predictions for each user-week worth of data. Our hypothesis was that it's possible the DNN isn't actually using any of the heart rates and step counts we've given it. If that were the case, this would be a straight line, but it's not a straight line. That means that the DNN is actually using the heart rates and step counts to make its prediction. But we also find that it's made up of these vertical lines, where each vertical line is one user's worth of data, and that means that, for a single user with the same logistic regression prediction, the DNN prediction per week can vary quite a bit. That means there's more investigation to do into how to combine these multiple weeks, which may be anywhere from 0.1 to 0.7, into one global risk score that answers the question: how likely is it that this person has sleep apnea?

Participant 4: I’m curious how you guys got the labels for the data.

Singh: I think we’re in a unique opportunity or unique situation, because we create the app as well. We have these 500,000 users who use Cardiogram for various reasons, like tracking their health, or enrolling in habits, and we sporadically ask them questions like, “Do you have diabetes? Do any of your family members have diabetes?” things like that, and we use those answers as the labels. We can actually verify these with a different data set that we’ve built partnering with UC San Francisco, where they have a more formal study where they will send out pages and pages of surveys to a bunch of patients, and we can use those labels as well.

Participant 6: I really like the point about overfitting on a small data set. In my experience that finds most of the problems in the network architecture and your backprop layers. Have there been instances where that strategy didn’t work?

Singh: I actually missed the first part of the question. Could you repeat the beginning?

Participant 6: Overfitting on a small data set to figure out the problems with the network is a very, I think, useful trick. I’m curious if there are any instances where that didn’t find some bugs, which you mentioned where you had to use another technique?

Singh: Yes. I think what we have found is that oftentimes overfitting is too easy a task, so a single-layer LSTM can overfit on a hand-engineered feature. It's a very basic unit test.

Participant 6: I mean overfitting with the entire network, but on a small dataset; the network that you're going to deploy, but on a very small dataset?

Singh: There are two aspects here: overfitting on a hand-engineered feature and overfitting the actual labels. I'm not sure if there are any major detriments we found to making it a prerequisite that our models be able to overfit, as long as you apply regularization afterwards. Our process would be: you can have a large model without much regularization overfit on a small data set, and then apply some regularization, like decreasing the width of the model or adding L2, and then you can trade off the training and tuning accuracy on the full data set.
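A toy illustration of that process, not Cardiogram's actual models: a flexible model first overfits a tiny data set (the "unit test"), then L2 regularization trades training fit away:

```python
import numpy as np

# Fit a degree-9 polynomial to 10 noisy points: 10 parameters for
# 10 points, so without regularization it can (over)fit almost exactly.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 10)
y = x + 0.1 * rng.standard_normal(10)
X = np.vander(x, 10)  # polynomial feature matrix

def ridge_fit(X, y, l2):
    # Closed-form ridge regression: w = (X'X + l2*I)^-1 X'y
    return np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]), X.T @ y)

train_error = {}
for l2 in (0.0, 1.0):
    w = ridge_fit(X, y, l2)
    train_error[l2] = float(np.mean((X @ w - y) ** 2))

# The unregularized model overfits (near-zero training error); adding L2
# pulls training error back up, trading it for generalization.
print(train_error)
```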


Presentation: The Evolution of Spotify Home Architecture

MMS Founder


Samuels: Imagine putting on your headphones and listening to your favorite song or discovering a new artist whose songs you just can’t live without, or hearing an album that just dropped by an artist that you already know and love and adding those songs into your rotation. These are the types of listening experiences that we at Spotify want to connect to our users. In this talk, we’ll discuss how we use technology to get there. I’m Emily, I’m a staff engineer at Spotify and I’ve been here for over five years now and I’ve been working on music recommendations.

Muppalla: I’m Anil, I’m a data engineer at Spotify, I’ve been there for about two and a half years. I also work in recommendations.

Samuels: At Spotify, we’re a music streaming service and our mission is to unlock the potential of human creativity by giving a million creative artists the opportunity to live off of their art and billions of fans the opportunity to enjoy and be inspired by it. At Spotify, we have 96 million subscribers, 207 million monthly active users, we’ve paid out over €10 billion to rights holders. There are over 40 million songs on our platform, over 3 billion playlists on our service, and we’re available in 79 markets.

You could see that we’re dealing with a large scale here and specifically, at Spotify, Anil and I, we work on the home tab. For those of you that aren’t familiar with the home tab, it looks something like this. There are a lot of different ways that you can discover music and listen to music on the home tab. There are things like your heavy rotation which could be playlists and albums and artists that you’ve been listening to a lot in the last month. There’s recently played, which is just the last few things that you’ve played on Spotify. For this user, we’ve also seen that they’re a fan of Linkin Park, we wanted to give them more ways to listen to that artist, so we have a playlist featuring Linkin Park and a Linkin Park radio station. There are also album picks, which are albums that we think this user might like, so there are a lot of different ways that you can listen to music on the home tab.

You’re going to hear some vocabulary in this talk and I just wanted to explain it to you all before we get into it. You might hear us say the word “card” in reference to playlists or albums or podcasts. It’s just any kind of individual playable item on Spotify and multiple cards can make up a shelf. In this case, the shelf is your heavy rotation and the cards are just the items that this person has listened to a lot recently in the last month.

In this talk, we’re going to go through how do we create those recommendations. Back in 2016, we started with a batch architecture, then in 2017, we moved over to a services architecture to hide the complexity in that batch architecture and it allowed us to be more reactive. Then in 2018 to today, we leveraged our move to the Google Cloud Platform and added in streaming pipelines into our architecture to build a product based on user activity.


Let me paint you a picture of Spotify back in 2016. We were running a lot of Hadoop jobs back then. We had a big Hadoop cluster, one of the largest in Europe at the time, and we were managing our services and our databases in-house, so we were running a lot of things on-premise. In this batch architecture, we started off with some inputs that were the songs played logs. These are just all the songs that users are listening to on our platform, and we also use Word2Vec to figure out what songs are similar to other songs.

Many of you may already be familiar with Word2Vec, but I’ll just give you a short crash course. Word2Vec is a machine learning model that allows you to find vector representations of words. Given a corpus of text or a bunch of documents, you could see that words are similar to each other based on how often they co-occur amongst those different documents. How does that apply to Spotify? How do we use Word2Vec? Well, like I said, we have over three billion playlists on our platform, we use those playlists as the input into Word2Vec and we treat the tracks as if they are the words, and the playlists as if they are documents. We say that tracks that occur together amongst different playlists are going to be similar to each other, and they’ll be closer to each other in this vector space. You could see that 2Pac could be in one section of the vector space and Mozart could be in another section. Bach would be closer to Mozart than to 2Pac, because they are more similar.
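To make the "closer in the vector space" idea concrete, here is a toy cosine-similarity check with made-up 3-dimensional vectors; real Word2Vec embeddings would be learned from the billions of playlists and have far more dimensions:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means orthogonal.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors purely for illustration of the geometry.
vectors = {
    "2Pac":   np.array([0.9, 0.1, 0.0]),
    "Mozart": np.array([0.1, 0.8, 0.5]),
    "Bach":   np.array([0.2, 0.9, 0.4]),
}

# Bach lands closer to Mozart than to 2Pac in this toy vector space.
print(cosine(vectors["Bach"], vectors["Mozart"]),
      cosine(vectors["Bach"], vectors["2Pac"]))
```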

Now, we have all the songs that users have played and we know what songs are similar to each other. We are able to apply this not just to songs, but also to albums, and to playlists, and artists as well; we could find similar artists and playlists to songs. We have these Hadoop jobs that take this as input and, for a specific user, look at what songs they played and then use Word2Vec to figure out what playlists are similar to those songs. From there, we could create a shelf, a playlist that we think a user might like to listen to. Once we had that, we wrote it out into Cassandra, our data store.

So when a user opens up the home tab and wants to look at music recommendations, there's this content management service that knows these are the different types of shelves we want to load on home, and it knows the services it needs to call to get the content. It would call a service to fetch that shelf for home, we would go to Cassandra, get the content, and return that back up to the client. For all of this talk, we're just talking about mobile clients here.

Pros & Cons of the Batch Approach

What are the pros and cons of this approach? What are the advantages and disadvantages? Well, one advantage is there's a low latency to load home. It's pretty fast to go from the client to Cassandra and back, so home can be pretty snappy when it loads. You can also fall back to old data if you fail to generate recommendations: if for some reason our Hadoop jobs fail, the cluster is messed up, we can't run our jobs, or they're taking too long, we can always fall back to an older recommendation and just give the user something that's a little bit stale, rather than nothing at all.

What are some of the drawbacks to this approach? For our users, the recommendations are updated only once every 24 hours, so we’re not able to react to what users are doing during the day in our app. We have to wait until they finish listening to their songs for the day and then process all of that data, it’s in the nature of how Hadoop jobs work and the amount of data that we’re processing that we can only react once every 24 hours to give a new music recommendation.

We also calculate recommendations for every user that’s listened to a song on Spotify, not just home users. There are a lot of ways to listen to music on Spotify, you can listen through the home tab, but you could also listen through your library, through search, through browse, through radio. There’s a lot of other ways to get music on our service and because of that, when we’re processing all of those songs played in that log, we’re processing more data than we need to, and we’re creating recommendations for users who may never even come to the home tab.

Experimentation in the system can be difficult. Let's say you have a new idea for a shelf, a new way you want to make a recommendation to a user; there's a lot in the system that you need to know about to be able to get to an A/B test. You have to know how to write a Hadoop job, and actually, at this time we were running Scalding, a framework on top of MapReduce that Twitter wrote, so you have to know how to write a Scalding job in this case. You need to know how to process those songs played logs, you need to know how Word2Vec works and how to use that data, you need to know how to ingest data into Cassandra, and you also need to know how to stand up a service to fetch that data out of Cassandra. There's a lot of institutional knowledge that you need to have in order to just try out a new hypothesis for an A/B test.

There’s also a lot of operational overhead needed to maintain Cassandra and Hadoop. At that time we were running our own Hadoop cluster, we had a team whose job it was just to make sure that that thing was running, making sure that it was getting the latest updates, making sure that we had enough nodes, and making sure that the resources were being shared fairly. When we first started out, it was kind of a free for all in terms of using this Hadoop cluster. If you had a resource-intensive job, you could make it so other people’s jobs wouldn’t be able to run, it would have to wait until those resources freed up.

We tried a couple of strategies to mitigate this; we worked with resource pools. We had a resource pool for production jobs versus development jobs, so that way when you tried out your development jobs, they wouldn't impact your ability to run production jobs. Then even that wasn't enough, and we had resource pools per team, so that one team's jobs couldn't impact another team's jobs.

It was a lot of work just to maintain the system. We were also maintaining our own Cassandra clusters, and Cassandra has a lot of knobs and things that you can tune to make it fit your use case. There was a lot of tweaking that we had to do to make sure that it worked for us; sometimes Cassandra would run a lot of compactions, and that would impact our ability to serve production traffic. That was something else that we had to spend a lot of time making sure worked correctly.

We knew that this architecture was not going to be the best one for us. We knew that we could do better and we could improve upon this. Anil [Muppalla] is going to talk about how we moved to a more services-based architecture in 2017.


Muppalla: We started to adopt services in 2017. This was at the time when Spotify was investing in moving to GCP. We had moved our back-end infrastructure to GCP as well, and we also realized that we wanted to be reactive to users and give them the best possible recommendations as soon as they wanted them. What did we do? We replaced the songs played data set and the Word2Vec data set with corresponding services. The songs played service would return a bunch of songs for the user in real time, and the Word2Vec service would give the latest recommendations for a bunch of songs for that user.

In this system, when a user loads up the homepage, the Spotify client would make a request to the content management system. The CMS is where we define which content the user should see, how we should render this content, and where the content lives. The CMS would then talk to our create-shelf-for-home service; this is the service that would fetch the songs played for that user from the songs played service, get a bunch of playlist recommendations based on these songs for that user, package these playlists as cards into a shelf, and return that to the user, and the user would see it instantly on the homepage.

In this architecture, we made sure that it was easy to write shelves. In this system, writing a shelf was as simple as writing a back-end service, so the more back-end services you wrote, the more shelves there were, and each would process requests as they came in.

Pros & Cons of the Services Approach

What are some of the pros and cons of this approach? We were updating recommendations at request time. Every time a user loads the homepage, we are either creating new recommendations or updating existing recommendations, so the user always sees something that is relevant; it's more reactive. We are now calculating these recommendations only for the users that are using the home page, so we have reduced the number of calculations that we are doing, as opposed to what we used to do in the batch system; we saw an improvement there.

The stack is further simplified because now the complexity of the Word2Vec model is hidden behind a service. All you need to know is how to interact with the service, and it's much easier to manage the system because it's all just services. It's easier to experiment; any developer can come in, think up a content hypothesis, and just write a service, and you can go from there. Since we moved our back-end infrastructure to Google, we saw that there was decreased overhead in managing our systems, because we moved away from our on-premise data centers.

What are some of the cons of this? You saw that as we added more and more content, as we added more and more recommendations for the users, it would take longer to load home, because we are computing these recommendations on the request path. We also saw that, since we don't store these recommendations anywhere, if for some reason the request failed, the user would just see nothing on the homepage; that's a very bad experience.

Streaming ++ Services

In 2018, Spotify was investing heavily in moving the data stack to Google Cloud as well. Today, we're using a combination of streaming pipelines and services to compute the recommendations on home that you see. What's a streaming pipeline? At Spotify, we write Google Dataflow pipelines for both batch and streaming use cases. We use Spotify Scio, a Scala wrapper on Apache Beam, to write these pipelines. A streaming pipeline processes real-time data, and real-time data is an unbounded stream of user events.

At Spotify, all user events are available to us as Google Pub/Sub topics; every interaction you make on the app is available to us in real time as a Pub/Sub topic. In a streaming pipeline, you can perform aggregations on this data by collecting events into time-based windows. You can apply operations like groupBy, countBy, and join, and then, once you have the results, you can store them in other Google stores like Pub/Sub, BigQuery, Google Cloud Storage, and Bigtable.
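As a rough illustration of time-based windowing and counting, here is a pure-Python stand-in with made-up events for what Beam/Scio would do over Pub/Sub at scale:

```python
from collections import defaultdict

# Toy event stream: (timestamp_seconds, user_id, event_type). In the real
# system these would arrive continuously from Pub/Sub subscriptions.
events = [
    (3, "alice", "song_played"),
    (12, "alice", "heart"),
    (14, "bob", "song_played"),
    (61, "alice", "song_played"),
]

WINDOW = 60  # fixed 60-second windows

# Count events per (window, user): a hand-rolled stand-in for the
# windowed groupBy/countBy operations described above.
counts = defaultdict(int)
for ts, user, _ in events:
    counts[(ts // WINDOW, user)] += 1

print(dict(counts))
```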

For the Spotify home use case, we care about three signals: songs played, follows, and hearts. A songs played signal is fired every time a user completes listening to a song, a follow signal is fired when a user follows either an artist or a playlist, and a heart signal is fired every time a user hearts a track. All these events are available to us as individual Pub/Sub topics. We take these Pub/Sub topics and write a streaming pipeline that consumes messages from them by making what we call subscriptions to them, so as the events come in, the streaming pipeline processes them instantly.

What do we do in the streaming pipeline? We perform aggregations on the messages that we get to determine how often we need to either create a new recommendation or update an existing recommendation for that user. Once we've decided the cadence, we publish another message to another Pub/Sub topic saying, "Hey, this user is ready for a new recommendation."

We then have this create shelf service. It's a huge monolith service that has shelves and content hypotheses written as functions inside it. What this service does is consume from that create-recommendation-for-this-user topic and start creating recommendations for that user. Each recommendation is nothing but a function here; we've reduced the complexity even further. For example, we take what we did in the services system: every time an event comes in for that user, we fetch the songs for that user, get playlist recommendations based on those songs, neatly package them into a shelf of playlist recommendations, and then write it to Google Bigtable. Google Bigtable is a highly scalable key-value store which is very similar to Cassandra.
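A hypothetical sketch of the "every shelf is a function" idea: each content hypothesis registers itself, and the service runs them all whenever a create-recommendation message arrives. All names and logic here are invented for illustration, not Spotify's actual code:

```python
# Registry mapping shelf names to the functions that build them.
SHELF_FUNCTIONS = {}

def shelf(name):
    # Decorator that registers a shelf-building function under a name.
    def register(fn):
        SHELF_FUNCTIONS[name] = fn
        return fn
    return register

@shelf("featuring")
def featuring_shelf(user_events):
    # e.g. playlists featuring artists the user follows (made-up logic)
    return [f"Featuring {item}" for kind, item in user_events if kind == "follow"]

@shelf("heavy_rotation")
def heavy_rotation_shelf(user_events):
    return [item for kind, item in user_events if kind == "song_played"][:3]

def create_shelves(user_events):
    # This is the result that would be written to Bigtable, keyed by user.
    return {name: fn(user_events) for name, fn in SHELF_FUNCTIONS.items()}

shelves = create_shelves([("follow", "Linkin Park"), ("song_played", "Numb")])
print(shelves)
```

Adding a new shelf is then just adding one more decorated function, which is the developer-productivity win the talk describes.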

In this architecture, when a user loads the homepage, the client makes a request to the CMS, and the CMS quickly fetches this content from Bigtable and shows it on home. We've reduced the complexity of writing new shelves in the system: you just add as many functions as you want to that create shelf service.

Pros & Cons of the Streaming ++ Services Approach

What did we learn? What are the pros and cons of this approach? We are now updating recommendations based on user events. We are listening to the songs you have listened to, the artists you have followed, and the tracks you have hearted, and we make decisions based on that. We've separated out the computation of recommendations from the serving of those recommendations in the system. Since we are sensitive to what the user is doing on the app, we are sure that we are giving the user fresher content. We are able to fall back to older recommendations if for some reason we are unable to compute new recommendations; if the streaming pipeline is having an incident or the services are down for some reason, the app is still responsive and you're still getting recommendations. It's easy to experiment now, because since we moved from services to this approach, all a developer has to do is write a function for any content hypothesis that he or she wants, and you're ready to go.

What are some of the cons? Since we added the streaming pipelines into this ecosystem, the stack has become a little bit more complex. You need to know how the streaming ecosystem works, and you need an awareness of how to deal with any kind of issues across the services. There's more tuning that needs to be done in the system; when you process real-time events, you need to be aware of event spikes. I can tell you about an incident that happened where, for some reason, we didn't see any events coming from Google Pub/Sub; when that incident was resolved, we saw a huge spike of millions of events come in. We hadn't accounted for this, and that basically bombarded all the downstream services, and we had to reduce the capacity so that we consumed those messages slowly, so we had to manage that. It's very important to consider this case when you're building and using streaming pipelines, and it's very important to have guardrails so you protect your ecosystem.

Debugging is more complicated. If there is an incident on your side, you have to know whether it's the streaming pipeline, your service, the logic, or Bigtable having an issue. There are just more pieces to consider when you're debugging issues.

Lessons Learned and Takeaways

What did we learn through this evolution? From batch, we learned that because we store the recommendations in Cassandra, it's easy to fall back to old ones, and since we're just fetching from Cassandra, the latency to load home is really low. But the updates are slow: because of the way we ran our batch jobs, it takes longer to update recommendations. With services, the updates were fast; we were responding to users as quickly as they were interacting with the app. But as we added more and more recommendations, home got slow, and since we didn't store any recommendations, there was no fallback.

In the streaming and services combination, the updates were fast and frequent because we were listening to user events as they were happening. Since we stored the recommendations in Bigtable, we were able to load the home with low latency and we’re also able to fall back to old recommendations in case of incidents. One caveat is that we have to manage the balance between computing recommendations as quickly as we want, and the downstream load that we put on other systems that we depend on. Through the evolution of these three years, we managed to pick the best parts of each architecture and still be fast and relevant today.


What are some of the key takeaways? We've seen that since we moved to managed infrastructure, we could focus more on the product, iterate faster on our content ideas, and spend less time managing our infrastructure. If you care about timeliness, if you want to react to events that are happening in the moment, I would suggest that you use streaming pipelines, but please be aware of event spikes; they can really harm your ecosystem.

Through the evolution of the home architecture, we've optimized for developer productivity and ease of experimentation. We moved away from batch pipelines, and with services, it was just easy to write a service. Today, writing a shelf is as easy as writing a function, and the rest is already taken care of.

Questions & Answers

Participant 1: In the new architecture services and streams, do you have versioning in the Bigtable? How do you support the fall back to the older versions?

Muppalla: We make sure that we have at least one version for each shelf that we have. Bigtable has a garbage-collection policy that you can set based on either time or the number of versions it has to keep for each shelf. That way we make sure that there is at least one recommendation for each specific shelf for that user.

Participant 2: Very interesting talk, by the way. This is a question more related to how you categorize songs rather than recommendations. I'm wondering, with regard to radio edits, for example, it's often the case that the metadata cannot be used to distinguish a radio edit from the original version of the same song. Do you have any advanced ways of figuring out whether something is a radio edit when you categorize it, so that you wouldn't recommend a playlist to a user with the same song appearing three or four times as alternatives?

Samuels: I think I understand what you’re saying. We also have a concept of track IDs, but we also have recording IDs, that’s unique for each recording. That can help us make sure that we’re not recommending duplicate things if we are creating our playlist based off of those recording IDs instead. I’m not sure if that’s exactly what you mean.

Participant 2: Yes. I have a question about that: how do you guys do A/B testing in these architectures? Because A/B testing is quite important for a recommendation system. Could you briefly elaborate on how you do it?

Muppalla: We have an entire A/B testing ecosystem already in place, exposed to us as back-end services. When you write, say, a function here to test your hypothesis, every time that shelf is supposed to be curated, we make sure that this user is in that specific A/B test and we curate accordingly. When we serve the shelf, we are also sensitive to which experiment this user is in, so they see the right experiment.

Participant 3: I would like to know if the team setup changed with the architecture setup?

Samuels: Yes. It definitely changed since 2016.

Participant 3: No, I mean, when you changed from batch to microservices, did you need to change the team as well to be able to work with the new architecture?

Samuels: I don’t think that we had to change the team, the teams just change at Spotify over time just because we do a lot of reorgs there. We didn’t have to change the structure of our teams to be able to implement this. Our teams are usually set up where we have data engineers, ML engineers, back-end engineers all together working on them, so we have the full stack skillset.

Participant 4: The architecture looked like you guys are serving the recommendations real time. Can you talk about the training and if that’s instance-based as well, or if that’s batch, and how that works?

Samuels: The training for Word2Vec?

Participant 4: Yes, Word2Vec and how you do all that kind of stuff.

Samuels: Word2Vec gets trained, I believe, weekly across all of our systems. We use user vectors to figure out what to recommend for users, and those get updated daily, but the actual model itself gets trained weekly.

Participant 5: Do you guys use the lyrics of the songs or the tempo of the song to drive recommendations?

Samuels: No, not right now.

Participant 6: A related question. I think the [inaudible 00:27:32] challenge last year, or I think it was 2017, was Spotify's, and the one that won the prize was proposing a very different algorithm. In addition to Word2Vec, it had a whole bunch of things. I don't know if that's actually coming into production or not.

Samuels: I don’t know anything about that.


Sixth Annual MongoDB Innovation Award Winners Announced

MMS Founder

NEW YORK, June 18, 2019 /PRNewswire/ — MongoDB, Inc. (NASDAQ: MDB), the leading, modern, general purpose data platform, today announced the 11 winners of the 6th annual MongoDB Innovation Awards. The winners were honored at MongoDB World 2019, happening in New York, NY, June 17-19.


The MongoDB Innovation Awards celebrate organizations building the world’s most innovative applications and recognize companies with a transformative impact on their respective sectors. This year we received entries across dozens of industries, from cloud-native start-ups to Fortune Global 500 companies.

The overall “Innovator of the Year” was Marriott. Congratulations to all of our winners across the following categories:

MongoDB Innovation Award Winners

  • MongoDB Atlas: Compass
  • Enterprise: Marriott
  • Customer Experience: Royal Caribbean
  • Data-Driven Business: Continental
  • Internet of Things: Airobotics
  • Launch Fast: ANZ
  • Partner of the Year: IBM
  • Scale: Square Enix
  • The Savvy Startup: Corva
  • Certified Professional of the Year: Ronaldo Martinez
  • The William Zola Outstanding Contributor Award: Danielle Monteiro

“We have a great community of customers and partners from start-ups to some of the largest companies in the world who are using MongoDB to transform their businesses and industries,” said Dev Ittycheria, President and CEO of MongoDB. “Recognizing that data is the lifeblood of every organization, these award winners are using MongoDB to build products that make the world a better place for all of us — from improved travel experiences to better financial products to safer industrial facilities.”


About MongoDB

MongoDB is the leading modern, general purpose data platform, designed to unleash the power of software and data for developers and the applications they build. Headquartered in New York, MongoDB has more than 14,200 customers in over 100 countries. The MongoDB database platform has been downloaded over 70 million times and there have been more than one million MongoDB University registrations.

Investor Relations
Brian Denyeau
ICR for MongoDB

Media Relations
Mark Wheeler
866-237-8815 x7186

View original content to download multimedia: http://www.prnewswire.com/news-releases/sixth-annual-mongodb-innovation-award-winners-announced-300870615.html



MongoDB 4.2 Adds Distributed Transactions, Field Level Encryption, Updated Kubernetes Operator and More to the Leading, Modern, General Purpose Database

MMS Founder

NEW YORK, June 18, 2019 /PRNewswire/ — MongoDB, Inc. (NASDAQ: MDB), the leading, modern, general purpose data platform, today announced the latest version of its core database, MongoDB 4.2. Key features such as distributed transactions, field level encryption and an updated Kubernetes Operator reinforce MongoDB’s established reputation for supporting a wide variety of use cases for thousands of customers, which range from innovative cloud-native startups to the largest global enterprises.


Distributed transactions, which extend multi-document ACID guarantees from replica sets to sharded clusters, give customers an easier way to address a complete range of use cases by enforcing transactional guarantees across highly scaled, global applications. Field Level Encryption enables users to have encrypted fields on the server—stored in memory, in system logs, at rest and in backups—which are rendered as ciphertext, making them unreadable to any party who does not have client access or the keys necessary to decrypt the data. The Kubernetes control plane gives users full control over their MongoDB deployment for a consistent experience anywhere, including on-premises infrastructure, private and hybrid cloud, or public cloud.

“When we founded MongoDB, we wanted to give developers an easier way to work with data – wherever it lived in the stack,” said Eliot Horowitz, CTO and co-founder, MongoDB. “To be able to provide great new features that will make them more productive so they can spend less time wrestling with data and more time building great applications is extremely gratifying. Most importantly, these features work and feel like the tools they are already used to so they will experience a vastly improved database experience with a short learning curve.”

Distributed Transactions

MongoDB introduced multi-document ACID transactions in the release of MongoDB 4.0, providing a consistent view of data across replica sets and enforcing all-or-nothing execution to maintain data integrity. Combined with the power of the document model and its distributed systems architecture, developers can easily modernize existing legacy apps and build new transactional services. Distributed Transactions maintain an identical syntax to the transactions introduced in MongoDB 4.0. They are multi-statement and enforce snapshot isolation, making them familiar to any developer with prior transaction experience. The API and implementation are consistent whether executing transactions across documents, collections and databases in a replica set, or across a sharded cluster. Full atomicity is maintained – if a transaction fails to commit on one shard, it will abort on all participant shards.

The Next Level in Enterprise-Grade Security

MongoDB 4.2’s implementation of Field Level Encryption is a different and more comprehensive approach than column encryption used in legacy, relational databases. It is totally separated from the database, transparent to the server and handled exclusively within the MongoDB drivers on the client. Most databases handle encryption on the server-side, which means data is still accessible to administrators who have access to the database instance itself, even if they have no client access privileges. Field Level Encryption changes that.

Advantages of MongoDB Field Level Encryption include:

  • Automatic, transparent encryption: Application code can run unmodified for most database read and write operations. Other client-side approaches require developers to modify their query code to use the explicit encryption functions and methods in a language SDK.
  • Separation of duties: System administrators who traditionally have access to operating systems, the database server, logs, and backups cannot read encrypted data unless explicitly given client access along with the keys necessary to decrypt the data.
  • Regulatory Compliance: Facilitate compliance with “right to be forgotten” requests in privacy regulations such as GDPR – simply destroy the customer key and the associated personal data is rendered useless.

“We partnered with two of the world’s leading authorities on database cryptography, including a co-author of the IETF Network Working Group Draft on Authenticated AES encryption, to develop Field Level Encryption,” said Lena Smart, CISO, MongoDB. “Drawn from academia and industry, these teams have provided expert guidance on MongoDB’s Field Level Encryption design and reviewed the Field Level Encryption software implementation.”

Full control from a single Kubernetes plane

Users can now manage their MongoDB deployment from a single Kubernetes control plane. On self-managed infrastructure – whether on-premises or in the cloud – Kubernetes users can use the MongoDB Enterprise Operator for Kubernetes and MongoDB Ops Manager to automate and manage MongoDB clusters. Developers can use the operator with upstream Kubernetes, or with popular distributions such as Red Hat OpenShift and Pivotal Container Service (PKS).


About MongoDB

MongoDB is the leading modern, general purpose data platform, designed to unleash the power of software and data for developers and the applications they build. Headquartered in New York, MongoDB has more than 14,200 customers in over 100 countries. The MongoDB database platform has been downloaded over 70 million times and there have been more than one million MongoDB University registrations.

Forward-Looking Statements

This press release includes certain “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933, as amended, or the Securities Act, and Section 21E of the Securities Exchange Act of 1934, as amended, including statements concerning the anticipated benefits of new product features. These forward-looking statements include, but are not limited to, plans, objectives, expectations and intentions and other statements contained in this press release that are not historical facts and statements identified by words such as “anticipate,” “believe,” “continue,” “could,” “estimate,” “expect,” “intend,” “may,” “plan,” “project,” “will,” “would” or the negative or plural of these words or similar expressions or variations. These forward-looking statements reflect our current views about our plans, intentions, expectations, strategies and prospects, which are based on the information currently available to us and on assumptions we have made. Although we believe that our plans, intentions, expectations, strategies and prospects as reflected in or suggested by those forward-looking statements are reasonable, we can give no assurance that the plans, intentions, expectations or strategies will be attained or achieved. 
Furthermore, actual results may differ materially from those described in the forward-looking statements and are subject to a variety of assumptions, uncertainties, risks and factors that are beyond our control including, without limitation: our limited operating history; our history of losses; failure of our database platform to satisfy customer demands; the effects of increased competition; our investments in new products and our ability to introduce new features, services or enhancements; our ability to effectively expand our sales and marketing organization; our ability to continue to build and maintain credibility with the developer community; our ability to add new customers or increase sales to our existing customers; our ability to maintain, protect, enforce and enhance our intellectual property; the growth and expansion of the market for database products and our ability to penetrate that market; our ability to maintain the security of our software and adequately address privacy concerns; our ability to manage our growth effectively and successfully recruit and retain highly-qualified personnel; the price volatility of our common stock; and those risks detailed from time-to-time under the caption “Risk Factors” and elsewhere in our Securities and Exchange Commission filings and reports, including our Annual Report on Form 10-K filed on April 1, 2019 and our Quarterly Report on Form 10-Q filed on June 7, 2019, as well as future filings and reports by us. Except as required by law, we undertake no duty or obligation to update any forward-looking statements contained in this release as a result of new information, future events, changes in expectations or otherwise.

Investor Relations
Brian Denyeau
ICR for MongoDB

Media Relations
Mark Wheeler
866-237-8815 x7186

View original content: http://www.prnewswire.com/news-releases/mongodb-4-2-adds-distributed-transactions-field-level-encryption-updated-kubernetes-operator-and-more-to-the-leading-modern-general-purpose-database-300870262.html

SOURCE MongoDB, Inc.
