Ionic CLI V5 Now With Ionic React Beta Support

MMS Founder

The Ionic Framework team recently released the fifth major iteration of the Ionic CLI. Ionic CLI v5 features Ionic React support (beta), allowing developers to write Ionic applications with the React JavaScript framework and Ionic UI components. Ionic CLI v5 also comes with features aimed at improving the developer experience, along with miscellaneous bug fixes.

The Ionic CLI allows developers to quickly create projects that use Ionic React, as follows. First, the Command Line Interface (CLI) must be updated to its latest version:

npm install -g ionic@latest

Then the project can be created:

ionic start myApp blank --type=react
cd myApp

While the Ionic CLI v5 release is new, community articles have already been published illustrating how to build applications with Ionic React and the new Command Line Interface (CLI). One such article illustrates usage of the Ionic CLI in connection with Ionic 4 (note that the Ionic CLI follows a versioning scheme separate from that of the Ionic Framework), React, and the React Router. After installing the necessary dependencies:

npm i @ionic/core @ionic/react react-router react-router-dom

and assuming the pages src/pages/HomePage.js and src/pages/BlogPage.js have been defined, the example application can be written as follows:

import React, { Component } from 'react';
import { BrowserRouter as Router, Route } from 'react-router-dom';

import { IonApp, IonPage, IonRouterOutlet } from '@ionic/react';

import HomePage from './pages/HomePage';
import BlogPage from './pages/BlogPage';

import './App.css';

class App extends Component {
  render() {
    return (
      <Router>
        <div className="App">
          <IonApp>
            <IonPage id="main">
              <IonRouterOutlet>
                <Route exact path="/" component={HomePage} />
                <Route path="/blog" component={BlogPage} />
              </IonRouterOutlet>
            </IonPage>
          </IonApp>
        </div>
      </Router>
    );
  }
}

export default App;

Ionic CLI v5 also features the platform-independent native-run command to simplify deployment to simulators and real devices. native-run is written in JavaScript and works with both Cordova and Capacitor.

On the other hand, cordova-res, also added in Ionic CLI v5, is specific to the Apache Cordova mobile application development framework. cordova-res generates Cordova resources locally for the ionic cordova resources command. This improves the developer experience, as developers no longer need an Ionic account to generate splash screens and icons.

Ionic CLI v5 also comes with breaking changes documented in the release note, together with the upgrade path.

Ionic is a free, open-source component library for building applications that run on iOS, Android, Electron, and the web, using standard web technologies (HTML, CSS, JavaScript). Ionic additionally comes with a command line interface (CLI) facilitating the creation of new applications, as well as deployment to the various platforms it supports.

Ionic CLI is available under the MIT license. Contributions are welcome via the Ionic CLI contribution guidelines and contributors should follow the corresponding code of conduct.

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.

  • This field is for validation purposes and should be left unchanged.

Descriptive vs. Inferential Statistics in One Picture

MMS Founder

This simple picture shows the differences between descriptive statistics and inferential statistics.

More information about descriptive and inferential statistics:

  • What is “Descriptive Statistics”?
  • What is “Inferential Statistics”?
  • Central Tendency
  • Measures of Variation
  • Hypothesis Testing
  • Parameter Estimation


Presentation: MOOtopia – Evolving the Spotify Model at MOO

MMS Founder


Claire Donald is Director of Agile Delivery Coaching at MOO. She is a technical leader and coach with almost 20 years of experience managing and leading IT teams using a variety of different methods. She has worked in a multitude of different environments, in both the private and public sectors.

About the conference

Many thanks for attending Aginext 2019, it has been amazing! We are now processing all your feedback and preparing the 2020 edition of Aginext on 19-20 March 2020. We will have a new website in a few months, but for now we have made Blind Tickets available on Eventbrite, so you can book today for Aginext 2020.


A Strange Family of Statistical Distributions

MMS Founder

I introduce here a family of very peculiar statistical distributions governed by two parameters: p, a real number in [0, 1], and b, an integer > 1. These distributions were discovered by solving the functional equation 2f(x) = f(x/2) + f((1+x)/2), corresponding to b = 2.

Here f(x) is the density attached to that distribution. The support domain for x is also [0, 1]. This type of distribution appears in the following context. 

Let Z be an irrational number in [0, 1] (called the seed) and consider the sequence x(n) = {b^n Z}. Here the brackets represent the fractional part function. In particular, INT(b x(n)) is the n-th digit of Z in base b. The values x(n) are distributed in a certain way due to the ergodicity of the underlying process. The density associated with this distribution is the function f, and for the immense majority of seeds Z, that density is uniform on [0, 1]. Seeds producing the uniform density are sometimes called normal numbers; their digit distribution is also uniform.

However, the functional equation 2f(x) = f(x/2) + f((1+x)/2) may have plenty of other solutions. Such solutions are called non-standard solutions. The set of seeds producing non-standard solutions is known to have Lebesgue measure zero, but there are infinitely many such seeds. All rational seeds are among them, but they produce a discrete distribution; thus their density is of the discrete type. We are interested here in a non-discrete solution.

1. Example with p = 0.75 and b = 2

The uniform distribution corresponds to p = 0.5. Below is a non-standard density satisfying the requirements. Actually, the plot below represents its percentile distribution. It was produced with a seed Z in [0, 1] built as follows: the n-th binary digit of Z is 1 if Rand(n) < p, and 0 otherwise, using a pseudo-random number generator. Here p = 0.75. Note that P.25 = 0.5 and corresponds to a dip in the chart below (P.25 denotes the 25th percentile.) Dips are everywhere; only the big ones are visible. By contrast, the percentile distribution for the uniform (standard) case p = 0.5 is a straight line, with no dips.

2. General solution

The functional equation is a bit more complicated if b is not equal to 2. It becomes

b f(x) = f(x/b) + f((1+x)/b) + ... + f((b-1+x)/b).

Using the construction mechanism outlined in the previous section to generate a non-standard seed Z (sometimes called a non-normal number or bad seed), it is clear that x(n) is a random variable. We also have

x(n) = d(n+1)/b + d(n+2)/b^2 + d(n+3)/b^3 + ...

where b is the base and d(n+k) is the (n+k)-th digit of the seed Z in base b. This formula is very useful for computations. Note that Z = x(0). Furthermore, by construction, these digits are identically and independently distributed with a Bernoulli distribution of parameter p. Thus, using the convolution theorem, the characteristic function for the seed Z is the product, over k = 1, 2, 3, ..., of the factors (1 - p + p exp(it/b^k)).
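The formula x(n) = d(n+1)/b + d(n+2)/b^2 + ... is straightforward to implement. Here is a minimal JavaScript sketch (my own code, not from the article or its spreadsheet; the function names are mine): it builds the digits of a bad seed as Bernoulli(p) variables and computes x(n) by truncating the series once b^-k drops below double precision.

```javascript
// Generate the first nDigits digits of a "bad seed" Z: digit d(n) is 1
// with probability p and 0 otherwise (the construction from section 1).
function makeSeedDigits(nDigits, p, rand) {
  const d = [];
  for (let n = 1; n <= nDigits; n++) d.push(rand() < p ? 1 : 0);
  return d;
}

// x(n) = d(n+1)/b + d(n+2)/b^2 + ... truncated after 50 terms, which
// is past the resolution of a 64-bit float for b >= 2.
function x(n, digits, b) {
  let s = 0, pow = 1;
  for (let k = 1; k <= 50 && n + k <= digits.length; k++) {
    pow /= b;
    s += digits[n + k - 1] * pow;
  }
  return s;
}
```

For example, x(0, makeSeedDigits(1000, 0.75, Math.random), 2) reconstructs the seed Z itself, and x(n+1) equals the fractional part of b times x(n) by construction.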

Take the derivative of the inverse Fourier transform (see the inversion formula here) and you obtain the density f.

If p = 0.5 and b = 2, we are back to the uniform case. Otherwise the solution is quite special: the density f appears to be nowhere continuous. See the picture below for p = 0.55 and b = 2.

Now we should prove that this case is ergodic, for the functional equation to apply. I also tried to check, with some sampled values of x, whether 2f(x) = f(x/2) + f((1+x)/2) holds, but since the function is discontinuous everywhere, and since my approximation of its values is probably accurate to no more than two decimals, this is not easy.
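A numerical check of this kind can be sketched as follows (my own JavaScript illustration, not the author's Excel computation; the bin count, sample size, and seeded generator are arbitrary choices). It histograms x(1), ..., x(N) to estimate f, then measures how far 2f(x) is from f(x/2) + f((1+x)/2) on a coarse grid:

```javascript
const BINS = 64, N = 200000, p = 0.55, b = 2;

// Deterministic Park-Miller generator so the experiment is reproducible.
let state = 12345;
function rand() {
  state = (state * 16807) % 2147483647;
  return state / 2147483647;
}

// Digits of the bad seed: 1 with probability p, 0 otherwise.
const digits = [];
for (let i = 0; i < N + 60; i++) digits.push(rand() < p ? 1 : 0);

// x(n) via the digit formula, truncated after 50 terms.
function x(n) {
  let s = 0, pow = 1;
  for (let k = 1; k <= 50; k++) { pow /= b; s += digits[n + k - 1] * pow; }
  return s;
}

// Histogram estimate of the density f on [0, 1].
const hist = new Array(BINS).fill(0);
for (let n = 1; n <= N; n++) hist[Math.min(BINS - 1, Math.floor(x(n) * BINS))]++;
const f = hist.map(c => c * BINS / N);

// Evaluate the functional equation at bin midpoints. Agreement is only
// rough: f appears to be discontinuous everywhere, as noted above.
function fAt(t) { return f[Math.min(BINS - 1, Math.floor(t * BINS))]; }
let maxErr = 0;
for (let i = 0; i < BINS; i++) {
  const t = (i + 0.5) / BINS;
  maxErr = Math.max(maxErr, Math.abs(2 * fAt(t) - fAt(t / 2) - fAt((1 + t) / 2)));
}
console.log('max discrepancy on the grid:', maxErr);
```

Because the histogram only resolves f at the bin scale, the discrepancy shrinks slowly as BINS and N grow, which matches the two-decimal accuracy problem described above.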

3. Applications, properties and data

The distribution attached to this type of density has the following moments:

  • Expectation: p / (b – 1).
  • Variance: p(1 – p) / (b^2 – 1).
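Both moments are easy to confirm by simulation. Here is a short Monte Carlo sketch (my own code, with hypothetical helper names), sampling Z = d(1)/b + d(2)/b^2 + ... with i.i.d. Bernoulli(p) digits:

```javascript
// Draw one sample of Z by generating K Bernoulli(p) digits in base b.
function sampleZ(p, b, rand, K = 50) {
  let z = 0, pow = 1;
  for (let k = 1; k <= K; k++) {
    pow /= b;
    z += (rand() < p ? 1 : 0) * pow;
  }
  return z;
}

// Empirical mean and variance over N samples; these should approach
// p / (b - 1) and p(1 - p) / (b^2 - 1) respectively.
function moments(p, b, N, rand) {
  let sum = 0, sumSq = 0;
  for (let i = 0; i < N; i++) {
    const z = sampleZ(p, b, rand);
    sum += z;
    sumSq += z * z;
  }
  const mean = sum / N;
  return { mean, variance: sumSq / N - mean * mean };
}

// Example: moments(0.75, 2, 100000, Math.random) should give a mean
// near 0.75 and a variance near 0.0625.
```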

Why must f(x) satisfy the functional equation discussed above? This is a consequence of the fact that the underlying distribution is the equilibrium distribution of the sequence x(n) = {b x(n-1)} = {b^n Z}. In particular, the equilibrium distribution is a solution to the stochastic integral equation P(X < x) = P({b X} < x). For details, see my book Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems, available here, pages 65-66.

Potential applications are found in cryptography, Fintech (stock market modeling), Bitcoin, number theory, random number generation, benchmarking statistical tests (see here) and even gaming (see here.) However, the most interesting application is probably to gain insight into what non-normal numbers look like, especially their chaotic nature. It is a fundamental tool to help solve one of the most intriguing mathematical conjectures of all time (yet unsolved): are the digits of standard constants such as Pi or SQRT(2) uniformly distributed or not?

The charts featured here, as well as the underlying computations, were all produced in Excel. You can download the spreadsheet here. In particular, a very efficient algorithm is used to produce (say) one million digits of Z, and to compute one million successive values of x(n), each with a precision of 14 decimals. You can play interactively with the parameters b and p in the spreadsheet, and even try non-integer values of b (I suggest you try b = 1.5 and p = 0.5). If b < 2 is not an integer, the functional equation is more complicated: it is found in section 2.1 of this article.


Presentation: Evoking Magic Realism with Augmented Reality Technology

MMS Founder


Hu: The talk today will be about evoking magic realism with augmented reality. Just to start with a quick intro: right now, I’m the Director of Engineering and Head of the AR platform at Niantic. Prior to that, I was a co-founder and CTO of Escher Reality, which was a startup in a Y Combinator batch from 2017. Our company got acquired by Niantic, so now we’re there. I’ve been working in AR for roughly three or four years. Prior to that, I was building large-scale machine learning and computer vision systems for different products since 2012, and I started as a data scientist back at Intel Labs and at OnCue Television, with experience doing recommender systems, information retrieval, and image understanding for television.

What is here is a list of book covers by different authors. On the left is a book by Gabriel Garcia Marquez, who is a Nobel laureate in literature. He’s from Colombia; this book is about the town of Macondo and tells a lot of the history of Colombia. It tells the day-to-day life of a family and talks about a journey of solitude. The interesting thing about this book is that it tells a lot of the story with a journalistic and mundane approach, but there are a lot of magic things that happen that seem normal to the characters. For example, there are ghosts and flying carpets that age, and that just seems normal to the characters there.

Another book which you might be familiar with is “The Metamorphosis” by Franz Kafka. It’s the story of a young workaholic man who suddenly one day transforms into an insect and then goes through this whole philosophical meditation on his perception of reality and society. The interesting thing about this book, too, is the way the story is told: this transformation into an insect is pretty weird, but the story makes it sound normal, just as another thing that happened. You wake up and you suddenly become an insect, and that’s it.

This other book is by Haruki Murakami, a Japanese author, with “The Wind-Up Bird Chronicle.” It’s the story of Toru Okada, a kind of detective story about finding his wife’s missing cat. That sounds like a very normal story as well, but what’s interesting is that as the story unravels, he starts finding a lot of fantastical things that happen underneath Tokyo. Again, it’s told in a very matter-of-fact storytelling approach, which makes it seem that all these occurrences could happen in our life, even though they seem outlandish.

This last book, which also became a movie, “Like Water for Chocolate” by Laura Esquivel, is the story of a woman who is forbidden to marry and who pours magic into making chocolate. Then again, the drama in the story is very commonplace.

What do all these books and authors have in common? All of them follow the tradition of a literary movement called magic realism, which is not science fiction or fantasy. With fantasy, the main factor is that it takes place in a world entirely different from our own, like “Game of Thrones,” or “Harry Potter,” or “Star Wars”; it’s a completely different world that doesn’t seem like it could exist here. Science fiction is also not magic realism, because it describes an altered world where science makes advances that are crazy.

The interesting thing about magic realism is that it’s fiction that takes place in our world. The stories could happen to your neighbor, or they could be stories your grandma tells you. It has an interesting introduction of a magical element, and the magical element often serves to elevate and tell the story, or as criticism of something that’s happening in society, as in Kafka’s “Metamorphosis” with its criticism of society and how isolation works.

A Magic Insight into Reality

How does this relate? Describing this magic insight into reality a bit more, it’s this concept of telling the story always with a deadpan expression, like journalism: “Yes, there was a flying carpet, and so what.” It is this way of telling a story matter-of-factly, including a lot of fantastical elements in the day-to-day world, but note that imagination is used to enrich reality, not to escape it like sci-fi or fantasy. It’s still grounded in reality, because if it were just magic, it would be pure whimsy. Rooting it in reality builds this layer that illuminates and grows and makes reality as we experience it more beautiful in unexpected ways.

The main thing about the way a lot of these stories and books are told is that they capture something in the real world that is not possible, but make it so believable that you do believe that magic carpets can fly or ghosts can age, and they just seem commonplace because of the way they’re told.

How does this relate to augmented reality? My definition of augmented reality is this concept of a matter-of-fact inclusion of fantastical elements into the physical world. The analogy to magic would be the digital: digital characters that get displayed and seamlessly blend into the physical world while they still follow a lot of the laws of physics and human perception of how we understand things to behave, but with a sprinkle of a fantastical aspect, because with digital you can represent information in a lot more interesting ways.

Augmenting Reality

How do you build such a reality? What are some of those components? At Niantic, we have some principles for creating magical realism in AR, and these follow a lot of the company missions that we have. One is exploration, of more of the world around us. What that means is that there are a lot of stories and adventures everywhere that are just waiting to be discovered. Some of our games, like Ingress or Pokémon Go, take you on an adventure in your neighborhood where you get to find out things that you didn’t know about some historical landmark, and that’s something that we aimed to create. That’s a different kind of AR than the conception of it being just the digital visual, but that’s another aspect.

Exercise is an aspect where we all need a bit of a nudge to move, and being embedded in it, following the physics of it and the natural rhythms of your body, adds to the suspension of disbelief for AR. The other aspect is social: we are social animals by default. We create experiences where you’re able to engage in the real world with your friends, and ways that you can make new friends, not just friends in a social network sense, but someone real, in the sense that you build a connection with them.

Throughout a lot of our journey at Niantic, we’ve been delighted to hear from people around the world who found not only rewarding, family-friendly, and intergenerational entertainment, but also unexpected benefits from our games. Diving deeper into each of these three elements and how we built them into the games for AR, start with this concept of exploration. In the world of AR, we want it to be matter-of-fact and follow reality. Part of reality is the diversity in the world; there are things like weather, it rains, it’s sunny, or it snows, etc., so we want those to be reflected in the game. In terms of AR, there’s a feature in Pokémon Go: when it rains, you see it displayed right on the digital screen of your phone.

The other aspect of building virtual worlds is that we want to push us to exercise and move. Capitalizing a bit on our natural rhythm of how we move is this concept that the world we live in is the game board. You walk through different streets, so as you take a stroll through San Francisco, you’re actually moving your digital avatar in the game with you. That’s another connection where we’re blending the digital with the physical, taking things that we take for granted and making them work also in the digital.

This other one, about shared and social, is this concept that the digital world should obey similar rules as the real world to maintain the suspension of disbelief. What I’m going to show you in the next slide is a demo of an experience that we built in AR called Codename Neon. It’s a technology demo that shows a group of people playing together, collecting pellets in the world to collect energy orbs, and then having a game of tag and shooting energy balls at each other. The thing that’s interesting, to maintain physics and maintain the state, is that if I collect an energy orb that’s here, then my friends should not see it anymore, because it’s a shared resource. If I shoot this energy orb, everyone should see it, because reality as we experience it is consistent in time and space.

Just to play the demo a little bit here, you can see that when you get a bit of the experience, people just have fun. We’re consistent with collecting those white pellets and then going and targeting your friends. If you’re mad at them, maybe you can have a friendly fight in AR; it’s less physical, more digital.

Making the Digital Believable

What does it take to make the digital believable, taking these lessons from the literature masters of magic realism, who are able to weave stories so believable that when you read them, you’re absorbed? What does it take to do that now for AR?

As part of that, at Niantic we’ve built this real-world platform, a set of software that all these different games are built on top of, where we enable consistent gameplay, social, mapping, and advanced AR. A lot of this underneath is powering our games like Ingress, Pokémon Go, and the soon-to-launch Harry Potter: Wizards Unite, which will be exciting sometime later this year. This talk will mostly focus on the AR component, starting with a brief on AR technology and the big building blocks it takes to create AR, because AR is actually a very interdisciplinary field spanning many areas of computer science.

In order to make good AR, first you need to understand the world in order to augment it. What that means is that you have to feed on all the different sensors, mainly, let’s say, the camera. You need to start making sense of what it means, the semantics, in terms of labeling what it is that you see. That’s the more visual part; I’ll talk a bit later about understanding the world in terms of geometry. This whole area of understanding the world is the field of computer vision.

The other one is the need for visuals: in order to display these believable digital objects and characters, they need to blend with the real world. You need to create characters in 3D that make sense; there’s a whole world of graphics, 3D animation, and all of that. The other bucket is, you have all these components, “OK, I understand the world. Now I can render some characters,” and then how do you create? The reason why augmented reality is such a good fit for gaming is that the way game developers have been creating for a long time has been the natural way of creating worlds and experiences in 3D. These are tools like Unity and Unreal, and a couple of other ones, for building experiences.

Diving deeper a little bit into this concept of understanding the world, because we’re going to go into a bit more detail on it: besides the semantics, which I think everyone gets (I see something, the sky is blue, it’s a block of blue, it’s the sky), this concept is a bit more abstract. Think of understanding the world at a lower level, in terms of geometry. It’s understanding that this blob-shaped thing in the world means there’s something here I should not collide with, something here that’s like a plane, something there that is like a blob but is really a chair. You don’t know whether it’s a chair, but it’s something that you should not collide with.

A lot of this is taken from the field of robotics, with algorithms like SLAM and VINS, where you take different camera positions so that you’re able to triangulate and build a 3D understanding through stereo through time, where you do feature extraction and a couple of other steps to build this representation of the world that’s used for AR. Why is it needed for AR? Because in order to display the digital characters, you need to pin them to the world. How do you pin them to the world? You have to know roughly your coordinate systems, and SLAM is the ability to build that.

AR Systems for Human Perception

We talked about some of the components that make up AR. There’s another component at the end: who’s consuming all this data? Humans. You have to build AR systems for human response, and this is where a lot of the algorithms I mentioned earlier, while similar to those in self-driving cars, diverge a little bit, because AR systems are meant to be very interactive for humans. A couple of concepts to keep in mind: there’s this famous study from the ’80s on the Miller response-time test, about what response from a computer system feels good to humans. This is tied a lot to the brain and neurological signals, and how fast a signal comes from the world and you’re able to interpret it. In summary, what that whole study says is that anything less than about 100 milliseconds feels instantaneous, or real time, or just feels good.

We want to be in that bucket. Anything that’s around a second is fast enough to get a response, but it’s not instant. If it’s beyond 10 seconds, like when you look at a loading bar, you lose the users; the page is gone, it’s like, “OK, this is not working.” Based on that, you want to design systems with a budget of less than 100 milliseconds. That’s key in order to maintain the suspension of disbelief, or making things matter-of-fact: what just works in the physical world needs to work with our senses.

The other part, on the right side, since we’re focusing a bit more on visual SLAM, is understanding how long it takes the brain to process images. There’s this study by Rayner, from psychology, about eye movements and visual encoding during scene perception, which tells us that the retina needs to see an image for about 80 milliseconds before that image is fully registered and understood. Another of their numbers: when you’re reading text, that figure is about 50 to 60 milliseconds, because there’s less information entropy in a sense; a random image at a time is a lot more data bandwidth than reading text, which is more conceptual. That gives a bit of an understanding of what we’re going to target.

Still, taking even less than 100 milliseconds, less than 80 milliseconds, there’s this other constraint as well: a lot of rendering in video games is at 30 to 60 frames per second, which gets you to a budget of roughly 17 to 33 milliseconds per frame.
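The frame-budget arithmetic is simple; as a sanity check (my own illustration, not from the talk):

```javascript
// Per-frame latency budget in milliseconds at a given frame rate.
function frameBudgetMs(fps) {
  return 1000 / fps;
}

// 60 fps leaves about 16.7 ms per frame; 30 fps leaves about 33.3 ms.
// Both are comfortably under the ~100 ms "feels instantaneous" threshold
// and under the ~80 ms scene-registration figure cited above.
```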

Need for Speed as A “Matter-of-Fact”

That brings the big assumption and design constraint for building augmented reality systems: the need for speed is what will create the matter-of-fact quality for augmented reality, where we can create storytelling like the authors I mentioned. To build believable AR, we need to build AR systems that are really fast, with the other constraint that they need to work on phones, which have tiny computation budgets, not much battery, cameras that are not so great, and a lot of sensors that are cheaper than in, say, other robots or self-driving car systems, where you could afford to put a GPU in the trunk of your car. You don’t want that here, because in the future when we move to headsets, you don’t want to burn people’s hair. So it has to be really optimized for the world of low computation, low power, and very fast response.

How do we do that? The approach we’ve taken in our company rests on two tenets in terms of AR system design principles. One is super-efficient networking: certain things can be offloaded to the server, but some cannot, and if you do, even for shared multiplayer AR, you want it to be real time as much as possible by default to achieve that small response time. The other aspect is concurrent programming. I used to work at Intel, so I can say that Moore’s law is kind of done; the world is moving more towards many cores rather than a single big core that keeps getting faster, so we have to get comfortable taking advantage of this world where there are more cores to do processing rather than big fat cores.

An example is the processor for the iPhone that got released, or even some Samsung devices: a design of four big cores and four small cores. The four small cores are for fast, simple computation; the four big fat cores are for more expensive ones, so you have to be smarter about how you use them, and moving forward, this is just going to increase. It’s looking a little bit like how GPUs are at the extreme end, where you have thousands of cores, which is why you can do a lot of amazing things with deep learning, in large part because we are getting better at parallelizing a lot of the computation there.

Some of the approaches that I’ll describe for concurrent programming have to do with lottery programming and a concept I’m not sure people here are familiar with, actor models, and we’ll go into detail about those. First, networking. Speeding up networking: life is in real time. You don’t have loading bars when you’re talking to your friend, and you want the AR experience, with characters that interact with you, to also not have loading bars; they should feel natural.

One of the constraints of current traditional cloud architecture for services is that a lot of cloud-based applications have their machines hosted somewhere far away; if you’re on Amazon, typically Virginia or maybe Oregon. The round-trip latency when you’re pinging from somewhere around the world would be in the hundreds of milliseconds. That alone already does not make the cut for the less-than-100-millisecond human perception number I was mentioning, if you believe the number. If you take hundreds of milliseconds and you’re trying to render the AR position of your friend, you’re going to render at roughly single-digit frames per second, which is pretty bad. It’s like watching a really, really, really bad video that does not load from the internet; we don’t want that.

How do we achieve something that can run at 30 frames per second? I’m going to show you first, before I tell you how it works, that we actually got it working. What you’re going to see here is actual game footage of a multiplayer AR puzzle-solving game where actual players are cloaked in an avatar. Look, they’re cloaked in an avatar as if they were rigged to a pre-planned path, but it’s really just following the current position of the phone. We’re not doing anything special to track the humans or anything. It’s really just following the position of the phone, and because it is doing it with such low latency, these avatars cloak the user quite well, and it creates this amazing effect that the game design team came up with, which just looks so fun.

This is rendering at 60 frames per second, because on the iPhone we got this working at 60 frames per second. How do we do that? In terms of your design choices for networking, there are two axes. You could do, quote, unquote, real-time networking or non-real-time. The other one is sending reliable messages over your network stack, or unreliable messages. If you choose the category of reliable messages, that’s the world of the web, for which the whole world has made a lot of advances and built very fantastic tools, with HTTP and REST, because we want documents and transactions to not be lossy.

WebSockets is an attempt to get a bit more real time, but it’s still not good enough. It’s an attempt of: we got so good at doing HTTP that we want to try to continue doing that for the new world. But for AR that’s not enough, because the packets there, with the HTTP header, are too heavy, and maybe you could do something better.

In the world of unreliable messages, in real time, you do have UDP, which is actually old technology from the UNIX socket world. It’s not that new, but it actually is pretty fast, because it gets rid of a lot of the assumptions of TCP, with the handshake and the coordination that comes with it. In the other quadrant, unreliable and non-real-time, you don’t want to be; it’s like the U.S. post office. It has its uses, but not for the system we’re designing. I was trying to figure out what to put there, but all I could think of was the U.S. post office.

How do we design something that lands in the corner we want to be in? The fact is that there’s no magic solution, no new network protocol or something magical out there; it’s really about carefully building a combination of both. What we’ve done is build real-time peer-to-peer AR technology with our own network protocols. Think of it like WebSockets, but better: not so heavy, with packets optimized for computer vision. In the cloud world, if you were trying to do some of this, the phone sends its current position and it goes to a cell tower, the cell tower goes to the cloud, the cloud goes back to the cell tower, and then to the phone, and that whole round trip is in the hundreds of milliseconds. By that time, you won’t see your friend cloaked in the avatar; it will be a bit jarring because it will be off. It will show the previous position, not where they are now.
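The talk doesn’t reveal the actual wire format, but the core idea, a tiny fixed-size datagram instead of an HTTP request with heavy headers, can be sketched. The field layout below is my own illustration, not Niantic’s protocol: a pose update (timestamp, 3D position, orientation quaternion) fits in 36 bytes, versus hundreds of bytes of HTTP headers.

```javascript
// Encode a pose update into a fixed 36-byte payload suitable for a UDP
// datagram: 8-byte timestamp, 3 x 4-byte position, 4 x 4-byte quaternion.
function encodePose(pose) {
  const buf = new ArrayBuffer(36);
  const view = new DataView(buf);
  view.setFloat64(0, pose.timestampMs);
  view.setFloat32(8, pose.position[0]);
  view.setFloat32(12, pose.position[1]);
  view.setFloat32(16, pose.position[2]);
  for (let i = 0; i < 4; i++) view.setFloat32(20 + 4 * i, pose.rotation[i]);
  return buf;
}

// Decode the payload back into a pose object on the receiving side.
function decodePose(buf) {
  const view = new DataView(buf);
  return {
    timestampMs: view.getFloat64(0),
    position: [view.getFloat32(8), view.getFloat32(12), view.getFloat32(16)],
    rotation: [0, 1, 2, 3].map(i => view.getFloat32(20 + 4 * i)),
  };
}
```

Sent over an unreliable transport like UDP, a stale packet can simply be dropped in favor of the next one, which is exactly the behavior you want for continuously updated positions.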

What we've done is cut out this whole round trip to the cloud and just talk directly through the cell towers. This is an interesting approach because now, in the world of 5G, the data bandwidth is getting even faster. There's another law here; there are all these laws for computing. Edholm's law says that wireless communication will at some point be as fast as wireline. Within the laws of physics and data transmission, we can get there.

There's this other concept of edge computing, which is a push in the industry to put computation at the cell tower. That would be very interesting for AR for all the reasons I mentioned: you could start aggregating some of the computations and do them at the cell tower. Right now, we just do everything on the phones and burn a little more of your phone's battery, but later we could do less of that. That's an interesting direction the industry is moving in, and that's where we're betting. Now you've cut down from hundreds of milliseconds to tens of milliseconds, and you hit your magical budget for human response time.
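The round-trip argument above can be made concrete with a back-of-envelope calculation. The per-hop numbers below are purely illustrative (the talk only says "hundreds" versus "tens" of milliseconds), as is the 100 ms "feels instantaneous" budget:

```python
# Back-of-envelope latency budget for shared AR. All per-hop figures are
# hypothetical; the point is how skipping the cloud hop changes the total.

def round_trip_ms(hops_ms):
    """Total there-and-back latency over a list of per-hop latencies (ms)."""
    return 2 * sum(hops_ms)

# Cloud path: phone -> cell tower -> cloud, and back again.
cloud_path = [15, 40]      # hypothetical per-hop latencies in ms
# Edge path: phone -> cell tower only; computation happens at the edge.
edge_path = [15]

HUMAN_BUDGET_MS = 100      # rough threshold for an imperceptible delay

print(round_trip_ms(cloud_path))  # 110 ms: over budget
print(round_trip_ms(edge_path))   # 30 ms: comfortably within budget
```

With the cloud hop removed, the same calculation lands an order of magnitude closer to the human-perception budget, which is the whole argument for computing at the tower.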

Speeding up Computation

The other design consideration is this concept of speeding up computation, in a world where we're moving to many cores rather than bigger, fatter cores. How do we do this? Computer vision is hard; there's the part where we're trying to advance the field, and there's the engineering part, and by marrying the two together we can achieve very interesting results.

We're going to stay very high level; I'm ignoring a lot of boxes here. At a high level, a traditional augmented-reality SLAM pipeline has these four stages. You have the raw sensor inputs, which come from your pixels and from your IMU, which is a gyro and accelerometer. Those go through a feature-extraction box that turns the super-high-density data, not all of which is useful, into something more useful for localizing and mapping. Localization tells you where you are, and at the same time you're building this AR map as you go, to bootstrap the problem.

Let me explain the inputs a little more, so you understand why we don't work with the raw images. For the cameras, if you're working with 1080p, imagine you're getting a matrix of 1080 by 720 times 3, because of RGB. Uncompressed, that's a lot: on the order of 10^6 values, at 30 to 60 hertz. It's very hard for any system, even your wimpy phone, to process all of that and work with it all the time.

Then you have this other high-rate data from a sensor called the inertial measurement unit (IMU), which is basically a gyro and an accelerometer that give you rotation and acceleration along x, y, and z; that's also used for telling you where you are in the world. It's lossier. As I was telling Roland, you could technically know where you are in the world based on this alone, if you take the integral of the acceleration a couple of times, but a lot of error accumulates because these are super cheap sensors.

As I was telling Roland, Apollo first went to the moon with just that; there were no cameras, but a super expensive IMU, tens of thousands of dollars, that could calculate all these numbers, because the math actually works out correctly. Of course, we're not going to put a $50,000 sensor in a phone, but for a self-driving car you could afford more, because if we get the position a bit wrong for AR, the consequences are not as terrible. Nobody has really died from AR getting it wrong, but a self-driving car is dangerous, so you do want to put more expensive sensors in cars.
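The double-integration idea above, and why cheap sensors make it drift, can be sketched in a few lines. The noise level and rates below are made up for illustration, not real MEMS specs:

```python
import random

# Dead-reckoning sketch: double-integrate a noisy accelerometer reading to
# estimate position. The true acceleration here is zero (the device is
# stationary), but cheap sensors add noise, and the position error compounds:
# each noisy sample perturbs the velocity, and the velocity error accumulates
# into position on every subsequent step.

def dead_reckon(noise_std, dt, steps, seed=0):
    rng = random.Random(seed)
    velocity, position = 0.0, 0.0
    for _ in range(steps):
        accel = rng.gauss(0.0, noise_std)  # measured accel = truth (0) + noise
        velocity += accel * dt             # first integral: velocity
        position += velocity * dt          # second integral: position
    return position

# One second vs one minute of integration at a hypothetical 100 Hz:
drift_1s = abs(dead_reckon(noise_std=0.05, dt=0.01, steps=100))
drift_60s = abs(dead_reckon(noise_std=0.05, dt=0.01, steps=6000))
print(drift_1s, drift_60s)  # the longer run typically drifts far more
```

A perfect (noise-free) sensor would report zero drift forever; an expensive Apollo-grade IMU approximates that, while phone sensors do not, which is why SLAM fuses the IMU with camera data instead of trusting it alone.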

Those are the inputs. The takeaway is that we don't work with the raw data; instead, we do feature extraction. Feature extraction takes this raw camera matrix, 1080 by 720 with 3 channels (here just one channel, in grayscale), and extracts the interesting features based on the texture of the scene. There are a bunch of algorithms here, and this is one of the hard parts of getting it working reliably. As I was mentioning to someone, this whole problem in SLAM is a bit of an open-loop problem: how do you know these are the right features, that they work, and that if the lighting changes the extractor still works? That's a whole field of computer vision right there.

You start with this super-high-density matrix, and at the end what you get is just a vector whose size depends on your feature configuration. Here we're pretending the vector has size four; it's actually a bit longer, maybe more like 100, but that's a lot less than 10^6. That's what you work with in the next couple of stages. Those points that I showed become these abstract points, and then, as you build the AR map, there's this concept of building stereo through time: you build correlations as you move through the video, across all the frames, find the points you see across frames, and use that to build this AR map.
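The dense-matrix-to-small-feature-set compression described above can be sketched with a deliberately toy extractor. Real extractors (FAST, ORB, and friends) are far more robust; this sketch just scores pixels by a simple gradient magnitude and keeps the strongest ones:

```python
# Toy feature extraction: from a dense grayscale "image" (a 2-D list of
# intensities), keep only the k most textured pixels, scored by squared
# gradient magnitude. Illustrative only; not a production keypoint detector.

def extract_features(image, k):
    h, w = len(image), len(image[0])
    scored = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]   # horizontal gradient
            gy = image[y + 1][x] - image[y - 1][x]   # vertical gradient
            scored.append((gx * gx + gy * gy, (x, y)))
    scored.sort(reverse=True)                        # strongest texture first
    return [pt for _, pt in scored[:k]]

# A mostly flat image with one bright square: features land on its corners,
# where texture (gradient in both directions) is strongest.
img = [[0] * 8 for _ in range(8)]
for y in range(3, 6):
    for x in range(3, 6):
        img[y][x] = 255

print(extract_features(img, 4))
```

The input is 64 pixels; the output is 4 coordinate pairs. Scaled to a 1080-by-720 frame, that is the same idea: a few hundred features instead of ~10^6 raw values, which is what makes the later stages tractable on a phone.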

This is what one of these maps looks like for a self-driving car. For AR they're not as dense, but roughly they look like this, and this is what you use to tell where your phone is in the real world, so that at the end your characters can render properly. That's how it works.

It’ll be Hard to Run in Real-Time

A lot of these algorithms are super expensive to compute. This is from a paper that tried to run SLAM on embedded systems, which are a bit of a proxy for phones. When you look at the numbers, they're all roughly in the seconds, which is not good. This is the academic implementation, and it's typical of how these are built: in your feature-extraction pipeline there's a lot of locking, because you wait until a stage is done and then send its output to the other parts, and then wait and lock, and wait and lock; a lot of busy-loop spinning.

Is there something better you could do? Our answer is yes. We've taken traditional computer vision algorithms that, in a straightforward implementation, run at, let's say, single-digit frames per second, and by doing what I'm about to show you, we were able to achieve 60 frames per second. We did not change anything in the algorithms; this is basically just a new programming paradigm. It's a framework for concurrent programming called the actor model. At a high level, actors are primitive units of computation that are completely isolated and do some computation over their internal state. They don't block anyone, and they have their own chunk of memory.

Messages are sent when actors finish computing or need data from others; the keyword here is asynchronous. You're not blocking to wait for the others, because if you actually start benchmarking a lot of these SLAM systems, half the time you're just busy-looping and waiting, and the context switches that involves are expensive. An actor system gets rid of that because the actors are all completely isolated, independent computation blocks.

What does this SLAM system look like now? Instead of everything being locked to everything else, you have message queues: each component runs at its own pace, and when it's done, the others get the data as messages and do their computation. What this means is that your feature extraction could maybe take a bit longer, but it can consume everything it needs to; it doesn't have to wait until the localization or mapping loop is done.
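The queue-decoupled pipeline described above can be sketched with plain threads and queues. The stage names and transforms are illustrative stand-ins for the real SLAM stages, not Niantic's implementation:

```python
import queue
import threading

# Minimal actor-style pipeline: each stage is an isolated worker with its own
# inbox, communicating only via asynchronous messages. Stages share no state
# and never busy-wait on each other; a slow stage just lets messages queue up.

def actor(inbox, outbox, transform):
    while True:
        msg = inbox.get()           # blocks only this actor, no busy loop
        if msg is None:             # poison pill: shut down and propagate
            if outbox is not None:
                outbox.put(None)
            return
        if outbox is not None:
            outbox.put(transform(msg))

# Queues loosely named after the SLAM stages they would connect.
frames, features, poses = queue.Queue(), queue.Queue(), queue.Queue()

stages = [
    threading.Thread(target=actor, args=(frames, features, lambda f: f * 2)),
    threading.Thread(target=actor, args=(features, poses, lambda f: f + 1)),
]
for t in stages:
    t.start()

for frame in range(5):              # the "sensor" emits raw frames
    frames.put(frame)
frames.put(None)

results = []
while (msg := poses.get()) is not None:
    results.append(msg)
for t in stages:
    t.join()

print(results)  # [1, 3, 5, 7, 9]
```

Each stage only ever blocks on its own inbox, so a faster producer keeps feeding the queue while a slower consumer catches up, which is exactly the decoupling the talk describes.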


As a summary, we talked about magic realism in AR and some of the things we can do to achieve the suspension of disbelief, like authors do in literature: how do you make the digital believable, how do we build AR systems for human perception, and why, at least for AR, it is so important to build things for speed and optimize for it. Last thing, just credits: thanks to Peter and my team, who came up with a lot of these ideas.

Questions and Answers

Participant 1: Thanks for the talk. One of the questions I had was: have you considered deep learning end-to-end approaches that go from an image to 3D reconstruction?

Hu: Yes, those are some of the things we're experimenting with. The main challenge with deep learning is that it takes a lot of computation cycles, and it's not going to fully run in real time on your phone while you still need to render and run the game.

Participant 2: I'd love to hear more about how you evaluate whether these improvements in latency and speed translate into the user experience. Do you test them? How do you measure that someone's having a good experience with an AR app? It seems easy to benchmark these more quantitative metrics, but how do you put that in the user's terms?

Hu: In the end, the reason we designed it the way we did is this assumption about human perception. Some of the things you saw would just not be possible to build at all: if your latency and response time leave you at single-digit frames per second, you couldn't build any of those experiences. That's one part; it's a binary switch, whether it's possible or not.

The other question is what you get from incremental improvements. At some point it's good enough; the thing that gets better for a lot of our users is definitely battery consumption. If you get more efficient, your battery lasts longer, which is very important. And assuming we move into the world of AR headsets, which we're big believers in, those are even more battery-hungry because of the optics and all the photons you need to shoot in order to render stuff.



Julia vs R: Freeing the data scientist mind from the curse of vectoRization

MMS Founder

Nowadays, most data scientists use either Python or R as their main programming language. That was also my case until I met Julia earlier this year. Julia promises performance comparable to statically typed compiled languages (like C) while keeping the rapid development features of interpreted languages (like Python, R or Matlab). This performance is achieved by just-in-time (JIT) compilation. Instead of interpreting code, Julia compiles code at runtime. While JIT compilation has been around for some time now (e.g., Matlab introduced it in 2002), Julia was designed for performance with JIT compilation in mind. Type stability and multiple dispatch are key design concepts in Julia that set it apart from the competition. There is a very nice notebook by the Data Science Initiative at the University of California that explains these concepts if you want to learn more.

Somewhere in time, we started using interpreted languages for handling large datasets (I guess datasets grew bigger and bigger and we kept using the same tools). We learned that, for the sake of performance, we want to avoid loops and recursion. Instead, we want to use vectorized operations or specialized implementations that take data structures (e.g. arrays, dataframes) as input and handle them in a single call. We do this because in interpreted languages we pay an overhead for each time we execute an instruction. While I was happy coding in R, it involved having a set of strategies for avoiding loops and recursion and many times the effort was being directed to “how do I avoid the pitfalls of an interpreted language?”. I got to a point where I was coding C functions to tackle bottlenecks on my R scripts and, while performance clearly improved, the advantages of using R were getting lost in the way. That was when I started looking for alternatives and I found Julia.

Next, I will try to show you how Julia brings a new programming mindset to Data Scientists that is much less constrained by the language. 


Let us consider the problem of calculating the distances among all pairs of elements in a vector with 10,000 elements. A solution for this problem requires ~50M to 100M distance calculations (depending on the implementation). The following approaches were implemented and benchmarked:

  • R dist: R’s built-in function for solving our problem (baseline)
  • R loop: nested loops approach in R (14 lines of code)
  • R rep: loops replaced by R’s rep function plus vectorized operations (6 lines of code)
  • R outer: using R’s outer product function plus vectorized operations (4 lines of code)
  • Rcpp loop: C++ pre-compiled implementation based on nested loops integrated with R via the Rcpp package (17 lines of code)
  • Julia loop: nested loop approach in Julia (15 lines of code)
  • Julia comp: loops represented as arrays comprehensions in Julia (2 lines of code)
  • Julia outer: direct translation of R outer approach to Julia (4 lines of code)
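The article's code is in R and Julia; to make the benchmarked computation itself concrete, here is a Python analogue of the loop-with-preallocation approach alongside a comprehension version (mirroring the two-line Julia comprehension):

```python
# All pairwise absolute distances between elements of a vector, with the
# output "preallocated" up front, analogous to the loop-based approaches
# benchmarked in the article. Illustrative translation, not the original code.

def pairwise_distances(v):
    n = len(v)
    out = [[0.0] * n for _ in range(n)]      # preallocate the n x n result
    for i in range(n):
        for j in range(i + 1, n):            # each pair computed once
            d = abs(v[i] - v[j])
            out[i][j] = out[j][i] = d        # the distance matrix is symmetric
    return out

# Comprehension version: shorter, but computes every pair twice.
def pairwise_distances_comp(v):
    return [[abs(a - b) for b in v] for a in v]

v = [1.0, 4.0, 9.0]
print(pairwise_distances(v))
# [[0.0, 3.0, 8.0], [3.0, 0.0, 5.0], [8.0, 5.0, 0.0]]
```

For 10,000 elements the symmetric loop does n(n-1)/2 ≈ 50M distance calculations, while the naive comprehension does ~100M, which is where the article's "50M to 100M depending on the implementation" range comes from.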


The loop-based implementation in R was the slowest, as expected (and would be much slower before version 3.4 where JIT became available). By vectorizing, we decrease computation time but increase memory consumption, which can become a problem as the size of the input increases. Even after our vectorization efforts, we are still far from the performance of R’s dist function.

Rcpp allowed decreasing both computation time and memory requirements, outperforming R’s core implementation. This is not surprising as R’s dist function is much more flexible, adding several options and input validation. While it is great that we can inject C/C++ code into R scripts, now we are dealing with two programming languages and we have lost the goodies of interactive programming for the C++ code.

Julia stands out by delivering C-like performance out of the box. The tradeoff between code compactness and efficiency is very clear, with C-like code delivering C-like performance. Thus, the most efficient solution was based on loops and preallocating memory for the output. Comprehensions are a good compromise as they are simpler to code, less prone to bugs, and equally memory-efficient for this problem. 

A comprehensive version of this article, including the code used for the experiments, was originally published here (open access).


Presentation: Multi-tenancy in Kubernetes

MMS Founder


Probst: Let me start by stating something that I’ve seen, by talking to many of our customers. What do companies care about? They care about delivering their products to their customers, ideally, as quickly as possible, so velocity, and ideally, with as little costs as possible. These are themes that I see over and over, and people choose tools and infrastructure that help them achieve these goals. What I’m going to be talking about here today is how Kubernetes and, in particular, multi-tenancy in Kubernetes, can be one of the tools in your toolbox that you can look at in order to help you achieve these goals.

Let me introduce myself briefly. My name is Katharina Probst, I’m a Senior Engineering Manager at Google. You can find me on LinkedIn if you’d like. I will also share the slides, so you’re welcome to take pictures, but you can also download them later.

Why Multitenancy

Let's start with why you might want to take a closer look at multi-tenancy. Do any of you run multi-tenant Kubernetes clusters? A couple, great, I'd love to hear your experiences too, maybe you can share with the room later. Why would you care about multi-tenancy? When you start out with Kubernetes, usually what happens at a very high level is, you have a user, and the user interacts via a command-line tool, the API, or the UI with a master. The master, as we just heard, runs the API server, the scheduler, and the controllers. This master is responsible for orchestrating and controlling the actual cluster. The cluster consists of multiple nodes that you schedule your pods on. Let's say these nodes are machines or virtual machines, or whatever the case may be. Usually, you have one logical master that controls one single cluster. Looks relatively straightforward. When you have one user and one cluster, that's what it is.

Now, what happens when you start having multiple users? Let’s say your company decides to use Kubernetes for a variety of maybe internal applications, and so you have one developer over here, creating their Kubernetes cluster, and you have another one over here creating their Kubernetes cluster, and your poor administrators now have to manage two of them. This is starting to get a little bit more interesting. Now you have two completely separate deployments of Kubernetes with two completely separate masters and sets of nodes. Then, before you know it, you have something that looks more like this. You have a sprawl of clusters. You get more and more clusters that you now have to work with.

What happens now, some people call this cube sprawl, this is actually a pretty well-understood phenomenon at this point. What happens now is, I will ask you two questions of how does this scale? Let’s think a little bit about how this model scales financially. How much does it cost you to run these clusters? The first thing that might stand out is that you now have all of these masters hanging out. Now you have to run all these masters. In general, it is best practice, not to run just one master node, but three or six, so that you get better high availability. If one of them fails, the other ones can take over. When you look at all these masters here, they’re not one single node normally per master, they’re usually three. This is starting to look a little bit more expensive. That’s number one.

Then number two, one of the things that we see a lot is customers who say, "I have all of these applications; some of them run during the day and take user traffic. They need a lot of resources during the day, but they really lie idle at night. They don't really do anything at night, but you have all these nodes."

Then you have some applications that are batch applications, maybe batch processing of logs or whatever the case may be, and you can run them at any time you want. You could run them at night; you could have this model where some applications run during the day and other applications run at night, using the same nodes. That seems reasonable. With this model, where you have completely separate clusters on completely separate nodes, you've just made that much harder for yourself. That's one consideration.

Another consideration that people bring up a lot is operational overhead, meaning how hard it is to operate all of these clusters. If you’ve been in a situation like this before, maybe not even with Kubernetes, what you will have noticed is that oftentimes what happens is that all of these clusters look very similar at the beginning, maybe they run very different applications, but the Kubernetes cluster, like the masters are all at the same version of Kubernetes, and so forth, but over time, they tend to drift. They tend to become all of these special snowflakes. The more you have these special snowflakes, the harder it is to operate them. You get alerts all the time, and you don’t know, is it like a specific version, and you have to do a bunch of work. Now we have tens or hundreds of sets of dashboards to look at, to figure out what’s going on. This now becomes operationally very difficult and actually ends up slowing you down.

Now, with all of that being said, there is a model that is actually a very appropriate model under some circumstances. Lots of people choose this model, maybe not for hundreds or thousands, but lots of people choose this model of having completely separate clusters because it has some advantages, such as being easier to reason about and having very tight security boundaries. Let’s say you’re in this situation, and you have hundreds of clusters, and it’s becoming just this huge pain. One thing you can consider is what we call multi-tenancy in Kubernetes.

There are many definitions of multi-tenancy. When you read things on the internet about multi-tenancy in Kubernetes, you have to dig a little bit deeper to understand which model we’re talking about. Usually though, what people talk about is this model that you see up on the slide here. What this model is, is you have many users that interact via the command line, and the API, and the UI, with one logical master. You have one master running, and that master now controls a large cluster – because for small clusters, it doesn’t make that much sense, maybe – but a large cluster and that cluster is divided up into namespaces.

There’s this concept that we just heard about in Kubernetes that’s called namespaces. What namespaces are, it’s very important to understand that they are virtual clusters. You have one physical cluster, but then you divide that cluster up into namespaces. That does not mean that these two nodes belong to this namespace and these three nodes belong to the next namespace. The nodes are actually shared among the namespaces, but the namespace provides a boundary that creates this universe for you. Then you can run different applications in these namespaces but still share the resources.

Let’s dig into this a little bit. Usually, when you run Kubernetes, you have different roles and different kinds of users of this cluster. If you have a multi-tenant cluster, what you can have, more than likely is, you’re going to have a cluster administrator. That cluster administrator, essentially, has a lot of access to all the cluster. They’re the ones that set up the namespaces, they set up the resource limits, as we will see later in the talk, and they make sure that there’s consistency across the namespaces in the cluster so you don’t end up with this divergence and all of these different snowflakes. Of course, oftentimes, they’re responsible for operating the cluster, responding to incidents and making sure everything runs smoothly.

Now, we have a new role that really only applies to this model of multi-tenancy, and that is the namespace administrator. The namespace administrator now does not necessarily have access to our control over the entire cluster, but only one namespace, maybe multiple, but not the entire cluster, so only admin rights to specific namespaces.

Then finally, you have the cluster user, and the cluster user, just like it was before, runs their applications on the cluster. Now, in this multi-tenant model, it’s a little bit different because the cluster user now has access only to certain namespaces, maybe even only to one. It is their responsibility to understand their own namespaces, to run their apps in their namespaces, make sure they understand the resource limits, and make sure they don’t trample on other tenants. We’ll get more into more detail about this further on in the slides.

Essentially, what you’re going to have is you’re going to have different roles, cluster administrator, namespace administrator, and user that you will typically see in these kinds of deployments.

Hard Multitenancy

When people talk about multi-tenancy, they often talk – if you go to, for instance, the open-source Kubernetes community, the working group for multi-tenancy – they talk about this concept of hard multi-tenancy and soft multi-tenancy. I’m going to talk about hard multi-tenancy first, but let me just give you a brief overview of what this means.

On the one end, hard multi-tenancy means that you have tenants that you don’t trust and they don’t trust each other, so there is zero trust. That might be random people uploading code and running it, or it might be different companies that compete with each other. It could be anything, but it’s very much on the end of the spectrum where there’s zero trust.

On the other side is soft multi-tenancy, and I’ll talk more about this later today. When we’re talking about soft multi-tenancy, there’s more trust established between the tenants. One thing that’s important to understand is that people often talk about hard versus soft multi-tenancy. In reality, it’s really a spectrum, because how much you trust your tenants is not a binary, it’s usually a spectrum. Which kinds of use cases work for you, you have to think for yourself and for your specific use case.

Let's talk a little bit more about hard multi-tenancy. Again, that is the case where there is no trust. Hard multi-tenancy, for a variety of reasons, is not yet widely used in production; essentially, it boils down to the security boundaries and making sure that tenants don't step on each other. There is ongoing work in the Kubernetes community to strengthen and change Kubernetes so that we get closer and closer to the point where it is a very viable thing to do.

Let's talk a little bit about what it would take to have that. Think about this: you now have one cluster with a bunch of nodes, and these nodes are shared by potentially malicious tenants. What do you need to do to make sure this actually works smoothly? You need to make sure there is great security isolation; that's the second bullet here. Tenants cannot see or access each other's stuff, they cannot intercept network requests, and they cannot get to the host kernel and escalate their privileges. All of that needs to be ensured so that you can have tenants you cannot trust. The other thing is that you need to make sure tenants don't essentially DoS each other, meaning they don't impact each other's access to resources.

We'll talk about this a little more later in the talk, but think about it: you have a bunch of nodes that are now shared, and you have to make sure that everybody gets their fair share. That's one thing it would take. Another thing you have to make sure of is that resources don't conflict. For instance, there's this concept of custom resource definitions and custom controllers, which are a way to extend Kubernetes. If you have all of these different tenants adding their own APIs, their own CRDs and controllers, you have to make sure they don't conflict, so that one person over here doesn't create an API that conflicts with something over there. You have to make sure they're very nicely isolated.

Then finally, much of what we talk about is about what we call the data plane, which is the cluster where the nodes are. The same questions apply to the master, which we call the control plane. We have to make sure that the control plane resources are also shared fairly. As we’re on this journey towards making hard multi-tenancy more and more valuable, and more and more practical, and used in production, those are the kinds of questions that we need to answer.

We’re going along this journey towards more and more hard multi-tenancy. Right now, what people do a lot is they use multi-tenancy in a context where there is trust between the tenants. The use cases, for instance, that are very common or pretty common are different teams within the same company. Within one company, you say we share one big pool of resources and different teams share them. The different teams really have good incentive and good reason to behave nicely. They’re not assumed to be malicious, they trust each other and accidents happen, and that’s what you try to protect from, but you don’t assume that they’re completely not trusted.

In that model, as you may by now have guessed, different teams will typically get different namespaces to share in one cluster. As I already said, this is used in production. Multi-tenancy is still something that requires a little bit of setup, or maybe a lot of setup; there are a bunch of knobs that you need to turn, and we'll talk about that in a little bit. Several times I've seen companies use multi-tenancy but then dedicate a few people to making sure the policies are applied correctly and the network is set up consistently, and so forth, for these shared clusters.

Multitenancy Primitives

There are a number of primitives that exist in Kubernetes, that will help you get a multi-tenant cluster set up and administrated properly. I already mentioned namespaces. The good thing about namespaces is that they were built in very early on in Kubernetes and they’re very fundamental concepts, so they’re actually implemented everywhere and pretty much all the components understand namespaces. That’s good, but namespaces alone are not good enough.

There’s a number of things you need to do in order to set up your multi-tenant cluster in a way that protects tenants from each other’s accidents, for instance. We’re going to talk about three things in a little bit more detail. One is access control, which means who can access what. One is isolation, which means how do I make sure not everybody can see each other’s stuff. Then the last thing, going back to our hard multi-tenancy goals, is fair sharing. What already exists in Kubernetes that lets you ensure fair sharing among tenants.

Let's talk a little bit about access control; that's the first primitive we're going to talk about here. We already heard a little bit about RBAC; it was mentioned in the previous talk. RBAC is role-based access control in Kubernetes. RBAC is, essentially, a tool built into Kubernetes that lets you control who can access what. The way it works is that you set up roles, and in these roles you describe, "I'm going to have my administrator role, and this administrator can do all of these different things." There are two kinds of roles. There are ClusterRoles; those are roles that apply to the entire cluster, as the name suggests.

Then there are just Roles, and those are namespace-scoped, meaning they apply to the specific namespaces you list. Kubernetes already comes with some default roles that you can use, but you can create your own, and you probably will want to: roles that say exactly who can access what pods, what namespaces, what secrets, and so on.

You've now created those roles, and now you get a new employee. You need to make sure that this employee is assigned these roles. The way you do that is with a ClusterRoleBinding or RoleBinding. ClusterRoleBindings let you bind groups of people, service accounts, or individuals to ClusterRoles, which again are cluster-wide, and RoleBindings let you bind individuals, groups, or service accounts to namespace-scoped Roles.
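To make the ClusterRole/Role and binding distinction concrete, here is a deliberately simplified Python model of how these bindings resolve. It is a sketch of the semantics, not how the Kubernetes API server actually implements authorization, and all the names in it are made up:

```python
# Toy model of Kubernetes RBAC resolution. Hypothetical users, roles, and
# namespaces throughout; a drastic simplification of the real semantics.

cluster_roles = {"admin": {"pods", "secrets", "namespaces"}}
roles = {("team-a", "developer"): {"pods"}}           # namespace-scoped Role

cluster_role_bindings = [("alice", "admin")]          # user -> ClusterRole
role_bindings = [("bob", "team-a", "developer")]      # user -> Role in one ns

def can_access(user, namespace, resource):
    # A ClusterRoleBinding grants the role's permissions in every namespace.
    for u, cr in cluster_role_bindings:
        if u == user and resource in cluster_roles[cr]:
            return True
    # A RoleBinding grants permissions only within its own namespace.
    for u, ns, r in role_bindings:
        if u == user and ns == namespace and resource in roles[(ns, r)]:
            return True
    return False

print(can_access("alice", "team-b", "secrets"))  # True: cluster-wide admin
print(can_access("bob", "team-a", "pods"))       # True: bound in team-a
print(can_access("bob", "team-b", "pods"))       # False: wrong namespace
```

The last check is the heart of namespace isolation: the same role that works for bob in team-a grants nothing at all in team-b.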

You will use this extensively to achieve the isolation that you want. A very concrete example is Secrets. Secrets are a mechanism Kubernetes provides for storing things like passwords if you need them. They're stored in etcd, on the master. You have to make sure they're encrypted, but also that only the right people, with access to the right namespaces, can access those secrets. Even if you're in a place where you all trust each other, more or less, you should make sure that secrets are well protected.

RBAC is the mechanism you're going to use extensively to make sure that you essentially create this universe, this virtual cluster, out of the concept called a namespace. There are other things Kubernetes provides that let you get more granular in your security controls. One of them is Pod Security Policy, which lets you set security policies; it basically lets you say, for instance, "I will only allow pods into a specific namespace if the pod is not running as privileged." You can set up a pod security policy and then apply it to the cluster and the namespaces so that you have better security.

We heard a little bit about network policies earlier. Let me just touch on that in the context of multi-tenancy. When you have a multi-tenant cluster, you have a cluster that is carved up into these namespaces. The namespaces are virtual clusters, so the nodes are shared between all of the namespaces. Now, the pods get scheduled on these nodes, so it’s entirely possible and likely that you will have a node that will have pods from different namespaces on it.

If you’ve ever worked on a large distributed system with many different components, you will have experienced that it is a very good idea to be very thoughtful about specifying who can talk to whom. I’ve worked on systems where we did not do that at the beginning, and then, three years in, we’re like, “That was a very bad idea,” because now everybody can talk to everybody, and we can’t reason about anything, and we have no idea what’s going on.

In a multi-tenant Kubernetes cluster, it is highly recommended that you’re very thoughtful about setting network policies. What those network policies let you do is set ingress and egress rules, so for specific pods, they let you say, “This pod can talk to this pod, but not to these other pods,” so you can reason better about the topology of your deployments.
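
As one possible sketch, a policy that only admits ingress from pods in the same namespace (namespace name hypothetical):

```yaml
# Hypothetical NetworkPolicy: pods in "team-a" accept ingress
# only from other pods in the same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: same-namespace-only
  namespace: team-a
spec:
  podSelector: {}             # applies to every pod in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}         # an empty podSelector here matches pods
                              # from this same namespace only
```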

Another best practice is to make custom resource definitions namespace-scoped. They can be, and arguably they should be. There are some use cases where that might not be the right thing to do, but in general, when you have these custom resource definitions – which are extensions, so you might have different teams writing extensions to Kubernetes and their own custom controllers – in many cases, it makes sense to make them namespace-scoped so they don’t conflict with whatever other people are doing, because you might not be interested in other people’s extensions, and you might not like the side effects that they might have.
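
The scoping is a single field in the CRD spec. A minimal, hypothetical example (group and kind are invented):

```yaml
# Hypothetical CRD declared namespace-scoped via "scope: Namespaced"
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced           # instances live inside a namespace, not cluster-wide
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
```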

The final thing I want to touch on in terms of isolation is Sandboxes. People now go around saying containers don’t contain, so they’re not great security boundaries. What that means in practice is if you run your containers on a node, then there are certain security considerations that you have to think about. For instance, you have to make sure that when that pod runs on the node, the pod cannot easily access the host kernel, and then hack into the host kernel and escalate privileges, and then get access to everything else that is running in the cluster.

Sandboxes put a tighter security boundary around each pod, and so you can just launch all your pods in Sandboxes. Then there are several different ones, gVisor is one of them that’s been developed. It’s actually open-source, but Google is investing very heavily in it, so I know a little bit more about it. The way it works is, it puts the security boundary and isolates the pods more. The goal is to make sure that information is not leaked between tenants, and tenants can’t break out accidentally or maliciously and mess everybody up, and stop everybody else’s containers. That’s something to consider.
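
One way to opt pods into a sandbox like gVisor is through a RuntimeClass, assuming the nodes have the gVisor runtime (`runsc`) installed; this is a sketch, not a full setup:

```yaml
# Sketch: run a pod under gVisor via a RuntimeClass
# (requires nodes with the "runsc" runtime configured)
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed
  namespace: team-a
spec:
  runtimeClassName: gvisor    # launch this pod inside the sandbox
  containers:
  - name: app
    image: nginx
```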

There are a lot of details here, but what I want you to take away from this part of the presentation is that when you have a large multi-tenant cluster, you assign namespaces, and that is really the first step. What you then do is, you set up all of the security and isolation mechanisms so that in essence you create a more tightly controlled universe for each namespace.

Let’s talk a little bit about fair sharing. What I’m going to be talking about on this slide is fair sharing in what we call the data plane, which is the cluster of nodes, so fair sharing of resources. The reason I’m talking here about the data plane is because there are different mechanisms on the data plane, and it’s actually better developed than it is on the control plane on the master. I’ll talk about the master in the next part of the presentation. Let’s talk about the data plane a little bit.

When you have all of your different teams running your applications, I have experienced this, maybe some of you have too, even when you want to behave nicely and you’re incentivized to behave nicely, what happens sometimes is that all of a sudden, you get a lot of traffic. Then your autoscaler kicks in and that’s wonderful, and your application still runs, but now others cannot run. You have to make sure that you have the mechanisms in place so that tenants don’t trample on each other and don’t crowd each other out. The most important or maybe the most fundamental way to do this in a multi-tenant Kubernetes cluster is with something called Resource Quotas.

Resource Quotas are meant to allow you to set resource limits for each namespace, which makes sense because you have a number of nodes, and you need to make sure that you carve up the resources among the namespaces. There’s something called LimitRanger, which lets you set defaults for all the namespaces. Essentially, what you’re going to want to do is you’re going to want to think about how many resources does everybody get for CPU, for memory, and also for things like object counts. How many persistent volume claims can I have per namespace? Because there are limits on how many volumes you can mount on each virtual machine, depending on where you run them. You have to also make sure that those are shared fairly as well. Resource Quotas let you do that.
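
A sketch of a per-namespace quota plus container defaults (all values and names are illustrative):

```yaml
# Hypothetical ResourceQuota and LimitRange for the "team-a" namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "5"   # object-count limit
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
  - type: Container
    defaultRequest:               # default requests when a pod sets none
      cpu: 250m
      memory: 128Mi
    default:                      # default limits when a pod sets none
      cpu: 500m
      memory: 256Mi
```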

Then there are things that let you put priorities and Quality of Service Classes on the pods. These are related concepts, and what I want you to take away is that there are ways you can control, a little bit, which pods run at higher priority than others. You’re probably familiar with this concept of priority, even if it’s not from Kubernetes, from other systems like Linux. It essentially lets you control them. Quality of Service Classes are another twist to this because they let you also say, “This is how many resources I need, but I can potentially burst beyond them,” or, “I need an absolute guarantee that these pods will run.”
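
As a sketch, a PriorityClass plus a pod that lands in the “Guaranteed” QoS class because its requests equal its limits (names and values are hypothetical):

```yaml
# Hypothetical PriorityClass for higher-priority tenant workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-high
value: 1000
globalDefault: false
description: "Higher-priority tenant workloads"
---
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-app
spec:
  priorityClassName: tenant-high
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
      limits:                 # requests == limits → "Guaranteed" QoS class
        cpu: 500m
        memory: 256Mi
```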

Then finally, the last two bullet items here, node and pod affinity. Those are mechanisms that let you influence the scheduler. The scheduler is a complex piece of technology that is not always easy to reason about. Your pods are scheduled and you’re not really all that sure why they ended up where they did. It’s often complicated to reason about, but there are mechanisms that let you influence the scheduler. We heard a little bit about Pod Affinity and Pod Anti-affinity before. What that means is you can say these two pods shouldn’t be scheduled together. In the context of multi-tenancy, that might mean that applications in two different namespaces should not end up on the same node. Maybe you have namespaces that run financial applications that you want to keep separate from other things.
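
A sketch of pod anti-affinity that keeps pods labeled `app: finance` (a hypothetical label) from landing on a node that already runs another such pod:

```yaml
# Sketch: anti-affinity spreading "app: finance" pods across nodes
apiVersion: v1
kind: Pod
metadata:
  name: finance-app
  labels:
    app: finance
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: finance
        topologyKey: kubernetes.io/hostname   # "not on the same node"
  containers:
  - name: app
    image: nginx
```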

One interesting concept that I’d like to call out here is taints and tolerations, and that’s really interesting in the context of multi-tenancy. The way it works is, you give a node a taint, which is just a label – say, “green” – and then only pods that have a toleration matching that taint get scheduled on it. Only pods that have that same toleration, green, will get scheduled on that node. What that means for us in the context of multi-tenancy is, it’s a way for you to control things if there is a need to have nodes that only schedule pods from a specific namespace. That’s how you get it done.
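
A sketch of the mechanics (node name, taint key, and labels are hypothetical). Note that a toleration only *permits* scheduling on the tainted node; it is usually paired with a node label and selector to actually steer the pod there:

```yaml
# First taint the node, e.g.:
#   kubectl taint nodes node-1 tenant=green:NoSchedule
#   kubectl label nodes node-1 tenant=green
apiVersion: v1
kind: Pod
metadata:
  name: green-app
spec:
  tolerations:
  - key: tenant
    operator: Equal
    value: green
    effect: NoSchedule        # tolerate the taint so scheduling is allowed
  nodeSelector:
    tenant: green             # and steer the pod onto the labeled node
  containers:
  - name: app
    image: nginx
```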

Control Plane Multitenancy

Let’s talk a little bit more about control plane. Control plane, again, is the master. The API server and the scheduler, that’s what’s most important in our context here. Much of what we talked about were the nodes, so let’s talk a little bit about the master. One of the things you will notice, as we’re going through this, we’re sharing the cluster that’s on the right, so all of these applications are sharing the cluster, but the other thing we’re sharing is the master. We’re still, all together, sharing that one master.

There’s one thing I should point out. Remember how I said at the beginning when people say multi-tenancy, they sometimes mean different [inaudible] not able to do that. We’re definitely in this mode of one master controlling one cluster. That’s fine.

All tenants share the master, and that includes things like secrets. Remember what we talked about: you need to protect your secrets with RBAC. One of the things that the master, in particular the API server, is not really great at right now is preventing tenants from DoSing it or crowding each other out. You have this master running, and the master takes in requests from users. Imagine you have one user that all of a sudden just sends a flood of requests. Then the API server says, “I don’t know what to do with all these requests.” The API server gets behind and other tenants don’t get a word in; their requests get rejected. It’s actually worse than that, because the API server could also drop things like garbage collection and other background work.

There is work underway that you can check out, it’s going on right now in the open-source community that will enable better fair sharing on the API server. The way this will work is, it’s sort of a redesign of this concept of max inflight requests. Max inflight requests is a concept in the API server that essentially says, “Here is how many requests I can handle at any one given time and the rest, I just reject.”

In this proposal that is currently underway, you can read this in the slide, but what it will do is, it will generalize max inflight request handling in the API server to make more distinctions among requests, and provide prioritization and fairness among the categories of requests. That is long, you can afterwards read that again in the slides, but let me just explain a little bit what that means for us in the context of multi-tenancy.

In this new way of doing things, when different tenants have requests coming in, there will be different priority levels, so the system requests will take the highest priority level, as you might guess, and then different tenants can perhaps be at the same priority level, and different tenants will likely have different queues, and then they will compete evenly for the API server.

What Companies Care About

Before I conclude, I’ve given you a number of different things to look at and things to think about how you set this up. Now, let me get back to the beginning of the presentation, and talk a little bit, bring home the point of why multi-tenancy can help you with velocity and cost. You may walk out of this presentation saying, “That’s really complicated,” but when you really get down to it, when you have a shared cluster, you do have the ability now to have policies across the cluster, to have the same network settings across the cluster, and control things more tightly.

In my experience, that will really help you long-term with your velocity, with your speed of getting things out faster, so you don’t have to look in 100 places. Then just going back to the point about cost. Cost is something that you can save by sharing the master, but then also by sharing the resources of the underlying nodes among all the namespaces.

Key Takeaways

What I want you to take away from this presentation is, think about multi-tenancy as one of the tools in your toolbox if you want better resource efficiency, costs, and operations. We talked about velocity and costs. Think about it, and see if it applies to your use cases.

When you read a little bit more about multi-tenancy, you will hear people talk about hard and soft. Remember that it’s a spectrum. Hard means you don’t trust the tenants at all; soft means you trust them completely. The truth is usually somewhere in the middle, because even if you trust the tenants completely, they might make stupid mistakes and you still want to protect yourself against them, to some extent.

We’re on this road towards making hard multi-tenancy really viable. There is ongoing work, which is really encouraging to me and it’s very exciting. Right now, what we see is that quite a number of companies use soft multi-tenancy in production, and as you saw in this presentation, there’s still some setup required and some knobs you have to turn to make sure that it works well for you. It is definitely something that can work out very well.

When I share the slides, I will have a few links at the end that link out to some of the open-source work that I was talking about here, so you can read a little bit more.


Presentation: Graal: How to Use the New JVM JIT Compiler in Real Life


[Note: please be advised that this transcript contains strong language]


Thalinger: I work for this tiny little company you might know. This talk is not really about Twitter, it’s more about Graal, the compiler. The most important question of the whole conference is, “Who has a Twitter account?” That’s pretty good, it’s almost everybody. If you don’t have an account yet, create one right now because I want you to tweet about my talk and all the other talks that you find interesting.

If you’re going to tweet about my talk, please add the hashtag “twittervmteam” because, you might not believe it, but Twitter actually has a VM team and I’m on the VM team. Two of my colleagues are also speaking at QCon Sao Paulo on Wednesday, I think; you can hear from them a little bit about machine learning. We do some machine learning things, and Flavio is talking about all the Scala optimizations we are doing in Graal.

What Is Graal?

This talk is not about GraalVM. GraalVM is a very unfortunate marketing term, in my opinion, and I bitch about it on Twitter quite a bit because it’s very confusing. GraalVM is an umbrella term that basically consists of three different technologies. One is called Graal, the JIT compiler; another is called Truffle, a framework for implementing language runtimes. Then there is something called Substrate VM, which you might know as Native Image. That’s what everyone is super excited about right now and freaking out over. I’m only talking about Graal, the compiler, today.

You compile, let’s say, Java, Scala, or Kotlin to Java bytecode, to class files. What the JVM does is take that and interpret it, which is pretty slow. Then there is something called a just-in-time compiler, which compiles the Java bytecode into native code while you are running your application. That’s why Java is actually as fast as it is. Just-in-time compilers are what I’ve been working on for a couple of years now. I’m going to talk about that later a little bit.

Graal is a just-in-time compiler for HotSpot. It’s actively developed by Oracle Labs. There’s an OpenJDK project called Graal. Most of the work is actually done on GitHub; if you’re interested, I’m going to show you how to build Graal from GitHub and use it with the latest JDK 11. Graal uses something called JVMCI. It’s a compiler interface we introduced with JEP 243 in JDK 9 so you can actually plug in an external compiler. You’ll see that later as well. Graal is written in Java. If there’s one thing you should take away from this talk, it’s this: it’s written in Java, and that’s very important, at least today. In the future, this will change.

HotSpot has two JIT compilers: one is called C1 or the client compiler, the other one is called C2 or the server compiler. C1 is a fast, high-throughput compiler: it does not do as many optimizations as, let’s say, C2 and Graal. The purpose of C1 is to produce native code quickly so we get away from interpreting code. C2 is a highly optimizing compiler; it takes profiling information and does all the fancy optimizations that you can think of: a bunch of inlining, escape analysis, loop optimizations, and so on. Graal is supposed to be a replacement for C2. It’s a highly optimizing compiler, and it’s written in Java.

C1 and C2 are written in C++ while Graal is written in Java. There are two major differences between something that’s written in C++ and something that’s written in Java. It’s how memory allocation works, it’s malloc memory versus you allocate something on a Java heap. Then, at least today, we have to do something called a bootstrap because our compiler that’s part of the JVM is written in a language that the JVM executes so it compiles itself while it starts up. You’ll see that later as well.

Where Do I Get It

Where do you get it? Where do you get Graal? We actually used Graal in JEP 295 for something called ahead-of-time compilation. This is not native image, this is something else. This we did in 9 and it’s basically a small command-line utility that takes Java class files or JAR files, sends off all the methods to Graal to compile it and it spits out a shared library at the other end. Then HotSpot can pick up that shared library so you can skip the interpretation of bytecode. If you have a very big application, this might help start up. I say might because it’s very difficult to get it right. The difference between this one and Native Image is this one is actually Java while Native Image is a subset of Java, it’s not really Java. I’m not going there because it would take too long.
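
As a sketch of the JEP 295 ahead-of-time workflow, the `jaotc` utility (shipped with JDK 9 through 15, removed later) compiles class files into a shared library that HotSpot can then pick up; `HelloWorld` is a hypothetical class:

```shell
# Compile a class, AOT-compile it into a shared library, then run
# with HotSpot loading that library instead of interpreting bytecode
javac HelloWorld.java
jaotc --output libHelloWorld.so HelloWorld.class
java -XX:AOTLibrary=./libHelloWorld.so HelloWorld
```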

Then Oracle, with JEP 317, added it as an experimental JIT compiler. Graal was already in there, but they basically announced it and said, “It’s an external JIT compiler, you can use it.” That happened in 10. So you can use it if you are running JDK 10 or later – 9 works as well, because when we introduced the ahead-of-time feature, I made it so that it works; no one knew about it, but I did.

Get It Demo

I used to do this demo differently. I used to do this talk in a cloud container where I started up an empty cloud container so I could show you that I’m not cheating. Everything I do today, you just have to do what I do, I haven’t prepared anything so that also means a lot can go wrong, which sometimes it does. We’ll see how that goes today. There’s a lot of typing going on today as well, I don’t know if you’ve ever typed in front of a lot of people, so typos – certainly possible.

I’m using the DaCapo benchmarks to show you some benchmarking numbers and then some other things that I need a little bit more to actually execute the code. Do I need something else? No, I don’t think so. JDK 11 – I’m using 11 because it’s LTS. I could use 12 as well, but every one of you is probably using JDK 11 in production right now, that’s why I’m using it.

We set this guy and then that should be enough for this setup. It’s 11.0.2, the one we just extracted; we do the same thing over here because we are going to compare C2 and Graal later in benchmark runs. That’s why I’m doing this.

Then the other thing I’m doing is, I’m going to set an environment variable called JAVA_TOOL_OPTIONS; all the launchers that the JDK has will pick up that environment variable automatically. I’ll set a bunch of things so I don’t have to set them all the time. First, as you might know, since JDK 9, the default GC is G1. The logging output of G1 is a little hard to read, and we’re going to look at GC output a little bit later. Parallel GC is just so much easier to read, so we use that one. We set a maximum heap size, pretty small, only 512 megs. I do this because I want you to see when GCs are happening. If I set the heap size too big, then we don’t see a GC, and then I cannot show you what I want to show you.

We also set the start size of the heap to 512 megs. The reason why I’m doing this is, if we are running something with C2 and Graal, because Graal uses Java heap memory to do compilations, the heap expansion would be different and then we wouldn’t really compare apples to apples. I’m trying to make it apples to apples as good as I can.

You also know that since JDK 9, we have modules now. As you can see, it picked up this environment variable and it’s using it. I think there are 75-ish modules in the JDK, and we are looking for modules that are called jdk.internal.vm. As you can see, there are three. One is called jdk.internal.vm.ci; that’s JVMCI, the compiler interface that we introduced in 9. It’s basically an API in a module. There’s obviously some part on the native code side as well, because we have to talk to the VM, but that’s the Java interface. Then there’s another one called jdk.internal.vm.compiler. It’s just the Graal source code from GitHub in a Java module; that’s really all this is. Then there’s a little bit of management here; ignore that for now.
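
The lookup described here can be reproduced with the module listing (assuming a JDK 11 install on the PATH):

```shell
# Show the JVMCI/Graal-related modules shipped with the JDK;
# expect jdk.internal.vm.ci, jdk.internal.vm.compiler, and
# jdk.internal.vm.compiler.management
java --list-modules | grep jdk.internal.vm
```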

As I said, I was doing this talk with a cloud container and it always took a while for the cloud container to come up. In the meantime, I was talking about myself. We’ll go through this quickly. I’ve been working on JVMs for a very long time, 14 years, basically on JIT compilers. That’s all I do. I used to work at Sun Microsystems and Oracle on the HotSpot compiler team, mostly working with C2. It’s a major pain in the ass, don’t do it, use Graal. These are the three biggest projects I’ve done at Sun and Oracle. I’ve worked on JSR 292, which you might know as invokedynamic and method handles. If you use Java 8 lambdas, you’re actually using invokedynamic under the hood without knowing it. I wrote a lot of that code. There is a package called java.lang.invoke; I wrote a lot of that Java code. If it doesn’t work, you could technically blame me, but other people touched the code after me, so the code I wrote was perfectly fine and they broke it.

JEP 243 is the interface we already talked about. It’s basically the interface that Graal was using; we just extracted it and made it a somewhat stable API. It’s not an official API because it’s not in an officially supported namespace, but it’s stable-ish. JEP 295 we already talked about. I now work for a very great company called Twitter. It’s the best company on the planet.

Why This Talk

Why am I doing this talk and all the other talks that I have? I want you to try Graal. There are a bunch of reasons why I want you to try this because number one, I’m a very nice person. I want you to save some money. I think I have a slide for this that I’m doing where I explain basically how we are saving money by using Graal. We reduce CPU utilization for the stuff we do, we use less machines for everything we do and that’s a lot of money.

Then I would like to fix existing bugs in Graal; we found a bunch, and there’s one talk where I actually talk about the bugs we found and explain them. We haven’t found the bugs in two years. What we need is to throw different code at Graal. What I would like is for you to use your shitty production code and run it on Graal; that would be really nice. Since you all agreed earlier that you run on 11, that’s not a problem.

Then I want to improve Graal, I want to make it a better compiler. We can only improve the compiler if we see issues. We cannot just optimize into the blue, we need to know, “This code doesn’t work as it should or it doesn’t work as well as on C2.” Then we look into it and can actually improve it. That’s why I’m doing this.

Then when I do my presentations, people come up to me and ask me, “Is it safe to use, because it’s an experimental compiler? Does your data center burn down?” No, it does not. We have our own data centers, they are still up and running. You could tweet right now and you would see that it works.

How do I use Graal and where do I get it? This is exactly why I made this talk, for these two questions. Then when they actually try it, they usually send me an email or tweet at me or send me a DM. Most are complaining about benchmark numbers and that Graal sucks. The reason for this is that they don’t understand the difference between a compiler that’s written in C++ and a compiler that’s written in Java. They look at the numbers they get in the wrong way. I’m explaining this to you today so you don’t make the same mistake.

That’s the money-saving thing. It’s called Twitter’s quest for a wholly Graal runtime. It’s basically the story of my first year working at Twitter, how we started running services on Graal, how much money we save. No, I’m not telling you how much it is because I’m not allowed to, but it’s a lot. It’s way more than they pay me, which I find unfair, but they don’t agree, I don’t understand. Watch this if you’re interested.

Use It Demo

Back to the demo. We’ve already done this, let’s move up. How do we use it? You get a JDK with Graal. If you have that module, jdk.internal.vm.compiler, then the only thing you have to do is to turn it on. Let’s do a demo – how to turn it on.

We go to OpenJDK and then JEP 243. That’s the JVMCI JEP that I was talking about, and we hope that the Wi-Fi works faster. These are the problems that I was talking about. I’m glad I’m not doing the demo in the cloud anymore, because that wouldn’t work. I wanted to show you that the text of the JEP actually tells you how to turn on Graal, or any JVMCI compiler. It could be any compiler: if there is a compiler out there that implements the JVMCI API, you could run it.

There’s one flag called “UnlockExperimentalVMOptions” because it’s still an experimental VM feature. Then there is one called “EnableJVMCI.” It’s basically only turning on access to the API; it’s not automatically turning on the JIT compiler, but it gives you access to the API. Sometimes if you run Truffle – I think Oracle Labs does this sometimes – they actually run on C1 and C2 but use Graal for Truffle. That’s why you would only turn on JVMCI but not the compiler. Then the last one is UseJVMCICompiler; that’s really all you need. We copy this and stick it into this JAVA_TOOL_OPTIONS thing, and then if we do a java -version, we’ll see it picks it all up and then it prints this.
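
Putting the three flags together (assuming a JDK 10/11 with the Graal module, as above):

```shell
# Turn Graal on as the tier-4 JIT compiler
java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -version

# Or set it once for every JDK launcher via the environment variable
export JAVA_TOOL_OPTIONS="-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler"
```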

There is a thing called PrintFlagsFinal. It prints all the flags that the JVM has. We are looking for the ones that have JVMCI in the name, and there are 10, 12 or something. As you can see, we have here EnableJVMCI and it’s true because we turned it on. UseJVMCICompiler is true because we turned it on. Then the one that I’m looking for is this one, JVMCIPrintProperties, so we’re going to do this, and then it prints a very long list of properties. Most of them are Graal-related: things you can tune with Graal, things you can change. At the very top, there’s a handful of JVMCI-related properties.

The one I’m looking for here is called InitTimer. Since it’s a Java property, we decided to pass in options to JVMCI and Graal as Java properties because both are written in Java so it makes sense. We do a -D and then InitTimer and then equals true, obviously because that’s how we have to do it. What it does is, it prints some initialization output when JVMCI initializes itself. Let’s do this. Nothing happened. There is no additional log output except the version. Did we do anything wrong? No, we didn’t. The way it works is JVMCI is lazily initialized.
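
The inspection steps described here might look like this on the command line (assuming the Graal flags from before are set, e.g. via JAVA_TOOL_OPTIONS):

```shell
# List the JVMCI-related VM flags
java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler \
     -XX:+PrintFlagsFinal -version | grep JVMCI

# Turn on JVMCI initialization timing; jvmci.InitTimer is passed
# as a Java system property because JVMCI and Graal are written in Java
java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler \
     -Djvmci.InitTimer=true -version
```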

What is tiered compilation? We start out interpreting Java bytecode, then we recompile with C1. There are actually four tier levels in HotSpot, and the first three are all tier levels that C1 compiles; the number depends on how much profiling information it gathers. We usually compile at tier three, where we get a lot of profiling information, which we then use later for C2 to recompile. You are going through tiers, and every step of the way your code gets faster. This is tiered compilation. C2 is tier four, and we are replacing C2 with Graal here.

If we do PrintCompilation here, you see all the methods that are getting compiled when you do a -version. It runs a little bit of Java code, so it actually compiles stuff. The third column here is the tier level of the compilation. As you can see, there is no four. The reason is that no code gets hot enough to actually trigger a tier-four compilation, and that’s why JVMCI is not initialized. What we have to do is run a little bit more.

We can do a -l, which basically only prints the benchmarks that the framework has. If we do this over here with InitTimer on, you can see something is getting initialized. It starts here with the class HotSpotJVMCIRuntime, then it does a bunch of things, gets some configuration for our architecture, AMD64, but it doesn’t look like it actually finished the initialization. That’s correct, because what happens is the listing of benchmarks exits before JVMCI is actually fully initialized and compiles a method. We need to run a little bit more.

What we do is we run a small run of a benchmark called avrora, and then you can see everything is being initialized. We actually finish. As you can see here, HotSpotJVMCIRuntime took 56 milliseconds to do this. Then there is a thing called the compiler configuration factory, which selects the compiler. As I said earlier, if you had more than one compiler that supports JVMCI, you could select it with a Java property. This is all described in the JEP that I can’t show because the Wi-Fi is broken, but you can see that. By default, and since there’s only one compiler, it always selects Graal. We initialize this class called HotSpotGraalRuntime, which does a bunch of things. As you can see, it creates a back end. Then it also looks like it doesn’t really finish the initialization. That’s because the avrora benchmark harness redirects the output into a file. It’s somewhere in a file now, but believe me, it finishes. This benchmark run here was actually done with Graal.

What Is Bootstrapping

Now, we have to talk about bootstrapping. Bootstrapping is still a problem. Oracle Labs, together with the Java Platform Group at Oracle, who are doing Java, are working on this: there is a project called libgraal, but we don’t have it yet. I think the latest GraalVM release actually has it in it, but OpenJDK or Oracle JDK does not. libgraal uses Substrate VM Native Image to AOT-compile Graal itself, which totally makes sense. Then the whole bootstrapping part goes away, and also the memory allocation stuff that we’re seeing later will go away, but at this point in time, we have to deal with bootstrapping.

Graal is just another Java application running in your JVM, it’s written in Java, it loads Java classes, obviously. These Java classes have Java methods and these methods – Graal’s own methods need to be compiled at some point. Otherwise, we would interpret our compiler and that would be ridiculously slow. We need to compile it, that’s the bootstrap. Let’s do that.

Bootstrap Demo

You can do an explicit bootstrap like this, with BootstrapJVMCI. Please don’t do this; this is really just for presentation purposes. Sometimes it can be helpful when you do benchmarking, but don’t do this because it skews everything. Let’s do a bootstrap here. As you can see, every dot is 100 method compilations, and it gets faster. The dots come up faster because Graal gets compiled by itself and then it can compile code faster. It makes sense. We compiled 2,500 methods in 19 seconds. No one wants to wait 18 seconds for this fucking Java thing to come up. I know that no one writes ls in Java, but if someone would write ls, 18 seconds – probably not.
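
The explicit bootstrap the speaker warns against might look like this (again, for demos and benchmarking only):

```shell
# Force Graal to compile its own methods up front before running anything;
# expect a noticeable startup delay while the bootstrap runs
java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler \
     -XX:+BootstrapJVMCI -version
```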

The bootstrapping can be done explicitly, or it’s done implicitly in the background. You know that every JVM has GC threads. You might know that, it has a bunch of threads where it does garbage collection in parallel. It does the same for compilations, it has compiler threads and the compiler threads, they do the work in the background. You’re still interpreting your code, you’re running your stuff, whatever it is, and in the background, you’re compiling code. Once they have this done, you then run on the compiled code.

We run the benchmark again. We run three iterations of this Aurora benchmark over here with C2. It takes 2.9 seconds – let's say three seconds. If we do the same thing over here with Graal, you can see that the first run takes a little bit longer than with C2 – like five seconds. Then the second and the third run are actually faster, which is surprising. The benchmark itself is a little flaky, but the performance is about the same. The difference in the first run – the one-second difference – is because we have to compile Graal. It's not 18 seconds, it's basically one.

We at Twitter run it that way. We do the bootstrap; we don't AOT-compile Graal or anything, because you're only compiling a limited number of methods for Graal. It doesn't grow: if your application is bigger, it's still only a second. I'm sure all your applications take longer than one or two seconds to start up. If it's one second more, it really doesn't matter.

What We Learned

Bootstrapping compiles quite a lot of methods. If you run tiered, which is what we just did, it's about 2,500. If you turn tiered compilation off, Graal needs to compile more because C1 is not in the mix; it compiles roughly 5,000, and 5,000 methods is what you will see when you run a big application. Then you compile about 5,000 Graal methods, but the overhead is really not noticeable.

Then you could do it either upfront – don't do that – or on-demand at runtime, which is what we just saw. By default, with on-demand compilation, the Graal methods themselves are only compiled with C1. The reason is that we don't want the compilations of the Graal methods to race with the compilations of our application's methods. For the Graal methods, we want compiled code as quickly as we can get it, so that Graal can compile our application. You can turn this off with a flag.
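The flag mentioned above – the one controlling whether Graal's own methods stay on C1 – can be sketched like this. The flag name `CompileGraalWithC1Only` is the HotSpot option from that JDK era; treat it as an assumption for your exact build:

```shell
# Default behavior: Graal's own methods are compiled only by C1, so they
# don't compete with application methods for top-tier compilations.
# Turning the flag off lets Graal compile itself at the top tier:
java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler \
     -XX:-CompileGraalWithC1Only \
     -jar myapp.jar
```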

Java Heap Demo

Java heap usage is also very important. Graal is written in Java. We just learned that Graal methods are only compiled by C1, so that's not a problem, but all the compilations of your application's methods use Java heap memory – and possibly the Graal methods too; that's the flag where you can turn it on and off. Let's do a quick Java heap demo here.

We are using this benchmark again, with GC logging turned on. We're running the three iterations of Aurora, and the benchmark harness does a system GC before and after each benchmark iteration. The reason for this is that it cleans up the heap; it gets rid of all the stuff. As you can see, during the benchmark there are no GCs. Aurora is a very compute-intensive benchmark; it doesn't do a lot of memory allocation. We have roughly 39 megabytes on the heap after a run, and then we collect down to 1. It doesn't allocate a lot.
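The GC logging used in this demo can be reproduced with JDK 9+ unified logging; the benchmark jar name is a placeholder:

```shell
# C2 baseline: unified logging prints one line per GC
java -Xlog:gc -jar benchmark.jar

# Same run with the Graal JIT enabled -- expect extra GCs during the
# first iteration, while Graal compiles the benchmark methods
java -Xlog:gc \
     -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler \
     -jar benchmark.jar
```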

If we run the same thing over here with Graal, you can already see that there is more going on, because we have GCs happening during our first iteration. We have this one here, where we allocated about 130 megs and then collect down to 7; another one at 36-ish collects down to 8; and then there are another 86 megabytes on the heap after the run. That's Graal doing its work. That's Graal compiling the benchmark methods.

During the second iteration, as you can see, there was no GC during the benchmark. We had about 130 megabytes on the heap after the run. That means we were still compiling some methods, but after the third iteration here, we're not compiling anymore. All the benchmark methods are compiled; that's it. That's the reason why only the first iteration is slower: it does some work, it does GCs, which slows it down a little bit, but then you're done.

I'm arguing that when you start up whatever you have – a microservice, an application, whatnot – it will take a while for this thing to come up. Almost all the compilations for your application – depending on its size, obviously – happen in the first 30 seconds, the first minute, maybe the first two minutes. After that, a few compilations are rarely sprinkled in, but the majority is done at the very beginning. At that point in time, your application is not even using all the Java heap memory, because it's not even fully up yet. If you live in a microservice world, it has to connect to a trillion other microservices over network connections, maybe it does a little bit of a warmup loop, and at that point everything is compiled and you are ready to accept production requests.

The whole thing about Graal using up your Java heap memory is not really true. There's actually an advantage here, and I was arguing with the Oracle people that libgraal takes this advantage away. In a cloud world – and we all live in a cloud world today – you're basically paying for memory. You know about Metaspace; there's some memory on the side you have to reserve. If you have an eight-gigabyte cloud container, you cannot make your Java heap eight gigs – you all know that – because there's some additional native memory on the side that the JVM needs. One thing you never paid attention to is that JIT compilers need memory to compile stuff.

If you're not leaving enough memory on the side, you could actually get your container killed because the compiler uses too much memory. The most drastic case I've ever seen – and this is not normal – was a C2 compilation at Oracle that used one gigabyte of memory, because the method was huge and did a lot of loop optimizations, and that makes compiler graphs really big. That's, as I said, not normal. The normal memory allocation for a compilation is like 10 megabytes, or 20-30, big ones maybe 100, but you need to reserve that memory on the side. If you're not using C2, you could technically take that memory and give it to the Java heap. Then you have a little bit more Java heap: at the very beginning Graal would use it, and later you have more memory for your application. That's an advantage, but it will go away, I think.
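The sizing argument above can be sketched with illustrative numbers. In an 8 GB container you leave headroom for Metaspace, compiler arenas, and other native memory; without C2, some of that compiler headroom could go to the heap instead. All numbers here are assumptions, not recommendations:

```shell
# With C2: reserve native headroom on the side for compiler memory
java -Xmx6g -XX:MaxMetaspaceSize=512m -jar myapp.jar

# With Graal: compilation memory comes out of the Java heap itself,
# so the same container could afford a slightly larger heap
java -Xmx6500m -XX:MaxMetaspaceSize=512m \
     -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler \
     -jar myapp.jar
```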

What did we learn? Graal uses Java heap memory to do its compilations. There is no heap isolation yet – that's the libgraal thing I was talking about. Most of Graal's memory usage is during startup; as I said, most compilations happen at the beginning, when your application is not fully up yet. Remember, the memory is used anyway – it's either malloc memory or on the Java heap, you can't escape it. We need the memory to do the compilations.

Build Graal Demo

What I will do is clone Graal locally. Then we need something called mx. It's a script that Oracle Labs uses for everything: it can clone the code, it can build it, it can run the tests, it creates configuration files for your IDE – it does everything. I don't even know how big it is now; it's over 10,000 lines of Python code. I want to throw up on stage. I've complained so many times about this, but it's not going away. If you want to run bleeding edge, you need this.

I would need to put mx on my path because you have to run it. This is the only time I'm cheating, because I'm running out of time: I have mx already on my path. I'm going to unset JAVA_TOOL_OPTIONS really quick so we're not getting all this output. Then the only thing you have to do is mx build. I set JAVA_HOME to JDK 11 – I think it even prints that it's picking that up – and now it's building. I can't pull the latest version, but I pulled this one before I came to QCon, in my hotel room. It's from today; it's the latest thing.
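The build steps described here can be sketched as a shell session; the repository URLs are the public GraalVM GitHub locations, and the JDK path is a placeholder:

```shell
# Get the sources and the mx build tool
git clone https://github.com/oracle/graal.git
git clone https://github.com/graalvm/mx.git
export PATH="$PWD/mx:$PATH"

# Point mx at a JDK 11 and build the compiler suite
export JAVA_HOME=/path/to/jdk-11
cd graal/compiler
mx build

# Run the JVM with the freshly built Graal
mx vm -version
```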

Let's wait for this; it takes a few minutes. Let me show you in the meantime: -Xlog:class+load, and the benchmark, the Aurora thing again, and we grep for graal. As you can see, when we run this and log all the class loading that's going on, we see all these Graal classes being loaded. They're being loaded from this thing – jrt is the file system that the JDK uses internally to load files from modules. It loads the Graal code from this module, which we've seen before: jdk.internal.vm.compiler. That's the one that's being shipped with OpenJDK.
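The class-loading log in this part of the demo can be reproduced like this; the benchmark jar name and grep pattern are placeholders:

```shell
# Show where the Graal classes come from: on a stock JDK 10/11 they load
# from the jdk.internal.vm.compiler module via the jrt: filesystem
java -Xlog:class+load \
     -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler \
     -jar benchmark.jar | grep -i graal
```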

We are waiting for this compile to finish over here; it's compiling a bunch of stuff. You can also see that it's compiling Truffle; we only need the part that's called archiving Graal, then the GraalVM Graal compiler. mx vm basically just runs the VM with the Graal version we just built. We can do mx vm -version here; it should work, because I think it's done compiling. It prints some stuff, and then I can do a verbose version, which prints a bunch more stuff. Ignore the stuff at the top; this is the important line: it's using JDK 11. Then, as you can see, it throws in things here – it adds an upgrade module path pointing to this guy, which is basically a module with Truffle that was just built. That's the Graal we just built here on the left.

Now we are running with the latest version of Graal. If we run this guy with mx vm and Graal, the Graal class files are now being loaded from a file called graal.jar. You can actually see it's picking up the latest Graal version we just built, and it runs the thing. Let me remove the logging here and run three iterations of that guy. Every time I do this on stage, I hope that it's actually faster than it was before, but it never is, because Aurora hasn't changed – there's not much we can do about it anymore. You see it's five seconds for the first one, and then 5.2, 5.6. That's what we had earlier as well.

That's how easily you can get Graal, build it, and then play around with it. You can use mx to create the IDE configuration, load it up in your favorite IDE – I hope it's Eclipse – and then you can play around. Do whatever. It's so easy because it's Java: you change something, you save, and then you just run with it. You don't even have to recompile your JDK. It's amazing.

Scala Demo

That was the demo. I cannot do the production demo because, A, I'm running out of time, and B, it doesn't build, so I'll skip that. I want to do a Scala demo though.

For this one, we're going back here, and we're setting the JAVA_TOOL_OPTIONS thing again, but we are increasing our heap size to two gigabytes. The reason for this is that otherwise we have too many GCs. Let's do a benchmark run of default size, and we do two iterations – that should be enough – of a benchmark called factorie. Let me run this over here with C2; then we do the same here, and we wait.
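The setup just described boils down to a bigger heap, GC logging, and two iterations of the factorie benchmark. Assuming the ScalaBench jar is available locally (the jar name and iteration flag are assumptions), the invocations might look like:

```shell
# Two iterations of factorie with a 2 GB heap and GC logging, first with C2
java -Xmx2g -Xlog:gc -jar scalabench.jar factorie -n 2

# Same run with the Graal JIT enabled
java -Xmx2g -Xlog:gc \
     -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler \
     -jar scalabench.jar factorie -n 2
```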

ScalaBench – this is the benchmark suite I've been using all the time. There's a page with all the benchmarks it has, and there's this one called factorie; that's the one we are running right now. It says it's "a toolkit for deployable probabilistic modeling, to extract topics using Latent Dirichlet Allocation." I have no fucking clue what that is, but it's a very good use case to show you what I want to show you. I gave this presentation one time, and after my talk a guy came up to me and said, "We use LDA at my company." I said, "What?" This is LDA, the Latent thingy – they are actually using it. I don't know why; it really doesn't matter. I forgot something very important: we have to log GC, otherwise we don't see what the hell is going on.

You can see it's allocating roughly 600 megabytes per GC – it grows a bit in size, 650, 680. We do two iterations because the first one builds up some internal data structures, and the second one is a little closer to the truth. It took 20 seconds to run that benchmark iteration. Let me start this over here while I talk: -Xlog:gc. It took 20 seconds to do this benchmark iteration, and we had – you can see 37 and 64 – not quite 30 GC cycles. Oh, damn it. The heap size is too small.

Let's wait for this guy. As you can see, the first iteration with C2 over here took 22 or 23 seconds and did this many GCs. You can already see the first iteration over here was much quicker and did fewer GCs. Let's wait for the second iteration. This one took 20 seconds to run, and this one took 16, and we did – 32 to 53 – only about 20 GCs. I'm not actually sure what happened there; it usually cuts GCs in half. The reason for this is the way Graal compiles code. It has something called escape analysis. That's an optimization – I'm not going to explain it now – and Graal has a better escape analysis implementation than C2. Escape analysis can get rid of temporary object allocations. This benchmark is written in Scala, and Scala allocates all these temporary objects all the time. Graal can optimize that code very well. That's also why we at Twitter save so much money: pretty much all of our microservices, except a handful, are written in Scala.

We don't do as many GCs; that reduces GC cycles and user CPU time – we reduce CPU utilization with that. That's what I wanted to show you. This benchmark is not really representative – it's the best one that's out there, but I picked it because I want to look cool on stage. I'm not picking one that's 1% better.


Summary

The summary of this talk is the summary of all of my talks, and I know I'm over time, but it's very simple: I want you to try it. This is all you have to do if you have JDK 10 or later – which I know is an issue, because we at Twitter also run on 8. You can download an 8 version from Oracle, which is basically GraalVM. If you download the GraalVM community edition, it's what I showed you today, because the community edition has the Graal version that's on GitHub as open source. If you're on 8, download that one. Please try it. Please let us know how it works. If it works better for you, I'm very happy for you – tweet it out. If it crashes, that would be amazing, because I want you to find a bug. I want you to find the bug, and not us, because if Twitter is down because of a compiler bug, my life is hell. Give it a shot, let us know how it works. If you run something and it's faster, excellent, great. If it's slower, also let us know. As I said earlier, we would like to figure out what it is; maybe we can do something about it.


Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.


Instana Pipeline Feedback for Release Performance

MMS Founder

On August 15th, the application performance management (APM) service Instana launched Pipeline Feedback for release performance tracking and analysis. Pipeline Feedback provides automatic tracking of application releases, feedback on release performance, and integration with Jenkins.

Instana products are designed to monitor cloud, container, and microservice applications. Its APM service features automatic discovery of services and traces every request through the application in order to detect and diagnose complex application performance issues. Pipeline Feedback is being released as part of Instana's APM offering and was designed to give engineers greater visibility into software releases and quicker feedback on release quality. Instana correlates all application logging and metrics data with release timing information in a centralized service dashboard.

As part of APM, Pipeline Feedback automatically deploys monitoring to each part of the application and traces all application requests. The tool detects changes in real-time and updates its metrics and reports to reflect any performance impact. Pipeline Feedback introduces the concept of Release Markers to represent a single release. While releases reflect any change to application code or infrastructure, a Release Marker represents when changes are packaged and made generally available. Release Markers are indicated by users via the APM API and can be customized to reflect every code change or a group of changes as a release.

The timing of releases is correlated with events and issues to detect potential problems with a release. With the Pipeline Feedback Dashboard, users can review the health of a release and access release notifications.

Pipeline Feedback Dashboard from the Instana Blog

Another feature introduced with Pipeline Feedback is the Release Health Indicator, which monitors the potential impact a release has had on overall application health. This metric is visible in the Instana Incidents Dashboard. Instana Incidents detect when edge services and critical infrastructure have become unhealthy, based on when a Key Performance Indicator (KPI) has been breached. Instana automatically monitors service KPIs for load, latency, and errors.

Pipeline Feedback provides an integration for Jenkins, an open-source automation server commonly used for CI/CD pipelines and release management. The Pipeline Feedback integration is a Jenkins plugin that sends release data to the Instana backend. Data can also be sent via an API call. The integration enables Jenkins releases to be annotated on Instana APM graphs, so users can plot releases alongside performance data.
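The API call alternative to the Jenkins plugin might look like the following sketch. The endpoint path, authorization header, and payload fields are assumptions based on Instana's public API documentation, and `INSTANA_UNIT` and `INSTANA_TOKEN` are placeholders for a tenant name and API token; verify against your own tenant before use:

```shell
# Mark a release in Instana at the current time (epoch milliseconds)
curl -X POST "https://$INSTANA_UNIT.instana.io/api/releases" \
  -H "Authorization: apiToken $INSTANA_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"name\": \"my-service v1.2.3\", \"start\": $(date +%s000)}"
```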

Several application monitoring tools provide similar APM and pipeline health visibility features. AppDynamics Continuous Delivery tools automatically track and monitor applications and provide alerts and monitoring for releases and key business transactions. New Relic's APM service offers a deployment dashboard that lists recent deployments and their impact on end users, response times, throughput, and errors.

Instana offers a free trial to learn more about APM and Pipeline Feedback.


Tech Talk with Rowel Atienza: Is Keras the Perfect Deep Learning Library in Python?

MMS Founder

Keras [Chollet, 2015] is a popular deep learning library with over 250,000 developers at the time of writing and over 600 active contributors. The library is dedicated to accelerating the implementation of deep learning models, which makes Keras ideal when we want to be practical and hands-on.

Keras enables us to build and train models efficiently. In the library, layers are connected to one another like pieces of Lego, resulting in a model that is clean and easy to understand. Model training is straightforward, requiring only data, a number of training epochs, and metrics to monitor. The end result is that most deep learning models can be implemented with significantly fewer lines of code.

Advanced Deep Learning with Keras is a comprehensive guide to the advanced deep learning techniques available today, so you can create your own cutting-edge AI. In this book, Professor Atienza strikes a balance between advanced concepts in deep learning and practical implementations with Keras.

Let’s find out why Rowel thinks it’s the perfect deep learning Library in Python:

  1. What is Keras? Why should one prefer Keras over other libraries in Python? Why do you think Keras is the perfect deep learning library?

    Rowel Atienza: Keras provides APIs for rapidly building, training, validating, and deploying deep learning algorithms. It is suitable both for someone who is starting out in the field and for advanced users. Keras is characterized by ease of use, yet is flexible enough to build complex networks, especially with TensorFlow. Its tight integration with TensorFlow makes it a good choice for deep learning projects that could be deployed in production-scale operations.

  2. What are the future trends of Keras, and how are they going to affect deep learning and AI in the long run? Where do you see the future of Keras with the release of TensorFlow 2.0?

    RA: Beginning with TF2.0, Keras has become the primary front-end API of TensorFlow. This means that Keras will be actively developed and used in the immediate future. With the combined user base of Keras and TensorFlow, we can expect that a good number of projects will be developed in Keras.

  3. What are the key challenges in adoption of Keras in Deep Learning? 

    RA: Keras should attract more contributors to its library. It should act fast in adapting to rapid advances in different subfields of deep learning. For example, the best implementations of graph neural networks are written in PyTorch. If PyTorch keeps attracting serious contributors to implement new developments in deep learning, Keras and TensorFlow will lose their competitiveness.

  4. What are your views on the statement “Keras is an essential tool in the toolbox of a data scientist working with neural networks”?

    RA: On Kaggle, a good percentage of solutions are implemented in Keras. Data scientists appreciate a tool that helps them rapidly build, train, and validate neural networks, and Keras fits these requirements. Keras will remain in every data scientist's toolbox.

  5. How do you combine multiple models into a single Keras model?

    RA: Keras is like pieces of Lego. It can be modular at the layer or the model level. As long as the input and output tensors of a layer or model fit, they can be easily combined.

  6. Which approach would you prefer, the design and implementation of a machine learning algorithm versus modifying an already-prepared one?

    RA: I always prefer to build machine learning algorithms from what I think is suitable for the problem. Understanding the problem and finding the right solution is more important than just plugging a state-of-the-art solution into the problem without thinking about how it works. The key is understanding the problem and the possible solution, regardless of whether it is built from the ground up or by modifying an existing one.

About the Book

Advanced Deep Learning with Keras is a comprehensive guide to the advanced deep learning techniques available today, so you can create your own cutting-edge AI. Using Keras as an open-source deep learning library, you’ll find hands-on projects throughout that show you how to create more effective AI with the latest techniques.
