October 2023 - Mobile Monitoring Solutions

Uncategorized

Presentation: Backends in Dart

MMS • Chris Swan

Article originally posted on InfoQ. Visit InfoQ

Transcript

Swan: I’m Chris Swan. Welcome to my talk on Backends in Dart. This was the final picture taken by the vehicle before it crashed into the asteroid. I wish this was a talk about the Double Asteroid Redirection Test (DART). That would be super cool. It’d be even extra cool if that project had happened to use the Dart programming language. Dart is a client-optimized language for fast apps on any platform. At least that’s what it says at dart.dev. Today is going to be a story about the three bears. We start off with the big bear, and he’s wearing the t-shirt for JIT virtual machine. The big bear has got a lot of capabilities, can do a lot of stuff, but tends to use a bit more resource as a result of that. Then we have the little bear. The little bear is ahead-of-time compilation. Little bear is small and nimble and light, but misses some of the capabilities that we saw with big bear. Then, of course, lastly, we have the medium-sized bear. Medium-sized bear is representing jit-snapshot. This is something that the Dart compiler can produce, which should give us those best-of-both-worlds qualities: quick startup time, but without some of the overheads of running a full virtual machine. The question is, does that bear actually give us the perfect recipe for running our Dart applications?

I’m going to talk about what I do. This is the job description I wrote for myself when I joined Atsign, about a year and a half ago. Part one, improve the quality of documentation, samples and examples, so that atSigns are the easiest way for developers to build apps that have privacy preserving identity. Part two is to build continuous delivery pipelines as a first step towards progressive delivery and chaos engineering, so that we can confidently deploy high quality, well-tested software. Then part three is to lay the foundations for an engineering organization that’s optimized for high productivity by minimizing cognitive load on the teams and the people working in those teams.

Outline

I’m going to start out talking about why Dart, both from an industry perspective and from a company perspective. Then take a look at some of the language features, so that if you’re not familiar with Dart, you’ve at least got an overview of what it’s generally capable of. I’m going to spend a bunch of time looking at what’s involved in just-in-time runtime versus the ahead-of-time compilation, and the various tradeoffs between those. Zoom in to some of the considerations that are put in place running Dart in containers. Take a look at the profiling and performance management capabilities that are offered by the Dart virtual machine. Then try and answer that question about the middle bear. Is the middle way possible with the jit-snapshot?

Why Dart (Industry Big Picture)?

Why Dart, and look at the industry big picture. This is a snapshot from RedMonk’s top 20 programming languages. They take a bunch of data from GitHub and Stack Overflow in order to figure out which are the languages that people are using and talking about and struggling with. From that, the top 20 is assembled, and there’s a bit of discussion in each successive report every 6 months or so, about why particular languages are moving up and down in this environment. Dart, when I started using it, was just outside of the top 20 and seemed to be rising fast. Now it’s positioned equal 19th along with Rust. That’s largely come about due to the adoption of Flutter. Flutter is a cross-platform frontend framework that’s come out of Google. Dart is the underlying language that’s used for writing Flutter apps. It’s become an immensely popular way of building applications that can be built once and run anywhere. That’s been pulling Dart along with it. Dart’s also interesting for the things you can do across the full stack.

If I look at why we adopted Dart for Atsign, then I commonly say that we adopted Dart for many of the same reasons that RabbitMQ chose Erlang for their implementation of the AMQP protocol. Atsign is a personal datastore. What we’re doing is providing a mechanism for end-to-end encryption. As usual, we talk about Alice being able to have a secured conversation with Bob. In this illustration, we’re focusing on internet of things. Alice is a thing, wanting to send data from the thing to Bob using an application to get a view onto that data. We use personal datastores as a way of mediating the data flow and the network connectivity, and the sharing of keys. Alice and Bob each have their private keys, but they have their public keys in their personal datastores. The personal datastore acts as an Oracle for those. Then the network connectivity is through the personal datastores. Alice only needs to talk to their personal datastore, Bob only needs to talk to their personal datastore. The personal datastores can take care of talking to each other, in order to complete the end-to-end message path, which is encrypted all the way.

When we create a Dart ahead-of-time binary, that makes for a very small container that we can use to implement these personal datastores. Looking at the sizing here, on ARMv7, that container is as small as a 5-meg image. It’s only a little bit larger, going up to about 6 megs for an AMD64 image. That is the full implementation of the personal datastore, including everything it needs for end-to-end encryption, and the JSON key-value store, which is used to share data between Alice and Bob, or whatever else is trying to share data in an end-to-end encrypted manner. This is the infrastructure that we use. This sizing of infrastructure will support about 6500 atSigns. We’re using Docker Swarm to avoid the overhead of Kubernetes, for pod management there, and 3 manager nodes along with 18 worker nodes will comprise a cluster. One of those clusters will be able to support those thousands of atSigns. As we build out more atSigns for our customers, then we build out more clusters like this, in what I refer to as a coal truck model. As we’ve filled up one coal truck, we just pull the next one into the hopper and start filling that up with atSigns.

Language Features

Having a look at the language features then. Dart is a C-like language, and so Hello World in Dart looks pretty much like Hello World in C, or Java, or any of the other C derived languages. It’s got an Async/Await feature that’s very similar to JavaScript. In this example code, we’re making use of Await in order to eventually get back the createOrderMessage. In the example, there’s just a fixed delay being put in there in order to show that things are taking a little bit of time. In reality, that would normally be network delays, processing delays, database query delays, and whatever else. Its concurrency model is based upon the notion of isolates. Where an isolate is an independent worker that’s similar to threads but not sharing memory, so communicating only through messages. Since March last year, Dart has been implementing sound null safety. What this means is, by default, the types are not nullable. If I say that something is an integer, then it must actually be an integer and it can’t be null. However, if I want to make sure that I can still have a null case for an integer like type, then I can declare it with the question mark at the end. Then I’ll have a type which can be either null or an integer. Dart 3 is going to be coming out next year. A big part of the shift from the Dart 2 versions at the moment, to Dart 3 is going to be greater enforcement of sound null safety across everything that Dart’s doing.
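The slides with the example code are not part of this transcript, so here is a minimal sketch of the three features just described: async/await, sound null safety, and isolates. Only createOrderMessage is named in the talk; fetchUserOrder, lengthOrZero, sumWorker, and sumInIsolate are hypothetical names chosen for illustration, and the 200 ms delay is an assumed stand-in for real latency.

```dart
import 'dart:async';
import 'dart:isolate';

// Hypothetical slow lookup: the fixed delay stands in for network or
// database latency.
Future<String> fetchUserOrder() =>
    Future.delayed(const Duration(milliseconds: 200), () => 'Large Latte');

// await suspends this function until the Future completes.
Future<String> createOrderMessage() async {
  final order = await fetchUserOrder();
  return 'Your order is: $order';
}

// Sound null safety: String? may be null, but the returned int never is.
int lengthOrZero(String? text) => text?.length ?? 0;

// Entry point for a worker isolate. It shares no memory with the caller,
// so the result travels back as a message on the SendPort.
void sumWorker(SendPort replyTo) {
  var total = 0;
  for (var i = 1; i <= 100; i++) {
    total += i;
  }
  replyTo.send(total);
}

// Spawns the worker and waits for its single reply message.
Future<int> sumInIsolate() async {
  final inbox = ReceivePort();
  await Isolate.spawn(sumWorker, inbox.sendPort);
  return await inbox.first as int;
}

Future<void> main() async {
  print(await createOrderMessage()); // Your order is: Large Latte
  print(lengthOrZero(null)); // 0
  print(await sumInIsolate()); // 5050
}
```

Isolate.spawn with a ReceivePort is used here because it is available across Dart 2.x releases; other isolate helpers may be more convenient on newer SDKs.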

Licensing for Dart itself is using the BSD-3 license, which is one of the OSI approved licenses. Generally speaking, the packages that go along with Dart tend to be licensed with BSD-3 as well, just about everything that we open sourced from atSign is using BSD-3. As we look at our dependencies elsewhere, they’re typically BSD-3 as well. The package manager for Dart and Flutter is called pub.dev. Going into that package manager, you can see a huge variety of packages supporting many typical things that you would want to do with code. As package managers go, I found it to be one of the best that I’ve worked with. It’s very recently got support in GitHub, so that Dependabot will identify updates to packages and automatically raise pull requests for those.

JIT vs. AOT

Moving on to just-in-time versus ahead-of-time. I’m going to illustrate this using this showplatform mini application, which is really just a slightly more useful Hello World. It’s importing two things from the dart:io library, the Platform class and stdout. That then lets me have the ability to print out the platform version. This gives you a similar output to invoking Dart with --version, which will give you the platform version that it’s been built on and for. At the time that I was putting these slides together, the latest release of Dart was 2.18.2, at least the latest stable release. You can see here, I was running that on a Linux ARM64 virtual machine. With just-in-time, I can use dart run and just run it in the virtual machine. Here I’m preceding it with time so I can see how long these things take. If I just dart run showplatform.dart, I get the same output as before. Time took a little over 6.5 seconds. That 6.5 seconds is the time it’s taking to spin up the virtual machine, for the virtual machine to bring in the code, turn it into bytecode, run the bytecode, produce its output and shut itself down. That’s quite a big cold start time. What I can do instead with Dart is use it to compile an ahead-of-time binary. This will result in the output of a native binary that I can run on that platform. Then I can time running the native binary and see how long that takes. We can see here that the native binary is taking a fraction under three-tenths of a second. It’s really quick in comparison to that 6.5 seconds of spinning up the VM. For things like Cloud Functions, AOT binaries are a good choice, because it’s getting past that cold start time.
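The slide showing the mini application is not reproduced in the transcript. Based on the description above, it would look roughly like this sketch; the helper name platformVersion is an assumption added to keep the logic testable.

```dart
// showplatform.dart: print the version string of the Dart platform in use.
import 'dart:io' show Platform, stdout;

// Hypothetical helper wrapping the dart:io property described in the talk.
String platformVersion() => Platform.version;

void main() {
  stdout.writeln(platformVersion());
}
```

To repeat the comparison, prefix each invocation with time: run `dart run showplatform.dart` for the just-in-time case, then `dart compile exe showplatform.dart -o showplatform` followed by `./showplatform` for the ahead-of-time case. The output file name is an arbitrary choice, and timings will vary by machine.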

Of course, there’s a tradeoff. In this case, the tradeoff is that compilation is slower than just running the application in the first place. If I time how long it takes me to compile that executable, then I can see that that was taking of the order of 20 seconds to create the executable. Generally speaking, that’s a tradeoff that’s going to be well worth taking. Because that time is going to be taken in a continuous delivery pipeline, where it’s not affecting the user experience in any way. The user experience is going to be the one of getting a quick response from the native binary.

Dart in Containers

Taking a brief look at Dart in containers. Here’s a typical Dockerfile for an ahead-of-time compiled Dart binary. I start off with saying FROM dart AS build. This is going to be a two-stage Dockerfile. In stage one, I’m basically doing all of the Dart stuff and compiling my binary. Then in stage two, I’m creating the much smaller environment that’s going to run the binary. Dart has for a while now been an official Docker image, which is why I can simply say FROM dart. Then, AS build means that later on, I can use that label as something to reference when I’m copying parts out of the build image into the runtime image. I then set up my working directory, I copy my source file, and then dart pub get will ensure that all of the dependencies are put in place. Then as before, dart compile exe is going to create a native binary. The second stage here of the build is beginning with, FROM scratch. That’s an empty container, so absolutely minimal. Dart is a dynamically linked language. I think that’s one of the more fundamental philosophical decisions that has been made about Dart. Hence, there’s a need to copy in some runtime dependencies. Dart dynamically links to libc components, and the runtime directory in the Dart Docker image contains everything that’s needed from a dependency perspective to run those ahead-of-time compiled binaries. With runtime in place, I can then copy the binary itself into the container, and I can then define the ENTRYPOINT into that for the container image. The image size for that dartshowplatform trivial application is pretty small. It comes out at just over 3 megs for ARMv7, and about 4.5 megs for AMD64. As I mentioned earlier on, even for something non-trivial like an atSign, then the images can still be incredibly small. The atSign image for AMD64 which is the architecture we most commonly use to run it is just under 6 megs there.

Profiling and Performance Management

One of the nice things about Dart is it’s got a whole bunch of profiling and performance management capabilities built into the tool chain. Where with other languages, we might need to buy sometimes expensive third-party tools in order to get that profiling and performance management data, that’s all there built in with Dart. I think that’s often very important from a developer accessibility point of view. Because if there’s a hurdle of having to pay for tools, then very often developers will then be deprived of them, or have to do a bunch of extra work to get the tools that they need. This chart explores the range of Dart DevTools that are on offer. You can see here that the range is most complete for Flutter applications running on mobile or desktop environments. Flutter can also cross-compile to JavaScript for the web. In doing so, it’s then divorcing itself from the Dart runtime environment, and so there’s much less available there in the way of profiling. The same goes for other web approaches. Then Dart command line applications, of the type that I’ve been using in the examples and which we might use to implement the APIs and backend services for full stack Dart applications, have everything except for Flutter inspector. Flutter inspector is a Dart DevTool that’s specific to building Flutter applications. Dart command line applications get everything else. That’s quite a good range of tools that are on offer.

Here’s a look at the memory view. Very often, we’re concerned about how memory is being used, and what’s happening in terms of garbage collection cycles, how frequently they’re running and how long they’re taking. As this illustration shows, we can connect to a running Dart application in its virtual machine and get a really comprehensive profile of the memory utilization and garbage collection cycles using the memory view. Similarly, Flame Charts give us a performance understanding of where an application is spending its time. That allows us to focus effort on some of the more timely aspects of the stack, and maybe some of the underlying dependencies and how those are being used. There’s a little bit of a catch here, DevTools needs to connect to a virtual machine. DevTools are for just-in-time compiled applications only. I reached out to one of the Dart engineers to ask if there was any way of tracking down memory leaks for ahead-of-time Dart apps running in containers. His response was, essentially, “Yes, we got that for Flutter.” All of the pieces are there, but right now, that’s not part of the SDK offering. You’d have to compile your own SDK in order to make use of that.
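As a small illustration of code that cooperates with tooling attached to the VM, the sketch below uses the dart:developer library to mark a named span and emit a log event. This is not taken from the talk: the fib workload and the names timedFib and example.profiling are assumptions, and how (or whether) a given DevTools view surfaces these events depends on the SDK version and on running under the VM rather than as an AOT binary.

```dart
import 'dart:developer' as developer;

// Deliberately naive workload so there is something to measure.
int fib(int n) => n < 2 ? n : fib(n - 1) + fib(n - 2);

// Wraps the call in a named timeline span via Timeline.timeSync.
int timedFib(int n) {
  return developer.Timeline.timeSync('fib($n)', () => fib(n));
}

void main() {
  final result = timedFib(25);
  // Structured log event; the name is an arbitrary label for filtering.
  developer.log('fib(25) = $result', name: 'example.profiling');
  print(result); // 75025
}
```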

A Middle Way with JIT-Snapshot?

Is there a middle way then with jit-snapshot? Let’s have a look at what’s involved with that. Just as I can dart compile exe, I can just pass in a different parameter here and say, dart compile jit-snapshot with my application. Then once that compilation process has taken place, I can run the jit-snapshot in the Dart virtual machine. What we see here is, having made a jit-snapshot, the Dart startup time is much quicker than before. Before it was 6.5 seconds, we’re now under 1.8 seconds in order to run that Dart application. We haven’t completely gotten past the cold start problem but we’re a good way there. I’d note that for a non-trivial application, really what’s needed here is some training mode. That would be so that the application goes through the things that you would want to be taking a snapshot of, but then cleanly exits, so that the compiler knows that that’s the time to complete the jit-snapshot, and close that out.
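One way to picture the training mode mentioned above is an application that exercises its hot paths and then exits when asked to. The sketch below is an assumption about how that could be structured, not how Atsign does it: the `--train` flag, handleRequest, and train are all hypothetical names, and how arguments reach the program during the snapshot's training run should be checked against the dart compile documentation for your SDK.

```dart
import 'dart:convert';

// Representative work that we would like the JIT to have compiled before
// the snapshot is taken.
String handleRequest(Map<String, Object?> request) {
  final key = request['key'] as String? ?? 'unknown';
  return jsonEncode({'key': key, 'status': 'ok'});
}

// Hypothetical training routine: exercise the hot path, then return the
// number of requests handled.
int train({int iterations = 1000}) {
  var handled = 0;
  for (var i = 0; i < iterations; i++) {
    handleRequest({'key': 'k$i'});
    handled++;
  }
  return handled;
}

void main(List<String> args) {
  if (args.contains('--train')) {
    train();
    return; // Exit cleanly so the snapshot can be completed.
  }
  print(handleRequest({'key': 'demo'})); // {"key":"demo","status":"ok"}
}
```

A long-running server would otherwise never exit on its own during the training run, which is why the clean return matters.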

There’s a bit more involved in putting a jit-snapshot into a container than there was with an AOT binary. The first stage of the Docker build here goes pretty much as before, except we’re compiling a jit-snapshot instead of an executable. In the second stage, as well as those runtime dependencies, we also then need the Dart virtual machine. Not just the single executable for the Dart virtual machine, but the complete environment of the machine and the libraries and snapshots that come with it. That results in a container image that’s substantially larger than before. With that in hand, what we can then do is define an ENTRYPOINT where we’re passing in the observe parameter, and that allows us to have the Dart DevTools connecting into that container and able to make use of the observability that’s on offer there, rather than having to fly blind as before with the AOT binaries.

In terms of, are we in the Goldilocks Zone here? Let’s compare the container image size. A secondary is the name that we gave to the implementation of an atSign. The previous observable secondary that we used, with a complete Dart SDK inside of it, was weighing in at 1.25 gigs. Now compare that to the uncompressed image for a production AOT binary at only 14.5 megs. It’s a huge difference. My hope had been that the jit-snapshot would give us images that would be of the order of a 10th of the size of the full SDK. They’re not that small. I think I could have maybe squeezed them a little bit more, but they’re coming in at about a third of the size. In the end, that’s not the main consideration for these. Running a large estate of atSigns, we care about resource usage. To get 6500 atSigns on to one of those Swarms I illustrated earlier, that’s primarily dependent upon the memory footprint of the running atSigns. If I look at resource utilization here at runtime, then the AOT binary in a quiescent state is using about 8.5 megs of RAM, whereas the observable secondary using jit-snapshot is using more than 10 times that. The practical consequence of that is my cluster that’s presently able to support 6500 atSigns would only be able to support about 650, if I moved them all over to being observable by making use of the jit-snapshot.
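The capacity estimate can be reproduced with back-of-envelope arithmetic. This sketch assumes memory is the only limiting factor and that the quiescent figure is representative of every atSign; the factor of 10 is the lower bound quoted in the talk, so the real number would be somewhat below 650. The constant and function names are chosen for illustration.

```dart
// Figures quoted in the talk.
const atSignsPerCluster = 6500;
const aotMegsPerAtSign = 8.5;
const jitOverheadFactor = 10; // "more than 10 times", so a lower bound

// Memory the cluster currently devotes to quiescent AOT atSigns, in megs.
double memoryBudgetMegs() => atSignsPerCluster * aotMegsPerAtSign;

// How many atSigns fit in the same budget at a given per-atSign footprint.
int capacityAt(double megsPerAtSign) => memoryBudgetMegs() ~/ megsPerAtSign;

void main() {
  print(memoryBudgetMegs()); // 55250.0
  print(capacityAt(aotMegsPerAtSign * jitOverheadFactor)); // 650
}
```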

In this particular case, the jit-snapshot wasn’t the Goldilocks Zone I was looking for. It’s a vast improvement over the approach that we’ve been taking before, but still not quite there in terms of giving us the ability to make all of the environment observable, which would have been a desirable feature. I think this might be possible if I was using a memory bubbling virtualized environment, because a lot of these containers are going to have essentially the same contents. You need a really smart memory manager in order to be able to determine the duplicate blocks, and work around those in terms of how it’s allocating the memory. This was the picture of the DART vehicle crashing into the asteroid. My exploration of jit-snapshot has crashed into reality, in that the memory overhead is still far more than I would want it to be to make widespread use of it.

Review

It brings us around to a review of what I’ve been going through here. I started out talking about why Dart, and looked at how Dart is becoming much more popular in the industry, especially as people use it to write Flutter applications. I think that pushes towards the use of full stack Dart so that people aren’t having to use multiple languages in order to implement all of their application. The language features of Dart are pretty much those that people would expect from a modern language, especially in terms of the way it deals with asynchronicity and its capabilities for concurrency. I spent some time examining the tradeoff between just-in-time and ahead-of-time. Just-in-time is able to offer really good profiling and performance management tools through Dart DevTools. It’s also able to optimize code better over a long duration, but it comes with that cold start problem that we’re all familiar with from languages like Java. Ahead-of-time is what we have been using in order to have quickly starting small applications that are quick to deploy.

Getting Dart into containers is pretty straightforward, given that it’s now an official image. That official image also contains that minimal runtime environment that we need to put underneath any AOT binaries that we make. A pretty straightforward two-stage build allows us to construct really small containers with those binaries. When we use AOT, we miss out on some of the great profiling and performance management tools that Dart has on offer. This led me to exploring a middle way with jit-snapshot. Jit-snapshot is where we ask the compiler to start an application, to spend some of its time doing JIT optimizations. Then, essentially, take a snapshot of what it’s got so that we can launch back into that point later on, and get past some of the cold start overhead. This did minimize resource utilization somewhat, but not enough to be a direct replacement for the ahead-of-time binaries that we’ve previously been using.

Call to Action: Try Dart

I’d like to finish with a call to action, and suggest, try Dart yourself. There’s a really great set of resources called Codelabs. One of those is an intro to Dart for Java developers. If you’re already knowledgeable about Java, and I expect many people will be, then that’s a great way of essentially cross training. Then the Dart cheat sheet is a really good run-through of all of the things that make Dart different from some of the other languages that you may have come across. When I was going through this myself a few months back, it was a really good illustration of how code can be powerful but concise and easy to understand, rather than just screen after screen of boilerplate. Try those out at dart.dev/codelabs.

Questions and Answers

Montgomery: Is there anything that you’d like to update us on, like any changes to Dart, or to things that you’ve noticed since you put the material together originally?

Swan: As I was watching that playback through, I noticed I was using Dart 2.18.2 at the time. Nothing too major has happened since. Dart 2.18.5 came out, but that’s just been a set of patches. I might have mentioned RISC-V support coming in Dart. That had been appearing in the beta channel on the Dart SDK download page. In fact, one of my colleagues has been doing a whole bunch of work with RISC-V single-board computers based on that. I went to do some stuff myself. I was actually having a view on trying to get the RISC-V Docker support knocked into shape. It turns out that it shouldn’t have been in the beta channel, it’s still in dev. There’s still not a test infrastructure regularly running around that. It’s going to be a little longer for RISC-V support than maybe people were thinking. Then, also, as I was embarking on that adventure, things were a little bit dicey with upstream dependencies as well. The Dart Docker image is based on Debian. Debian hasn’t yet actually released their RISC-V full support. You can’t get RISC-V images from Debian itself, and there’s no Docker RISC-V images yet. Ubuntu are a little ahead of them there. I can’t see folk changing track at the moment from Debian to Ubuntu in order to support that.

The big news coming is Dart 3. That’s going to be more null safety. Null safety was released last March with the 2.12 version. That ended up being much more of a breaking change than I think was anticipated. Although there was a mechanism there that you could continue compiling things without sound null safety, and you could continue casting variables without null safety, I think as people were upgrading their Pub packages and stuff, it was a much more disruptive change. Maybe that should have been Dart 3, and then we’d be talking about Dart 4, because what’s coming next is essentially enforced null safety. We can think of Dart 2.12 to now, and that will take us through 2.19, and maybe 2.20, as being easing into null safety. Then with Dart 3, it’s null safety all the way. Yes, that should result in faster code, it should result in safer code. Obviously, it’s something of an upheaval to get there. This transitionary period should have helped. It also means that they’re acknowledging there’s a whole bunch of stuff that won’t carry on working with that, and we’ll be leaving behind the world of unsafe nulls.

Montgomery: It will also probably shake out a lot of, not only usage, but I do know that a little bit of when you’re having an environment and you’re changing, and you’re enforcing that, you’re also adopting best practices to support that. That will be very interesting to see how that shakes out. It’s just like heavy use of Optional within lots of languages, has always introduced a fun ride for the users and maintainers. I know C# went through a very interesting time. Java with Optional is going through, has in various places been very interesting to adopt. I anticipate with Dart, it will come out the other side and be very interesting.

For which type of application do you recommend using Dart?

Swan: The primary use case that people have for Dart at the moment is building Flutter applications. Flutter being, as I said, the cross-platform development framework that’s come out of Google. Especially mobile apps that have iOS and Android from one code base, Flutter is just great for that. Dart’s the natural language to do it. I think the place where full stack Dart then becomes most attractive is you’ve written your frontend in Flutter, you’ve had to learn Dart to do that, why then use another language for the backend? For a team that’s then using Dart as a matter of habit for their frontend, it really makes sense to be able to do that on the backend. There’s a bunch of accommodations that have been put in place. There’s a functions framework that can be used in order to build functions that are going to run in Cloud Run. That’s offering a bit of help. I think beyond that, we’ve found Dart to be a great language for implementing a protocol. There’s aspects of Dart that I feel are quite Erlang-ish. Erlang being the language out of Ericsson, to run on routers, and again, very much a language for implementing protocols. If we look at how Erlang became popularized over the last decade or so, the main thing with that was RabbitMQ. I think Rabbit on its own led to massive proliferation of Erlang VMs, and got a lot of people to be Erlang curious. The same things that made Erlang good for AMQP make Dart good for implementing our protocol. I would extend that to other protocols.

Then, really, we get back to this lightweight AOT thing. I think Dart is probably neck and neck with languages like Go for being able to create these very compact Docker images that have got the functionality inside of them. I was having a conversation with a customer, and the question was asked, how do you secure your Docker images? There’s no OS in there. It’s giving you the opportunity to get rid of a whole bunch of security surface area, by not having to have the operating system in there. Of course, there’s other languages that can get you either a statically compiled binary, or a dynamically compiled binary and a minimum set of libraries that you need to slide in underneath that. Then into the tradeoffs around expressiveness and package manager support and things like that.

We’ve also found it pretty good on IoT. We’ve ourselves for demos and stuff been building IoT applications for devices, so system-on-chip devices, not MCU devices, because it does need Linux underneath it. In that case, we can do interfacing to things like I2C, and then go straight up the stack into our protocol implementation and onto the wire with that. That’s been an easier journey than we might have anticipated. Because things like the I2C driver implementation is just there as a pub.dev package. We didn’t have to go and write that ourselves.

Montgomery: I think one of the things that has stood out to me is some of the things that Dart does do that is quite different for most frontend languages, that actually does lend itself pretty well.

What are the advantages of Dart compared to TypeScript? Considering the fame of TypeScript being one of the go-to frontend languages now that’s used.

Swan: I think there’s frontend and front end. I didn’t talk about Flutter on the web. Maybe this just gives me an opportunity to briefly touch on that. Dart, when it was originally conceived, Google thought that they were going to put a Dart VM into Chrome, and people would run Dart inside the browser. That didn’t end up happening. History went off a different leg of the fork. I think one way we can think of frontend these days is anything where we’re presenting the user interface of an application. The natural language of the web is JavaScript. Then TypeScript has clearly improved upon that by giving us types and type safety in JavaScript. The natural language of devices is Swift or Kotlin. Then we’ve got all sorts of cross-platform frameworks, Flutter being one of them, that allow us to write once and run on those. There’s also Flutter web. What happens with that is Dart actually transpiles into JavaScript. We can take a single code base and not just have Android, iOS, and macOS, Linux, and Windows, but we can also have web applications from that as well. I think the advantages versus TypeScript take you into considerations around a tightly defined set of packages, and some of the idioms around those. Also, it’s not just JavaScript with types bolted on, but a language where types and now null safety have been baked in from the very beginning. It’s leaning much more strongly in that direction. The result is you can still get stuff that runs on the web out of it. It’s not a big thing for my own work, because key management inside of browsers is a nightmare. There’s lots of other applications for it.

Montgomery: Can Dart be used for serverless functions? I don’t know if there’s any AWS support, or any other cloud provider supporting it.

Swan: If we’re talking about Lambda and Google Cloud Functions and Azure Functions, none of those have native support for Dart. Interestingly, even Google doesn’t have native support for Dart in their own functions offering. However, this is where the functions framework comes into play, because the functions framework is all about helping you build functions that you’re going to deploy as containers. Of course, all of those clouds now have mechanisms for you to deploy your function in a container, and route traffic through to that and have the on-demand calling and stuff like that. You’re going to be getting some of the advantage I was talking about there, because, generally speaking, you’d use an AOT compiled binary in that container, which is going to have super swift spin-up. You’re not then worrying about VM startup as you might be with a language like Java, which people also use for serverless.
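To make the container-based approach concrete, here is a minimal sketch of an HTTP service that could be AOT compiled and deployed in a container. It deliberately uses plain dart:io rather than the functions framework mentioned above, to stay self-contained. The greeting handler, the PORT environment variable convention, and the 8080 default are assumptions for illustration.

```dart
import 'dart:io';

// Pure handler logic, kept separate so it can be tested without a socket.
String greeting(String? name) =>
    'Hello, ${name == null || name.isEmpty ? 'World' : name}!';

Future<void> main() async {
  // Container platforms commonly inject the listening port via PORT;
  // 8080 is an assumed fallback.
  final port = int.tryParse(Platform.environment['PORT'] ?? '') ?? 8080;
  final server = await HttpServer.bind(InternetAddress.anyIPv4, port);
  print('Listening on port ${server.port}');

  // Serve each request with a plain-text greeting.
  await for (final request in server) {
    request.response
      ..write(greeting(request.uri.queryParameters['name']))
      ..close();
  }
}
```

Compiled with dart compile exe, a service like this avoids the VM startup cost discussed earlier, which is the property that matters for on-demand invocation.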

In terms of support from AWS, AWS doesn’t have native support for Dart in Lambda. Another part of AWS, Amplify, has now got a really strong Dart and Flutter team. Some of those guys are GDEs, so Google Developer Experts. I keep an eye on what they’re up to, and of course, they were all at re:Invent. Dart and Flutter has been embraced to that extent by AWS. I’ve also seen both AWS and Azure people showing up in contexts where it’s a conversation about what we’re here for, Dart on the backend. Because if people are going to be writing Dart on the backend, they don’t want them to be just defaulting to Google, because it’s a Google language. They want to be saying, our serverless implementations or Kubernetes implementations are going to be just as good at running your Dart code, when it’s packaged up in that manner.

Montgomery: Maybe take a look and see what gets announced out of re:Invent. You never know what may happen.

Swan: Some pre:Invent stuff came out from the AWS Amplify team about new stuff that they’re doing with Dart. They’ve put together a strong team there, they’re really active, and showing up to not just AWS events, but community events. There was a Flutter community event called FlutterVikings, and there was a whole stand of AWS people there, in the mix alongside a lot of the Googlers.


Uncategorized

MongoDB CPO Sahir Azam on data sovereignty, empowering devs – The Stack

MMS • RSS

Posted on mongodb google news. Visit mongodb google news

For a long time the ability to build powerful AI applications was largely confined to a small circle of advanced, well-resourced teams.

Yet unleashing AI’s potential requires democratised access to tools that empower a much wider pool of developers to build generative AI apps – think of examples like improving customer support, where appalling user journeys are rife, or making it easier to tackle technical debt.

This is increasingly a reality thanks to technology from innovative companies such as MongoDB, whose multi-cloud Atlas database service provides a scalable, secure and highly flexible platform to build on – and intelligent, targeted use of AI itself is making that possible.

At the MongoDB.local London event – one of 29 developer user conferences that MongoDB is hosting around the world this year – AI was centre-stage, as MongoDB announced capabilities that will let everyone from small start-ups to large enterprises, build AI into their applications.

That included a powerful new vector search engine it is calling Atlas Vector Search, which lets users search unstructured data and create vector embeddings with machine learning models from providers like OpenAI and Hugging Face, then store them in Atlas for retrieval augmented generation (RAG), semantic search, recommendation engines, and dynamic personalization.

MongoDB CPO Sahir Azam emphasises fundamentals

Yet speaking in a keynote, Chief Product Officer Sahir Azam took his time to get to AI: MongoDB has been focused on fundamentals too.

That includes delivering significant improvements to working with time series data, especially demanding, high-volume datasets of all shapes with MongoDB 7.0; new algorithms to dynamically adjust the maximum number of concurrent storage engine transactions (including both read and write tickets) to optimise database throughput; delivering encryption and security capabilities that reassure CIOs and CISOs at the most demanding levels, and ensuring its managed database can play quite literally anywhere – including at the most edgy of Edge locations.

Unlocking AI capabilities

Yet AI matters too, and MongoDB is innovating hard here and using generative AI itself to further democratise access for developers. A simple example: converting source data into vectors and enabling developers to search easily based on semantics, such as “give me images that look like…”, unlocks vast capabilities without the need to query or transform the data itself, learn a new stack or syntax, or manage a whole new set of infrastructure, reducing friction for developers as they build.
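To make the idea of searching by semantics concrete, here is a minimal Python sketch of nearest-neighbour search over embeddings using cosine similarity. It is not MongoDB's implementation and does not use the Atlas API; the three-dimensional vectors, the document titles, and the function names are all hypothetical, chosen only to show the ranking step.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query_vec, documents, k=2):
    """Return the k documents whose embeddings are most similar to query_vec."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, doc["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy embeddings; real models emit hundreds of dimensions.
DOCS = [
    {"title": "beach sunset", "embedding": [0.9, 0.1, 0.0]},
    {"title": "mountain sunrise", "embedding": [0.7, 0.3, 0.1]},
    {"title": "tax form", "embedding": [0.0, 0.1, 0.9]},
]

if __name__ == "__main__":
    # The query vector stands in for the embedding of "images that look like a sunset".
    for doc in vector_search([1.0, 0.2, 0.0], DOCS):
        print(doc["title"])
```

A production system would replace the linear scan with an approximate nearest-neighbour index, but the ranking principle is the same.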

MongoDB CPO: “An AI application still needs resiliency, scalability…”

“It’s easy to be swept up in the hype of AI but, as a customer recently said to me, an AI application is still an application, which means it still needs resiliency, scalability, performance, flexibility and the underlying operational capability,” Sahir Azam, CPO at MongoDB, told The Stack.

“Already we have seen over 500 companies choose MongoDB as a core database for the foundation of their AI applications,” he added.

“What we’ve done now is extend that capability to support new abilities to store and manage data, such as vectors in our system, which allows you to search for different objects based on their characteristics more seamlessly. We haven’t simply bolted on a separate vector search database like you see in other environments, but rather brought together the flexibility of MongoDB’s document model with the capabilities of vector search in a single engine that people are familiar with. This will really make it so much easier to add AI capabilities to an application.

“For me, this capability is very exciting,” he adds.

“I’ve very rarely seen so much industry traction on a big platform shift like this. Fifteen years ago or so there was the whole cloud and public cloud shift. Now AI is top of mind and the ability for us to build technology that makes it easier for the average developer or team to create these really sophisticated applications is very exciting.”

While trendy AI advancements might dominate the headlines, there were also plenty of other announcements at MongoDB.local London to appeal to the more bread-and-butter needs of the developer community.

Greater speed, for instance, is constantly craved by developers to streamline their tasks, so many cheered the news that MongoDB is now using generative AI technology to enable developers to query data in natural language. When asked questions, simple or complex, in plain English, MongoDB’s GUI (Compass) will automatically generate the corresponding query in the MongoDB Query Language (MQL). By reducing friction in the development process and negating the need to learn MongoDB’s query language, this feature will enable all developers to iterate and build faster.

Also popular was the news that SQL query conversion in Relational Migrator can now convert queries and stored procedures to MongoDB query language at scale, seamlessly shifting resources from query creation to review and implementation. Converting queries and application code is one of the most difficult and time-intensive steps when migrating from a relational database to NoSQL, so this development was warmly received among developers.
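As an illustration of what such a conversion involves, the Python sketch below pairs a simple SQL query with a hand-written MQL-style filter and projection, then evaluates that filter against in-memory documents. This is not the output of Relational Migrator; the `users` data and the `matches` and `find` helpers are hypothetical, and the matcher supports only field equality and the `$gt` operator.

```python
# SQL:  SELECT name FROM users WHERE age > 30 AND city = 'London'
# A hand-written MQL-style equivalent (filter plus projection):
MQL_FILTER = {"age": {"$gt": 30}, "city": "London"}
MQL_PROJECTION = {"name": 1}


def matches(doc, query_filter):
    """Evaluate a tiny subset of MQL: field equality and the $gt operator."""
    for field, condition in query_filter.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op != "$gt":
                    raise ValueError(f"unsupported operator: {op}")
                if value is None or not value > operand:
                    return False
        elif value != condition:
            return False
    return True


def find(collection, query_filter, projection):
    """Return the projected fields of every document that passes the filter."""
    return [
        {field: doc[field] for field in projection if field in doc}
        for doc in collection
        if matches(doc, query_filter)
    ]


# Hypothetical sample data standing in for a migrated "users" table.
USERS = [
    {"name": "Ada", "age": 36, "city": "London"},
    {"name": "Bob", "age": 25, "city": "London"},
    {"name": "Cy", "age": 41, "city": "Leeds"},
]

if __name__ == "__main__":
    print(find(USERS, MQL_FILTER, MQL_PROJECTION))
```

Even in this small case the WHERE clause becomes a nested document rather than a string, which hints at why converting stored procedures by hand is time-intensive.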

“Always a key investment”

“Requirements among our developer community are always increasing. People are building more sophisticated apps that need to be more secure and serve more and more users with faster and faster performance,” said Azam. “That’s why improving the core MongoDB platform, making it fit for the next generation of applications, is always a key investment for us.

“There are new capabilities that are part of the platform, whether it be vector or relevance-based search or time series, but oftentimes it’s hard to unlock these new capabilities because you’re stuck in legacy relational database systems. Our relational migrator helps you modernise your applications off of traditional databases by modernising the data model, migrating the data and, now, automatically converting legacy SQL queries into the MongoDB query language, which removes the barrier to entry and adds to the ease at which organisations can adopt MongoDB.”

Produced in partnership with MongoDB


Uncategorized

AWS Adds New Code Generation Models to Amazon SageMaker JumpStart

MMS • Anthony Alford

Article originally posted on InfoQ. Visit InfoQ

AWS recently announced the availability of two new foundation models in Amazon SageMaker JumpStart: Code Llama and Mistral 7B. These models can be deployed with one click to provide AWS users with private inference endpoints for code generation tasks.

Code Llama is a fine-tuned version of Meta’s Llama 2 foundation model and carries the same license. It is available in three variants: base, Python, and Instruct; and each has three model sizes: 7B, 13B, and 34B parameters; for a total of nine options. Besides code generation, it can also perform code infilling, and the Instruct models can follow natural language instructions in a chat format. Mistral 7B is a seven billion parameter large language model (LLM) that is available under the Apache 2.0 license. There are two variants of Mistral 7B: base and Instruct. In addition to code generation, with performance that “approaches” that of Code Llama 7B, Mistral 7B is also a general purpose text generation model and outperforms the larger Llama 2 13B foundation model on all NLP benchmarks. According to AWS:

Today, we are excited to announce Code Llama foundation models, developed by Meta, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. Code Llama is free for research and commercial use. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML.

The Mistral 7B models support a context length of up to 8k tokens. This long context can be used for “few shot” in-context learning in tasks such as question answering, or for maintaining a chat history. The Instruct variants support a special format for multi-turn prompting:

[INST] {user_prompt_0} [/INST] {assistant_response_0} [INST] {user_prompt_1} [/INST]
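As a rough illustration, the template above can be assembled from a chat history with a few lines of Python. The function name and the `(user_prompt, assistant_response)` pair structure are assumptions made for this sketch, and any special begin/end-of-sequence tokens a particular serving stack may require are deliberately left out.

```python
def build_mistral_prompt(turns):
    """Join chat turns into the [INST] ... [/INST] template.

    turns is a list of (user_prompt, assistant_response) pairs; the final
    response may be None to ask the model for the next reply.
    """
    parts = []
    for user_prompt, assistant_response in turns:
        parts.append(f"[INST] {user_prompt} [/INST]")
        if assistant_response is not None:
            parts.append(assistant_response)
    return " ".join(parts)


if __name__ == "__main__":
    history = [
        ("What is a list comprehension?", "A compact way to build a list."),
        ("Show me one that squares numbers.", None),
    ]
    print(build_mistral_prompt(history))
```

Keeping the history as structured pairs and rendering it only at request time makes it easier to drop the oldest turns when the prompt approaches the context limit.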

The various sizes of Code Llama models support different context lengths: 10k, 32k, 48k respectively; however, the 7B models only support 10k tokens on ml.g5.2xlarge instance types. All models can perform code generation, but only the 7B and 13B models can perform code infilling. This task prompts the model with a code prefix and code suffix, and the model generates code to put between them. There are special input tokens, <PRE>, <SUF>, and <MID>, to mark their locations in the prompt. The model can accept these pieces in one of two different orderings: suffix-prefix-middle (SPM) and prefix-suffix-middle (PSM). Meta’s paper on Code Llama recommends using PSM when “the prefix does not end in whitespace or a token boundary.” The PSM format is:

<PRE> {prefix_code} <SUF>{suffix_code} <MID>
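A minimal Python sketch of building a PSM infilling prompt follows. It assumes the sentinel tokens are written `<PRE>`, `<SUF>`, and `<MID>`, as in Meta's published Code Llama prompt format; the function name and the sample prefix and suffix are hypothetical.

```python
def build_psm_prompt(prefix_code, suffix_code):
    """Arrange prefix and suffix in prefix-suffix-middle (PSM) order.

    The model is expected to generate the missing middle after <MID>.
    """
    return f"<PRE> {prefix_code} <SUF>{suffix_code} <MID>"


if __name__ == "__main__":
    prefix = "def average(values):\n    total = "
    suffix = "\n    return total / len(values)\n"
    print(build_psm_prompt(prefix, suffix))
```

The generated text is then spliced between the original prefix and suffix to produce the completed source file.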
The Instruct version of Code Llama is designed for chat-like interaction, and according to Meta “significantly improves performance” on several NLP benchmarks, at a “moderate cost” to code generation. An example application of this model is to generate and explain code-based solutions to problems posed in natural language; for example, how to use Bash commands for certain tasks. Code Llama Instruct uses a special prompt format similar to that of Mistral 7B, with the option of a “system” prompt:
[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_prompt_0} [/INST] {assistant_response_0} [INST] {user_prompt_1} [/INST]
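The system prompt only affects the first user turn, so a small helper is enough to show how it is wrapped. This sketch assumes the Llama 2 chat delimiters `<<SYS>>` and `<</SYS>>` around the system prompt; the function name and the sample prompts are hypothetical.

```python
def first_turn_with_system(system_prompt, user_prompt):
    """Wrap the system prompt in <<SYS>> markers inside the first [INST] block."""
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt} [/INST]"


if __name__ == "__main__":
    print(
        first_turn_with_system(
            "Answer with Bash commands only.",
            "How do I list files larger than 1 MB?",
        )
    )
```

Later turns omit the system block and alternate `[INST] ... [/INST]` with assistant responses, as in the template above.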
The Code Llama announcement says that the models are available in the US East (N. Virginia), US West (Oregon) and Europe (Ireland) regions. AWS has not announced the regions where Mistral is available.
 

Uncategorized

Premarket Mover: Mongodb Inc (MDB) Up 0.48% – InvestorsObserver

MMS • RSS

Posted on mongodb google news. Visit mongodb google news


Tuesday, October 31, 2023 08:51 AM | InvestorsObserver Analysts


Mongodb Inc (MDB) has gained Tuesday morning, with the stock increasing 0.48% in pre-market trading to $337.90.

MDB’s short-term technical score of 86 indicates that the stock has traded more bullishly over the last month than 86% of stocks on the market. In the Software – Infrastructure industry, which ranks 51 out of 146 industries, Mongodb Inc ranks higher than 79% of stocks.

Mongodb Inc has fallen 2.76% over the past month, closing at $331.61 on October 3. During this period of time, the stock traded as low as $327.33 and as high as $374.67. MDB has an average analyst recommendation of Strong Buy. The company has an average price target of $430.54.

MDB has an Overall Score of 59. Find out what this means to you and get the rest of the rankings on MDB!

Mongodb Inc has a Long-Term Technical rank of 75. This means that trading over the last 200 trading days has placed the company in the upper half of stocks with 25% of the market scoring higher. In the Software – Infrastructure industry which is number 55 by this metric, MDB ranks better than 55% of stocks.

Important Dates for Investors in MDB:

-Mongodb Inc is set to release earnings on 2023-12-05. Over the last 12 months, the company has reported EPS of -$3.45.

-We do not have a set dividend date for Mongodb Inc at this time.


Depth Analysis including key players: Arangodb, Azure Cosmos Db, Couchbase

MMS • RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

Nosql Software

Report Description:

The new report by Global Market Vision, titled ‘Global Nosql Software Market Size, Share, Price, Trends, Report and Forecast 2023-2030’, gives an in-depth analysis of the global Nosql Software market, assessing the market based on its segments like type, application, end-use, and major regions. The report contains 132 pages, including a full TOC, tables, figures, and charts, with in-depth analysis of the pre- and post-COVID-19 market impact and the situation by region.

The Nosql Software Market Research Report is a thorough business study on the current state of the industry that studies innovative company growth methods and analyses essential elements such as top manufacturers, production value, key regions, and growth rate. The Nosql Software market research examines critical market parameters such as historical data, current market trends, environment, technological innovation, forthcoming technologies, and the technical progress in the Nosql Software industry.

(An In-Depth TOC, List of Tables & Figures, Chart), Download Sample Report: https://globalmarketvision.com/sample_request/207065

Major Market Players Profiled in the Report include:

Mongodb, Amazon, Arangodb, Azure Cosmos Db, Couchbase, Marklogic, Rethinkdb, Couchdb, Sql-Rd, Orientdb, Ravendb, Redis, Microsoft

A method has been achieved here with the appropriate tools and procedures, transforming this Nosql Software market research study into a world-class document. This report’s market segmentation can be better understood by breaking down data by manufacturers, region, type, application, market status, market share, growth rate, future trends, market drivers, opportunities, challenges, emerging trends, risks and entry barriers, sales channels, and distributors.

Nosql Software Market by Type:

Cloud Based, Web Based

Nosql Software Market by Application:

E-Commerce, Social Networking, Data Analytics, Data Storage, Others

An examination of the market's downstream and upstream value chains and supply channels is covered. This study examines the most recent market trends, growth potential, geographical analyses, strategic suggestions, and developing segments of the Nosql Software market.

Competition Analysis

The global Nosql Software market is divided on the basis of domains along with its competitors. Drivers and opportunities are elaborated along with its scope that helps to boost the performance of the industries. It throws light on different leading key players to recognize the existing outline of the Nosql Software market. This report examines the ups and downs of the leading key players, which helps to maintain proper balance in the framework. Different global regions, such as Germany, South Africa, Asia Pacific, Japan, and China are analyzed for the study of productivity along with its scope. Moreover, this report marks the factors that are responsible for increasing the number of patrons at the domestic as well as the global level.

The study throws light on the recent trends, technologies, methodologies, and tools, which can boost the performance of companies. For further market investment, it gives in-depth knowledge of different market segments, which helps to tackle the issues in businesses. It includes effective predictions about the growth factors and restraining factors that can help to enlarge the businesses by finding issues and acquiring more outcomes. Leading market players and manufacturers are studied to give a brief idea about the competition. To support well-informed decisions in Nosql Software areas, it gives accurate statistical data.

The major key questions addressed through this innovative research report:

What are the major challenges in front of the global Nosql Software market?

Who are the key vendors of the global Nosql Software market?

What are the leading key industries of the global Nosql Software market?

Which factors are responsible for driving the global Nosql Software market?

What are the key outcomes of SWOT and Porter’s Five Forces analysis?

What are the major key strategies for enhancing global opportunities?

What are the different effective sales patterns?

What will be the global market size in the forecast period?

Table of Content (TOC):

Chapter 1 Introduction and Overview

Chapter 2 Industry Cost Structure and Economic Impact

Chapter 3 Rising Trends and New Technologies with Major key players

Chapter 4 Global Nosql Software Market Analysis, Trends, Growth Factor

Chapter 5 Nosql Software Market Application and Business with Potential Analysis

Chapter 6 Global Nosql Software Market Segment, Type, Application

Chapter 7 Global Nosql Software Market Analysis (by Application, Type, End User)

Chapter 8 Major Key Vendors Analysis of Nosql Software Market

Chapter 9 Development Trend of Analysis

Chapter 10 Conclusion


Uncategorized

NoSQL Database Market Development Status 2029 | ObjectLabs Corporation, Skyll …

MMS • RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

[New York, November 2023] A comprehensive market analysis report on the NoSQL Database Market has been unveiled by [Your Company Name], offering valuable insights and intelligence for both industry veterans and newcomers. This in-depth report not only provides revenue forecasts for the NoSQL Database market and its subsegments but also equips stakeholders with a deep understanding of the competitive landscape. It empowers businesses to craft effective go-to-market strategies and positions them for success in the ever-evolving marketplace.

Get a sample report: https://www.statsndata.org/download-sample.php?id=153713

The report provides an overview covering the market, its definition, applications and developments, and manufacturing technology. This NoSQL Database market research report tracks all the recent developments and innovations in the market. It provides data on the obstacles encountered when starting a business and provides guidance for overcoming future challenges and obstacles.

Some of the major companies influencing this NoSQL Database market include:

• DynamoDB
• ObjectLabs Corporation
• Skyll
• MarkLogic
• InfiniteGraph
• Oracle
• MapR Technologies
• The Apache Software Foundation
• Basho Technologies
• Aerospike

This NoSQL Database research report highlights the major market players that are thriving in the market. It tracks their business strategy, financial status and upcoming products.

First, this NoSQL Database research report provides an overview of the market, covering definitions, applications, product launches, developments, challenges, and geographies. The market is expected to see a solid development thanks to the stimulation of consumption in various markets. An analysis of the current market design and other fundamentals is provided in the NoSQL Database report.

The regional scope of the NoSQL Database market is mostly mentioned in the region-focused report.

• North America
• South America
• Asia Pacific
• Middle East and Africa
• Europe

Market Segmentation Analysis

The NoSQL Database market is segmented on the basis of type, product, end user, etc. Segmentation helps provide an accurate description of the market.

Market Segmentation: By Type

• Column, Document, Key-value, Graph

Market Segmentation: By Application

• E-Commerce, Social Networking, Data Analytics, Data Storage, Others

Customization Requests: https://www.statsndata.org/request-customization.php?id=153713

Purpose of this report:

Qualitative and quantitative trends, dynamics and forecast analysis of the NoSQL Database market from 2023 to 2029.
Use of analytical tools such as SWOT analysis and Porter’s Five Forces analysis to describe the abilities of NoSQL Database buyers and suppliers to make profit-driven decisions and build their business.
An in-depth analysis of market segmentation helps identify existing market opportunities.
After all, this NoSQL Database report helps you save time and money by providing unbiased information in one place.

Segmentation	Specification
Historic Study on NoSQL Database	2019 – 2022
Future Forecast NoSQL Database	2023 – 2029
Company Accounted	• DynamoDB • ObjectLabs Corporation • Skyll • MarkLogic • InfiniteGraph • Oracle • MapR Technologies • The Apache Software Foundation • Basho Technologies • Aerospike
Types	• Column, Document, Key-value, Graph
Application	• E-Commerce, Social Networking, Data Analytics, Data Storage, Others

Conclusion

NoSQL Database Market attractiveness assessments have been published regarding the competitive potential that new entrants and new products might offer to existing entrants. This research report also mentions the innovations, new developments, marketing strategies, branded technologies and products of key players in the global industry. An in-depth analysis of the competitive landscape uses value chain analysis to provide a clear vision of the market. Future opportunities and threats for major NoSQL Database market players are highlighted in the report.

Table Of Content

Chapter 1 NoSQL Database Market Overview

1.1 Product Overview and Scope of NoSQL Database

1.2 NoSQL Database Market Segmentation by Type

1.3 NoSQL Database Market Segmentation by Application

1.4 NoSQL Database Market Segmentation by Regions

1.5 Global Market Size (Value) of NoSQL Database (2018-2029)

Chapter 2 Global Economic Impact on NoSQL Database Industry

2.1 Global Macroeconomic Environment Analysis

2.2 Global Macroeconomic Environment Analysis by Regions

Chapter 3 Global NoSQL Database Market Competition by Manufacturers

3.1 Global NoSQL Database Production and Share by Manufacturers (2019 to 2023)

3.2 Global NoSQL Database Revenue and Share by Manufacturers (2019 to 2023)

3.3 Global NoSQL Database Average Price by Manufacturers (2019 to 2023)

3.4 Manufacturers NoSQL Database Manufacturing Base Distribution, Production Area and Product Type

3.5 NoSQL Database Market Competitive Situation and Trends

Chapter 4 Global NoSQL Database Production, Revenue (Value) by Region (2018-2023)

4.1 Global NoSQL Database Production by Region (2018-2023)

4.2 Global NoSQL Database Production Market Share by Region (2018-2023)

4.3 Global NoSQL Database Revenue (Value) and Market Share by Region (2018-2023)

4.4 Global NoSQL Database Production, Revenue, Price and Gross Margin (2018-2023)

Continue…


Uncategorized

MongoDB: the database provider scaling up to profit – Trustnet

MMS • RSS

Posted on mongodb google news. Visit mongodb google news

Past performance does not predict future returns. You may get back less than you originally invested. Reference to specific securities is not intended as a recommendation to purchase or sell any investment.

Innovation cash kings of the next decade – In this series we look at the standout companies that are ‘pivoting to profit’ and are at an inflection point of converting customer-driven innovation into shareholder value. In the penultimate article of this series, we take a closer look at MongoDB.

This is the eighth article in the series; you can read the other articles here:

Twilio – delivering the message on profitability

Airbnb: booking beds and profits

Shopify – the engine of ecommerce refocusing on its core

Uber’s road to profitability

Atlassian – fostering collaboration and cashflow

Toast – serving a side order of profitability

ServiceNow – Tidying the Mess

You might not be familiar with MongoDB, yet while this American company, which provides a popular NoSQL database solution, might not always be in the limelight, its impact on the tech world is undeniable.

Directly or indirectly, you will have encountered platforms and services powered by MongoDB. If you’ve engaged with Adobe’s digital platforms, MongoDB plays a pivotal role, powering Adobe Experience Manager to provide consistent digital experiences. Fitness enthusiasts tracking their performance on apps like Strava indirectly leverage MongoDB’s efficient data handling capabilities. Meanwhile, shoppers worldwide, browsing through Gap’s vast collection, experience a smooth digital journey, thanks in part to MongoDB’s scalable database solutions.

MongoDB’s inception by co-founders Dwight Merriman and Eliot Horowitz was driven by their experiences at DoubleClick, acquired by Google for $3.1 billion in 2007, where they grappled with the inflexibilities of traditional relational databases. Recognising a growing need for scalability and agility in the digital era, they sought a solution that would enable horizontal scaling – when you increase the capacity of a system by adding additional machines – offer a flexible data model, and boost developer productivity. They also saw the potential in an open source model, believing it would catalyse innovation and broaden adoption.

This ethos led to the creation of MongoDB, a NoSQL database tailored for the modern web’s demands. Today, MongoDB has established itself as the clear next-generation database leader with over $1 billion of annual revenue, while its core Atlas (database-as-a-service) offering has grown 85% year on year (MongoDB 2023 report).

The global database market is estimated by the International Data Corporation to reach $138 billion by 2026. With MongoDB still having market share in low single digits – the market being dominated by known names such as Oracle, IBM and Microsoft – the company arguably has a real growth opportunity ahead. Its core product Atlas differentiated itself in the market due to its developer-centric simplicity, multi-cloud support, and robust security. It is designed to allow developers to focus on building applications, as Atlas handles operational intricacies with features like automatic scaling, integrated backups, and performance optimisation tools. The Atlas product has been downloaded 240 million times since 2009, including 85 million times in the last 12 months (MongoDB 2023 report). MongoDB has a consumption-based model, offering a free tier for beginners and scaling costs as usage intensifies, ensuring customers only pay for what they use, similar to other cloud service providers such as Snowflake.

MongoDB, like many other tech startups, initially pursued aggressive growth, emphasising adding new users rather than making a profit. This strategy has resulted in the company growing by 35,200 customers in total and even adding 2,200 new customers in the last quarter.

While these strategic foundations strengthened the company’s status as the top NoSQL database provider, questions emerged last year about the longer-term profitability of the business. In response, this year MongoDB has shifted its focus towards sustainable growth. It has streamlined operational costs, enhancing efficiency without sacrificing product quality. While maintaining its core product as open source, it emphasised monetising premium services, with MongoDB Atlas being a notable example, targeting businesses seeking managed solutions.

Moreover, rather than just expanding its customer base, MongoDB concentrated on deepening ties with existing clients, prioritising tailored solutions and retention. As a result, the company recorded a profit of 6.1% last quarter versus -1.5% a year ago. We expect this to be the beginning of MongoDB’s path to profitability.

While prioritising profit, the company has not forsaken innovation, continuing substantial R&D investments to stay ahead of competitors. MongoDB’s evolution away from aggressive expansion to sustainable growth is an example of a tech startup’s maturation and the start of its next phase of growth. This could provide an attractive opportunity for long-term investors.


KEY RISKS

Past performance is not a guide to future performance. The value of an investment and the income generated from it can fall as well as rise and is not guaranteed. You may get back less than you originally invested.

The issue of units/shares in Liontrust Funds may be subject to an initial charge, which will have an impact on the realisable value of the investment, particularly in the short term. Investments should always be considered as long term.

The Funds managed by the Global Innovation Team:

• May hold overseas investments that may carry a higher currency risk. They are valued by reference to their local currency which may move up or down when compared to the currency of a Fund.
• May have a concentrated portfolio, i.e. hold a limited number of investments. If one of these investments falls in value this can have a greater impact on a Fund’s value than if it held a larger number of investments.
• May encounter liquidity constraints from time to time. The spread between the price you buy and sell shares will reflect the less liquid nature of the underlying holdings.
• Outside of normal conditions, may hold higher levels of cash which may be deposited with several credit counterparties (e.g. international banks). A credit risk arises should one or more of these counterparties be unable to return the deposited cash.
• May be exposed to Counterparty Risk: any derivative contract, including FX hedging, may be at risk if the counterparty fails.
• Do not guarantee a level of income.

The risks detailed above are reflective of the full range of Funds managed by the Global Innovation Team and not all of the risks listed are applicable to each individual Fund. For the risks associated with an individual Fund, please refer to its Key Investor Information Document (KIID)/PRIIP KID.

DISCLAIMER

This is a marketing communication. Before making an investment, you should read the relevant Prospectus and the Key Investor Information Document (KIID), which provide full product details including investment charges and risks. These documents can be obtained, free of charge, from www.liontrust.co.uk or direct from Liontrust. Always research your own investments. If you are not a professional investor please consult a regulated financial adviser regarding the suitability of such an investment for you and your personal circumstances.

This should not be construed as advice for investment in any product or security mentioned, an offer to buy or sell units/shares of Funds mentioned, or a solicitation to purchase securities in any company or investment product. Examples of stocks are provided for general information only to demonstrate our investment philosophy. The investment being promoted is for units in a fund, not directly in the underlying assets. It contains information and analysis that is believed to be accurate at the time of publication, but is subject to change without notice. Whilst care has been taken in compiling the content of this document, no representation or warranty, express or implied, is made by Liontrust as to its accuracy or completeness, including for external sources (which may have been used) which have not been verified. It should not be copied, forwarded, reproduced, divulged or otherwise distributed in any form whether by way of fax, email, oral or otherwise, in whole or in part without the express and prior written consent of Liontrust.

Article originally posted on mongodb google news. Visit mongodb google news

Uncategorized

Doing DynamoDB Better, More Affordably, All at Once – The New Stack

MMS • RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

2023-10-30 11:00:43

Most decisions to move away from DynamoDB boil down to two critical considerations: cost and cloud vendor lock-in. Here’s a way to overcome them.

Oct 30th, 2023 11:00am by

Tzach Livyatan and Felipe Cardeneti Mendes

Image from Pixel-Shot on Shutterstock.

It’s easy to understand why so many teams have turned to Amazon DynamoDB since its introduction in 2012. It’s simple to get started, especially if your organization is already entrenched in the AWS ecosystem. It’s relatively fast and scalable, especially in comparison to other NoSQL options that offer a low learning curve like MongoDB. And it abstracts away the operational effort and know-how traditionally required to keep the database up and running in a healthy state.

But as time goes on, drawbacks emerge, especially as workloads scale and business requirements evolve. Factors like lack of transparency into what the database is doing under the hood and the 400 kilobytes limit on item size cause frustrations. However, the vast majority of decisions to move away from DynamoDB boil down to two critical considerations: cost and cloud vendor lock-in.

Let’s take a closer look at those two major DynamoDB challenges and at a new approach to overcoming them — a technical shift with a simple migration path.

DynamoDB Challenge 1: Cost

With DynamoDB, what often begins as a seemingly reasonable pricing model can quickly turn into “bill shock,” especially if the application experiences a dramatic surge in traffic.

What influences DynamoDB cost? Data storage, write and read units, deployment region, provisioned throughput, indexes, global tables, backups — to name some of the many factors. Although (uncompressed) data storage and the baseline number of reads and writes are the primary components driving a monthly DynamoDB bill, there are several other aspects helping to skyrocket prices. When following a “pay per operations” service model, your ability to accurately predict the cost of a workload on DynamoDB highly depends on how much the workload in question is subject to variability and growth.

Write-heavy, latency-sensitive workloads are typically the main contributing factors to alarmingly high bills. For example, a single write capacity unit (WCU) is equivalent to a non-transactional write of up to 1 KB per second. If you decide to purchase reserved capacity, then 100 WCUs are charged as 0.0128 USD per hour (or $150/year), given a one-year commitment. If your workload requires as little as 100,000 writes per second (100,000 WCUs, or 1,000 blocks of 100 WCUs), then you would need to make at least a $150,000/year investment to sustain only the baseline writes, without considering other aspects.

That’s the scenario a large media-streaming company faced when reviewing its DynamoDB bills. For a single use case, it had an operations-per-second baseline of half a million writes. Since the use case required multiregional replication, it used Amazon DynamoDB’s global tables feature. Such a high throughput workload, combined with replication and other aspects, meant the team was spending millions per year on just a single use case.

Moreover, teams with latency-sensitive workloads and strict p99 requirements typically rely on DynamoDB Accelerator (DAX) to achieve their SLA targets. Depending on how aggressive the service-level agreements are, caching solutions can easily comprise a considerable amount of a DynamoDB bill. For example, a small three-node DAX cluster running on top of r3.2xlarge instances is priced as high as $2,300 per month, or $27,600 a year.

DynamoDB Challenge 2: Cloud Vendor Lock-In

When your organization is all in on AWS, DynamoDB is usually quite simple to add to the larger AWS commit. But what happens to your database if the organization later makes a high-level decision to adopt a multicloud strategy or even move some applications to on premises?

One of the drawbacks of DynamoDB is that it is proprietary and closed source. DynamoDB’s development, implementation, inner workings and control are confined to the AWS ecosystem, which inflicts a distinct pain. If you decide to switch to a different platform (cloud or on premises), you’ll need to look for alternatives.

Although AWS provides DynamoDB Local to run DynamoDB locally, the solution is primarily designed for development purposes and is not suitable for production use. And if your organization wants to extend beyond the AWS ecosystem, moving to a different database isn’t easy.

Evaluating a different database requires engineering development time and involves a careful analysis of the compatibility across two solutions. Later in the process, migrating the data and users from one solution to another is not always a straightforward process — and based on what we’ve heard, AWS is unlikely to assist.

Depending on how large your DynamoDB deployment has become, your company could be locked into it for a long time, as re-engineering, testing and moving an entire fleet of DynamoDB tables from various use cases will require a lot of planning and, fairly often, a lot of time and effort.

For example, a large AdTech organization decided to switch all of its cloud infrastructure to a different cloud vendor. It needed to migrate from DynamoDB to a database that would be supported in the new cloud environment it had committed to. It was apparent that a database migration could be challenging because 1) the database was supporting a business-critical use case, and 2) the original application developers were no longer part of the company, so moving to a different solution could potentially incur a major application rewrite. To avoid business disruption as well as burdensome application code changes, it sought out DynamoDB-compatible databases that would offer a smooth path forward.

How ScyllaDB Helps Teams Overcome DynamoDB Challenges

These are just two of the reasons why former DynamoDB users are increasingly moving to ScyllaDB, which offers improved performance over DynamoDB with lower costs and without the vendor lock-in.

ScyllaDB allows any application written for DynamoDB to be run, unmodified, against ScyllaDB. It supports the same client SDKs, data modeling and queries as DynamoDB. However, you can deploy ScyllaDB wherever you want — on premises or on any public cloud, AWS included. ScyllaDB provides lower latencies without DynamoDB’s high operational costs. You can deploy it however you want via Docker or Kubernetes or use ScyllaDB Cloud for a fully managed NoSQL Database as a Service solution.

Reducing Costs: iFood

Consider iFood, the largest food delivery company in Latin America, which moved its Connection-Polling service from Postgres to DynamoDB after passing the 10 million orders per month threshold. But the team quickly discovered that DynamoDB’s autoscaling was too slow for the application’s spiky traffic patterns.

iFood’s bursty traffic naturally spikes around lunch and dinner times. Slow autoscaling meant the team could not meet those daily bursts of demand unless it either provisioned a high minimum throughput, which was expensive, or managed scaling itself, which was exactly the work it was trying to avoid by using a fully managed service.

At that point, they transitioned their Connection-Polling service to ScyllaDB Cloud, keeping the same data model they built when migrating from PostgreSQL to DynamoDB. iFood’s ScyllaDB deployment easily met its throughput requirements and enabled the company to reach its midterm goal of scaling to support 500,000 connected merchants with one device each. Moreover, moving to ScyllaDB reduced the database cost of the Connection-Polling service alone from $54,000 to $6,000.

Freedom from Cloud Vendor Lock-In: GE Healthcare

GE Healthcare’s Edison AI Workbench was originally deployed on AWS cloud. But when the company took it to its research customers, the customers said, “This is great. We really like the features and we want these tools, but can we have this Workbench on premises?”

Since DynamoDB was a core component of the solution, the company had two choices: rewrite the Edison Workbench to run against a different data store or find a DynamoDB-compatible database that could be deployed on premises.

The team recognized the challenges involved with the former option. First, porting a cloud asset to run on premises is a nontrivial activity, involving specific skill sets and time-to-market considerations.

Additionally, the team would no longer be able to perform the continuous delivery practices associated with cloud applications. Instead, they would need to plan for periodic releases as ISO disk images, while keeping codebases synchronized between cloud and on-premises versions. Thus, maintaining a consistent database layer between cloud and on-premises releases was vital to the team’s long-term success.

So, they opted for the latter option and moved to ScyllaDB. “Without changing much, and while keeping the interfaces the same, we migrated the Workbench from AWS cloud to an on-premises solution,” explained Sandeep Lakshmipathy, director of engineering at GE Healthcare. This newfound flexibility enabled them to rapidly address the requested use case: having the Edison Workbench run in hospitals’ networks.

How ScyllaDB and DynamoDB Compare on Price and Performance

To help teams better assess whether a move makes sense, ScyllaDB recently completed a detailed price-performance benchmark analyzing:

How cost compares across both DynamoDB pricing models under various workload conditions, distributions and read:write ratios.
How latency compares across a variety of workload conditions.

You can read the detailed findings in this comparison report, but here’s the bottom line: ScyllaDB costs are significantly lower in all but one scenario. In realistic workloads, costs would be five to 40 times lower with up to four times better P99 latency.

Here is a consolidated look at how DynamoDB and ScyllaDB compare on cost and performance for just one of the many workloads we tested (based on prices published in Q1 of 2023). DynamoDB shines with a Uniform distribution and struggles with the others. We chose to highlight a case where it shines.

Additionally, here are some results for the more realistic hotspot distribution:

Again, we encourage you to read the complete benchmark report for details on what the tests involved and the results across a variety of workload configurations.

Is ScyllaDB Right for Your Team’s Use Case?

Curious if ScyllaDB is right for your use case? Sign up for a free technical consultation with one of our architects to talk more about your use case, SLAs, technical requirements and what you’re hoping to optimize. We’ll let you know if ScyllaDB is a good fit and, if so, what a migration might involve in terms of application changes, data modeling, infrastructure and so on.

How much could you save by replacing DynamoDB with an API-compatible alternative that offers better performance at significantly lower costs and allows you to run on any cloud or on premises? For a quick cost comparison, look at our pricing page and cost calculator. Describe your workload, and we’ll show you estimates for ScyllaDB, as well as other NoSQL Database as a Service (DBaaS) options.

Tzach Livyatan is vice president of product at ScyllaDB. He has bachelor’s and master’s degrees in computer science and has had a 15-year career in development, system engineering and product management. In the past he worked with telecoms, focusing on…

Uncategorized

MongoDB Insiders Sold US$24m Of Shares Suggesting Hesitancy – Simply Wall St News

MMS • RSS

Posted on mongodb google news. Visit mongodb google news

The fact that multiple MongoDB, Inc. (NASDAQ:MDB) insiders offloaded a considerable amount of shares over the past year could have raised some eyebrows amongst investors. When evaluating insider transactions, knowing whether insiders are buying is usually more beneficial than knowing whether they are selling, as the latter can be open to many interpretations. However, shareholders should take a deeper look if several insiders are selling stock over a specific time period.

While we would never suggest that investors should base their decisions solely on what the directors of a company have been doing, we do think it is perfectly logical to keep tabs on what insiders are doing.

See our latest analysis for MongoDB

The Last 12 Months Of Insider Transactions At MongoDB

The Chief Revenue Officer, Cedric Pech, made the biggest insider sale in the last 12 months. That single transaction was for US$6.1m worth of shares at a price of US$409 each. We generally don’t like to see insider selling, but the lower the sale price, the more it concerns us. The silver lining is that this sell-down took place above the latest price (US$335). So it is hard to draw any strong conclusion from it.

In the last year MongoDB insiders didn’t buy any company stock. You can see a visual depiction of insider transactions (by companies and individuals) over the last 12 months, below. If you click on the chart, you can see all the individual transactions, including the share price, individual, and the date!

Figure: NasdaqGM:MDB insider trading volume, October 30th 2023

For those who like to find winning investments, this free list of growing companies with recent insider purchasing could be just the ticket.

MongoDB Insiders Are Selling The Stock

Over the last three months, we’ve seen significant insider selling at MongoDB. In total, insiders sold US$8.8m worth of shares in that time, and we didn’t record any purchases whatsoever. This may suggest that some insiders think that the shares are not cheap.

Insider Ownership Of MongoDB

Another way to test the alignment between the leaders of a company and other shareholders is to look at how many shares they own. I reckon it’s a good sign if insiders own a significant number of shares in the company. MongoDB insiders own about US$860m worth of shares (which is 3.6% of the company). Most shareholders would be happy to see this sort of insider ownership, since it suggests that management incentives are well aligned with other shareholders.

What Might The Insider Transactions At MongoDB Tell Us?

Insiders sold stock recently, but they haven’t been buying. And even if we look at the last year, we didn’t see any purchases. The company boasts high insider ownership, but we’re a little hesitant, given the history of share sales. In addition to knowing about insider transactions going on, it’s beneficial to identify the risks facing MongoDB. While conducting our analysis, we found that MongoDB has 3 warning signs and it would be unwise to ignore these.

If you would prefer to check out another company — one with potentially superior financials — then do not miss this free list of interesting companies, that have HIGH return on equity and low debt.

For the purposes of this article, insiders are those individuals who report their transactions to the relevant regulatory body. We currently account for open market transactions and private dispositions of direct interests only, but not derivative transactions or indirect interests.

What are the risks and opportunities for MongoDB?

MongoDB, Inc. provides a general-purpose database platform worldwide.

Rewards

Revenue is forecast to grow 18.61% per year

Risks

Shareholders have been diluted in the past year
Significant insider selling over the past 3 months
Currently unprofitable and not forecast to become profitable over the next 3 years

Have feedback on this article? Concerned about the content? Get in touch with us directly. Alternatively, email editorial-team (at) simplywallst.com.

This article by Simply Wall St is general in nature. We provide commentary based on historical data and analyst forecasts only using an unbiased methodology and our articles are not intended to be financial advice. It does not constitute a recommendation to buy or sell any stock, and does not take account of your objectives, or your financial situation. We aim to bring you long-term focused analysis driven by fundamental data. Note that our analysis may not factor in the latest price-sensitive company announcements or qualitative material. Simply Wall St has no position in any stocks mentioned.

Article originally posted on mongodb google news. Visit mongodb google news

Uncategorized

Presentation: Performance and Scale – Domain-Oriented Objects vs Tabular Data Structures

MMS • Donald Raab Rustam Mehmandarov

Article originally posted on InfoQ. Visit InfoQ

Transcript

Mehmandarov: Have you ever created data structures, and put your data into those and actually thought about how much memory do those take inside your application? Have you thought what will happen if you multiply that tiny little set, or a list, or whatever you created by hundreds, thousands, or millions, and how much memory that would take and how much memory savings you can have? This talk is about all that. This talk will be talking about performance and scale in Java and how you can handle both domain-oriented objects and tabular data structures in JVM, in Java in general.

Raab: My name is Donald Raab. I’m a Managing Director and Distinguished Engineer at Bank of New York Mellon. I am the creator, project lead, and committer for the Eclipse Collections project, which is managed at the Eclipse Foundation. I’m a Java Champion. I was a member of the JSR 335 Expert Group that got lambdas and streams into Java 8. I am a contributing author to the, “97 Things Every Java Programmer Should Know.”

Mehmandarov: My name is Rustam. I am a Chief Engineer at a company in Norway, Oslo called Computas. I’m a Java Champion as well, and also a Google Developer Expert for Cloud. I am a JavaOne Rockstar. I also have been involved throughout times in different communities, and I’m leading one of those at the moment as well.

The Problem with In-Memory Java Architectures (Circa & After 2004)

Let’s talk about a little bit of history. Rewind a tiny little bit back and talk about what was happening in 2004 and why it’s important.

Raab: Back in 2004, to give some context, I was working on a financial services Java application, and I had a problem: it didn’t fit into a 32-bit memory space. My mission was to try and fit 6 gigabytes of stuff into 4 gigabytes of space. Once I dug into the problem, it turned out there were millions of small list and map instances roaming around that had been created with default-sized arrays. At the time, the solution we came up with was to roll our own small-size Java collections. For those that know the movie, The Martian, I felt a bit like Mark Watney, where he said, “In the face of overwhelming odds, I’m left with only one option, I’m going to have to science some stuff out of this.”

You fast forward to after 2004, when we got 64-bit JVMs with the JDK 1.4 release, and we could now access more memory on our hardware, so the software problem of accessible memory became a hardware problem. Over a period of time it becomes a question of how much memory you actually have to access. We wound up pushing hardware limits in some cases with some of our heap sizes. Compressed OOPs gave us a great benefit when that became available, JDK 1.6 in 2006. We then got the ability to have 32-bit references instead of 64-bit references in our heap. That saved a significant amount, but then we got back to a software problem again. Now we have the sweet spot for the 32-bit references: we’ve got to keep the heap below 32 Gigs. Or there’s this ability to play around with different shifting options where you could maybe get 32-bit references up to 64 Gigs, but you’ll get a different alignment, so it might cost you 16-byte alignment versus 8-byte. You got to be careful with that. 32 Gig winds up being the sweet spot. Because of this, we wound up then rolling our own solutions for our own mutable sets, maps, and lists, and also built eventually our own primitive collections. The whole goal being that we wanted to reduce the total memory footprint of our data structures and make sure our data was taking the most space, not the data structures.
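
The 32 Gig and 64 Gig figures follow from simple arithmetic: a 32-bit compressed reference counts alignment units rather than bytes, so its reach is 2^32 times the object alignment. A minimal sketch of that calculation (the class and method names are made up for illustration):

```java
// Sketch: how far a 32-bit compressed reference can reach for a given object alignment.
final class CompressedRefRange {
    /** Bytes addressable by a 32-bit reference that is scaled by the object alignment. */
    static long addressableBytes(int alignmentBytes) {
        return (1L << 32) * alignmentBytes;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        System.out.println("8-byte alignment:  " + addressableBytes(8) / gib + " GiB");
        System.out.println("16-byte alignment: " + addressableBytes(16) / gib + " GiB");
    }
}
```

With 8-byte alignment this gives 32 GiB, and doubling the alignment to 16 bytes doubles the reach to 64 GiB, at the price of more padding per object.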

The (Simulated) Problem Today

Mehmandarov: The problem today, I call it a simulated problem, because we have actually created a little bit of that problem ourselves. Now you have 64-bit systems and a bunch of memory. Everything is nice and shiny. The problem is still there, because you still need to process more data and you would like to do it in the most efficient way. What we’ve done to recreate this problem is to put data into a CSV file that we would like to read in. Since we do quite a bit of conferences, both of us, we decided to go with conference data. I’ll show you the data and what it consists of on the next slide. For now, we just want to say that we need to read in data. We need to process it. We need to do something with that data, and we want to do it in the most efficient way. How can we do that in Java? What decisions can we make to make it more efficient? How should we think about our data? Should we think of it as rows of data, or columns of data? Which one is better? Also, what libraries will we be looking at? We’ll be looking at three different ways of doing stuff. We’ll be looking at the batteries-included version of Java: the Java Collections that Java comes with. We’ll also be looking into Eclipse Collections, a library that Donald started, and that a bunch of people have been contributing to and evolving. Also, we’ll be talking quite a bit about a library built on top of Eclipse Collections, created by another developer, Vlad, called DataFrame-EC. EC stands for Eclipse Collections. This talk will be focusing on memory and memory efficiency, and the strategies and techniques connected to that. We’ll not be talking about other kinds of performance tuning and all those things.

Sample CSV Data to Load

Let’s talk about data. The data, the way it looks, it’s a bunch of columns, like typically what you would expect from your data. It has an event name. It has a country where that event exists or will take place. It has a city where it will take place. It has two sets of date objects or dates, where we have a start date and end date. It also has some indication of what session types does it have, so a list of elements, so it might be lightning talks, regular talks, or workshops. We have some integer values, or typically values that would be represented by integers: number of tracks, or number of sessions, or maybe even number of speakers, or cost, so how much that would cost. To create all that, we have a script or a Java class that randomly generates data. Then we can create and then generate that in a deterministic way, where we can just randomly generate a number of random strings going as names between a certain length of a string. It can be a little bit smaller or a little bit bigger, but roughly the same size and randomly generated. Same goes with the countries. We have a set of countries that we pull from, and we can use those countries. Same goes for the cities. We also have dates. Dates are, in our dataset, just for our testing purposes. We limited that to be all possible dates within 2023. Also, session types. You can choose between those three that you can see there. It can be different variations of those and different number of those. Also, the other numbers are going to be random within a certain range. We’ve generated it in different shapes and sizes. We generated it for 1 million rows. That takes roughly 90 Megs on disk. We have a 10 million one which is 10 times bigger, so roughly 900 Megs. We also have 25 million, which takes roughly 2.2 Gigs on disk.
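
The generator itself is not shown in the transcript. As a rough sketch of what deterministic random generation of such rows could look like, the following uses a seeded `java.util.Random`. The record, field names, value ranges and the country and city pools are all placeholders chosen for illustration, not the speakers' actual code.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ConferenceDataSketch {
    // Hypothetical row type: one field per column described in the talk.
    record Conference(String eventName, String country, String city,
                      LocalDate startDate, LocalDate endDate, List<String> sessionTypes,
                      int tracks, int sessions, int speakers, int cost) {}

    // Placeholder pools; the real generator's values are not given in the talk.
    static final List<String> COUNTRIES = List.of("Norway", "USA", "Germany");
    static final List<String> CITIES = List.of("Oslo", "New York", "Berlin");
    static final List<String> SESSION_TYPES = List.of("lightning talk", "talk", "workshop");

    /** The same seed always yields the same rows, which keeps measurements repeatable. */
    static List<Conference> generate(long seed, int rows) {
        Random random = new Random(seed);
        List<Conference> result = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            int startDay = 1 + random.nextInt(365);                    // any day of 2023
            int endDay = Math.min(365, startDay + random.nextInt(3));  // stay inside 2023
            List<String> types = new ArrayList<>(SESSION_TYPES);
            Collections.shuffle(types, random);
            result.add(new Conference(
                    randomName(random, 8, 16),
                    COUNTRIES.get(random.nextInt(COUNTRIES.size())),
                    CITIES.get(random.nextInt(CITIES.size())),
                    LocalDate.ofYearDay(2023, startDay),
                    LocalDate.ofYearDay(2023, endDay),
                    List.copyOf(types.subList(0, 1 + random.nextInt(types.size()))),
                    1 + random.nextInt(10),     // tracks (assumed range)
                    10 + random.nextInt(91),    // sessions (assumed range)
                    10 + random.nextInt(191),   // speakers (assumed range)
                    random.nextInt(2001)));     // cost (assumed range)
        }
        return result;
    }

    /** Random lowercase name with a length between minLength and maxLength inclusive. */
    static String randomName(Random random, int minLength, int maxLength) {
        int length = minLength + random.nextInt(maxLength - minLength + 1);
        StringBuilder name = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            name.append((char) ('a' + random.nextInt(26)));
        }
        return name.toString();
    }

    public static void main(String[] args) {
        generate(42L, 3).forEach(System.out::println);
    }
}
```

Fixing the seed is what makes the data set reproducible from run to run, so memory measurements taken on different days are comparing the same rows.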

Measuring Memory Cost in Java

Measuring memory cost, so how do we do that?

Raab: A practical thing to walk away from this talk with is that there’s a tool at the OpenJDK project called Java Object Layout, referred to as JOL. It’s a great tool for analyzing the object layout schemes of various objects. It’s very precise, in that you can look at individual objects and look at their cost and layout. You’ve got the Maven dependency here. You can go check that out. Also, there’s a nice article here from Baeldung talking about memory layout of objects, which covers JOL in more depth. There’s also a Stack Overflow question about whether or not JOL works with Java records. It does. It requires using a VM flag when you’re running with Java 14 or above. If you want to use records, you got to set this magicFieldOffset to true.

Memory Considerations

Mehmandarov: Memory considerations, there are actually quite a few things. We talk about boxed versus primitive. I’m just going to list them up, and then leave the explanation of that as a cliffhanger. Boxed versus primitive is a thing that we’ll be thinking about, and we have been thinking about, and we’ll be talking about. Also, we’ll be talking quite a bit about data structures that are mutable versus immutable. We’ll be talking about data pooling, and what that actually means and what results it will give you. We’ll also talk about way of thinking of data. Do you want to think about it in a row-based way? You typically think about your data in relational databases and stuff, like row and row, another row and another row of data, versus column based. That’s another thing. We’ll also be talking a bit about how memory can be improved or will be improved in the future by the things that are planned to become a part of Java as well.

Memory Cost – Boxed vs. Primitive

Raab: We’re looking at two classes here, and there’s some giveaways. On the left, you got the MinMaxPrimitivesBoxed, and you’ve got a set of fields which represent the min and max values of different types. On the right, you’ve got something that looks very similar, different class name, all the field names are the same, and the primary difference is the type. On the left, we’ve got the type with the uppercase letter, on the right, the type with the lowercase letter. A question for you to think about is, what is the difference going to be between these two classes, specifically from a memory footprint perspective?

What we’re going to do is use JOL to print out the memory footprint of these two objects. You can see we’re using this thing called GraphLayout.parseInstance. We give it an instance of MinMaxPrimitivesBoxed, and we print out its footprint to system out. Then we do the same thing with MinMaxPrimitivesPlain, and print out its footprint. For the type on the left, the one with the uppercase type names like Boolean, Byte, Character, what you’ll see is that we actually created 17 different objects in memory. There’s the object that we wanted, the MinMaxPrimitivesBoxed, and then we wound up creating two objects, for the min and max values, for each of these boxed types. In total, 17 objects, for a total of 368 bytes of RAM for that single instance. In the middle, in the average column, you’re seeing the cost of each of these boxed types. It’s just a useful metric for you to understand: when you’re using these boxed types, what do they actually take in memory? Then for the MinMaxPrimitivesBoxed as well, you see it takes 80 bytes itself. That 80 bytes is counting up the 4-byte references plus the object header cost. So in addition to the cost of each of these objects, you’ve got the reference to the object as well, contained in the main object, which gives 368. If we look then at MinMaxPrimitivesPlain, you’ll see we get one total object. We don’t have to create 16 extra objects, we just have the primitive values, and the total cost of that object is less as well. It’s 8 bytes less than the cost of the MinMaxPrimitivesBoxed object itself, and we also save the 16 extra boxed wrappers. Our recommendation here is pretty simple: don’t box primitive values. Understand what the cost of doing that is. Java has autoboxing in it, and that can be a useful feature, but it’s also somewhat evil in that it’s silently creating stuff on the heap for you, taking up memory. Using autoboxing, you may actually be hiding memory bloat.
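
The numbers quoted above can be reproduced by hand. The sketch below is plain arithmetic and does not use JOL. It assumes a 64-bit HotSpot JVM with compressed references (12-byte object header, 4-byte references, 8-byte alignment) and ignores any padding between fields. The class and method names are made up for illustration.

```java
// Back-of-the-envelope reconstruction of the 80, 72 and 368 byte figures.
final class BoxedVsPrimitiveSizes {
    static final int HEADER = 12;    // assumed object header size
    static final int REFERENCE = 4;  // assumed compressed reference size
    static final int ALIGNMENT = 8;  // assumed object alignment

    // Payload sizes of boolean, byte, char, short, int, long, float, double.
    static final int[] PRIMITIVE_BYTES = {1, 1, 2, 2, 4, 8, 4, 8};

    /** Rounds an object size up to the next multiple of the alignment. */
    static int align(int bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Holder with a min and a max field per primitive type, stored inline. */
    static int plainHolderBytes() {
        int fields = 0;
        for (int size : PRIMITIVE_BYTES) {
            fields += 2 * size;
        }
        return align(HEADER + fields);
    }

    /** Holder with 16 references to wrapper objects instead of inline values. */
    static int boxedHolderBytes() {
        return align(HEADER + 2 * PRIMITIVE_BYTES.length * REFERENCE);
    }

    /** One wrapper object (Boolean, Byte, ..., Double) around a payload of the given size. */
    static int wrapperBytes(int payloadBytes) {
        return align(HEADER + payloadBytes);
    }

    /** The boxed holder plus the 16 wrapper objects it points to. */
    static int boxedGraphBytes() {
        int total = boxedHolderBytes();
        for (int size : PRIMITIVE_BYTES) {
            total += 2 * wrapperBytes(size);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("boxed holder alone: " + boxedHolderBytes());
        System.out.println("plain holder:       " + plainHolderBytes());
        System.out.println("boxed object graph: " + boxedGraphBytes());
    }
}
```

Under these assumptions the boxed holder alone is 80 bytes, the plain holder is 72, and the boxed graph is 368, matching the figures from the talk. Most of the difference comes from the 16 wrapper objects, each of which pays for its own header.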

Memory Footprint – Boxed vs. Alternate vs. Primitive Sets

Now we’re going to take a look at boxed versus primitive data structures, with an alternate data structure in here as well. Here, we’re creating three different sets. One is a java.util.HashSet. Then we create a UnifiedSet from Eclipse Collections, and finally an IntHashSet. Each is going to be a set of the numbers from 1 to 10. The first two are boxed: you can see they are a HashSet of Integer and a MutableSet of Integer. The third one is not boxed, it’s primitive. So we’re going to compare two boxed set implementations and a primitive set implementation using JOL, doing the same thing as before, parsing each one and printing its footprint. What do you think the difference is going to be between these three classes?

Looking at HashSet, what you’ll see is like, we wind up actually creating 24 objects here. Ten of the objects are the boxed integers and they’re taking 160 bytes. We have the HashSet itself. Then contained within a HashSet is the HashMap. You’ll see like within a HashMap, you’ve got an array of this HashMap$Nodes, and you’ve got 10 of these HashMap$Nodes. These are basically the map entry objects contained inside of the map. Now the interesting thing is like, a set containing a map, winds up carrying along with it all this extra infrastructure to support its setness. If you then compare it to UnifiedSet from Eclipse Collections, so the set for Eclipse Collections doesn’t contain a HashMap. You immediately cut the number of objects that you’re creating in half. We wind up with an array of objects. We still have the 10 integer objects. We can’t get rid of those because this is boxing happening. Then we’ve got the UnifiedSet cost itself. A big difference there, from 640 bytes down to 272, so more than cutting in half.

The third thing we can look at is the primitive set, so IntHashSet. IntHashSet, what you can see is we get rid of the integer objects, so that’s 160 bytes gone. We have an int array, which is that [I. Then we have the cost of IntHashSet itself, which is 40 bytes. In total, it’s 120 bytes. From UnifiedSet to the IntHashSet, you can see it’s a tremendous savings. Once again, more than cutting in half in terms of the memory. Recommendation here, avoid using java.util.HashSet, unless you’re using in a place where you’re going to create it and then get rid of it. Don’t hold on to it long term, because it’s a lot of cost for the value it’s providing of being a set. It’s a memory hog. Also, remember, don’t box primitive values if you can. Autoboxing is evil. You can see we get that 160 extra bytes for integer objects when they’re just int values, they should be 4 bytes each. It’s hiding bloat in your heap, potentially.
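
To see where the 640 bytes for the HashSet go, the footprint can be added up object by object. The sketch below is arithmetic only, with no JOL or Eclipse Collections dependency. It assumes 12-byte object headers, 16-byte array headers, 4-byte compressed references, 8-byte alignment, and the field layout of OpenJDK's HashSet, HashMap and HashMap.Node as I understand them. The 40-byte size of the IntHashSet object is taken from the talk, and the 16-slot table length is inferred from the quoted totals.

```java
// Rough footprint arithmetic for a boxed HashSet versus a primitive int set.
final class SetFootprintSketch {
    static final int HEADER = 12;        // assumed object header
    static final int ARRAY_HEADER = 16;  // assumed array header (includes the length)
    static final int REFERENCE = 4;      // assumed compressed reference
    static final int INT = 4;

    static int align(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** HashSet<Integer>: set, backing map, node table, nodes, boxes, shared dummy value. */
    static int hashSetOfBoxedInts(int elements, int tableLength) {
        int hashSet = align(HEADER + REFERENCE);                  // one field: the map
        int hashMap = align(HEADER + 4 * REFERENCE + 4 * INT);    // 4 refs, 3 ints, 1 float
        int table = align(ARRAY_HEADER + tableLength * REFERENCE);
        int nodes = elements * align(HEADER + INT + 3 * REFERENCE); // hash, key, value, next
        int boxes = elements * align(HEADER + INT);               // one Integer per element
        int dummyValue = align(HEADER);                           // Object every node points to
        return hashSet + hashMap + table + nodes + boxes + dummyValue;
    }

    /** Primitive set: the set object plus one int[] table, no boxes and no nodes. */
    static int primitiveIntSet(int setObjectBytes, int tableLength) {
        return setObjectBytes + align(ARRAY_HEADER + tableLength * INT);
    }

    public static void main(String[] args) {
        System.out.println("HashSet<Integer>, 10 elements: " + hashSetOfBoxedInts(10, 16));
        System.out.println("primitive int set, 16 slots:   " + primitiveIntSet(40, 16));
    }
}
```

Under these assumptions the totals come out at 640 and 120 bytes, the same as the JOL output described in the talk. Half of the HashSet total (320 bytes) is spent on the ten node objects alone.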

Memory Footprint – Mutable vs. Immutable Sets

Next thing we’re going to look at is mutable versus immutable. Here, we’re going to compare two things that are both in the JDK. We’re going to compare a HashSet holding on to two integer values, 1 and 2, to the immutable set in the JDK. In this case we’re going to create a copy of the HashSet to save some code. We do Set.copyOf, and what this is going to do is create an immutable set for us. Then we’re going to compare their two footprints. If you look, here’s the thing you just saw on the previous slide, you’ve got the footprint. Now this is only holding on to two integers. Those cost 32 bytes. Then you’ve got the cost of the data structure itself. For a HashSet with two elements, you can see it’s 272 bytes. Compare that to this special type called Set12, which is an inner class in ImmutableCollections. All you have is the two integers, 32 bytes, and then 24 bytes for the Set12, for a total of 56 bytes. A really tremendous saving compared to the mutable set. Once again, avoid java.util.HashSet, if you can. The recommendation here is, when you’re loading data up, if you can trim it at the end using immutable data structures, you can save a lot of memory that way. I would say, load mutable, because it’s going to be faster performance-wise to grow something that’s mutable. At the end, if you can trim and wind up with just the memory required, going to an immutable version of the collection is very helpful.
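
The load-mutable-then-trim recommendation can be sketched with JDK classes only. `Set.copyOf` has been available since Java 10. The class and method names below are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class TrimToImmutable {
    /** Grow a mutable set while loading, then hand back a compact unmodifiable copy. */
    static Set<Integer> loadThenTrim(int... values) {
        Set<Integer> mutable = new HashSet<>();
        for (int value : values) {
            mutable.add(value);
        }
        return Set.copyOf(mutable);
    }

    /** Returns true if the set refuses modification. */
    static boolean rejectsAdd(Set<Integer> set) {
        try {
            set.add(-1);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<Integer> trimmed = loadThenTrim(1, 2);
        // The concrete class is a JDK implementation detail and may differ between releases.
        System.out.println(trimmed.getClass().getName());
        System.out.println("rejects add: " + rejectsAdd(trimmed));
    }
}
```

The mutable HashSet becomes garbage as soon as the method returns, so only the compact copy stays on the heap for the long term.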

Memory Comparison – Sweating the Small Stuff

Then we can talk about sweating the small stuff. We looked at a set in the two-element range. What other optimizations can happen in that small element range? We’re going to look at the immutable list space. In the JDK, there wind up being two optimized immutable types. There’s List12, which covers one element and two elements. Then there’s ListN, which is basically an array trimmed to the number of elements that you have. Comparing JDK to Eclipse Collections, this is what the differences look like from size 0 up to size 11. You’re looking at memory in bytes, so the smaller the better. You can see there’s a reasonable gap at each level between JDK and Eclipse Collections. The reason is that Eclipse Collections actually has named types starting from size 0, the empty one. Even though it looks like JDK costs a lot more than Eclipse Collections at size 0, that’s completely meaningless, because there’s only one empty instance in the JVM. It’s a singleton. You only ever create one, so the multiple for it is one. It’ll never be more than that. Whereas with the other ones you’re going to wind up with a multiplier effect, depending on how many you have. If you have millions of them, that’s where savings can add up. Eclipse Collections actually has named types from singleton all the way to decapleton, and everything in between. Then at 11, it switches to being array based, which is where JDK and Eclipse Collections get very close in terms of the cost.

There’s a thing to be aware of with these memory savings: there’s potentially a tradeoff in performance. The core JDK team made sure that they limited the number of classes they created, to try and reduce the possibility of megamorphic call sites sneaking into your code. When you have a call site that is monomorphic or bimorphic, it’s very fast. Once you enter the range of megamorphic, where you’ve got three or more implementations of an interface, you wind up with a virtual lookup and method call. That actually gives you a potentially significant performance hit. You’ve got to be aware of what’s more important to you. Is it memory or is it performance? Make sure you understand your performance profile and where you have potential bottlenecks, and just be aware of the tradeoffs. In 2004, I had the problem that I needed to shave as much memory as I could to fit into a 32-bit memory space. That’s what drove the design decision to have all of these smaller types available.
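To make the call-site point concrete, here is a small sketch. The interface and class names are hypothetical and only echo the "named type per size" idea; they are not the Eclipse Collections or JDK classes. It shows one call site that sees three receiver classes, which is the situation described above. It is not a benchmark, and what the JIT actually does depends on the runtime profile.

```java
// Hypothetical size-specific types, loosely echoing "singleton", "doubleton", "tripleton".
interface TinyList {
    int size();
}

final class Singleton implements TinyList {
    public int size() { return 1; }
}

final class Doubleton implements TinyList {
    public int size() { return 2; }
}

final class Tripleton implements TinyList {
    public int size() { return 3; }
}

final class CallSiteSketch {
    static int totalSize(TinyList[] lists) {
        int total = 0;
        for (TinyList list : lists) {
            // A single call site. If only one or two implementation classes ever
            // reach it, the JIT can typically inline the call; with three or more
            // it may fall back to a virtual or interface dispatch.
            total += list.size();
        }
        return total;
    }
}
```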

Exploring Three Libraries

Let’s go on and talk about exploring the three libraries we looked at.

Mehmandarov: We talked quite a bit about how we should present that to you. We’ve mentioned all of them already. We’ve talked about Java Streams, Eclipse Collections, and DataFrame-EC. We thought about what the best way is of actually introducing them and explaining them. If you’ve used or played with or seen Lego before, you have seen it in lots of different shapes and sizes, and the different things you can build with it. In general, you will see three different types of sets. You’ll have the basic ones that consist of more generic pieces. That’s what Java Streams is. For fun, we actually looked at the age limits for the sets that we see in the picture here. It’s a fun comparison for us as programmers as well, so think about that. Call it maturity, or experience, or whatever. It doesn’t really translate to that, obviously, but it’s a little fun thing that made us giggle a bit. Java Streams is basic building blocks. It takes quite a bit of assembly to make it look like a car, but it’s also a standard set of things that you can build into a house or an airplane or a boat or something. Most things. With Java Streams, it’s the same. Some assembly is required. It’s more low-level control, so you can actually build your own stuff in different ways. Also, Java Streams has this row-based approach to domain objects. You put a bunch of attributes into an object, and they represent a row in a database.

Eclipse Collections is a little bit different. It’s closer to what you see in Lego Technic, where you have all the standard and basic pieces, but also extra pieces that are very specialized for building one particular thing. A particular set of cogs for the transmission box of a car, or a tiny little piece for the grille on a car. It’s not a standard piece that you can use on many other things. It’s the same thing with Eclipse Collections. It has more specialized building blocks, but it’s still compatible with the basic ones, and it’s also optimized for performance. It’s optimized for building that particular car, obviously. It still has the row-based approach to domain objects. You still put things into objects, and you still handle them as a bunch of attributes inside an object.

DataFrame-EC is a little bit different. Its approach is more similar to Lego Mindstorms, where there’s a smart thing in the middle that can be programmed to do things. The way I like to think about DataFrame-EC, which is probably not exactly the correct way, but it still helps me build a mental model, is to think of it as a spreadsheet, where you have both data and the smartness and filtering that you can build into a spreadsheet, whatever program you’re using for that on your computer. Here, the building blocks are even more specialized. They’re still compatible with the two previous libraries or ways of doing things. It is a much higher-level approach. It can also be programmed in a more specific way to do specific tasks. It simplifies some of the tasks that can be done with the other ones but are a bit more tedious there. This one actually has a different approach. It has a column-based approach to the tabular data structure. Now we’re looking at data in columns instead of looking at it in rows.

Conference Explorer

The Conference Explorer class that we had to implement to read all that data we showed you earlier is similar in at least two of the three cases we’ve been playing around with, Java Streams and Eclipse Collections. Remember the row-based approach? That is a record. It’s implemented as a record where we have a bunch of attributes that we put into it. Then, for Java Streams, we create a set of conferences and countries. For Eclipse Collections, we use a specific type of set, ImmutableSet, which is an Eclipse Collections specific type, again for conferences and countries. For DataFrame-EC, things are a little bit different. Here, we actually have DataFrame objects for all of those things. Now we have a DataFrame for conferences, a DataFrame for country codes, and so on. You see, it’s not countries, it’s actually country codes, because all the countries and everything are inside the conferences DataFrame. We need country codes for generating some other fun things. If you want to know about the APIs and how it’s been implemented, we’ve done a few talks at Devnexus and Devoxx Greece, where you can see the same data and the same code, but where we talk more about using the APIs and how you can use them in different settings.
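A minimal sketch of the row-based shape described here, using a Java record (Java 16 or later) and JDK collections with Streams. The field names and the ConferenceExplorerSketch class are assumptions made for illustration; the talk does not show the actual record, and the real implementation may differ.

```java
import java.time.LocalDate;
import java.util.Collection;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical row type: one object per conference, one field per CSV column.
record Conference(
        String eventName,
        String country,
        String city,
        LocalDate startDate,
        LocalDate endDate,
        Set<String> sessionTypes,
        int trackCount,
        int speakerCount,
        int sessionCount,
        int cost) {
}

final class ConferenceExplorerSketch {
    private final Set<Conference> conferences;
    private final Set<String> countries;

    ConferenceExplorerSketch(Collection<Conference> loaded) {
        // Unmodifiable copies, in line with the "trim to immutable" advice.
        this.conferences = Set.copyOf(loaded);
        this.countries = loaded.stream()
                .map(Conference::country)
                .collect(Collectors.toUnmodifiableSet());
    }

    Set<Conference> conferences() { return conferences; }

    Set<String> countries() { return countries; }
}
```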

Conference Explorer – Memory Cost Comparison (Mutable vs. Immutable)

The question here is, what would it cost to load 1 million conferences into data structures like these? Let’s find out. The first thing we would like to see is how it behaves per type of library and type of data structure. For each library, Java, Eclipse Collections, or DataFrame-EC, we’ll look at mutable and immutable data structures. You can see the memory footprint of mutable structures is much bigger, at least for Java sets. It gets smaller for the other ones.

Raab: A Java set is exactly the set we warned you about in the previous slides, that’s java.util.HashSet, and you’re seeing the extra cost there or how it translates as you scale up.

Mehmandarov: Still, even for the other ones, it’s higher. This is for 1 million conferences. It’s not for 25, or 50, or whatever. The funny thing is that you can also see that for DataFrame-EC, which obviously doesn’t have a mutable version, the immutable version is half the size again. If you wonder why, we’ll see that in a bit. We’re going to keep that as a question: why do the collection alternatives compare so badly to DataFrame-EC? One of the main answers is something called pooling. Now we’ve done the same thing per library, only with immutable data structures, but with or without pooling, and based on our dataset. This is the graph. You will not see exactly this for your own data if your data is different from ours. For us, it looked like that. By introducing pooling, we cut the size of the whole thing in half. Like I said, it’s not the data structure. It’s exactly the same data. It’s exactly the same data structures. The only difference is whether we do pooling or not. We implemented custom pooling using Eclipse Collections for the Eclipse Collections-based solutions. The recommendation here, if you’re going to take away anything from this slide, is that you should understand your data and analyze it using a tool like JOL (Java Object Layout) or something else, to understand how it behaves and where you can optimize it.

What Is Pooling?

What is pooling? Can you explain a little bit about that?

Raab: The first thing we’re going to talk about is, what is a pool? The way I would describe a pool is that it’s a set of unique values that you can put things into and get them out of. If you think about a HashSet in Java and look at the Set interface, you can add to it, but you can’t actually get anything out of it. You can only check for containment. What if a set had a get method? It turns out that in Eclipse Collections, our UnifiedSet is actually both a set and a pool, so we have a get method on our set. What is a pool useful for? Why would you want to have a get method on a set? It helps you reduce the number of duplicate items that you have in memory for a specific set of data. Think of it basically as a HashMap where the key and the value are the same thing. I want to look up this key, and if I have that key, I want to get back the value stored for it, and I only want to keep that one value in memory. Since key and value are the same, if you have a get method, you’re looking up the key and you get back the key. That’s really what it comes down to.
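The "HashMap where key and value are the same" description can be sketched in a few lines of plain Java. This is a hypothetical JDK-only Pool class written for illustration; it is not the Eclipse Collections UnifiedSet, which avoids the extra map entry objects this version carries. Null values are not supported in this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pool: maps each value to the first equal instance that was stored.
final class Pool<T> {
    private final Map<T, T> canonical = new HashMap<>();

    // Returns the pooled instance equal to value, storing value if none exists yet.
    T put(T value) {
        T existing = canonical.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    // Returns the pooled instance equal to value, or null if there is none.
    T get(T value) {
        return canonical.get(value);
    }

    int size() {
        return canonical.size();
    }
}
```

While loading rows, each field value would be passed through `put` and the returned reference stored in the row, so that equal values share one instance and the duplicates become garbage.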

In the JDK, there are different kinds of pools that actually exist. They’re not implemented as sets. There’s a type of pooling you may not have heard of before, which gets used for literal strings and String.intern. There is an internal pool that the JVM manages, which literal strings use, and you can use it as well. String.intern is a method on String for using that pool. There are articles that have been written over the years explaining when and when not to use String.intern and the different issues with it. It is available there. It is a pool. There are also pools available on the boxed wrappers that get used through autoboxing. There’s a valueOf method on each of the boxed wrappers like Boolean, Short, Integer, Long. There is a range of values that are basically cached or pooled for each of these types. For Integer there are 256 cached values, I think: negative 128 to 127. They keep these small integers, both negative and positive, in a cache.
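Both JDK pools can be observed through reference equality. A small sketch with made-up class and method names: String.intern returns the canonical instance from the JVM string pool, and Integer.valueOf is specified to cache values from -128 to 127. Identity outside that range is not guaranteed either way, so the example does not rely on it.

```java
final class JdkPools {
    // True when both strings resolve to the same pooled instance.
    static boolean sameInternedString(String a, String b) {
        return a.intern() == b.intern();
    }

    // True when two valueOf calls return the same Integer instance.
    // Guaranteed for -128..127; unspecified for other values.
    static boolean sameBoxedInteger(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(sameInternedString(new String("talks"), "talks"));
        System.out.println(sameBoxedInteger(127));
    }
}
```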

As it turns out, this was actually a mystery for us. At first, we didn’t know why DataFrame-EC was doing so well. We didn’t do anything to it. We just loaded data into it. We thought we had a bug or something. Why is it half the memory? It turns out DataFrame-EC actually uses the Eclipse Collections UnifiedSet underneath to pool the data for each column. It’s very smart. It makes sense because, since it’s a column-based datastore, for each column it can say, if I have a string, let me have a pool of strings, and I can unique them while I’m loading. Or if I have a date, let me have a pool of dates, and even if I have a tremendous number of years, the number of dates is probably going to be much less than hundreds of thousands or millions.

Mehmandarov: Think about it, like 365 possible dates a year, and if you have even dates for 30 years, 50 years, it’s still much less than 1 million elements.

Raab: After we started understanding our data, looking at it and seeing where the costs were, we said, what can we do here with pooling? We saw that we have a lot of duplicate cities. We actually have a set of 6 of them that we load. Immediately, we can get rid of 1 million strings and, just through pooling, wind up with 6. That’s a tremendous savings. We have start date and end date. We have a million of each of those. There are only 364 dates that we wind up loading for 2023, so we go from a million to 364. Then, as it turns out, session types, which Rustam talked about before with the CSV data: we’ve got talks, workshops, lightning talks. Each conference has anywhere from one to three of them, in any combination, and you can see that in total, you wind up with seven instances. What’s interesting is that when you use the Eclipse Collections ImmutableSet, you wind up with named classes for each of the sizes. It actually tells you more about the distribution of your data. I refer to this as a tracer bullet. I can see what’s out there as I’m shooting my sets into the heap, what they actually look like and where they land, and see that I’ve got 375,000-plus singleton sets, 541,000 doubleton, and 82,000 tripleton, and those then reduce down to those seven.
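The seven pooled session-type sets follow from simple counting: three session types have 2^3 - 1 = 7 non-empty combinations. A short sketch that enumerates them (class and method names are made up for the example):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SessionTypeCombos {
    // Enumerates every non-empty subset of the given types using a bit mask.
    static List<Set<String>> nonEmptySubsets(List<String> types) {
        List<Set<String>> result = new ArrayList<>();
        int n = types.size();
        for (int mask = 1; mask < (1 << n); mask++) {
            Set<String> subset = new LinkedHashSet<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.add(types.get(i));
                }
            }
            result.add(subset);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> types = List.of("talks", "workshops", "lightning talks");
        nonEmptySubsets(types).forEach(System.out::println);
    }
}
```

However many rows are loaded, a pool of session-type sets can never hold more than these seven distinct values.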

Row-Based vs. Column-Based Structures

Now we can talk a bit about rows versus columns, and something to think about in this space. With row-based structures, the benefit you get is custom tuning ability. You can really try to compress down the data in your row. You’ve got limits with that. You also have the ability to do custom pooling, which is what we did after the fact. We’re going to show a little bit of what we can do with rows to squeeze out even more memory. One of the challenges is that you get this object header cost. For every row, I’ve got an object header. We’re going to talk a little bit about what Java is going to do, eventually, to help us in this space. You also have object alignment costs to think about. Objects get 8-byte alignment, so it’s about whatever you can fit into 8 bytes. If you don’t fill the 8 bytes, if you only use 4, it’s still going to cost you 8. You’ve got to consider that. There’s a great article from Aleksey Shipilev on object alignment and how that works. With column-based structures, you only get an object header cost per column, and there are far fewer columns, let’s say 10 columns versus a million rows. You get great compression and performance, especially if you’ve got primitive types. Things can maybe just be loaded directly from a second-level cache into the processor, and you get good cache locality. You are limited, though, in terms of tuning to the available types. DataFrame-EC only has a long value type for integral values. It doesn’t have short, int, or byte. That’s a place where it could actually give you more savings.
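The difference between the two layouts can be sketched with plain Java. The class below is hypothetical and only illustrates the idea: the row layout pays one object header per conference, while the column layout pays one array header per attribute and stores the values contiguously. It uses long for the columns, mirroring the remark that DataFrame-EC only has a long type for integral values; it is not the DataFrame-EC API.

```java
final class RowVsColumn {
    // Row-based: one object, and therefore one object header, per conference.
    static final class Row {
        final long speakers;
        final long sessions;

        Row(long speakers, long sessions) {
            this.speakers = speakers;
            this.sessions = sessions;
        }
    }

    // Column-based: one primitive array, and one array header, per attribute.
    static final class Columns {
        final long[] speakers;
        final long[] sessions;

        Columns(long[] speakers, long[] sessions) {
            this.speakers = speakers;
            this.sessions = sessions;
        }
    }

    static Columns toColumns(Row[] rows) {
        long[] speakers = new long[rows.length];
        long[] sessions = new long[rows.length];
        for (int i = 0; i < rows.length; i++) {
            speakers[i] = rows[i].speakers;
            sessions[i] = rows[i].sessions;
        }
        return new Columns(speakers, sessions);
    }

    // Row scan: follows a reference to each row object.
    static long totalSpeakers(Row[] rows) {
        long total = 0;
        for (Row row : rows) {
            total += row.speakers;
        }
        return total;
    }

    // Column scan: walks one contiguous primitive array.
    static long totalSpeakers(Columns columns) {
        long total = 0;
        for (long value : columns.speakers) {
            total += value;
        }
        return total;
    }
}
```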

Fine-Tuning Memory in Conference Explorer

Let’s go look at fine-tuning memory in Conference Explorer and what we did. This was just manual work: what can I do to squeeze memory here? We only did it for the one column. We could have done it for all of the first four, because this was really about making changes in the Conference record, changing the types. What we did was take what were initially 4 int fields, and we made one of the int fields a byte, because the number of tracks is typically always going to be less than 10, and a byte goes up to 127. For the other int values, I don’t really need 2 billion for number of speakers, number of sessions, and cost. A nice type is short. It’s much smaller. Hopefully, you don’t hit the max size for the cost, but you’re definitely not going to hit the max size for speakers and sessions. This gave us the ability to shrink 16 bytes down to 7. Then we did this really gnarly trick: let’s get rid of an object reference, and that will save us 4 million extra bytes, potentially, if we can get the object alignment to work out right. We combined the two dates into a pair object. It’s called a twin, because both sides of a twin are the same type. Basically, we were able to reduce the reference cost. This is that funny date math. Now we have the combination of from and to. Since to is always greater than from, you wind up with, at max, like 66,000 combinations. That’s still a lot less than a million. Even if I have to create 66,000 of these at 24 bytes each, the million times 4 is going to be more than that. We got to see things from there. What you can see is that we effectively got the Eclipse Collections ImmutableList row-based approach to be a little more than 10 Megs less memory than DataFrame-EC, where we didn’t really have to do anything. That explains what happened here.
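The two tricks above can be sketched as follows. The class and its helper methods are hypothetical, not the talk's code. The narrowing helpers fail loudly rather than silently wrapping when a value does not fit, and the pair count assumes 364 distinct dates with from on or before to, which gives 364 × 365 / 2 = 66,430 (66,066 if same-day pairs are excluded). Either way, at 24 bytes per pair that is roughly 1.6 MB, against 1,000,000 × 4 = 4,000,000 bytes for the extra reference in every row.

```java
final class CompactCounts {
    // 1 + 2 + 2 + 2 = 7 bytes of field data instead of 4 ints = 16 bytes.
    final byte trackCount;
    final short speakerCount;
    final short sessionCount;
    final short cost;

    CompactCounts(int tracks, int speakers, int sessions, int cost) {
        this.trackCount = toByteExact(tracks);
        this.speakerCount = toShortExact(speakers);
        this.sessionCount = toShortExact(sessions);
        this.cost = toShortExact(cost);
    }

    // Narrows to byte, rejecting values outside -128..127.
    static byte toByteExact(int value) {
        if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
            throw new ArithmeticException("does not fit in a byte: " + value);
        }
        return (byte) value;
    }

    // Narrows to short, rejecting values outside -32768..32767.
    static short toShortExact(int value) {
        if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new ArithmeticException("does not fit in a short: " + value);
        }
        return (short) value;
    }

    // Upper bound on distinct (from, to) pairs with from <= to over the given number of dates.
    static long distinctDatePairs(long distinctDates) {
        return distinctDates * (distinctDates + 1) / 2;
    }
}
```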

Let’s turn the volume up to 25. We wanted to see what happens if we go from 1 million to 25 million. We still have that manual tuning for just the Eclipse Collections column. What we added here, as a reminder, is the file size, so you can actually compare: the file we generated for the 25 million objects was 2.19 Gig. You can see in comparison how much it requires in memory. As it turns out, when we actually turned it up to 25 million and I tried to run our code, it exploded. We ran out of memory. It turns out that was because we hadn’t tuned this at all. We’re using Jackson CSV, and we were using the MappingIterator readAll method. What readAll does is basically create a list of maps. We wound up creating 25 million maps. They were probably just JDK HashMaps. It blew out the memory space. What you need to do in order to scale using Jackson CSV is use the iterator directly. We’re creating one row at a time, and then turning that into the conference row, not creating maps and then herding maps into conferences. This was much better. The manual savings really add up when you talk about 25 million rows. You can see we’re now saving over 300 Meg compared to DataFrame-EC. That manual tuning is starting to pay off.
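The same idea, converting one row at a time instead of materializing an intermediate list of maps, can be shown without Jackson. The sketch below deliberately swaps in a plain BufferedReader and a naive comma split (no quoted fields) so that it runs on the JDK alone; the CityRow type and its two-column layout are made up for the example and are not the talk's data format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class StreamingCsvSketch {
    // Hypothetical target row with just two columns.
    static final class CityRow {
        final String eventName;
        final String city;

        CityRow(String eventName, String city) {
            this.eventName = eventName;
            this.city = city;
        }
    }

    // Reads one line at a time and converts it straight into the target row,
    // so no per-row Map is ever created.
    static List<CityRow> load(Reader source) {
        List<CityRow> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine(); // skip the header line, if any
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                String[] fields = line.split(",", -1); // naive: no quoting support
                rows.add(new CityRow(fields[0], fields[1]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```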

What Will the Future Bring?

We could talk a little bit about what’s going to happen with Java.

Mehmandarov: We talked quite a bit about how we can fine-tune, and we can do it by knowing our data and using the right data structure for it. There are also things happening in the Java world that will bring the memory footprint down a bit. There are a few projects that are working in different directions. They’re doing different things. All in all, at least two of them for sure, and the third one sort of, will influence the memory footprint. Project Lilliput works on techniques to downsize Java object headers in general, in the HotSpot JVM, from 128 bits to 64 bits or less. Project Valhalla will have this thing called value objects. If you want to read more about that, there are links both to descriptions of the project and, in this case, to a blog post from Brian Goetz, where he explains how this works. The interesting quote to take from there says: like primitives, the value set of an inline class is the set of instances of that class, not object references. Also, there’s Project Amber, which is not exactly memory related, but will still influence it, and also helps us think about data-oriented programming and the approach to that in Java. You can read about that as well, including this article by Brian, which gives a really interesting insight into data-oriented programming in Java.

Summary

What can we say? Data-oriented programming in Java is actually possible. You don’t need to go for another framework, language, whatever. You can do it. It’s feasible. It’s flexible. It can be fun, with all the fine-tuning and small things that you can do to see your memory footprint go down. You can do all these fun things in your favorite language. Understanding and measuring your data is the most important key. It will be there no matter what you choose. No matter what framework or language you go for, it will be a very important part of it. You should use tools to measure that. You should consider using pooling to get a lot of memory benefits, especially if you have repeating values, values that are not 100% unique. Object compression is also possible, using smaller primitive types for fixed number ranges. We also need to think about the column-based approach versus the domain-oriented or object-oriented approach to the structure. When you go column-based, you should try to stick to primitives. Primitives will generate a lot of memory savings. You should think about providing support for smaller integral types, which can add to the memory savings. When it comes to the domain-oriented approach, we should think about how it can be tuned manually, because here, you need to know your data. You need to know how it’s put together. You need to know where you can cut things. One of the most important things, probably one of the lowest-hanging fruits, is to convert things into immutable data structures. After you’re done playing around with the data, doing something with it, put it into immutable data structures and just leave it there, because it will give you a better memory footprint.

