Database warhorse SQL Server 2025 goes all-in on AI – The Register

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

Ignite A new version of Microsoft’s database warhorse, SQL Server, is on the way, with some useful improvements squeezed between the inevitable artificial intelligence additions.

New in SQL Server 2025 will be performance and availability enhancements lifted from Azure SQL. According to a Microsoft spokesperson, there’s optimized locking, optional parameter plan optimization, faster batch mode, and columnstore indexing in the release. There is also REST API support alongside Regular Expression enablement.

“Additionally, native JSON support enables developers to more effectively deal with frequently changing schema and hierarchical data, facilitating the creation of more dynamic applications,” the spokesperson said.

There’s support for Entra managed identities, which Microsoft says will improve credential management and compliance, and failover reliability has also been enhanced. And, of course, Copilot is in SQL Server Management Studio to “streamline SQL development by offering real-time suggestions, code completions, and best practice recommendations.”

Unsurprisingly, Microsoft is going all-in with AI in this release. “SQL Server 2025 has AI built-in, simplifying AI application development and retrieval-augmented generation (RAG) patterns with secure, performant, and easy-to-use vector support, leveraging the T-SQL language,” the company said.

“In this latest SQL Server version, flexible AI model management within the engine using REST interfaces allows our customers to use AI models from ground to cloud.”

Microsoft SQL Server is just over 35 years old – older, if one considers its Sybase origins – and the most recent release, SQL Server 2022, will remain in mainstream support until January 11, 2028. Extended support will go to January 11, 2033. The spokesperson told us that SQL Server 2025 would likely follow Microsoft’s Fixed Lifecycle policy, with five years of mainstream support followed by another five years of extended support.

Assuming SQL Server 2025 makes it to general availability in 2025 – it is currently in Private Preview – this translates to support until at least 2035.

If SQL Server 2022 was all about making everything “Azure-enabled,” SQL Server 2025 reflects Microsoft’s obsession with AI. “SQL Server 2025 transforms SQL Server into an enterprise AI-ready database, bringing AI to customers’ data in a secure, efficient manner,” the spokesperson said.

“This release continues SQL Server’s legacy of impressive performance and security, adding new features and AI assistance that optimizes customer data for the era of AI.”

As before, the company was tight-lipped on costs, although pay-as-you-go licensing for on-premises customers is available with Azure Arc integration.

It is hard to say if this might be the last hurrah for SQL Server. Microsoft has various alternative database options these days, and hybrid and cloud-based services. But there will always be customers who want to keep their data out of the cloud and firmly on-premises.

The spokesperson was non-committal: “The SQL Server schedule is dependent on industry trends, customer feedback, and our strategic vision. We will continue to evaluate SQL Server releases according to these factors as time continues.” ®

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.

  • This field is for validation purposes and should be left unchanged.


MongoDB Expands Azure Integrations, Boosts Real-Time Analytics And GenAI – CRN

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

The database and development platform provider is announcing a number of initiatives at Microsoft Ignite this week that make it easier for customers and partners to work with MongoDB on Azure cloud.

MongoDB is extending the scope of integrations between its cloud database development platform and Microsoft Azure, a move the company says will make it easier for partners and customers to build real-time data analytics links and develop generative AI applications.

In a series of announcements today at this week’s Microsoft Ignite conference, MongoDB is integrating the MongoDB Atlas cloud database with Microsoft’s Azure OpenAI services and launching its MongoDB Enterprise Advanced database management tools on the Azure Marketplace.

MongoDB said the new integrations will provide partners and customers with greater flexibility in data development on Azure – particularly to help meet the exploding demand for data for AI and generative AI applications.

[Related: MongoDB CEO Ittycheria: AI Has Reached ‘A Crucible Moment’ In Its Development]

“I think the pace is phenomenal, things are changing daily,” said Alan Chhabra, MongoDB executive vice president of worldwide partners, speaking in an interview with CRN about the rapid growth of AI and GenAI development. He said experimentation with GenAI, especially within larger enterprises, “is through the roof.”

Despite competing with Microsoft and its Azure Cosmos database, MongoDB has been steadily expanding its alliance with Microsoft – along with its partnerships with Amazon Web Services and Google Cloud – in recent years.

Last year MongoDB extended its multi-year strategic partnership with Microsoft, committing to a broad range of initiatives including close cooperation between the two companies’ sales teams and making it easier to migrate database workloads to MongoDB Atlas on Azure. That followed steps in 2022 that allowed developers to work with MongoDB Atlas through the Azure Marketplace and Azure Portal.

“Microsoft has become our fastest growing partnership,” Chhabra said, noting how MongoDB and Microsoft sales representatives cooperate in selling MongoDB for Azure, particularly for AI and GenAI development.

At the Ignite event Tuesday MongoDB announced that customers building applications powered by retrieval-augmented generation (RAG) can now select MongoDB Atlas as a vector store in the Microsoft Azure AI Foundry, combining MongoDB Atlas’s vector capabilities with generative AI tools and services from Microsoft Azure and Azure Open AI Service.

That makes it easier for customers to enhance large language models (LLMs) with proprietary data and build unique chatbots, copilots, internal applications, or customer-facing portals that are grounded in up-to-date enterprise data and context, the company said.

Chhabra said the new capabilities are designed to help customers develop and deploy GenAI applications. “It’s not easy. There’s a lot of confusion. There’s also a lot of experimentation, because everyone knows they need to use it [but] they’re not sure how.

“This integration will make it way easier and seamless for customers to deploy RAG applications leveraging their proprietary data in the combination of their LLMs,” Chhabra said.

In May MongoDB launched the MongoDB AI Applications Program (MAAP) that provides a complete technology stack, services and other resources to help businesses develop and deploy at scale applications with advanced generative AI capabilities.

Chhabra said MongoDB systems integration and consulting partners will benefit from the new integrations “because we’re making it easier for them to deploy Gen AI pilots and help them take it to production for customers.”

While large enterprises are conducting lots of AI development and experimentation in-house, Chhabra said SMBs are looking for more complete packaged AI and GenAI solutions.

“I believe there’s a large play for ISV application [developers] who are building purpose-built GenAI applications in the cloud on Azure, leveraging the MongoDB stack, leveraging our MAAP program,” Chhabra said. “So instead of customers having to build, they can buy GenAI solutions. When big companies like Microsoft work with cutting-edge growing companies like MongoDB, we make it easier for customers and partners to deploy GenAI [and] the whole ecosystem benefits.”

In another announcement at Ignite, MongoDB said users looking to maximize insights from operational data can now do so in near real-time with Open Mirroring in Microsoft Fabric for MongoDB Atlas. That connection keeps data in sync between MongoDB Atlas and OneLake in Microsoft Fabric, enabling the generation of near real-time analytics, AI-based predictions, and business intelligence reports, according to MongoDB.

And the announced launch of MongoDB Enterprise Advanced on Azure Marketplace for Azure Arc-enabled Kubernetes applications gives customers more flexibility to build and operate applications across on-premises, hybrid, multi-cloud, and edge Kubernetes environments.

Eliassen Group, a Reading, Mass.-based strategic consulting company that provides business, clinical, and IT services, will use the new Microsoft integrations to drive innovation and provide greater flexibility to their clients, MongoDB said.

“We’ve witnessed the incredible impact MongoDB Atlas has had on our customers’ businesses, and we’ve been equally impressed by Microsoft Azure AI Foundry’s capabilities. Now that these powerful platforms are integrated, we’re excited to combine the best of both worlds to build AI solutions that our customers will love just as much as we do,” said Kolby Kappes, vice president – emerging technology, at Eliassen Group, in a statement.

The new extensions to the Microsoft alliance come a little more than a month after MongoDB debuted MongoDB 8.0, a significant update to the company’s core database that offered improved scalability, optimized performance and enhanced enterprise-grade security.

Article originally posted on mongodb google news. Visit mongodb google news

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.

  • This field is for validation purposes and should be left unchanged.


Presentation: Navigating LLM Deployment: Tips, Tricks, and Techniques

MMS Founder
MMS Meryem Arik

Article originally posted on InfoQ. Visit InfoQ

Transcript

Arik: I’m Meryem. I’m co-founder and CEO of TitanML. My background is, I was a physicist, originally, turned banker, then turned AI, but always really interested in emerging tech. We at TitanML built the infrastructure to make serving LLMs efficiently, much better. I’m going to frame today through a conversation that I had at a wedding last summer. No one really understands what we do, at least they didn’t before ChatGPT came out. They’re starting to now. I always find myself having to have this conversation over again. Fortunately, it wasn’t actually me having this conversation. It was my co-founder who I was at the wedding with, we’re all university friends. This is the conversation. Russell, he’s also a friend of mine from university. He’s a data scientist at a hedge fund. Really smart guy. This is Jamie. He is my co-founder. He’s our chief scientist. He essentially is the person that makes our inference server really fast.

Outline

What I’m going to do is I’m firstly going to explain why LLM deployment is hard, because a lot of people don’t necessarily appreciate that it is. Then I’m going to give an assortment, I think it’s seven, that I landed on, tips, tricks, and techniques for better LLM deployments.

Why is LLM (AI) Deployment Hard?

We’ll start with this conversation. Typically, it’s like, what have you been up to? Then he’s like, I’ve been working on making LLM serving more easy. Then he says, is LLM deployment even hard, don’t I just call the OpenAI API? Then he’s like, sort of. Because everyone, when they think of LLMs, just thinks of OpenAI. APIs are really easy to call. You might be like, why is she even here talking? I can’t just call the OpenAI API. Everyone here knows how to do that. However, there are more than one ways that you can access LLMs. You can use hosted APIs. I have a bunch of them here, OpenAI, Cohere, Anthropic, AI21 Labs. These are all situations where they’ve done the hosting for you and they’ve done the deployment for you. All you have to do is call into them. I don’t want to minimize it too much, because there’s still complexity you have there. You still have to do things like hallucination reduction, but they’ve done a lot of the heavy lifting. For a lot of use cases, you might want to self-host. This is when you’re calling into like a Mistral, or you’re hosting a Llama, or one of the others. Essentially, you’re hosting it in your own environment, whether that’s VPC or on-prem environment.

He’s like, but why would I want to self-host anyway? To which we say, lots of reasons. There’s broadly three reasons why you might want to self-host. Firstly, there’s decreased cost at scale. It is true that if you’re just doing proof of concepts, then OpenAI API based models are much cheaper. If you’re deploying at scale, then self-hosting ends up being much cheaper. Why does it become much cheaper? Because you only have one problem to solve, which is your particular business problem. You’re able to use much smaller models to solve the same problem. Whereas OpenAI, they’re hosting a model that has to solve both coding and also writing Shakespeare, so they have to use a much bigger model to get the same output.

At scale, it’s much cheaper to use self-hosted models. Second reason why you might want to self-host is you have improved performance as well. When you’re using a task specific LLM, or you fine-tuned it, or you’ve done something to make it very narrow to your task, you end up typically getting much better performance. Here’s a couple of snippets from various blogs, although I think they’re a bit old now, but the point still stands. Then the third reason, which is why most of our clients self-host, which is privacy and security. If you’re part of a regulated industry maybe for GDPR reasons, or your compliance team, then you might have to self-host as well. These are the three main reasons why you should self-host. If these aren’t important to you, use an API.

Typically, we find that the reasons why enterprises care about open source, and I have, I think, a couple graphs from a report by the VC, a16z. The three main reasons are control, customizability, and cost. The biggest one by far is control. Being able to have that AI independence, that if OpenAI decides to fire its CEO again, that you will still have access to your models, which is important, especially if you’re building really business important applications. The majority of enterprises also seem to agree that these reasons are important to them. The vast majority of enterprises, apart from 18%, expect to shift to open source, either now or when open sources matches the performance of a GPT-4 quality model. If you are looking to self-host, you are very much not alone, and most enterprises are looking to build up that self-hosted capability.

Russell, he works at a hedge fund, he’s like, privacy is really important for my use case, so it makes sense to self-host. How much harder can it really be? I hear this all the time, and it infuriates me. The answer is a lot harder. You really shouldn’t ignore the complexity that you can’t see. When you call an API based model, you benefit from all of the hard work that their engineers have done under the hood to build that inference and serving infrastructure. In fact, companies like OpenAI have teams of 50 to 100 managing this infra. Things like model compression, like Kubernetes, batching servers, function calling, JSON forming, runtime engines, are all the things you don’t have to worry about when you’re using the API based model, but you do suddenly have to worry about when you’re self-hosting.

He’s like, but I deploy ML models all the time. You might have been deploying XGBoost models or linear regression models in the past. How much harder can it really be to deploy these LLMs? To which we say, do you know what the L stands for? It’s way harder to deploy these models. Why? The first L in LLM stands for large language model. I remember when we started the company, we thought a 100 million parameter BERT model was large. Now a 7 billion parameter model is considered small, but that is still 14 gig, and that is not small. GPUs are the second reason why it is much harder. GPUs are much harder to work with than CPUs. They’re much more expensive, so using them efficiently really matters. Doesn’t really matter if you don’t use your CPUs super efficiently, because they’re a couple orders of magnitude cheaper.

That cost, latency, performance tradeoff triangle that we sometimes talk about is really stark with LLMs in a way that it might not have been previously. The third reason why it’s really hard is the field is evolving crazy fast. Half of the techniques that we use to serve and deploy and optimize models didn’t exist a year ago. Another thing that I don’t have here, but maybe it’s worth mentioning, is also the orchestration element. Typically, with these large language model applications, you have to orchestrate a number of different models. RAG is a perfect example of this. You have to orchestrate in the very classic sense, an embedding model and a generation model. If you’re doing state of the art RAG, you’ll probably need a couple models for your parses, maybe an image model and a table model, and then you’ll need a reranker. Then you end up with five or six different models. That gets quite confusing. Plus, there’s all the other reasons why deploying applications is hard, like scaling and observability.

Tips to Make LLM Deployment Less Painful

He then says something like, that sounds really tricky. What can I do? Then Jamie says, “Luckily, Meryem has some tips and tricks that make navigating LLM deployment much easier.” That’s what exactly he said. We’ll go through my tips to make LLM deployment less painful. It’ll still suck, and it’ll still be painful, it might be less painful.

1. Know Your Deployment Boundaries

My first tip is that you should know your deployment boundaries. You should know your deployment boundaries when you’re building the application. Typically, people don’t start thinking about their deployment boundaries until after they’ve built an application that they think works. We think that you should spend time thinking about your requirements first. It’ll make everything else much easier. Thinking about stuff like, what are your latency requirements? What kind of load are you expecting? Are you going to be deploying an application that might have three users at its peak, or is this going to be the kind of thing like DoorDash, where you’re deploying to 5 gazillion users? What kind of hardware do you have available? Do you need to deploy on-prem, or can you use cloud instances? If you have cloud instances, what kind of instances do you have to have?

All of these are the kind of things that you should map out before. You might not know exactly, so it’s probably a range. It is acceptable if my latency is below a second, or above X amount. It’s just good things to bear in mind. Other things that I don’t have here is like, do I need guaranteed JSON outputs? Do I need guaranteed regex outputs? These are the kinds of things that we should bear in mind.

2. Always Quantize

If you have these mapped out, then all of the other decisions will be made much easier. This goes on to my next point, which is, always quantize. I’ll tell you why it links to my first point earlier. Who knows who Tim Dettmers is? This guy is a genius. Who knows what quantization is? Quantization is essentially model compression. It’s when you take a large language model and you reduce the precision of all of the weights to whatever form you want. 4-bit is my favorite form of quantization, going from an FP32. The reason why it’s my favorite is because it’s got a really fantastic accuracy compression tradeoff. You can see here, in this we have accuracy versus model bits, so the size of the model. Let’s say the original is FP16. It’s actually not, it’s normally 32.

That’s your red line there. We can see that when we compress the model down, we’ll go 10 to the 10, for a given resource size, you can see that the FP16, red line, is actually the worst tradeoff. You’re way better off using a FP8 or an INT4 quantized model. What this graph is telling you is that for a fixed resource, you’re way better off having a quantized model of the same size than the unquantized model. We start with the infra and we work backwards. Let’s say we have access to L40S, and we have that much VRAM. Because I know my resources that I’m allowed, I can look at the models that I have available to me, and then work backwards. I have 48 gigs of VRAM. I have a Llama 13 billion, so that’s 26. That’s all good. That fits. I have a Mixtral which is current state of the art for open-source models. That’s not going to work.

However, I have a 4-bit quantized Mixtral which does fit, which is great. I now know which models I can even pick from, and I can start experimenting with. That graph that I showed you earlier with Tim Dettmers, that tells me that my 4-bit model will be better performing, probably. Let’s say my Llama was also the same size, my 4-bit model will be better performing than my Llama model, because my model retains a lot of that accuracy from when it was really big and compressed down. We start with our infra and work backwards. We essentially find the resources that we can fit, and then find the 4-bit quantized model that’ll fit in those resources. The chances are that’s probably the best accuracy that you can get for that particular model.

3. Spend Time Thinking About Optimizing Inference

Tip number three, spend a little bit of time thinking about optimizing inference. The reason why I tell people spend just a little bit of time optimizing inference is because the naive things that you would do when you’re deploying these models is typically completely the wrong thing to do. You don’t need to spend a huge amount of time thinking about this, but just spending a little bit of time can make multiple orders of magnitude difference to GPU utilization. I can give one example of this, batching strategies. Essentially, batching is where multiple requests are processed in parallel. The most valuable thing when you’re deploying these models that you have is your GPU utilization. GPUs, I think I said earlier, are really expensive, so it’s very important that we utilize them as much as we can. If I’m doing no batching, then this is more or less the GPU utilization that I’ll get, which is pretty bad. The naive thing to do would either be to do no batching or dynamic batching.

Dynamic batching is the standard batching method for non-Gen AI applications. It’s the kind of thing that you might have built previously. The idea is that you wait a small amount of time before starting to process a request. Group any of those requests that arrive during that time, and then process them together. In generative models, this leads to a downtime in utilization. You can see that it starts really high and then it goes down, because users will get stuck in the queue waiting for longer generations to finish. Dynamic batching is something that you might try naively, but it actually tends to be a pretty bad idea. If you spend a little bit of time thinking about this, you can do something like continuous batching. This is what we do.

This is a GPU utilization graph that we got a couple weeks ago, maybe. This the state-of-the-art batching technique designed for generative models. You let incoming requests interrupt in-flight requests in order to keep that GPU utilization really high. You get much less queue waiting, and much higher resource utilization as well. You can see going from there to there is maybe one order of magnitude difference in GPU costs, which is pretty significant. I’ve not done anything to the model, nothing will impact accuracy there.

Second example I can give you is with parallelism strategies. For really large models, you often can’t inference them on a single GPU. For example, a Llama 70 billion, or a Mixtral, or a Jamba, for example, they’re really hefty models. Often, I’ll need to split them across multiple GPUs in order to be able to inference them. You need to be able to figure out how you’re going to essentially do that multi-GPU inference. The naive way to do this, and actually this is probably the most popular way to do this, in fact, common inference libraries like Hugging Face’s Accelerate, does this, is you split the model layer by layer. It was a 90-gigabyte model. I have 30 on one, 30 on one, and then 30 on the third GPU. At any one time only one GPU is active, which means that I’m paying for essentially three times the number of GPUs that I’m actually using at any one time.

That’s just because I split them in this naive way, because my next GPU is having to wait for my previous GPU. That’s really unideal. This is what happens in Hugging Face Accelerate library, if you want to look into that. Tensor Parallel is what we think is the best one, which is, you essentially split the model lengthwise so that every GPU can be fully utilized at the same time for each layer, so it makes inference much faster, and you can support arbitrarily large models as well with enough GPUs. Because at every single point, all of your GPUs are firing, you don’t end up paying for that extra resource. In this particular example, we’ve got, for this particular model, a 3x model, a GPU utilization improvement. Combining that with the order of magnitude we had before, that’s a really significant GPU utilization improvement. It’s not a huge amount of time to think about this, but if you just spend that little bit of time, then you might end up improving what you can put on those GPUs.

4. Consolidate Infrastructure

What have I done so far? I’ve done, think about your deployment requirements, quantize, inference optimization. Fourth one is, consolidate your infrastructure. Gen AI is so computationally expensive that it really benefits from consolidation of infrastructure, and that’s why central MLOps teams like Ian runs, make a lot of sense. For most companies, ML teams tend to work in silos, and therefore are pretty bad at consolidation of infrastructure. It wasn’t really relevant for previous ML sources. Deployment is really hard, so it’s better if you deploy once, you have one team managing deployment, and then you maintain that, rather than having teams individually doing that deployment, because then each team individually has to discover that this is a good tradeoff to make. What this allows is it allows the rest of the org to focus on that application development while the infrastructure is taken care of.

I can give you an example of what this might look like. I will have a central compute infrastructure, and maybe as a central MLOps team, I’ve decided that my company can have access to these models, Llama 70, Mixtral, and Gemma 7B. I might periodically update the models and improve the models. For example, when Llama 7 comes out, instead of Llama 2, I might update that. These are the models that I’ll host centrally. Then all of those little yellow boxes are my application development teams. They’re my dispersed teams within the org. Each of them will be able to get access to my central compute infrastructure, and personalize it in the way that works for them. One of them might add a LoRA, which is essentially a little adapter that you can add to your model when you fine-tune it. It’s very easy to firstly do, and then also add into inference. Then maybe I’ll add RAG as well. RAG is when we give it access to our proprietary data, so our vector store, for example.

I have each of my application teams building LoRA’s RAGs, LoRA’s RAGs. Maybe I don’t even need LoRAs, and I can just do prompt engineering, for example, and my central compute is all managed by one team, and it’s just taken care of. The nice thing about this is what you’re doing is you’re giving your organization the OpenAI experience, but with private models. If I’m an individual developer, I don’t think about the LLM deployment. Another team manages it. It sits there, and I just build applications on top of the models we’ve been given access to. This is really beneficial. Things to bear in mind. Make sure your inference server is scalable. LoRA adapter support is super important if you want to allow your teams to fine-tune. If you do all of this, you’ll get really high utilization of GPUs. Because, remember, GPU utilization is literally everything. I say literally everything. There’s your friends, and there’s your family, and then there’s GPU utilization. If we centrally host this compute, then we’re able to get much higher utilization of those very precious GPUs.

I can give you a case study that we did with a client, RNL, it’s a U.S. enterprise. What they had before was they had four different Gen AI apps. They were pretty ahead at the time. They built all of this last year. Each app was sitting on its own GPU, because they were like, they’re all different applications. They’ve all got their own Embedders, their own thing going on. They gave them each their own GPUs, and as a result, got really poor GPU utilization, because not all the apps were firing all the time. They weren’t all firing at capacity. What we did with them is something like this. It doesn’t have to be Titan, it can be any inference server. They had Mixtrals and Embedders, essentially, is all they had. We hosted a Mixtral and an Embedder on one server and exposed those APIs. The teams then built on top of those APIs, sharing that resource. Because they were sharing the resource, they could approximately half the number of GPUs that they needed. We were able to manage both the generative and the non-generative in one container. It was super easy for those developers to build on top of. That’s the kind of thing that if you have a central MLOps team, you can do, and end up saving a lot of those GPU times.

5. Build as if You Are Going to Replace the Models Within 12 Months

My fifth piece of advice is, build as if you’re going to replace the models within 12 months, because you will. One of our clients, they deployed their first application with Llama 1 last year. I think they changed the model about four times. Every week they’re like, this new model came out. Do you support it? I’m like, yes, but why are you changing it for the sixth time? Let’s think back to what state of the art was a year ago. A year ago, maybe Llama had come out by then, but if before that, it might have been the T5s. The T5 models were the best open-source models. What we’ve seen is this amazing explosion of the open-source LLM ecosystem. It was all started by Llama and then Llama 2, and then loads of businesses had built on top of that.

For example, the Mistral 70B was actually built with the same architecture that Llama was. We had the Falcon out of the UAE. We had Mixtral by Mistral. You have loads of them, and they just keep on coming out. In fact, if you check out the Hugging Face, which is where all of these models are stored, if you check out their leaderboard of open-source models, the top model changes almost every week. Latest and greatest models come out. These models are going to keep getting better. This is the performance of all models, both open source and non-open source, as you can see the license, proprietary or non-proprietary. The open-source models are just slowly scaling that leaderboard. We’re starting to get close to parity between open source and non-open source. Right now, the open-source models are there or thereabouts, with GPT-3.5. That was the original ChatGPT that we were all amazed by.

My expectation is that we’ll get to GPT-4 quality within the next year. What this means is that you should really not wed yourself to a single model or a single provider. Going back to that a16z report that I showed you earlier, most enterprises are using multiple model providers. They’re building their inference stack in a way that it’s interoperable, in a way that if OpenAI has a meltdown, I can swap it out for a Llama model. Or, in a way that if Claude is now better than GPT-4 as it is now, I can swap them really easily. Building with this interoperability in mind is really important. I think one of the greatest things that OpenAI has blessed us with is not necessarily their models, although they are really great, but they have actually counterintuitively democratized the AI landscape, not because they’ve open sourced their models, because they really haven’t, but because what they’ve done is they’ve provided uniformity of APIs to the industry. If you build with the OpenAI API in mind, then you’ll be able to capture a lot of that value and be able to swap models in and out really easily.

What does this mean for how you build? API and container-first development makes life much easier. It’s fairly standard things. Abstraction is really good, so don’t spend time building custom infrastructure for your particular model. The chances are you’re not going to use it in 12 months. Try and build more general infra if you’re going to. We always say that at this current stage where we’re still proving value of AI in a lot of organizations, engineers should spend their time building great application experiences rather than fussing with infrastructure. Because right now, for most businesses, we’re fortunate enough to have a decent amount of budget to go and play and try out this Gen AI stuff.

We need to prove value pretty quickly. We tend to say, don’t work with frameworks that don’t have super wide support for models. For example, don’t work with a framework that only works with Llama, for example, because it’ll come back to bite you. Whatever architecture you pick or infrastructure you pick, making sure that when Llama 3, 4, 5, Mixtral, Mistral comes out, they will help you adopt it. I can go back to this case study that I talked about before. We built this in a way, obviously, that it’s super easy to swap that Mixtral for Llama 3, when Llama 3 comes out. For example, if a better Embedder comes out, like a really good Embedder came out a couple weeks ago, we can swap that out easily too.

6. GPUs Look Really Expensive, Use Them Anyway

My sixth one, GPUs look really expensive. You should use them anyway. GPUs are so phenomenal. They are so phenomenally designed for Gen AI and Gen AI workloads. Gen AI involves doing a lot of calculations in parallel, and that happens to be the thing that GPUs are incredibly good at. You might look at the sticker price and be like, it’s 100 times more expensive than a CPU. Yes, it is, but if you use it correctly and get that utilization you need out of it, then you’ll end up processing orders of magnitude more, and per request, it will be much cheaper.

7. When You Can, Use Small Models

When you can, use small models. GPT-4 is king, but you don’t get the king to do the dishes. What the dishes are: GPT-4 is phenomenal. It’s a genuinely remarkable piece of technology, but the thing that makes it so good is also that it is so broad in terms of its capabilities. I can use the GPT-4 model to write love letters, and you can use it to become a better programmer, and we’re using the exact same model. That is mental. That model has so many capabilities, and as a result, it’s really big. It’s a huge model, and it’s very expensive to inference. What we find is that you tend to be better off using GPT-4 for the really hard stuff that none of the open-source models can do yet, and then using smaller models for the things that are easier. You can massively reduce cost and latency by doing this. When we talked about that latency budget that you had earlier, or those resource budgets that you had earlier, you can go a long way to maximizing that resource budget if you only use GPT-4 when you really have to.

Three commonly seen examples are like RAG Fusion. This is when your query is edited by a large language model, and then all queries are searched against, and then the results are ranked to improve the search quality. For example that, you can get very good results by not using GPT-4, only using GPT-4 when you have to. You might, for example, with RAG, use a generative model just to do the reranking, so just check at the end that the thing that my Embedder said was relevant, was really relevant. Small models, especially fine-tuned models for things like function calling are really good. One of the really common use cases for function calling is if I need my model to output something like JSON or regex, there are broadly two ways that I could do this. Either I could fine-tune a much smaller model, or I could add controllers to my small model. A controller is really cool. A controller is essentially when, if I’m self-hosting the model, I can ban my model from saying any tokens that would break a JSON schema or that would break a regex schema that I don’t want. Stuff like that, which actually is majority of enterprise use cases, you don’t necessarily need to be using those API based models, and you can get really immediate cost and latency benefits.

Summary

Figure out your deployment boundaries and work backwards. Because you know your deployment boundaries, you know that you should pick the model that when you’ve quantized it down is that size. Spend time thinking about optimizing inference so that can make the difference of genuinely multiple orders of magnitude. Gen AI benefits from consolidation of infrastructure, so try to avoid having each team being responsible for their deployments, because it will probably go wrong. Build as if you’re going to replace your model in 12 months. GPUs look expensive, but they’re your best option. When you can, you’ll use small models. Then we said all of this to Russell, and then he was like, “That was so helpful. I’m so excited to deploy my mission critical LLM app using your tips.” Then we said, “No problem, let us know if you have any questions”.

Questions and Answers

Participant 1: You said, build for flexibility. What are the use cases for frequent model replacements? The time and effort we have spent on custom fine-tuning, on custom data, will have to be repeated? Do you have any tips for that in case of frequent model replacements?

Arik: When would you want to do frequent model replacement? All of the time. With the pace of LLM improvement, it’s almost always the case that you can get better performance, literally just by swapping out a model. You might need some tweaks to prompts, but typically, just doing a one-to-one switch works. For example, if I have my application built on GPT-3.5 and I swap it out for GPT-4, even if I’m using the same prompt, the chances are my model performance will go up, and that’s a very low effort thing to do. How does that square with things like the engineering effort required to swap? If it is a month’s long process, if it’s not a significant improvement, then you shouldn’t make that switch. What I would suggest is trying to build in a way where it’s not a month’s long process and actually can be done in a couple days, because then it will almost always be worth that switch.

How does that square as well with things like fine-tuning? I have a spicy and hot take, which is, for the majority of use cases, you don’t need to fine-tune. Fine-tuning was very popular in deep learning of a couple years ago. As the models are getting better, they’re also better at following your instructions as well. You tend to not need to fine-tune for a lot of use cases, and can just get away with things like RAG, prompt engineering, and function calling. That’s what I would tend to say. If you are looking for your first LLM use case, speaking of swapping models, a really good first LLM use case is to just try and swap out your NLP pipelines. A lot of businesses have preexisting NLP pipelines. If you can swap them for LLMs, typically, you’ll get multiple points of accuracy boost.

Participant 2: How do you see the difference for the on-prem hardware, between enterprise grade hardware and consumer maxed out hardware, because I chose to go for consumer maxed out hardware because you go up to 6000 meg transfers on the memory, and the PCI lanes are faster.

Arik: Because people like him have taken all the A100s, when we do our internal development, we actually do it on 4090s, which is consumer hardware. They’re way more readily accessible, much cheaper as well than getting those data center hardware. That’s what we use for our development. We’ve not actually used consumer grade hardware for at-scale inference, although there’s no reason why it wouldn’t work.

If it works for your workload. We use it as well. We think they’re very good. They’re also just much cheaper, because they’re sold as consumer grade, rather than data center grade.

Participant 3: You’re saying that GPU is a whole and it’s most important. I’m a bit surprised, but maybe my question will explain. I made some proof of concept with small virtual machines with only CPUs, and I get quite good results with few requests per second. I did not ask myself about scalability. I’m thinking about how much requests shall we switch to GPUs?

Arik: Actually, maybe I was a bit strong on the GPU stuff, because we’ve deployed on CPU as well. If the latency is good enough, and that’s typically the first complaint that people get, is latency, then CPU is probably fine. It’s just that when you’re looking at economies of scale and when you’re looking at scaling up, they will almost always be more expensive per request. If you have a reasonably low number of requests, and the latency is fine, then you can get away with it. I think one of our first proof of concepts with our inference server was done on CPU. One thing that you will also know is that you’ll be limited in the size of model that you can go up to. For example, if you’re doing a 7 billion quantized, you can probably get away with doing CPU as well. I think GPU is better if you are starting from a blank slate. If you’re starting from a point where you already have a massive data center filled with CPUs and you’re not using them otherwise, it is still worth experimenting whether you can utilize them.

Participant 4: I have a question regarding the APIs that are typically used, and of course, it’s OpenAI’s API that are typically used also by applications. I also know a lot of people who do not really like the OpenAI API. Do you see any other APIs around? Because a lot of people are just emulating them, or they are just using it, but no one really likes it.

Arik: When you say they don’t like it, do they not like the API structure, or don’t like the models?

Participant 4: It is about the API structure. It is about documentation. It is about states, about a lot of things that happen that you can’t fully understand.

Arik: We also didn’t really like it, so we wrote our own API that’s called as our inference server, and then we have an OpenAI compatible layer, because most people are using that structure. You can check out our docs and see if you like that better. I think because it was the first one to really blow up, it’s what the whole industry converged to when it comes to that API structure.

See more presentations with transcripts

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.

  • This field is for validation purposes and should be left unchanged.


Operational Data Powers The Digital Economy – Forbes

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Andrew Davidson, Senior Vice President, Product Management, of MongoDB, building and managing your data in the cloud.

It’s a common adage that in today’s digital world, there’s an application, or software, for everything. But what exactly is software?

At its most basic level, software is a bridge between the world and a digital understanding of the world. And that digital understanding is maintained by operational data. Essentially, operational data is software’s heartbeat—it quietly drives the digital systems, decisions and processes that govern much of our lives.

More specifically, operational data is the real-time information generated from everyday processes, like convenience store transactions, hospital patient admissions or traffic data. It’s another way of referring to what some of us know as transactional data, or online transaction processing. Whatever you call it, the bottom line is that operational data is necessary for powering applications of all sorts that complete immediate tasks and generate insights that help us respond intelligently to the world’s changing conditions.

Consider the role of operational data in a hospital. When a patient is admitted, a data record is created and then updated with new information about the patient’s stay in the facility. That record can be referenced on its own or fed into software that helps medical providers give personalized and timely care to the patient. Put together, patient records give software insights that help the hospital scale workforces and triage patients based on real-time events.

On a day-to-day basis, this data supports patient care, but it can also play a critical role in adapting to out-of-the-ordinary circumstances. This is all made possible because of the “bridge” that operational data creates between the world and our digital understanding of it.

For example, Northwell Health—a New York-based provider that serves over two million patients annually—dealt with this during the early days of Covid-19. To get ahead of the influx of Covid-19 patients, Northwell built a tool that captures real-time clinical data from its 19 emergency departments and 52 urgent care centers.

By monitoring infection trends at its outpatient facilities, Northwell identified Covid surge clusters days before they overwhelmed inpatient facilities—giving Northwell time to expand treatment capacity to serve 163,000 Covid-19 patients in a year. At one point, Northwell was treating the most Covid-19 patients of any health system in the U.S., thanks in part to its data-driven innovations.

Capturing Lost Data

In sum, data runs the world. And at its best, it can save lives. So adopting a modern operational data layer that centrally integrates an organization’s data—and makes it readily available to consuming applications—is of paramount importance on the journey to delivering smarter, safer and more reliable digital services.

But too many organizations lack the foundation of a modern operational data layer or are leaving vast amounts of their operational data untapped. For example, nearly two-thirds of organizations’ data is considered “dark data”—data that is inaccessible, difficult to retrieve and can cost more to store than it delivers in value.

Such dark data constitutes a huge organizational tax; a recent study found that 52% of the average company’s data storage budget is spent on dark data. And with Grand View Research estimating that the worldwide database software market is worth $100.79 billion, we can assume that billions are being lost annually to dark data. It’s also a cybersecurity threat: If you can’t readily access and maintain much of your data, how can you ensure its security? Dark data is thus a vast attack surface for hackers.

The reality is that terabytes of enterprise data are locked in silos or stuck in brittle systems that make it difficult to fully tap into. Poor data management systems can be blamed for inefficiencies and wasted resources, missed insights, slow response to security incidents and an inability to adapt to market shifts. Together, this means that only a small fraction of today’s operational data is leveraged to its full potential—it’s buried gold at best, and a liability at worst.

Investing in a strong operational data layer can unlock the immense value of this untapped data, and make it faster and more cost-effective to launch data-driven applications. Indeed, a 2021 McKinsey report found that organizations with data-driven cultures can boost their productivity by up to 20% and accelerate decision making.

AI’s Promise: A Smarter Future

With AI, the opportunities data presents are only growing. AI can facilitate the processes involved with architecting an operational data layer. Then, mountains of operational data can be funneled into AI-powered applications that enhance real-time predictions and deliver personalized insights to end-users—all while making it easy for anyone to work closely with data.

Examples of this already abound. Mount Sinai Health System implemented new AI algorithms that assess patient histories in combination with lab results to predict potential complications, reducing readmission rates for heart failure problems by 30%. And New Zealand’s Pathfinder Labs (a customer of MongoDB) uses AI to make it faster for detectives investigating cybercrimes against children to navigate volumes of evidence, catch more offenders and rescue more kids.

And in a more day-to-day example, the City of Los Angeles implemented an AI-powered traffic management system that uses adaptive traffic signaling to reduce traffic congestion—thereby shortening commutes and reducing vehicle emissions—while improving driver safety with smarter incident detection capabilities.

I could go on and on. The point is that right now, we have an opportunity to leverage one of our most ubiquitous resources—our data—to drive unprecedented levels of resilience, to more effectively adapt to a changing world and to revolutionize our quality of life. By harnessing the operational data layer, we can build a foundation for a transformative future. And I can’t wait to see where that data-fueled future takes us.


Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives. Do I qualify?


Article originally posted on mongodb google news. Visit mongodb google news

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.

  • This field is for validation purposes and should be left unchanged.


MongoDB Expands Microsoft Partnership with New AI, Analytics Integration – Stock Titan

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

MongoDB Atlas now available on Azure OpenAI Service

New Microsoft Fabric Mirroring integration with MongoDB Atlas allows for near real-time data syncs

MongoDB Enterprise Advanced now available on Azure Marketplace for Azure Arc-enabled Kubernetes applications

CHICAGO, Nov. 19, 2024 /PRNewswire/ — Today at Microsoft Ignite, MongoDB, Inc. (NASDAQ: MDB) announced an expanded collaboration with Microsoft that introduces three new capabilities for joint customers. First, customers building applications powered by retrieval-augmented generation (RAG) can now select MongoDB Atlas as a vector store in Microsoft Azure AI Foundry, combining MongoDB Atlas’s vector capabilities with generative AI tools and services from Microsoft Azure and Azure Open AI. Meanwhile, users looking to maximize insights from operational data can now do so in near real-time with Open Mirroring in Microsoft Fabric for MongoDB Atlas. And the launch of MongoDB Enterprise Advanced (EA) on Azure Marketplace for Azure Arc-enabled Kubernetes applications enables organizations that operate across on-premises, multi-cloud, and edge Kubernetes environments to choose MongoDB. With these capabilities, MongoDB is meeting customers where they are on their innovation journeys, and making it easier for them to unleash the power of data.

Through the strengthened MongoDB-Microsoft relationship, customers will be able to:

  • Enhance LLMs with proprietary data stored in MongoDB Atlas: Accessible through Azure AI Foundry, the Azure OpenAI Service allows businesses to develop RAG applications with their proprietary data in combination with the power of advanced LLMs. This new integration with Azure OpenAI Service enables users to take enterprise data stored in MongoDB Atlas and augment LLMs with proprietary context. This collaboration makes it easy to build unique chatbots, copilots, internal applications, or customer-facing portals that are grounded in up-to-date enterprise data and context. Developers are now able to add MongoDB Atlas as a vector data store for advanced LLMs, all without the need for additional coding or pipeline building. And through Azure AI Foundry’s “Chat Playground” feature, developers can quickly test how their enterprise data and selected LLM function together before taking it to production.
  • Generate key business insights faster: Microsoft Fabric empowers businesses to gather actionable insights from their data on an AI-powered Analytics platform. Now Open Mirroring in Microsoft Fabric with MongoDB Atlas will allow for a near real-time connection, to keep data in sync between MongoDB Atlas and OneLake in Microsoft Fabric. This enables the generation of near real-time analytics, AI-based predictions, and business intelligence reports. Customers will be able to seamlessly take advantage of each data platform without having to choose between one or the other, or without worrying about maintaining and replicating data from MongoDB Atlas to OneLake.
  • Deploy MongoDB Their Way: The launch of MongoDB EA on Azure Marketplace for Azure Arc-enabled Kubernetes applications gives customers greater flexibility when building applications across multiple environments. With MongoDB EA, customers are able to deploy and self-manage MongoDB database instances in the environment of their choosing, including on-premises, hybrid, and multi-cloud. The MongoDB Enterprise Kubernetes Operator, part of the MongoDB Enterprise Advanced offering, enhances the availability, resilience, and scalability of critical workloads by deploying MongoDB replica sets, sharded MongoDB clusters, and the Ops Manager tool across multiple Kubernetes clusters. Azure Arc further complements this by centrally managing these Kubernetes clusters running anywhere—in Azure, on premises, or even in other clouds. Together, these capabilities ensure that customers can build robust, distributed applications by leveraging the resilience of a strong data layer along with the central management capabilities that Azure Arc offers for its Arc-enabled Kubernetes applications.

“We frequently hear from MongoDB’s customers and partners that they’re looking for the best way to build AI applications, using the latest models and tools.” said Alan Chhabra, Executive Vice President of Partners at MongoDB. “And to address varying business needs, they also want to be able to use multiple tools for data analytics and business insights. Now, with the MongoDB Atlas integration with Azure AI Foundry, customers can power gen AI applications with their own data stored in MongoDB. And with Open Mirroring in Microsoft Fabric, customers can seamlessly sync data between MongoDB Atlas and OneLake for efficient data analysis. Combining the best from Microsoft with the best from MongoDB will help developers push applications even further.”

Joint Microsoft and MongoDB customers and partners welcome the expanded collaboration for greater data development flexibility.

Trimble, a leading provider of construction technology, delivers a connected ecosystem of solutions to improve coordination and collaboration between construction teams, phases and processes.

“As an early tester of the new integrations, Trimble views MongoDB Atlas as a premier choice for our data and vector storage. Building RAG architectures for our customers require powerful tools and these workflows need to enable the storage and querying of large collections of data and AI models in near real-time,” said Dan Farner, Vice President of Product Development at Trimble. “We’re excited to continue to build on MongoDB and look forward to taking advantage of its integrations with Microsoft to accelerate our ML offerings across the construction space.”

Eliassen Group, a strategic consulting company that provides business, clinical, and IT services, will use the new Microsoft integrations to drive innovation and provide greater flexibility to their clients.

“We’ve witnessed the incredible impact MongoDB Atlas has had on our customers’ businesses, and we’ve been equally impressed by Microsoft Azure AI Foundry’s capabilities. Now that these powerful platforms are integrated, we’re excited to combine the best of both worlds to build AI solutions that our customers will love just as much as we do,” said Kolby Kappes, Vice President – Emerging Technology, Eliassen Group.

Available in 48 Azure regions globally, MongoDB Atlas provides joint customers with the powerful capabilities of the document data model. With versatile support for structured and unstructured data, including Atlas Vector Search for RAG-powered applications, MongoDB Atlas accelerates and simplifies how developers build with data.

“By integrating MongoDB Atlas with Microsoft Azure’s powerful AI and data analytics tools, we empower our customers to build modern AI applications with unparalleled flexibility and efficiency,” said Sandy Gupta, VP, Partner Development ISV, Microsoft. “This collaboration ensures seamless data synchronization, real-time analytics, and robust application development across multi-cloud and hybrid environments.”

To read more about MongoDB Atlas on Azure go to https://www.mongodb.com/products/platform/atlas-cloud-providers/azure.

About MongoDB
Headquartered in New York, MongoDB’s mission is to empower innovators to create, transform, and disrupt industries by unleashing the power of software and data. Built by developers, for developers, MongoDB’s developer data platform is a database with an integrated set of related services that allow development teams to address the growing requirements for a wide variety of applications, all in a unified and consistent user experience. MongoDB has more than 50,000 customers in over 100 countries. The MongoDB database platform has been downloaded hundreds of millions of times since 2007, and there have been millions of builders trained through MongoDB University courses. To learn more, visit mongodb.com.

Forward-looking Statements
This press release includes certain “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933, as amended, or the Securities Act, and Section 21E of the Securities Exchange Act of 1934, as amended, including statements concerning MongoDB’s deepened partnership with Microsoft. These forward-looking statements include, but are not limited to, plans, objectives, expectations and intentions and other statements contained in this press release that are not historical facts and statements identified by words such as “anticipate,” “believe,” “continue,” “could,” “estimate,” “expect,” “intend,” “may,” “plan,” “project,” “will,” “would” or the negative or plural of these words or similar expressions or variations. These forward-looking statements reflect our current views about our plans, intentions, expectations, strategies and prospects, which are based on the information currently available to us and on assumptions we have made. Although we believe that our plans, intentions, expectations, strategies and prospects as reflected in or suggested by those forward-looking statements are reasonable, we can give no assurance that the plans, intentions, expectations or strategies will be attained or achieved. Furthermore, actual results may differ materially from those described in the forward-looking statements and are subject to a variety of assumptions, uncertainties, risks and factors that are beyond our control including, without limitation: the effects of the ongoing military conflicts between Russia and Ukraine and Israel and Hamas on our business and future operating results; economic downturns and/or the effects of rising interest rates, inflation and volatility in the global economy and financial markets on our business and future operating results; our potential failure to meet publicly announced guidance or other expectations about our business and future operating results; our limited operating history; our history of losses; failure of our platform to satisfy customer demands; the effects of increased competition; our investments in new products and our ability to introduce new features, services or enhancements; our ability to effectively expand our sales and marketing organization; our ability to continue to build and maintain credibility with the developer community; our ability to add new customers or increase sales to our existing customers; our ability to maintain, protect, enforce and enhance our intellectual property; the effects of social, ethical and regulatory issues relating to the use of new and evolving technologies, such as artificial intelligence, in our offerings or partnerships; the growth and expansion of the market for database products and our ability to penetrate that market; our ability to integrate acquired businesses and technologies successfully or achieve the expected benefits of such acquisitions; our ability to maintain the security of our software and adequately address privacy concerns; our ability to manage our growth effectively and successfully recruit and retain additional highly-qualified personnel; and the price volatility of our common stock. These and other risks and uncertainties are more fully described in our filings with the Securities and Exchange Commission (“SEC”), including under the caption “Risk Factors” in our Annual Report on Form 10-Q for the quarter ended July 31, 2024, filed with the SEC on August 30, 2024, and other filings and reports that we may file from time to time with the SEC. Except as required by law, we undertake no duty or obligation to update any forward-looking statements contained in this release as a result of new information, future events, changes in expectations or otherwise.

Investor Relations
Brian Denyeau
ICR for MongoDB
646-277-1251
ir@mongodb.com 

Media Relations
MongoDB 
press@mongodb.com

Cision View original content to download multimedia:https://www.prnewswire.com/news-releases/mongodb-deepens-relationship-with-microsoft-through-new-integrations-for-ai-and-data-analytics-and-microsoft-azure-arc-support-302309318.html

SOURCE MongoDB, Inc.

Article originally posted on mongodb google news. Visit mongodb google news

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.

  • This field is for validation purposes and should be left unchanged.


Microsoft brings transactional databases to Fabric to boost AI agents – VentureBeat

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More


For years, enterprise companies have been plagued by data silos separating transactional systems from analytical tools—a divide that has hampered AI applications, slowed real-time decision-making, and driven up costs with complex integrations. Today at its Ignite conference, Microsoft announced a major step toward breaking this cycle.

The tech giant revealed that Azure SQL, its flagship transactional database, is now integrated into Fabric, Microsoft’s unified data platform. This integration allows enterprises to combine real-time operational and other historical data into a single, AI-ready data later called OneLake. 

This announcement represents a critical evolution of Microsoft Fabric, its end-to-end data platform, which also includes new capabilities like real-time intelligence and the general availability of the OneLake catalog (see our full coverage of the Microsoft Ignite data announcements here). Together, these updates aim to address the growing demand for accessible, high-quality data in enterprise AI workflows.

Until now, companies have struggled to connect disparate data systems, relying on patchwork solutions to support AI applications. The urgency has only increased with the rise of AI agents—software tools capable of performing complex reasoning autonomously. These agents require instantaneous access to live and historical data to function effectively, a demand Microsoft aims to meet with Fabric.

And with AI agents becoming one one of the hottest trends for enterprise companies next year, Microsoft is pushing to lead here. See our separate coverage about how Microsoft is ahead in this race, and no one else is close.

The integration of Azure SQL is just the beginning of this integration of transactional data. Microsoft plans to extend support to other key transactional databases, including Cosmos DB, its NoSQL document database widely used in AI applications, and PostgreSQL, the popular open-source relational database. While timelines for these integrations remain unspecified, this marks a significant milestone in Microsoft’s effort to create a truly unified data platform. 

Microsoft also said it plans to integrate with popular open source transactional databases, including MongoDB, and Cassandra, but it’s unlikely Microsoft will prioritize integration with competing proprietary transactional databases like Couchbase and Google’s Bigtable.

The power of unified data integration

Arun Ulag, corporate vice president of Azure Data, emphasized in an interview that integrating transactional databases like Cosmos DB into Fabric is critical for enabling next-generation AI applications. For example, OpenAI’s ChatGPT—the fastest-growing consumer AI product in history—relies on Cosmos DB to power its conversations, context, and memory, managing billions of transactions daily.

As AI agents evolve to handle complex tasks like e-commerce transactions, the demand for real-time access to transactional databases will only grow. These agents rely on advanced techniques like vector search, which retrieves data based on semantic meaning rather than exact matches, to answer user queries effectively—such as recommending a specific book.

“You don’t have the time to…go run your RAG model somewhere else,” Ulag said, referencing retrieval-augmented generation models that combine real-time and historical data. “It has to be just built into the database itself.”

By unifying operational and analytical capabilities, Fabric allows businesses to build AI applications that seamlessly leverage live transactional data, structured analytics, and unstructured insights.

Key advancements include:

  • Real-time intelligence: Built-in vector search and retrieval-augmented generation (RAG) capabilities simplify AI application development, reducing latency and improving accuracy.
  • Unified data governance: OneLake provides a centralized, multi-cloud data layer that ensures interoperability, compliance, and easier collaboration.
  • Seamless code generation: Copilot in Fabric can automatically translate natural language queries into SQL, allowing developers to get inline code suggestions,  real-time explanations and fixes.

AI Skills: simplifying AI agent app development

One of the most dynamic announcements in Fabric is the introduction of AI Skills, a capability that enables enterprises to interact with any data – wherever it resides –  through natural language. They connect to Copilot Studio, so you can build AI agents that easily query this data across multiple systems, from transactional logs to semantic models.

Ulag said that if he had to pick one announcement that excites him the most, it would be AI Skills. With AI Skills, business users can simply point to any dataset — be it from any cloud, structured, or unstructured – and begin asking questions about that data, whether through natural language, SQL queries, Power BI business definitions, or real-time intelligence engines, he said. 

For example, a user could use AI Skills to identify trends in sales data stored across multiple systems or to generate instant insights from IoT telemetry logs. By bridging the gap between business users and technical systems, AI Skills simplifies the development of AI agents and democratizes data access across organizations.

As of today, AI Skills can connect with lakehouse and data warehouse tables, mirrored DB and shortcut data, and now semantic models and Eventhouse KQL databases. Support for unstructured data is “coming soon,” the company said. 

Differentiation in a crowded market

Microsoft faces fierce competition from players like Databricks and Snowflake on the data platform front, as well as AWS and Google Cloud in the broader cloud ecosystem—all of which are working on integrating transactional and analytical databases. However, Microsoft’s approach with Fabric is beginning to carve out a unique position.

By leveraging a unified SaaS model, seamless Azure ecosystem integration, and a commitment to open data formats, Microsoft eliminates many of the data complexities that have plagued enterprise data systems. Additionally, tools like Copilot Studio for building AI agents and Fabric’s deep integration across multi-cloud environments give it an edge (see my separate analysis [LINK] of Microsoft’s positioning around AI agents, which also appears to be industry-leading).

Microsoft’s ability to embed AI capabilities directly into its unified data environment “could provide a better experience for developers and data scientists,” said Robert Kramer, vice president at research firm Moor Insights, underscoring how Fabric’s design simplifies workflows and accelerates AI-driven innovation.

Key differentiators include:

  • Unified SaaS model: Fabric eliminates the need to manage multiple services, offering enterprises a single, cohesive platform that reduces complexity and operational overhead.
  • Multi-cloud support: Unlike some competitors, Fabric integrates with AWS, Google Cloud, and on-premises systems, enabling organizations to work seamlessly across diverse data environments.
  • AI-optimized workflows: Built-in support for vector similarity search and retrieval-augmented generation (RAG) streamlines the creation of intelligent applications, cutting development time and improving performance.

Microsoft’s strategy to unify and simplify the enterprise data stack not only meets the demands of today’s AI-centric workloads but also sets a high bar for competitors in the rapidly evolving data platform market.

The road ahead: where Fabric fits in the AI ecosystem

The integration of transactional databases into Fabric marks a significant milestone, but it also reflects a broader shift across the enterprise data landscape: the race toward seamless interoperability. With AI agents becoming a cornerstone of enterprise strategy, the ability to unify disparate systems into a cohesive architecture is no longer optional—it’s essential.

However, Arun Ulag, corporate vice president of Azure Data, acknowledged the challenges that come with operating at Microsoft’s scale. While the company has taken major strides with Fabric, the fast-moving nature of the industry demands constant innovation and adaptability.

“A lot of these patterns are new,” Ulag explained, describing the challenges of designing for a diverse set of use cases across industries. “Some of these patterns will work. Some of them will not, and we’ll only know as customers try them at scale…The way it’s used in automotive may be very, very different from the way it’s used in healthcare,” he added, emphasizing the role of external forces like government regulations in shaping future development.

As Microsoft continues to refine Fabric, the company is positioning itself as a leader in the shift to unified, AI-ready data architectures. But with competitors also racing to meet the demands of enterprise AI, the journey ahead will require constant evolution, rapid learning, and a focus on delivering value at scale.

For more insights into the announcements and Arun Ulag’s perspective, watch our full video interview above.

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.

  • This field is for validation purposes and should be left unchanged.


QCon SF: Using Metaflow to Support Diverse ML Systems at Netflix

MMS Founder
MMS Anthony Alford

Article originally posted on InfoQ. Visit InfoQ

At QCon SF 2024, David Berg and Romain Cledat gave a talk about how Netflix uses Metaflow, an open-source framework, to support a variety of ML systems. The pair gave an overview of Metaflow’s design principles and illustrated several of Netflix’s use cases, including media processing, content demand modeling, and meta-models for explaining models.

Berg and Cledat, both Senior Software Engineers at Netflix, began with several design principles for Metaflow. The goal is to accelerate ML model development in Python by minimizing the developer’s cognitive load. The Metaflow team identified several effects that they wished to minimize: the House of Cards effect, where the underlying layers of a framework are “shaky” instead of a solid foundation; the Puzzle effect, where the composable modules have unique or unintuitive interfaces; and the Waterbed effect, where the system has a fixed amount of complexity that “pops up” in one spot when pushed down elsewhere.

Cledat gave an overview of the project’s history. Metaflow began in 2017 as an internal project at Netflix; in 2019, it was open-sourced, although Netflix continued to maintain its own internal version. In 2021, a group of Netflix ex-employees created a startup, Outerbounds, to maintain and support the open-source project. The same year, Netflix’s internal version and the open-source version were refactored to create a shared “core.”

The key idea of Metaflow is to express computation as a directed acyclic graph (DAG) of steps. Everything is expressed using Python code that “any Python developer would be comfortable coding” instead of using a DSL. The DAG can be executed locally on a developer’s machine or in a production cluster without modification. Each execution of the low, or “run,” can be tagged and persisted for collaboration.

Berg gave several examples of the different ML tasks that Netflix developers have tackled with Metaflow. Content demand modeling tries to predict user demand for a video “across the entire life cycle of the content.” This actually involves multiple data sources and models, and leverages Metaflow’s ability to orchestrate among multiple flow DAGs; in particular, it uses a feature where flows can signal other flows, for example when a flow completes.

Another use case is meta-modeling, which trains a model to explain the results of other models. This relies on Metaflow’s ability to support reproducible environments. Metaflow packages all the dependencies needed to run a flow so that developers can perform repeatable experiments. When training a meta-model, this may require loading several environments, as the meta-model may have different dependencies from the explained model.  

The presenters concluded their talk by answering questions from the audience. Track host Thomas Betts asked the first question. He noted that the code for a flow DAG can have annotations specifying the size of the compute cluster to execute it, but the hosts said the same DAG could also be executed on a single machine, and he wondered if those were ignored in that case. The hosts confirmed that this was the case.

Another attendee asked about how these cluster specifications were tuned, especially in the case of over-provisioning. Berg said that the framework can surface “hints” about resource use. He also said there was some research being done on auto-tuning the resources, but not everything could be abstracted away.

About the Author

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.

  • This field is for validation purposes and should be left unchanged.


How Coinbase provides trustworthy financial experiences through real-time user clustering … – AWS

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

This post was co-authored with Nissan Modi, Staff Software Engineer at Coinbase.

In this post, we discuss how Coinbase migrated their user clustering system to Amazon Neptune Database, enabling them to solve complex and interconnected data challenges at scale.

Coinbase’s mission is to expand global economic freedom by building an open financial system that uses cryptocurrency to increase access to the global economy and aims to achieve this by providing a secure online platform for buying, selling, transferring, and storing cryptocurrency. Coinbase handles vast amounts of data related to transactions, user behavior, and market trends. To efficiently manage and analyze this data, the company employs various data science techniques, including clustering. This method of data organization and analysis is particularly useful for a platform like Coinbase, which needs to understand patterns in user behavior, detect anomalies, and optimize its services. Clustering groups similar data points together based on their features by sorting entities into sets, or clusters, where items in the same cluster share key traits. These traits could be things like age, habits, or interests. The main idea is that data points in one cluster are alike, and those in different clusters are not. This method helps find natural patterns in data, making it straightforward to understand and use large datasets.

Challenge

The platform organization within Coinbase has been responsible for managing a clustering system that has been in place since 2015. Since the original datastore for the clustering system was not graph-based, clusters needed to be precomputed and stored in a NoSQL database. At any given time, the system would store approximately 150 million clusters, some of which contain over 50,000 nodes. This made it challenging to keep clusters up-to-date as user attributes changed in real time, as whenever a user attribute was updated, the system would need to re-calculate clusters.

Pre-calculating clusters became even more challenging as Coinbase expanded their product offerings and their customer base grew. Additionally, logic for grouping users became increasingly complex over time. This necessitated a high number of database updates to support each specific use case. As a result, the system began to experience performance degradation, higher storage costs, and difficulties in supporting different read patterns. This growing inefficiency made it clear that the existing approach was no longer sustainable.

The scale of the system was significant, with around 150 million clusters, some of which included over 50,000 nodes. This massive scale added to the complexity and challenges faced by the team, especially as the system’s write-heavy nature became more pronounced over time.

Initially, the system relied on a NoSQL database to store the precomputed clusters. Precomputing results can be advantageous in systems that are primarily read-heavy, because it avoids the need to repeat the same computations during read operations. However, the clustering system at Coinbase was characterized by a write-heavy workload with frequent updates, making precomputing less optimal as the system evolved. This led to performance issues, increased storage costs, and challenges in accommodating the complex and dynamic relationships within the data. Consequently, the team needed to reevaluate the database choice to better scale the system and meet the demands of Coinbase’s growing product ecosystem.

Solution overview

Graph databases are designed to manage complex, interconnected data structures, allowing representation and querying of relationships between entities. Because Coinbase’s use case is write-heavy and the data is highly connected, they needed a solution that can handle frequent updates to both data and relationships. Instead of relying on precomputed joins, a graph database can perform real-time traversals of connections and relationships, leading to improved query performance and reduced storage costs as compared to using a non-graph datastore to solve the same problem. Adopting a graph database for Coinbase’s clustering system represents a strategic shift towards a more flexible and scalable data architecture, which is key as Coinbase grows not only their customer base but the increasing complexity of its customer relationships.

Graph databases are purpose-built for storing and efficiently querying highly connected data. The following are key indicators of whether a graph is well-suited for a particular use case:

  • Is the dataset highly connected, full of many-to-many relationships?
  • Do your intended queries for the graph require understanding relationships between data points?

Coinbase’s clustering use case aims to group entities according to attributes shared across the entities. Therefore, when clustering is complete, entities within a single cluster will be more closely associated with one another, compared to other entities that are in different clusters.

You can represent the dataset using a series of relational tables, for example, a user_attributes table where each row represents a user, and each column represents a different attribute, as illustrated in the following figure.

Two tables containing information on user attributes and cluster information, respectively. Each row in the user attributes table represents a single user and its associated attributes. Each row in the cluster information table represents a single user and which cluster it belongs to.

You can also model it as a graph, as shown in the following figure.

A collection of nodes and edges that represent individual users and how they are related to the attributes they are associated with. Users and attributes are represented as nodes, and user associations with a particular attribute are represented by connecting edges.

The benefit of modeling this data in a graph format is that you can efficiently find groups and patterns based on mutual connections. For example, given the following sample graph of entities (ent#) and attributes (attr#), you might want to find the collection of entities that share certain attributes but not others. A shared attribute is defined as an attribute node that is connected to two or more entities.

More specifically, let’s say you want to find a collection of entities that meet the following requirements:

  1. All entities in the collection share at least attributes attr1, attr2, attr3
  2. All entities in the collection do not share the attributes attr4, attr5
  3. All entities in the collection share any attribute with at least one other entity that shares a specific attribute with at least two other entities

And your graph contains the follow entities and relationships:

ent1 through ent6 and attr1 through attr5 are nodes. Edges connect from ent1 to attr1, attr2, attr3, and attr5. Edges connect from ent2 to attr1, attr2, attr3, and attr4. An edge connects from ent3 to attr3. Edges connect from ent4, ent5, and ent6 to attr1.

With this example, only ent1 and ent2 would be returned, since ent1 and ent2 both connect to attr1, attr2, and attr3, meeting the first requirement. ent1 is connected to attr5, and ent2 is connected to attr4, but both attributes are not shared attributes since they don’t connect to more than one node each – thus meeting the second requirement. And both ent1 and ent2 share attr1, which is also shared by ent4, ent5, and ent6 – thus meeting the third requirement.

To answer this question efficiently, you need to know not only how entities are connected to the attributes they are associated with, but a way to traverse those connections across multiple levels. Although this question can be answered with a relational database, for query performance to be efficient, you should know all your query patterns upfront, so table joins can be pre-calculated and stored. But by keeping this data in a graph, it not only lets you recalculate queries in real time as the data changes (with no need to pre-calculate joins), but also gives additional flexibility for different query patterns to be written as needed.

Neptune Database addresses several technical challenges faced by large-scale graph database implementations. Because it is fully managed, Coinbase can eliminate significant operational overhead while providing flexibility in data modeling and querying. Neptune Database doesn’t enforce a schema, so adding new properties, node types, and edge types to answer evolving business use cases doesn’t require the graph to be rebuilt or reloaded. Additionally, Neptune Database is capable of querying billions of relationships with millisecond latency, allowing Coinbase to scale this system with their growing customer base.

In Coinbase’s solution, the data ingestor service writes to Neptune Database transactionally. Multiple events are batched into a single graph query, which is run as a single transaction within Neptune Database. This keeps the graph up to date in near real time with the incoming events. Coinbase micro batches multiple changes into the same transaction, and is therefore able to achieve their desired ingestion rates through 20 writes per second, where each write takes the place of many writes (depending on how many users’ clusters were being updated) in the old NoSQL system.

The following diagram illustrates the architecture for Coinbase’s enhanced clustering solution.

The architecture flow starts with multiple event sources that flow into Amazon Managed Streaming for Apache Kafka (MSK). A data ingestor collects data from Amazon MSK and writes the corresponding data into Amazon Neptune Database. An API server maps different use cases to different queries, which can be called by clients. Additionally, visualizations of the graph are generated from Amazon Neptune Database.

Services communicate with Neptune Database through an API server, where different use cases are mapped to different queries. For example, when invoked, the get-related-users API takes an attribute name and attribute value and runs the following Gremlin query to retrieve information about a given user:

g.V().HasLabel(attributeName).
Has("attribute_value", attribute_value).
In().
HasLabel("user").
ElementMap().
ToList()

One feature that Coinbase was unable to implement with the legacy architecture was a UI for stakeholders that visualized the graph. Even though clusters could be pre-calculated, the results themselves were still stored in a tabular format. Now that the data is in a graph format, visualizations of the entities and relationships can be generated with ease. Providing a visualization allows stakeholders to see a different perspective of the data, and makes it straightforward to visually identify the common attributes used for generating clusters—enabling stakeholders to take the proper actions when linkages between common attributes are found. The following is an example visualization from Coinbase’s enhanced clustering system.

An example of the graph visualization generated with data from Amazon Neptune Database.

Representing graph data and querying

Neptune Database supports two open frameworks for representing graph data: the Resource Description Framework (RDF) and the Labeled Property Graph framework (LPG). You can represent graph use cases using either framework, but depending on the types of queries that you want to run, it can be more efficient to represent the graph using one framework or the other.

The types of queries that are commonly used for clustering in the Coinbase system require recursive traversals with filtering on edge properties and other traversals. Therefore, representing this use case with the LPG framework was a good fit because it’s simpler to write complex pathfinding queries using the LPG query languages openCypher and/or Gremlin.

For example, one benefit of using LPG and the Gremlin query language is the presence of support for a query results cache. Pathfinding queries that are used to generate clusters can have many results, and with the query results cache, you can natively paginate the results for improved overall performance. Additionally, to generate a visualization of subgraphs, you need to return a path, which is the sequence of nodes and edges that were traversed from your starting points to your ending points. You can use the Gremlin path() step to return this information, making it less complicated to generate paths for recursive traversals with condition-based ending conditions, such as finding the path between a given pair of nodes.

Benefits and results

Coinbase’s solution with Neptune Database yielded the following benefits:

  • New use cases – The new solution facilitates the discovery of related users across various product use cases without the need for hard-coded aggregation logic. Additionally, attribute lists can be passed to the get-related-users API to instantly generate a list of related users. This capability aids in debugging and allows for the efficient identification of similar users for administrative purposes.
  • Performance efficiency – 99% of the queries that Coinbase runs achieves a latency of less than 80 milliseconds for the platform team while running on a smaller, cost-optimized instance, without a caching layer. This instance can scale to 300 transactions per second (TPS). These transactions are more meaningful than TPS figures on the previous NoSQL system, due to batching the writes and updating all of the users’ attributes across multiple clusters. Because computing multiple joins was required, the NoSQL system thus needed multiple queries to find the same results that a single graph query now finds.
  • Reliability – Because updates are now limited to a single node, the number of database operations has been drastically reduced. This optimization has effectively eliminated the race conditions that were prevalent in the previous system. Additionally, Coinbase can take advantage of automatic hourly backups through Neptune Database.
  • Cost optimization – Coinbase was able to achieve 30% savings in storage costs by eliminating redundant information in the old system and computing the clusters at runtime using Neptune Database.
  • Visualizations – New visualization capabilities provided through a custom-built UI help business owners and teams across the company understand fraud and risk situations and allow new ways to show useful data. This has already significantly reduced analysis time.

Conclusion

Coinbase’s journey with Neptune Database showcases the power of graph databases in solving complex, interconnected data challenges at scale. By migrating their user clustering system to Neptune Database, Coinbase has not only overcome the limitations of their previous NoSQL solution but also unlocked new capabilities and efficiencies. The fully managed nature of Neptune Database has allowed Coinbase to focus on innovation rather than operational overhead. The platform’s ability to handle billions of relationships with millisecond latency enables Coinbase’s future growth and evolving business needs.

Now that the data is in a graph format on Neptune Database, it’s less complicated for Coinbase to add more user attributes without increasing the complexity of managing the relationship. In the future, they plan to ingest more of these attributes and gain richer insights. This will lead to even greater benefits and new use cases.

The graph format also makes it straightforward to extend analyses to experiment with new graph-based techniques. Neptune Analytics is a memory-optimized graph database that helps you find insights faster by analyzing graph datasets with built-in algorithms. Graph algorithms can be used to identify outlier patterns and structures within the dataset, providing insights on new behaviors to investigate. A Neptune Analytics graph can be created directly from a Neptune Database cluster, making it effortless to run graph algorithms without having to manage additional extract, transform, and load (ETL) pipelines and infrastructure.

Get started today with Fraud Graphs on AWS powered by Neptune. You can use sample notebooks such as those in the following GitHub repo to quickly test in your own environment.


About the Authors

Joshua Smithis a Senior Solutions Architect working with FinTech customers at AWS. He is passionate about solving high-scale distributed systems challenges and helping our fastest scaling financial services customers build secure, reliable, and cost-effective solutions. He has a background in security and systems engineering, working with early startups, large enterprises, and federal agencies.

Melissa Kwok is a Senior Neptune Specialist Solutions Architect at AWS, where she helps customers of all sizes and verticals build cloud solutions according to best practices. When she’s not at her desk you can find her in the kitchen experimenting with new recipes or reading a cookbook.

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.

  • This field is for validation purposes and should be left unchanged.


MongoDB (MDB) Outpaces Stock Market Gains: What You Should Know – Yahoo Finance

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

The most recent trading session ended with MongoDB (MDB) standing at $284.43, reflecting a +1.67% shift from the previouse trading day’s closing. The stock’s performance was ahead of the S&P 500’s daily gain of 0.39%. Meanwhile, the Dow lost 0.13%, and the Nasdaq, a tech-heavy index, added 0.6%.

Heading into today, shares of the database platform had gained 1.54% over the past month, outpacing the Computer and Technology sector’s gain of 0.59% and the S&P 500’s gain of 1.06% in that time.

Investors will be eagerly watching for the performance of MongoDB in its upcoming earnings disclosure. The company is predicted to post an EPS of $0.69, indicating a 28.13% decline compared to the equivalent quarter last year. Meanwhile, our latest consensus estimate is calling for revenue of $495.23 million, up 14.39% from the prior-year quarter.

Regarding the entire year, the Zacks Consensus Estimates forecast earnings of $2.43 per share and revenue of $1.93 billion, indicating changes of -27.03% and +14.48%, respectively, compared to the previous year.

Furthermore, it would be beneficial for investors to monitor any recent shifts in analyst projections for MongoDB. These latest adjustments often mirror the shifting dynamics of short-term business patterns. Hence, positive alterations in estimates signify analyst optimism regarding the company’s business and profitability.

Empirical research indicates that these revisions in estimates have a direct correlation with impending stock price performance. We developed the Zacks Rank to capitalize on this phenomenon. Our system takes these estimate changes into account and delivers a clear, actionable rating model.

The Zacks Rank system ranges from #1 (Strong Buy) to #5 (Strong Sell). It has a remarkable, outside-audited track record of success, with #1 stocks delivering an average annual return of +25% since 1988. The Zacks Consensus EPS estimate has moved 0.76% higher within the past month. Right now, MongoDB possesses a Zacks Rank of #3 (Hold).

With respect to valuation, MongoDB is currently being traded at a Forward P/E ratio of 115.24. This represents a premium compared to its industry’s average Forward P/E of 30.95.

We can additionally observe that MDB currently boasts a PEG ratio of 11.08. Comparable to the widely accepted P/E ratio, the PEG ratio also accounts for the company’s projected earnings growth. MDB’s industry had an average PEG ratio of 2.45 as of yesterday’s close.

The Internet – Software industry is part of the Computer and Technology sector. This industry, currently bearing a Zacks Industry Rank of 33, finds itself in the top 14% echelons of all 250+ industries.

Article originally posted on mongodb google news. Visit mongodb google news

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.

  • This field is for validation purposes and should be left unchanged.


Amazon DynamoDB announces general availability of attribute-based access control – AWS

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

Amazon DynamoDB is a serverless, NoSQL, fully managed database with single-digit millisecond performance at any scale. Today, we are announcing the general availability of attribute-based access control (ABAC) support for tables and indexes in all AWS Commercial Regions and the AWS GovCloud (US) Regions. ABAC is an authorization strategy that lets you define access permissions based on tags attached to users, roles, and AWS resources. Using ABAC with DynamoDB helps you simplify permission management with your tables and indexes as your applications and organizations scale.

ABAC uses tag-based conditions in your AWS Identity and Access Management (IAM) policies or other policies to allow or deny specific actions on your tables or indexes when IAM principals’ tags match the tags for the tables. Using tag-based conditions, you can also set more granular access permissions based on your organizational structures. ABAC automatically applies your tag-based permissions to new employees and changing resource structures, without rewriting policies as organizations grow.

There is no additional cost to use ABAC. You can get started with ABAC using the AWS Management Console, AWS API, AWS CLI, AWS SDK, or AWS CloudFormation. Learn more at Using attribute-based access control with DynamoDB.

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.

  • This field is for validation purposes and should be left unchanged.