Category: Uncategorized
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
Atria Investments Inc boosted its position in MongoDB, Inc. (NASDAQ:MDB – Free Report) by 6.6% during the 3rd quarter, according to its most recent 13F filing with the Securities and Exchange Commission (SEC). The institutional investor owned 2,175 shares of the company’s stock after purchasing an additional 135 shares during the quarter. Atria Investments Inc’s holdings in MongoDB were worth $588,000 as of its most recent filing with the Securities and Exchange Commission (SEC).
Other institutional investors and hedge funds have also bought and sold shares of the company. Principal Financial Group Inc. raised its position in shares of MongoDB by 2.7% during the 3rd quarter. Principal Financial Group Inc. now owns 6,095 shares of the company’s stock valued at $1,648,000 after buying an additional 160 shares in the last quarter. Janney Montgomery Scott LLC bought a new stake in shares of MongoDB during the 3rd quarter valued at about $861,000. Stephens Investment Management Group LLC boosted its stake in shares of MongoDB by 22.8% during the 3rd quarter. Stephens Investment Management Group LLC now owns 30,664 shares of the company’s stock valued at $8,290,000 after purchasing an additional 5,688 shares in the last quarter. US Bancorp DE boosted its stake in shares of MongoDB by 9.1% during the 3rd quarter. US Bancorp DE now owns 3,869 shares of the company’s stock valued at $1,046,000 after purchasing an additional 324 shares in the last quarter. Finally, First Trust Direct Indexing L.P. boosted its stake in shares of MongoDB by 16.0% during the 3rd quarter. First Trust Direct Indexing L.P. now owns 1,888 shares of the company’s stock valued at $510,000 after purchasing an additional 261 shares in the last quarter. 89.29% of the stock is currently owned by hedge funds and other institutional investors.
MongoDB Stock Up 1.7 %
Shares of MongoDB stock opened at $289.15 on Wednesday. The firm’s 50 day moving average price is $278.06 and its 200-day moving average price is $273.04. The company has a quick ratio of 5.03, a current ratio of 5.03 and a debt-to-equity ratio of 0.84. MongoDB, Inc. has a 52-week low of $212.74 and a 52-week high of $509.62. The company has a market cap of $21.36 billion, a P/E ratio of -94.26 and a beta of 1.15.
MongoDB (NASDAQ:MDB – Get Free Report) last released its earnings results on Thursday, August 29th. The company reported $0.70 earnings per share for the quarter, topping the consensus estimate of $0.49 by $0.21. The company had revenue of $478.11 million for the quarter, compared to the consensus estimate of $465.03 million. MongoDB had a negative net margin of 12.08% and a negative return on equity of 15.06%. MongoDB’s revenue was up 12.8% compared to the same quarter last year. During the same quarter in the previous year, the firm earned ($0.63) EPS. On average, sell-side analysts predict that MongoDB, Inc. will post -2.39 EPS for the current fiscal year.
Insider Buying and Selling at MongoDB
In related news, CAO Thomas Bull sold 154 shares of the business’s stock in a transaction on Wednesday, October 2nd. The shares were sold at an average price of $256.25, for a total value of $39,462.50. Following the completion of the transaction, the chief accounting officer now directly owns 16,068 shares in the company, valued at $4,117,425. The trade was a 0.95 % decrease in their position. The transaction was disclosed in a document filed with the SEC, which is available at the SEC website. Also, Director Dwight A. Merriman sold 3,000 shares of the business’s stock in a transaction on Wednesday, October 2nd. The shares were sold at an average price of $256.25, for a total transaction of $768,750.00. Following the completion of the transaction, the director now owns 1,131,006 shares of the company’s stock, valued at approximately $289,820,287.50. This represents a 0.26 % decrease in their ownership of the stock. The disclosure for this sale can be found here. In the last ninety days, insiders sold 25,600 shares of company stock worth $7,034,249. Company insiders own 3.60% of the company’s stock.
Analyst Upgrades and Downgrades
Several brokerages have issued reports on MDB. Needham & Company LLC lifted their target price on shares of MongoDB from $290.00 to $335.00 and gave the stock a “buy” rating in a report on Friday, August 30th. Oppenheimer upped their target price on shares of MongoDB from $300.00 to $350.00 and gave the company an “outperform” rating in a report on Friday, August 30th. Wedbush raised shares of MongoDB to a “strong-buy” rating in a research note on Thursday, October 17th. Scotiabank increased their price objective on shares of MongoDB from $250.00 to $295.00 and gave the stock a “sector perform” rating in a research note on Friday, August 30th. Finally, Stifel Nicolaus raised their target price on shares of MongoDB from $300.00 to $325.00 and gave the company a “buy” rating in a research note on Friday, August 30th. One investment analyst has rated the stock with a sell rating, five have issued a hold rating, nineteen have assigned a buy rating and one has given a strong buy rating to the stock. Based on data from MarketBeat.com, the stock presently has a consensus rating of “Moderate Buy” and a consensus price target of $336.54.
MongoDB Profile
MongoDB, Inc, together with its subsidiaries, provides general purpose database platform worldwide. The company provides MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.
Featured Articles
Want to see what other hedge funds are holding MDB? Visit HoldingsChannel.com to get the latest 13F filings and insider trades for MongoDB, Inc. (NASDAQ:MDB – Free Report).
Receive News & Ratings for MongoDB Daily – Enter your email address below to receive a concise daily summary of the latest news and analysts’ ratings for MongoDB and related companies with MarketBeat.com’s FREE daily email newsletter.
Article originally posted on mongodb google news. Visit mongodb google news
MongoDB and Microsoft expand partnership to advance AI applications and data analytics
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
Database company MongoDB Inc. today announced an expanded partnership with Microsoft Corp. that includes new integrations aimed at enhancing artificial intelligence application development, real-time data analytics and deployment flexibility.
The first integration sees MongoDB Atlas, MongoDB’s fully managed cloud database service, integrated into Microsoft Azure AI Foundry. The goal is to allow customers to build retrieval-augmented generation or RAG applications by combining MongoDB’s data capabilities with Azure OpenAI Service.
With the integration, developers can enhance large language models with proprietary data stored in MongoDB Atlas without additional coding or pipeline building, streamlining the process of creating chatbots, copilots and enterprise AI applications. Azure AI Foundry’s “Chat Playground” feature further simplifies development by enabling real-time testing of LLMs with enterprise data before deployment.
The integration offers users a way to augment generative AI models with their own data to ensure their applications are grounded in up-to-date context. The combination of MongoDB Atlas and Azure AI Foundry offers flexibility and efficiency in leveraging enterprise data for advanced AI use cases.
In the second announcement, real-time data analytics with Microsoft Fabric, MongoDB Atlas now supports Open Mirroring in Microsoft Fabric for a near real-time connection with OneLake. The capability synchronizes data between the two platforms, allowing businesses to generate timely analytics, AI predictions, and business intelligence reports.
Through enabling real-time insights, businesses can leverage MongoDB’s operational data and Microsoft Fabric’s analytics tools to drive strategic decisions and optimize performance across diverse use cases, from AI-powered predictions to reporting.
The final announcement allows users to “deploy MongoDB their way” with MongoDB Enterprise Advanced on Azure Marketplace, introducing greater flexibility for organizations deploying applications in Kubernetes environments. With Azure Arc-enabled Kubernetes, customers can deploy and self-manage MongoDB instances across on-premises, multicloud and edge environments.
“By integrating MongoDB Atlas with Microsoft Azure’s powerful AI and data analytics tools, we empower our customers to build modern AI applications with unparalleled flexibility and efficiency,”Sandy Gupta, vice president of partner development ISV at Microsoft, said in a statement.
Sahir Azam, chief product officer of MongoDB, spoke with theCUBE, SiliconANGLE Media’s live streaming studio, in May, when he discussed how the company is strengthening its database ecosystem and advancing artificial intelligence capabilities with key partners:
Image: SiliconANGLE/Ideogram
Your vote of support is important to us and it helps us keep the content FREE.
One click below supports our mission to provide free, deep, and relevant content.
Join our community on YouTube
Join the community that includes more than 15,000 #CubeAlumni experts, including Amazon.com CEO Andy Jassy, Dell Technologies founder and CEO Michael Dell, Intel CEO Pat Gelsinger, and many more luminaries and experts.
THANK YOU
Article originally posted on mongodb google news. Visit mongodb google news
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
CHICAGO, Nov. 19, 2024 — Today at Microsoft Ignite, MongoDB, Inc. announced an expanded collaboration with Microsoft that introduces three new capabilities for joint customers. First, customers building applications powered by retrieval-augmented generation (RAG) can now select MongoDB Atlas as a vector store in Microsoft Azure AI Foundry, combining MongoDB Atlas’s vector capabilities with generative AI tools and services from Microsoft Azure and Azure Open AI. Meanwhile, users looking to maximize insights from operational data can now do so in near real-time with Open Mirroring in Microsoft Fabric for MongoDB Atlas. And the launch of MongoDB Enterprise Advanced (EA) on Azure Marketplace for Azure Arc-enabled Kubernetes applications enables organizations that operate across on-premises, multi-cloud, and edge Kubernetes environments to choose MongoDB. With these capabilities, MongoDB is meeting customers where they are on their innovation journeys, and making it easier for them to unleash the power of data.
Through the strengthened MongoDB-Microsoft relationship, customers will be able to:
- Enhance LLMs with proprietary data stored in MongoDB Atlas: Accessible through Azure AI Foundry, the Azure OpenAI Service allows businesses to develop RAG applications with their proprietary data in combination with the power of advanced LLMs. This new integration with Azure OpenAI Service enables users to take enterprise data stored in MongoDB Atlas and augment LLMs with proprietary context. This collaboration makes it easy to build unique chatbots, copilots, internal applications, or customer-facing portals that are grounded in up-to-date enterprise data and context. Developers are now able to add MongoDB Atlas as a vector data store for advanced LLMs, all without the need for additional coding or pipeline building. And through Azure AI Foundry’s “Chat Playground” feature, developers can quickly test how their enterprise data and selected LLM function together before taking it to production.
- Generate key business insights faster: Microsoft Fabric empowers businesses to gather actionable insights from their data on an AI-powered Analytics platform. Now Open Mirroring in Microsoft Fabric with MongoDB Atlas will allow for a near real-time connection, to keep data in sync between MongoDB Atlas and OneLake in Microsoft Fabric. This enables the generation of near real-time analytics, AI-based predictions, and business intelligence reports. Customers will be able to seamlessly take advantage of each data platform without having to choose between one or the other, or without worrying about maintaining and replicating data from MongoDB Atlas to OneLake.
- Deploy MongoDB Their Way: The launch of MongoDB EA on Azure Marketplace for Azure Arc-enabled Kubernetes applications gives customers greater flexibility when building applications across multiple environments. With MongoDB EA, customers are able to deploy and self-manage MongoDB database instances in the environment of their choosing, including on-premises, hybrid, and multi-cloud. The MongoDB Enterprise Kubernetes Operator, part of the MongoDB Enterprise Advanced offering, enhances the availability, resilience, and scalability of critical workloads by deploying MongoDB replica sets, sharded MongoDB clusters, and the Ops Manager tool across multiple Kubernetes clusters. Azure Arc further complements this by centrally managing these Kubernetes clusters running anywhere—in Azure, on premises, or even in other clouds. Together, these capabilities ensure that customers can build robust, distributed applications by leveraging the resilience of a strong data layer along with the central management capabilities that Azure Arc offers for its Arc-enabled Kubernetes applications.
“We frequently hear from MongoDB’s customers and partners that they’re looking for the best way to build AI applications, using the latest models and tools.” said Alan Chhabra, Executive Vice President of Partners at MongoDB. “And to address varying business needs, they also want to be able to use multiple tools for data analytics and business insights. Now, with the MongoDB Atlas integration with Azure AI Foundry, customers can power gen AI applications with their own data stored in MongoDB. And with Open Mirroring in Microsoft Fabric, customers can seamlessly sync data between MongoDB Atlas and OneLake for efficient data analysis. Combining the best from Microsoft with the best from MongoDB will help developers push applications even further.”
Joint Microsoft and MongoDB customers and partners welcome the expanded collaboration for greater data development flexibility.
Trimble, a leading provider of construction technology, delivers a connected ecosystem of solutions to improve coordination and collaboration between construction teams, phases and processes.
“As an early tester of the new integrations, Trimble views MongoDB Atlas as a premier choice for our data and vector storage. Building RAG architectures for our customers require powerful tools and these workflows need to enable the storage and querying of large collections of data and AI models in near real-time,” said Dan Farner, Vice President of Product Development at Trimble. “We’re excited to continue to build on MongoDB and look forward to taking advantage of its integrations with Microsoft to accelerate our ML offerings across the construction space.”
Eliassen Group, a strategic consulting company that provides business, clinical, and IT services, will use the new Microsoft integrations to drive innovation and provide greater flexibility to their clients.
“We’ve witnessed the incredible impact MongoDB Atlas has had on our customers’ businesses, and we’ve been equally impressed by Microsoft Azure AI Foundry’s capabilities. Now that these powerful platforms are integrated, we’re excited to combine the best of both worlds to build AI solutions that our customers will love just as much as we do,” said Kolby Kappes, Vice President – Emerging Technology, Eliassen Group.
Available in 48 Azure regions globally, MongoDB Atlas provides joint customers with the powerful capabilities of the document data model. With versatile support for structured and unstructured data, including Atlas Vector Search for RAG-powered applications, MongoDB Atlas accelerates and simplifies how developers build with data.
“By integrating MongoDB Atlas with Microsoft Azure’s powerful AI and data analytics tools, we empower our customers to build modern AI applications with unparalleled flexibility and efficiency,” said Sandy Gupta, VP, Partner Development ISV, Microsoft. “This collaboration ensures seamless data synchronization, real-time analytics, and robust application development across multi-cloud and hybrid environments.”
To read more about MongoDB Atlas on Azure go to https://www.mongodb.com/products/platform/atlas-cloud-providers/azure.
About MongoDB
Headquartered in New York, MongoDB’s mission is to empower innovators to create, transform, and disrupt industries by unleashing the power of software and data. Built by developers, for developers, MongoDB’s developer data platform is a database with an integrated set of related services that allow development teams to address the growing requirements for a wide variety of applications, all in a unified and consistent user experience. MongoDB has more than 50,000 customers in over 100 countries. The MongoDB database platform has been downloaded hundreds of millions of times since 2007, and there have been millions of builders trained through MongoDB University courses. To learn more, visit mongodb.com.
Source: MongoDB
Article originally posted on mongodb google news. Visit mongodb google news
Microsoft supercharges Fabric with new data tools to accelerate enterprise AI workflows
MMS • RSS
Posted on nosqlgooglealerts. Visit nosqlgooglealerts
Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More
Today, Microsoft kicked off its Ignite conference, talking about all things AI, including how it has assembled the largest AI agent ecosystem and will allow enterprises to build more such apps using any of the 1,800 large language models it has on offer.
The move — a significant departure from the long-standing reliance on OpenAI — promises enhanced flexibility to developers, but we all know AI is just ‘garbage in and garbage out’ without a solid data foundation.
To this end, Microsoft also announced a series of updates for Fabric, its end-to-end SaaS data platform. According to Arun Ulag, the corporate VP of Azure data, the biggest development is the integration of transactional databases, which will transform Fabric into a truly unified, open data platform bringing all the necessary technologies together in one place to build next-gen AI applications, including advanced agents.
Other notable capabilities, some of which are being previewed while others are generally available, touch on different aspects of how Fabric operates, including data connectivity, workload performance, scalability, security and governance.
“We are relatively early in the AI journey… There are many more customers, business users, and developers that can take advantage of these technologies. And as they take advantage of these tools, we have to evolve them. We have to drive costs further down and make sure that we further accelerate business value. Ultimately, all of this should translate into higher GDP growth for countries and stronger business outcomes for customers,” Ulag said in an interview with VentureBeat.
Transactional database integration for fast-tracked AI development
Microsoft launched Fabric last year as a SaaS-based data and analytics platform to bring its innovations across the data stack in one place. The unified offering leveraged several tools the company built over the years, including SQL Server, Excel, Power BI and Azure Synapse, and provided teams with an end-to-end experience to connect, manage and analyze large structured and unstructured data assets.
At the core, Fabric is underpinned by an open lakehouse architecture called OneLake. It serves as a central, multi-cloud repository that supports various open data formats (Apache Parquet, Delta Lake and Iceberg) and the downstream analytical workloads. In the last few months, both Fabric and OneLake have received several improvements, including Real-Time Intelligence – which now becomes generally available – for analyzing streaming logs, IoT and telemetry as well as tools for migrating data from other data environments.
However, running analytical workloads to identify trends and patterns is just one piece of the puzzle. AI is the real deal today, and for that, the users need to go beyond aggregated, historical data. To help with this, Microsoft has announced Fabric Databases, which will see different transactional databases plug into OneLake, allowing users to access both live data from transactional systems (think individual purchase or login events) and bulk analytical data through one unified layer.
The company is starting with the integration of its own Azure SQL database and will follow up with other transactional databases including including Cosmos DB (its NoSQL document database behind ChatGPT), PostgreSQL, MongoDB and Cassandra. It hopes the move will save developers from complex database integrations and enable them to power next-gen AI apps, managing billions of interactions daily.
“Built-in vector search, RAG support, and Azure AI integration simplify AI app development, and your data is instantly available in OneLake for advanced analytics. Developers can even use Copilot in Fabric to translate natural language queries into SQL and get inline code completion alongside code fixes and explanations,” Ulag noted in a blog post today.
OneLake catalog, new AI features and more
In addition to transactional databases, Fabric is getting a new OneLake catalog to make it easier for teams to explore, manage and govern their entire Fabric data estate, no matter where the information has come from, as well as several AI capabilities to accelerate workflows.
The catalog, as Ulag wrote in the blog, carries two main tabs: Explore and Govern. The former is generally available and will help teams discover and manage their trusted data. Meanwhile, the Govern tab, aimed at providing data owners with valuable insights, tools and recommendations for governing their data, is in preview at this stage. These features will ensure that the teams are aware of what’s going on across the platform, without running into any surprises.
On the AI front, Microsoft is now previewing AI functions in Fabric notebooks, providing a simplified API for common AI text enrichments like summarization, translation, sentiment analysis, and more. The company is also enhancing AI skills (preview), which allow users to build agents that can be pointed to query any data across multiple systems via natural language. Ulag wrote AI skills now have an improved conversational experience. Plus, they can now connect to semantic models and Eventhouse KQL databases, going beyond lakehouse and data warehouse tables, mirrored DB and shortcut data.
Among other notable updates, Microsoft announced the general availability of API for GraphQL to allow efficient querying of multiple data sources using the widely adopted GraphQL technology; support for new events and simplified dashboard sharing in Real-Time Intelligence; and preview of open mirroring, a feature that allows any application or data provider to write data changes directly into a mirrored database within Fabric. It also confirmed the general availability of Azure SQL DB mirroring and the preview of SQL managed instance mirroring.
Finally, Fabric users will also get workspace monitoring and surge protection in preview. The former will provide detailed diagnostic logs for troubleshooting performance issues, capacity performance and data downtime, while the latter will prevent background jobs from starting after a set threshold.
Microsoft Ignite runs from November 19 to November 22, 2024
MMS • RSS
Posted on nosqlgooglealerts. Visit nosqlgooglealerts
Ignite A new version of Microsoft’s database warhorse, SQL Server, is on the way, with some useful improvements squeezed between the inevitable artificial intelligence additions.
New in SQL Server 2025 will be performance and availability enhancements lifted from Azure SQL. According to a Microsoft spokesperson, there’s optimized locking, optional parameter plan optimization, faster batch mode, and columnstore indexing in the release. There is also REST API support alongside Regular Expression enablement.
“Additionally, native JSON support enables developers to more effectively deal with frequently changing schema and hierarchical data, facilitating the creation of more dynamic applications,” the spokesperson said.
There’s support for Entra managed identities, which Microsoft says will improve credential management and compliance, and failover reliability has also been enhanced. And, of course, Copilot is in SQL Server Management Studio to “streamline SQL development by offering real-time suggestions, code completions, and best practice recommendations.”
Unsurprisingly, Microsoft is going all-in with AI in this release. “SQL Server 2025 has AI built-in, simplifying AI application development and retrieval-augmented generation (RAG) patterns with secure, performant, and easy-to-use vector support, leveraging the T-SQL language,” the company said.
“In this latest SQL Server version, flexible AI model management within the engine using REST interfaces allows our customers to use AI models from ground to cloud.”
Microsoft SQL Server is just over 35 years old – older, if one considers its Sybase origins – and the most recent release, SQL Server 2022, will remain in mainstream support until January 11, 2028. Extended support will go to January 11, 2033. The spokesperson told us that SQL Server 2025 would likely follow Microsoft’s Fixed Lifecycle policy, with five years of mainstream support followed by another five years of extended support.
Assuming SQL Server 2025 makes it to general availability in 2025 – it is currently in Private Preview – this translates to support until at least 2035.
If SQL Server 2022 was all about making everything “Azure-enabled,” SQL Server 2025 reflects Microsoft’s obsession with AI. “SQL Server 2025 transforms SQL Server into an enterprise AI-ready database, bringing AI to customers’ data in a secure, efficient manner,” the spokesperson said.
“This release continues SQL Server’s legacy of impressive performance and security, adding new features and AI assistance that optimizes customer data for the era of AI.”
As before, the company was tight-lipped on costs, although pay-as-you-go licensing for on-premises customers is available with Azure Arc integration.
It is hard to say if this might be the last hurrah for SQL Server. Microsoft has various alternative database options these days, and hybrid and cloud-based services. But there will always be customers who want to keep their data out of the cloud and firmly on-premises.
The spokesperson was non-committal: “The SQL Server schedule is dependent on industry trends, customer feedback, and our strategic vision. We will continue to evaluate SQL Server releases according to these factors as time continues.” ®
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
The database and development platform provider is announcing a number of initiatives at Microsoft Ignite this week that make it easier for customers and partners to work with MongoDB on Azure cloud.
MongoDB is extending the scope of integrations between its cloud database development platform and Microsoft Azure, a move the company says will make it easier for partners and customers to build real-time data analytics links and develop generative AI applications.
In a series of announcements today at this week’s Microsoft Ignite conference, MongoDB is integrating the MongoDB Atlas cloud database with Microsoft’s Azure OpenAI services and launching its MongoDB Enterprise Advanced database management tools on the Azure Marketplace.
MongoDB said the new integrations will provide partners and customers with greater flexibility in data development on Azure – particularly to help meet the exploding demand for data for AI and generative AI applications.
[Related: MongoDB CEO Ittycheria: AI Has Reached ‘A Crucible Moment’ In Its Development]
“I think the pace is phenomenal, things are changing daily,” said Alan Chhabra, MongoDB executive vice president of worldwide partners, speaking in an interview with CRN about the rapid growth of AI and GenAI development. He said experimentation with GenAI, especially within larger enterprises, “is through the roof.”
Despite competing with Microsoft and its Azure Cosmos database, MongoDB has been steadily expanding its alliance with Microsoft – along with its partnerships with Amazon Web Services and Google Cloud – in recent years.
Last year MongoDB extended its multi-year strategic partnership with Microsoft, committing to a broad range of initiatives including close cooperation between the two companies’ sales teams and making it easier to migrate database workloads to MongoDB Atlas on Azure. That followed steps in 2022 that allowed developers to work with MongoDB Atlas through the Azure Marketplace and Azure Portal.
“Microsoft has become our fastest growing partnership,” Chhabra said, noting how MongoDB and Microsoft sales representatives cooperate in selling MongoDB for Azure, particularly for AI and GenAI development.
At the Ignite event Tuesday MongoDB announced that customers building applications powered by retrieval-augmented generation (RAG) can now select MongoDB Atlas as a vector store in the Microsoft Azure AI Foundry, combining MongoDB Atlas’s vector capabilities with generative AI tools and services from Microsoft Azure and Azure Open AI Service.
That makes it easier for customers to enhance large language models (LLMs) with proprietary data and build unique chatbots, copilots, internal applications, or customer-facing portals that are grounded in up-to-date enterprise data and context, the company said.
Chhabra said the new capabilities are designed to help customers develop and deploy GenAI applications. “It’s not easy. There’s a lot of confusion. There’s also a lot of experimentation, because everyone knows they need to use it [but] they’re not sure how.
“This integration will make it way easier and seamless for customers to deploy RAG applications leveraging their proprietary data in the combination of their LLMs,” Chhabra said.
In May MongoDB launched the MongoDB AI Applications Program (MAAP) that provides a complete technology stack, services and other resources to help businesses develop and deploy at scale applications with advanced generative AI capabilities.
Chhabra said MongoDB systems integration and consulting partners will benefit from the new integrations “because we’re making it easier for them to deploy Gen AI pilots and help them take it to production for customers.”
While large enterprises are conducting lots of AI development and experimentation in-house, Chhabra said SMBs are looking for more complete packaged AI and GenAI solutions.
“I believe there’s a large play for ISV application [developers] who are building purpose-built GenAI applications in the cloud on Azure, leveraging the MongoDB stack, leveraging our MAAP program,” Chhabra said. “So instead of customers having to build, they can buy GenAI solutions. When big companies like Microsoft work with cutting-edge growing companies like MongoDB, we make it easier for customers and partners to deploy GenAI [and] the whole ecosystem benefits.”
In another announcement at Ignite, MongoDB said users looking to maximize insights from operational data can now do so in near real-time with Open Mirroring in Microsoft Fabric for MongoDB Atlas. That connection keeps data in sync between MongoDB Atlas and OneLake in Microsoft Fabric, enabling the generation of near real-time analytics, AI-based predictions, and business intelligence reports, according to MongoDB.
And the announced launch of MongoDB Enterprise Advanced on Azure Marketplace for Azure Arc-enabled Kubernetes applications gives customers more flexibility to build and operate applications across on-premises, hybrid, multi-cloud, and edge Kubernetes environments.
Eliassen Group, a Reading, Mass.-based strategic consulting company that provides business, clinical, and IT services, will use the new Microsoft integrations to drive innovation and provide greater flexibility to their clients, MongoDB said.
“We’ve witnessed the incredible impact MongoDB Atlas has had on our customers’ businesses, and we’ve been equally impressed by Microsoft Azure AI Foundry’s capabilities. Now that these powerful platforms are integrated, we’re excited to combine the best of both worlds to build AI solutions that our customers will love just as much as we do,” said Kolby Kappes, vice president – emerging technology, at Eliassen Group, in a statement.
The new extensions to the Microsoft alliance come a little more than a month after MongoDB debuted MongoDB 8.0, a significant update to the company’s core database that offered improved scalability, optimized performance and enhanced enterprise-grade security.
Article originally posted on mongodb google news. Visit mongodb google news
MMS • Meryem Arik
Article originally posted on InfoQ. Visit InfoQ
Transcript
Arik: I’m Meryem. I’m co-founder and CEO of TitanML. My background is, I was a physicist, originally, turned banker, then turned AI, but always really interested in emerging tech. We at TitanML built the infrastructure to make serving LLMs efficiently, much better. I’m going to frame today through a conversation that I had at a wedding last summer. No one really understands what we do, at least they didn’t before ChatGPT came out. They’re starting to now. I always find myself having to have this conversation over again. Fortunately, it wasn’t actually me having this conversation. It was my co-founder who I was at the wedding with, we’re all university friends. This is the conversation. Russell, he’s also a friend of mine from university. He’s a data scientist at a hedge fund. Really smart guy. This is Jamie. He is my co-founder. He’s our chief scientist. He essentially is the person that makes our inference server really fast.
Outline
What I’m going to do is I’m firstly going to explain why LLM deployment is hard, because a lot of people don’t necessarily appreciate that it is. Then I’m going to give an assortment, I think it’s seven, that I landed on, tips, tricks, and techniques for better LLM deployments.
Why is LLM (AI) Deployment Hard?
We’ll start with this conversation. Typically, it’s like, what have you been up to? Then he’s like, I’ve been working on making LLM serving more easy. Then he says, is LLM deployment even hard, don’t I just call the OpenAI API? Then he’s like, sort of. Because everyone, when they think of LLMs, just thinks of OpenAI. APIs are really easy to call. You might be like, why is she even here talking? I can’t just call the OpenAI API. Everyone here knows how to do that. However, there are more than one ways that you can access LLMs. You can use hosted APIs. I have a bunch of them here, OpenAI, Cohere, Anthropic, AI21 Labs. These are all situations where they’ve done the hosting for you and they’ve done the deployment for you. All you have to do is call into them. I don’t want to minimize it too much, because there’s still complexity you have there. You still have to do things like hallucination reduction, but they’ve done a lot of the heavy lifting. For a lot of use cases, you might want to self-host. This is when you’re calling into like a Mistral, or you’re hosting a Llama, or one of the others. Essentially, you’re hosting it in your own environment, whether that’s VPC or on-prem environment.
He’s like, but why would I want to self-host anyway? To which we say, lots of reasons. There’s broadly three reasons why you might want to self-host. Firstly, there’s decreased cost at scale. It is true that if you’re just doing proof of concepts, then OpenAI API based models are much cheaper. If you’re deploying at scale, then self-hosting ends up being much cheaper. Why does it become much cheaper? Because you only have one problem to solve, which is your particular business problem. You’re able to use much smaller models to solve the same problem. Whereas OpenAI, they’re hosting a model that has to solve both coding and also writing Shakespeare, so they have to use a much bigger model to get the same output.
At scale, it’s much cheaper to use self-hosted models. Second reason why you might want to self-host is you have improved performance as well. When you’re using a task specific LLM, or you fine-tuned it, or you’ve done something to make it very narrow to your task, you end up typically getting much better performance. Here’s a couple of snippets from various blogs, although I think they’re a bit old now, but the point still stands. Then the third reason, which is why most of our clients self-host, which is privacy and security. If you’re part of a regulated industry maybe for GDPR reasons, or your compliance team, then you might have to self-host as well. These are the three main reasons why you should self-host. If these aren’t important to you, use an API.
Typically, we find that the reasons why enterprises care about open source, and I have, I think, a couple graphs from a report by the VC, a16z. The three main reasons are control, customizability, and cost. The biggest one by far is control. Being able to have that AI independence, that if OpenAI decides to fire its CEO again, that you will still have access to your models, which is important, especially if you’re building really business important applications. The majority of enterprises also seem to agree that these reasons are important to them. The vast majority of enterprises, apart from 18%, expect to shift to open source, either now or when open sources matches the performance of a GPT-4 quality model. If you are looking to self-host, you are very much not alone, and most enterprises are looking to build up that self-hosted capability.
Russell, he works at a hedge fund, he’s like, privacy is really important for my use case, so it makes sense to self-host. How much harder can it really be? I hear this all the time, and it infuriates me. The answer is a lot harder. You really shouldn’t ignore the complexity that you can’t see. When you call an API based model, you benefit from all of the hard work that their engineers have done under the hood to build that inference and serving infrastructure. In fact, companies like OpenAI have teams of 50 to 100 managing this infra. Things like model compression, like Kubernetes, batching servers, function calling, JSON forming, runtime engines, are all the things you don’t have to worry about when you’re using the API based model, but you do suddenly have to worry about when you’re self-hosting.
He’s like, but I deploy ML models all the time. You might have been deploying XGBoost models or linear regression models in the past. How much harder can it really be to deploy these LLMs? To which we say, do you know what the L stands for? It’s way harder to deploy these models. Why? The first L in LLM stands for large language model. I remember when we started the company, we thought a 100 million parameter BERT model was large. Now a 7 billion parameter model is considered small, but that is still 14 gig, and that is not small. GPUs are the second reason why it is much harder. GPUs are much harder to work with than CPUs. They’re much more expensive, so using them efficiently really matters. Doesn’t really matter if you don’t use your CPUs super efficiently, because they’re a couple orders of magnitude cheaper.
That cost, latency, performance tradeoff triangle that we sometimes talk about is really stark with LLMs in a way that it might not have been previously. The third reason why it’s really hard is the field is evolving crazy fast. Half of the techniques that we use to serve and deploy and optimize models didn’t exist a year ago. Another thing that I don’t have here, but maybe it’s worth mentioning, is also the orchestration element. Typically, with these large language model applications, you have to orchestrate a number of different models. RAG is a perfect example of this. You have to orchestrate in the very classic sense, an embedding model and a generation model. If you’re doing state of the art RAG, you’ll probably need a couple models for your parses, maybe an image model and a table model, and then you’ll need a reranker. Then you end up with five or six different models. That gets quite confusing. Plus, there’s all the other reasons why deploying applications is hard, like scaling and observability.
Tips to Make LLM Deployment Less Painful
He then says something like, that sounds really tricky. What can I do? Then Jamie says, “Luckily, Meryem has some tips and tricks that make navigating LLM deployment much easier.” That’s what exactly he said. We’ll go through my tips to make LLM deployment less painful. It’ll still suck, and it’ll still be painful, it might be less painful.
1. Know Your Deployment Boundaries
My first tip is that you should know your deployment boundaries. You should know your deployment boundaries when you’re building the application. Typically, people don’t start thinking about their deployment boundaries until after they’ve built an application that they think works. We think that you should spend time thinking about your requirements first. It’ll make everything else much easier. Thinking about stuff like, what are your latency requirements? What kind of load are you expecting? Are you going to be deploying an application that might have three users at its peak, or is this going to be the kind of thing like DoorDash, where you’re deploying to 5 gazillion users? What kind of hardware do you have available? Do you need to deploy on-prem, or can you use cloud instances? If you have cloud instances, what kind of instances do you have to have?
All of these are the kind of things that you should map out before. You might not know exactly, so it’s probably a range. It is acceptable if my latency is below a second, or above X amount. It’s just good things to bear in mind. Other things that I don’t have here is like, do I need guaranteed JSON outputs? Do I need guaranteed regex outputs? These are the kinds of things that we should bear in mind.
2. Always Quantize
If you have these mapped out, then all of the other decisions will be made much easier. This goes on to my next point, which is, always quantize. I’ll tell you why it links to my first point earlier. Who knows who Tim Dettmers is? This guy is a genius. Who knows what quantization is? Quantization is essentially model compression. It’s when you take a large language model and you reduce the precision of all of the weights to whatever form you want. 4-bit is my favorite form of quantization, going from an FP32. The reason why it’s my favorite is because it’s got a really fantastic accuracy compression tradeoff. You can see here, in this we have accuracy versus model bits, so the size of the model. Let’s say the original is FP16. It’s actually not, it’s normally 32.
That’s your red line there. We can see that when we compress the model down, we’ll go 10 to the 10, for a given resource size, you can see that the FP16, red line, is actually the worst tradeoff. You’re way better off using a FP8 or an INT4 quantized model. What this graph is telling you is that for a fixed resource, you’re way better off having a quantized model of the same size than the unquantized model. We start with the infra and we work backwards. Let’s say we have access to L40S, and we have that much VRAM. Because I know my resources that I’m allowed, I can look at the models that I have available to me, and then work backwards. I have 48 gigs of VRAM. I have a Llama 13 billion, so that’s 26. That’s all good. That fits. I have a Mixtral which is current state of the art for open-source models. That’s not going to work.
However, I have a 4-bit quantized Mixtral which does fit, which is great. I now know which models I can even pick from, and I can start experimenting with. That graph that I showed you earlier with Tim Dettmers, that tells me that my 4-bit model will be better performing, probably. Let’s say my Llama was also the same size, my 4-bit model will be better performing than my Llama model, because my model retains a lot of that accuracy from when it was really big and compressed down. We start with our infra and work backwards. We essentially find the resources that we can fit, and then find the 4-bit quantized model that’ll fit in those resources. The chances are that’s probably the best accuracy that you can get for that particular model.
3. Spend Time Thinking About Optimizing Inference
Tip number three, spend a little bit of time thinking about optimizing inference. The reason why I tell people spend just a little bit of time optimizing inference is because the naive things that you would do when you’re deploying these models is typically completely the wrong thing to do. You don’t need to spend a huge amount of time thinking about this, but just spending a little bit of time can make multiple orders of magnitude difference to GPU utilization. I can give one example of this, batching strategies. Essentially, batching is where multiple requests are processed in parallel. The most valuable thing when you’re deploying these models that you have is your GPU utilization. GPUs, I think I said earlier, are really expensive, so it’s very important that we utilize them as much as we can. If I’m doing no batching, then this is more or less the GPU utilization that I’ll get, which is pretty bad. The naive thing to do would either be to do no batching or dynamic batching.
Dynamic batching is the standard batching method for non-Gen AI applications. It’s the kind of thing that you might have built previously. The idea is that you wait a small amount of time before starting to process a request. Group any of those requests that arrive during that time, and then process them together. In generative models, this leads to a downtime in utilization. You can see that it starts really high and then it goes down, because users will get stuck in the queue waiting for longer generations to finish. Dynamic batching is something that you might try naively, but it actually tends to be a pretty bad idea. If you spend a little bit of time thinking about this, you can do something like continuous batching. This is what we do.
This is a GPU utilization graph that we got a couple weeks ago, maybe. This the state-of-the-art batching technique designed for generative models. You let incoming requests interrupt in-flight requests in order to keep that GPU utilization really high. You get much less queue waiting, and much higher resource utilization as well. You can see going from there to there is maybe one order of magnitude difference in GPU costs, which is pretty significant. I’ve not done anything to the model, nothing will impact accuracy there.
Second example I can give you is with parallelism strategies. For really large models, you often can’t inference them on a single GPU. For example, a Llama 70 billion, or a Mixtral, or a Jamba, for example, they’re really hefty models. Often, I’ll need to split them across multiple GPUs in order to be able to inference them. You need to be able to figure out how you’re going to essentially do that multi-GPU inference. The naive way to do this, and actually this is probably the most popular way to do this, in fact, common inference libraries like Hugging Face’s Accelerate, does this, is you split the model layer by layer. It was a 90-gigabyte model. I have 30 on one, 30 on one, and then 30 on the third GPU. At any one time only one GPU is active, which means that I’m paying for essentially three times the number of GPUs that I’m actually using at any one time.
That’s just because I split them in this naive way, because my next GPU is having to wait for my previous GPU. That’s really unideal. This is what happens in Hugging Face Accelerate library, if you want to look into that. Tensor Parallel is what we think is the best one, which is, you essentially split the model lengthwise so that every GPU can be fully utilized at the same time for each layer, so it makes inference much faster, and you can support arbitrarily large models as well with enough GPUs. Because at every single point, all of your GPUs are firing, you don’t end up paying for that extra resource. In this particular example, we’ve got, for this particular model, a 3x model, a GPU utilization improvement. Combining that with the order of magnitude we had before, that’s a really significant GPU utilization improvement. It’s not a huge amount of time to think about this, but if you just spend that little bit of time, then you might end up improving what you can put on those GPUs.
4. Consolidate Infrastructure
What have I done so far? I’ve done, think about your deployment requirements, quantize, inference optimization. Fourth one is, consolidate your infrastructure. Gen AI is so computationally expensive that it really benefits from consolidation of infrastructure, and that’s why central MLOps teams like Ian runs, make a lot of sense. For most companies, ML teams tend to work in silos, and therefore are pretty bad at consolidation of infrastructure. It wasn’t really relevant for previous ML sources. Deployment is really hard, so it’s better if you deploy once, you have one team managing deployment, and then you maintain that, rather than having teams individually doing that deployment, because then each team individually has to discover that this is a good tradeoff to make. What this allows is it allows the rest of the org to focus on that application development while the infrastructure is taken care of.
I can give you an example of what this might look like. I will have a central compute infrastructure, and maybe as a central MLOps team, I’ve decided that my company can have access to these models, Llama 70, Mixtral, and Gemma 7B. I might periodically update the models and improve the models. For example, when Llama 7 comes out, instead of Llama 2, I might update that. These are the models that I’ll host centrally. Then all of those little yellow boxes are my application development teams. They’re my dispersed teams within the org. Each of them will be able to get access to my central compute infrastructure, and personalize it in the way that works for them. One of them might add a LoRA, which is essentially a little adapter that you can add to your model when you fine-tune it. It’s very easy to firstly do, and then also add into inference. Then maybe I’ll add RAG as well. RAG is when we give it access to our proprietary data, so our vector store, for example.
I have each of my application teams building LoRA’s RAGs, LoRA’s RAGs. Maybe I don’t even need LoRAs, and I can just do prompt engineering, for example, and my central compute is all managed by one team, and it’s just taken care of. The nice thing about this is what you’re doing is you’re giving your organization the OpenAI experience, but with private models. If I’m an individual developer, I don’t think about the LLM deployment. Another team manages it. It sits there, and I just build applications on top of the models we’ve been given access to. This is really beneficial. Things to bear in mind. Make sure your inference server is scalable. LoRA adapter support is super important if you want to allow your teams to fine-tune. If you do all of this, you’ll get really high utilization of GPUs. Because, remember, GPU utilization is literally everything. I say literally everything. There’s your friends, and there’s your family, and then there’s GPU utilization. If we centrally host this compute, then we’re able to get much higher utilization of those very precious GPUs.
I can give you a case study that we did with a client, RNL, it’s a U.S. enterprise. What they had before was they had four different Gen AI apps. They were pretty ahead at the time. They built all of this last year. Each app was sitting on its own GPU, because they were like, they’re all different applications. They’ve all got their own Embedders, their own thing going on. They gave them each their own GPUs, and as a result, got really poor GPU utilization, because not all the apps were firing all the time. They weren’t all firing at capacity. What we did with them is something like this. It doesn’t have to be Titan, it can be any inference server. They had Mixtrals and Embedders, essentially, is all they had. We hosted a Mixtral and an Embedder on one server and exposed those APIs. The teams then built on top of those APIs, sharing that resource. Because they were sharing the resource, they could approximately half the number of GPUs that they needed. We were able to manage both the generative and the non-generative in one container. It was super easy for those developers to build on top of. That’s the kind of thing that if you have a central MLOps team, you can do, and end up saving a lot of those GPU times.
5. Build as if You Are Going to Replace the Models Within 12 Months
My fifth piece of advice is, build as if you’re going to replace the models within 12 months, because you will. One of our clients, they deployed their first application with Llama 1 last year. I think they changed the model about four times. Every week they’re like, this new model came out. Do you support it? I’m like, yes, but why are you changing it for the sixth time? Let’s think back to what state of the art was a year ago. A year ago, maybe Llama had come out by then, but if before that, it might have been the T5s. The T5 models were the best open-source models. What we’ve seen is this amazing explosion of the open-source LLM ecosystem. It was all started by Llama and then Llama 2, and then loads of businesses had built on top of that.
For example, the Mistral 70B was actually built with the same architecture that Llama was. We had the Falcon out of the UAE. We had Mixtral by Mistral. You have loads of them, and they just keep on coming out. In fact, if you check out the Hugging Face, which is where all of these models are stored, if you check out their leaderboard of open-source models, the top model changes almost every week. Latest and greatest models come out. These models are going to keep getting better. This is the performance of all models, both open source and non-open source, as you can see the license, proprietary or non-proprietary. The open-source models are just slowly scaling that leaderboard. We’re starting to get close to parity between open source and non-open source. Right now, the open-source models are there or thereabouts, with GPT-3.5. That was the original ChatGPT that we were all amazed by.
My expectation is that we’ll get to GPT-4 quality within the next year. What this means is that you should really not wed yourself to a single model or a single provider. Going back to that a16z report that I showed you earlier, most enterprises are using multiple model providers. They’re building their inference stack in a way that it’s interoperable, in a way that if OpenAI has a meltdown, I can swap it out for a Llama model. Or, in a way that if Claude is now better than GPT-4 as it is now, I can swap them really easily. Building with this interoperability in mind is really important. I think one of the greatest things that OpenAI has blessed us with is not necessarily their models, although they are really great, but they have actually counterintuitively democratized the AI landscape, not because they’ve open sourced their models, because they really haven’t, but because what they’ve done is they’ve provided uniformity of APIs to the industry. If you build with the OpenAI API in mind, then you’ll be able to capture a lot of that value and be able to swap models in and out really easily.
What does this mean for how you build? API and container-first development makes life much easier. It’s fairly standard things. Abstraction is really good, so don’t spend time building custom infrastructure for your particular model. The chances are you’re not going to use it in 12 months. Try and build more general infra if you’re going to. We always say that at this current stage where we’re still proving value of AI in a lot of organizations, engineers should spend their time building great application experiences rather than fussing with infrastructure. Because right now, for most businesses, we’re fortunate enough to have a decent amount of budget to go and play and try out this Gen AI stuff.
We need to prove value pretty quickly. We tend to say, don’t work with frameworks that don’t have super wide support for models. For example, don’t work with a framework that only works with Llama, for example, because it’ll come back to bite you. Whatever architecture you pick or infrastructure you pick, making sure that when Llama 3, 4, 5, Mixtral, Mistral comes out, they will help you adopt it. I can go back to this case study that I talked about before. We built this in a way, obviously, that it’s super easy to swap that Mixtral for Llama 3, when Llama 3 comes out. For example, if a better Embedder comes out, like a really good Embedder came out a couple weeks ago, we can swap that out easily too.
6. GPUs Look Really Expensive, Use Them Anyway
My sixth one, GPUs look really expensive. You should use them anyway. GPUs are so phenomenal. They are so phenomenally designed for Gen AI and Gen AI workloads. Gen AI involves doing a lot of calculations in parallel, and that happens to be the thing that GPUs are incredibly good at. You might look at the sticker price and be like, it’s 100 times more expensive than a CPU. Yes, it is, but if you use it correctly and get that utilization you need out of it, then you’ll end up processing orders of magnitude more, and per request, it will be much cheaper.
7. When You Can, Use Small Models
When you can, use small models. GPT-4 is king, but you don’t get the king to do the dishes. What the dishes are: GPT-4 is phenomenal. It’s a genuinely remarkable piece of technology, but the thing that makes it so good is also that it is so broad in terms of its capabilities. I can use the GPT-4 model to write love letters, and you can use it to become a better programmer, and we’re using the exact same model. That is mental. That model has so many capabilities, and as a result, it’s really big. It’s a huge model, and it’s very expensive to inference. What we find is that you tend to be better off using GPT-4 for the really hard stuff that none of the open-source models can do yet, and then using smaller models for the things that are easier. You can massively reduce cost and latency by doing this. When we talked about that latency budget that you had earlier, or those resource budgets that you had earlier, you can go a long way to maximizing that resource budget if you only use GPT-4 when you really have to.
Three commonly seen examples are like RAG Fusion. This is when your query is edited by a large language model, and then all queries are searched against, and then the results are ranked to improve the search quality. For example that, you can get very good results by not using GPT-4, only using GPT-4 when you have to. You might, for example, with RAG, use a generative model just to do the reranking, so just check at the end that the thing that my Embedder said was relevant, was really relevant. Small models, especially fine-tuned models for things like function calling are really good. One of the really common use cases for function calling is if I need my model to output something like JSON or regex, there are broadly two ways that I could do this. Either I could fine-tune a much smaller model, or I could add controllers to my small model. A controller is really cool. A controller is essentially when, if I’m self-hosting the model, I can ban my model from saying any tokens that would break a JSON schema or that would break a regex schema that I don’t want. Stuff like that, which actually is majority of enterprise use cases, you don’t necessarily need to be using those API based models, and you can get really immediate cost and latency benefits.
Summary
Figure out your deployment boundaries and work backwards. Because you know your deployment boundaries, you know that you should pick the model that when you’ve quantized it down is that size. Spend time thinking about optimizing inference so that can make the difference of genuinely multiple orders of magnitude. Gen AI benefits from consolidation of infrastructure, so try to avoid having each team being responsible for their deployments, because it will probably go wrong. Build as if you’re going to replace your model in 12 months. GPUs look expensive, but they’re your best option. When you can, you’ll use small models. Then we said all of this to Russell, and then he was like, “That was so helpful. I’m so excited to deploy my mission critical LLM app using your tips.” Then we said, “No problem, let us know if you have any questions”.
Questions and Answers
Participant 1: You said, build for flexibility. What are the use cases for frequent model replacements? The time and effort we have spent on custom fine-tuning, on custom data, will have to be repeated? Do you have any tips for that in case of frequent model replacements?
Arik: When would you want to do frequent model replacement? All of the time. With the pace of LLM improvement, it’s almost always the case that you can get better performance, literally just by swapping out a model. You might need some tweaks to prompts, but typically, just doing a one-to-one switch works. For example, if I have my application built on GPT-3.5 and I swap it out for GPT-4, even if I’m using the same prompt, the chances are my model performance will go up, and that’s a very low effort thing to do. How does that square with things like the engineering effort required to swap? If it is a month’s long process, if it’s not a significant improvement, then you shouldn’t make that switch. What I would suggest is trying to build in a way where it’s not a month’s long process and actually can be done in a couple days, because then it will almost always be worth that switch.
How does that square as well with things like fine-tuning? I have a spicy and hot take, which is, for the majority of use cases, you don’t need to fine-tune. Fine-tuning was very popular in deep learning of a couple years ago. As the models are getting better, they’re also better at following your instructions as well. You tend to not need to fine-tune for a lot of use cases, and can just get away with things like RAG, prompt engineering, and function calling. That’s what I would tend to say. If you are looking for your first LLM use case, speaking of swapping models, a really good first LLM use case is to just try and swap out your NLP pipelines. A lot of businesses have preexisting NLP pipelines. If you can swap them for LLMs, typically, you’ll get multiple points of accuracy boost.
Participant 2: How do you see the difference for the on-prem hardware, between enterprise grade hardware and consumer maxed out hardware, because I chose to go for consumer maxed out hardware because you go up to 6000 meg transfers on the memory, and the PCI lanes are faster.
Arik: Because people like him have taken all the A100s, when we do our internal development, we actually do it on 4090s, which is consumer hardware. They’re way more readily accessible, much cheaper as well than getting those data center hardware. That’s what we use for our development. We’ve not actually used consumer grade hardware for at-scale inference, although there’s no reason why it wouldn’t work.
If it works for your workload. We use it as well. We think they’re very good. They’re also just much cheaper, because they’re sold as consumer grade, rather than data center grade.
Participant 3: You’re saying that GPU is a whole and it’s most important. I’m a bit surprised, but maybe my question will explain. I made some proof of concept with small virtual machines with only CPUs, and I get quite good results with few requests per second. I did not ask myself about scalability. I’m thinking about how much requests shall we switch to GPUs?
Arik: Actually, maybe I was a bit strong on the GPU stuff, because we’ve deployed on CPU as well. If the latency is good enough, and that’s typically the first complaint that people get, is latency, then CPU is probably fine. It’s just that when you’re looking at economies of scale and when you’re looking at scaling up, they will almost always be more expensive per request. If you have a reasonably low number of requests, and the latency is fine, then you can get away with it. I think one of our first proof of concepts with our inference server was done on CPU. One thing that you will also know is that you’ll be limited in the size of model that you can go up to. For example, if you’re doing a 7 billion quantized, you can probably get away with doing CPU as well. I think GPU is better if you are starting from a blank slate. If you’re starting from a point where you already have a massive data center filled with CPUs and you’re not using them otherwise, it is still worth experimenting whether you can utilize them.
Participant 4: I have a question regarding the APIs that are typically used, and of course, it’s OpenAI’s API that are typically used also by applications. I also know a lot of people who do not really like the OpenAI API. Do you see any other APIs around? Because a lot of people are just emulating them, or they are just using it, but no one really likes it.
Arik: When you say they don’t like it, do they not like the API structure, or don’t like the models?
Participant 4: It is about the API structure. It is about documentation. It is about states, about a lot of things that happen that you can’t fully understand.
Arik: We also didn’t really like it, so we wrote our own API that’s called as our inference server, and then we have an OpenAI compatible layer, because most people are using that structure. You can check out our docs and see if you like that better. I think because it was the first one to really blow up, it’s what the whole industry converged to when it comes to that API structure.
See more presentations with transcripts
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
Andrew Davidson, Senior Vice President, Product Management, of MongoDB, building and managing your data in the cloud.
It’s a common adage that in today’s digital world, there’s an application, or software, for everything. But what exactly is software?
At its most basic level, software is a bridge between the world and a digital understanding of the world. And that digital understanding is maintained by operational data. Essentially, operational data is software’s heartbeat—it quietly drives the digital systems, decisions and processes that govern much of our lives.
More specifically, operational data is the real-time information generated from everyday processes, like convenience store transactions, hospital patient admissions or traffic data. It’s another way of referring to what some of us know as transactional data, or online transaction processing. Whatever you call it, the bottom line is that operational data is necessary for powering applications of all sorts that complete immediate tasks and generate insights that help us respond intelligently to the world’s changing conditions.
Consider the role of operational data in a hospital. When a patient is admitted, a data record is created and then updated with new information about the patient’s stay in the facility. That record can be referenced on its own or fed into software that helps medical providers give personalized and timely care to the patient. Put together, patient records give software insights that help the hospital scale workforces and triage patients based on real-time events.
On a day-to-day basis, this data supports patient care, but it can also play a critical role in adapting to out-of-the-ordinary circumstances. This is all made possible because of the “bridge” that operational data creates between the world and our digital understanding of it.
For example, Northwell Health—a New York-based provider that serves over two million patients annually—dealt with this during the early days of Covid-19. To get ahead of the influx of Covid-19 patients, Northwell built a tool that captures real-time clinical data from its 19 emergency departments and 52 urgent care centers.
By monitoring infection trends at its outpatient facilities, Northwell identified Covid surge clusters days before they overwhelmed inpatient facilities—giving Northwell time to expand treatment capacity to serve 163,000 Covid-19 patients in a year. At one point, Northwell was treating the most Covid-19 patients of any health system in the U.S., thanks in part to its data-driven innovations.
Capturing Lost Data
In sum, data runs the world. And at its best, it can save lives. So adopting a modern operational data layer that centrally integrates an organization’s data—and makes it readily available to consuming applications—is of paramount importance on the journey to delivering smarter, safer and more reliable digital services.
But too many organizations lack the foundation of a modern operational data layer or are leaving vast amounts of their operational data untapped. For example, nearly two-thirds of organizations’ data is considered “dark data”—data that is inaccessible, difficult to retrieve and can cost more to store than it delivers in value.
Such dark data constitutes a huge organizational tax; a recent study found that 52% of the average company’s data storage budget is spent on dark data. And with Grand View Research estimating that the worldwide database software market is worth $100.79 billion, we can assume that billions are being lost annually to dark data. It’s also a cybersecurity threat: If you can’t readily access and maintain much of your data, how can you ensure its security? Dark data is thus a vast attack surface for hackers.
The reality is that terabytes of enterprise data are locked in silos or stuck in brittle systems that make it difficult to fully tap into. Poor data management systems can be blamed for inefficiencies and wasted resources, missed insights, slow response to security incidents and an inability to adapt to market shifts. Together, this means that only a small fraction of today’s operational data is leveraged to its full potential—it’s buried gold at best, and a liability at worst.
Investing in a strong operational data layer can unlock the immense value of this untapped data, and make it faster and more cost-effective to launch data-driven applications. Indeed, a 2021 McKinsey report found that organizations with data-driven cultures can boost their productivity by up to 20% and accelerate decision making.
AI’s Promise: A Smarter Future
With AI, the opportunities data presents are only growing. AI can facilitate the processes involved with architecting an operational data layer. Then, mountains of operational data can be funneled into AI-powered applications that enhance real-time predictions and deliver personalized insights to end-users—all while making it easy for anyone to work closely with data.
Examples of this already abound. Mount Sinai Health System implemented new AI algorithms that assess patient histories in combination with lab results to predict potential complications, reducing readmission rates for heart failure problems by 30%. And New Zealand’s Pathfinder Labs (a customer of MongoDB) uses AI to make it faster for detectives investigating cybercrimes against children to navigate volumes of evidence, catch more offenders and rescue more kids.
And in a more day-to-day example, the City of Los Angeles implemented an AI-powered traffic management system that uses adaptive traffic signaling to reduce traffic congestion—thereby shortening commutes and reducing vehicle emissions—while improving driver safety with smarter incident detection capabilities.
I could go on and on. The point is that right now, we have an opportunity to leverage one of our most ubiquitous resources—our data—to drive unprecedented levels of resilience, to more effectively adapt to a changing world and to revolutionize our quality of life. By harnessing the operational data layer, we can build a foundation for a transformative future. And I can’t wait to see where that data-fueled future takes us.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives. Do I qualify?
Article originally posted on mongodb google news. Visit mongodb google news
MongoDB Expands Microsoft Partnership with New AI, Analytics Integration – Stock Titan
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
MongoDB Atlas now available on Azure OpenAI Service
New Microsoft Fabric Mirroring integration with MongoDB Atlas allows for near real-time data syncs
MongoDB Enterprise Advanced now available on Azure Marketplace for Azure Arc-enabled Kubernetes applications
CHICAGO, Nov. 19, 2024 /PRNewswire/ — Today at Microsoft Ignite, MongoDB, Inc. (NASDAQ: MDB) announced an expanded collaboration with Microsoft that introduces three new capabilities for joint customers. First, customers building applications powered by retrieval-augmented generation (RAG) can now select MongoDB Atlas as a vector store in Microsoft Azure AI Foundry, combining MongoDB Atlas’s vector capabilities with generative AI tools and services from Microsoft Azure and Azure Open AI. Meanwhile, users looking to maximize insights from operational data can now do so in near real-time with Open Mirroring in Microsoft Fabric for MongoDB Atlas. And the launch of MongoDB Enterprise Advanced (EA) on Azure Marketplace for Azure Arc-enabled Kubernetes applications enables organizations that operate across on-premises, multi-cloud, and edge Kubernetes environments to choose MongoDB. With these capabilities, MongoDB is meeting customers where they are on their innovation journeys, and making it easier for them to unleash the power of data.
Through the strengthened MongoDB-Microsoft relationship, customers will be able to:
- Enhance LLMs with proprietary data stored in MongoDB Atlas: Accessible through Azure AI Foundry, the Azure OpenAI Service allows businesses to develop RAG applications with their proprietary data in combination with the power of advanced LLMs. This new integration with Azure OpenAI Service enables users to take enterprise data stored in MongoDB Atlas and augment LLMs with proprietary context. This collaboration makes it easy to build unique chatbots, copilots, internal applications, or customer-facing portals that are grounded in up-to-date enterprise data and context. Developers are now able to add MongoDB Atlas as a vector data store for advanced LLMs, all without the need for additional coding or pipeline building. And through Azure AI Foundry’s “Chat Playground” feature, developers can quickly test how their enterprise data and selected LLM function together before taking it to production.
- Generate key business insights faster: Microsoft Fabric empowers businesses to gather actionable insights from their data on an AI-powered Analytics platform. Now Open Mirroring in Microsoft Fabric with MongoDB Atlas will allow for a near real-time connection, to keep data in sync between MongoDB Atlas and OneLake in Microsoft Fabric. This enables the generation of near real-time analytics, AI-based predictions, and business intelligence reports. Customers will be able to seamlessly take advantage of each data platform without having to choose between one or the other, or without worrying about maintaining and replicating data from MongoDB Atlas to OneLake.
- Deploy MongoDB Their Way: The launch of MongoDB EA on Azure Marketplace for Azure Arc-enabled Kubernetes applications gives customers greater flexibility when building applications across multiple environments. With MongoDB EA, customers are able to deploy and self-manage MongoDB database instances in the environment of their choosing, including on-premises, hybrid, and multi-cloud. The MongoDB Enterprise Kubernetes Operator, part of the MongoDB Enterprise Advanced offering, enhances the availability, resilience, and scalability of critical workloads by deploying MongoDB replica sets, sharded MongoDB clusters, and the Ops Manager tool across multiple Kubernetes clusters. Azure Arc further complements this by centrally managing these Kubernetes clusters running anywhere—in Azure, on premises, or even in other clouds. Together, these capabilities ensure that customers can build robust, distributed applications by leveraging the resilience of a strong data layer along with the central management capabilities that Azure Arc offers for its Arc-enabled Kubernetes applications.
“We frequently hear from MongoDB’s customers and partners that they’re looking for the best way to build AI applications, using the latest models and tools.” said Alan Chhabra, Executive Vice President of Partners at MongoDB. “And to address varying business needs, they also want to be able to use multiple tools for data analytics and business insights. Now, with the MongoDB Atlas integration with Azure AI Foundry, customers can power gen AI applications with their own data stored in MongoDB. And with Open Mirroring in Microsoft Fabric, customers can seamlessly sync data between MongoDB Atlas and OneLake for efficient data analysis. Combining the best from Microsoft with the best from MongoDB will help developers push applications even further.”
Joint Microsoft and MongoDB customers and partners welcome the expanded collaboration for greater data development flexibility.
Trimble, a leading provider of construction technology, delivers a connected ecosystem of solutions to improve coordination and collaboration between construction teams, phases and processes.
“As an early tester of the new integrations, Trimble views MongoDB Atlas as a premier choice for our data and vector storage. Building RAG architectures for our customers require powerful tools and these workflows need to enable the storage and querying of large collections of data and AI models in near real-time,” said Dan Farner, Vice President of Product Development at Trimble. “We’re excited to continue to build on MongoDB and look forward to taking advantage of its integrations with Microsoft to accelerate our ML offerings across the construction space.”
Eliassen Group, a strategic consulting company that provides business, clinical, and IT services, will use the new Microsoft integrations to drive innovation and provide greater flexibility to their clients.
“We’ve witnessed the incredible impact MongoDB Atlas has had on our customers’ businesses, and we’ve been equally impressed by Microsoft Azure AI Foundry’s capabilities. Now that these powerful platforms are integrated, we’re excited to combine the best of both worlds to build AI solutions that our customers will love just as much as we do,” said Kolby Kappes, Vice President – Emerging Technology, Eliassen Group.
Available in 48 Azure regions globally, MongoDB Atlas provides joint customers with the powerful capabilities of the document data model. With versatile support for structured and unstructured data, including Atlas Vector Search for RAG-powered applications, MongoDB Atlas accelerates and simplifies how developers build with data.
“By integrating MongoDB Atlas with Microsoft Azure’s powerful AI and data analytics tools, we empower our customers to build modern AI applications with unparalleled flexibility and efficiency,” said Sandy Gupta, VP, Partner Development ISV, Microsoft. “This collaboration ensures seamless data synchronization, real-time analytics, and robust application development across multi-cloud and hybrid environments.”
To read more about MongoDB Atlas on Azure go to https://www.mongodb.com/products/platform/atlas-cloud-providers/azure.
About MongoDB
Headquartered in New York, MongoDB’s mission is to empower innovators to create, transform, and disrupt industries by unleashing the power of software and data. Built by developers, for developers, MongoDB’s developer data platform is a database with an integrated set of related services that allow development teams to address the growing requirements for a wide variety of applications, all in a unified and consistent user experience. MongoDB has more than 50,000 customers in over 100 countries. The MongoDB database platform has been downloaded hundreds of millions of times since 2007, and there have been millions of builders trained through MongoDB University courses. To learn more, visit mongodb.com.
Forward-looking Statements
This press release includes certain “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933, as amended, or the Securities Act, and Section 21E of the Securities Exchange Act of 1934, as amended, including statements concerning MongoDB’s deepened partnership with Microsoft. These forward-looking statements include, but are not limited to, plans, objectives, expectations and intentions and other statements contained in this press release that are not historical facts and statements identified by words such as “anticipate,” “believe,” “continue,” “could,” “estimate,” “expect,” “intend,” “may,” “plan,” “project,” “will,” “would” or the negative or plural of these words or similar expressions or variations. These forward-looking statements reflect our current views about our plans, intentions, expectations, strategies and prospects, which are based on the information currently available to us and on assumptions we have made. Although we believe that our plans, intentions, expectations, strategies and prospects as reflected in or suggested by those forward-looking statements are reasonable, we can give no assurance that the plans, intentions, expectations or strategies will be attained or achieved. Furthermore, actual results may differ materially from those described in the forward-looking statements and are subject to a variety of assumptions, uncertainties, risks and factors that are beyond our control including, without limitation: the effects of the ongoing military conflicts between Russia and Ukraine and Israel and Hamas on our business and future operating results; economic downturns and/or the effects of rising interest rates, inflation and volatility in the global economy and financial markets on our business and future operating results; our potential failure to meet publicly announced guidance or other expectations about our business and future operating results; our limited operating history; our history of losses; failure of our platform to satisfy customer demands; the effects of increased competition; our investments in new products and our ability to introduce new features, services or enhancements; our ability to effectively expand our sales and marketing organization; our ability to continue to build and maintain credibility with the developer community; our ability to add new customers or increase sales to our existing customers; our ability to maintain, protect, enforce and enhance our intellectual property; the effects of social, ethical and regulatory issues relating to the use of new and evolving technologies, such as artificial intelligence, in our offerings or partnerships; the growth and expansion of the market for database products and our ability to penetrate that market; our ability to integrate acquired businesses and technologies successfully or achieve the expected benefits of such acquisitions; our ability to maintain the security of our software and adequately address privacy concerns; our ability to manage our growth effectively and successfully recruit and retain additional highly-qualified personnel; and the price volatility of our common stock. These and other risks and uncertainties are more fully described in our filings with the Securities and Exchange Commission (“SEC”), including under the caption “Risk Factors” in our Annual Report on Form 10-Q for the quarter ended July 31, 2024, filed with the SEC on August 30, 2024, and other filings and reports that we may file from time to time with the SEC. Except as required by law, we undertake no duty or obligation to update any forward-looking statements contained in this release as a result of new information, future events, changes in expectations or otherwise.
Investor Relations
Brian Denyeau
ICR for MongoDB
646-277-1251
ir@mongodb.com
Media Relations
MongoDB
press@mongodb.com
View original content to download multimedia:https://www.prnewswire.com/news-releases/mongodb-deepens-relationship-with-microsoft-through-new-integrations-for-ai-and-data-analytics-and-microsoft-azure-arc-support-302309318.html
SOURCE MongoDB, Inc.
Article originally posted on mongodb google news. Visit mongodb google news
MMS • RSS
Posted on nosqlgooglealerts. Visit nosqlgooglealerts
Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More
For years, enterprise companies have been plagued by data silos separating transactional systems from analytical tools—a divide that has hampered AI applications, slowed real-time decision-making, and driven up costs with complex integrations. Today at its Ignite conference, Microsoft announced a major step toward breaking this cycle.
The tech giant revealed that Azure SQL, its flagship transactional database, is now integrated into Fabric, Microsoft’s unified data platform. This integration allows enterprises to combine real-time operational and other historical data into a single, AI-ready data later called OneLake.
This announcement represents a critical evolution of Microsoft Fabric, its end-to-end data platform, which also includes new capabilities like real-time intelligence and the general availability of the OneLake catalog (see our full coverage of the Microsoft Ignite data announcements here). Together, these updates aim to address the growing demand for accessible, high-quality data in enterprise AI workflows.
Until now, companies have struggled to connect disparate data systems, relying on patchwork solutions to support AI applications. The urgency has only increased with the rise of AI agents—software tools capable of performing complex reasoning autonomously. These agents require instantaneous access to live and historical data to function effectively, a demand Microsoft aims to meet with Fabric.
And with AI agents becoming one one of the hottest trends for enterprise companies next year, Microsoft is pushing to lead here. See our separate coverage about how Microsoft is ahead in this race, and no one else is close.
The integration of Azure SQL is just the beginning of this integration of transactional data. Microsoft plans to extend support to other key transactional databases, including Cosmos DB, its NoSQL document database widely used in AI applications, and PostgreSQL, the popular open-source relational database. While timelines for these integrations remain unspecified, this marks a significant milestone in Microsoft’s effort to create a truly unified data platform.
Microsoft also said it plans to integrate with popular open source transactional databases, including MongoDB, and Cassandra, but it’s unlikely Microsoft will prioritize integration with competing proprietary transactional databases like Couchbase and Google’s Bigtable.
The power of unified data integration
Arun Ulag, corporate vice president of Azure Data, emphasized in an interview that integrating transactional databases like Cosmos DB into Fabric is critical for enabling next-generation AI applications. For example, OpenAI’s ChatGPT—the fastest-growing consumer AI product in history—relies on Cosmos DB to power its conversations, context, and memory, managing billions of transactions daily.
As AI agents evolve to handle complex tasks like e-commerce transactions, the demand for real-time access to transactional databases will only grow. These agents rely on advanced techniques like vector search, which retrieves data based on semantic meaning rather than exact matches, to answer user queries effectively—such as recommending a specific book.
“You don’t have the time to…go run your RAG model somewhere else,” Ulag said, referencing retrieval-augmented generation models that combine real-time and historical data. “It has to be just built into the database itself.”
By unifying operational and analytical capabilities, Fabric allows businesses to build AI applications that seamlessly leverage live transactional data, structured analytics, and unstructured insights.
Key advancements include:
- Real-time intelligence: Built-in vector search and retrieval-augmented generation (RAG) capabilities simplify AI application development, reducing latency and improving accuracy.
- Unified data governance: OneLake provides a centralized, multi-cloud data layer that ensures interoperability, compliance, and easier collaboration.
- Seamless code generation: Copilot in Fabric can automatically translate natural language queries into SQL, allowing developers to get inline code suggestions, real-time explanations and fixes.
AI Skills: simplifying AI agent app development
One of the most dynamic announcements in Fabric is the introduction of AI Skills, a capability that enables enterprises to interact with any data – wherever it resides – through natural language. They connect to Copilot Studio, so you can build AI agents that easily query this data across multiple systems, from transactional logs to semantic models.
Ulag said that if he had to pick one announcement that excites him the most, it would be AI Skills. With AI Skills, business users can simply point to any dataset — be it from any cloud, structured, or unstructured – and begin asking questions about that data, whether through natural language, SQL queries, Power BI business definitions, or real-time intelligence engines, he said.
For example, a user could use AI Skills to identify trends in sales data stored across multiple systems or to generate instant insights from IoT telemetry logs. By bridging the gap between business users and technical systems, AI Skills simplifies the development of AI agents and democratizes data access across organizations.
As of today, AI Skills can connect with lakehouse and data warehouse tables, mirrored DB and shortcut data, and now semantic models and Eventhouse KQL databases. Support for unstructured data is “coming soon,” the company said.
Differentiation in a crowded market
Microsoft faces fierce competition from players like Databricks and Snowflake on the data platform front, as well as AWS and Google Cloud in the broader cloud ecosystem—all of which are working on integrating transactional and analytical databases. However, Microsoft’s approach with Fabric is beginning to carve out a unique position.
By leveraging a unified SaaS model, seamless Azure ecosystem integration, and a commitment to open data formats, Microsoft eliminates many of the data complexities that have plagued enterprise data systems. Additionally, tools like Copilot Studio for building AI agents and Fabric’s deep integration across multi-cloud environments give it an edge (see my separate analysis [LINK] of Microsoft’s positioning around AI agents, which also appears to be industry-leading).
Microsoft’s ability to embed AI capabilities directly into its unified data environment “could provide a better experience for developers and data scientists,” said Robert Kramer, vice president at research firm Moor Insights, underscoring how Fabric’s design simplifies workflows and accelerates AI-driven innovation.
Key differentiators include:
- Unified SaaS model: Fabric eliminates the need to manage multiple services, offering enterprises a single, cohesive platform that reduces complexity and operational overhead.
- Multi-cloud support: Unlike some competitors, Fabric integrates with AWS, Google Cloud, and on-premises systems, enabling organizations to work seamlessly across diverse data environments.
- AI-optimized workflows: Built-in support for vector similarity search and retrieval-augmented generation (RAG) streamlines the creation of intelligent applications, cutting development time and improving performance.
Microsoft’s strategy to unify and simplify the enterprise data stack not only meets the demands of today’s AI-centric workloads but also sets a high bar for competitors in the rapidly evolving data platform market.
The road ahead: where Fabric fits in the AI ecosystem
The integration of transactional databases into Fabric marks a significant milestone, but it also reflects a broader shift across the enterprise data landscape: the race toward seamless interoperability. With AI agents becoming a cornerstone of enterprise strategy, the ability to unify disparate systems into a cohesive architecture is no longer optional—it’s essential.
However, Arun Ulag, corporate vice president of Azure Data, acknowledged the challenges that come with operating at Microsoft’s scale. While the company has taken major strides with Fabric, the fast-moving nature of the industry demands constant innovation and adaptability.
“A lot of these patterns are new,” Ulag explained, describing the challenges of designing for a diverse set of use cases across industries. “Some of these patterns will work. Some of them will not, and we’ll only know as customers try them at scale…The way it’s used in automotive may be very, very different from the way it’s used in healthcare,” he added, emphasizing the role of external forces like government regulations in shaping future development.
As Microsoft continues to refine Fabric, the company is positioning itself as a leader in the shift to unified, AI-ready data architectures. But with competitors also racing to meet the demands of enterprise AI, the journey ahead will require constant evolution, rapid learning, and a focus on delivering value at scale.
For more insights into the announcements and Arun Ulag’s perspective, watch our full video interview above.