Category: Uncategorized
Microsoft Introduces Local Emulator for Azure Service Bus Wanted by Developers for Years
MMS • Steef-Jan Wiggers
Article originally posted on InfoQ. Visit InfoQ
In response to developer feedback, Microsoft launched a local Azure Service Bus emulator. According to the company, the emulator promises to simplify the creation and testing of Azure Service Bus applications by offering a localized environment free from network or cloud-related constraints.
The Azure Service Bus is a managed message broker enabling reliable application communication. Its features include queues and topics for efficient handling, load balancing, transactional reliability, and safe data routing for decoupling services.
Despite its robust capabilities, developers often face challenges testing against cloud-based Service Bus instances due to latency, costs, and cloud dependencies. This local emulator addresses these hurdles head-on.
The company designed the emulator with developer convenience in mind, offering several benefits like:
- Optimized Development Loop: Developers can test and iterate quickly without relying on cloud deployments, drastically reducing the development cycle time.
- Cost Efficiency: Since the emulator runs locally, it eliminates cloud usage costs for testing and development scenarios.
- Isolated Environment: Local testing ensures no interference from other cloud-based activities, allowing precise troubleshooting and debugging.
- Pre-Migration Testing: Developers can trial Azure Service Bus using their existing AMQP-based applications before committing to a full cloud migration.
The emulator is platform-independent and accessible as a Docker image from the Microsoft Artifact Registry. Developers can deploy it quickly using docker compose or automated scripts available in Microsoft’s Installer repository.
While the emulator replicates much of the Azure Service Bus’s functionality, some features are unavailable:
- Azure-specific integrations like virtual networks, Microsoft Entra ID, and activity logs.
- Advanced capabilities like autoscaling, geo-disaster recovery, and large message handling.
- Persisted data: Container restarts reset data and entities.
Furthermore, the emulator is tailored for local development and lacks several high-level Azure Service Bus cloud service features. It does not support a UI portal, visual metrics, or advanced alerting capabilities.
The emulator enforces quotas like the cloud service, such as:
- Maximum of 50 queues/topics per namespace.
- Message size capped at 256 KB.
- Namespace size is limited to 100 MB.
Configuration changes must be pre-defined in config.json and applied before restarting the container.
Developers have long anticipated the local Service Bus emulator for a while. Vincent Kok, a freelance .NET developer, wrote in a post on LinkedIn:
Initially, Microsoft rejected the idea of setting up a local development for Azure Service Bus. The official answer from Microsoft was to use cloud instances of Azure ServiceBus. However, this approach requires each developer to create their own Service Bus namespace to ensure isolated development and testing. Alternatively, developers can share a single Service Bus namespace, but this introduces the risk of messages published by one developer being consumed by another, which is not very practical.
And:
Today, six years after that GitHub issue was first opened, the wait is finally over! Microsoft has released a local emulator for Azure Service Bus, enabling developers to build and test applications locally without needing to spin up cloud instances of Service Bus.
Furthermore, on X, Dave Callan, a Microsoft MVP, tweeted:
It’s so amazing that this is finally here.
We can use the emulator to develop and test code against the service in isolation, free from cloud interference.
Lastly, the emulator is compatible with the latest service bus client SDKs.
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
Swiss National Bank increased its position in shares of MongoDB, Inc. (NASDAQ:MDB – Free Report) by 1.1% in the 3rd quarter, according to the company in its most recent 13F filing with the Securities & Exchange Commission. The institutional investor owned 217,700 shares of the company’s stock after acquiring an additional 2,300 shares during the quarter. Swiss National Bank owned about 0.29% of MongoDB worth $58,855,000 as of its most recent SEC filing.
Other large investors have also recently modified their holdings of the company. MFA Wealth Advisors LLC acquired a new stake in MongoDB in the 2nd quarter worth approximately $25,000. J.Safra Asset Management Corp lifted its stake in shares of MongoDB by 682.4% during the second quarter. J.Safra Asset Management Corp now owns 133 shares of the company’s stock worth $33,000 after buying an additional 116 shares during the period. Quarry LP grew its holdings in shares of MongoDB by 2,580.0% during the second quarter. Quarry LP now owns 134 shares of the company’s stock valued at $33,000 after buying an additional 129 shares during the last quarter. Hantz Financial Services Inc. acquired a new position in shares of MongoDB in the 2nd quarter valued at $35,000. Finally, GAMMA Investing LLC increased its position in shares of MongoDB by 178.8% in the 3rd quarter. GAMMA Investing LLC now owns 145 shares of the company’s stock valued at $39,000 after acquiring an additional 93 shares during the period. Institutional investors own 89.29% of the company’s stock.
Insider Transactions at MongoDB
In other news, CRO Cedric Pech sold 302 shares of the business’s stock in a transaction that occurred on Wednesday, October 2nd. The shares were sold at an average price of $256.25, for a total value of $77,387.50. Following the sale, the executive now directly owns 33,440 shares of the company’s stock, valued at approximately $8,569,000. The trade was a 0.90 % decrease in their position. The transaction was disclosed in a legal filing with the Securities & Exchange Commission, which is available at this hyperlink. Also, CAO Thomas Bull sold 1,000 shares of the company’s stock in a transaction that occurred on Monday, September 9th. The stock was sold at an average price of $282.89, for a total transaction of $282,890.00. Following the completion of the transaction, the chief accounting officer now owns 16,222 shares in the company, valued at approximately $4,589,041.58. This represents a 5.81 % decrease in their ownership of the stock. The disclosure for this sale can be found here. Insiders sold 25,600 shares of company stock worth $7,034,249 in the last 90 days. Insiders own 3.60% of the company’s stock.
Analyst Upgrades and Downgrades
A number of research analysts have recently weighed in on MDB shares. UBS Group increased their price objective on MongoDB from $250.00 to $275.00 and gave the company a “neutral” rating in a report on Friday, August 30th. Mizuho lifted their price objective on shares of MongoDB from $250.00 to $275.00 and gave the company a “neutral” rating in a research note on Friday, August 30th. Needham & Company LLC increased their price objective on shares of MongoDB from $290.00 to $335.00 and gave the stock a “buy” rating in a research note on Friday, August 30th. Wells Fargo & Company boosted their target price on shares of MongoDB from $300.00 to $350.00 and gave the company an “overweight” rating in a research report on Friday, August 30th. Finally, Wedbush raised MongoDB to a “strong-buy” rating in a research report on Thursday, October 17th. One equities research analyst has rated the stock with a sell rating, five have assigned a hold rating, nineteen have assigned a buy rating and one has assigned a strong buy rating to the company. Based on data from MarketBeat, the company has a consensus rating of “Moderate Buy” and a consensus target price of $336.54.
Check Out Our Latest Report on MongoDB
MongoDB Stock Performance
NASDAQ MDB opened at $289.15 on Wednesday. The company has a debt-to-equity ratio of 0.84, a quick ratio of 5.03 and a current ratio of 5.03. The stock’s fifty day moving average is $278.06 and its two-hundred day moving average is $273.04. The firm has a market capitalization of $21.36 billion, a P/E ratio of -95.74 and a beta of 1.15. MongoDB, Inc. has a fifty-two week low of $212.74 and a fifty-two week high of $509.62.
MongoDB (NASDAQ:MDB – Get Free Report) last announced its quarterly earnings results on Thursday, August 29th. The company reported $0.70 earnings per share for the quarter, beating the consensus estimate of $0.49 by $0.21. MongoDB had a negative net margin of 12.08% and a negative return on equity of 15.06%. The firm had revenue of $478.11 million for the quarter, compared to analysts’ expectations of $465.03 million. During the same quarter in the previous year, the company posted ($0.63) EPS. The company’s revenue was up 12.8% compared to the same quarter last year. On average, research analysts anticipate that MongoDB, Inc. will post -2.39 earnings per share for the current year.
MongoDB Company Profile
MongoDB, Inc, together with its subsidiaries, provides general purpose database platform worldwide. The company provides MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.
Further Reading
Before you consider MongoDB, you’ll want to hear this.
MarketBeat keeps track of Wall Street’s top-rated and best performing research analysts and the stocks they recommend to their clients on a daily basis. MarketBeat has identified the five stocks that top analysts are quietly whispering to their clients to buy now before the broader market catches on… and MongoDB wasn’t on the list.
While MongoDB currently has a “Moderate Buy” rating among analysts, top-rated analysts believe these five stocks are better buys.
With average gains of 150% since the start of 2023, now is the time to give these stocks a look and pump up your 2024 portfolio.
Article originally posted on mongodb google news. Visit mongodb google news
MMS • Edin Kapic
Article originally posted on InfoQ. Visit InfoQ
On November 12th, Microsoft presented the .NET MAUI 9 in its final form. This version brings two new controls (HybridWebView and TitleBar), a slew of improvements throughout the framework, free SyncFusion controls and a Xcode sync tool for Apple-specific files. The performance and stability of the entire framework has been enhanced.
MAUI is an acronym that stands for Multiplatform Application UI. According to Microsoft, it’s an evolution of Xamarin and Xamarin Forms frameworks, unifying separate target libraries and projects into a single project for multiple devices. Currently, MAUI supports writing applications that run on Android 5+, iOS 12.2+, macOS 12+ (as Mac Catalyst), Samsung Tizen, Windows 10 version 1809+, or Windows 11. The new versions bumps the minimum Apple devices support from iOS 11 and macOS 10.15 in .NET MAUI 8.
The .NET MAUI 9 journey to the GA (general availability) version started with the Preview 1 in February 2024. A new preview was launched roughly every month, plus two RC (release candidate) versions were made available in September and October. These frequent releases caught bugs and performance fixes that were included in the final version.
The first of the newly added controls, HybridWebView, allows developers to host HTML, JavaScript and CSS content within a WebView
control, but with a communication bridge between the web view and the MAUI application code in .NET. On the JavaScript side there is a HybridWebViewMessageReceived
event and SendRawMessage
method. On the .NET side of the application there is a RawMessageReceived event and a SendRawMessage method on the control.
The second of the new controls, TitleBar, allows developers to create custom title bars in their application. For the moment, this control is only supported on the Windows platform, with Mac Catalyst support coming ‘in a future release’. The title bar control is then set to the parent Window object using the Window.TitleBar property.
While there are only two new first-party controls in .NET MAUI 9, the recent partnership with SyncFusion has added 14 new free controls from the vendor to MAUI as a package. The new MAUI version adds a sample application to the MAUI App template, which showcases how to use several of the contributed controls, together with recommended practices for common app patterns.
As for performance and stability improvements, one of the significant changes was the complete re-implementation of CollectionView and CarouselView controls on Apple devices. The new implementation requires a code change in the root MauiProgram
class.
The new version also brings several deprecated features. The most important one is the Frame control, which is marked as obsolete and should be replaced with the Border control. In addition, the Application.MainPage property is replaced by setting the Window.Page property to the first page of the app.
It is worth noting that just two days after the launch, a Service Release (SR) patch was released with the version 9.0.10 of the framework. The SR version adds small fixes to the GA code. It could be in response to the comments of users on social networks, complaining that upgrading to .NET MAUI 9 breaks Visual Studio or the build process. On the other hand, developers like Claudio Bernasconi are stating that “MAUI is heading in the right direction”.
Readers can refer to GitHub official MAUI repository for complete release notes.
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
Atria Investments Inc boosted its position in MongoDB, Inc. (NASDAQ:MDB – Free Report) by 6.6% during the 3rd quarter, according to its most recent 13F filing with the Securities and Exchange Commission (SEC). The institutional investor owned 2,175 shares of the company’s stock after purchasing an additional 135 shares during the quarter. Atria Investments Inc’s holdings in MongoDB were worth $588,000 as of its most recent filing with the Securities and Exchange Commission (SEC).
Other institutional investors and hedge funds have also bought and sold shares of the company. Principal Financial Group Inc. raised its position in shares of MongoDB by 2.7% during the 3rd quarter. Principal Financial Group Inc. now owns 6,095 shares of the company’s stock valued at $1,648,000 after buying an additional 160 shares in the last quarter. Janney Montgomery Scott LLC bought a new stake in shares of MongoDB during the 3rd quarter valued at about $861,000. Stephens Investment Management Group LLC boosted its stake in shares of MongoDB by 22.8% during the 3rd quarter. Stephens Investment Management Group LLC now owns 30,664 shares of the company’s stock valued at $8,290,000 after purchasing an additional 5,688 shares in the last quarter. US Bancorp DE boosted its stake in shares of MongoDB by 9.1% during the 3rd quarter. US Bancorp DE now owns 3,869 shares of the company’s stock valued at $1,046,000 after purchasing an additional 324 shares in the last quarter. Finally, First Trust Direct Indexing L.P. boosted its stake in shares of MongoDB by 16.0% during the 3rd quarter. First Trust Direct Indexing L.P. now owns 1,888 shares of the company’s stock valued at $510,000 after purchasing an additional 261 shares in the last quarter. 89.29% of the stock is currently owned by hedge funds and other institutional investors.
MongoDB Stock Up 1.7 %
Shares of MongoDB stock opened at $289.15 on Wednesday. The firm’s 50 day moving average price is $278.06 and its 200-day moving average price is $273.04. The company has a quick ratio of 5.03, a current ratio of 5.03 and a debt-to-equity ratio of 0.84. MongoDB, Inc. has a 52-week low of $212.74 and a 52-week high of $509.62. The company has a market cap of $21.36 billion, a P/E ratio of -94.26 and a beta of 1.15.
MongoDB (NASDAQ:MDB – Get Free Report) last released its earnings results on Thursday, August 29th. The company reported $0.70 earnings per share for the quarter, topping the consensus estimate of $0.49 by $0.21. The company had revenue of $478.11 million for the quarter, compared to the consensus estimate of $465.03 million. MongoDB had a negative net margin of 12.08% and a negative return on equity of 15.06%. MongoDB’s revenue was up 12.8% compared to the same quarter last year. During the same quarter in the previous year, the firm earned ($0.63) EPS. On average, sell-side analysts predict that MongoDB, Inc. will post -2.39 EPS for the current fiscal year.
Insider Buying and Selling at MongoDB
In related news, CAO Thomas Bull sold 154 shares of the business’s stock in a transaction on Wednesday, October 2nd. The shares were sold at an average price of $256.25, for a total value of $39,462.50. Following the completion of the transaction, the chief accounting officer now directly owns 16,068 shares in the company, valued at $4,117,425. The trade was a 0.95 % decrease in their position. The transaction was disclosed in a document filed with the SEC, which is available at the SEC website. Also, Director Dwight A. Merriman sold 3,000 shares of the business’s stock in a transaction on Wednesday, October 2nd. The shares were sold at an average price of $256.25, for a total transaction of $768,750.00. Following the completion of the transaction, the director now owns 1,131,006 shares of the company’s stock, valued at approximately $289,820,287.50. This represents a 0.26 % decrease in their ownership of the stock. The disclosure for this sale can be found here. In the last ninety days, insiders sold 25,600 shares of company stock worth $7,034,249. Company insiders own 3.60% of the company’s stock.
Analyst Upgrades and Downgrades
Several brokerages have issued reports on MDB. Needham & Company LLC lifted their target price on shares of MongoDB from $290.00 to $335.00 and gave the stock a “buy” rating in a report on Friday, August 30th. Oppenheimer upped their target price on shares of MongoDB from $300.00 to $350.00 and gave the company an “outperform” rating in a report on Friday, August 30th. Wedbush raised shares of MongoDB to a “strong-buy” rating in a research note on Thursday, October 17th. Scotiabank increased their price objective on shares of MongoDB from $250.00 to $295.00 and gave the stock a “sector perform” rating in a research note on Friday, August 30th. Finally, Stifel Nicolaus raised their target price on shares of MongoDB from $300.00 to $325.00 and gave the company a “buy” rating in a research note on Friday, August 30th. One investment analyst has rated the stock with a sell rating, five have issued a hold rating, nineteen have assigned a buy rating and one has given a strong buy rating to the stock. Based on data from MarketBeat.com, the stock presently has a consensus rating of “Moderate Buy” and a consensus price target of $336.54.
MongoDB Profile
MongoDB, Inc, together with its subsidiaries, provides general purpose database platform worldwide. The company provides MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.
Featured Articles
Want to see what other hedge funds are holding MDB? Visit HoldingsChannel.com to get the latest 13F filings and insider trades for MongoDB, Inc. (NASDAQ:MDB – Free Report).
Receive News & Ratings for MongoDB Daily – Enter your email address below to receive a concise daily summary of the latest news and analysts’ ratings for MongoDB and related companies with MarketBeat.com’s FREE daily email newsletter.
Article originally posted on mongodb google news. Visit mongodb google news
MongoDB and Microsoft expand partnership to advance AI applications and data analytics
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
Database company MongoDB Inc. today announced an expanded partnership with Microsoft Corp. that includes new integrations aimed at enhancing artificial intelligence application development, real-time data analytics and deployment flexibility.
The first integration sees MongoDB Atlas, MongoDB’s fully managed cloud database service, integrated into Microsoft Azure AI Foundry. The goal is to allow customers to build retrieval-augmented generation or RAG applications by combining MongoDB’s data capabilities with Azure OpenAI Service.
With the integration, developers can enhance large language models with proprietary data stored in MongoDB Atlas without additional coding or pipeline building, streamlining the process of creating chatbots, copilots and enterprise AI applications. Azure AI Foundry’s “Chat Playground” feature further simplifies development by enabling real-time testing of LLMs with enterprise data before deployment.
The integration offers users a way to augment generative AI models with their own data to ensure their applications are grounded in up-to-date context. The combination of MongoDB Atlas and Azure AI Foundry offers flexibility and efficiency in leveraging enterprise data for advanced AI use cases.
In the second announcement, real-time data analytics with Microsoft Fabric, MongoDB Atlas now supports Open Mirroring in Microsoft Fabric for a near real-time connection with OneLake. The capability synchronizes data between the two platforms, allowing businesses to generate timely analytics, AI predictions, and business intelligence reports.
Through enabling real-time insights, businesses can leverage MongoDB’s operational data and Microsoft Fabric’s analytics tools to drive strategic decisions and optimize performance across diverse use cases, from AI-powered predictions to reporting.
The final announcement allows users to “deploy MongoDB their way” with MongoDB Enterprise Advanced on Azure Marketplace, introducing greater flexibility for organizations deploying applications in Kubernetes environments. With Azure Arc-enabled Kubernetes, customers can deploy and self-manage MongoDB instances across on-premises, multicloud and edge environments.
“By integrating MongoDB Atlas with Microsoft Azure’s powerful AI and data analytics tools, we empower our customers to build modern AI applications with unparalleled flexibility and efficiency,”Sandy Gupta, vice president of partner development ISV at Microsoft, said in a statement.
Sahir Azam, chief product officer of MongoDB, spoke with theCUBE, SiliconANGLE Media’s live streaming studio, in May, when he discussed how the company is strengthening its database ecosystem and advancing artificial intelligence capabilities with key partners:
Image: SiliconANGLE/Ideogram
Your vote of support is important to us and it helps us keep the content FREE.
One click below supports our mission to provide free, deep, and relevant content.
Join our community on YouTube
Join the community that includes more than 15,000 #CubeAlumni experts, including Amazon.com CEO Andy Jassy, Dell Technologies founder and CEO Michael Dell, Intel CEO Pat Gelsinger, and many more luminaries and experts.
THANK YOU
Article originally posted on mongodb google news. Visit mongodb google news
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
CHICAGO, Nov. 19, 2024 — Today at Microsoft Ignite, MongoDB, Inc. announced an expanded collaboration with Microsoft that introduces three new capabilities for joint customers. First, customers building applications powered by retrieval-augmented generation (RAG) can now select MongoDB Atlas as a vector store in Microsoft Azure AI Foundry, combining MongoDB Atlas’s vector capabilities with generative AI tools and services from Microsoft Azure and Azure Open AI. Meanwhile, users looking to maximize insights from operational data can now do so in near real-time with Open Mirroring in Microsoft Fabric for MongoDB Atlas. And the launch of MongoDB Enterprise Advanced (EA) on Azure Marketplace for Azure Arc-enabled Kubernetes applications enables organizations that operate across on-premises, multi-cloud, and edge Kubernetes environments to choose MongoDB. With these capabilities, MongoDB is meeting customers where they are on their innovation journeys, and making it easier for them to unleash the power of data.
Through the strengthened MongoDB-Microsoft relationship, customers will be able to:
- Enhance LLMs with proprietary data stored in MongoDB Atlas: Accessible through Azure AI Foundry, the Azure OpenAI Service allows businesses to develop RAG applications with their proprietary data in combination with the power of advanced LLMs. This new integration with Azure OpenAI Service enables users to take enterprise data stored in MongoDB Atlas and augment LLMs with proprietary context. This collaboration makes it easy to build unique chatbots, copilots, internal applications, or customer-facing portals that are grounded in up-to-date enterprise data and context. Developers are now able to add MongoDB Atlas as a vector data store for advanced LLMs, all without the need for additional coding or pipeline building. And through Azure AI Foundry’s “Chat Playground” feature, developers can quickly test how their enterprise data and selected LLM function together before taking it to production.
- Generate key business insights faster: Microsoft Fabric empowers businesses to gather actionable insights from their data on an AI-powered Analytics platform. Now Open Mirroring in Microsoft Fabric with MongoDB Atlas will allow for a near real-time connection, to keep data in sync between MongoDB Atlas and OneLake in Microsoft Fabric. This enables the generation of near real-time analytics, AI-based predictions, and business intelligence reports. Customers will be able to seamlessly take advantage of each data platform without having to choose between one or the other, or without worrying about maintaining and replicating data from MongoDB Atlas to OneLake.
- Deploy MongoDB Their Way: The launch of MongoDB EA on Azure Marketplace for Azure Arc-enabled Kubernetes applications gives customers greater flexibility when building applications across multiple environments. With MongoDB EA, customers are able to deploy and self-manage MongoDB database instances in the environment of their choosing, including on-premises, hybrid, and multi-cloud. The MongoDB Enterprise Kubernetes Operator, part of the MongoDB Enterprise Advanced offering, enhances the availability, resilience, and scalability of critical workloads by deploying MongoDB replica sets, sharded MongoDB clusters, and the Ops Manager tool across multiple Kubernetes clusters. Azure Arc further complements this by centrally managing these Kubernetes clusters running anywhere—in Azure, on premises, or even in other clouds. Together, these capabilities ensure that customers can build robust, distributed applications by leveraging the resilience of a strong data layer along with the central management capabilities that Azure Arc offers for its Arc-enabled Kubernetes applications.
“We frequently hear from MongoDB’s customers and partners that they’re looking for the best way to build AI applications, using the latest models and tools.” said Alan Chhabra, Executive Vice President of Partners at MongoDB. “And to address varying business needs, they also want to be able to use multiple tools for data analytics and business insights. Now, with the MongoDB Atlas integration with Azure AI Foundry, customers can power gen AI applications with their own data stored in MongoDB. And with Open Mirroring in Microsoft Fabric, customers can seamlessly sync data between MongoDB Atlas and OneLake for efficient data analysis. Combining the best from Microsoft with the best from MongoDB will help developers push applications even further.”
Joint Microsoft and MongoDB customers and partners welcome the expanded collaboration for greater data development flexibility.
Trimble, a leading provider of construction technology, delivers a connected ecosystem of solutions to improve coordination and collaboration between construction teams, phases and processes.
“As an early tester of the new integrations, Trimble views MongoDB Atlas as a premier choice for our data and vector storage. Building RAG architectures for our customers require powerful tools and these workflows need to enable the storage and querying of large collections of data and AI models in near real-time,” said Dan Farner, Vice President of Product Development at Trimble. “We’re excited to continue to build on MongoDB and look forward to taking advantage of its integrations with Microsoft to accelerate our ML offerings across the construction space.”
Eliassen Group, a strategic consulting company that provides business, clinical, and IT services, will use the new Microsoft integrations to drive innovation and provide greater flexibility to their clients.
“We’ve witnessed the incredible impact MongoDB Atlas has had on our customers’ businesses, and we’ve been equally impressed by Microsoft Azure AI Foundry’s capabilities. Now that these powerful platforms are integrated, we’re excited to combine the best of both worlds to build AI solutions that our customers will love just as much as we do,” said Kolby Kappes, Vice President – Emerging Technology, Eliassen Group.
Available in 48 Azure regions globally, MongoDB Atlas provides joint customers with the powerful capabilities of the document data model. With versatile support for structured and unstructured data, including Atlas Vector Search for RAG-powered applications, MongoDB Atlas accelerates and simplifies how developers build with data.
“By integrating MongoDB Atlas with Microsoft Azure’s powerful AI and data analytics tools, we empower our customers to build modern AI applications with unparalleled flexibility and efficiency,” said Sandy Gupta, VP, Partner Development ISV, Microsoft. “This collaboration ensures seamless data synchronization, real-time analytics, and robust application development across multi-cloud and hybrid environments.”
To read more about MongoDB Atlas on Azure go to https://www.mongodb.com/products/platform/atlas-cloud-providers/azure.
About MongoDB
Headquartered in New York, MongoDB’s mission is to empower innovators to create, transform, and disrupt industries by unleashing the power of software and data. Built by developers, for developers, MongoDB’s developer data platform is a database with an integrated set of related services that allow development teams to address the growing requirements for a wide variety of applications, all in a unified and consistent user experience. MongoDB has more than 50,000 customers in over 100 countries. The MongoDB database platform has been downloaded hundreds of millions of times since 2007, and there have been millions of builders trained through MongoDB University courses. To learn more, visit mongodb.com.
Source: MongoDB
Article originally posted on mongodb google news. Visit mongodb google news
Microsoft supercharges Fabric with new data tools to accelerate enterprise AI workflows
MMS • RSS
Posted on nosqlgooglealerts. Visit nosqlgooglealerts
Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More
Today, Microsoft kicked off its Ignite conference, talking about all things AI, including how it has assembled the largest AI agent ecosystem and will allow enterprises to build more such apps using any of the 1,800 large language models it has on offer.
The move — a significant departure from the long-standing reliance on OpenAI — promises enhanced flexibility to developers, but we all know AI is just ‘garbage in and garbage out’ without a solid data foundation.
To this end, Microsoft also announced a series of updates for Fabric, its end-to-end SaaS data platform. According to Arun Ulag, the corporate VP of Azure data, the biggest development is the integration of transactional databases, which will transform Fabric into a truly unified, open data platform bringing all the necessary technologies together in one place to build next-gen AI applications, including advanced agents.
Other notable capabilities, some of which are being previewed while others are generally available, touch on different aspects of how Fabric operates, including data connectivity, workload performance, scalability, security and governance.
“We are relatively early in the AI journey… There are many more customers, business users, and developers that can take advantage of these technologies. And as they take advantage of these tools, we have to evolve them. We have to drive costs further down and make sure that we further accelerate business value. Ultimately, all of this should translate into higher GDP growth for countries and stronger business outcomes for customers,” Ulag said in an interview with VentureBeat.
Transactional database integration for fast-tracked AI development
Microsoft launched Fabric last year as a SaaS-based data and analytics platform to bring its innovations across the data stack in one place. The unified offering leveraged several tools the company built over the years, including SQL Server, Excel, Power BI and Azure Synapse, and provided teams with an end-to-end experience to connect, manage and analyze large structured and unstructured data assets.
At the core, Fabric is underpinned by an open lakehouse architecture called OneLake. It serves as a central, multi-cloud repository that supports various open data formats (Apache Parquet, Delta Lake and Iceberg) and the downstream analytical workloads. In the last few months, both Fabric and OneLake have received several improvements, including Real-Time Intelligence – which now becomes generally available – for analyzing streaming logs, IoT and telemetry as well as tools for migrating data from other data environments.
However, running analytical workloads to identify trends and patterns is just one piece of the puzzle. AI is the real deal today, and for that, the users need to go beyond aggregated, historical data. To help with this, Microsoft has announced Fabric Databases, which will see different transactional databases plug into OneLake, allowing users to access both live data from transactional systems (think individual purchase or login events) and bulk analytical data through one unified layer.
The company is starting with the integration of its own Azure SQL database and will follow up with other transactional databases including including Cosmos DB (its NoSQL document database behind ChatGPT), PostgreSQL, MongoDB and Cassandra. It hopes the move will save developers from complex database integrations and enable them to power next-gen AI apps, managing billions of interactions daily.
“Built-in vector search, RAG support, and Azure AI integration simplify AI app development, and your data is instantly available in OneLake for advanced analytics. Developers can even use Copilot in Fabric to translate natural language queries into SQL and get inline code completion alongside code fixes and explanations,” Ulag noted in a blog post today.
OneLake catalog, new AI features and more
In addition to transactional databases, Fabric is getting a new OneLake catalog to make it easier for teams to explore, manage and govern their entire Fabric data estate, no matter where the information has come from, as well as several AI capabilities to accelerate workflows.
The catalog, as Ulag wrote in the blog, carries two main tabs: Explore and Govern. The former is generally available and will help teams discover and manage their trusted data. Meanwhile, the Govern tab, aimed at providing data owners with valuable insights, tools and recommendations for governing their data, is in preview at this stage. These features will ensure that the teams are aware of what’s going on across the platform, without running into any surprises.
On the AI front, Microsoft is now previewing AI functions in Fabric notebooks, providing a simplified API for common AI text enrichments like summarization, translation, sentiment analysis, and more. The company is also enhancing AI skills (preview), which allow users to build agents that can be pointed to query any data across multiple systems via natural language. Ulag wrote AI skills now have an improved conversational experience. Plus, they can now connect to semantic models and Eventhouse KQL databases, going beyond lakehouse and data warehouse tables, mirrored DB and shortcut data.
Among other notable updates, Microsoft announced the general availability of API for GraphQL to allow efficient querying of multiple data sources using the widely adopted GraphQL technology; support for new events and simplified dashboard sharing in Real-Time Intelligence; and preview of open mirroring, a feature that allows any application or data provider to write data changes directly into a mirrored database within Fabric. It also confirmed the general availability of Azure SQL DB mirroring and the preview of SQL managed instance mirroring.
Finally, Fabric users will also get workspace monitoring and surge protection in preview. The former will provide detailed diagnostic logs for troubleshooting performance issues, capacity performance and data downtime, while the latter will prevent background jobs from starting after a set threshold.
Microsoft Ignite runs from November 19 to November 22, 2024
MMS • RSS
Posted on nosqlgooglealerts. Visit nosqlgooglealerts
Ignite A new version of Microsoft’s database warhorse, SQL Server, is on the way, with some useful improvements squeezed between the inevitable artificial intelligence additions.
New in SQL Server 2025 will be performance and availability enhancements lifted from Azure SQL. According to a Microsoft spokesperson, there’s optimized locking, optional parameter plan optimization, faster batch mode, and columnstore indexing in the release. There is also REST API support alongside Regular Expression enablement.
“Additionally, native JSON support enables developers to more effectively deal with frequently changing schema and hierarchical data, facilitating the creation of more dynamic applications,” the spokesperson said.
There’s support for Entra managed identities, which Microsoft says will improve credential management and compliance, and failover reliability has also been enhanced. And, of course, Copilot is in SQL Server Management Studio to “streamline SQL development by offering real-time suggestions, code completions, and best practice recommendations.”
Unsurprisingly, Microsoft is going all-in with AI in this release. “SQL Server 2025 has AI built-in, simplifying AI application development and retrieval-augmented generation (RAG) patterns with secure, performant, and easy-to-use vector support, leveraging the T-SQL language,” the company said.
“In this latest SQL Server version, flexible AI model management within the engine using REST interfaces allows our customers to use AI models from ground to cloud.”
Microsoft SQL Server is just over 35 years old – older, if one considers its Sybase origins – and the most recent release, SQL Server 2022, will remain in mainstream support until January 11, 2028. Extended support will go to January 11, 2033. The spokesperson told us that SQL Server 2025 would likely follow Microsoft’s Fixed Lifecycle policy, with five years of mainstream support followed by another five years of extended support.
Assuming SQL Server 2025 makes it to general availability in 2025 – it is currently in Private Preview – this translates to support until at least 2035.
If SQL Server 2022 was all about making everything “Azure-enabled,” SQL Server 2025 reflects Microsoft’s obsession with AI. “SQL Server 2025 transforms SQL Server into an enterprise AI-ready database, bringing AI to customers’ data in a secure, efficient manner,” the spokesperson said.
“This release continues SQL Server’s legacy of impressive performance and security, adding new features and AI assistance that optimizes customer data for the era of AI.”
As before, the company was tight-lipped on costs, although pay-as-you-go licensing for on-premises customers is available with Azure Arc integration.
It is hard to say if this might be the last hurrah for SQL Server. Microsoft has various alternative database options these days, and hybrid and cloud-based services. But there will always be customers who want to keep their data out of the cloud and firmly on-premises.
The spokesperson was non-committal: “The SQL Server schedule is dependent on industry trends, customer feedback, and our strategic vision. We will continue to evaluate SQL Server releases according to these factors as time continues.” ®
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
The database and development platform provider is announcing a number of initiatives at Microsoft Ignite this week that make it easier for customers and partners to work with MongoDB on Azure cloud.
MongoDB is extending the scope of integrations between its cloud database development platform and Microsoft Azure, a move the company says will make it easier for partners and customers to build real-time data analytics links and develop generative AI applications.
In a series of announcements today at this week’s Microsoft Ignite conference, MongoDB is integrating the MongoDB Atlas cloud database with Microsoft’s Azure OpenAI services and launching its MongoDB Enterprise Advanced database management tools on the Azure Marketplace.
MongoDB said the new integrations will provide partners and customers with greater flexibility in data development on Azure – particularly to help meet the exploding demand for data for AI and generative AI applications.
[Related: MongoDB CEO Ittycheria: AI Has Reached ‘A Crucible Moment’ In Its Development]
“I think the pace is phenomenal, things are changing daily,” said Alan Chhabra, MongoDB executive vice president of worldwide partners, speaking in an interview with CRN about the rapid growth of AI and GenAI development. He said experimentation with GenAI, especially within larger enterprises, “is through the roof.”
Despite competing with Microsoft and its Azure Cosmos database, MongoDB has been steadily expanding its alliance with Microsoft – along with its partnerships with Amazon Web Services and Google Cloud – in recent years.
Last year MongoDB extended its multi-year strategic partnership with Microsoft, committing to a broad range of initiatives including close cooperation between the two companies’ sales teams and making it easier to migrate database workloads to MongoDB Atlas on Azure. That followed steps in 2022 that allowed developers to work with MongoDB Atlas through the Azure Marketplace and Azure Portal.
“Microsoft has become our fastest growing partnership,” Chhabra said, noting how MongoDB and Microsoft sales representatives cooperate in selling MongoDB for Azure, particularly for AI and GenAI development.
At the Ignite event Tuesday MongoDB announced that customers building applications powered by retrieval-augmented generation (RAG) can now select MongoDB Atlas as a vector store in the Microsoft Azure AI Foundry, combining MongoDB Atlas’s vector capabilities with generative AI tools and services from Microsoft Azure and Azure Open AI Service.
That makes it easier for customers to enhance large language models (LLMs) with proprietary data and build unique chatbots, copilots, internal applications, or customer-facing portals that are grounded in up-to-date enterprise data and context, the company said.
Chhabra said the new capabilities are designed to help customers develop and deploy GenAI applications. “It’s not easy. There’s a lot of confusion. There’s also a lot of experimentation, because everyone knows they need to use it [but] they’re not sure how.
“This integration will make it way easier and seamless for customers to deploy RAG applications leveraging their proprietary data in the combination of their LLMs,” Chhabra said.
In May MongoDB launched the MongoDB AI Applications Program (MAAP) that provides a complete technology stack, services and other resources to help businesses develop and deploy at scale applications with advanced generative AI capabilities.
Chhabra said MongoDB systems integration and consulting partners will benefit from the new integrations “because we’re making it easier for them to deploy Gen AI pilots and help them take it to production for customers.”
While large enterprises are conducting lots of AI development and experimentation in-house, Chhabra said SMBs are looking for more complete packaged AI and GenAI solutions.
“I believe there’s a large play for ISV application [developers] who are building purpose-built GenAI applications in the cloud on Azure, leveraging the MongoDB stack, leveraging our MAAP program,” Chhabra said. “So instead of customers having to build, they can buy GenAI solutions. When big companies like Microsoft work with cutting-edge growing companies like MongoDB, we make it easier for customers and partners to deploy GenAI [and] the whole ecosystem benefits.”
In another announcement at Ignite, MongoDB said users looking to maximize insights from operational data can now do so in near real-time with Open Mirroring in Microsoft Fabric for MongoDB Atlas. That connection keeps data in sync between MongoDB Atlas and OneLake in Microsoft Fabric, enabling the generation of near real-time analytics, AI-based predictions, and business intelligence reports, according to MongoDB.
And the announced launch of MongoDB Enterprise Advanced on Azure Marketplace for Azure Arc-enabled Kubernetes applications gives customers more flexibility to build and operate applications across on-premises, hybrid, multi-cloud, and edge Kubernetes environments.
Eliassen Group, a Reading, Mass.-based strategic consulting company that provides business, clinical, and IT services, will use the new Microsoft integrations to drive innovation and provide greater flexibility to their clients, MongoDB said.
“We’ve witnessed the incredible impact MongoDB Atlas has had on our customers’ businesses, and we’ve been equally impressed by Microsoft Azure AI Foundry’s capabilities. Now that these powerful platforms are integrated, we’re excited to combine the best of both worlds to build AI solutions that our customers will love just as much as we do,” said Kolby Kappes, vice president – emerging technology, at Eliassen Group, in a statement.
The new extensions to the Microsoft alliance come a little more than a month after MongoDB debuted MongoDB 8.0, a significant update to the company’s core database that offered improved scalability, optimized performance and enhanced enterprise-grade security.
Article originally posted on mongodb google news. Visit mongodb google news
MMS • Meryem Arik
Article originally posted on InfoQ. Visit InfoQ
Transcript
Arik: I’m Meryem. I’m co-founder and CEO of TitanML. My background is, I was a physicist, originally, turned banker, then turned AI, but always really interested in emerging tech. We at TitanML built the infrastructure to make serving LLMs efficiently, much better. I’m going to frame today through a conversation that I had at a wedding last summer. No one really understands what we do, at least they didn’t before ChatGPT came out. They’re starting to now. I always find myself having to have this conversation over again. Fortunately, it wasn’t actually me having this conversation. It was my co-founder who I was at the wedding with, we’re all university friends. This is the conversation. Russell, he’s also a friend of mine from university. He’s a data scientist at a hedge fund. Really smart guy. This is Jamie. He is my co-founder. He’s our chief scientist. He essentially is the person that makes our inference server really fast.
Outline
What I’m going to do is I’m firstly going to explain why LLM deployment is hard, because a lot of people don’t necessarily appreciate that it is. Then I’m going to give an assortment, I think it’s seven, that I landed on, tips, tricks, and techniques for better LLM deployments.
Why is LLM (AI) Deployment Hard?
We’ll start with this conversation. Typically, it’s like, what have you been up to? Then he’s like, I’ve been working on making LLM serving more easy. Then he says, is LLM deployment even hard, don’t I just call the OpenAI API? Then he’s like, sort of. Because everyone, when they think of LLMs, just thinks of OpenAI. APIs are really easy to call. You might be like, why is she even here talking? I can’t just call the OpenAI API. Everyone here knows how to do that. However, there are more than one ways that you can access LLMs. You can use hosted APIs. I have a bunch of them here, OpenAI, Cohere, Anthropic, AI21 Labs. These are all situations where they’ve done the hosting for you and they’ve done the deployment for you. All you have to do is call into them. I don’t want to minimize it too much, because there’s still complexity you have there. You still have to do things like hallucination reduction, but they’ve done a lot of the heavy lifting. For a lot of use cases, you might want to self-host. This is when you’re calling into like a Mistral, or you’re hosting a Llama, or one of the others. Essentially, you’re hosting it in your own environment, whether that’s VPC or on-prem environment.
He’s like, but why would I want to self-host anyway? To which we say, lots of reasons. There’s broadly three reasons why you might want to self-host. Firstly, there’s decreased cost at scale. It is true that if you’re just doing proof of concepts, then OpenAI API based models are much cheaper. If you’re deploying at scale, then self-hosting ends up being much cheaper. Why does it become much cheaper? Because you only have one problem to solve, which is your particular business problem. You’re able to use much smaller models to solve the same problem. Whereas OpenAI, they’re hosting a model that has to solve both coding and also writing Shakespeare, so they have to use a much bigger model to get the same output.
At scale, it’s much cheaper to use self-hosted models. Second reason why you might want to self-host is you have improved performance as well. When you’re using a task specific LLM, or you fine-tuned it, or you’ve done something to make it very narrow to your task, you end up typically getting much better performance. Here’s a couple of snippets from various blogs, although I think they’re a bit old now, but the point still stands. Then the third reason, which is why most of our clients self-host, which is privacy and security. If you’re part of a regulated industry maybe for GDPR reasons, or your compliance team, then you might have to self-host as well. These are the three main reasons why you should self-host. If these aren’t important to you, use an API.
Typically, we find that the reasons why enterprises care about open source, and I have, I think, a couple graphs from a report by the VC, a16z. The three main reasons are control, customizability, and cost. The biggest one by far is control. Being able to have that AI independence, that if OpenAI decides to fire its CEO again, that you will still have access to your models, which is important, especially if you’re building really business important applications. The majority of enterprises also seem to agree that these reasons are important to them. The vast majority of enterprises, apart from 18%, expect to shift to open source, either now or when open sources matches the performance of a GPT-4 quality model. If you are looking to self-host, you are very much not alone, and most enterprises are looking to build up that self-hosted capability.
Russell, he works at a hedge fund, he’s like, privacy is really important for my use case, so it makes sense to self-host. How much harder can it really be? I hear this all the time, and it infuriates me. The answer is a lot harder. You really shouldn’t ignore the complexity that you can’t see. When you call an API based model, you benefit from all of the hard work that their engineers have done under the hood to build that inference and serving infrastructure. In fact, companies like OpenAI have teams of 50 to 100 managing this infra. Things like model compression, like Kubernetes, batching servers, function calling, JSON forming, runtime engines, are all the things you don’t have to worry about when you’re using the API based model, but you do suddenly have to worry about when you’re self-hosting.
He’s like, but I deploy ML models all the time. You might have been deploying XGBoost models or linear regression models in the past. How much harder can it really be to deploy these LLMs? To which we say, do you know what the L stands for? It’s way harder to deploy these models. Why? The first L in LLM stands for large language model. I remember when we started the company, we thought a 100 million parameter BERT model was large. Now a 7 billion parameter model is considered small, but that is still 14 gig, and that is not small. GPUs are the second reason why it is much harder. GPUs are much harder to work with than CPUs. They’re much more expensive, so using them efficiently really matters. Doesn’t really matter if you don’t use your CPUs super efficiently, because they’re a couple orders of magnitude cheaper.
That cost, latency, performance tradeoff triangle that we sometimes talk about is really stark with LLMs in a way that it might not have been previously. The third reason why it’s really hard is the field is evolving crazy fast. Half of the techniques that we use to serve and deploy and optimize models didn’t exist a year ago. Another thing that I don’t have here, but maybe it’s worth mentioning, is also the orchestration element. Typically, with these large language model applications, you have to orchestrate a number of different models. RAG is a perfect example of this. You have to orchestrate in the very classic sense, an embedding model and a generation model. If you’re doing state of the art RAG, you’ll probably need a couple models for your parses, maybe an image model and a table model, and then you’ll need a reranker. Then you end up with five or six different models. That gets quite confusing. Plus, there’s all the other reasons why deploying applications is hard, like scaling and observability.
Tips to Make LLM Deployment Less Painful
He then says something like, that sounds really tricky. What can I do? Then Jamie says, “Luckily, Meryem has some tips and tricks that make navigating LLM deployment much easier.” That’s what exactly he said. We’ll go through my tips to make LLM deployment less painful. It’ll still suck, and it’ll still be painful, it might be less painful.
1. Know Your Deployment Boundaries
My first tip is that you should know your deployment boundaries. You should know your deployment boundaries when you’re building the application. Typically, people don’t start thinking about their deployment boundaries until after they’ve built an application that they think works. We think that you should spend time thinking about your requirements first. It’ll make everything else much easier. Thinking about stuff like, what are your latency requirements? What kind of load are you expecting? Are you going to be deploying an application that might have three users at its peak, or is this going to be the kind of thing like DoorDash, where you’re deploying to 5 gazillion users? What kind of hardware do you have available? Do you need to deploy on-prem, or can you use cloud instances? If you have cloud instances, what kind of instances do you have to have?
All of these are the kind of things that you should map out before. You might not know exactly, so it’s probably a range. It is acceptable if my latency is below a second, or above X amount. It’s just good things to bear in mind. Other things that I don’t have here is like, do I need guaranteed JSON outputs? Do I need guaranteed regex outputs? These are the kinds of things that we should bear in mind.
2. Always Quantize
If you have these mapped out, then all of the other decisions will be made much easier. This goes on to my next point, which is, always quantize. I’ll tell you why it links to my first point earlier. Who knows who Tim Dettmers is? This guy is a genius. Who knows what quantization is? Quantization is essentially model compression. It’s when you take a large language model and you reduce the precision of all of the weights to whatever form you want. 4-bit is my favorite form of quantization, going from an FP32. The reason why it’s my favorite is because it’s got a really fantastic accuracy compression tradeoff. You can see here, in this we have accuracy versus model bits, so the size of the model. Let’s say the original is FP16. It’s actually not, it’s normally 32.
That’s your red line there. We can see that when we compress the model down, we’ll go 10 to the 10, for a given resource size, you can see that the FP16, red line, is actually the worst tradeoff. You’re way better off using a FP8 or an INT4 quantized model. What this graph is telling you is that for a fixed resource, you’re way better off having a quantized model of the same size than the unquantized model. We start with the infra and we work backwards. Let’s say we have access to L40S, and we have that much VRAM. Because I know my resources that I’m allowed, I can look at the models that I have available to me, and then work backwards. I have 48 gigs of VRAM. I have a Llama 13 billion, so that’s 26. That’s all good. That fits. I have a Mixtral which is current state of the art for open-source models. That’s not going to work.
However, I have a 4-bit quantized Mixtral which does fit, which is great. I now know which models I can even pick from, and I can start experimenting with. That graph that I showed you earlier with Tim Dettmers, that tells me that my 4-bit model will be better performing, probably. Let’s say my Llama was also the same size, my 4-bit model will be better performing than my Llama model, because my model retains a lot of that accuracy from when it was really big and compressed down. We start with our infra and work backwards. We essentially find the resources that we can fit, and then find the 4-bit quantized model that’ll fit in those resources. The chances are that’s probably the best accuracy that you can get for that particular model.
3. Spend Time Thinking About Optimizing Inference
Tip number three, spend a little bit of time thinking about optimizing inference. The reason why I tell people spend just a little bit of time optimizing inference is because the naive things that you would do when you’re deploying these models is typically completely the wrong thing to do. You don’t need to spend a huge amount of time thinking about this, but just spending a little bit of time can make multiple orders of magnitude difference to GPU utilization. I can give one example of this, batching strategies. Essentially, batching is where multiple requests are processed in parallel. The most valuable thing when you’re deploying these models that you have is your GPU utilization. GPUs, I think I said earlier, are really expensive, so it’s very important that we utilize them as much as we can. If I’m doing no batching, then this is more or less the GPU utilization that I’ll get, which is pretty bad. The naive thing to do would either be to do no batching or dynamic batching.
Dynamic batching is the standard batching method for non-Gen AI applications. It’s the kind of thing that you might have built previously. The idea is that you wait a small amount of time before starting to process a request. Group any of those requests that arrive during that time, and then process them together. In generative models, this leads to a downtime in utilization. You can see that it starts really high and then it goes down, because users will get stuck in the queue waiting for longer generations to finish. Dynamic batching is something that you might try naively, but it actually tends to be a pretty bad idea. If you spend a little bit of time thinking about this, you can do something like continuous batching. This is what we do.
This is a GPU utilization graph that we got a couple weeks ago, maybe. This the state-of-the-art batching technique designed for generative models. You let incoming requests interrupt in-flight requests in order to keep that GPU utilization really high. You get much less queue waiting, and much higher resource utilization as well. You can see going from there to there is maybe one order of magnitude difference in GPU costs, which is pretty significant. I’ve not done anything to the model, nothing will impact accuracy there.
Second example I can give you is with parallelism strategies. For really large models, you often can’t inference them on a single GPU. For example, a Llama 70 billion, or a Mixtral, or a Jamba, for example, they’re really hefty models. Often, I’ll need to split them across multiple GPUs in order to be able to inference them. You need to be able to figure out how you’re going to essentially do that multi-GPU inference. The naive way to do this, and actually this is probably the most popular way to do this, in fact, common inference libraries like Hugging Face’s Accelerate, does this, is you split the model layer by layer. It was a 90-gigabyte model. I have 30 on one, 30 on one, and then 30 on the third GPU. At any one time only one GPU is active, which means that I’m paying for essentially three times the number of GPUs that I’m actually using at any one time.
That’s just because I split them in this naive way, because my next GPU is having to wait for my previous GPU. That’s really unideal. This is what happens in Hugging Face Accelerate library, if you want to look into that. Tensor Parallel is what we think is the best one, which is, you essentially split the model lengthwise so that every GPU can be fully utilized at the same time for each layer, so it makes inference much faster, and you can support arbitrarily large models as well with enough GPUs. Because at every single point, all of your GPUs are firing, you don’t end up paying for that extra resource. In this particular example, we’ve got, for this particular model, a 3x model, a GPU utilization improvement. Combining that with the order of magnitude we had before, that’s a really significant GPU utilization improvement. It’s not a huge amount of time to think about this, but if you just spend that little bit of time, then you might end up improving what you can put on those GPUs.
4. Consolidate Infrastructure
What have I done so far? I’ve done, think about your deployment requirements, quantize, inference optimization. Fourth one is, consolidate your infrastructure. Gen AI is so computationally expensive that it really benefits from consolidation of infrastructure, and that’s why central MLOps teams like Ian runs, make a lot of sense. For most companies, ML teams tend to work in silos, and therefore are pretty bad at consolidation of infrastructure. It wasn’t really relevant for previous ML sources. Deployment is really hard, so it’s better if you deploy once, you have one team managing deployment, and then you maintain that, rather than having teams individually doing that deployment, because then each team individually has to discover that this is a good tradeoff to make. What this allows is it allows the rest of the org to focus on that application development while the infrastructure is taken care of.
I can give you an example of what this might look like. I will have a central compute infrastructure, and maybe as a central MLOps team, I’ve decided that my company can have access to these models, Llama 70, Mixtral, and Gemma 7B. I might periodically update the models and improve the models. For example, when Llama 7 comes out, instead of Llama 2, I might update that. These are the models that I’ll host centrally. Then all of those little yellow boxes are my application development teams. They’re my dispersed teams within the org. Each of them will be able to get access to my central compute infrastructure, and personalize it in the way that works for them. One of them might add a LoRA, which is essentially a little adapter that you can add to your model when you fine-tune it. It’s very easy to firstly do, and then also add into inference. Then maybe I’ll add RAG as well. RAG is when we give it access to our proprietary data, so our vector store, for example.
I have each of my application teams building LoRA’s RAGs, LoRA’s RAGs. Maybe I don’t even need LoRAs, and I can just do prompt engineering, for example, and my central compute is all managed by one team, and it’s just taken care of. The nice thing about this is what you’re doing is you’re giving your organization the OpenAI experience, but with private models. If I’m an individual developer, I don’t think about the LLM deployment. Another team manages it. It sits there, and I just build applications on top of the models we’ve been given access to. This is really beneficial. Things to bear in mind. Make sure your inference server is scalable. LoRA adapter support is super important if you want to allow your teams to fine-tune. If you do all of this, you’ll get really high utilization of GPUs. Because, remember, GPU utilization is literally everything. I say literally everything. There’s your friends, and there’s your family, and then there’s GPU utilization. If we centrally host this compute, then we’re able to get much higher utilization of those very precious GPUs.
I can give you a case study that we did with a client, RNL, it’s a U.S. enterprise. What they had before was they had four different Gen AI apps. They were pretty ahead at the time. They built all of this last year. Each app was sitting on its own GPU, because they were like, they’re all different applications. They’ve all got their own Embedders, their own thing going on. They gave them each their own GPUs, and as a result, got really poor GPU utilization, because not all the apps were firing all the time. They weren’t all firing at capacity. What we did with them is something like this. It doesn’t have to be Titan, it can be any inference server. They had Mixtrals and Embedders, essentially, is all they had. We hosted a Mixtral and an Embedder on one server and exposed those APIs. The teams then built on top of those APIs, sharing that resource. Because they were sharing the resource, they could approximately half the number of GPUs that they needed. We were able to manage both the generative and the non-generative in one container. It was super easy for those developers to build on top of. That’s the kind of thing that if you have a central MLOps team, you can do, and end up saving a lot of those GPU times.
5. Build as if You Are Going to Replace the Models Within 12 Months
My fifth piece of advice is, build as if you’re going to replace the models within 12 months, because you will. One of our clients, they deployed their first application with Llama 1 last year. I think they changed the model about four times. Every week they’re like, this new model came out. Do you support it? I’m like, yes, but why are you changing it for the sixth time? Let’s think back to what state of the art was a year ago. A year ago, maybe Llama had come out by then, but if before that, it might have been the T5s. The T5 models were the best open-source models. What we’ve seen is this amazing explosion of the open-source LLM ecosystem. It was all started by Llama and then Llama 2, and then loads of businesses had built on top of that.
For example, the Mistral 70B was actually built with the same architecture that Llama was. We had the Falcon out of the UAE. We had Mixtral by Mistral. You have loads of them, and they just keep on coming out. In fact, if you check out the Hugging Face, which is where all of these models are stored, if you check out their leaderboard of open-source models, the top model changes almost every week. Latest and greatest models come out. These models are going to keep getting better. This is the performance of all models, both open source and non-open source, as you can see the license, proprietary or non-proprietary. The open-source models are just slowly scaling that leaderboard. We’re starting to get close to parity between open source and non-open source. Right now, the open-source models are there or thereabouts, with GPT-3.5. That was the original ChatGPT that we were all amazed by.
My expectation is that we’ll get to GPT-4 quality within the next year. What this means is that you should really not wed yourself to a single model or a single provider. Going back to that a16z report that I showed you earlier, most enterprises are using multiple model providers. They’re building their inference stack in a way that it’s interoperable, in a way that if OpenAI has a meltdown, I can swap it out for a Llama model. Or, in a way that if Claude is now better than GPT-4 as it is now, I can swap them really easily. Building with this interoperability in mind is really important. I think one of the greatest things that OpenAI has blessed us with is not necessarily their models, although they are really great, but they have actually counterintuitively democratized the AI landscape, not because they’ve open sourced their models, because they really haven’t, but because what they’ve done is they’ve provided uniformity of APIs to the industry. If you build with the OpenAI API in mind, then you’ll be able to capture a lot of that value and be able to swap models in and out really easily.
What does this mean for how you build? API and container-first development makes life much easier. It’s fairly standard things. Abstraction is really good, so don’t spend time building custom infrastructure for your particular model. The chances are you’re not going to use it in 12 months. Try and build more general infra if you’re going to. We always say that at this current stage where we’re still proving value of AI in a lot of organizations, engineers should spend their time building great application experiences rather than fussing with infrastructure. Because right now, for most businesses, we’re fortunate enough to have a decent amount of budget to go and play and try out this Gen AI stuff.
We need to prove value pretty quickly. We tend to say, don’t work with frameworks that don’t have super wide support for models. For example, don’t work with a framework that only works with Llama, for example, because it’ll come back to bite you. Whatever architecture you pick or infrastructure you pick, making sure that when Llama 3, 4, 5, Mixtral, Mistral comes out, they will help you adopt it. I can go back to this case study that I talked about before. We built this in a way, obviously, that it’s super easy to swap that Mixtral for Llama 3, when Llama 3 comes out. For example, if a better Embedder comes out, like a really good Embedder came out a couple weeks ago, we can swap that out easily too.
6. GPUs Look Really Expensive, Use Them Anyway
My sixth one, GPUs look really expensive. You should use them anyway. GPUs are so phenomenal. They are so phenomenally designed for Gen AI and Gen AI workloads. Gen AI involves doing a lot of calculations in parallel, and that happens to be the thing that GPUs are incredibly good at. You might look at the sticker price and be like, it’s 100 times more expensive than a CPU. Yes, it is, but if you use it correctly and get that utilization you need out of it, then you’ll end up processing orders of magnitude more, and per request, it will be much cheaper.
7. When You Can, Use Small Models
When you can, use small models. GPT-4 is king, but you don’t get the king to do the dishes. What the dishes are: GPT-4 is phenomenal. It’s a genuinely remarkable piece of technology, but the thing that makes it so good is also that it is so broad in terms of its capabilities. I can use the GPT-4 model to write love letters, and you can use it to become a better programmer, and we’re using the exact same model. That is mental. That model has so many capabilities, and as a result, it’s really big. It’s a huge model, and it’s very expensive to inference. What we find is that you tend to be better off using GPT-4 for the really hard stuff that none of the open-source models can do yet, and then using smaller models for the things that are easier. You can massively reduce cost and latency by doing this. When we talked about that latency budget that you had earlier, or those resource budgets that you had earlier, you can go a long way to maximizing that resource budget if you only use GPT-4 when you really have to.
Three commonly seen examples are like RAG Fusion. This is when your query is edited by a large language model, and then all queries are searched against, and then the results are ranked to improve the search quality. For example that, you can get very good results by not using GPT-4, only using GPT-4 when you have to. You might, for example, with RAG, use a generative model just to do the reranking, so just check at the end that the thing that my Embedder said was relevant, was really relevant. Small models, especially fine-tuned models for things like function calling are really good. One of the really common use cases for function calling is if I need my model to output something like JSON or regex, there are broadly two ways that I could do this. Either I could fine-tune a much smaller model, or I could add controllers to my small model. A controller is really cool. A controller is essentially when, if I’m self-hosting the model, I can ban my model from saying any tokens that would break a JSON schema or that would break a regex schema that I don’t want. Stuff like that, which actually is majority of enterprise use cases, you don’t necessarily need to be using those API based models, and you can get really immediate cost and latency benefits.
Summary
Figure out your deployment boundaries and work backwards. Because you know your deployment boundaries, you know that you should pick the model that when you’ve quantized it down is that size. Spend time thinking about optimizing inference so that can make the difference of genuinely multiple orders of magnitude. Gen AI benefits from consolidation of infrastructure, so try to avoid having each team being responsible for their deployments, because it will probably go wrong. Build as if you’re going to replace your model in 12 months. GPUs look expensive, but they’re your best option. When you can, you’ll use small models. Then we said all of this to Russell, and then he was like, “That was so helpful. I’m so excited to deploy my mission critical LLM app using your tips.” Then we said, “No problem, let us know if you have any questions”.
Questions and Answers
Participant 1: You said, build for flexibility. What are the use cases for frequent model replacements? The time and effort we have spent on custom fine-tuning, on custom data, will have to be repeated? Do you have any tips for that in case of frequent model replacements?
Arik: When would you want to do frequent model replacement? All of the time. With the pace of LLM improvement, it’s almost always the case that you can get better performance, literally just by swapping out a model. You might need some tweaks to prompts, but typically, just doing a one-to-one switch works. For example, if I have my application built on GPT-3.5 and I swap it out for GPT-4, even if I’m using the same prompt, the chances are my model performance will go up, and that’s a very low effort thing to do. How does that square with things like the engineering effort required to swap? If it is a month’s long process, if it’s not a significant improvement, then you shouldn’t make that switch. What I would suggest is trying to build in a way where it’s not a month’s long process and actually can be done in a couple days, because then it will almost always be worth that switch.
How does that square as well with things like fine-tuning? I have a spicy and hot take, which is, for the majority of use cases, you don’t need to fine-tune. Fine-tuning was very popular in deep learning of a couple years ago. As the models are getting better, they’re also better at following your instructions as well. You tend to not need to fine-tune for a lot of use cases, and can just get away with things like RAG, prompt engineering, and function calling. That’s what I would tend to say. If you are looking for your first LLM use case, speaking of swapping models, a really good first LLM use case is to just try and swap out your NLP pipelines. A lot of businesses have preexisting NLP pipelines. If you can swap them for LLMs, typically, you’ll get multiple points of accuracy boost.
Participant 2: How do you see the difference for the on-prem hardware, between enterprise grade hardware and consumer maxed out hardware, because I chose to go for consumer maxed out hardware because you go up to 6000 meg transfers on the memory, and the PCI lanes are faster.
Arik: Because people like him have taken all the A100s, when we do our internal development, we actually do it on 4090s, which is consumer hardware. They’re way more readily accessible, much cheaper as well than getting those data center hardware. That’s what we use for our development. We’ve not actually used consumer grade hardware for at-scale inference, although there’s no reason why it wouldn’t work.
If it works for your workload. We use it as well. We think they’re very good. They’re also just much cheaper, because they’re sold as consumer grade, rather than data center grade.
Participant 3: You’re saying that GPU is a whole and it’s most important. I’m a bit surprised, but maybe my question will explain. I made some proof of concept with small virtual machines with only CPUs, and I get quite good results with few requests per second. I did not ask myself about scalability. I’m thinking about how much requests shall we switch to GPUs?
Arik: Actually, maybe I was a bit strong on the GPU stuff, because we’ve deployed on CPU as well. If the latency is good enough, and that’s typically the first complaint that people get, is latency, then CPU is probably fine. It’s just that when you’re looking at economies of scale and when you’re looking at scaling up, they will almost always be more expensive per request. If you have a reasonably low number of requests, and the latency is fine, then you can get away with it. I think one of our first proof of concepts with our inference server was done on CPU. One thing that you will also know is that you’ll be limited in the size of model that you can go up to. For example, if you’re doing a 7 billion quantized, you can probably get away with doing CPU as well. I think GPU is better if you are starting from a blank slate. If you’re starting from a point where you already have a massive data center filled with CPUs and you’re not using them otherwise, it is still worth experimenting whether you can utilize them.
Participant 4: I have a question regarding the APIs that are typically used, and of course, it’s OpenAI’s API that are typically used also by applications. I also know a lot of people who do not really like the OpenAI API. Do you see any other APIs around? Because a lot of people are just emulating them, or they are just using it, but no one really likes it.
Arik: When you say they don’t like it, do they not like the API structure, or don’t like the models?
Participant 4: It is about the API structure. It is about documentation. It is about states, about a lot of things that happen that you can’t fully understand.
Arik: We also didn’t really like it, so we wrote our own API that’s called as our inference server, and then we have an OpenAI compatible layer, because most people are using that structure. You can check out our docs and see if you like that better. I think because it was the first one to really blow up, it’s what the whole industry converged to when it comes to that API structure.
See more presentations with transcripts