Amazon EC2 I4g instances are now available in AWS Asia Pacific (Sydney) Region

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

Starting today, storage optimized Amazon Elastic Compute Cloud (EC2) I4g instances powered by AWS Graviton2 processors and 2nd generation AWS Nitro SSDs are available in the AWS Asia Pacific (Sydney) Region.

I4g instances are optimized for workloads that perform a high mix of random read/write operations and require very low I/O latency and high compute performance, such as transactional databases (MySQL and PostgreSQL), real-time databases including in-memory databases, NoSQL databases, time-series databases (ClickHouse, Apache Druid, MongoDB), and real-time analytics such as Apache Spark.

Get started with I4g instances by visiting the AWS Management Console, AWS Command Line Interface (CLI), or AWS SDKs. To learn more, visit the I4g instances page.
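
To get started programmatically, here is a minimal sketch using the AWS SDK for Python (boto3). The AMI ID and key pair name are placeholders; in practice you would pick an arm64 (Graviton-compatible) AMI available in ap-southeast-2.

    import boto3

    # Launch one I4g instance in the Asia Pacific (Sydney) Region.
    ec2 = boto3.client("ec2", region_name="ap-southeast-2")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder: use an arm64 AMI for Graviton2
        InstanceType="i4g.large",
        KeyName="my-key-pair",            # placeholder key pair name
        MinCount=1,
        MaxCount=1,
    )
    print(response["Instances"][0]["InstanceId"])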


Tessell Named to CRN’s 2025 Big Data 100 List for Its AI-Powered Multi-Cloud DBaaS Platform

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Company Recognized as a Top Innovator in Database Systems Helping Enterprises Simplify and Accelerate Data Modernization in the AI Era

SAN FRANCISCO, April 23, 2025 (GLOBE NEWSWIRE) — Tessell, the leading next-generation multi-cloud database-as-a-service (DBaaS) platform that enables enterprises and startups to accelerate database, data, and application modernization journeys at scale, today announced it has been named to the CRN® 2025 Big Data 100, an annual list published by CRN, a brand of The Channel Company, that recognizes technology vendors delivering innovation and growth in big data, analytics, and data management.

This year’s list arrives amid an explosion of global data creation, forecast to reach 394 zettabytes by 2028, according to Statista, as businesses struggle to keep up with the volume, complexity, and performance requirements of modern data ecosystems. Tessell was recognized in the Database Systems category for its AI-powered, cloud-native platform that simplifies and supercharges the deployment and management of popular database engines like PostgreSQL, MySQL, SQL Server, Oracle, MongoDB, and Milvus across any cloud environment.

“Being named to the CRN Big Data 100 reflects the momentum we’ve built in enabling enterprises to overcome the legacy barriers of cloud database management,” said Bakul Banthia, Co-Founder of Tessell. “We’re empowering our customers to transition from fragmented, high-cost environments to a unified, intelligent data platform built for performance, resilience, and AI-driven scale.”

Tessell’s inclusion highlights the platform’s growing traction among enterprises modernizing their infrastructure and adopting AI-centric workflows. On April 9th, Tessell announced a $60 million Series B led by WestBridge Capital, with participation from Lightspeed Venture Partners, B37 Ventures, and Rocketship.vc. The funding is being used to accelerate go-to-market expansion and enhance AI-driven features, including vector search, conversational query interfaces, and intelligent workload automation.

Key Capabilities Driving Recognition:

  • Conversational Data Management (CoDaM): Natural-language interaction with data systems, turning any business user into a data user.
  • Vector Extension & AI-Readiness: Enhanced support for generative AI workloads with integrated vector search on popular database engines.
  • Unified Control Plane: One interface to deploy, manage, and govern databases across multiple clouds and engines.
  • Zero RPO/RTO: Built-in disaster recovery and high availability for mission-critical workloads.
  • Enterprise Security & Compliance: Robust guardrails and policy-driven access controls for regulated industries.
  • 10x Performance, Fraction of the Cost: Patent-backed innovations eliminate IOPS bottlenecks while reducing TCO.

CRN’s 2025 Big Data 100 is segmented into technology categories, including database systems, analytics software, data management, observability, and cloud platforms. Tessell is featured in the Database Systems section alongside a select group of vendors leading innovation in the age of AI, automation, and intelligent data architecture.

For more information about Tessell and its DBaaS solutions, visit https://www.tessell.com/.

About Tessell

Tessell is a multi-cloud DBaaS platform redefining enterprise data management with its comprehensive suite of AI-powered database services. By unifying operational and analytical data within a seamless data ecosystem, Tessell enables enterprises to modernize databases, optimize cloud economics, and drive intelligent decision-making at scale. Through AI and Conversational Data Management (CoDaM), Tessell makes data more accessible, interactive, and intuitive, empowering businesses to harness their data’s full potential easily.

Media Contact

Len Fernandes

Firecracker PR for Tessell

[email protected]

Article originally posted on mongodb google news. Visit mongodb google news


MongoDB (MDB) Price Target Slashed Amid Growth Challenges | MDB Stock News

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Piper Sandler has revised its price target for MongoDB (MDB), reducing it significantly from $280 to $200 while maintaining an Overweight rating on the shares. This adjustment aligns with similar cuts made across the cloud applications and analytics sector due to anticipated short-term growth challenges.

These challenges stem from several factors, including tariffs, shifting policy landscapes, and hurdles associated with the adoption of artificial intelligence. The firm notes that the software industry is experiencing a moderation in growth for the fourth consecutive year, which has impacted investor sentiment negatively.

Piper Sandler’s analysis indicates that valuation multiples in the sector have fallen to their lowest levels in seven years. However, the direct impact of tariffs on software models remains minimal.

Wall Street Analysts Forecast

Based on the one-year price targets offered by 34 analysts, the average target price for MongoDB Inc (MDB) is $283.92, with a high estimate of $520.00 and a low estimate of $170.00. The average target implies an upside of 86.60% from the current price of $152.15. More detailed estimate data can be found on the MongoDB Inc (MDB) Forecast page.

Based on the consensus recommendation from 38 brokerage firms, MongoDB Inc’s (MDB) average brokerage recommendation is currently 2.0, indicating “Outperform” status. The rating scale ranges from 1 to 5, where 1 signifies Strong Buy and 5 denotes Sell.

Based on GuruFocus estimates, the estimated GF Value for MongoDB Inc (MDB) in one year is $432.89, suggesting an upside of 184.52% from the current price of $152.15. GF Value is GuruFocus’ estimate of the fair value at which the stock should trade. It is calculated based on the historical multiples the stock has traded at previously, as well as past business growth and future estimates of the business’ performance. More detailed data can be found on the MongoDB Inc (MDB) Summary page.

Article originally posted on mongodb google news. Visit mongodb google news


Ostrum Asset Management Acquires 1,774 Shares of MongoDB, Inc. (NASDAQ:MDB)

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Ostrum Asset Management grew its stake in MongoDB, Inc. (NASDAQ:MDB) by 460.8% during the fourth quarter, according to its most recent filing with the Securities and Exchange Commission. The institutional investor owned 2,159 shares of the company’s stock after purchasing an additional 1,774 shares during the period. Ostrum Asset Management’s holdings in MongoDB were worth $503,000 at the end of the most recent reporting period.

Several other hedge funds and other institutional investors also recently made changes to their positions in the stock. Vanguard Group Inc. raised its position in MongoDB by 0.3% in the fourth quarter. Vanguard Group Inc. now owns 7,328,745 shares of the company’s stock valued at $1,706,205,000 after purchasing an additional 23,942 shares during the last quarter. Franklin Resources Inc. raised its holdings in MongoDB by 9.7% in the 4th quarter. Franklin Resources Inc. now owns 2,054,888 shares of the company’s stock valued at $478,398,000 after buying an additional 181,962 shares during the last quarter. Geode Capital Management LLC grew its holdings in MongoDB by 1.8% during the 4th quarter. Geode Capital Management LLC now owns 1,252,142 shares of the company’s stock worth $290,987,000 after acquiring an additional 22,106 shares during the last quarter. First Trust Advisors LP raised its stake in shares of MongoDB by 12.6% during the fourth quarter. First Trust Advisors LP now owns 854,906 shares of the company’s stock valued at $199,031,000 after acquiring an additional 95,893 shares during the last quarter. Finally, Norges Bank bought a new stake in shares of MongoDB in the fourth quarter worth $189,584,000. 89.29% of the stock is owned by institutional investors.

Analysts Set New Price Targets

A number of research firms have commented on MDB. KeyCorp lowered MongoDB from a “strong-buy” rating to a “hold” rating in a research report on Wednesday, March 5th. Mizuho decreased their target price on shares of MongoDB from $250.00 to $190.00 and set a “neutral” rating for the company in a research report on Tuesday, April 15th. Monness Crespi & Hardt upgraded shares of MongoDB from a “sell” rating to a “neutral” rating in a research report on Monday, March 3rd. UBS Group set a $350.00 target price on shares of MongoDB in a research note on Tuesday, March 4th. Finally, Truist Financial decreased their price target on MongoDB from $300.00 to $275.00 and set a “buy” rating for the company in a report on Monday, March 31st. Eight equities research analysts have rated the stock with a hold rating, twenty-four have given a buy rating and one has given a strong buy rating to the company’s stock. According to MarketBeat, MongoDB presently has an average rating of “Moderate Buy” and an average price target of $299.78.

Insider Buying and Selling at MongoDB

In other MongoDB news, CFO Srdjan Tanjga sold 525 shares of the business’s stock in a transaction on Wednesday, April 2nd. The stock was sold at an average price of $173.26, for a total value of $90,961.50. Following the sale, the chief financial officer now directly owns 6,406 shares in the company, valued at $1,109,903.56. The trade was a 7.57% decrease in their ownership of the stock. The sale was disclosed in a filing with the SEC, which is available through the SEC website. Also, CAO Thomas Bull sold 301 shares of the stock in a transaction on Wednesday, April 2nd. The stock was sold at an average price of $173.25, for a total value of $52,148.25. Following the transaction, the chief accounting officer now owns 14,598 shares of the company’s stock, valued at $2,529,103.50. The trade was a 2.02% decrease in their position. The disclosure for this sale is also available through the SEC website. Insiders sold 48,680 shares of company stock worth $11,084,027 in the last quarter. 3.60% of the stock is currently owned by company insiders.

MongoDB Trading Down 0.5%

Shares of NASDAQ:MDB opened at $159.26 on Monday. MongoDB, Inc. has a 1-year low of $140.78 and a 1-year high of $387.19. The stock has a market cap of $12.93 billion, a PE ratio of -58.12 and a beta of 1.49. The stock’s fifty day simple moving average is $208.15 and its two-hundred day simple moving average is $252.42.

MongoDB (NASDAQ:MDB) last issued its quarterly earnings data on Wednesday, March 5th. The company reported $0.19 earnings per share (EPS) for the quarter, missing analysts’ consensus estimates of $0.64 by ($0.45). MongoDB had a negative return on equity of 12.22% and a negative net margin of 10.46%. The firm had revenue of $548.40 million during the quarter, compared to the consensus estimate of $519.65 million. During the same period in the previous year, the company posted $0.86 earnings per share. On average, sell-side analysts forecast that MongoDB, Inc. will post -1.78 earnings per share for the current fiscal year.

MongoDB Company Profile

(Free Report)

MongoDB, Inc., together with its subsidiaries, provides a general-purpose database platform worldwide. The company offers MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database that includes the functionality developers need to get started with MongoDB.

See Also

Institutional Ownership by Quarter for MongoDB (NASDAQ:MDB)

This instant news alert was generated by narrative science technology and financial data from MarketBeat in order to provide readers with the fastest and most accurate reporting. This story was reviewed by MarketBeat’s editorial team prior to publication. Please send any questions or comments about this story to contact@marketbeat.com.

Article originally posted on mongodb google news. Visit mongodb google news


Capital Research Global Investors Has $128.64 Million Holdings in MongoDB, Inc. (NASDAQ:MDB)

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Capital Research Global Investors reduced its holdings in MongoDB, Inc. (NASDAQ:MDB) by 61.5% during the 4th quarter, according to its most recent filing with the Securities and Exchange Commission (SEC). The institutional investor owned 552,540 shares of the company’s stock after selling 881,000 shares during the quarter. Capital Research Global Investors owned approximately 0.74% of MongoDB, worth $128,638,000 at the end of the most recent reporting period.

Several other hedge funds and other institutional investors have also recently added to or reduced their stakes in the stock. Vanguard Group Inc. raised its position in MongoDB by 0.3% in the 4th quarter. Vanguard Group Inc. now owns 7,328,745 shares of the company’s stock worth $1,706,205,000 after purchasing an additional 23,942 shares during the last quarter. Franklin Resources Inc. boosted its stake in MongoDB by 9.7% during the 4th quarter. Franklin Resources Inc. now owns 2,054,888 shares of the company’s stock worth $478,398,000 after acquiring an additional 181,962 shares during the last quarter. Geode Capital Management LLC grew its holdings in MongoDB by 1.8% during the fourth quarter. Geode Capital Management LLC now owns 1,252,142 shares of the company’s stock valued at $290,987,000 after purchasing an additional 22,106 shares during the period. First Trust Advisors LP raised its holdings in MongoDB by 12.6% in the fourth quarter. First Trust Advisors LP now owns 854,906 shares of the company’s stock worth $199,031,000 after purchasing an additional 95,893 shares during the period. Finally, Norges Bank acquired a new position in shares of MongoDB in the 4th quarter valued at $189,584,000. Institutional investors and hedge funds own 89.29% of the company’s stock.

Insiders Place Their Bets

In related news, CAO Thomas Bull sold 301 shares of the stock in a transaction that occurred on Wednesday, April 2nd. The shares were sold at an average price of $173.25, for a total transaction of $52,148.25. Following the sale, the chief accounting officer now owns 14,598 shares in the company, valued at approximately $2,529,103.50. This trade represents a 2.02% decrease in their position. The transaction was disclosed in a legal filing with the SEC, which is accessible through the SEC website. Also, insider Cedric Pech sold 1,690 shares of the firm’s stock in a transaction on Wednesday, April 2nd. The shares were sold at an average price of $173.26, for a total value of $292,809.40. Following the completion of the sale, the insider now directly owns 57,634 shares in the company, valued at approximately $9,985,666.84. This represents a 2.85% decrease in their position. The disclosure for this sale is also available through the SEC website. Insiders have sold 48,680 shares of company stock worth $11,084,027 in the last 90 days. Company insiders own 3.60% of the company’s stock.

MongoDB Trading Down 0.5%

MDB stock opened at $159.26 on Monday. The business’s 50 day simple moving average is $208.15 and its 200 day simple moving average is $252.42. MongoDB, Inc. has a 1-year low of $140.78 and a 1-year high of $387.19. The firm has a market capitalization of $12.93 billion, a PE ratio of -58.12 and a beta of 1.49.

MongoDB (NASDAQ:MDB) last released its quarterly earnings results on Wednesday, March 5th. The company reported $0.19 earnings per share for the quarter, missing the consensus estimate of $0.64 by ($0.45). MongoDB had a negative net margin of 10.46% and a negative return on equity of 12.22%. The company had revenue of $548.40 million during the quarter, compared to the consensus estimate of $519.65 million. During the same period in the prior year, the company earned $0.86 earnings per share. As a group, sell-side analysts predict that MongoDB, Inc. will post -1.78 EPS for the current fiscal year.

Analysts Set New Price Targets

Several research analysts have weighed in on MDB shares. Barclays reduced their target price on MongoDB from $330.00 to $280.00 and set an “overweight” rating on the stock in a report on Thursday, March 6th. Wedbush decreased their price objective on MongoDB from $360.00 to $300.00 and set an “outperform” rating for the company in a research report on Thursday, March 6th. Rosenblatt Securities restated a “buy” rating and set a $350.00 target price on shares of MongoDB in a report on Tuesday, March 4th. Wells Fargo & Company cut shares of MongoDB from an “overweight” rating to an “equal weight” rating and reduced their price target for the company from $365.00 to $225.00 in a report on Thursday, March 6th. Finally, The Goldman Sachs Group dropped their price objective on shares of MongoDB from $390.00 to $335.00 and set a “buy” rating on the stock in a report on Thursday, March 6th. Eight research analysts have rated the stock with a hold rating, twenty-four have given a buy rating and one has issued a strong buy rating to the company’s stock. According to data from MarketBeat.com, MongoDB currently has a consensus rating of “Moderate Buy” and an average target price of $299.78.

MongoDB Profile

(Free Report)

MongoDB, Inc., together with its subsidiaries, provides a general-purpose database platform worldwide. The company offers MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database that includes the functionality developers need to get started with MongoDB.

Featured Articles

Want to see what other hedge funds are holding MDB? Visit HoldingsChannel.com to get the latest 13F filings and insider trades for MongoDB, Inc. (NASDAQ:MDB).

Institutional Ownership by Quarter for MongoDB (NASDAQ:MDB)

This instant news alert was generated by narrative science technology and financial data from MarketBeat in order to provide readers with the fastest and most accurate reporting. This story was reviewed by MarketBeat’s editorial team prior to publication. Please send any questions or comments about this story to contact@marketbeat.com.

Article originally posted on mongodb google news. Visit mongodb google news


Presentation: From “Simple” Fine-Tuning to Your Own Mixture of Expert Models Using Open-Source Models

MMS Founder
MMS Sebastiano Galazzo

Article originally posted on InfoQ. Visit InfoQ

Transcript

Galazzo: The goal of this session, based on the title, is to tell you how to create your own large language models. To get the best takeaway, you must understand what the real goal is. My real goal is to give you tips and the mistakes that I made along the path, from real life. I don’t want to go deep into technical aspects. Sometimes I will tell you something technical, but the most important thing here is to get a broad view of these tools. I will share with you a lot of sources, links, and articles to read that helped me save a lot of time. That’s the real goal. We said that we have to create our own large language model, but why should I do that? We already have OpenAI. That needs no introduction. Why should I do it?

The first answer could be because I’m a nerd. We said that we come from real life, and as a CTO, I have to deal with cost and timing. People want your product done very well, quickly, and not expensive. That doesn’t match being a nerd who can waste a lot of time experimenting with everything he wants. To do that, we will use some sources from, of course, very large players like Microsoft, Hugging Face, GitHub. The most important thing is, do not believe that it’s easy. Sometimes it can be really painful, so it must be worth it.

Fine-Tuning an LLM

When should I create my own large language model? Basically, we believe that OpenAI is God: never fails, is always perfect. It’s not true. Again, from real life, it happens that there are a lot of mistakes. Mistakes because it cannot know everything about your business. Then, if you need the answers that come from OpenAI, or other services like Anthropic with Claude, to be error-free (truly error-free is impossible, but let’s say close to it), you can consider training your own model. Starting with suggestions.

My choice, because I am telling you my experience and my feedback, is to use Mistral. Mistral, because based on my tests, it was the best model to train. I also trained Llama3. It was great. This is how easy it is to train Mistral today, based on my experience, as a comparison. This is a world that changes every day. Maybe next week, Llama3 will be better than Mistral. I can only tell you how things are today, unfortunately. People know that version 3 of Mistral has been released, but based on my tests and my experience, it is still not so ready, and I still prefer version 2.

How can we train a large language model like Mistral, like Llama3, whatever you want, like Gemma, or Phi from Microsoft? The best starting point is to use a technique named LoRA. How does it work? First, we need to understand why we need this technique: because training a large language model is very expensive. We are talking about millions of dollars, and none of us has millions of dollars, nor the resources, nobody. Just as an example, to train a very small model that was released just for researchers, a toy, Microsoft Phi needed two weeks of training with 96 GPUs working all day. Consider that I pay more than €2,000 per month for one GPU. You can understand how expensive it can be to train a large language model. How can we save money? Using techniques like LoRA.

To simplify a lot, we have to imagine a large language model as a big matrix. It is not exactly that, but let’s imagine, to simplify, that it is like a big matrix. We said that training the whole matrix is too expensive. Then, we use the technique named LoRA. How does it work? It leverages a math property: multiplying two vectors gives a matrix. The basic idea is, instead of training the whole matrix, what happens if I train just two low-rank factors, A and B, and then multiply them? The resulting matrix is added as an overlay over the original matrix, the base model. You can understand that you have greatly reduced the number of parameters that you have to train. It means that I can train a model like Llama3 or Mistral, which is very big, comparable to ChatGPT, in two or three days if I really want to do it well. It saves a lot of money.
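
To make the arithmetic concrete, here is a small illustrative PyTorch sketch of the saving the speaker describes (not his code): instead of updating a full d x d weight matrix, LoRA trains two low-rank factors whose product is overlaid on the frozen base weight.

    import torch

    d, r = 4096, 32                  # hidden size of a 7B-class model; LoRA rank
    W = torch.randn(d, d)            # frozen base weight: never trained
    A = torch.randn(r, d) * 0.01     # trainable low-rank factor
    B = torch.zeros(d, r)            # trainable factor, starts at zero so the
                                     # update begins as a no-op
    alpha = 64                       # scaling factor, discussed just below

    delta_W = (B @ A) * (alpha / r)  # low-rank update, overlaid on the base model
    W_effective = W + delta_W

    full = d * d                     # parameters if we trained the whole matrix
    lora = 2 * r * d                 # parameters LoRA actually trains
    print(f"trained: {lora:,} of {full:,} parameters ({lora / full:.1%})")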

To do that, I suggest using the libraries from Hugging Face, named Diffusers, and that’s the configuration that you need. Why am I here? To tell you about that slide. As a joke, I said that that slide cost my company €20,000. Just that. Why? Because I spent a month understanding how to tune the parameters, how they work. It’s just one slide, just one parameter, but to understand the best value, I spent a lot of time and a lot of money. What is the r-value that you must understand when training with the LoRA technique? We said that r is the dimension of the factors that I want to train. If that dimension is just 1, each factor is a vector, of course, but if it’s bigger than 1, it’s a matrix, and multiplying them, you still get a matrix.

The more parameters you train, of course, the better it is. That’s false, or at most partly true, because I realized that when the parameter is between 32 and 64, you get a good result, but if you increase that value too much, you run into overfitting problems. It’s not always true that the bigger it is, the better it is, in this case. Another very important parameter that I struggled with a lot is the alpha value. You need just that to do a good job. What is it? To simplify a lot, you have to imagine that it’s a parameter that controls how much emphasis to put on what I trained when adding it over the base model, because we said that the LoRA technique trains the factors and multiplies them, and what I get is added over the original model. It’s a kind of damping factor.

If it’s greater than 1, it means that you put much more emphasis on what you trained compared to the base model. If it’s less than 1, I guess that you can understand. It’s easy. Again, do not exaggerate. It’s not true that the bigger it is, the better it is. I use a value that is a factor of 2 of the r-value. In this case, it’s 64 because the r-value is 32, so it’s a factor of 2. Do not exaggerate. No more than a factor of 2. That’s my suggestion.
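
For reference, this is roughly what those two knobs look like in a training configuration. The speaker mentions Hugging Face tooling; for text models the equivalent options live in the PEFT library, used here as an assumption, and the checkpoint name and target modules are illustrative defaults for Mistral/Llama-style models.

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")  # illustrative

    config = LoraConfig(
        r=32,              # rank of the low-rank factors; 32-64 worked well for the speaker
        lora_alpha=64,     # emphasis of the trained update over the base model; ~2x r
        target_modules=["q_proj", "v_proj"],  # attention projections, a common default
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # confirms only a small fraction is trainable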

We said that by training a LoRA, I can specialize a large language model for a specific task, say for legal work or marketing, whatever you want. If it is just a matrix that I add over the base model, what if I had the possibility to train more than one LoRA, each on a specific task, and swap the LoRAs in real time? It means that I can easily increase the power of my model, by a lot. You can do that, of course, as shown in the sketch below. It’s very complex. Don’t do it from scratch. I can suggest a library that works perfectly. If you like this solution, use that library, in my opinion.
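
The library on the speaker’s slide isn’t named in the transcript, but as one example of the idea, Hugging Face’s PEFT lets you attach several LoRA adapters to one base model and switch between them at run time; the adapter paths below are hypothetical.

    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")  # illustrative

    # Attach two task-specific LoRAs to the same frozen base model.
    model = PeftModel.from_pretrained(base, "adapters/legal", adapter_name="legal")
    model.load_adapter("adapters/marketing", adapter_name="marketing")

    model.set_adapter("legal")      # route requests through the legal specialization
    # ... run inference ...
    model.set_adapter("marketing")  # swap tasks without reloading the base weights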

Merge Models (Franken Models)

LoRA is great, but I don’t like swapping models too much. Another technique that I like very much is merging models. I get better results merging already-trained models than with LoRA. I’m not saying that LoRA is not good. LoRA is great. I use it a lot. This is just another technique. What is the idea? The idea is to take different models already trained by someone else, which you can download from repositories that you can find everywhere, and merge them. I did it with images. The result was impressive. There are two tools. The most famous is mergekit, of course. There is another one that I have started to prefer over mergekit, named safetensors-merge-supermario. It works. As you can see, just a line of Python and you can merge different models, and you have the magic: no training, nothing.

Of course, what is the difference compared to LoRA? With LoRA, you have to own your own database. You have to spend three days, one week, I don’t know how much of the cost of GPU, but you have the most accurate result for your task. When you merge models from someone else, the cost is zero, but of course the model is trained by someone else, so you cannot complain that it’s not perfect for your task. For images, it’s the best solution. I saved a lot of money in image creation. Maybe for natural language processing, I suggest LoRA as a technique, because it’s much more accurate.

There are different techniques that you can use while merging different models. The easiest is when you have models with the same architecture and the same configuration. That’s easy. Just two minutes of work. You can use these techniques that are a bit different on how they merge the weights between the two models. This is the easiest case. Different is when we have the same architecture, so the same model, but different initialization. Because maybe someone else changed some parameters a little bit. Both models are Llama3 or Stable Diffusion.

It’s the same architecture, but maybe someone else changed something, some layers, and so they do not match perfectly. You need other techniques. In this case, here are the sources for the methods that you need to use to merge. The worst case is when we have totally different models, different initializations. Here you have to do a lot of work, and I do not suggest using merging when you have totally different models. It’s too much work, and, in my opinion, it isn’t worth it. If you want to try it, you are free to do that.
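
For the easiest case (same architecture, same configuration), the heart of a merge is small enough to show. A toy sketch of linear weight averaging over PyTorch state dicts, illustrative only and no substitute for mergekit or safetensors-merge-supermario; the checkpoint file names are hypothetical.

    import torch

    def linear_merge(state_a: dict, state_b: dict, t: float = 0.5) -> dict:
        """Interpolate two checkpoints of the same architecture, key by key."""
        assert state_a.keys() == state_b.keys(), "models must match exactly"
        return {k: (1 - t) * state_a[k] + t * state_b[k] for k in state_a}

    sd_a = torch.load("model_a.bin")   # hypothetical checkpoints of the same base model
    sd_b = torch.load("model_b.bin")
    torch.save(linear_merge(sd_a, sd_b, t=0.5), "merged.bin")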

MoE (Mixture of Experts)

Another technique is Mixture of Experts. We said that we fine-tune a model because we want that model to have a task specialization, and LoRA works great. Even with LoRA, we said, ok, we can swap if we want; we can change on the fly between the LoRA that is specialized in telling stories and the LoRA that is specialized in math or writing code, let’s say. That’s great, but we lose something. For example, suppose I want to translate something and at the same time do a summarization. If I use a LoRA that is specialized in summarization, everything will work, but it is not able to leverage the knowledge of the model that is specialized in translation, because they do not talk to each other.

With this technique, we take different models of, of course, the same architecture, that’s very important, you cannot mix models of different architectures, possibly with different initializations, and we create an array of experts with the addition of some layers and some gates, like switches between layers. We can merge all of them in order to allow the flow of the query to follow the path of the best model, the route that has the knowledge to solve that problem. Of course, I’m simplifying how it works, but that’s enough.

The good thing is that you do not activate all the weights. You can have a model that is 400 billion parameters, the sum of all the models together, which would be very expensive to run, but with this technique, you use just a few, a portion, one branch of the model, saving cost. It works, but the bad side is that all the models stay in RAM. So, despite consuming perhaps 10 times less GPU and inference time than a dense model of the same size, the amount of RAM that you need is still huge. You cannot have everything in life.

How do you create a Mixture of Experts? Again, it’s really complex, but with this tool, a few lines of code, and you can create your model in a few minutes. What you need is to carefully create the configuration file, where you say: that’s the base model, those are the specialized models, and those are the prompts, some examples that say, when talking about music, please prefer that branch; when talking about code, prefer that model. This tool blends everything together in a few minutes. I tried it; it works perfectly.
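
To make the gating idea concrete, here is a didactic top-1 router in PyTorch. It sketches the mechanism only; it is not the tooling the speaker used, and real MoE layers route at the feed-forward blocks inside the transformer.

    import torch
    import torch.nn as nn

    class ToyMoE(nn.Module):
        def __init__(self, dim: int, experts: list):
            super().__init__()
            self.experts = nn.ModuleList(experts)
            self.gate = nn.Linear(dim, len(experts))  # learns which expert to prefer

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            best = self.gate(x).argmax(dim=-1)        # top-1 routing per input
            out = torch.empty_like(x)
            for i, expert in enumerate(self.experts):
                mask = best == i
                if mask.any():                        # only the chosen branch runs,
                    out[mask] = expert(x[mask])       # so most weights stay idle
            return out

    # Two tiny stand-in "experts"; in a real MoE these are whole expert blocks.
    moe = ToyMoE(16, [nn.Linear(16, 16), nn.Linear(16, 16)])
    y = moe(torch.randn(4, 16))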

Multimodal Models

Now, multimodal models. You know that at the beginning, not right now, ChatGPT, for example, was able to process just text, but how? What is the concept behind converting text into an input for large language models? Because LLMs are math; we said that it’s matrices and numbers. How do we convert text to numbers? Again, using vectors. Vectors are everywhere here. The idea is to have a dictionary, and let’s say, to simplify, that each word is converted into a vector. That is not exactly true, because we actually talk about tokens. Tokens are small portions of words: by combining tokens, we can cover all the words of all languages, instead of having a dictionary for Italian, for English, for Hindi, whatever you want.

To simplify, we say that everything is converted into vectors. Then, these vectors are used as input to my large language model. Again, it’s a simplification, so forgive me, those of you who know very well how it works, but we have to focus not on the detail. Here is the idea: text has been converted into numbers using this technique. What if I wanted to use images, audio, or video as input? They are different from text. I don’t have that dictionary, but we use techniques like convolution for images, for example. Everything is translated into vectors again, images, audio, video, with the same representation, the same format as the text. Having the same representation, I can use that input as if it were text. Doing that is very expensive. I’m not talking about money, but yes, a little bit, because time is money. Really, you need a lot of knowledge.
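
As a concrete illustration of the text-to-numbers step, a short sketch with a Hugging Face tokenizer (the checkpoint name is illustrative; any model’s tokenizer shows the same idea):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")  # illustrative

    ids = tok("Napoleon was born", return_tensors="pt").input_ids
    print(ids)                                         # each token becomes an integer index
    print(tok.convert_ids_to_tokens(ids[0].tolist()))  # words may split into sub-word tokens

    # Inside the model, each index selects a row of the embedding matrix,
    # producing the vectors the transformer actually consumes.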

Again, here there is a tool that I suggest, which helps save a lot of time, and therefore money. It’s named multi-token. Maybe there are other tools; that’s the one I suggest. It works in a way that is more or less similar to using GPT-4o or other multimodal models. Thanks to this technique, what you have afterwards is a model that is able to take more than one kind of input, just by adding an input like that. For images, you pass the URL. The syntax is the same as the one you are accustomed to using with OpenAI: messages, system, user, agent. Now, you just have to add the images tag, and running just a few commands, you have a multimodal model. Now we are beginning to have a lot of tools, because we have fine-tuned our model, we can merge models, and afterwards, we can make the result multimodal.

TTS – Voice Cloning

This is a slide that speaks for itself, if you want to play with voice cloning. I don’t want to talk about ethical issues. I pay my mortgage with AI. I’ve dealt with artificial intelligence for more than 20 years, even before the current wave. I know how it works. I don’t believe that it will destroy our lives, the machines and so on. There was just one time that I was really scared, and that is when I heard a cloned voice of a friend of mine for the first time. Pay attention. It’s really powerful, but it works far better in English and with female voices. If you are a man, sorry, you need much more training. It’s a very quick project. I suggest you try it.

Performance and Optimizations

Now let’s say that we have created our best model. It’s wonderful, but now we have to talk about performance and optimization, because that’s a pain. It’s a matter of user expectations, because people want your model to answer in one second. Otherwise, they complain that it’s so slow. That’s one point. On the other side, if you are able to optimize your model, it means less cost. Because, for example, as we will see, if I reduce the size of my model, I can use a different machine on the cloud that costs less. Again, from real life, do not believe that everything comes by magic. You need very expensive hardware, and if you can compress everything, you save a lot of money.

The first technique is pruning. Again, here are a couple of tools that I want to suggest to you; you can download them and try them. I know that there are other tools; these are just mine. How does it work? You have to know that not all the weights of our model are always activated. There are a lot of weights in our big matrix that, we don’t know why, are often close to zero. The basic idea is: what if I try to figure out which weights are not activated, or are used less than others, and simply cut them? I can reduce my model, and the smaller it is, the faster it is, and the less power I need to run it. That is how pruning works. I provided a couple of tools. We also have something else that, in my opinion, is much more effective, named quantization. It allows you to save a lot of memory and speed up your inference a lot.
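
PyTorch ships a simple version of this idea. A minimal sketch of magnitude pruning, which zeroes the smallest weights of a layer; note that real deployments also need a runtime that can exploit the resulting sparsity.

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    layer = nn.Linear(4096, 4096)

    # Zero out the 30% of weights with the smallest magnitude (L1 criterion).
    prune.l1_unstructured(layer, name="weight", amount=0.3)

    # Fold the pruning mask into the weight tensor permanently.
    prune.remove(layer, "weight")
    print(float((layer.weight == 0).float().mean()))  # ~0.3 of weights are now zero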

How does it work? The idea: you know that the weights of the matrix are floating points. Floating points are 32 bits, which means 4 bytes in memory. I can say to use, for example, 16 bits instead of 32 bits. It means half the memory consumption. There are also techniques that convert the numbers into 8-bit integers. The saving, in terms of memory consumption and speed, is not linear but quadratic. You can understand that it is very effective, but do not exaggerate, because of course, cutting bits, in theory, also means cutting accuracy. Based on my tests, FP16 is amazing. When you go to integer 8, you can see performance drop. It’s up to you. I’m not here to say that one is better than another. Everybody has their own needs. I have to say to you: be careful, don’t cut too much, and don’t reduce too much. The best option is FP16, in my opinion.
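
In practice, the precision choice is often just a loading flag. A sketch with Hugging Face Transformers showing FP16 loading and, as an alternative, 8-bit quantization via bitsandbytes; the checkpoint name is illustrative.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # FP16: half the memory of FP32 with, in the speaker's experience, no visible loss.
    model_fp16 = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1",  # illustrative checkpoint
        torch_dtype=torch.float16,
    )

    # INT8: roughly half again, but expect some accuracy drop.
    model_int8 = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1",
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    )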

How does a large language model work? Here we’re talking about generative AI. Basically, it emits one token at a time: the token with the highest probability given what came before; more or less, that is how it works. To do that, you need a cache named the KV cache. You have to imagine it like another matrix that keeps in memory what has already been produced. For example, if you ask it to tell the story of Napoleon: “Napoleon was born…”, and you see the text start to be created. To predict the next token, you have to know what has already been created. Everything is stored in this big cache. Again, we are simplifying a lot. Here is a very good read that I suggest to you; there is the link. Why is it important? Here, there is a simple calculation.

Simple, more or less, but you can understand. If you have a model with that number of layers and this number of tokens that you want to produce, here is an example of how to calculate the amount of memory you need for the KV cache. What does it mean? It means that if you use default values when you run inference, maybe the size of the KV cache is not enough, or it is too much, wasting money. To speed up your model at inference, you must pay attention to the KV cache. You can set some parameters using the Diffusers library, of course, but there are some tools that are really specialized in this kind of optimization.
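
The back-of-the-envelope formula is worth writing out. A sketch of the standard estimate (one K and one V tensor per layer); the shape below is a typical 7B-class configuration, used purely as an illustration.

    def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
        # 2 = one K tensor and one V tensor per layer; dtype_bytes=2 assumes FP16
        return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

    # Illustrative 7B-class shape: 32 layers, 32 KV heads of dimension 128.
    gib = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                         seq_len=4096, batch=1) / 2**30
    print(f"{gib:.1f} GiB")  # 2.0 GiB of cache for a single 4096-token sequence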

I do not suggest sitting there trying to figure out the best configuration for your model by hand. Tools like llama.cpp, TensorRT, and vLLM already do this kind of configuration. If you want to do it yourself from scratch, don’t worry: they allow you to configure the values that you want. My suggestion is to use these tools. llama.cpp is my favorite, but it’s just an opinion.

Really, the last topic: RAG techniques. How does it work? You know that large language models have general knowledge. Here there are resources, and here there is an algorithm that I developed. How does it work? The idea is to use a smaller model to calculate the signature of the paragraphs to retrieve from the database, using a kind of concatenation. Instead of having a big model for calculating embeddings, using a smaller one with a concatenation works even better than large language models.
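
The speaker’s exact algorithm lives on his slides rather than in the transcript, but the general small-embedder retrieval pattern looks like this. A minimal sketch using sentence-transformers and cosine similarity; the model name is a common small embedder, chosen here as an assumption.

    from sentence_transformers import SentenceTransformer, util

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # a small embedding model

    paragraphs = [
        "Napoleon was born in Corsica in 1769.",
        "MongoDB is a document database.",
        "LoRA trains low-rank updates over a frozen base model.",
    ]
    corpus = embedder.encode(paragraphs, convert_to_tensor=True)

    query = embedder.encode("Where was Napoleon born?", convert_to_tensor=True)
    best = util.cos_sim(query, corpus).argmax()
    print(paragraphs[int(best)])  # the retrieved paragraph is then passed to the LLM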

Am I Really Doing AI?

Am I really doing AI? A lot of people say, but this is not really AI, you’re just merging models, training on a small dataset. It’s a topic that I don’t know if it makes sense to discuss. You have to solve problems in your company. You have to save money. That’s the most important thing, not a matter for nerds. This is a topic to discuss, for sure, on another stage.

Questions and Answers

Participant 1: I have a question regarding the reduction techniques you mentioned in the beginning. You gave us some ballpark numbers in terms of time reduction, but not in terms of whether you actually need fewer GPUs. Could you give us a ballpark number saying, with this LoRA technique, for most use cases, whatever that means, you can get 90% of the performance of a completely manually tuned model with 10% of the cost, or some number like that?

Galazzo: Yes, unfortunately I cannot answer, because it depends on your dataset. It’s not a matter of the training itself. For example, we wasted a couple of months of training before we figured out that there was a mistake in the dataset. It depends not so much on the technique as on the data.

Participant 1: Which ballpark is the cost reduction, generally, is it like you can save 95% of the cost or more like you can save 10% of the cost?

Galazzo: No, up to 90%.

Participant 1: Really large numbers.

Galazzo: Yes, it’s really powerful. This is the reason why I have to suggest being careful: it works so well that you can lose generalization. It learns too much.

Participant 1: That’s the overfitting you mentioned.

Galazzo: It’s not a kind of overfitting. It’s a little bit different from overfitting. The model will still be able to perform, for example, summarization, but it learns so much about your specific task that sometimes it can be a problem. Not always; it depends on your task.

For example, if you train a LoRA for translation, maybe you don’t see any differences, so you don’t lose much generalization. In my case, for example, I trained a LoRA to produce a specific JSON structure in the answers, because large language models are not able to be consistent. You say, please, I want JSON because it must be processed by Python code or by other APIs. Never happens. Sometimes it replies with malformed JSON, or with missing keys, or different names. You have no idea how bad the bad guy can be. I trained it to be much more consistent with the JSON replies. It worked, but I realized that it got so good at understanding JSON in the answers that it lost a bit of its reasoning capabilities, for example.

Participant 1: You turned it into a generalized JSON parser that couldn’t do anything else.

Galazzo: In that case, for my company, for the project, that was my goal. There was no need for that generalization. For me, it was good. Be careful, because it’s too powerful.

Losio: As you mentioned until now, I got the idea that, if you’re OpenAI, you can lose billions a year. If you’re Microsoft, probably you can do the same. We discussed 90%, 80% saving, but what’s the minimum budget if I want to start my own project? How much money am I supposed to lose?

Galazzo: It’s not a sport for poor people. What I can tell you: for example, at home, just for playing, for research, like a playground, I have a computer that, with all accessories, cost more or less €10,000. I have two GPUs, liquid-cooled, the best, but it’s just a playground. You can do something with a good GPU and a computer for €3,000, maybe.

Participant 2: What about cloud?

Galazzo: About cloud, I can say that an A100 on Azure, for our company, costs more than €2,000 per month, just one GPU.

Participant 2: If you run it 24/7?

Galazzo: Yes, 24/7. No, not for training. For training, we use another machine that has 4x A100s with 80 gigabytes of memory. It costs €10,000 per month. I turn it on just when I have to do training, so two or three days of training, and then turn it off right away, because it’s so expensive. And these are GPUs, really, for poor people. Not the T4; the T4 is just a toy. An A100 is a very good GPU, but it’s not an H100, which is even more expensive. For training, you need 10 times more power still. Only when training do I turn on the very expensive machine.

For inference, you need a smaller GPU, and so, of course, we can use the A100. You can save a lot of money if you reserve GPUs on the cloud, up to 70%, but you have to reserve for 3 years. That’s a very big problem. Why is it a big problem? Because this is a market that changes every day. A big issue as a CTO is deciding what to do. Because if I reserve an A100 for 3 years, maybe in six months it’s already old, and I cannot go back, so I have wasted money. It’s not that easy to save money when talking about GPUs. Yes, you see programs for startups, programs that say save up to 80%, but in real life it’s not that easy, for that reason. You need money.



Appian: Serge Tanjga Named As Chief Financial Officer – Pulse 2.0

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Appian announced the appointment of Serge Tanjga as Chief Financial Officer (CFO), effective May 27, 2025. Tanjga will report directly to Appian CEO Matt Calkins. Tanjga succeeds Mark Matheos, who became CFO of Dragos in November.

Tanjga brings over 20 years of financial experience to Appian. He was the Senior Vice President of Finance at MongoDB, where he oversaw financial planning, strategic finance, business operations, and analytics. Most recently, he served as MongoDB’s interim CFO.

Before MongoDB, Tanjga was a Managing Director at Emerging Sovereign Group (a subsidiary of The Carlyle Group). He also held leadership positions at the Harvard Management Company and 40 North Industries.

Tanjga received a B.A. in Mathematics and Economics from Harvard College and an MBA from Harvard Business School, where he was a Baker Scholar.

Appian delivers a software platform that helps organizations run better processes to reduce costs, improve customer experiences, and gain a strategic edge. Appian will release financial results for the first quarter ended March 31, 2025 before the U.S. financial markets open on Thursday, May 8, 2025, and the company will host a conference call and live webcast to review its financial results and business outlook.

Article originally posted on mongodb google news. Visit mongodb google news


Generative AI is reshaping the legacy modernization process for Indian enterprises

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

In an era where digital agility is no longer optional, Indian enterprises find themselves at a pivotal crossroads. With nearly 98% of enterprise applications still tethered to rigid legacy systems, the challenge of modernization looms large, entangled in a web of technical debt, resistance to change, and the pressing demand for compliance in regulated sectors.

As India intensifies its push toward a digital-first economy, a cloud-agnostic approach is critical in transforming legacy roadblocks into scalable, AI-ready infrastructure. Boris Bialek, Field CTO at MongoDB, who brings global insight and deep technological expertise to the conversation on legacy modernization, shares with us how organizations can turn legacy challenges into launchpads for digital excellence.

Some edited excerpts:

What are the top challenges Indian enterprises face when modernizing legacy systems—be it technical debt, skill gaps, or resistance to change?

Modernizing legacy systems in India, or anywhere in the world, has historically been challenging, expensive, and prone to stalling or complete failure. But one of the things we’re most excited about in 2025 is that our new AI-driven modernization process has proven it can dramatically accelerate the speed and reduce the cost of these projects.

But first, let’s look at what the challenges really are.

One of the primary obstacles enterprises face is, of course, technical debt. Outdated systems are deeply embedded in business operations, making migration complex, costly, and time-consuming. These legacy systems often have hardcoded dependencies and intricate architectures, necessitating substantial investment in re-engineering efforts.

Beyond technical debt, introducing new development processes and technologies across engineering teams remains a critical challenge. Organizations must ensure seamless adoption of AI-ready architectures while overcoming resistance to change. Legacy systems have often been in place for decades, and decision-makers fear disruptions to core operations, which slows down modernization efforts. Striking a balance between innovation and operational stability is crucial for enterprises undergoing transformation.

Given that 98% of enterprise applications in India still rely on legacy systems, how should Indian enterprises overcome the limitations of rigid relational databases, particularly in terms of scalability and innovation?

One of the most effective ways to overcome these challenges is by adopting a modern, document-based database like MongoDB. Unlike traditional RDBMS, MongoDB offers a flexible schema that allows organizations to evolve, adapt and scale. This adaptability is critical in today’s fast-paced business environment, where rapid iteration and responsiveness to market needs are key to staying competitive.
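
As a small illustration of the flexible-schema point, documents in one MongoDB collection do not have to share identical fields. A sketch with PyMongo; the connection string, database, and collection names are placeholders.

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
    orders = client["shop"]["orders"]

    # Two documents, two shapes: the second adds a field with no schema migration.
    orders.insert_one({"item": "keyboard", "qty": 2})
    orders.insert_one({"item": "monitor", "qty": 1, "warranty_years": 3})

    for doc in orders.find({"qty": {"$gte": 1}}):
        print(doc)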

From a scalability perspective, MongoDB’s distributed architecture enables enterprises to scale horizontally, ensuring systems can handle growing workloads seamlessly—whether on-premises, in the cloud, or across hybrid environments. This is especially relevant for Indian enterprises expanding into digital-first services and real-time operations.

Moreover, MongoDB’s Application Modernization Factory (AMF) provides structured advisory and migration services, helping enterprises replatform legacy monolithic apps, rewrite core systems with modern tech stacks, and rehost databases on the cloud with MongoDB Atlas.

To move from a legacy infrastructure to a modern solution like MongoDB, enterprises must go on a modernization journey. As I mentioned earlier, AI is massively changing the dynamic of what’s possible in this area.

How is Generative AI reshaping the legacy modernization process for Indian enterprises, and what specific capabilities does MongoDB bring to the table to integrate GenAI into these transitions?

Generative AI is reshaping the legacy modernization process for Indian enterprises by streamlining application migration, reducing technical debt, and accelerating innovation. MongoDB plays a crucial role in this transformation by offering a cloud-agnostic, developer-friendly platform that integrates seamlessly with AI-driven modernization strategies. With tools like the MongoDB Modernization Factory, enterprises can migrate legacy SQL databases, transition from outdated application servers, and automate regression testing using GenAI. This significantly reduces the time and effort required for code migration, freeing up IT resources for more strategic AI-driven initiatives.

For Indian enterprises navigating large-scale modernization, MongoDB’s scalable and AI-ready architecture ensures flexibility, improved developer productivity, and compliance with regulatory requirements.

With India’s digital transformation accelerating, what is MongoDB’s strategy to capture the growing market opportunity for legacy modernization, particularly among PSUs and traditional enterprises?

For the Indian market, particularly public sector undertakings and traditional enterprises, our goal is customer-focused. We want to make modernization faster, more cost-effective, and scalable, unlocking innovation and delivering better citizen and customer experiences.

Depending on the customer or the exact use case, we have a number of proven methods for modernizing. More recently, we’ve combined these with the power of Generative AI to accelerate the modernization journey, intelligently assisting in rewriting legacy code, redesigning database schemas, and streamlining application migration.

As AI evolves, we foresee even more intuitive tools that will make application development and modernization easier than ever—turning India’s legacy burden into a leapfrog opportunity.

Beyond modernization, MongoDB is trusted by some of India’s most dynamic businesses and institutions. Our customer base includes names like Canara HSBC, Zepto, Zomato, and SonyLIV—reflecting the platform’s flexibility, scale, and performance across diverse use cases.

Article originally posted on mongodb google news. Visit mongodb google news


TencentDB, MongoDB Renew Strategic Partnership for AI Data Management

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

TencentDB and MongoDB announced the renewal of their strategic partnership agreement, focusing on delivering cutting-edge data management solutions tailored for the AI era. This collaboration aims to empower global users with advanced technological innovations.

MongoDB, a leading NoSQL database, is renowned for its flexible data schema, high performance, and native distributed scalability. It dominates the NoSQL category in the DB-Engines global rankings and is widely adopted across industries such as gaming, social media, e-commerce, finance, and IoT.

Since their initial five-year collaboration in 2021, TencentDB and MongoDB have jointly expanded in the Chinese market. Leveraging Tencent’s vast user scenarios and technical innovation, Tencent Cloud enhanced MongoDB with enterprise-grade capabilities, including:

  • Backup and Restore: Intelligent O&M and key-based flashback for rapid recovery.
  • Elastic Scaling: Dynamic resource allocation to handle fluctuating workloads.
  • Cross-Region Disaster Recovery: Ensuring business continuity for global operations.

These enhancements have supported high-profile clients like Kuro Games’ Tides of Thunder (32 million pre-registered players), Xiaohongshu (小红书), and NIO (蔚来), optimizing stability, scalability, and cost efficiency.

The renewed partnership prioritizes AI integration, equipping Tencent Cloud with features such as full-text search and vector search to address modern application demands. These tools enable clients to build intelligent, future-proof digital solutions.
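
For illustration, here is a minimal sketch of what a vector search query can look like through MongoDB's aggregation pipeline, using the Atlas-style $vectorSearch stage; the index name, field names, and embedding values are hypothetical, and Tencent Cloud's managed offering may expose the feature differently:

```python
# Minimal sketch of a vector similarity query via MongoDB's
# $vectorSearch aggregation stage. Index and field names are
# hypothetical; queryVector would come from an embedding model.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical endpoint
products = client["catalog"]["products"]

query_vector = [0.12, -0.03, 0.87]  # placeholder embedding

results = products.aggregate([
    {
        "$vectorSearch": {
            "index": "product_embeddings",  # hypothetical vector index
            "path": "embedding",            # field holding stored vectors
            "queryVector": query_vector,
            "numCandidates": 100,           # candidates scanned before ranking
            "limit": 5,                     # top matches returned
        }
    },
    # Return each match's name plus its similarity score.
    {"$project": {"name": 1, "score": {"$meta": "vectorSearchScore"}}},
])

for doc in results:
    print(doc)
```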

Beyond China, the collaboration will target the Asia-Pacific region and support Chinese enterprises expanding overseas. TencentDB for MongoDB offers:

Industry-leading backup/restore capabilities.

Robust security compliance frameworks.

Cross-region data synchronization for seamless global operations.

Over the past four years, Tencent Cloud has contributed multiple optimizations to the MongoDB open-source community, improving the user experience. Both parties emphasized their commitment to fostering a superior MongoDB ecosystem.

Li Qiang, Vice President of Tencent Group, said: "Our partnership has delivered world-class MongoDB services while contributing to the community. We aim to further elevate the ecosystem and provide industry-leading database solutions."

Simon Eid, MongoDB's SVP for APAC, said: "Combining Tencent's cloud expertise with MongoDB's robust technology accelerates innovation, particularly for the gaming, automotive, and internet sectors. As AI adoption grows, our joint expertise becomes indispensable."



Google DeepMind Introduces QuestBench to Evaluate LLMs in Solving Logic and Math Problems

MMS Founder
MMS Srini Penchikala

Article originally posted on InfoQ. Visit InfoQ

Google DeepMind's QuestBench benchmark evaluates whether LLMs can pinpoint the single, crucial question needed to solve a logic, planning, or math problem. The DeepMind team recently published an article on QuestBench, a set of underspecified reasoning tasks solvable by asking at most one question.

Large language models (LLMs) are increasingly being applied to reasoning tasks such as math, logic, planning, and coding. These applications largely assume that tasks are well-defined, with all necessary information provided. But in real-world applications, queries to LLMs are often underspecified and only solvable by acquiring the missing information. Users may omit crucial details in math problems, and robots in factories might operate in environments with partial observability. In such cases, LLMs need the ability to proactively gather missing information by asking clarifying questions.

The DeepMind team's work investigates whether LLMs can identify and acquire the missing information necessary to solve reasoning tasks by generating accurate clarifying questions for underspecified problems. The goal is to rigorously evaluate an LLM's ability to identify the minimal necessary question to ask, and to quantify the axes of difficulty for each problem.

They formalize this information-gathering problem as an underspecified Constraint Satisfaction Problem (CSP): a class of mathematical problems defined as a set of objects whose states must satisfy a number of constraints or limitations. The key idea is that many reasoning tasks can be modeled as determining the value of a target variable given a set of variables and constraints. A problem is underspecified if and only if the value of the target variable cannot be inferred from the given information. This formalization helps pinpoint the difference between semantic ambiguity, where multiple valid interpretations exist but each yields a solvable answer, and underspecification, where the problem is unsolvable without additional information. QuestBench focuses on underspecification: cases where the user has not provided enough information for the language model to fulfill the request. This situation can arise because users may not know what information the model lacks, or what information is necessary to complete the task.
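
A minimal sketch of this formalization (the variables and rules below are invented for illustration): represent each constraint as a rule that derives one variable from others, then forward-chain from the known assignments. If the target variable is never derived, the problem is underspecified.

```python
# Minimal sketch of the underspecified-CSP idea: constraints are rules
# deriving one variable from others; the target is solvable only if it
# is reachable from the known variables by forward chaining.
def solvable(known, constraints, target):
    """known: set of assigned variables.
    constraints: list of (inputs, output) rules.
    Returns True iff the target's value can be inferred."""
    derived = set(known)
    changed = True
    while changed:
        changed = False
        for inputs, output in constraints:
            if output not in derived and set(inputs) <= derived:
                derived.add(output)
                changed = True
    return target in derived

# x = a + b, y = x * c: knowing only {a, c} leaves y underspecified.
rules = [(("a", "b"), "x"), (("x", "c"), "y")]
print(solvable({"a", "c"}, rules, "y"))       # False -> underspecified
print(solvable({"a", "b", "c"}, rules, "y"))  # True  -> fully specified
```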

The team evaluated LLMs' ability to address underspecification in structured reasoning tasks with a clearly defined ground truth. For each task, the model needs to ask exactly one question, allowing for reliable evaluation of LLMs' information-gathering capabilities. They also evaluated the accuracy of breadth-first search up to a depth "n". There are four task categories:

Logic-Q: logical reasoning tasks with one missing proposition.

Planning-Q: planning problems defined in the Planning Domain Definition Language (PDDL), with partially observed initial states.

GSM-Q: human-annotated grade-school math problems with one missing variable assignment.

GSME-Q: the same human-annotated GSM-Q word problems, translated into equations.

The QuestBench datasets are constructed as 1-sufficient CSPs, problems solvable by asking just one question, across the logical reasoning (Logic-Q), planning (Planning-Q), and math (GSM-Q/GSME-Q) domains. Each problem instance is composed of a user request, the full set of question choices, and the subset of correct questions. The evaluation checks whether models can pick out a correct question from the question choices.
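
Continuing the sketch above, evaluating a 1-sufficient instance reduces to a set-membership check: compute every unknown variable whose value, once revealed, would make the target derivable, then test whether the model's chosen question is in that set. The variables and scoring harness here are illustrative, not QuestBench's actual data format or evaluation code.

```python
# Continuing the sketch: the "correct questions" for a 1-sufficient
# instance are the unknown variables whose value, once revealed,
# makes the target derivable.
def solvable(known, constraints, target):
    derived, changed = set(known), True
    while changed:
        changed = False
        for inputs, output in constraints:
            if output not in derived and set(inputs) <= derived:
                derived.add(output)
                changed = True
    return target in derived

def sufficient_questions(known, constraints, target, all_vars):
    return {v for v in all_vars - set(known)
            if solvable(set(known) | {v}, constraints, target)}

rules = [(("a", "b"), "x"), (("x", "c"), "y")]
correct = sufficient_questions({"a", "c"}, rules, "y",
                               {"a", "b", "c", "x", "y"})
print(correct)                  # {'b', 'x'}: either question resolves y

model_choice = "b"              # the question the LLM picked
print(model_choice in correct)  # True -> scored as a correct selection
```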

QuestBench's evaluation covered several state-of-the-art (SOTA) LLMs, including GPT-4o, GPT-4-o1 Preview, Claude 3.5 Sonnet, Gemini 1.5 Pro, Gemini 2.0 Flash Thinking Experimental, and the open-source Gemma models. The benchmark framework was tested in zero-shot (ZS), chain-of-thought (CoT), and four-shot (4S) settings.

The team also conducted studies to assess LLMs' ability to reason in the presence of sufficient information and to detect whether a problem is underspecified. They found that these abilities correlate with identifying the right question to ask in the benchmark, but to varying degrees across domains. SOTA and near-SOTA LLMs are relatively good at identifying missing information in simple algebra problems, but struggle with more complex tasks involving logic and planning.

In terms of the study's specific conclusions, language models demonstrated strong performance on the GSM-Q and GSME-Q domains, with over 80% accuracy. This could be because these domains have fewer variables and constraints and require shallower search depth than the other two domains. But none of the models tested performed beyond 50% on the Logic-Q and Planning-Q domains, and neither chain of thought nor few-shot examples resulted in significant gains across all models in either domain. To investigate these discrepancies, the team also analyzed the correlation between model accuracy and QuestBench's axes of difficulty, finding differing trends between domains. LLMs are more sensitive to search depth in Logic-Q than in Planning-Q, suggesting that models may be using strategies similar to backwards search when solving Logic-Q, but not when solving Planning-Q.

LLM evaluation benchmarks are important for understanding a specific model's strengths and limitations, for guiding the fine-tuning process, and as a reference when deciding which model to use for a specific use case. Several LLM evaluation frameworks are available for assessing language models' performance against different criteria.

For more information on this study, check out the website, the research paper PDF, and the GitHub project, which is available under the Apache 2.0 license and contains code to generate the QuestBench data and evaluate LLMs on it. To run the evaluation locally, the steps include installing a Conda environment, downloading the datasets, and running the evaluations.
