Presentation: CI/CD for Machine Learning

MMS Founder
MMS Sasha Rosenbaum

Article originally posted on InfoQ. Visit InfoQ


Rosenbaum: This is the video of a machine-learning simulation learning to walk and facing obstacles, and it’s there only because I like it. Also, it’s a kind of metaphor for me trying to build the CI/CD pipeline. I’m going to be talking about CI/CD for machine learning, which is also being called MLOps. The words are hard, we don’t have to really define these things, but we do have to define some other things and we’re going to talk about definitions a lot actually.

I’m going to start by introducing myself. I’m on the left, this picture is from DevOpsDays Chicago, our mascot is a DevOps Yak. It’s not a Chicago bull, it’s a yak, and it’s awesome. You can come check out the conference. I work for Microsoft on the Azure DevOps team. I come from a developer background, and then, I did a lot of things with DevOps CI/CD and such. I’m not a data scientist, I did some classes on machine learning just so I can get context on this, but I’m coming to this primarily from a developer perspective.

I also run another conference, this is a shameless plug, it’s DeliveryConf, it’s the first year it’s happening, it’s going to be in Seattle, Washington, on January 21 and 22. You should register for it right now because it’s going to be awesome.

The first thing I want to do is I want to set an agenda. An hour is a long time to be here, so I want to set expectations for what we’re in for. I’m going to talk about machine learning and try to define what machine learning really is and what automation for machine learning could look like. Then, we’re going to talk about a potential way to implement that automation. Then, I’m going to demo the pipeline and hope that it works.

This slide is here for me so that I remember that I need to trigger my pipeline. I’m going to go in here and make a super meaningful change. We can go back to the slides, because this thing takes a while to run.

What Is MLOps

Let’s talk about what is MLOps and why should you care about it. Machine learning is the science of getting computers to act without being explicitly programmed. It’s different than traditional programming because in traditional programming we define the algorithm, we state the if/else all of the decision tree and all of that. In machine learning, we don’t do that. We let the machines teach themselves about what the algorithm should look like.

This first came up about 50 years ago, and we call the whole field artificial intelligence. People think about Skynet and AI, a general intelligence controlling us. Mostly, it’s a lot simpler than that: it’s narrow intelligence, a subset of that is machine learning, and a subset of that is deep learning. Deep learning is when we get into things like processing images and identifying them. When I went to college, a lot of people said that computers are never going to be good at identifying images, and now I’m being [inaudible 00:05:15] doing all my pictures.

Machine learning is clearly on the rise. I don’t have pretty Forrester reports, so this is anecdotal evidence: machine-learning searches have overtaken DevOps searches. Also, if we look at Stack Overflow, we can see that just 10 years ago Python used to be the least popular of the 5 languages searched for, and now it’s the most popular language, and that’s primarily because of machine learning. Also, I talk to a lot of customers, and every single one of them is doing machine learning.

Why Should You Care?

Why should you care? This is a room full of developers and CI/CD professionals, so why should you care about MLOps? Shouldn’t data scientists be the ones who care about MLOps?

This is a tweet from someone at a different conference, but it says, “The story of enterprise machine learning is: it took me 3 weeks to develop a model, and it’s been 11 months and it’s still not deployed.” This is really the case for most of the people that I work with. Their experience is that it takes a really long time to deploy these models. The best-kept secret is that data scientists mostly want to do data science.

The thing is, people go to school for a really long time and learn how to do these things. “This is deep learning, this is some deep neural network,” and “This is backpropagation, and I tried to understand it and it was really hard.” People really spend a lot of time trying to understand how to make these algorithms better. It’s a whole job. It’s not a job where it’s just putting it in production; it’s two separate jobs. Someone actually needs to go and put this in production after we’ve developed it. For most data scientists, the work environment looks like this. We talk about [inaudible 00:07:31] in our books, or we talk about just developing scripts in Python or Scala, whatever it is. Now I can see that a lot of people actually have source control, which is awesome, but from there to production is a very long way.

The problem that I see most is that data scientists don’t want to do ops, and ops don’t really understand the challenges of data science and how different it really is. If we talk about how it all works in programming, we develop the algorithm, we give the algorithm data, and then it gives us answers. When we switch to machine learning, we actually start with answers and data, and then we produce an algorithm. This is also called a model, and we can give that model new data, and then it will try to predict the future for us.

If we did that, how would we get into production with this whole thing? For that, I want to dive even deeper into this and say, “Ok, so what is actually an ML model?”

What Is an ML Model?

A lot of times we ask, “What is this abstract object that we’re talking about?” We’ve just said that a lot of times ML is very complicated and takes a long time to learn, but it’s not always the case. Most people that do ML out there are actually doing something fairly simple that could be done with an Excel spreadsheet. We’re doing a lot of linear regression. In this particular example, we’re looking at housing prices in a particular area over time. I build an equation that allows me to predict what the housing price is likely to look like in the future. This is machine learning.
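To make that concrete, here is a minimal sketch of that kind of linear regression, fitting price against year with the closed-form least-squares solution. The housing prices are made-up illustrative numbers, not real data.

```python
# Ordinary least-squares fit of price = a * year + b, standard library only.
# The data points below are invented for illustration.
years = [2010, 2011, 2012, 2013, 2014, 2015]
prices = [200_000, 210_000, 222_000, 231_000, 245_000, 254_000]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(prices) / n

# Slope and intercept from the closed-form least-squares formulas.
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, prices)) / \
    sum((x - mean_x) ** 2 for x in years)
b = mean_y - a * mean_x

def predict(year):
    """The 'model': a mathematical formula with parameters learned from data."""
    return a * year + b

print(round(predict(2016)))  # extrapolate one year past the data
```

The trained "model" here is nothing more than the two learned parameters `a` and `b`, which is exactly the point of the section that follows.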

Also, this is machine learning. If I have something more complicated, I can have a vector as an input and a vector as an output. Then, if I look at image classification, for example – this is not the only way that it works, but one of the ways it’s broken down – it would just break the image down into RGB matrices of pixels. This is a huge array of input numbers that I take in, and then, at the end of it, I just try to say, “Does this picture have a cat or does it not have a cat?”
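As a tiny illustration of that breakdown, here is a sketch of turning an RGB pixel grid into the flat numeric vector a classifier would consume; the 2x2 "image" is obviously a toy stand-in for a real photo.

```python
# A toy 2x2 RGB "image": each pixel is an (R, G, B) triple of 0-255 ints.
# Real classifiers start from exactly this kind of numeric grid, just far larger.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (128, 128, 128)],
]

# Flatten the pixel grid into one long input vector, scaled to [0, 1] --
# the huge array of input numbers the cat/not-cat model takes in.
features = [channel / 255 for row in image for pixel in row for channel in pixel]

print(len(features))  # 2 * 2 * 3 = 12 input numbers
```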

Basically, a machine learning model is a definition of the mathematical formula with a number of parameters that are learned from the data. Basically, we define a mathematical formula. Now, it comes in some formula, we didn’t talk about the formula, but there is a mathematical formula that we got from the model training that we’re now trying to deploy. This is great news because usually that means that we know what to do with it. We’re, “Ok, we’re just going to create an API endpoint. We’re going to serve a service and this is the solution to all of our problems and this is going to be great.” The only thing is, these models come in different shapes and forms and training takes a long time and all these things. We define the problem, we want to get into this API that we’re serving, and that would probably be a good thing.

Do Models Really Change that Often?

When I was talking to someone about this conference, they asked me a question “Do the models really change that often? Do we really need a process for automation of models? Maybe it’s ok that it took me 11 months to put in production because it really doesn’t change that much.”

I just want to give one example. This is a Reddit thread with 13,000 responses, so it must’ve resonated. It says, “Facebook’s list of suggested friends is quite literally a list of people I’ve been avoiding my entire life.” This was absolutely true for a couple of months and it was highly annoying, but then it went away. This is machine learning predicting who my friends are. That model got better, and so are the models that are matching you with Uber drivers and things like that. If Uber, say, made a mistake and published a model that was highly inaccurate and matched you with a driver in a different city, you really wouldn’t want to wait a couple of weeks until it updated that model. You really want it to be fixed as soon as possible.

The Dataset Matters

The other challenge that we have that I want to talk about is that data set is highly important here. That’s just a very different aspect because code is usually light and it’s easy to check in and it’s easy to version, but then, when we talk about data sets, they can be very large and very unwieldy and very difficult to automate.

I did this as an exercise before, and this is pretty cool – what we know about the world depends on what we know from the data. This is a cat, and this is not a cat. This seems straightforward. Then I can get into, “This is also a cat, and also this is a cat.” I need to be able to identify this picture as well as I identify the other one. If I didn’t have any examples of a picture like that, then I might not know that about the world. Then this is a cat. This might or might not be a cat, depending on your definition. Also, if you come from a machine-learning perspective – again, I’m looking from pixels that are in the matrix, “Is this a cat?” I don’t know. If I’ve never seen a picture of a fox, maybe I think it is a cat. Also, we can get into weird stuff too.

The point I’m trying to make here is that model predictions will highly depend on what data the model has seen. I cannot talk about versioning the model independent of the data set. I have to have some type of way to check in my data set as part of the model version.
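One lightweight way to tie a model version to the exact data it saw, sketched here with only the standard library, is to record a content hash of the data set alongside the model's version metadata. The file contents and version label below are placeholders, not a real registry format.

```python
import hashlib
import json
import os
import tempfile

def dataset_fingerprint(path, chunk_size=1 << 20):
    """Content hash of a dataset file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo with a throwaway file standing in for a large dataset in blob storage.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"cat,1\nfox,0\n")
    tmp_path = f.name

# Hypothetical metadata record checked in next to the model artifact:
# anyone retraining can verify they are using byte-identical data.
record = {"model_version": "v12", "dataset_sha256": dataset_fingerprint(tmp_path)}
os.remove(tmp_path)
print(json.dumps(record))
```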

These are just some of the challenges, I’m actually not diving into all them and I know that a lot of speakers today actually talked about other things such as parameters and stuff like that. If we look at big companies, like Uber or Microsoft or Facebook, they actually use a lot of ML in production.

This translation is being run for me by a machine-learning translator. It’s being updated frequently and [inaudible 00:14:56] PowerPoint, this is being shipped all the time. What we actually did for this is we built this huge custom process and we implemented a lot of things. We automated the deployment and potentially we have different people in different departments doing the same thing over and over again. Microsoft has this big rule of eating our own dog food, and it works really well for us because we’re our own first testers. This is how stuff starts, a lot of people working on this to ensure that that’s being automated.

How Do We Iterate?

Most people I know don’t work for a big company with thousands of engineers that can invest a lot of time into automating their deployments. We can talk about how we can iterate in a homegrown fashion. I’m going to talk about a little bit more about what the process actually looks like. We want to train the model first. Then again, there’s different challenges in training the model, there’s different aspects that we need to look at. Then we want to package it, validate that it actually works, probably validate that it works better than the model we already have, then we want to deploy it and we want to monitor it. Models also drift over time, and so, even if my model is highly-accurate today, it might not be accurate tomorrow because the world changed or my data has changed, so I need to actually update it. This whole process, ideally, I want to automate it.
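The drift-monitoring step above can be sketched very crudely: compare a live feature distribution against the training distribution and flag retraining when the shift is large. Real systems use proper statistical tests; this stand-in just measures the shift of the mean in training standard deviations.

```python
import statistics

def drift_score(train_values, live_values):
    # How far the live mean has drifted from the training mean, measured in
    # training standard deviations. A crude stand-in for real drift detection
    # (e.g. Kolmogorov-Smirnov tests or population stability index).
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values) or 1.0
    return abs(statistics.mean(live_values) - mu) / sigma

train = [10, 11, 9, 10, 12, 10, 11]    # feature values the model trained on
today = [10, 11, 10, 9, 11]            # similar world: low drift
next_month = [16, 17, 15, 18, 16]      # the world changed: high drift

print(drift_score(train, today) < 1.0)       # True
print(drift_score(train, next_month) > 3.0)  # True: time to retrain
```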

Data scientists and DevOps professionals have some different concerns, but they also have some of the same concerns. Everybody actually cares about iteration. Everybody wants to get there as soon as possible. Everybody cares about versioning, and versioning, again, is quite a bit harder than with code. Everybody cares about reuse. Then, we have some different concerns, such as compliance, observability, and such things.

If we talk and compare the high-level process that happens, if we do the application development, then we would just check our code into source control. In this case, we want to actually save our code and our data set and also some metadata as well.

Then, when we talk about building the application, we want to automate the pipeline. We ideally want to automate hyperparameter tuning and have nice things in the training process itself. Then, we want to test or validate the model, so we want to somehow estimate that its accuracy is good, and maybe some other parameters about the model and how valid it is. We want to deploy it into production. Then again, we talked about monitoring and the fact that we want to analyze the performance and potentially retrain the model over time.

How could you potentially do it yourself? Let’s say you could use the tools that you already have. If you’re not Google or Microsoft, you probably have something that’s on this slide already in-house. You probably have GitHub, or Bitbucket, or something, you probably have Jenkins, or Azure DevOps, or something. You probably already have pieces of this, so, for this, for my specific demo, I’m going to be using GitHub for source control. I’m going to be using Azure DevOps, surprise, for automation. I’m going to be using both Kubeflow and Azure ML workspaces to do the model training process.

Kubeflow, basically, is an open-source project that is based on Kubernetes. Anywhere you could deploy Kubernetes, you could deploy Kubeflow. It focuses on reusable workflow templates that run on containers. You can develop complicated workflows for your ML training, and they will run on containers. Whenever you check in one of these steps, you actually create a new container, package it, push it into Kubeflow, and then run the pipeline using these containers. Essentially, you just need to version the container, and then you can run the steps.
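A container-per-step workflow like that might look roughly like the following, assuming the Kubeflow Pipelines v1 Python DSL (`kfp.dsl`); the registry and image names are hypothetical placeholders, not the talk's actual pipeline.

```python
# Declarative sketch of a three-step Kubeflow pipeline, assuming the
# kfp v1 DSL. Each step is a versioned container pushed by CI.
import kfp.dsl as dsl

@dsl.pipeline(
    name="taco-vs-burrito",
    description="Pre-process images, train a classifier, register the model.")
def taco_pipeline():
    preprocess = dsl.ContainerOp(
        name="preprocess",
        image="myregistry.azurecr.io/preprocess:latest")  # placeholder image
    train = dsl.ContainerOp(
        name="train",
        image="myregistry.azurecr.io/train:latest").after(preprocess)
    register = dsl.ContainerOp(
        name="register",
        image="myregistry.azurecr.io/register:latest").after(train)
```

The `.after()` calls encode the step ordering; Kubeflow runs each step as a pod and, as mentioned later in the talk, can skip steps whose cached results already exist.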

Then, Azure ML is an Azure-based service, so this one’s not open source but it’s this build versus buy thing. It actually does the same thing, so it aims to be all the things that you need for automating your ML pipelines. I could’ve done this particular demo with just Kubeflow or just Azure ML because they both can do the same things. I’m doing both just because I wanted to play with both and see what it looks like primarily. Azure ML allows you to prep the data, train, deploy, version the model, and actually check in datasets, and work with notebooks right there. Again, Kubeflow allows most of the same things as well.


I’m going to do the demo now, and we’re going to hope that everything is working. I did check in code, and that kicked off the pipeline. I’m right now in Azure DevOps. Azure DevOps is an automation server. It’s a SaaS product that allows you to manage all of the steps in your application lifecycle, from project management into CI/CD and all of the things. It actually has a free tier, like a forever-free tier, so you could use it without paying us any money. It also has a free tier for open-source projects, so it’s fairly easy to use and enjoy.

What I’m looking at here is a YAML pipeline that is the build pipeline for this. I’m going to show you the code that it all starts with. In my case, I’m actually not even doing notebooks right here, I’m just doing Python scripts, so I have Python scripts for three different steps: pre-process, train, and then deploy. Then, I have a file for the pipeline, a Kubeflow pipeline that just defines the Kubeflow steps.

Whenever I trigger my code – again, there are ways to optimize this; you probably don’t want to trigger your model retraining because of a change to a readme, but for the sake of this example, this works. What I’m doing is, I have the Dockerfiles for my containers, and they’re packaging the scripts. Then, I am checking these containers into an Azure container registry. Then, after that, my last step in here is just kicking off the Kubeflow pipeline. I am, surprise, running on Azure, because I do work for Microsoft and they give me lots of Azure that I can play with.
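That build stage could be sketched as an `azure-pipelines.yml` fragment along these lines; the service connection, repository names, and script path are assumptions for illustration, not the exact pipeline from the demo.

```yaml
# Hypothetical azure-pipelines.yml sketch: build a step container, push it
# to Azure Container Registry, then kick off the Kubeflow pipeline run.
trigger:
  - master

steps:
  - task: Docker@2
    displayName: Build and push the preprocess image
    inputs:
      command: buildAndPush
      containerRegistry: myAcrServiceConnection   # assumed service connection
      repository: preprocess
      dockerfile: preprocess/Dockerfile
      tags: $(Build.BuildId)

  - script: python scripts/run_pipeline.py --tag $(Build.BuildId)
    displayName: Trigger the Kubeflow pipeline
```

Tagging images with `$(Build.BuildId)` is one way to get the container versioning the talk mentions for free from the CI system.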

I have a couple of things in here, a couple of resources. Again, another challenge that happens with machine learning is that you need a lot of compute to run it on, so there’s only so much you can get away with for free. Some of this stuff can be run for free, so if you just wanted to deploy it for the sake of the example, you could do it for free with Azure trial account. I have a Kubernetes cluster that’s running Kubeflow, and then, I have a Kubernetes cluster that’s actually going to run my model in the end of it. Then, I have the ML workspace over here that I’m actually checking my models into and all of these things.

Then, the last thing I wanted to show you is that GitHub repository. It actually has all of the code, so it has the code for the models, it has the scripts, it has the Kubeflow pipeline, and it has the Azure DevOps pipelines, so you can actually come here and take all of this and build it all together. It has the steps together for deployment on Azure, but you could deploy this pretty much anywhere you had a Kubernetes cluster. In Azure ML, you can only deploy in Azure, but it’s a SaaS product again so you can just use it, you don’t have to do much about it. You can go to this link – I have it in resource slides – and, if you have a couple days, go and build this yourself.

I did run the pipeline, and it did the build-and-deploy steps and triggered my Kubeflow pipeline. I do have a Kubeflow pipeline and it’s very simple. Again, in this case, I’m not doing anything complicated. The problem with complicated is twofold. One is, I have to understand the workflow, and two is, it has to run sufficiently fast for me to be able to demo it, and that doesn’t happen. I wanted to show some cool things – GitHub, open source, has a bunch of data on code, so you can learn predictive coding and code analytics and stuff like that – but it literally takes days to run. I don’t think I want to attempt that on stage.

This is a very simple Kubeflow pipeline. What it tries to do is tell pictures of tacos from pictures of burritos. It’s a very important problem that we all need to solve. It does the pre-processing, the training, and the registering. The register step just goes into Azure ML and registers the pipeline, and then it registers the model. Then, I can deploy it as a service using Azure ML. Again, I could potentially do this with Kubeflow as well.

Then, the other interesting thing is that, if we already did one of the steps – pre-processing, for instance, takes a really long time, so we processed all of the data and all of that stuff, so if we already did this process and already learned from it, we don’t have to do it again. Kubeflow will just skip that step because it’s already run. That allows us to optimize our steps.

Then we come into here, which is the last step of this. It has a release pipeline, and all it does is run some pretty straightforward scripts. Actually, let me show you something else. I’m going to show you the test process instead; it’s more informative. I’m using Azure ML, which allows me to just run some Azure commands and say, “Go deploy this model.” In this case, I’m deploying this model into Azure Container Instances, which is my test environment. Then, I’m just throwing some images at it and I’m saying, “Is this a burrito image?” and it actually tells me that it is a burrito image, but it’s really not sure that it is. It’s very low in confidence. In this case, I’m not even actually doing anything with this information, so I’m not actually looking at the accuracy, but I could go and start looking at the parameters of the model and try to estimate if it’s good before I’m putting it in production.
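That "estimate if it's good before production" idea can be sketched as a simple promotion gate: score the candidate model on a held-out set and promote only if it beats the model currently in production. The models below are stand-in functions, not real classifiers, and the held-out samples are invented.

```python
def production_model(image_name):
    # Stand-in for the currently deployed model: always guesses burrito,
    # with low confidence, like the demo's uncertain prediction.
    return ("burrito", 0.55)

def candidate_model(image_name):
    # Stand-in for the newly trained model; "classifies" by file name
    # purely so this sketch is runnable.
    label = "taco" if "taco" in image_name else "burrito"
    return (label, 0.9)

# Hypothetical held-out test set: (image, ground-truth label) pairs.
held_out = [("burrito_1", "burrito"), ("taco_1", "taco"), ("burrito_2", "burrito")]

def accuracy(model, samples):
    return sum(model(x)[0] == y for x, y in samples) / len(samples)

# The gate: only promote if the candidate beats what is already deployed.
promote = accuracy(candidate_model, held_out) > accuracy(production_model, held_out)
print(promote)  # True: the candidate wins on the held-out set
```

In a real pipeline this check would run against the test deployment in ACI before the human approval step that follows.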

Then, the other thing that I’m doing after this is deploying it to the “production environment”. This one actually has an approval check. I’m going to approve it. This is actually true of pretty much every production deployment that I see out there; most people don’t deploy to production without having some human look at the process, so it’s not only for ML models. Definitely for ML models, I would say that, at least from what I’ve seen, there’s not a lot of automated testing that you could just rely on without having human validation that this stuff actually looks good before it goes to production.

Then, this pipeline could be further improved next time I have free time. We could actually look at accuracy, we could look at model drift, and we could go back into retraining and stuff like that. It took me, let’s say, a week to build this, and I did have examples from other people, so this takes time. It took me a week to build this, but now that I want to put a new model into production, I can do this in the span of an hour, instead of 11 months, and that’s a huge improvement. It probably won’t cost you too much money to do this, and it will definitely save you a lot of hours, compared to trying to do it all on your own.

The other thing I don’t know if I mentioned is, the tools such as Kubeflow and Azure ML allow you to take models that come in different formats, so you might be using TensorFlow or something else, and take these models and just deploy them as a service. You could also do it on your own, you have to write some code to be able to do it but it’s also possible. Again, I like tools that do stuff for me, so…

Even a Simple CI/CD Pipeline Is Better Than None

If you take only one thing away from this talk, it’s this one: even a simple CI/CD pipeline is better than no pipeline. I know that if you’re working with a team of data scientists that are basically just used to writing code on their keyboards and doing this stuff manually, it’s going to be a change of mindset to get them to do the pipelines, but it is definitely worth it in the long run. Also, don’t try to ask them to deploy Kubernetes, I don’t think that’s going to go very well.

Change is the only constant in life and that’s why I’ve been organizing DevOpsDays for six years, because I believe that automation actually helps us eliminate issues.

AI Ethics

I want to finish on a particular note, and that is AI ethics. One thing that I learned when I was learning more about this stuff is that bias is actually a property of information, it’s not a property of humans. If you’re building algorithms and you’ve given them a particular set of data, they’re going to learn about the state of the world from that data. There are already some examples in the industry of algorithms being biased against certain subsets of the population.

One is the racial example. These algorithms are actually being used by judges and by the police to identify if you’re likely to commit a crime or something like that. That definitely has some racial biases in there. The other example was, the ads that ran for CEO jobs were displayed only to males, because guess what, the data set suggested that only males can be CEOs. This is complicated stuff and this is something we must think about when we develop AI because these algorithms are actually deciding your next mortgage and they’re deciding where your kid goes to school and credit score and stuff like that. You just definitely want to think about this when you’re working on ML. So, build AI responsibly.

Questions and Answers

Participant 1: I just had a question about your pipeline, I saw that you ran the model on ACI, Azure Container Instances. Does that support GPUs?

Rosenbaum: I don’t remember if ACI does, but AKS definitely does. My production instance does use GPUs. I actually would have to go look if ACI does as well.

Participant 1: We have a similar pipeline that we’ve implemented, not using the Azure DevOps but using ACI. We have to keep a VM up at all times in order to run our model.

Rosenbaum: Yes. You can definitely go in through the AKS stuff, that definitely supports GPUs.

Participant 2: How do you monitor your deployment and when do you decide that you need to redeploy?

Rosenbaum: I think this is more of a data-science question. Monitoring the deployment in terms of whether it returns 200 OK is easy to do, but knowing that you have model drift is probably harder. I’m probably not the best person to answer that question.

Participant 2: But you probably have such a mechanism in place.

Rosenbaum: Yes. Internally, at Microsoft, yes, and I would have to find out what it is.

Participant 3: Do you have any advice for versioning the data once the data is too big to put in Git, which is usually pretty soon?

Rosenbaum: In terms of Azure ML, you can actually commit the data set to Azure ML and it will have a version attached to it. It also tags the models with versions. For Kubeflow, you have to find some type of storage. I wouldn’t put it in Git, it’s always too big to put in Git. I would put it probably somewhere in storage that is relatively cheap. Then, you do have to solve the problem of how you version it.

Participant 4: One of our data scientists was thinking about iterating on their CNN using different branches on Git and doing multiple deploys. Have you had experience with that? Do you have any recommendations around how do you experiment and iterate on that very quickly?

Rosenbaum: What we do at Microsoft is we control the flow through the pipeline, not through branches. I see a lot of people out there doing branches because it’s easier. You attach a branch to an environment, and it allows you to isolate stuff. I would say it’s still a valid way to do this, and it might make it easier for you to have multiple versions at the same time. The other way is through the pipeline, so the pipeline knows which environment you’re deploying to, and that means that you’re putting the same code through the whole process. Whatever was in your dev environment and got looked at is the same thing that ends up in your production.

Participant 5: What if your data set is too big to be checked in to some version control?

Rosenbaum: I think the gentleman asked that question. I definitely wouldn’t check my data set into source control. Again, either you put it in storage, so some Blob storage somewhere that is cheap, because these data sets can be very large, but then, if you do this in storage, you have to find a way to version it and know that this data set is what trained that model so you know if you’re changing something, that it propagates all the way.

Participant 6: Spinning off the versioning and how it relates to software CI/CD, have you had an incident where the checks for the model weren’t sufficient so you actually released bad data? What is the equivalent of a rollback and as far as purging the bad data, does that take a while?

Rosenbaum: The equivalent of a rollback, I would say, is to just go to the previous version. In the case of Azure ML, you can actually go and deploy any version, so that makes it easier for you. I actually don’t know, in terms of Kubeflow, how easy it would be to roll back. You might actually need to go and retrain the model, which does take a lot of time. Then again, the challenge is that you have to know that the data was the same data. I see the same thing with code: people roll forward rather than roll back a lot.

For the second question, if you have the way to have a tagged version model, then you can deploy it fairly quickly. Some of these models can be trained for hours or days, and so, this can take a while. Versioning is important.

Participant 7: We had questions about the data sets, my question is about the models. How do you keep them version-controlled when they become too big? We have three things, the code itself, the models, and data sets. Can you elaborate a little bit more on the version control and the fact that they get too big to be checked into the GitHub?

Rosenbaum: In Azure ML, the model versioning is just part of the thing. This is actually one of the cool things I think. I don’t actually have to put it in version control because I have this hub and it keeps my versions for me and I could actually click on this and deploy it into production environment. If you’re not doing that, then you have to figure out a way. You could put it in storage, but you have to automate the process of keeping them there and knowing which version it was.

Participant 8: Can you repeat the first step that you have there, which is profile the model, what does it do?

Rosenbaum: Profiling the model. This is specifically a command that I can run on Azure ML, and what it does is identify the best compute cluster to run the model on. Essentially, it’s going to tell me what CPU or GPU I need and what memory. I’m actually faking this because it takes like an hour to run. If you are running actual production models, it can be very useful; you actually know the best compute cluster to run them on.


Presentation: Incident Management in the Age of DevOps & SRE



Edwards: My name is Damon Edwards. This talk is about incident management in the age of DevOps and SRE and I’ll talk about why that’s important. You can follow me on Twitter @damonedwards. I’ll post the slides there but also they should already be up. If not, they’ll be up shortly, rundeck.com/qconsf-2019 so you don’t have to take pictures of things. The slides will be there. Plus, obviously, QCon does a great job of recording and distributing them.

My first assertion is, number one, the ability to respond and resolve incidents is the true indicator of an organization’s operational capability. How good are we at operating? How many folks here work for a packaged software company or somebody that doesn’t have operations? Anybody? The rest of you all are all in a business that makes its money by operating software. The running service is our business. Everything else we’re doing is to support that so how good we are at operations is fundamentally how good we are at our business.

Number two, my second assertion here is that I think everybody now works in operations. If you think about the fact that the running service is the point of our business, everything we do, everything that starts in development goes into production and then boomerangs right back at you when there’s a problem; we’re all part of that operational chain. Also, looking forward down the path, you see where the future is going, this heavy bias towards the “You build it, you run it” teams, which also brings us all right into operations. I think the days when we could say, “Oh, operations, that’s somebody over there,” are long gone. Now, operations is something that we all have a critical stake in.

What Is an Incident?

We’re talking about incident management, so we should probably have a definition. What’s an incident? I look at an incident as an unplanned disruption impacting customers or business operations. The first part is thinking all the way back to the ITSM roots, the idea of an incident that’s disrupting a customer; that’s things like outages and service degradation. If you think about business operations too, things like work interruptions, delay, waiting, short-notice requests – a euphemism for somebody forgot to tell you until it was urgent – all of those things are the death by a thousand cuts that pile up. It slows your delivery, it slows your response, it results in poor decisions. All those things can slow our internal business operations, which in the end affects the customer. Think about it holistically: delay affects our customers.

If you think about an incident, we can’t separate out one of these from the other because we’re the same people behind the scenes that need to deal with both. When I talk about an incident, I’m talking about something traditional like an outage or service degradation or these short-notice interruptions that happen all across our business.

The format for this presentation is, I’m going to walk through the lifecycle of an incident and what we see high-performing organizations doing in these different areas. It’s like a tour, or like a survey course of where things are at today and where they’re going. Along the way, I’m going to mention a whole bunch of people who I’ve looked to, who have provided guidance and a lot of great insight on these different areas, so they’ll be making an appearance at some point or another.

First of all, before we talk about that wheel, the wheel of an incident, I want to talk about the environment that we’re all cooking in right now. There is a lot of context that goes into why people are doing things around incidents the way they’re doing them and I think it’s the flashpoint for a lot of really interesting side conversations that have been creeping into our industry for more than a decade or so.

Digital Transformation

I’m going to start with digital transformation. Don’t groan, don’t leave. It’ll be quick, I promise. A lot of this stems from decisions at the boardroom level. I hear a lot of people have different definitions of digital transformation, probably more so than even have definitions of DevOps or Agile. If you distill it down and look at the communication coming from the board level down to technology organizations, what are they really after?

The first one is they want everything integrated. Gone are the days where the customer service agent would have multiple screens and multiple windows on a screen and you could look at one, look at the other, and talk to the customer on the phone and stitch together and see what’s going on. Or, business line A and business line B would live in their own silo and they would only cross at the balance sheet level. Now they want everything integrated with everything at all times which allows them to do new business ideas, combine things to extract more value out of the systems that we already have.

The next one is responsive. This is not responsive like web page responsive. This is more responsive like they want the business to respond or the organization to respond to the market, to respond to customers, to respond to their failure demand or value demand. They want everything to feel a lot more responsive than this long lag time that they’ve been accustomed to where things take months, if not quarters, if not years, to come out the other end. They want to see it quick. They want to see it fast.

Then they want all that everywhere. Whether it’s your desktop, or it’s a phone, or it’s an Alexa, they want all this capability to be available to the consumer, and then on top of that, always. It’s got to be there all the time. The idea of change windows and taking maintenance downtime in 2019 is no longer really acceptable. All this is cascading down to the technology organization. This is the uber driving force that’s pushing us into making these new decisions.

In fact, the first person here, Cornelia Davis from Pivotal wrote this great book called “Cloud Native Patterns.” In it she really describes what the technology feeling is, the signal receiving in the technology organization about the demands that are coming from these digital transformations. That’s good to keep in mind. It’s the force number one.

Cloud Native & Microservices

What that’s driving us towards is these ideas of cloud native and microservices. First of all, there’s been this explosion of technologies because our old infrastructure wasn’t good enough. We need things to be ephemeral, to move faster, to go with that new highly integrated, always-on, highly responsive world. A good friend, John Willis, and Kelsey Hightower are two people who have really helped me keep clear thinking on what’s going on in this world and are very much worth following.

If you go on top of that, what all of these technologies have created has really ushered in this new era of microservices. Being developers, I don’t have to tell you that. I’m sure you’ve all seen these Death Star diagrams. They’re pretty cool. They list all the microservices around the wheel and then visualize all the interconnectivity between them.

Our world has gone from complicated, where we could generally divide and conquer and keep things to ourselves. In the old world, we could have things segmented: the team for app A takes care of app A, the team for business line B takes care of business line B. Now everything is highly integrated, which dramatically impacts how we go after and try to respond to these incidents.

Somebody who spoke earlier today has, I think, been a great leader in our industry around explaining the power of these microservices and why people are driving so hard into this. Even going back to DockerCon in 2014, Adrian Cockcroft talked a lot about how architecture enables speed and speed enables business advantage. He really talks about this desire to decouple. In the past – and I know even that microservices Death Star starts to look like a monolith – the idea was, “We build things in a central way,” and that was the most effective. We’re realizing that we’re really slowing down the business. We can’t achieve that digital transformation dream if everything is so tightly coupled. Now we’re taking those cloud-native technologies and microservices and we’re splitting things up into these different value streams. We’re trying to decouple the organization so people can move faster. They don’t have to constantly be tied to each other.

Now we’re talking about decoupling and fragmentation. Everything has gotten a lot more complicated, and we’re purposely trying to fragment how we do the work so that we can move faster. Now, if you’re on the incident side of that problem, things become a lot more difficult.

DevOps & SRE

Moving over, now that we’re talking about these new technologies, driven by the digital transformation, we’re talking about decoupling these people so they can move faster. What’s next? Now come these DevOps and SRE ideas: how do we get people to work in a high-velocity way? We’ve got high-velocity systems, in theory, but if we can’t unlock them by changing how the people work together, then what good are they? We just have expensively hosted legacy systems 2.0.

If you want to talk about the DevOps side, I’ll use the trends that are entering in. Gene Kim is a good example. I think he’s like the raconteur of the DevOps movement, doing a great job documenting it. Do you know the book “The Phoenix Project”? He talked about the three ways: the first way focusing on flow, then looking for these feedback loops, and then the continuous improvement and learning along with that. It’s really about feedback. There’s a new book coming out in case you haven’t heard, “The Unicorn Project,” which tells the same story as “The Phoenix Project” from a different angle: not from the leaders, but from the folks in the trenches. A great book, and it’s got these five ideals: locality and simplicity; focus, flow, and joy; improvement of daily work; psychological safety; and customer focus.

The reality is, people took this advice and they really focused on the delivery side of things. It’s, “Ok, we’re in dev and it’s all about this go-go-go,” and then what? We deploy. We deploy 10 times a day, but people aren’t talking about ops. How do we operate this thing? Deployment is not the finish line. Deployment is the first step in the rest of your life. It’s like getting married: they say people should focus not on the wedding day so much but on what the rest of your life is going to be like. That’s where operations comes in, and historically, we’ve largely ignored that. It’s, “That’s another problem.” We planted the flag, we delivered this project. It’s like in the movie business: “We’ll fix it in post.”

I actually gave a talk about this last year at the DevOps Enterprise Summit called “The Last Mile.” It was all about showing how unless you can change how you operate, all the blowback and mess that comes back on the rest of the organization stops you, prevents you from really realizing those DevOps dreams. This means that we’re trying to focus on that flow and these feedback loops. We have to pull operations closer to us.

I think in this next wave, an interesting thing to arise is the notion of SRE. It really started from Google. They’re the ones that wrote the first books. Ben Treynor was one of the engineering managers that said, “How do we run operations not like a classic operations organization, but using the same principles if we run an engineering organization and do it in an integrated fashion?”

From that, we’ve got these principles. What’s interesting about the principles of SRE – the first one is “SREs need a service-level objective with consequences.” It’s the idea that it’s not just an SLA that you have to adhere to; we’ve got this idea of a service-level objective and it’s got consequences. This means that the business, development, and operations have gotten together and said, “This SLO is what matters to our business, and if we blow that SLO, if we blow through our error budget,” the term they use, “then we have to swarm to that and all other work comes after that.”

The idea that you have the power to tell a development organization or to tell a product organization, “We’re not shipping new features until we fix this service,” or, “We’re not going to do these extra things. We’re going to invest in getting this service-level objective back above this level,” and it has real teeth. I know a lot of big enterprises that say it’s just crazy talk to go to the business side and say, “No, we’re not shipping this because we blew this agreed-upon SLO.” The pushback is extreme to say the least.
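To make the “real teeth” concrete, the arithmetic behind an error budget is simple enough to sketch in a few lines. The function names and numbers here are illustrative, not from the talk or any specific SRE tooling:

```python
# Sketch of error-budget math for an availability SLO.
# A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.

def error_budget_minutes(slo: float, window_days: int) -> float:
    """Total allowed downtime in the window, in minutes."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo)

def budget_remaining(slo: float, window_days: int, downtime_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative means blown)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget

budget = error_budget_minutes(0.999, 30)       # ~43.2 minutes
remaining = budget_remaining(0.999, 30, 50.0)  # negative: budget blown
freeze_features = remaining < 0                # the "consequences" part
```

The last line is the whole point of the principle: when the budget goes negative, the agreed-upon consequence (stopping feature work to invest in reliability) kicks in automatically rather than being renegotiated every time.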

That plus this: “SREs have time to make tomorrow better than today.” This is where the idea of toil comes in: work that is repeated, could have been automated, and is not adding enduring value to the business. That’s called toil, and we want to get as much of it out of the process as possible, because we’re not using our human capital to its fullest potential. We’ve got all these smart people and they’re buried in repetitive work that the machines should be doing. How do we get them out of that work so they can spend their time doing engineering work, moving the business forward? Again, it’s that idea of pushback.

Third one there is “SRE teams have the ability to regulate their workload.” They can say no to things. They can say no to releases. They can say no to being overloaded. In a way that’s like a bill of rights, and the idea is really about providing feedback. It’s providing a backpressure. Think of a physical system. It’s providing backpressure to that go-go-go delivery and it should make a self-regulating loop.

Why is this so interesting? If you think about, especially in large enterprises, how incident management traditionally happened in the past, we had this NOC, some level-one teams. They were always the aggrieved party. It was, “All the tickets go to them. Let’s have them run around and try to fix them. They’re probably going to escalate it back to you, but hopefully, they’re going to take the brunt of the load.” This new model says, “That’s not good enough. We’re not making the most out of our people. How do we set it up so there are these tight feedback loops, and when things get bad, the pressure comes back on the rest of the organization – not in a really negative way, but as a way to provide that feedback loop, again, like Gene Kim’s three ways, to help us regulate that workload?”

Very interesting stuff: this is one of the biggest changes in operations since ITIL – the first edition of the ITIL books came out in 1989, I think. Developers have all had Agile seeping into their brains for the last 20 years. There have been books and conference speeches, it’s in the tools, and whether or not you were doing Agile, the ideas were still out there. But folks in the operations world were living in a much more traditional, different world, which we’ll talk about in a second. Folks like Stephen Thorne, Tom Limoncelli, Niall Murphy, Liz Fong-Jones – excellent writers and practitioners – have done a great job of really breaking down what this is all about to its essence.

Now we’re changing how we’re organizing our people. What does it start to look like? If you think about it today, DevOps and SRE, you start to add these things up. This is just a selection of some of the things that we’re talking about: thinking about products, not projects; continuous delivery; shifting left – solving operational problems early in the lifecycle, when it’s in development land, not waiting until it’s trying to get into production; getting to production as quickly as possible; working in small batches; error budgets; toil limits; all this cloud-native infrastructure. If you think about it, what’s actually happening is we’re starting to build these self-regulating systems. We’re starting to align our organization on these horizontal value streams, from an idea all the way to where we’re making money from a service – really thinking about how we build this self-regulating horizontal system. This works across all the different models.

We see the top one, the Amazon model; the second one, the Netflix model; and the third one, the Google model. If you’re from those companies, don’t yell at me; it’s a big-picture view. Some people drive towards cross-functional teams, and those are ways of building self-regulating systems: you pack all that into the same team and that’s how you get that continuous feedback. The bottom one still has a Dev and Ops divide. Think about Google: they wrote a whole book on operations, they have a separate organization called SRE, but they’ve created all these tools to recreate the shared-responsibility model that you would get if you put everybody on the same team. You see the world going towards these value-aligned value streams – meaning, what’s all the activity we have to do to deliver this point of transaction to the customer – and it’s self-regulating. In fact, it was Jon Hall at BMC who was the first person to really point out to me, “You know what’s going on here: you put them together, it’s a self-regulating system.”

Now, let’s compare that to how the world used to be. In fact, when you talk to a lot of your counterparts, especially in large organizations, this was the view of the world that their whole organization was built on. Yes, there’s still this flow of work that everyone thinks about, but that flow of work has to go through all of these processes. It’s a very vertically aligned way to think about the organization: “We’re going to put the firewall team with the firewall team, the database folks with the database folks, the Windows admins, the Linux admins,” and so on and so forth.

That was the key way to organize an organization: each process would always have a process owner, and there were going to be inputs, outputs, triggers, metrics. In fact, this was the ITIL way of doing things. ITIL has their 26 – I think now it’s up to 34 – distinct processes. Now they call them practices, but they’re breaking things down like service catalog management, incident management, change management – I’m just jumping around here – service validation and testing, all these processes. When that goes into a big enterprise and you put a process owner on each one and give it all those criteria, well, what happens? You start to get people who do exactly what they’re set up to do. They’re going to say, “This is my task. I’m going to manage this process. I’m going to be the best firewall-rule-changing team west of the Mississippi, and that’s all we’re really going to care about.”

What ends up happening – I’ll get to the change part in a second – is you get this unintentional encouragement of silos, where people start to focus inwardly on doing their part of the task, and what starts getting broken up is that flow of work. That’s where we get ticket hell, all the ticket systems. How many folks here spend a large amount of time with a request waiting in a ticket queue somewhere? This is where that comes from. Now, anything that has to be thought about holistically, like a system, has to flow through these hoops. On top of that, there’s this idea of a change advisory or change control: some external, centralized source out there that is going to tell you if something is safe. I know if you talk to the ITIL folks, the high priests of ITIL, they’ll say, “It’s just there for advisory. It’s just there for coordination and communication,” but in most organizations, it ends up being the change authority: someone else is going to tell you whether or not you have the ability to change. Whether it’s unintentional or not, that really encourages very much a command-and-control mindset.

If you think about that exploding world we’re entering, with the explosion of the microservices and people decoupling and moving along, this idea that we’re going to externally inspect and determine quality and safety is really starting to be quite far-fetched. Those who study Deming, lean, the [inaudible 00:20:35] system – it’s one of his key 14 points, number three if I’m not mistaken: you can’t achieve quality by external inspection. You have to build quality into the system.

Now, you put this together and you see what’s really happening: this DevOps and SRE world of self-regulating, value-aligned systems is really starting to replace the old traditional ITSM way of thinking. Fundamentally, because of that different orientation – one is horizontally value-aligned, the other is vertically function-aligned, with this idea of a central or external change and quality advisory – they’re really quite at odds with each other. In fact, it’s an oil-and-water type thing that I think needs to be resolved at some point. Be aware that this tension exists and it’s going to exist for quite a while.

I don’t think they’ll agree with everything I say, but two folks in this space – Charlie Betz from Forrester, one of the few analysts I actually like, and Rob England, often known as the IT Skeptic – are excellent thinkers and writers documenting this contention, this transformation that’s happening. Keep this in mind; it might explain a lot of your relationships with your operations counterparts.

Acknowledge Complexity

With all that going on, I think it’s time to realize that with these super complicated microservices, plus everything happening on the people-management side, we’ve really drifted from complicated systems – meaning we can predict them, there’s some determinism in what we do – to truly complex systems.

Paul Reed is a guy who writes a lot about this and talks a lot about networks and Netflix: that our world is really complex, it’s not deterministic. On the development side, it’s easy to think, “NGINX compiles or it doesn’t,” that there is a determinism to this thing we’re doing on the software side. But if you think about the larger distributed system that we’re building and the things that we don’t control – the cloud infrastructure and its impact, the traffic from humans, the unusual behavior – it all piles together, meaning that we are living in a complex world. We’re operating complex systems.

What do we know about complex systems? If there are any physicists in the crowd, don’t get mad at me, but I distill it down as: we can never have perfect information about them. Just because we know how NGINX responds to a request, we’re never going to be able to determine, in combination with all the other pieces, exactly how the system works. We can’t break a complex system down into sub-parts and say, “I know how this works.”

We can look at an engine of a car: I can know how all the parts work and I can model out how that engine is going to behave. But go outside in San Francisco, and there’s no way you can model out and break down the traffic. You can’t look at Drumm Street over here and Market Street and use that to figure out how the rest of the traffic in the city flows. We’re never really working with perfect information. We can never really control or predict what’s going on.

One of the biggest eye-openers that I went through is Richard Cook, who wrote this paper back in, I think, the early ’90s. It’s like five or six pages, and it’s just, point by point, how complex systems fail. If you read that paper and don’t think that all those things are currently at work within your system, then you have another thing coming.

Another way to put it is – this is Charity Majors, called the queen of observability now – “Distributed systems have an infinite list of almost impossible failure scenarios.” This proves out when you talk to folks in organizations that have invested a lot in resilience engineering and trying to root out failure from their systems: I’ll just say that it gets weirder and weirder how improbable the next problem is. It’s the magic bullet theory times a hundred that seems to cause problems time and time again. No matter how good you get at trying to root out failure, we just get weirder and weirder edge cases, which is a strong indicator that we’re working in a complex system.

If we’re trying to work towards how are we going to be managing incidents, how are we going to respond to and resolve failure, we have to keep in mind we’re fundamentally working in a complex system, not just a complicated deterministic system.

Safety Science & Resilience Engineering

Now that we know that, how are we going to actually think about these things? I think we’re starting to learn a lot from the safety science and resilience engineering domains. There’s these experts like Sidney Dekker, Richard Cook, David Woods, who are famous in the real world. They work with airplane disasters, healthcare deaths, nuclear power plant controls, high-consequence domains, and really look at how do these systems work and what can we do to try to mitigate the failure and disaster that comes out of them. They’re starting to bring this into the operations world.

Why is that? This gentleman here, John Allspaw, used to be the CTO at Etsy; before that, he was head of operations at Flickr, and one of the people that kicked off the DevOps movement by giving a conference talk with his development counterpart about how they did 10 deploys a day. This was in 2009. People were throwing up in the aisles, it was so sacrilegious. Then he went a little off the deep end: with Dekker and Woods and Cook, he went off to Sweden and got a master’s degree in systems safety, studying how people respond to and avoid failure and disaster in highly complex, high-consequence environments.

He talked about why this is important. Why do I have to know this stuff? He said, “If you think about it, there’s this above-the-line, below-the-line metaphor.” The reality is, in our work, above the line is all the stuff we do: it’s the source code we see, it’s the processes we go through, it’s the buttons we click, it’s the things we say to people. Below the line is the real work: that’s our systems, and we can’t actually see it. We don’t see the zeroes and ones going by. We can’t see what’s actually going on under the hood. All we see is this thin abstraction layer and our mental representation of what’s going on under the covers.

The whole point of why the systems safety resilience engineering world is coming into play is because it’s all about the communication of the people. How do you keep those mental models in check? How do we make sure that we’re trying to connect the people better because we can’t really fix the underlying system? There’s things that you can do to it, but fundamentally, it’s the human interaction with it that causes the most issues, so an important idea of why this is coming into play.

Some other folks have picked up this banner and run forward with it. There’s a great event that happens every year here in San Francisco, called Redeploy. It brings together some of the world-class thinkers and academics in this space to talk about all of these things: how do you bring this to our world of operating online services? I know they hate slogans, so I made some bumper stickers. It’s ideas like “There is no root cause.” Root cause is just a political distinction: our desire to have a straight line so we can say, “It’s that person’s fault or it’s that system’s fault,” when in reality there are all these contributing factors, and most of them are the same things people do day in, day out to do their job, except that on this one day, the right confluence of factors causes a disaster. We can’t then go and blame that person and say, “It’s their fault,” when they were doing what probably had success 99 out of 100 times.

Likewise, there’s a big disdain for the five whys because it forces us into a very linear way of thinking: “It must be this cause.” That’s the problem. Instead, it’s trying to take a bigger, broader view of causation. There’s this whole idea of Safety-II. In the old world – Safety-I, as they call it – we only investigate failure. Failure happens, the NTSB shows up, all the investigators show up, the journalists show up, and we dig through it and try to find what the problem was. What we were not looking at is, “Why did it ever work in the first place?” It’s an interesting inversion of Murphy’s Law, which is not “What can go wrong will go wrong,” but “How does it ever work in the first place?”

Safety-II is about studying, “Why do things actually work?” If you think about humans and how they work, your colleagues around you, there are all kinds of little shortcuts – whether mental or physical – that they take in their day-to-day job: ways they do things, rituals, ways to talk about things. All of those things work day in and day out until some slight thing changes, and then the exact collection of actions that would normally cause success causes failure. If you don’t look at why things work, you’re going to have a hard time figuring out why things don’t work. At least, you’re going to be trying to do so with one hand behind your back.

There is this great idea that incidents are unplanned investments. It’s going to happen. How many of you get called into incidents when they happen like escalations? Your time is valuable. What is that? That’s an investment. Your company is now investing in this incident. What do you learn from it? The ROI is up to you.

Elevate the Human

Where this is all going is this notion that we have to elevate the human. As for this dream that AI is going to solve our problems: we are decades off from it being able to manage these complex systems. If we look at what’s going on in healthcare, nuclear power plants, aviation, all these domains, they’ve spent billions of dollars of highly rigorous academic pursuit trying to get the human being out of these processes, and they haven’t been able to do it. The fact that we think we’re suddenly going to achieve it on our end is, I think, a little bit foolish, versus looking at it as, “This automation, the things we’re building, is really there to elevate the humans” – more Iron Man than HAL from “2001.”

It’s about elevating the human, and I think you’ll also see a lot of folks coming around now to looking at the humanity of this. We love it because we go to conferences and get to have a good time, but if you think about the thousands and thousands, actually millions, of our colleagues around the world, life is not so rosy for them. It’s a little bit rough. You’re on the receiving end of the aggrieved party at all times. Burnout is very high.

I highly recommend, for your own well-being and just for the well-being of your colleagues, Dr. Christina Maslach. She’s from Berkeley, previously Stanford, one of the most famous people in the world on burnout, and she’s now turning her sights onto the IT industry because burnout is so high. Some of you will say, “What is this, are we running a country club here? We have to be nice to people?” But the reality is, what she’s really giving you is a formula for high performance. If you look at things like loss of control, the feeling of being overwhelmed, losing agency over your work – there’s a whole list of things that she’ll highlight – and flip them around to the inverse, we’re actually talking about how we get better performance out of our most expensive assets, which are each other.

Jayne Groll is also doing a lot of work with this, with trying to elevate the human being. I think on the operations side, we’re seeing a lot more of this humanizing of what’s going on, and the reason is there are 18 million IT operations professionals in the world – this comes from PagerDuty’s S-1 – and 22.3 million developers. That’s a lot of people out there that need our help.

This is the world that we’re marinating in. These are all the trends that are coming together as we now look at what’s going on with incident management. Finally, let’s look at the wheel. If you think about an incident, from time zero, there’s first the observe side: we’re trying to figure out what’s going on. Then there’s the react: we have to take action, whether it’s to diagnose something or to repair something. We’ll often loop between those two, and then eventually, we have to learn from what went on.


Those of you coming from the lean world will recognize this looks a lot like an OODA loop. John Boyd is one of the most famous military tacticians, at least probably one of the most famous American tacticians. He was a fighter pilot who came up with this methodology – this is actually his first drawing of it – which applies to any tactical application of strategy. There are a couple of phases: there’s observe – what’s going on; there’s orient – making sense of what’s going on; there’s decide – what am I going to do; and there’s act. He figured out that whoever can make those loops faster seems to always win at the objective. He started out talking about dogfighting in aircraft, but soon they found that this way of driving human performance really applies to all sorts of domains. I added it here because I want to think about this top part as an OODA loop. We’re going to observe, we’re going to orient – what’s going on here – we’re going to make a decision, and we have to go ahead and act.

The reason why I like this is it breaks down the different areas of what I see as incident management and then we can focus in on the different developments, what people are driving.


Let’s start with the observe side, with a couple of interesting things. Monitoring, obviously, you all know about. Monitoring is really about spotting the knowns. We’re always looking a bit in the rearview mirror with monitoring: we’re looking for conditions that happened in the past. That got us quite a bit down this road towards better resiliency, because if something happened in the past and it happens again, chances are we know there’s a problem here.

The new kid on the block has got a name that’s been around for decades, but people focus on this new idea of observability. If monitoring gave us that we’re able to spot the known, spot things that happened in the past, observability gives us the ability to interrogate the unknowns. How do we actually look at what’s going on now in an unknown situation and figure out whether this is good or bad and figure out what the problem is?

Then it really brings in a few different things. Number one is logging the event. We have to have a record that there was some event, something that happened, far better structured than unstructured. Then there are metrics, which are a data point over time. It’s like speed: are we going faster? Are we going slower? It doesn’t really tell us what’s going on in context. We lose all context of the event, but we can see whether that data point is going up or down or sideways over a certain period of time.

Then the third one is tracing. How do we take those events and put them together in the context of a single request? Look at what’s going on in the tooling world, Honeycomb, Zipkin, those sorts of things. It’s around building out this observability side of things. Again, Charity and Adrian Cole are great people to follow in this space that really have a keen eye for where this has to go.
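The three pillars described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the names and record shapes are invented for this example, not from any particular tool): a structured log record per event, a context-free counter metric, and a trace id that ties the events of one request together.

```python
import json
import time
import uuid

RECORDS = []  # collected structured log events

def log_event(trace_id, span, message, **fields):
    """Logging pillar: one structured (JSON-friendly) record per event."""
    record = {"ts": time.time(), "trace_id": trace_id, "span": span,
              "message": message, **fields}
    RECORDS.append(record)
    print(json.dumps(record))

class Counter:
    """Metrics pillar: a bare data point over time, stripped of event context."""
    def __init__(self):
        self.value = 0
    def inc(self, n=1):
        self.value += n

requests_served = Counter()

def handle_request():
    # Tracing pillar: a single id ties every event in one request together.
    trace_id = str(uuid.uuid4())
    log_event(trace_id, "frontend", "request received")
    log_event(trace_id, "checkout", "charging card", amount_cents=1999)
    requests_served.inc()

handle_request()
```

The counter tells you that request volume moved; the log records tell you what happened; the shared trace id lets you reassemble one request's path after the fact.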

Also, there’s another one. Our buddy John Lewis is here if anyone wants to talk about this idea of automated governance. We’ve seen a collection of highly regulated industries getting together and saying, “Remember the speed we’re moving at.” Governance, this idea that a human being can attest, “Yes, this control is correct and this control is being followed, and here is the evidence of it,” just can’t keep up. It’s like manual testing in 2019. It’s just not going to keep up and get us to where we’re going.

Now you see the idea of, how do we drive governance? How do we drive compliance into the automation layer or into the systems themselves, so we can prove and attest that we’re compliant with these controls, and do it in an automated way? Definitely, John [Lewis] is a good guy to talk to about this. IT Revolution got a bunch of those banks and folks together and did a white paper on this that you can get off of their site for free.

Why this all matters is that in order to figure out what’s going on and respond as quickly as possible, we have to bring these three things together and make sure that everybody has full awareness of them. This is how we keep that above-the-line behavior in sync: by being able to distribute these three aspects of visibility, the monitoring, the observability, and the governance. I think a lot of organizations have the monitoring, the observability is getting there, but the governance side is always the missing piece.

Orient and Decide

Moving on, we want to climb up to incident command, which is the mobilization, the coordination, the communication. How do we get good at that? A lot of this, which we’ve seen come into operations in the last decade or so, is taken directly from incidents in the real world. FEMA runs this; they’re the keepers of the incident command system, which started much before them: a series of definitions and processes and practices all around how you manage the response to some type of major incident. It was written for forest fires and hurricanes, but it’s being applied to our industry.

Actually, it was Jesse Robbins, one of the founders of Chef, who was really the first person at Amazon to run game days; they called him the Master of Disaster. He was purposefully trying to break things in a systemic way so they could apply these practices and see how they actually work. Brent Chapman, I think, is another person who’s been doing this and has really come to the forefront. How do you apply these incident command principles to how we mobilize, coordinate, and communicate? It might seem heavyweight at first, but as organizations get better at it, it naturally becomes the way they talk, and they think that we have to respond to these things in a structured way.

Ernest Mueller is also a great person to follow who did a lot of this in the DevOps space, translating these incident command ideas. Give a shout-out to PagerDuty, who’s done a pretty good job. There’s an open-source project where they’ve taken all of their incident response docs, based on the incident command system, and turned them into an open-source project. They accept pull requests. Again, this is the low-tech side of the conversation, but it’s very important to understand how this applies, especially as we’re now all being asked to participate in this. It’s no longer some other operations organization that’s going to run this.

Also along this idea is, what’s going on with operations itself? We see this split that’s happening. Andrew Shafer made this T-shirt. John Willis, Andrew, and I did the first DevOpsDays in the U.S. here in Mountain View in 2010, and this was the T-shirt. It was “Ops who are devs, who like devs to be ops, who do ops like they’re devs, who do dev like they’re ops, always should be,” and if you were a teenager in the ’90s, you know the rest of the lyric. The idea is, first we’re going to be blurring these roles. The past nine years have been about how we blur the lines between devs and ops.

Now I think we’re going a step further and saying operations is starting to split itself, and you see different organizations, like the folks at Disney, or Shaun Norris, who was at Standard Chartered. It’s a big global bank, I think 60 different regulated markets, 90,000 employees. Disney is obviously Disney. You see them driving this the same way. We take what was the operations organization and divide it into two parts. There’s platform engineering, which really just looks like a product organization. It’s largely a development organization building operational tools and operational platforms. Then everybody else goes into this SRE bucket. That’s generally our expert operators, and they’re starting to be distributed. We’re starting to see this shift where the walls of operations, which were being blurred before by responsibility, are now outright disappearing, because we’ve got this growing centralized platform engineering organization, and these ephemeral, distributed, call them what you want, these folks call them SREs, expert operators who are being distributed into the organization.

That also leads to a new view on escalations. In the past, escalations were a good thing: “We’re getting it to the expert.” But nobody here enjoys being escalated to. It’s an interruption. It was, “This is a terrible idea.” Not only that, but it also just slows down the response. If we have to escalate off to people, our incidents are going to be longer, and now we’re inflicting death by a thousand cuts: we’re interrupting the rest of the organization, which just adds a bunch of delay everywhere else. We’re finally starting to realize this is a bad idea.

Jody Mulkey at Ticketmaster, and this is going back four or five years, had this epiphany. They had this old model of working with their NOC, or “the knock,” as they called it. They basically called the people in it the escalators, because all they did was look at the lights and call somebody, and their major web incidents took 40-something minutes. When the Yankees can’t print playoff tickets for 40 minutes, that’s CNN news. That’s not just TechCrunch news.

The idea was, where was all this time going? A lot of what they found was that it was stuck in the escalations, because you have to escalate up to different people. What they did was have this idea of support at the edge: let’s take all the capabilities we need to diagnose and resolve these problems, or a large chunk of them, and push them down closest to the problem. How do we empower those teams closest to the problem to go ahead and take action? Then on top of that, if they can’t take action, how do we empower them with the right diagnostic tools to figure out who to actually escalate to, so we cut that chain down?

Their story was remarkable. I think in 18 months, they went from 40-something minutes down to four minutes for major web incidents, all because it’s the same problems over and over again. How do you just empower those people closest to it to actually be operators and take action? The best part was it cut their escalations in half. Imagine being interrupted half as much as you are now. It was a huge hit, and they kept getting better and better at it.

John Hall of BMC comes back into this again. He likes to bring in this idea of swarming, which really comes from the customer service world: instead of linear escalations, how do we take more of a swarming approach to bring all the people we need to bear on these problems? It’s very interesting how the swarms work together, but it does show that you can solve a lot more problems without having those dangerous escalation chains.


Time to take action. The two actions we have to break it down into are diagnosing, so health checks and exploratory actions, and restoring: restarts, repair actions, roll-backs, clearing caches, all the known fixes for known problems that are out there. What’s really important to note here is the return of runbooks. If you lived in enterprise land at all, runbooks were mostly manual, a lot of Wikis. Not that long ago, they weren’t really talked about, because the world was going to be Chef and Puppet and Ansible and operations was going to go away. Now, thanks to the SRE movement, runbooks are back, except now it’s not about how we make better Wikis, it’s about how we automate those procedures so we can give them to somebody else. In Jody’s model at Ticketmaster, how do we give out that access so it’s given to the right place in the organization, where they can go and take action?

Runbook automation is the safe self-service access and the expert knowledge that you need to take action, something that my colleagues at Rundeck and I work a lot on. The idea is that moving the bits is the easy part; it’s the expert knowledge that is hard to spread around. It’s, “Is the restart automated?” “Sure. It’s automated. We’ll just let the developers do their own restarts in production.” Then it’s, “They’ve got to know how to quiet Nagios, and they’ve got to know how to talk to F5 and pull it out of a load balancer pool. Then they’ve got to know how to check these five things, and then run the restart script but check these other six things to make sure it worked. Wait a minute. Before they run the restart script, they’ve got to know the right command arguments, and they’ve got to edit these variables files.” It’s, “Ok. Now we get it. We can’t hand that knowledge off. It’s either going to be weeks of training, or it’s going to be months of somebody trying to script all of that out.” Moving the bits is the easy part; spreading the knowledge around is the hard part.

It’s got to be self-service, because you have to empower those closest to the action, like I just mentioned, and it’s got to be safe. By safe, I mean two things. One is, yes, from the security and compliance perspective, we have to make sure we’re only giving access to certain named procedures, not giving people random SSH access and sudo privileges in a script and wishing them luck. We’re making sure that from a security perspective we’re de-risking it, but also from the perspective of the person taking action. How do we put the guardrails around them, to know that there’s the right error handling in place or that commands are idempotent? You have to de-risk it so that even if the person has expert knowledge in some other domain, they’re being guided to the smart and safe options and they’re not going to potentially cause more problems.
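The idea of safe, named procedures with guardrails can be sketched roughly like this. Everything here is hypothetical (the runbook names, the precheck, the registry), a toy model of the pattern rather than any real runbook tool: only pre-registered procedures can run, and each one carries its own guardrail check before acting.

```python
# A hypothetical runbook runner: only named, pre-registered procedures can be
# invoked, each with its own precheck guardrail -- no raw SSH access.
RUNBOOKS = {}

def runbook(name, precheck=None):
    def register(fn):
        RUNBOOKS[name] = (fn, precheck)
        return fn
    return register

def execute(name, **params):
    if name not in RUNBOOKS:                   # only named procedures exist
        raise PermissionError(f"no such runbook: {name}")
    fn, precheck = RUNBOOKS[name]
    if precheck and not precheck(**params):    # guardrail before acting
        return "precheck failed, escalating"
    return fn(**params)

@runbook("restart-web", precheck=lambda host: host.startswith("web-"))
def restart_web(host):
    # A real runbook would quiet alerting, drain the load balancer, restart,
    # then verify health; here the restart is only simulated.
    return f"{host} restarted"

print(execute("restart-web", host="web-01"))
print(execute("restart-web", host="db-01"))
```

The compliance-friendly property is exactly what the talk describes: access is to a named, reviewable procedure, and the precheck steers a non-expert away from unsafe targets.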

What’s going on now is you’ve got these alerts, these tickets, we’ve got the incident command system, people are all riled up to go, and they’ve got three options. One is deciphering this Wiki: “Is this correct? I’m not really sure what this person was trying to say. Wait a minute. Look at this date. When did they write this?” That’s option one. Option two is dumpster diving into our shared directory to look for the right script we used last time: “But wait a minute. Did they tell me it was -i, not -e?” or, “There’s a new version, and that script is in a different directory.” There’s that problem.

What most likely happens is the escalations. We’re just pushing a disruption back into the organization. From a runbook automation perspective, we’re saying, “How do we define the workflow that allows us to call all the APIs and scripts and tools that we need, and therefore push those safe, smart options to the people closer to the problem? They can solve the problem and not have to wake you all up.” It stops these incident chains: “Yes, there’s a problem with this particular service,” and then the sysadmin or SRE shows up and says, “This is an application problem. I’ve got to call the developer,” and then they say, “I know, it’s a data problem. Let me call the DBA,” and the DBA finally shows up and goes, “Wait a minute, this is a network problem. There’s a firewall issue here.”

We’ve got these meandering chains of escalations, and all this is doing is interrupting our other work. We want to get to a point where, if it’s an unknown, we can quickly diagnose the problem and take the best ideas out of people’s heads. How would you check this service? How do you check that part of the service? Turn them into automation that our first-line folks, whoever that might be, can respond with. Either they’re going to see a known problem that they know a solution to, or they’ll know exactly who to escalate to. For known issues, it takes resolution down from potentially hours to minutes or seconds.

For all of you on the development side of the house, it stops those escalations, the “I need,” the “Can you,” the “Help me with this.” With each of those instances, a little bit of waiting is being injected into the universe, and big interruptions land on your head. Then, when it comes time for you to go and do something, you’re waiting in somebody else’s queue, and it just keeps getting worse and worse.

If you can take that expert knowledge, take these procedures, basically bottle them up and let other people help themselves, you’re getting rid of all those instances of waiting, all those instances of interruptions, and it solves these difficult security and compliance problems. Before, it was, “I could fix it, but I can’t get to it because we’ve got customer data.” I’ve seen this work in very highly regulated environments. If you’re handing out access to a named procedure, the compliance people actually like it. You can run it all through an SDLC, which means we can do code reviews and decide whether or not this is good to do.

Folks at Capital One talk a lot about this. They’ve created their own internal runbooks as a service. The whole idea is to make something like a router that says, “Hey, let’s run the diagnostics against this instance,” whenever it gets triggered, and to have two decision points. Either it’s a known problem and I’m going to fire off this fix and see what happens, or I’m going to know who to escalate to. They just spoke at the DevOps Enterprise Summit. There’s also a lightning talk from DevOpsDays Austin. It was fantastic.


The last piece here is the learning part. I’ll just leave this one piece from John Allspaw here. He talks about a mistake we all make: we want to get straight to the action items. We say, “We’ll talk about some things beforehand, but where are the action items? What did I really get out of this?” when the reality is, if you think about what you want out of it, it’s the journey where you do the learning. Understanding what happened, being able to tell those stories amongst each other, getting together and figuring out all the contributing factors, is what really drives learning in the organization. Again, incidents are unplanned investments. The ROI is up to us; it’s up to us what we’re going to get out of it. Failure is a fact of life. What are we going to get out of it?

To recap: don’t forget the environment that we’re all marinating in now, because it affects all of our lives, the decisions we make, and how we talk to each other. Follow along this pattern, break it down, and good luck out there.


Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.


NativeScript Replaces JavaScriptCore with V8 for iOS Apps

MMS Founder
MMS Sergio De Simone

Article originally posted on InfoQ. Visit InfoQ

NativeScript’s new JavaScript runtime for iOS, based on Google’s V8 engine, is now in beta after several months of development. This change should bring reduced app startup time for iOS apps, as well as simplify the NativeScript development process.

The fundamental reason to replace JavaScriptCore, iOS’s native JavaScript engine, lies in simplifying maintenance work for the NativeScript team, writes NativeScript product manager Emil Tabakov. Indeed, the NativeScript team has heavily adapted JavaScriptCore, which Tabakov describes as a framework that is not embedding-friendly, to provide all the support they needed.

In spite of all the extra effort, using JavaScriptCore for iOS made it impossible to reach full feature parity with Android, mostly due to features that are only available when using V8. This is the case, for example, with V8 heap snapshots, which make it possible to initialize a JavaScript context from a pre-made image stored in the app package. According to the NativeScript team, V8 heap snapshots reduce the startup time of simple applications by over 30%.

What made using V8 on iOS possible was the introduction of a JIT-less mode for V8 in the first months of 2019. This switched off executable memory allocation at runtime, which is not permitted on iOS and other embedded platforms. Unfortunately, this comes with a performance penalty, since JIT-less V8 basically works as a JavaScript interpreter. According to the V8 team’s benchmarks, the performance hit is large with peak optimized code, i.e., code exercising specific language characteristics, but less significant for real-world applications.

As a last note, switching to V8, Tabakov says, paves the way to future support for Bitcode, which is a requirement for writing Apple Watch apps.

Being in beta, the NativeScript V8 runtime for iOS still requires some work to be production-ready, in particular armv7 support and fully functioning multithreading, so developers should use it at their own risk.

To use the new runtime, first install it by executing tns platform add ios@beta-v8, then launch your app as usual. Issues can be reported in the NativeScript V8 iOS runtime repository.


Mini book: The InfoQ eMag – Microservices: Testing, Observing, and Understanding

MMS Founder

Article originally posted on InfoQ. Visit InfoQ

Writing code is the easy part of software development. The real challenge comes when your system is running successfully in production.

How do you test new functionality to make sure it works as intended and doesn’t break something? When a problem arises, are you able to identify the root cause and fix it quickly? And as that system keeps growing and evolving, are you able to focus on one small area without having to understand everything?

These challenges exist whether you have a monolithic or microservices architecture. Organizations that build distributed systems have to adopt testing and observability practices that differ from those used when building a monolith.

This eMag takes a deep dive into the techniques and culture changes required to successfully test, observe, and understand microservices.

Free download


Change Data Capture Tool Debezium 1.0 Final Released

MMS Founder
MMS Jan Stenberg

Article originally posted on InfoQ. Visit InfoQ

Debezium is an open source change data capture (CDC) tool built on top of Kafka that captures and publishes changes made in a database as a stream of events. Debezium 1.0 Final was recently released with event format clean-up, increased test coverage of databases, including Postgres 12, SQL Server 2019 and MongoDB 4.2, as well as a number of bug fixes with 96 issues addressed in version 1.0 and the preview releases. In a blog post Gunnar Morling describes Debezium’s basic concepts and some common use cases, and details about both the current release and what to expect in future releases.
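To make the "stream of events" concrete, a Debezium-style change event carries the row state before and after the change, the operation type, and source metadata. The sketch below is a simplified model of that envelope (real events carry much richer source metadata and schema information):

```python
import time

def change_event(op, before, after, table):
    """Shape of a Debezium-style change event (simplified): prior row state,
    new row state, operation, and minimal source metadata."""
    return {
        "op": op,                      # "c"=create, "u"=update, "d"=delete
        "before": before,
        "after": after,
        "source": {"table": table},
        "ts_ms": int(time.time() * 1000),
    }

# An UPDATE on a customers row becomes one event on the stream:
event = change_event(
    op="u",
    before={"id": 42, "email": "old@example.com"},
    after={"id": 42, "email": "new@example.com"},
    table="customers",
)
```

Consumers downstream can replay these events to build read models, caches, or search indexes without ever querying the source database directly.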

For Morling, software engineer at Red Hat and project lead of Debezium, what’s most important in the new release is the event format clean-up. They have increased their commitment to keeping the emitted event structures and configuration options of the connectors correct and consistent, and to making sure these evolve in a backwards-compatible way, though he also points out that they have not broken things before.

In an interview with InfoQ, Morling notes some features added to the 0.10 release that he thinks are also relevant to mention for the current 1.0 release. For Postgres 10, they now support the pgoutput logical decoding plug-in and exported snapshots with no locks required. An incubating connector for Cassandra, extended and more unified metrics across the different connectors, and customizable message keys are other new features.

Morling emphasizes that the 1.0 Final release is the result of the work of the Debezium community at large. Led by Red Hat, about 150 people have contributed to the project, and he thinks this is great for an open source project. He also points out the importance of users sharing their experiences in conference talks and blog posts, because hearing about real usage of CDC and Debezium helps the team improve the product.

Upgrading from the earlier 0.10 release is mainly a drop-in replacement. For earlier releases, there are migration notes available describing upgrading procedures and deprecated options.

Work on Debezium 1.1 has started, with 1.1.0 Alpha1 just released. One new feature is the Quarkus outbox pattern extension, which will simplify the creation of outbox events within an application. Debezium supports the outbox pattern for integration of microservices, and the team wants to improve on this with the Quarkus extension, which complements the existing event router for events from an outbox table.
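The core of the outbox pattern is writing the business change and its outgoing event in the same database transaction, so the two can never diverge; a CDC tool then publishes rows from the outbox table. The sketch below illustrates just that transactional write using SQLite (table and column names are illustrative, not Debezium's schema):

```python
import json
import sqlite3

# The outbox pattern: the business row and its event row commit atomically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT)"
)

def place_order(order_id, total):
    with conn:  # one transaction: both inserts commit, or neither does
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps({"event": "OrderPlaced", "id": order_id})),
        )

place_order(1, 99.50)
```

Because the event lives in a regular table, the CDC pipeline picks it up from the transaction log with no dual-write problem, which is exactly the failure mode the pattern exists to avoid.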

Other features added or considered for the new release include incubating support for CloudEvents and IBM DB2, exposing topics with transaction events, and adding SPI for customizing schema and value representation of given columns. A standalone container for running Debezium without Apache Kafka and Connect is also on the roadmap.

Red Hat is working on a commercially supported offering around CDC. In preparation for a GA release, four connectors are currently available as a Technology Preview. These are part of the Red Hat Integration product, targeting deployments on OpenShift via the AMQ Streams Kafka operator.

In a podcast recently published on InfoQ, Wesley Reisz talks with Morling about the Debezium project, CDC, and some of the use cases. They also discuss the long-term strategic goals for the project.

At the microXchg 2019 conference in Berlin, Morling discussed how CDC can be used to create events from a database in a microservices architecture.


Keeping Credentials Safe, Google Introduces Cloud Secret Manager

MMS Founder
MMS Kent Weare

Article originally posted on InfoQ. Visit InfoQ

In a recent blog post, Google announced a new service, called Secret Manager, for managing credentials, API keys and certificates when using Google Cloud Platform. The service is currently in beta and the intent of this service is to reduce secret sprawl within an organization’s cloud deployment and ensure there is a single source of truth for managing credentials. Features of the service include global replication, versioning, audit logging, strong encryption and support for hybrid environments.

This service was created due to customer feedback that managing secrets regionally created a lot of friction. Seth Vargo, a Google Cloud developer advocate, explains:

Early customer feedback identified that regionalization is often a pain point in existing secrets management tools, even though credentials like API keys or certificates rarely differ across cloud regions. For this reason, secret names are global within their project.

Secret names have a global scope, but the data remains regional. This allows for transparency across the enterprise while keeping the data in a local region. However, administrators can choose which regions secrets can be replicated to. 

Secret Manager has been built with principles of least privileges and only project owners have access to secrets, unless explicit Cloud IAM permissions have been provided. All secret information is encrypted in transit with TLS and at rest, using AES-256 encryption.

Versioning is another capability included in Secret Manager. For example, an organization may need to support a gradual rollout of a new version of their app, or the ability to roll back. Versioning is automatically provided for operations including access, destroy, disable, and enable.

As organizations focus on being agile, there is a need for an application to always use the latest version of a secret. To support this requirement, there is an alias called ‘latest’ that will always return the latest version of the secret. Vargo explains why this is an important feature:

Production deployments should always be pinned to a specific secret version. Updating a secret should be treated in the same way as deploying a new version of the application. Rapid iteration environments like development and staging, on the other hand, can use Secret Manager’s latest alias, which always returns the most recent version of the secret.

Creating and accessing secrets can be accomplished through the Secret Manager API, Secret Manager Client Libraries and Cloud SDK. For example, the following command can be used to create a secret:

 $ gcloud beta secrets create "my-secret" --replication-policy "automatic" --data-file "/tmp/my-secret.txt"

Discovering where secrets are being used within your cloud deployment can often be difficult. Organizations can discover secrets in Cloud DLP using infoType detectors to identify where entities such as AUTH_TOKEN, BASIC_AUTH_HEADER, and ENCRYPTION_KEY are being used.

Some organizations have previously used Berglas, an open-source tool, for managing secrets in Google Cloud Platform. Customers can continue to manage secrets using this tool, or can use the ‘migrate’ command from the Berglas command line to perform a one-time migration of a secret to Secret Manager.

Secret Manager beta is available to all Google Cloud customers and Quickstart tutorials have been published.


Sonatype Disables Unencrypted Access to Maven

MMS Founder
MMS Erik Costlow

Article originally posted on InfoQ. Visit InfoQ

Sonatype has disabled the ability to download Java dependencies from Maven Central over unencrypted HTTP, helping secure the software supply chain against injection attacks. The change also incorporates verification of Maven’s certificate to safe-guard against man-in-the-middle (MITM) attacks.

Many software build systems in the Java ecosystem, including Maven, Gradle, and SBT, rely on Maven Central as a means for locating and downloading software dependencies. The switch to fully require HTTPS was announced by Sonatype product manager Terry Yanko and Gradle engineer Jonathan Leitschuh.

The work performed by Leitschuh builds upon work that was started in 2014, when Sonatype enabled SSL access to the Maven repository. At that time, Max Veytsman had created a write-up and tool explaining how to back-door JAR files coming from Maven as they went over a network. As a result of this attack, systems would unknowingly work with an artifact that had been altered after they requested it, since code artifacts could be modified in transit through a simple process that no longer functions as a result of Sonatype’s change.

  1. A custom proxy server would capture items coming from Maven Central.
  2. The proxy would analyze the incoming artifact as a ZipInputStream.
  3. As it detected the binary of a ZipEntry, the proxy would pass the input to a Java bytecode library, such as ASM.
  4. The custom ASM ClassVisitor would add or alter specific class’ methods, such as the static initializer method, <clinit>()V, to add custom bytecode that took action such as executing a system command.
  5. The proxy would re-zip all contents into a ZipOutputStream that was passed through to the client.
  6. The build system and/or any system that loaded or ran the back-doored classes would execute the payload that was put in place by the proxy server.
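The attack's re-zip step can be illustrated in miniature. The sketch below is a Python stand-in for the proxy's rewrite (steps 2 through 5): a JAR is just a zip, so the "proxy" unpacks it, alters one entry, and repacks the rest unchanged. Real attacks patched bytecode with ASM; here plain bytes stand in for a class file, and all names are invented for the example.

```python
import io
import zipfile

def build_jar(contents):
    """Build an in-memory zip ("JAR") from a name -> bytes mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in contents.items():
            zf.writestr(name, data)
    return buf.getvalue()

def tamper(jar_bytes, target, payload):
    """Simulate the proxy: rewrite one entry in transit, pass the rest as-is."""
    inp = zipfile.ZipFile(io.BytesIO(jar_bytes))
    out_buf = io.BytesIO()
    with zipfile.ZipFile(out_buf, "w") as out:
        for entry in inp.infolist():
            data = inp.read(entry.filename)
            if entry.filename == target:   # stand-in for the ClassVisitor step
                data = data + payload
            out.writestr(entry, data)      # re-zip everything into the output
    return out_buf.getvalue()

original = build_jar({"com/example/App.class": b"bytecode"})
modified = tamper(original, "com/example/App.class", b"+backdoor")
```

With downloads forced over verified TLS, a network-level intermediary can no longer interpose this rewrite, which is the point of Sonatype's change.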

The risk required a MITM attack and was not a vulnerability in Maven or Java itself. Exploitation examples were performed in lab environments, and there is no known attack on the Java/Maven ecosystem that leveraged this bytecode injection attack. Large-scale attacks on network routes require significant effort by organizations with significant skill, funding, and access. Within the same time period, Trend Micro’s annual 2014 vulnerability analysis cited positive Java security improvements: “Some good news – there were no Java zero-days in 2014!”

“Fortunately, application security transitioned from a single activity prior to release, to a holistic quality process applied throughout the development lifecycle. Improving supply chain security begins by each of us improving our own product security,” explains Milton Smith, principal at AppSec Alchemy. In 2014, Smith was a product manager in Oracle’s Java Platform Group.

For users that require unencrypted HTTP access, Sonatype has created a workaround that enables systems to still function. Older systems that cannot be updated to recent Maven versions can simply switch their download URLs from repo.maven.org or repo1.maven.org to insecure.repo1.maven.org. Sonatype has placed the word insecure into the hostname as a way of clearly communicating that making this change is an unambiguous security problem. Organizations with security controls can also choose to block or flag hosts that access this URL to balance between the build continuing to function and blocking potentially insecure downloads.


Gradle 6 Brings Significant Dependency Management Improvements

MMS Founder
MMS Uday Tatiraju

Article originally posted on InfoQ. Visit InfoQ

Gradle, the customizable open source build automation tool, has released version 6.0 with significant improvements to dependency management, out of the box support for javadoc and source jars, and faster incremental compilation of Java and Groovy code. In addition, the latest release 6.1.1 supports a relocatable dependency cache for speeding up ephemeral CI builds.

Gradle’s dependency management saw a number of improvements in version 6. The documentation has been restructured to help users find information on commonly used terminology and use cases related to dependency management.

Gradle Module Metadata, a format similar to Apache Maven’s POM file, is now published by default when using the Maven- or Ivy-based publish plugins. Based on this module metadata, Gradle can recommend and share versions between projects via platforms, sets of modules designed to be used together.

Gradle’s new component capabilities can be used to detect and resolve conflicts between mutually exclusive dependencies. A capability identifies a feature, such as logging, offered by one or more modules or libraries. Using capabilities, Gradle’s dependency management engine can detect incompatible capabilities in a dependency graph and let users choose between modules when several of them provide the same capability.

For example, say a module depends on the SLF4J API library and the Apache ZooKeeper library, and wants to use JDK logging as its SLF4J implementation. Since ZooKeeper itself depends on Log4J as the SLF4J implementation, the module might end up with two SLF4J implementations on its classpath. By declaring a component capability rule stating that the JDK logging and Log4J bindings provide the same capability, Gradle can preemptively detect the conflict.
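The SLF4J scenario above can be sketched in a Gradle build script using the component metadata rule API. The capability coordinates `org.slf4j:slf4j-impl` are an illustrative name chosen for this example, not an official capability published by SLF4J:

```groovy
// Sketch of a component metadata rule (Gradle 6 API): mark both SLF4J
// bindings as providers of the same (hypothetical) capability.
class Slf4jImplCapability implements ComponentMetadataRule {
    final static Set<String> IMPLS = ['slf4j-jdk14', 'slf4j-log4j12'] as Set

    void execute(ComponentMetadataContext context) {
        context.details.with {
            if (IMPLS.contains(id.name)) {
                allVariants {
                    withCapabilities {
                        // Two modules declaring the same capability on the
                        // classpath becomes a detectable conflict.
                        addCapability('org.slf4j', 'slf4j-impl', id.version)
                    }
                }
            }
        }
    }
}

dependencies {
    components.all(Slf4jImplCapability)
}

// Without a resolution rule, the build fails on the conflict; here we
// resolve it by picking the highest version among the candidates.
configurations.all {
    resolutionStrategy.capabilitiesResolution.withCapability('org.slf4j:slf4j-impl') {
        selectHighestVersion()
    }
}
```

The useful property is that the conflict surfaces at dependency-resolution time, rather than as confusing logging behavior at runtime.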

In addition, Gradle provides the concept of dependency constraints to pick the highest version of a transitive dependency that satisfies all declared constraints.
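A dependency constraint might look like the following sketch, where the module and version are placeholders: the main dependency declaration omits the version, and the constraint block supplies a minimum that Gradle combines with any transitive requirements:

```groovy
dependencies {
    // No version here; resolution is driven by the constraint below
    // plus whatever transitive dependencies require.
    implementation 'org.apache.httpcomponents:httpclient'

    constraints {
        implementation('org.apache.httpcomponents:httpclient:4.5.3') {
            because 'earlier versions have known issues we want to avoid'
        }
    }
}
```

If a transitive dependency asks for a higher version than the constraint, the higher version wins; the constraint only establishes a floor that all declared requirements must jointly satisfy.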

Gradle 6 supports automatic creation and publishing of javadoc jars and source code jars. It also publishes the information about these jars using Gradle Module Metadata. This feature can be activated for a Java or Java library project:

java {
    withJavadocJar()
    withSourcesJar()
}

Gradle 6 provides faster incremental compilation of Java and Groovy code by analyzing the impact of code changes and excluding classes that are an implementation detail of another class from recompilation. Gradle skips recompiling classes in different projects using the compilation avoidance feature. For large projects with multiple modules and deep dependency chains, this enhancement will reduce the number of recompilations and speed up incremental compilation.

Starting with version 6.1, Gradle’s dependency cache can be copied and made available to an ephemeral build agent in order for the agent to reuse the previously downloaded dependencies, and speed up the build process. An ephemeral build agent is an agent that is used just once and discarded at the end of a build. Since ephemeral agents have no state, each build will need to download dependencies from remote repositories. By copying an existing dependency cache to ephemeral build agents, builds will no longer pay the cost of downloading all dependencies.
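In a CI pipeline, seeding an ephemeral agent might look like the sketch below. The `caches/modules-2` path is where Gradle keeps its downloaded-dependency cache, but the archiving approach and file names here are illustrative, not an official Gradle mechanism:

```shell
# On a long-lived "seed" machine: run a build, then package the
# dependency cache (GRADLE_USER_HOME defaults to ~/.gradle).
./gradlew build
tar czf gradle-dep-cache.tgz -C "${GRADLE_USER_HOME:-$HOME/.gradle}" caches/modules-2

# On each ephemeral agent: restore the cache before building, so the
# agent reuses previously downloaded artifacts instead of re-fetching.
mkdir -p "${GRADLE_USER_HOME:-$HOME/.gradle}"
tar xzf gradle-dep-cache.tgz -C "${GRADLE_USER_HOME:-$HOME/.gradle}"
./gradlew build
```

Prior to 6.1 the cache contained absolute paths, so copying it between machines was unreliable; the relocatable cache is what makes this pattern safe.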

Some of the other noteworthy features in Gradle 6 are support for JDK 13, security improvements to protect build integrity, the ability to define compilation order between languages in polyglot JVM builds, and improvements for Gradle plugin authors and tooling providers.


Article: The Road to Artificial Intelligence: An Ethical Minefield

MMS Founder
MMS Lloyd Danzig

Article originally posted on InfoQ. Visit InfoQ

Increasingly rapid developments in the field of AI have offered society profound benefits but also produced complex ethical dilemmas. Many of the thorniest issues are often overlooked, even in the engineering community. There is also the meta-ethical question of who ought to make the decisions about encoding values into autonomous systems.

By Lloyd Danzig


Contrasting Sapiens International (NASDAQ:SPNS) and Mongodb (NASDAQ:MDB)

MMS Founder
