
MMS • RSS
Posted on mongodb google news.
MongoDB, Inc. (NASDAQ:MDB – Get Free Report) saw a significant increase in short interest in March. As of March 15th, there was short interest totalling 3,040,000 shares, an increase of 84.2% from the February 28th total of 1,650,000 shares. Currently, 4.2% of the company’s shares are sold short. Based on an average daily volume of 2,160,000 shares, the days-to-cover ratio is presently 1.4 days.
Analyst Upgrades and Downgrades
A number of equities research analysts recently commented on the company. Wedbush decreased their target price on MongoDB from $360.00 to $300.00 and set an “outperform” rating for the company in a report on Thursday, March 6th. UBS Group set a $350.00 price objective on shares of MongoDB in a report on Tuesday, March 4th. Robert W. Baird cut their price objective on shares of MongoDB from $390.00 to $300.00 and set an “outperform” rating on the stock in a research note on Thursday, March 6th. DA Davidson boosted their target price on shares of MongoDB from $340.00 to $405.00 and gave the company a “buy” rating in a research report on Tuesday, December 10th. Finally, Macquarie cut their price target on MongoDB from $300.00 to $215.00 and set a “neutral” rating on the stock in a research report on Friday, March 7th. Seven analysts have rated the stock with a hold rating and twenty-three have assigned a buy rating to the company. According to MarketBeat.com, the company has a consensus rating of “Moderate Buy” and an average target price of $320.70.
View Our Latest Analysis on MDB
MongoDB Trading Down 5.6%
MDB stock opened at $178.03 on Monday. The company has a market capitalization of $14.45 billion, a P/E ratio of -64.97 and a beta of 1.30. The business has a 50 day simple moving average of $245.56 and a 200-day simple moving average of $265.95. MongoDB has a twelve month low of $173.13 and a twelve month high of $387.19.
MongoDB (NASDAQ:MDB – Get Free Report) last issued its quarterly earnings data on Wednesday, March 5th. The company reported $0.19 earnings per share (EPS) for the quarter, missing the consensus estimate of $0.64 by ($0.45). The business had revenue of $548.40 million for the quarter, compared to analysts’ expectations of $519.65 million. MongoDB had a negative return on equity of 12.22% and a negative net margin of 10.46%. During the same period in the previous year, the business posted $0.86 earnings per share. Research analysts predict that MongoDB will post -1.78 EPS for the current fiscal year.
Insider Activity at MongoDB
In other news, insider Cedric Pech sold 287 shares of MongoDB stock in a transaction on Thursday, January 2nd. The shares were sold at an average price of $234.09, for a total transaction of $67,183.83. Following the completion of the sale, the insider now directly owns 24,390 shares in the company, valued at approximately $5,709,455.10. The trade was a 1.16% decrease in their ownership of the stock. The transaction was disclosed in a filing with the SEC, which is accessible through the SEC website. Also, CAO Thomas Bull sold 169 shares of the company’s stock in a transaction on Thursday, January 2nd. The shares were sold at an average price of $234.09, for a total value of $39,561.21. Following the completion of the transaction, the chief accounting officer now directly owns 14,899 shares in the company, valued at approximately $3,487,706.91. The trade was a 1.12% decrease in their ownership of the stock. The disclosure for this sale is also available through the SEC website. Insiders have sold a total of 43,139 shares of company stock valued at $11,328,869 over the last quarter. 3.60% of the stock is owned by corporate insiders.
Hedge Funds Weigh In On MongoDB
Hedge funds and other institutional investors have recently bought and sold shares of the company. B.O.S.S. Retirement Advisors LLC purchased a new stake in shares of MongoDB in the 4th quarter worth about $606,000. Geode Capital Management LLC lifted its holdings in shares of MongoDB by 2.9% in the third quarter. Geode Capital Management LLC now owns 1,230,036 shares of the company’s stock valued at $331,776,000 after purchasing an additional 34,814 shares in the last quarter. Union Bancaire Privee UBP SA acquired a new stake in shares of MongoDB in the fourth quarter valued at approximately $3,515,000. Nisa Investment Advisors LLC increased its stake in shares of MongoDB by 428.0% during the 4th quarter. Nisa Investment Advisors LLC now owns 5,755 shares of the company’s stock worth $1,340,000 after purchasing an additional 4,665 shares in the last quarter. Finally, HighTower Advisors LLC raised its position in shares of MongoDB by 2.0% during the 4th quarter. HighTower Advisors LLC now owns 18,773 shares of the company’s stock worth $4,371,000 after purchasing an additional 372 shares during the last quarter. 89.29% of the stock is owned by institutional investors and hedge funds.
MongoDB Company Profile
MongoDB, Inc., together with its subsidiaries, provides a general-purpose database platform worldwide. The company provides MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.
Featured Stories
Article originally posted on mongodb google news.
Java News Roundup: Jakarta EE 11 and Spring AI Updates, WildFly 36 Beta, Infinispan, JNoSQL

MMS • Michael Redlich
Article originally posted on InfoQ.

This week’s Java roundup for March 24th, 2025 features news highlighting: updates for Jakarta EE 11 and Spring AI; the first beta release of WildFly 36.0; the third alpha release of Hibernate Search 8.0; the March 2025 release of Open Liberty; and point releases for Quarkus, Infinispan, JHipster and OpenXava.
OpenJDK
JEP 503, Remove the 32-bit x86 Port, has been elevated from Proposed to Target to Targeted for JDK 25. This JEP proposes to “remove the source code and build support for the 32-bit x86 port.” This feature is a follow-up from JEP 501, Deprecate the 32-bit x86 Port for Removal, delivered in JDK 24.
JDK 25
Build 16 of the JDK 25 early-access builds was also made available this past week featuring updates from Build 15 that include fixes for various issues. More details on this release may be found in the release notes.
For JDK 25, developers are encouraged to report bugs via the Java Bug Database.
Jakarta EE
In his weekly Hashtag Jakarta EE blog, Ivar Grimstad, Jakarta EE Developer Advocate at the Eclipse Foundation, provided an update on Jakarta EE 11 and Jakarta EE 12, writing:
The Release Review for Jakarta EE 11 Web Profile has started! According to the process, it will conclude on April 7 at the latest. When I write this, seven out of ten members have voted “+1,” which means that super-majority is reached and Jakarta EE 11 Web Profile in practice has passed the release review.
So, what about the Jakarta EE 11 Platform? The status, as of Wednesday [March 26, 2025] is that we are down to ~50 test failures, most of which pass for Jakarta EE 11 Web Profile. This is an indication that there is some configuration or setting for the CI jobs that may be the problem.
Plan reviews for the component specifications targeting Jakarta EE 12 are ongoing. So far eight specifications have completed, or are in the process of completing their plan reviews. More are expected to follow as we get closer to April 15, the deadline communicated by the Jakarta EE Platform project. Check out the Jakarta EE 12 Plan Reviews Project Board for a complete overview.
The road to Jakarta EE 11 included four milestone releases, the release of the Core Profile in December 2024, and the potential for release candidates as necessary before the GA releases of the Web Profile in 1Q 2025 and the Platform in 2Q 2025.
Eclipse JNoSQL
The release of Eclipse JNoSQL 1.1.6, the compatible implementation of the Jakarta NoSQL and Jakarta Data specifications, provides bug fixes, performance improvements and new features such as: a new GraphTemplate interface that supports NoSQL graph databases; and enhancements to CDI Lite for improved performance and compatibility. More details on this release may be found in the release notes.
BellSoft
In conjunction with the release of JDK 24 and GraalVM for JDK 24, BellSoft has also released version 24.2.0 of Liberica Native Image Kit. Enhancements include: experimental support for the jcmd diagnostic tool on Linux and macOS that complements the existing Native Image monitoring capabilities such as the JDK Flight Recorder (JFR).
Spring Framework
The Spring AI team has posted important changes and updates for using version 1.0.0-SNAPSHOT. These include artifact IDs, dependency management and autoconfiguration. The most significant change is the naming pattern for Spring AI starter artifacts: for model starters, the spring-ai-{model}-spring-boot-starter artifact has been renamed to spring-ai-starter-model-{model}; for vector store starters, the spring-ai-{store}-store-spring-boot-starter artifact has been renamed to spring-ai-starter-vector-store-{store}; and for MCP starters, the spring-ai-mcp-{type}-spring-boot-starter artifact has been renamed to spring-ai-starter-mcp-{type}.
The Spring AI team offers two methods for developers to update their projects: update automatically using AI tools or update manually.
Quarkus
Versions 3.21.0 and 3.20.0 of Quarkus (announced here and here, respectively), the latter designated as a new LTS version, ship with bug fixes, dependency upgrades and new features such as: support for the MongoDB Client extension in their TLS Registry; and the ability for the Jakarta RESTful Web Services ClientRequestFilter interface to run on the same Vert.x context as other handlers to resolve a context propagation issue with blocking REST Clients. More details on these releases may be found in the release notes.
Open Liberty
IBM has released version 25.0.0.3 of Open Liberty with new features such as: the ability to configure a shared library using a new configuration element, path, that complements the existing file, folder and fileset configuration elements in an XML file; and compliance with FIPS 140-3, Security Requirements for Cryptographic Modules, for the IBM SDK, Java Technology Edition 8.
WildFly
The first beta release of WildFly 36.0.0 delivers bug fixes, dependency upgrades and enhancements such as: the jboss.as.jpa.classtransformer persistence unit property is now enabled by default for improved performance; and a warning is now logged should more than one metrics system be enabled. More details on this release may be found in the release notes.
Hibernate
The third alpha release of Hibernate Search 8.0.0 ships with: an alignment with Hibernate ORM 7.0.0.Beta5 that implements the Jakarta Persistence 3.2 specification; and a migration to the Hibernate Models ClassDetailsRegistry interface, based on the Jandex index, to replace the deprecated getJandexView() method defined in the BootstrapContext interface. Further details on this release may be found in the release notes.
Infinispan
The release of Infinispan 15.2.0.Final, codenamed Feelin’ Blue, ships with bug fixes, many dependency upgrades and new features such as: an implementation of the Redis JSON API; and a new look and feel to the console based on the recent upgrade to PatternFly 6. More details on this release may be found in the release notes.
Apache Software Foundation
Apache TomEE 10.0.1, the first maintenance release, provides dependency upgrades and resolutions to notable issues such as: Jakarta Expression Language expressions in Jakarta Faces not working with Eclipse Mojarra, the compatible implementation of the Jakarta Faces specification; and the addition of the missing service-jar.xml file in the Serverless Builder API and Embedded Scenarios due to the file being omitted from the BOMs when the TomEE webapp was removed. More details on this release may be found in the release notes.
JHipster
The release of JHipster Lite 1.30.0 ships with bug fixes, improvements in documentation and new features such as: the use of colors to identify modules by rank; and a new display to filter the rank options in the frontend. More details on this release may be found in the release notes.
OpenXava
The release of OpenXava 7.5 delivers bug fixes, dependency upgrades and new features such as: support for hot code reloading during development without affecting performance in production; and UI improvements that include rounded corners for various widgets and a flat design applied to most UI elements, thus removing shadows. More details on this release may be found in the release notes.

MMS • Robert Krzaczynski
Article originally posted on InfoQ.

Google DeepMind has announced the launch of TxGemma, an open collection of AI models designed to enhance the efficiency of drug discovery and clinical trial predictions. Built on the Gemma model family, TxGemma aims to streamline the drug development process and accelerate the discovery of new treatments.
The development of new therapeutics is a slow, costly process that often faces a high rate of failure—90% of drug candidates do not progress past phase 1 trials. TxGemma seeks to address this challenge by utilizing large language models (LLMs) to enhance the prediction of therapeutic properties across the entire research pipeline. From identifying promising drug targets to assessing clinical trial outcomes, TxGemma equips researchers with advanced tools to streamline and improve the efficiency of drug development.
Jeremy Prasetyo, co-founder & CEO of TRUSTBYTES, highlighted the significance of AI-driven explanations in drug research:
AI that explains its own predictions is a game-changer for drug discovery—faster insights mean faster breakthroughs in patient care.
TxGemma is the successor to Tx-LLM, a model introduced last October for therapeutic research. Due to overwhelming interest from the scientific community, DeepMind has refined and expanded its capabilities, developing TxGemma as an open-source alternative with enhanced performance and scalability.
Trained on 7 million examples, TxGemma comes in three sizes—2B, 9B, and 27B parameters—with specialized Predict versions tailored for critical therapeutic tasks. These include:
- Classification – Predicting whether a molecule can cross the blood-brain barrier.
- Regression – Estimating drug binding affinity.
- Generation – Inferring reactants from chemical reactions.
In benchmark tests, the 27B Predict model outperformed or matched specialized models on 64 of 66 key tasks. More details on the results are available in the published paper.
In addition to its predictive models, TxGemma-Chat offers an interactive AI experience, allowing researchers to pose complex questions, receive detailed explanations, and engage in multi-turn discussions. This capability helps clarify the reasoning behind predictions, such as explaining why a molecule may be toxic based on its structure.
To make TxGemma adaptable to specific research needs, Google DeepMind has released a fine-tuning example Colab notebook, allowing researchers to adjust the model for their own data.
Google DeepMind is also introducing Agentic-Tx, which integrates TxGemma into multi-step research workflows. By combining TxGemma with Gemini 2.0 Pro, Agentic-Tx utilizes 18 specialized tools to enhance research capabilities.
Agentic-Tx has been tested on benchmarks like Humanity’s Last Exam and ChemBench, showing its ability to assist with complex research tasks that require reasoning across multiple steps.
TxGemma is now available on Vertex AI Model Garden and Hugging Face, where researchers and developers can experiment with the models, use fine-tuning tools, and provide feedback.
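For a sense of what getting started might look like, here is a minimal sketch of loading one of the published checkpoints with the Hugging Face transformers library; the google/txgemma-2b-predict model ID and the prompt wording are illustrative assumptions rather than official usage (the Predict variants expect task-specific prompt templates).

```python
# Minimal sketch: querying a TxGemma checkpoint via Hugging Face transformers.
# Model ID and prompt format are illustrative assumptions, not official usage.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/txgemma-2b-predict"  # assumed checkpoint name on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Placeholder therapeutic-task prompt; the real Predict models are driven by
# task-specific templates (e.g. from Therapeutics Data Commons tasks).
prompt = (
    "Instructions: Answer the following question about drug properties.\n"
    "Question: Can the given molecule cross the blood-brain barrier?\n"
    "Drug SMILES: CC(=O)OC1=CC=CC=C1C(=O)O\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```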

MMS • RSS
Posted on mongodb google news.
MongoDB, Inc. (NASDAQ:MDB – Get Free Report) was the target of a large increase in short interest in the month of March. As of March 15th, there was short interest totalling 3,040,000 shares, an increase of 84.2% from the February 28th total of 1,650,000 shares. Approximately 4.2% of the company’s shares are sold short. Based on an average daily trading volume of 2,160,000 shares, the short-interest ratio is currently 1.4 days.
Insider Transactions at MongoDB
In other MongoDB news, CAO Thomas Bull sold 169 shares of the stock in a transaction dated Thursday, January 2nd. The shares were sold at an average price of $234.09, for a total transaction of $39,561.21. Following the completion of the transaction, the chief accounting officer now owns 14,899 shares of the company’s stock, valued at $3,487,706.91. This trade represents a 1.12% decrease in their ownership of the stock. The sale was disclosed in a filing with the SEC, which is available through the SEC website. Also, insider Cedric Pech sold 287 shares of the business’s stock in a transaction that occurred on Thursday, January 2nd. The shares were sold at an average price of $234.09, for a total transaction of $67,183.83. Following the completion of the sale, the insider now directly owns 24,390 shares in the company, valued at $5,709,455.10. This trade represents a 1.16% decrease in their ownership of the stock. The disclosure for this sale is also available through the SEC website. Insiders sold 43,139 shares of company stock worth $11,328,869 in the last 90 days. Insiders own 3.60% of the company’s stock.
Hedge Funds Weigh In On MongoDB
Large investors have recently modified their holdings of the business. Strategic Investment Solutions Inc. IL acquired a new position in MongoDB during the fourth quarter worth $29,000. Hilltop National Bank boosted its position in shares of MongoDB by 47.2% during the 4th quarter. Hilltop National Bank now owns 131 shares of the company’s stock worth $30,000 after purchasing an additional 42 shares in the last quarter. NCP Inc. acquired a new position in shares of MongoDB during the 4th quarter worth $35,000. Brooklyn Investment Group purchased a new position in shares of MongoDB in the 3rd quarter valued at about $36,000. Finally, Continuum Advisory LLC lifted its stake in shares of MongoDB by 621.1% in the third quarter. Continuum Advisory LLC now owns 137 shares of the company’s stock valued at $40,000 after purchasing an additional 118 shares during the period. 89.29% of the stock is owned by hedge funds and other institutional investors.
Wall Street Analysts Forecast Growth
A number of research analysts have commented on the company. Wedbush decreased their price target on MongoDB from $360.00 to $300.00 and set an “outperform” rating on the stock in a research note on Thursday, March 6th. Royal Bank of Canada decreased their target price on MongoDB from $400.00 to $320.00 and set an “outperform” rating on the stock in a research note on Thursday, March 6th. JMP Securities reiterated a “market outperform” rating and set a $380.00 price target on shares of MongoDB in a research note on Wednesday, December 11th. Bank of America lowered their price objective on shares of MongoDB from $420.00 to $286.00 and set a “buy” rating for the company in a research note on Thursday, March 6th. Finally, China Renaissance started coverage on shares of MongoDB in a research note on Tuesday, January 21st. They set a “buy” rating and a $351.00 target price on the stock. Seven research analysts have rated the stock with a hold rating and twenty-three have issued a buy rating to the stock. According to MarketBeat.com, the stock presently has an average rating of “Moderate Buy” and a consensus target price of $320.70.
Read Our Latest Analysis on MDB
MongoDB Stock Down 5.6%
NASDAQ:MDB opened at $178.03 on Friday. The company has a market capitalization of $13.26 billion, a PE ratio of -64.97 and a beta of 1.30. MongoDB has a 52-week low of $173.13 and a 52-week high of $387.19. The stock’s 50 day moving average price is $245.56 and its 200 day moving average price is $266.34.
MongoDB (NASDAQ:MDB – Get Free Report) last announced its earnings results on Wednesday, March 5th. The company reported $0.19 EPS for the quarter, missing analysts’ consensus estimates of $0.64 by ($0.45). The business had revenue of $548.40 million during the quarter, compared to analyst estimates of $519.65 million. MongoDB had a negative net margin of 10.46% and a negative return on equity of 12.22%. During the same period in the previous year, the firm earned $0.86 earnings per share. As a group, sell-side analysts anticipate that MongoDB will post -1.78 EPS for the current fiscal year.
About MongoDB
MongoDB, Inc., together with its subsidiaries, provides a general-purpose database platform worldwide. The company provides MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.
Recommended Stories
This instant news alert was generated by narrative science technology and financial data from MarketBeat in order to provide readers with the fastest and most accurate reporting. This story was reviewed by MarketBeat’s editorial team prior to publication. Please send any questions or comments about this story to contact@marketbeat.com.
Article originally posted on mongodb google news.

MMS • Edin Kapic
Article originally posted on InfoQ.

Microsoft’s latest ASP.NET Core 10 Preview 2 release from March 18th introduces targeted improvements to Blazor’s navigation behavior, OpenAPI documentation generation, and developer tooling, addressing community feedback. The update focuses on small enhancements rather than new features, refining existing capabilities ahead of .NET 10’s stable release later this year.
The most relevant change in Preview 2 is the revamping of the navigation system in Blazor to eliminate jarring user experience issues. When using NavigateTo for same-page navigations (e.g., query string updates), the browser will no longer forcibly scroll to the top. In previous versions, this was a behavior that developers had to work around manually.

The NavLink component also sees improvements, now ignoring query strings and fragments by default when matching using NavLinkMatch.All. This means a link will be active even if the query strings or fragments in the URL change. An AppContext switch reverts to the legacy behavior for teams needing backwards compatibility. For custom matching behavior, NavLink now exposes an overridable ShouldMatch method.
Blazor’s reconnection UI, visible when the client loses the WebSocket connection to the server, receives structural upgrades in the project template, with a new ReconnectModal component that separates CSS and JavaScript for stricter Content Security Policy (CSP) compliance. A custom components-reconnect-state-changed event provides finer-grained control over connection states, including a new “retrying” phase.
API developers gain built-in support for propagating XML source code comments into OpenAPI documents. The feature requires enabling documentation file generation in the project file (the GenerateDocumentationFile property) and moves comment processing to compile time via a source generator. However, minimal API endpoints must now use named methods rather than lambdas to leverage this feature, which can be seen as a trade-off for better discoverability.

The underlying OpenAPI.NET library is updated to version 2.0.0-preview7, introducing breaking changes for advanced users. Schema definitions now use interfaces, and the Nullable property for the schema is replaced with JsonSchemaType.Null checks.
Preview 2 also introduces minor quality-of-life improvements:
- The Blazor QuickGrid control adds a CloseColumnOptionsAsync method for programmatically dismissing column menus, simplifying filter interactions.
- Form binding now treats empty strings as null for nullable types, matching minimal API behavior elsewhere.
- New authentication metrics for Aspire dashboard track sign-in/sign-out events and authorization attempts, while request duration telemetry helps identify performance bottlenecks.
Preview 2 is available now via the .NET 10 Preview SDK. Developers should test navigation changes and OpenAPI integration, with particular attention to the breaking schema modifications. The team expects .NET 10 to be released around November 2025.
AWS Enhances EC2 Capacity Reservation Management with Split, Move, and Modify Features

MMS • Steef-Jan Wiggers
Article originally posted on InfoQ.
AWS has announced updates to Amazon EC2 On-Demand Capacity Reservations (ODCRs), introducing split, move, and modify functionalities to improve resource management and cost efficiency. These new features are designed to give users greater control over their reserved EC2 capacity, addressing the dynamic needs of modern cloud deployments.
The split feature enables users to divide existing ODCRs, creating new reservations from unused capacity. This allows for more granular resource allocation, particularly when demand fluctuates. Instead of maintaining a large, underutilized reservation, users can now create smaller, more targeted reservations.
The move capability allows users to transfer unused capacity slots between existing ODCRs. This optimizes resource utilization across different reservations, preventing wasted capacity and reducing costs. Users can reallocate resources to where they are needed most, improving overall efficiency.
The modify feature allows users to change reservation attributes without disrupting running workloads. Users can adjust the instance quantity, switch between targeted and open reservation types, and modify the reservation’s end date. This eliminates the need to create new reservations for minor adjustments and reduces operational overhead.
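As a rough sketch of what the modify workflow could look like programmatically, the boto3 snippet below adjusts an existing reservation's instance count and end date in place; the reservation ID is a placeholder, and availability of newer options (such as switching between targeted and open matching) may depend on the SDK version.

```python
# Sketch (placeholder IDs/values): modifying an existing EC2 On-Demand Capacity
# Reservation in place with boto3, instead of recreating it.
from datetime import datetime, timezone

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.modify_capacity_reservation(
    CapacityReservationId="cr-0123456789abcdef0",  # hypothetical reservation ID
    InstanceCount=4,                                # grow or shrink the quantity
    EndDateType="limited",
    EndDate=datetime(2025, 12, 31, tzinfo=timezone.utc),
)
print("Modified:", response["Return"])
```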
“These new capabilities provide you with powerful tools for improved capacity management and resource usage, leading to more efficient operations and cost savings,” states the AWS blog post, highlighting the benefits of these updates. The enhancements aim to improve capacity management, offering greater flexibility and control. By optimizing resource utilization and minimizing disruptions, users can achieve cost savings and improve the overall efficiency of their EC2 deployments.
The company writes in the blog post:
The ability to dynamically adjust and share Capacity Reservations provides the flexibility you need while maintaining the stability necessary for your critical workloads.
While cloud providers like Microsoft Azure and Google Cloud Platform (GCP) offer similar capacity reservation mechanisms, the specific features differ. Azure’s Reserved Virtual Machine Instances (Reserved VMs) and GCP’s Committed Use Discounts (CUDs) provide cost savings for committed compute usage. However, the newly introduced AWS “split, move, and modify” features offer more granular control over reservations than Azure and GCP’s standard offerings.

MMS • Meryem Arik
Article originally posted on InfoQ.

Transcript
Arik: I’ve called this, navigating LLM deployment: tips, tricks, and techniques 2.0. I could also rename it, how to deploy LLMs if you don’t work at Meta, OpenAI, Google, Mistral, or Anthropic. I’m specifically interested in how you deploy LLMs if you’re not serving it as a business, and instead you’re serving it so you can build applications on top of it, and you end up deploying it in fairly different ways if you work at a normal company versus one of these guys.
Hopefully you’re going to get three things out of this session. Firstly, you’re going to learn when self-hosting is right for you, because you’re going to find out it can be a bit of a pain, and it’s something you should only do if you really need to. Understanding the differences between your deployments and the deployments of AI Labs. Then, also, I’m going to give some best practices, tips, tricks, techniques, a non-exhaustive list for how to deploy AI in corporate and enterprise environments. Essentially, we build infrastructure for serving LLMs.
Evaluating When Self-Hosting is Right for You
Firstly, when should you self-host? To explain what that is, I’ll just clarify what I mean by self-hosting. I distinguish self-hosting compared to interacting with LLMs through an API provider. This is how you interact with LLMs through an API provider. They do all of the serving and hosting for you. It’s deployed on their GPUs, not your GPUs. They manage all of their infrastructure, and what they expose is just an API that you can interact with. That’s what an API hosted model is. All of those companies I mentioned at the beginning host these models for you. Versus being self-hosted. When you self-host, you’re in control of the GPUs. You take a model from Hugging Face or wherever you’re taking that model from, and you deploy it and you serve it to your end users. This is the broad difference. It’s essentially a matter of who owns the GPUs and who’s responsible for that serving infrastructure. Why would you ever want to self-host? Because when people manage things for you, your life is a little bit easier.
There are three main reasons why you’d want to self-host. The first one is you have decreased costs. You have decreased costs when you’re starting to scale. At the very early stages of POCing and trying things out, you don’t have decreased costs. It’s actually much cheaper to use an API provider where you pay per token, and the per token price is very low. Once you get to any kind of scale where you can fully utilize a GPU or close to it, it actually becomes much more cost efficient. The second reason why you’d want to self-host is improved performance. This might sound counterintuitive, because on all the leading benchmarks, the GPT models and the Claude models are best-in-class to those benchmarks.
However, if you know your domain and you know your particular use case, you can get much better performance when self-hosting. I’m going to talk about this a bit more later. This is especially true for embedding models, search models, reranker models. The state of the art for most of them is actually in open source, not in LLMs. If you want the best of the best breed models, you’ll have a combination of self-hosting for some models and using API providers for others. You can get much better performance by self-hosting. Some privacy and security. I’m from Europe, and we really care about this. We also work with regulated industries here in the U.S., where you have various reasons why you might want to deploy it within your own environment. Maybe you have a multi-cloud environment, or maybe you’re still on-prem. This sits with the data that a16z collected, that there’s broadly three reasons why people self-host: control, customizability, and cost. It is something that a lot of people are thinking about.
How do I know if I fall into one of those buckets? Broadly, if I care about decreased cost, that’s relevant to me if I’m deploying at scale, or it’s relevant to me if I’m able to use a smaller specialized model for my task run than a very big general model like GPT-4. If I care about performance, I will get improved performance if I’m running embedding or reranking workloads, or if I’m operating in a specialized domain that might benefit from fine-tuning. Or I have very clearly defined task requirements, I can often do better if I’m self-hosting, rather than using these very generic models.
Finally, on the privacy and security things, you might have legal restrictions, you’ll obviously then have to self-host. Potentially, you might have region-specific deployment requirements. We work with a couple clients who, because of the various AWS regions and Azure regions, they have to self-host to make sure they’re maintaining that sovereignty in their deployments. Then, finally, if you have multi-cloud or hybrid infrastructure, that’s normally a good sign that you need to self-host. A lot of people fall into those buckets, which is why the vast majority of enterprises are looking into building up some kind of self-hosting infrastructure, not necessarily for all of their use cases, but it’s good to have as a sovereignty play.
I’m going to make a quick detour. Quick public service announcement that I mentioned embedding models, and I mentioned that the state of the art for embedding models is actually in the open-source realm, or they’re very good. There’s another reason why you should almost always self-host your embedding models. The reason why is because you use your embedding models to create your vector database, and you’ve indexed vast amounts of data. If that embedding model that you’re using through an API provider ever goes down or ever is depreciated, you have to reindex your whole vector database. That is a massive pain. You shouldn’t do that. They’re very cheap to host as well. Always self-host your embedding models.
When should I self-host versus when should I not? I’ve been a little bit cheeky here. Good reasons to self-host. You’re building for scale. You’re deploying in your own environment. You’re using embedding models or reranker models. You have domain-specific use cases. Or my favorite one, you have trust issues. Someone hurt you in the past, and you want to be able to control your own infrastructure. That’s also a valid reason. Bad reasons to self-host is because you thought it was going to be easy. It’s not easier, necessarily, than using API providers. Or someone told you it was cool. It is cool, but that’s not a good reason. That is why. That’s how you should evaluate whether self-hosting is right for you. If you fall into one of these on the left, you should self-host. If not, you shouldn’t.
Understanding the Difference between your Deployments and Deployments at AI Labs
Understanding the difference between your deployments and the deployments in AI Labs. If I’m, for example, an OpenAI, and I’m serving these language models, I’m not just serving one use case. I’m serving literally millions of different use cases. That means I end up building my serving stack very differently. You, let’s say I’m hosting with an enterprise or corporate environment. Maybe I’m serving 20 use cases, like in a more mature enterprise. Maybe I’m serving now just a couple. Because I have that difference, I’m able to make different design decisions when it comes to my infrastructure. Here are a couple reasons why your self-hosting regime will be very different to the OpenAI self-hosting regime. First one is, they have lots of H100s and lots of cash.
The majority of you guys don’t have lots of H100s, and are probably renting them via AWS. They’re more likely to be compute bound because they’re using the GPUs like H100s, rather than things like A10s. They have very little information about your end workload, so they are just going on a, we’re just trying to stream tokens out for arbitrary workloads. You have a lot more information about your workload. They’re optimizing for one or two models, which means they can do things that just don’t scale for regular people, self-hosting. If I’m deploying just GPT-4, then I’m able to make very specific optimizations that only work for that model, which wouldn’t work anywhere else.
If you're self-hosting, you're likely using cheaper, smaller GPUs. You're probably also using a range of GPUs, so not just one type, maybe you're using a couple different types. You want it to be memory bound, not compute bound. You have lots of information about your workload. This is something that is very exciting, that for most enterprises the workloads actually look similar, which is normally some kind of long-form RAG or maybe extraction task, and you can make decisions based on that. You'll have to deal with dozens of model types, so you don't have the luxury of the AI Labs, which only need to optimize for one or two. Here are some differences between the serving that you'll do, and the serving that your AI Labs will have to do.
Learning Best Practices for Self-Hosted AI Deployments in Corporate and Enterprise Environments
Best practices, there are literally an infinite number of best practices that I could give, but I’ve tried to boil them down to I think it’s now six non-exhaustive tips for self-hosting LLMs. This isn’t a complete guide to self-hosting, but hopefully there are some useful things that we’ve learned over the last couple years that might be useful to you. The first one is, know your deployment boundaries and work backwards. Quantized models are your friend. Getting batching right really matters. Optimize for your workload. That goes back to what I said earlier about, you can do optimizations that they just can’t. Be sensible about the models you use. Then, finally, consolidate infrastructure within your org. These are the top tips that I’m going to talk through now for the rest of the session.
1. Deployment Boundaries
Let's start with deployment boundaries. Assuming you don't have unlimited compute, and even AI Labs don't have unlimited compute. It's a really scarce resource at the moment. You need to be very aware of what your deployment requirements are, and then work backwards from what you know about them. If you know you only have certain available hardware, maybe you're, for example, just CPU bound, and you're deploying completely on-prem, and you don't have GPUs, then you probably shouldn't be looking into deploying a 400 or 500 billion parameter Llama. Knowing that boundary is important from the get go. You should have an idea of your target latency as well. You should have an idea of your expected load.
If you have all of these together, and you can construct some sentence of, I would like to deploy on an instance cheaper than x, which will serve y concurrent users in an average latency of less than z. If you can form that sentence, it makes everything else that you have to do much easier, and you won’t have that bottleneck of being like, “I built this great application. I just have no idea how to deploy it”. I can use that information to then work backwards and figure out what kind of models I should be looking at and how much effort I should be putting into things like prompting and really refining my search techniques, rather than upgrading to bigger models.
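As a minimal illustration of working backwards from that sentence (my own sketch, with deliberately rough numbers and assumed model shapes), a few lines of arithmetic can rule candidate models in or out before any serious engineering starts:

```python
# Back-of-envelope sketch (illustrative, not from the talk): turn a
# deployment-boundary sentence into a rough GPU memory check. All shapes and
# the 24 GB budget are assumptions.

def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory needed just to hold the model weights."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

def kv_cache_gb(concurrent_users: int, context_tokens: int,
                layers: int = 32, kv_heads: int = 8, head_dim: int = 128,
                bytes_per_value: int = 2) -> float:
    """Rough KV-cache footprint for a batch of in-flight requests (fixed, illustrative shape)."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # keys + values
    return concurrent_users * context_tokens * per_token / 1e9

budget_gb = 24              # "an instance cheaper than x" -> a single 24 GB GPU
users, context = 32, 4096   # "y concurrent users" with ~4k-token prompts

for params, bits in [(70, 16), (70, 4), (8, 16), (8, 4)]:
    total = weight_memory_gb(params, bits) + kv_cache_gb(users, context)
    verdict = "fits" if total <= budget_gb else "does not fit"
    print(f"{params}B model at {bits}-bit: ~{total:.0f} GB -> {verdict} in {budget_gb} GB")
```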
2. Quantize
Which leads me on to my second tip, which is, you should pretty much always be using quantized models, for a couple different reasons. The first reason I’m going to say is that you should always use quantized models, more or less, is because the accuracy is pretty much always better than a model of that same memory requirement that you’ve quantized it down to, so you retain a lot of that accuracy. The second reason that you should pretty much always quantize is actually the accuracy loss isn’t that different from the original model. I’ll reference two pieces of research that show this. There was this paper that came out in 2023, by Tim Dettmers, amongst others, who’s a bit of a legend in the field, and it’s called, “The case for 4-bit precision”. What he showed in this paper, actually, the highlight figure, is that for a fixed model bit size, the accuracy of the model is far higher if you’re using a quantized model.
Given that we know when we have a model with more parameters, as I scale up the parameters, the accuracy of the model goes up. What’s interesting is, if I take one of those large models and quantize it down to a natively smaller size, it retains a lot of that accuracy that we had in the beginning, which is very good. I’m pretty much always going to have better performance using a quantized 70 billion parameter model than I will of a model of a native size. This goes on to some great research that Neural Magic did on this. They showed that, firstly, if I quantize models, so here they have the accuracy recovery. This is when they take original bit precision models and then quantize them. It pretty much retains all of the accuracy from the original model, which is great. You get like 99% accuracy maintenance, which is awesome. You can see that even though there are slight dips in the performance, so, for example, if I look at this 405 here, with one of the quantized variants, it’s a slight dip in the original, like a couple of basis points.
It’s still far higher than the 70 billion parameter or the 8 billion parameter model. It retains much of that accuracy that you were looking for. If you know your deployment boundaries, and you don’t have unlimited amounts of compute to call from, using that quantized model, using the best model that will fit within that is a great piece of advice. If I know that this is the piece of infrastructure I’m working with, this is the GPU I’m working with, I can then say, ok, which is the biggest model that when quantized down to 4-bit, is going to perform the best?
This example is at least a couple months old now. Mixtral is not really a thing anymore, but you get the idea that I would rather use the Mixtral 4-bit, which is 22-gig, than the Llama 13B, because it's retained a lot of that performance. I put at the bottom places that you can find quantized models. We maintain an open-source bank of quantized models that you can check out. TheBloke also used to be a great source of this as well, although he's not doing it as much recently.
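As a minimal sketch of putting this into practice, assuming the Hugging Face transformers and bitsandbytes libraries and a placeholder model name, loading a model in 4-bit looks roughly like this:

```python
# Sketch: load a causal LM in 4-bit with transformers + bitsandbytes.
# The model name is a placeholder; any Hugging Face causal LM would do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4, as used in the QLoRA work
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 to retain accuracy
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

inputs = tokenizer("Explain why 4-bit quantization helps deployment:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```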
3. Batching Strategies
The third thing I’m going to talk about is batching strategies. This is something that’s very important to get right. The reason it’s very important to get right is because it’s a very easy way to waste GPU resources. When you’re first deploying, you’re probably not using any batching, which means you end up with GPU utilization coming like this. It’s not great. We then see people going straight to dynamic batching. What this means is I have a set batch size, so let’s say my batch size is 16. I will wait until 16 requests have come in to process, or I’ll wait a fixed amount of time, and you end up with this spiky GPU utilization, which is still not great either. What’s far better if you’re deploying generative models, a very big piece of advice is to use something like continuous batching, where you can get consistently high GPU utilization.
It’s a state-of-the-art technique designed for batching of generative models, and it allows requests to interrupt long in-flight requests. This batching happens at the token level rather than happening at the request level. I could, for example, have my model generate the 65th token of one response, and then the fifth token of another response, and I end up with a utilization that is far more even. This is just one of the examples of inference optimizations that you can do that make a really big difference. Here, I think I’ve gone from a 10% utilization to near enough 80. Really significant improvements. If you’re in a regime where you don’t have much compute resource, this is a very valuable thing to do.
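To make this concrete, serving engines such as vLLM implement continuous batching out of the box; the sketch below, with a placeholder model name, simply submits a pile of requests and lets the engine schedule token-level generation across them:

```python
# Sketch (assumes the open-source vLLM library): the engine performs continuous
# batching automatically, interleaving generation at the token level across
# whatever requests are in flight.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model name
params = SamplingParams(temperature=0.2, max_tokens=128)

prompts = [f"Summarise document {i} in one sentence." for i in range(64)]

# All 64 requests are scheduled continuously on the GPU rather than waiting
# for a fixed-size batch to fill up.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text[:80])
```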
4. Workload Optimizations
I’m going to talk about workload optimizations. It’s something that we’ve been researching a lot actively, and we think is really promising. You know something that the AI Labs don’t know, which is you know what your workload looks like. What that means is you can make a lot of decisions based on what your workload looks like, that it would just never, ever make sense for them to make. I’m going to give you a couple examples of techniques that you can use that make sense if you know what your workload looks like, that don’t make sense if you’re serving big multi-tenant environments and multiple user groups. One of them is prefix caching. This is probably one of the things that I’m most excited by that happened this year.
One of the most common use cases that we see from our clients is situations where they have really long prompts, and often these prompts are shared between requests. Maybe what I have is I have a very long context RAG on a set document that maybe I just pass through the whole document and I’m asking questions of it. Or maybe I have a situation where I have a very long set of user instructions to the LLM. These are instances where I have very long prompts. Traditionally, what I would have to do is I’d have to reprocess that prompt every single request, which is super inefficient. What we can do, if you know what your workload looks like, and you know that you’re in a regime where you have a lot of shared prompts, or very long shared prompts, you can use something called prefix caching, which is essentially when you pre-compute the KV cache of text and you reuse that when the context is reused, when you do your next generation.
My LLM doesn’t need to reprocess that long prompt every single time, they can process just the difference between them. If I have a very long-shared prompt and then slightly different things each time, it can just process that difference and then return. On the right, I have some results that are fresh off the press. What we have here is our server with the prefix caching turned on and turned off. In the green lines, we have it turned off. The light green line is with two GPUs, and the dark green line is with one GPU. With the blue line, we have it with the prefix caching turned on. What you can see is we have very significant throughput improvement. It’s throughput on the Y and then batch size on the X. About 7x higher throughput, which is very significant. It means that you can process many more requests, or you can process those requests much cheaper.
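A minimal sketch of this, assuming vLLM's automatic prefix caching and a placeholder document, looks roughly like the following; the shared prefix is processed once and its KV cache reused across requests:

```python
# Sketch (assumes vLLM): enable automatic prefix caching so the KV cache for a
# long shared prefix is computed once and reused by later requests.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    enable_prefix_caching=True,
)

# Stand-in for a long document that every request shares verbatim.
long_shared_context = "FULL TEXT OF A LONG POLICY DOCUMENT ... " * 500

questions = [
    "What is the termination clause?",
    "Who are the counterparties?",
    "Which law governs the agreement?",
]
prompts = [f"{long_shared_context}\n\nQuestion: {q}\nAnswer:" for q in questions]

# Only each short question suffix is processed per request; the shared prefix
# hits the cache.
for out in llm.generate(prompts, SamplingParams(max_tokens=64)):
    print(out.outputs[0].text.strip())
```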
This has like no impact, if you didn’t know that you had a long-shared prompt. It doesn’t really have an impact if I’m serving multiple dozens of users at any one time. This only really works if I know what my use case looks like. Another example of caching that we are really excited by. We call this internally, SSD, which is a form of speculative decoding, but you can also think of it like caching, which only makes sense if you know what your workload looks like. This is a use case where, if your model might predict similar tokens between tasks. Let’s say I’m doing a task where I have phrases that are very frequently repeated, what I can do is I can essentially cache those very frequently repeated phrases.
Instead of computing the whole phrase every single time, token by token, I can just pull a cache hit and inference on that. We did benchmarking on this, and for a workload of the JSON extractive workload, we got about a 2.5 latency decrease on this, which is pretty significant. Again, would have literally no impact if I’m using the OpenAI API, because there’s no way they could cache your responses, and 75 million other people’s responses. It only really makes sense if you’re deploying in your own environment.
5. Model Selection
I have two more that I’m going to go through. This one is model selection. I think everyone knows this by now, but the only real reason I put it up is because last time people really liked the image. This is something that I still see people getting wrong, which is, they get their large language models, which are the most difficult things to deploy and the most expensive things to deploy, to do pretty much all of the workload. That’s not a good idea. The reason it’s not a good idea is because you then have more of this difficult, expensive thing to manage. There are a lot of parts to an enterprise RAG pipeline. In fact, the most important part of your enterprise RAG pipeline is actually not the LLM at all. The most important part is your embedding and retrieval part, by far. They don’t need GPT-4 level reasoning, obviously.
If you have really good search and retrieval pipelines, you can actually use much smaller models to get the same results. We’ve seen clients move from GPT-4 type setups to move to things as small as the Llama 2 billion models, if they get their retrieval part right, which means you can run the whole application very cheaply. I would advise people to check out the Gemma and the small Llama models for this. We think they’re very good. Very good in extractive workloads as well, especially if you’re using a JSON structured format like Outlines or LMFE. Pick your application design carefully and only use big models when you really need to use big models.
6. Infrastructure Consolidation
Finally, I’m going to talk about infrastructure consolidation. In traditional ML, most teams actually work in silos still, unfortunately, which is, they deploy their own model and they’re responsible for deploying their own model. In GenAI, it does not make sense to do that, and that’s why these API providers are making an absolute killing, because it does make sense to deploy once, manage centrally, and then provide it as a shared resource within the organization. If you do this, it enables your end developers to focus on building this application level rather than building infrastructure. When I speak to my clients, sometimes we talk about building an internal version of the OpenAI API. That’s the experience you want to be able to provide to your developers, because otherwise they’ll have to learn what prefix caching is every single time they want to deploy a model, which is just not an efficient use of time.
I think this kind of centralization is a really exciting opportunity for MLOps teams to take that ownership within the org. This still means you can understand the kind of workloads you work with and optimize for, for example, long context RAG situations. That's still something that you can do. You can actually end up making a cheaper and better version of something like the OpenAI API. Serving is hard. Don't do it twice or even three times. We've seen clients who do it over and over again. You'll have a team deploying with Ollama and another team deploying with vLLM, each trying to do it themselves. Much better to have a central team manage that. GPUs are very expensive, don't waste them.
I can talk about what this would actually look like in practice. This central consolidation of infrastructure, you can think of it as having like an internal Bedrock, or an internal OpenAI API, or something like that. Individual ML teams can work from these APIs and then build applications on top of them using techniques like LoRA, so that’s a form of fine-tuning, or they use things like RAG, where they’re plugging into it in the same way they use the OpenAI API. We have a quick case study, so a client of ours did this. They started off with a bunch of different models. Each app was essentially attached to its own model and attached to its own GPU setup. This is obviously not ideal, because I’m deploying, for example, at the time, it was a Mixtral. We were deploying that multiple times, over and again and wasting GPU resources.
Instead, what we did is we just pooled it into one and deployed that Mixtral on a centrally hosted infrastructure, which meant we could use much less GPU resource than we were using otherwise. Something that I like the pattern of, if you're doing this central deployment and central hosting, is give your team access to a couple different models of different sizes, give them optionality. Don't say something like, you can only use this particular model, because they want to be able to try things. Deploying a large, medium, and small model that you guys have checked out and that you're happy with is a really good place to start, along with a bunch of those auxiliary models, so table parsers, embedding models, reranker models. If you do that, you give your team optionality, and you still maintain the benefit of being able to consolidate that infrastructure and make those optimizations.
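To make the "internal OpenAI API" pattern concrete, a sketch like the one below, with a hypothetical internal host and model name, shows an application team calling a centrally hosted, OpenAI-compatible endpoint (for example, the one a vLLM server exposes) through the standard openai client:

```python
# Sketch: application teams consume the shared internal deployment exactly as
# they would the OpenAI API, just with a different base URL.
from openai import OpenAI

client = OpenAI(
    base_url="http://llm-gateway.internal:8000/v1",  # hypothetical internal host
    api_key="not-needed-internally",
)

resp = client.chat.completions.create(
    model="mixtral-8x7b-instruct",  # whatever the central team has deployed
    messages=[{"role": "user", "content": "Classify this ticket: 'VPN is down again.'"}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```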
There’s my six non-exhaustive tips for self-hosting LLMs. Know your deployment boundaries and work backwards. If you do know this, everything else becomes much easier. Quantized models are your friends. They might seem scary because they affect the model, but they’re actually, on the whole, much better. Getting batching right really matters for your GPU utilization. Optimize for your particular workload. You have a superpower that the API providers do not, which is, you know what your workload will look like. Be sensible about the models you use. Then, finally, consolidate infrastructure within your organization.
Summary
We’ve gone through, why you should self-host. There’s a bunch of infrastructural reasons why self-hosting is a great idea, for deploying at scale. If you want to use embedding models, reranker models, if you, for example, are using domain-specific models, all very good reasons to self-host. Your needs in deploying will be different to the needs of AI Labs and mass AI API providers, because you’re using different types of hardware and your workloads are different. I’ve also given six tips, best practices that we’ve learned over the last couple years for deploying LLMs in this kind of environment.
Questions and Answers
Luu: For those tips that you gave, if I have a bursty workload, are any of those tips more relevant than others?
Arik: If you have a bursty workload, it’s rough. Because there’s always going to be some cold start problem when you’re deploying these. For bursty workloads, they’re slightly less well suited for self-hosting, unless you’re deploying smaller models. If you’re deploying smaller models, the scaling up and scaling down is much easier, but if you’re deploying bigger models, consider whether you do need to actually self-host, or whether you can use other providers. We, within our infrastructure, have obviously built Kubernetes resources that can do that scaling up and scaling down.
If you have bursty workloads with very low latency requirements, that is just challenging. What you could do, and what we have seen people do is, if they have a bursty workload, migrating to smaller and cheaper models during the periods of high burst and then going back to the higher performance models when things are a bit more chill. You could imagine a regime where you had a 70 billion parameter model servicing most of the requests, and then let’s say you get to periods of very high load, while you’re waiting to scale up, you move to something like an 8 billion parameter model or something slightly smaller. You take a small accuracy hit, and then go from there. That’s something that we’ve seen work as well.
Participant 1: How would you architect a central compute infra, if each downstream application requires different fine-tuning, where you can’t really have an internal OpenAI API like serving infrastructure?
Arik: That’s very difficult if you want to do something like full fine-tuning, which we’ve been used to over the last couple years, which essentially you fine-tune the whole model. Because what that means is, every single time you fine-tune the model, you have to deploy a separate instance of it. Fortunately, over the last two years, there’s been advancements in PEFT methods, which is essentially a way that you can fine-tune, but you don’t have to fine-tune the whole model. You can just fine-tune a small subsection of it. Let’s say I have a situation where I’m centrally hosting and I have 10 different teams, and every single team has done a fine-tuned version of a Llama model, for example. What I used to have to do is go in and deploy each of those on their own GPUs or shared GPUs. What I can do now instead is just deploy a centrally hosted Llama model, and then deploy on the same GPU still, in my same server, all of these different LoRA fine-tuned versions. These small adapters that we add to all of the models.
At inference time, we can hot swap them in and out. It's like you're calling separate models, but actually it's one base model and these different adapters. That's really exciting, because it means we can deploy hundreds if not thousands of fine-tuned, domain-specific models with the exact same infrastructure and resources that we need to deploy one. That's the way I would go about it: PEFT methods. We prefer LoRA as a method, but there are others available on the market.
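As a small illustration of that multi-adapter pattern, here is a sketch using Hugging Face PEFT to attach two LoRA adapters to a single base model and switch between them per request; the adapter repository names are made up for illustration, and dedicated serving engines (vLLM and similar) do this swapping far more efficiently at scale.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.1-8B-Instruct"  # one shared base model held on the GPU

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")

# Attach one team's LoRA adapter, then load a second one alongside it.
# The adapter repo names below are hypothetical.
model = PeftModel.from_pretrained(base_model, "acme/support-bot-lora", adapter_name="support")
model.load_adapter("acme/legal-summaries-lora", adapter_name="legal")


def generate(prompt: str, team: str) -> str:
    model.set_adapter(team)  # "hot swap": activate the adapter for this team's request
    inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)


print(generate("Summarize this support ticket: ...", team="support"))
```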
Participant 2: You're speaking a lot about the models themselves, which is awesome, but in your diagrams, when you're referring to GPUs and all the work that you did, should we assume most of this is based on H100 or B100 Blackwell from NVIDIA, or did you actually go across Gaudi, Instinct from AMD, and so on? Just curious, because we're looking to build something. Support for all models is not ubiquitous right now. Where did you stop and start with how far your suggestions go here, on some of that hardware level?
Arik: We very rarely see clients deploying on H100s or B100s. The kinds of clients that we work with are enterprises and corporates, which might have access to the previous generations of GPUs or maybe slightly cheaper GPUs. We tend to do most of our benchmarking on those kinds of GPUs. We support NVIDIA and AMD. We're rolling out support for Intel as well. We have seen good results on AMD. We think they're a really interesting option. Then for Intel, we're waiting for the software support to catch up there. We also know that Inferentia is doing interesting things, and the TPUs are doing interesting things as well. There's definitely more out there than the B100s and H100s that you see.
For most enterprises, you really don't need them, and you'll probably be fine with previous generations of NVIDIA hardware. My recommendation would be to stick with NVIDIA, mainly because the software stack is a bit more evolved. I would build your infrastructure and applications such that you can move to AMD or to other providers in a couple of years, because that gap is going to close pretty quickly, so you don't want to tie yourself to an NVIDIA-native everything.
Participant 2: We're using OpenAI off of Azure, so we're using NVIDIA, but I'm looking to build something out locally, because we'll balance loads between cloud and on-prem for development, or even just spill over extra load. Keeping an open mind to everything.
Arik: One of our selling points is that we are hardware agnostic, so we can deploy on not just NVIDIA, but others as well. The conversation always goes something like, can I deploy on AMD? We’re like, yes. Do you use AMD? No, but I might want to. It’s a good thing to keep your options open there.
Participant 3: Could you elaborate on some of the batching techniques?
Arik: Let's assume we're not going to do no batching, and we're going to do some kind of batching. If I'm doing dynamic batching, what's happening is batching at the request level. Let's say my batch size is 16. I'm going to wait for 16 requests to come in, and then I'm going to process them. If I'm in a period of very low traffic, I'll also have a timeout, so maybe either 16 requests or a couple of seconds, and then I'll process that batch. What that means is I get this spiky workload where I'm waiting for long requests to finish and things like that. What we can do instead is continuous batching. What this allows you to do is interrupt very long responses with other responses as well. This means that we can do essentially token-level processing rather than request-level.
I might have a situation where I have a very long response, and I can interrupt that response every so often to process shorter responses as well. It means that my GPU is always at very high utilization, because I'm constantly feeding it new things to do. We end up with continuous batching, which gives much better GPU utilization.
Luu: Just token level.
Participant 4: How does interrupting and giving it a different task improve GPU utilization? Why not just continue with the current task, which gives you peak utilization at all times?
Arik: Essentially, it’s because otherwise you have to wait for that whole request to finish, and the GPU is able to process things in parallel, so it doesn’t actually affect the quality of the output at all. It just means that each pass, I can just have it predict different tokens from different requests. It’s constantly being fed with new things, rather than waiting for the longest response to finish, and it’s only working on that one response while we’re waiting for a new batch.
Participant 4: You’re suggesting I can put multiple requests in at the same time, in a single request to the GPU.
Arik: Exactly. What you end up with is that latency, on average, is a little bit higher, but throughput is much better. If I batch these requests together and process them together, I end up getting much better throughput than if I were just to process them one by one.
Participant 4: When I do give it a long-running request, what is the GPU doing if it's not being utilized 100%? What is taking that long? Why is the request taking that long while the GPU is not being utilized? What is going on in that setup?
Arik: When the GPU is not being utilized?
Participant 4: Yes. You're saying, I'm putting in a long-running request, so it is running. What is running if the GPU is not being utilized to maximum capacity at the same time?
Arik: It's essentially doing each request one by one, so I have to wait until the next one finishes. There's a fairly good article, I think Baseten wrote it, which covers this well.
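To make the dynamic-versus-continuous distinction from this exchange concrete, here is a toy, scheduler-only sketch in Python; it models each request purely as a number of tokens left to generate and ignores the model itself, which is the part that serving engines such as vLLM actually handle.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int
    remaining_tokens: int  # tokens this response still needs


def dynamic_batching(requests, batch_size=4):
    """Request-level batching: each batch runs until its longest member finishes."""
    queue, steps = deque(requests), 0
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        steps += max(r.remaining_tokens for r in batch)  # short requests wait on the long one
    return steps


def continuous_batching(requests, batch_size=4):
    """Token-level batching: finished requests are replaced every step, keeping slots full."""
    queue, running, steps = deque(requests), [], 0
    while queue or running:
        while queue and len(running) < batch_size:  # top up free slots immediately
            running.append(queue.popleft())
        for r in running:
            r.remaining_tokens -= 1                 # one decode step for every running request
        running = [r for r in running if r.remaining_tokens > 0]
        steps += 1
    return steps


def make_requests():
    # one long response plus several short ones
    return [Request(i, n) for i, n in enumerate([200, 20, 20, 20, 20, 20, 20, 20])]


print("dynamic:", dynamic_batching(make_requests()), "steps")        # long request stalls each batch
print("continuous:", continuous_batching(make_requests()), "steps")  # slots stay busy throughout
```

With these toy numbers the continuous scheduler finishes in fewer steps, and, more to Arik's point, it keeps all of its slots busy instead of idling most of them while one long response drains.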

MMS • Steef-Jan Wiggers
Article originally posted on InfoQ. Visit InfoQ

Confluent Cloud for Apache Flink has introduced Flink Native Inference, Flink search, and built-in ML functions, offering a unified Flink SQL solution. These new features aim to simplify real-time AI development by eliminating workflow fragmentation, reducing costs, and enhancing security.
The new capabilities follow up on the GA release of the service last year. Adi Polak, a director at Confluent, elaborated on the pain points that Flink Native Inference, Flink search, and the built-in ML functions address for enterprise developers building AI applications:
Today, developers must use separate tools and languages to work with AI models and data processing pipelines, leading to complex and fragmented workflows. This results in a data pipeline sprawl, with data spread across many databases and systems. This lack of data integration and orchestration can lead to inefficiencies, increased networking and compute costs, and more development time, making it challenging to scale AI applications and see immediate value from AI investments.
The new features tackle these pain points by offering a unified, Flink SQL-based solution. Flink Native Inference allows any open-source AI model to run directly within Confluent Cloud. Polak explains:
These open-source and fine-tuned AI models are hosted inside Confluent Cloud, so there’s no need to download external files or manage the GPUs to run them. Instead of sending data to a remote model endpoint, Native Inference brings state-of-the-art open-source AI models to customers’ private data, allowing them to build event-driven AI systems using Flink SQL easily.
This approach eliminates network hops, enhances security by keeping data within the platform, and lowers latency for faster inference, which is crucial for real-time AI insights.
Flink search simplifies data enrichment across multiple vector databases, while the built-in ML functions make advanced data science capabilities accessible to a broader range of developers.
Regarding security, cost efficiency, and operational overhead, Polak emphasized the benefits of Confluent’s unified approach:
By unifying data processing and AI workloads, Confluent decreases organizations’ operational overhead by making it easier to develop and deploy AI applications faster. With Confluent Cloud’s fully-managed AI offering, companies can save costs with a hybrid orchestration of data processing workloads, more efficient use of CPUs and GPUs within a pre-defined compute budget, and benefit from having no cloud ingress or egress fees. There is also enhanced security since data never leaves Confluent Cloud, and proprietary data is not shared with third-party vendors or model providers.
According to the company, the new capabilities remove the complexity and fragmentation that have traditionally hindered building real-time AI applications within enterprise environments.

MMS • RSS
Posted on mongodb google news. Visit mongodb google news

Tesla's stock price (TSLA) fell in the latest intraday trading while attempting to retest the main upward trend line that was previously breached. The price is also testing the resistance at $271.00, under negative pressure from trading below the 50-day SMA, coupled with negative signals from the Stochastic after reaching overbought levels relative to the price's movements, hinting at a negative divergence that would heap more negative pressure on the stock.
Therefore, we expect more losses for the stock, targeting the support at $217.00, provided the resistance at $271.00 holds.
Today’s price forecast: Bearish
Article originally posted on mongodb google news. Visit mongodb google news
Presentation: Data, Drugs, and Disruption: Leading High-performance Company in Drug Development

MMS • Olga Kubassova
Article originally posted on InfoQ. Visit InfoQ

Transcript
Kubassova: This is not an engineering talk, it’s about leadership and how you can make business out of your engineering background. Normally I speak to doctors about tech, and now I’m going to speak about business to tech people, so it’s quite intimidating. I’m going to ask you, have you ever thought of starting a business, ever? Have you started a business? Have you ever tried to start a business? Are you still in that business? Anybody running a business? We have a few examples.
I'll start with the very long journey of becoming an entrepreneur, and explain how you can get from being an engineer, a mathematician, whatever technical discipline you have in mind, it doesn't even have to be technical, to being an entrepreneur. Then how to take an idea into something tangible, like a business. This is my personal perspective. My only job over all these years was as a CEO. I started as a CEO. I ran the company as a CEO. I just hired a CEO for the company, so now I'm president. I don't have formal business training. I don't have formal CEO training. Everything I say is a mixture of what I have experienced and the Harvard Business Review, which is my Bible. Then I'll tell you a little bit about what inspires and challenges, and a little bit of my personal perspective on leadership. Again, if it's not conventional, just take it as it is. This is how I run a business. If you run your business, you're probably going to have something of your own.
From Engineer to Entrepreneur
I'll start with where I come from, because I think the foundation of who you are will stay with you for a very long time. I was born in the Soviet Union, as you can see by the red scarf. I did not get to the red scarf stage; that's given to you when you're slightly older. I think what's important here is that a huge area of the Soviet Union was united by a few principles. I think those are very powerful principles, and they go across the business, the leadership, the technical, the non-technical. The first is equality. It didn't really matter if you were a woman or a man, young or old, you were expected to just do and deliver as much as everybody else.
If there is a big hole, you take a shovel, and you continue digging, it doesn’t matter who you are. The second principle was fearless. There were really no fears, there were a lot of challenges. People were united. People were doing something without really questioning, is it the right thing, is it the wrong thing? Maybe it’s not always good, but that was the principle, which was ingrained in our mentality. The last one was focus. From a very early age, you are told that you are the engineer, or a doctor, or a lawyer, or whatever it was. As a principle of that, you would go deeper into that discipline. Even the educational system is set up in such a way that it doesn’t give you a broad standard, but gives you a very focused, very determined way to think about the problem. I was a mathematician from the age of 7, which continued with me throughout.
From my Kazakhstan place of birth, I went to St. Petersburg, to State University of Physics and Mathematics. This is one of the extracts they give you after you finish, I think it’s like 5,000, 6,000 hours of pure mathematics. That’s what I’m talking about, focus. You didn’t have much of a philosophy, which probably is not a good thing, because once you get yourself focused on engineering, you lose that aspect of everything else. In the previous talk, it was a little bit about engineering versus the world. In some way, any discipline versus everything else is not really a positive quality. You really want to be multidimensional and multifaceted in your approach, because you, as a leader, will end up dealing with people who are not like you. In fact, if you are surrounded by people who are like you, that’s probably not a good thing. Maybe you picked the wrong team and nobody is challenging your ideas. Once you recognize it, you can actually amend for it and you can think about other ways of dealing with the situation.
However, I think recognizing it is very important. To my second slide, I think some of the leadership lessons, having really hard work. You can’t survive it on your own. You have to survive it in small groups. As you could imagine, doing something very specific, very focused for 5,000 hours, you have to find friends. You have to find your teams. You have to split yourself in small groups and tackle big challenges one by one in small teams, and get together, collate your effort, present it to the lecturer.
An interesting story about St. Petersburg State University is they accept everybody to the first level. There are six years of mathematics. The first level, everybody is accepted. 250 people got in. They allow you to take exams as many times as you like. Every six months, there is an exam. You can take them, four times officially, but as many as your lecturer would tolerate to take it. Only 5% of those people graduate. What does it tell us? It’s stamina. It’s very difficult to pass all these exams, and they question and you fail, and everything. I remember that once, when you’re going through that journey, again, that little team, that leadership within the teams, doing something together, that was something which makes you stay strong. You’re not alone. You’re not doing it by yourself. After the first year in mathematics, your hands are shaking. It’s nothing like you can experience.
Then, once you start collaborating, you really start collaborating. What happened on my journey was very interesting. As I was doing my mathematics, now four years into six years, the Soviet Union just collapsed and there were no jobs for mathematicians or engineers, for that matter. Your choices were, you go to McDonald's or you go to a call center, no other jobs. Somehow by some miracle, we identified that there was an IT program in Finland, which is across the road from St. Petersburg. This is an extract from my passport. We actually asked the customs people to put our stamps very close together because there were so many trips backwards and forwards. That small group of mine, we decided to go to Finland.
We decided to do parallel degrees, one in Finland, one in St. Petersburg: one in English, in IT, another one in Russian in mathematics. I remember giving a call to my mother and saying, I really don’t speak English and I really don’t know what IT is about, but there is a scholarship, which is €500 a month versus average salary in Russia at the time being $100 a month. The best advice I’ve got, you go there and you will figure that out once you get there. I know that speaks sometimes to you as leaders. You go there, you just get there through your fears, through your insecurities. Once you arrive, actually, you are surrounded by exactly the same people with their own fears and insecurities. Together, you’re probably going to figure that out. That’s exactly what happened on my journey.
Through the two years and two parallel master's degrees, we all traveled six hours one way, six hours the other way. We managed to pass exams in both universities. You saw those 5,000 hours, those were for real. Once we passed it, there were about 20 of us out of those 250 who started at the university. We all ended up with actually quite applicable degrees, yet not very applicable passports. Mine was Kazakhstan. Many other people also had not the greatest ones. What I did is I emailed from my Yahoo account every university I could find on the web to say, I have two master's degrees, I have eight publications, I'd like to do a PhD. I don't mind what country or what subject. Two replied, and I'm surprised two replied. One was from Denmark and one was from Leeds. Denmark probably was by mistake. They never followed up with that reply. Leeds said, "You can come over as long as you pass some exams, get a scholarship". I think this slide is actually quite interesting for me because all you have is a 20-kilo bag, when you land. What does it mean?
It means freedom. You really have nothing and nothing to lose and nowhere to go back. What does freedom mean? Does it mean really freedom? I took notes for myself just to make sure I don’t forget to tell you, because freedom is not always the easiest of statuses because you’re free when you start your job. What do you immediately need to do? You need to define responsibilities, what are you going to do versus everybody else? What impact you’re going to make versus everybody else. You will have to choose how to set your goals.
For me, the most challenging aspect was, and you remember I arrived there, not really speaking English, understanding the culture where you actually landed. You know the 5 p.m. pub in England? I had no idea that you’re supposed to go there because that’s how you build friendships. Then, Tanja was one of my friends. We were all foreigners. It’s quite difficult. I think for me, if you arrive into a new organization and you’re starting from scratch, set your goals. Understand your culture. Understand what is you versus somebody else or maybe many people. Understand your allies. I think some of the previous talks were extremely insightful on how to build your own path.
For me, it was quite interesting because me being an engineer, obviously I came and said, I’m going to come. I’m going to do a PhD in three years. I have a scholarship for three years. The first thing I’ve done, I arrived with my 20-kilo bag, came to my supervisor, said, what do I do to finish in three years? The first thing he told me, nobody finishes in three years. You do your PhD in three, you write during the fourth year, then you’re finished. Said, “I can’t do that. I have a three-year scholarship. I have to finish in three years”. We were not friends after that. I don’t know if you want to be friends with somebody. It would have helped, I have to say. We were not friends. I got out of him that I have to publish two papers, produce a book with everything I’m going to invent, not yet invented.
Then with that book, I'm going to go to my viva, the viva will be conducted, and then I will get my PhD. I was given very clear objectives. As you clearly remember, I forgot the cultural aspect of it. I forgot that the pub was important, that whatever else was important, that the humor was important. All of that was secondary. Then, I think we as engineers sometimes forget that we have very strict goals. I know the deadline, I'm going there. Sometimes we forget that there are other people around who are sometimes not engineers, which always comes as a surprise to me even until this day. Think about it next time you think that you're free to invent, because you're not in isolation. For me, that's one of the leadership lessons: you have to just step in, set your goals, set your own understanding of the culture, and then decide what's important for you at the end of it.
I think for me, halfway through my PhD, I decided what I’m doing is very important. Yet again, not making any friends here. I came to the vice chancellor now and said, I’d like to set up a company. He said, nobody does company. 2007 in England was not a done deal. There was I think an IP office, but not much. The IP office was somewhere downstairs. It was not very popular. In the U.S., it was on trend. England, not quite much. I started the company towards the end of my PhD. University, again, probably could have been a better friend, but all I got is a letter to say, you can do whatever you want with your IP as long as you’re not mentioning the university name. I took the path, and I published my PhD. That’s another story, a very long one.
Making a decision, and I think here is the point of this slide, is really making a decision. What does it mean? Because you’re making a dramatic decision probably daily. You were in one position, you need to go into another position. How do you jump? How do you make that step change? Because going from a crowded lab with all the engineers, programming, doing some software development, into your bedroom to start the company is a very different setup. Right now, I think if you’re starting a business, you really want to define, how do you see yourself tomorrow? Today you start the business, everybody cheering, maybe you’re going to the pub. Tomorrow, you’re on your own with that business. It took me forever. It took me about six months going backwards and forwards thinking, am I wasting my time? Should I close this business down without telling anybody?
Then, Yorkshire Forward, which was a local newspaper, they gave me an award, Entrepreneur of the Year. I said, “I can’t close my business now. I have to receive the award, take the photographs, be in the newspaper, then I close it”. It was really an interesting moment, somebody recognizing you. You don’t even know about these people, but they’re giving you awards. I have seven. Some of them from Yorkshire Forward, which is a really amazing organization. I’m not sure if they’re still around. Somebody, which could be you, recognizing somebody in your team who might be on the verge of maybe yes, no, maybe yes, no.
Then, it puts them on a pedestal for just a second, maybe insignificant, but still, it gives you a chance to be a leader, to think about somebody who is not, and actually make that moment of their impact, their little change in the organization, really shine. That does prevent them from giving up. At least that's in my case. For me, it was very interesting to see how the relationship between me and the business developed. I'm sure for you, as you're running your teams, it's the relationship between you and your team. Can you connect to your team? Can you see every single person for a person, for who they are, for what they do, or is it just like a gray mass of engineers? Sometimes it's important just to lead everybody forward. Sometimes it's important to pick particular people forward. In my case, it was very interesting to see those awards.
I have to say, you cannot do things on your own. You have to be the team. I was extremely lucky. During my PhD, I met a number of radiologists, one of them, Professor Mikael Boesen. At the time, he was a PhD researcher. He became the first believer who actually took my algorithm, which was developed in PhD, and used it to diagnose a patient. That’s a breakthrough moment in my head, my head nearly exploded. Could you imagine somebody doing a diagnosis using your technology? I’m sure you can, because if you develop Spotify, or you develop anything magnificent, it’s all out there. It was a breakthrough moment. You get the first investor, that’s another one, the first. Sometimes they just believe in you, and sometimes it’s your friends and family. I suggest you really get a proper investor if you’re starting a business. Then you get your first team. As I said, my business started somewhere in a dark bedroom, there was nothing there. When you move to your office, in your team, it doesn’t matter how shabby this office looks, and it was very shabby for many years, it’s still your office, it’s your team, it’s your armor. Now you’re surrounded.
Building From Scratch – Learning to Fly While Flying at Full Speed
I'll tell you a little bit more about what the business is about, because I told you a story, and this is what we are. In 2007, when I started it, it was a data science company, a company which would take imaging data, radiology scans, run them through the algorithms, and produce data. We were very lucky that when Yorkshire Forward gave me that award, within six months our first client was Abbott, the largest drug development company of all time, the most successful one. The reason for it is because we could speed up, accelerate the recognition of a change in a hand; this is a hand MRI scan, the hand of a rheumatoid arthritis patient. You could see that bright change a little bit faster, fast enough for them to claim that their drug works better than somebody else's drug.
We moved from being a qualitative descriptor of the image to quantitative analysis of the image. You could see here, that same hand, you can analyze it in two ways. You could look at it and score it on a scale from 0 to 3; I think this joint, called the metacarpophalangeal joint, is a 3, really bad, really white. You fill in the form, as a radiologist, and then you sum up all the numbers, and that's your score. Alternatively, you can look at it via a methodology which came out of my PhD. The book on Amazon is still available. You get a quantitative analysis of the pixels incorporated into the radiology image. You can actually count the number of pixels automatically; obviously, you're not sitting with a clicker. It allowed Abbott to see changes in four weeks, versus six months. Now you could see how this became valuable.
All of these big drugs, if you ever had autoimmune conditions, which is rheumatoid arthritis, psoriatic arthritis, multiple sclerosis, and it’s an inflammatory in your body, they all will be treated with one of those drugs, and new drugs as well. We were very lucky to be in the right place in the right time. We carried on.
We started with hands. It was a single indication, single focus, single image, enough companies, luckily, to start the business. We started analyzing those images and producing the results. Obviously, you have a paradigm change. You really need to think how to present it. Now you are, and I am, in engineering, a technology, a software engineer, trying to sell something to a doctor. They’re talking to me about, no, IL-6 is working better than IL-1 in this particular category of rheumatoid arthritis patients. It did not compute. It’s layers and years of medical degree, which I don’t have. You have to get your engineering mindset down to the point where you can actually show people what you understand, and then wave your hand and say, I don’t understand every single word you just said to me, but I think that’s going to help. Here comes a lot of noes. We tried. I emailed people.
By that time, I had about maybe 10, 15 engineers in my company. I had a few salespeople. I considered myself a salesperson because somebody had to be emailing those people to say, we have this methodology, we have this algorithm, we’re going to do this and this. Luckily, there were enough yeses. The business grew 100% year-on-year and continues to grow. We expanded, obviously, from rheumatoid arthritis into multiple indications. We’re very lucky to have extremely big names behind the publications that use our methodologies, and very interesting customers.
I think then the next steps come in. We develop engineering. We’re successful at understanding data. Often, in our teams, we stay in that track. I can do this, I’ve got a nail and a hammer, and I can hammer it like that, and I’m really good at it. Actually, the customers potentially want something bigger, because the business never stays still. The business of engineering also never stays still. This is probably subject to many conversations. Our customers told us that you take your algorithm, you put it on a cloud-based platform. They did not express it in those exact words, because it was before cloud. We were the first company to actually have a cloud-based platform. I had to explain to people how cloud is better than sending CDs to each other. That’s another set of emails that went out. We have a cloud-based platform called DYNAMIKA, and we put a number of different methodologies for medical image processing onto this platform.
We then realized that actually medical imaging and radiology as a niche market stays very different to a medical degree and to people who design clinical trials. We created a consulting division to design a trial which would incorporate medical imaging. Then we surrounded all of that with a global project management organization. Now you think, where is the tech? Where is the software? How is this all connected? Many businesses in my world, in drug development, drug discovery, they are normally started either by scientists who understand the cell-to-cell interaction and all of that good stuff, or project managers. It’s very different when actually a software engineer would start a business in this world. Software is a piece which is really universal. It’s not only connecting your own data, but also connects your customers to your data. It connects multiple third-party providers of similar algorithms to one platform. This is how we’re surviving. We’re innovating daily. We’re constantly developing new indications, new pieces of software. We’re thinking about new algorithms within indications. It’s an endless business. There is no end to it. What we have today is we have a platform which has a cloud-based infrastructure.
Then, connectivity of that infrastructure expands way beyond radiology teams. When you run a clinical trial, you really think about your other customers. You think about the hospitals that will upload the data. By now, everybody knows what a clinical trial is because we've all been through COVID. You have a number of hospitals who will upload the results. They will be analyzed centrally, and decisions will be made based on analytics. Sometimes the analytics are survival or non-survival of the patient. Sometimes the analytics are more about pain or function or improvement of health. You're getting into a business which now connects teams way beyond engineering. Engineering still sits on top.
Obviously, we started with the technology, people who build the software. The product team became very important. That product team, which is the connector between the project managers, sometimes customers, sometimes third parties, and real tech, which are developers and IT specialists, became a critical and fundamental piece to the business. That team is now driving the tech.
Obviously, it’s a highly regulated environment and there is a regulatory piece to the product design and development and running of it, but just to simplify it. What it means in the real world, and I think I want it to be as visual as possible: this is an image of a tumor. Here, you could see a tumor in a nasal cavity. What happens to this tumor? Here is before treatment, here is the after treatment tumor. You see how the size of the tumor didn’t really change? What happened is something happened inside the tumor, some necrotic area got developed. Meaning the treatment worked somehow, but it didn’t change the size. Regulatory interaction with this image is the following. The treatment did not work. The size did not change. What we’re doing on the backend, we’re looking at the algorithms which actually can help us to recognize a change.
Obviously, all of you looking at this are visually thinking, you just take volume and volume would be easy, and you could just understand the areas which changed, which didn’t change. Now what you’re doing in your head is you’re layering software development, radiology, medical, regulatory. It’s an incredible sandwich, because when we’re sitting in a software development team, it’s really obvious for us. By the time you get to regulatory, it’s not obvious to them because the question is, your tumor changed, did the patient die sooner or not? If the patient was not affected, I don’t care about your tumor and your image and your software development. Your end customer and your end decision maker actually is somebody who is sitting in the FDA or EMA. Thinking about the product and the layers of scrutiny this product needs to go to allow someone to make that final decision is tremendous.
We have quite a few interesting examples, and I only have two for you. You can go on our website and have a look at every therapeutic area. It’s fascinating for me still to really think about it. It’s a glioblastoma head of a patient, a slice through the head, and there is a tumor which is in red. Blue is not a tumor, it’s just a necrotic core, so there’s nothing in there. Green is inflammation around the tumor. If you’re using what’s called the standard criteria, you actually see through the follow-ups of the image, so you have a baseline, first follow-up, second follow-up, you could see how this tumor is actually increasing in size. If you take a standard measure, you could see that at some point it reaches a really high size, then it drops, then it reaches it again.
At this point, when the tumor increases, the patient gets taken off the trial. This is really bad for the patient because the benefit of this therapy, which is called immuno-oncology therapy, only happens afterwards, so that increase is actually positive. As your product goes through the scrutiny of regulatory approval, you really have to think about, does this matter, or is it ok for the patient to be dropped off the trial? You could see here that you've got a volume decrease where it says volume. This volume is actually fundamental for this patient to stay in the trial. You could see that at the very last follow-up, May 2007, the tumor reduces further. Your tech product is impacting somebody's life. I think it can't get sharper or clearer than that when you are responsible for that line of code.
That line of code is going to get into the hands of a radiologist, and that decision is going to be made at the patient level. Then, that decision is going to be made at the thousand-patient level, and this drug is going to get approved or not. The importance of a line of code goes up, as far as I can see. You start understanding how anything you're developing, and your team is developing, is becoming critically important for the patients and how the pharmaceutical companies could use it, and so on.
Building a team for the business is definitely not the same as building algorithms. Algorithms are very focused, very no left, no right. You just know your deadline. You know what you want to achieve. You know how precise you want to be. Building a team is incorporating everybody’s objectives, incorporating everybody’s culture, understanding who is your ultimate decision maker. Of course, when you’re thinking about the company, your people are not even your team: it’s your clients, it’s your investors, it’s your network. Each of those teams has completely different objectives. Even if you are within engineering, deep down, taking this out and really understanding how this product is going to impact a broader audience, first of all, makes your life much more interesting, but secondly, you could see the impact of everything you do.
Setting up strategy and vision, I think is one of those critical elements which you only realize when you, in passing, make a decision, and then somehow it gets implemented. All you meant is just to share your thoughts, what’s taken as a vision and a strategy. You need to be very careful how you communicate your ideas and your strategy around your team, clients, investors, and the entire network. I think controls and processes here are very important, but also looking at your own style of communication. Because I like writing emails on a plane. I’m on a seven-hour flight from here to New York, I write about 50 emails, they all come out as soon as I land. My team just gets shaken up in the morning, and they did ask me not to write those emails.
Now I do write the emails, I don’t send them straight away, I page them. It’s a very different objective when you have a team, you have a company, you have a network of people, and you’re trying to place your product in the right position. As a leader, it’s your job and it’s your responsibility. Of course, a company is not just people, strategy, and vision, it’s really luck: being in the right place at the right time, with the right approach, right product, that you can’t predict.
I finish the company story just to show you a few things, how diverse you could be with just one product. I consider that we only have one product, it’s a platform with multiple lines of applicability. We work in glioblastoma. We work in gastroenterology. We work in rare diseases. We work in some of the sophisticated oncology drug development. Building those partnerships, and again, partnerships are very important here because in our world, data is everything, so you cannot innovate one way. You can’t develop a product, place it on the market, hope for the best. This would be a very short span of what you’re doing. If you want to really innovate, you need to set up collaborations with your clients which would in return allow you to reuse this data for your AI algorithms, for training or for testing or for anything else, for development of something new. You share insights back, and that’s how you sustain yourself as a company. Now 15 years later as a company, we truly appreciate the power of the collaborations.
Challenges, Wins, Aspirations, Ambitions, and Principles
Now I’ll tell you a little bit more about the challenges and how I was building the business. Again, as I said, it’s from my perspective. Do you have the right team? How often do you need to ask this question? When you build a business, you never build a business of 50 people. You build a business of one, two, three, then you hire the next one and you think about it. Normally when you start, you can do everything. I can do software development. I can do some cleaning. If the customer comes along, I can make coffee. You have people just like you around because you can’t survive otherwise. You have to have that mentality that everybody has to do everything. As the business grows, first of all, if everybody does everything, it’s very little progress because everybody does everything and nobody focuses.
Then you have to introduce a concept that actually you're only going to do that one thing, and can you please chop this off and this too? We can hire a cleaner; I no longer need to clean. That focuses your time, it focuses your objectives, it pushes you up. For your team, that could be a challenge, because if you're starting something from scratch, you are looking for the mindset that they can do everything, and then you're telling them that you don't want them to do everything. I can tell you, I changed my team many times. How often do you need to ask this question? I think that I leave to you, because as a leader, you really need to think really hard, do I have the right team? Am I asking this question at the right time? Because if you're in the middle of a crisis, you think your team just sucks. They don't suck, it's just a crisis. It's not the time to change the team.
As you’re going into the next level of growth, as you’re thinking about your product becoming more established, how do you enable and empower your team to change with you or not? Because many people don’t like to change, and it’s a very difficult ask. You just hired somebody for them being everything, and now you’re asking them to be just half of that. You can give them double amount of work, but if the person is diverse in their talents and they want to be out there, they’re not going to be following you. It’s very painful because your best people are your best people.
What is your goal, when you start in a team, when you’re working within the team? Is it that you want to progress in your career? Because if you honestly look at yourself and say, do I want to progress in my career? Do I have the right team for that? Great. Increase in knowledge. It’s another objective. Again, day to day, month to month, quarter to quarter, you can ask this question. Do you want to be a leader? What type of team do you need to be a better leader? Or you’re striving for building something. Because as a leader, I think our objective is, create other leaders. Also, we want to make an impact, because you don’t empower more people around you and you don’t hire more people if you’re not making an impact, if you’re not feeling it. For me, it was always a step change.
Obviously, I could increase my technical knowledge any day and become a better leader. When you're building a business, you really need to be quite objective in the assessment of what's happening today and what you want tomorrow. You probably saw these books; there is a valley of death and a chasm of something or other. As an entrepreneur, you constantly feel that you're either in this valley of death or in that other hole. Any other things don't exist. I didn't even bother making them clear because I don't know what they look like. You're constantly feeling you're failing because somebody is doing better, somebody is doing something different. You're thinking, maybe I should do something else. When you're building something new, it doesn't matter where you are on a scale.
For me, what matters is to do a step change. I think an earlier speaker spoke about small incremental change to inspire the team. I don't deny it. It's very important to inspire people. If on the 1st of January, you say, when I sit here 12 months later, what do I want? What is my next step? You really can't write more than three or four very clear objectives. Then you can deploy all sorts of techniques to make people follow your path, to follow your step change. The step change is very painful. It does live in that high-impact, high-pain rectangle of the graph. To grow your revenue as a business, year-on-year, take step changes. It's very difficult to do small incremental steps and then get everybody on that path. This is where the questions around the team really come from: is it the right team? What is my objective? For me, this is an annual step. An annual step change for the teams.
I also find it’s extremely difficult finding a focus, because as a CEO or as a leader, you will be pulled in so many different ways. You have to stop and you have to say, what is my focus, and what is everybody else’s focus? The clearer you can be, the more precise you can be where to focus. Please don’t take that as a micromanagement tip. That’s not it. Focus is that step change with a goal at the end. If you can write one line for your next level manager to say, I want you to focus here. Don’t tell them how. Just tell them what’s the focus. It helps tremendously. Also remember that mixed messaging, if you have this focus, you really need to be absolutely sure that this is the focus, because people will spend time and effort and mental effort to think how to deploy your vision and how to make it really work. If you set their own focus, do not blame them because you just told them that this is the focus. Within an organization, once you’re beyond five people, it becomes difficult. Focus for yourself, for me, is the most difficult thing.
If you find your own focus, you can then tell other people what it is, and somehow, they may figure out what it is for them. At some point, my friend, also a serial entrepreneur, told me, if you can't find focus, which is, you're finding yourself doing too many things, too many lists, sometimes you tick them off, sometimes you don't tick them off, try to find five aspects of your life. They cannot all be about engineering. Don't do that, because then you will end up with five engineering objectives. One has to be about you. Another one has to be about your work. Another one has to be about your friends and family. Another one about extra knowledge. Maybe the last one is the material one, about the next level of money you want to earn, whatever that is. Make them diverse. Write the five statements of what that is going to be in three months. A three-month horizon, there's something magical about it. You could just see it. Not quite, but you could just see it. Great exercise. It's actually much more difficult to do than it looks.
If I would write, by the end of Q2, I want my business to have revenue X. One more tip, the way you write is like it’s already happened. The business has revenue of X by, and you put date, month, and year. Very difficult. Think about five aspects, five objectives. Then, the most difficult thing, you have to look at your five cards every day in the morning. Visualize it, whatever else you do in your meditations. Visualize. For me, the most difficult one was to actually read the five cards every day. You read five statements you wrote, like they should be, every single day. That brings your focus back, back on the five statements. The reason I’m saying three months, it’s very difficult to plan actually beyond three months. You definitely have your annual step change.
Then you have this help on a three-month basis to support your step change in 12 months. You can write today that by the end of this year, or by the end of Q1 2025, I am a billionaire. I'm going to read this card every day. It's going to happen: maybe not, maybe. If you write three-month, clear statements, clear objectives. Many people talk about how best to make it happen. It's meditation or visualization techniques, and you can read about them in all sorts of books. If you want it to happen, it will definitely happen. I think that gives you focus back from all those lists and all those diverse objectives. The reason you write five different statements is because you don't want the five things to be in one area, because you will be very successful in those five, and highly unsuccessful everywhere else. You're going to be the best technology leader and engineer in three months, but your wife divorced you; maybe it doesn't matter. Make sure you write diverse things.
The last one, and I think it’s an interesting one, because I call it the box. Again, probably somebody calls it the same thing. What is the box? Are you comfortable with what you’re doing today? Are you confident in your outcomes? Are you in your comfort zone? Probably if I say that, we’re probably all going to raise our hands. Majority. Is this good? Is that good, that you’re comfortable, that you’re confident? Are you stretched? Am I growing in my role? Is this all I can achieve? What is my next challenge?
You define your box. You might not be thinking about it right now, but after I told you, you will be. You're comfortable. You know how to write sprints. You know how the AI works. You read that book. You know your team, everybody is productive. That's your box. What if tomorrow I said, I'd like you to do some sales? Talk to the customers, do demos; that will be the list of your problems. You actually need to write contracts, and sign those contracts. These contracts might not be ideal. This is a challenge. This is a challenge because you took yourself out of the comfort box, and you now became something else. Is it good or bad? That's for you to decide.
The challenge. We need to understand ourselves. Understand our guiding principles. As I said, for me the foundation was in equality, focus, hard work, collaboration, all of the good stuff. My best people came out of being introverts. Much better leaders than I am, because that focus and hard work do not allow you to be the most amazing leader, you just don't have the time. You have different people. You have different buildups of those people, and every leader is very different.
To me, I think the most important thing you can do as a leader is set yourself for growth. How do you set yourself for growth? You could really think about ability to self-doubt, ability to get out of the box. Because there’s nothing wrong with questioning, are you doing the right thing, are you in the right place? Then also being comfortable with being uncomfortable. There’s nothing wrong with that, because that’s how you grow. If it’s scary, potentially that’s the next path for growth. Of course, it gets scary when you go there, but then once you’re in, once one step in, once you get there, you’ll probably figure that out. The last slide is your ability to change. You could listen to all these talks, and change any day and any time.