
MMS • Sergio De Simone
Article originally posted on InfoQ.

Now generally available, GitHub Copilot Extensions allow developers to use natural language to query documentation, generate code, retrieve data, and execute actions on external services without leaving their IDEs. Besides using public extensions from companies like Docker, MongoDB, Sentry, and many more, developers can create their own extensions to work with internal libraries or APIs.
The GitHub Marketplace already offers a couple of dozen extensions covering a wide range of development-oriented services. For example, you can use the Stack Overflow extension to ask questions about coding tasks without leaving the editor, while the GitBook extension allows you to ask questions about GitBook docs.
Besides providing access to documentation, Copilot extensions may help developers interact with a service directly from their IDEs. For example, the Docker extension allows you to generate Docker assets and analyze vulnerabilities; the LambdaTest extension lets developers manage testing workflows and streamlines test execution, automation, and insight generation; the Mermaid Chart extension can generate various kinds of diagrams based on your GitHub Actions, SQL, or other files you are currently working on within your IDE.
As mentioned, developers can also create their own extensions to access private data or in-house services. To make it easier for developers to create extensions, GitHub has published several repositories showing how to build a basic “Hello World” extension, how to gather feedback from extension beta users, and more.
There are two ways to build Copilot extensions. On the one hand, you can define a skillset, meaning Copilot handles all AI interactions while the extension provides a description of the endpoints it can call to process user requests. Currently, a single extension can use up to five distinct skills.
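To make the skillset model concrete, here is a minimal sketch of what a skill endpoint might look like, assuming a small Flask service; the route name, parameter name, and payload shape are illustrative assumptions, not GitHub's published contract. Copilot performs the AI interaction and calls the endpoint with the parameters it extracted from the user's request.

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical skill endpoint; Copilot would call this with parameters
# matching the JSON schema declared in the skill definition.
@app.route("/skills/lookup-docs", methods=["POST"])
def lookup_docs():
    payload = request.get_json(force=True)
    topic = payload.get("topic", "")  # illustrative parameter name
    # In a real extension, query your internal docs or API here.
    return jsonify({"topic": topic, "summary": f"Internal docs summary for {topic}"})

if __name__ == "__main__":
    app.run(port=8080)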
On the other hand, you can use your own AI agent, in which case you pass certain information from the user context to the agent, such as details about a user’s current file, selected text, and repository. In this case, the agent receives server-sent events (SSEs) with user messages and references to their current environment. The actual context information varies with the client hosting the extension. For example, while Visual Studio and Visual Studio Code pass the current selection or the whole file content, GitHub.com doesn’t, but provides the URL of the page the user is currently visiting.
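For the agent model, a hypothetical sketch of an endpoint that streams its reply back as server-sent events follows; the payload fields and SSE framing here are assumptions for illustration, not GitHub's exact wire protocol.

from flask import Flask, Response, request

app = Flask(__name__)

@app.route("/agent", methods=["POST"])
def agent():
    body = request.get_json(force=True)
    messages = body.get("messages", [])  # user messages plus context references

    def stream():
        # A real agent would call its own model here; this sketch just echoes.
        last = messages[-1].get("content", "") if messages else ""
        for chunk in ("You said: ", last):
            yield f"data: {chunk}\n\n"  # SSE framing: data lines, blank-line separated
        yield "data: [DONE]\n\n"

    return Response(stream(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(port=8081)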
To make it easier for extension builders to manage authentication, GitHub has recently added support for OpenID Connect (OIDC). This frees developers from having to verify a GitHub token’s validity on each request by allowing them to use a pre-exchanged token.
GitHub Copilot Extensions can be used in a variety of clients, including Visual Studio and Visual Studio Code, GitHub.com and GitHub’s mobile app, and JetBrains’ IDEs. They are not supported in Xcode or GitHub Codespaces, though, nor in Vim or Emacs.

MMS • Olalekan Elesin
Article originally posted on InfoQ.

Transcript
Elesin: My name is Olalekan. I was the director of engineering, now I’m VP of engineering. The title says, elevate developer experience with generative AI capabilities on AWS.
We'll go through an introduction to Amazon Bedrock, a code review assistant, agentic code generation, and code summarization, and I have bonus material which I'll share with you. Personally, I'm a fan of football. I'll show you an example: I was watching a game and, in 20 minutes or so, built a simple application.
Introduction to Amazon Bedrock
What is Amazon Bedrock? It's a fully managed service. As I said, I'm not an AWS hero, but I just want to introduce you to this; it's not a marketing slide. I would also give a critique of my own of this service, or the AWS services that I have used to elevate developer experiences at my workplace. It's a fully managed service that comes with foundation models out of the box from leading companies: AI21, Anthropic, and things like that. Key features include Bedrock Studio and Knowledge Bases.
I'm not here to talk about Bedrock, but I'm just trying to give you an introduction to it and how we can use it to elevate developer experience, which for me is very important. If you have access to an AWS console and you look at Bedrock, this is what it looks like. You also see Meta, which is a very key contributor in the open-source community to the development of foundation models, or large language models. This is where it gets interesting: everyday developer experience. For me, this is what I care about, not Bedrock.
Code Review Assistant with Amazon Bedrock
As an engineer, who loves code reviews? If you're an engineer and you love code reviews, you're from Mars; everybody else is from Earth. Let's take that. I still do code reviews, and at every point in time I'm wondering, why do I have to do this? If you look at the coding flow, there are 19 hours of coding on average, and pickup time takes about 9 hours on average. Review takes five days because, if someone says in the daily standup, I need you to review this, it's: don't worry, I'll pick it up. Who picks it up? Nobody. I also do the same, nobody picks it up. Then the deployment time is on average 80 minutes. There is a lot of work getting stuck in the review phase.
The value that we as engineers create: we are very excited when the value that we create with code, when the problems that we solve with code, get into the hands of customers. For me, this is very important. How do we reduce this? One day, I was super pissed at code reviews, and I realized that the engineers on my teams were also struggling with code reviews. I sat and said, how can I solve this? Then I came up with this architecture. Fortunately, it's quite simple: you look at a third-party Git repository, think of GitHub or whichever one. I created a webhook with API Gateway, which goes to a Lambda function, which sends the information to Bedrock.
Once you create a pull request, it triggers the webhook. It goes straight through, reviews the code, and comments back with the review. This is something that can shorten that nine hours plus five days to minutes, if not seconds. I was super excited about working on this, only to realize that people had already built this on GitHub, and they're putting it in the GitHub Marketplace.
Then again, this is something that can take you from nine hours plus five days to minutes. It doesn't mean that the reviews are super perfect, but it gets you up to speed as quickly as possible. Now think of the webhook, and think of how you can extend this use case within your organization. You can also swap out Amazon Bedrock: you can change it to a Mistral code assistant, you can change it to a Claude code assistant. You can change it to anything that you want. It's simple, and it gets you up to speed as quickly as possible.
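As a rough sketch of this webhook-to-Bedrock flow, the Lambda function might look like the following in Python with boto3, using Bedrock's Converse API; the Git-provider calls are stubbed because they vary by provider, and the model ID is just one example of a Bedrock chat model.

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def fetch_pr_diff(pr):
    # Stub: in practice, call your Git provider's API for the pull request diff.
    return pr.get("diff", "")

def post_pr_comment(pr, review):
    # Stub: in practice, POST the review text back as a pull request comment.
    print(review)

def handler(event, context):
    pr = json.loads(event["body"])  # webhook payload from the Git provider
    diff = fetch_pr_diff(pr)
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # any Bedrock chat model works
        messages=[{
            "role": "user",
            "content": [{"text": f"Review this pull request diff and list issues:\n{diff}"}],
        }],
    )
    review = response["output"]["message"]["content"][0]["text"]
    post_pr_comment(pr, review)
    return {"statusCode": 200}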
Agentic Code Generation with Amazon Q Developer
Who loves moving from Java 8 to Java 17 and Java 19 or 21? Who has ever embarked on that journey where we say, we're running on Java 8, we have CVE issues, now we need to go from 8 to 95? Who enjoys that? I think there was some Log4j issue in, I think, 2022, where lots of people had old Java versions that we had to migrate. This was really interesting for a lot of us. If I see this, I know I'm excited. Who's excited to do code migrations? Here comes Amazon Q Developer. Amazon has mentioned a lot about this. They've done migrations on their own, they've done 1000-plus migrations. I can tell you that this actually works, from my own personal test.
What I did not like about Amazon Q was that it took us months to provision. The reason: you have to connect it to AWS SSO, and most of us, in our organizations, already have an SSO provider. What we did was not use Amazon Q Developer, but GitHub Copilot. I cannot share the stats, but I will share with you what I observed with the engineers on my teams that used Copilot and the engineers that didn't.
Every morning we use our instant messaging tool, and I see at least four engineers sending pull requests every day saying, please review this. It turns out that these engineers are actually using GitHub Copilot. When we ran a survey with them, estimated again, they had between 15% and 30% efficiency gain by using Copilot. Then I thought about the colleagues in suits saying, in five years, engineers will go away. I don't think so. I think in five years, engineers will become better at the craft. We will be problem solvers.
One of the things that I noticed with the engineers that used Copilot or AI assistants in writing code was that they became better problem solvers than code implementers. It can be Amazon Q. It can be GitHub Copilot. It can be Cursor. What I’m trying to say is that, as engineers, you can start introducing this to your workflow, as long as it aligns with the security policy of your company.
Let me see if I can play this video. I'm actually a Manchester United fan. When I was preparing for this, there was one of the football games on a Saturday afternoon. I wanted to watch the football match, and at the same time, I wanted to prepare for this. I wrote a prompt saying, give me this particular project that would perform this particular activity.
It generates the sequence of tasks as an LLM agent, and here I am for about five minutes just sitting and watching Manchester United. This is what we do as engineers. We tell the product manager and the engineering manager that, yes, I’m working remotely, just give me a bit of time. We’re chilling, sipping some coffee, whatever you drink. It’s October 1st, so you can get the idea.
Then, you wait for the LLM to generate the code. Definitely it generates code, but it doesn’t mean it’s accurate. It means that you still have to look through it by guiding it through the problem you’re trying to solve. What do I do after? That’s it. It generates my CloudFormation template. Generates the code itself, and I spend maybe another 10, 15, 20 minutes editing to make sure it aligns, and testing it. What would have taken me about three hours, I got it done watching a football match, and also in 20 minutes. Again, better problem solvers as engineers. Engineers are not going to be replaced in five years, we’re going to evolve and be better at the craft.
Code Explanation and Summarization
Code explanation and summarization. How many of us have people joining our teams regularly? Tell me if you actually enjoy explaining to them how the code works. Nobody does that. Is there anybody that loves to do that, to explain? I don't enjoy it. When I joined HRS, I led the data platform team, and we built it from scratch together: the Kinesis data ingestion, the data lake, and S3. Here we are, about four years after, someone was asking me a question, and I'd left the team. I was like, how do I start explaining how this works? I got tired.
Then comes code summarization and explanation. What I'm showing you right now is an architectural blueprint that you can give a try. We have repositories that exist on GitLab, Bitbucket, GitHub; you can put that in an S3 bucket securely, making sure that there is no public access. That's very important. Then put that in an Amazon Bedrock knowledge base with OpenSearch, either provisioned OpenSearch instances or OpenSearch Serverless, and then, with a foundation model, the new team members can ask questions of it. Very easy.
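A minimal sketch of the query side, assuming the repositories have already been synced from S3 into a Bedrock knowledge base backed by OpenSearch, could use boto3's retrieve_and_generate call; the knowledge base ID, region, and model ARN below are placeholders.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

def ask_codebase(question: str) -> str:
    # Ask the knowledge base a natural-language question about the code.
    response = agent_runtime.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": "KB_ID_PLACEHOLDER",
                "modelArn": "arn:aws:bedrock:REGION::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            },
        },
    )
    return response["output"]["text"]

print(ask_codebase("How does the Kinesis ingestion pipeline write to S3?"))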
Even within GitHub Copilot and Amazon Bedrock, you can also have that in there, in the code repository itself. They can highlight the code and ask it to explain. This way, you can easily generate documentation that stays as close to the code as possible. You can have interesting automation in here.
For example, think about when you do a git push: you can attach a webhook to it to automatically update the documentation and publish it as HTML, where they can go and see it. Think of platform engineering teams as well that are maybe publishing CDK templates or CloudFormation templates or Terraform templates. These can really help onboard engineers into what you're building. This is something that you can get started with without writing much code.
Support Case Investigation
This is the bonus material. I work in a B2B environment. How many of us have had someone say there was a customer somewhere that complained about an issue in production, but our infrastructure monitoring never picked it up? What happens is, when a customer raises an issue, sometimes it takes a couple of days for us to find the underlying issue. One of the things that we realized was that the information about the issue reported by the customer exists in different systems. If you're using a super complicated architecture, with super complicated logging systems, it might be difficult to find.
Then the customer service agent, the application support, and the product manager are all pissed and wondering, why can't you find it? Here is a simple architecture. Let's say you have a case management system as an example. Someone logs in and creates a case saying, we have this issue in front of customer x, customer y. You can easily trigger a webhook, similar to the git integration earlier, which invokes an agent. What that agent does is go through your logging system.
You can do that with the Python SDK, boto3, or the Java SDK, saying, search CloudWatch with CloudWatch Logs Insights for this particular string. If you use Prometheus, you can use OpenTelemetry to query it. If you have a database that you have read-only access to, you can also put that in place to do a read.
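The log-search step he describes could look roughly like this with boto3 and CloudWatch Logs Insights; the log group name, look-back window, and query string are illustrative.

import time
import boto3

logs = boto3.client("logs")

def search_logs(search_term: str, log_group: str = "/app/production") -> list:
    # Run a Logs Insights query for the string reported in the support case.
    query = f'fields @timestamp, @message | filter @message like "{search_term}" | limit 50'
    start = logs.start_query(
        logGroupName=log_group,
        startTime=int(time.time()) - 3 * 24 * 3600,  # look back three days
        endTime=int(time.time()),
        queryString=query,
    )
    while True:
        result = logs.get_query_results(queryId=start["queryId"])
        if result["status"] in ("Complete", "Failed", "Cancelled"):
            return result.get("results", [])
        time.sleep(1)  # poll until the query finishes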
Then, once you find all that information, what LLMs are good at is really synthesizing unstructured data. What inspired this was what I described earlier, but, to give you a bit of background context, I think late last year we had colleagues in our HR department. They run surveys multiple times every year where they ask colleagues, how do you feel about the onboarding process? How do you feel about this? Give or take, they receive 240 responses, unstructured, in an Excel file. Who can go through 240 and still not have drunk maybe five or six cups of double espresso?
What I did was sit with the HR colleague. This is to say that, even when using LLMs to elevate developer experience, you can also extend this to other parts of the organization. I sat with the HR colleague and said, let's look at all these files. Let's put them in a large language model, and let's map the result of this file to our onboarding process. What would have taken seven hours, we did in five minutes.
Again, developers are not going away. We’re here to stay. Tell the suits that. What I’m saying here is, support case triggers a webhook, API gateway invokes a lambda function, queries all possible locations that you have your logs, and then sends the information back into the support case application. What used to take about three days, nine hours, whatever time, with this approach, you can do it in five minutes.
Then nobody has to sit on my neck and start asking me, where’s the answer to the question. This is very important. Like I said, you can think about multiple use cases and try to reshape this based on your organizational context, your current problems, but I can tell you that this would elevate your experience as an engineer, because now you focus on the actual problem solving. It’s good to explain code to the colleagues, but what we love is to create value with software.
Questions and Answers
Participant 1: Have you gathered any experience using GenAI for when you have a refactoring, like the upgrade you mentioned, that you have to roll out across 200 services, and the services are different enough that you cannot really script it? Have you ever had any use cases like that?
Elesin: Yes. One example was that we wanted to put a REST interface in front of an internal gRPC protocol. We put it in GenAI, and it messed up bad. What we did was to think about the problem we’re trying to solve and then write it by ourselves. We couldn’t trust GenAI in that regard. In some cases, it behaves really well. In this case, it was really terrible.
In fact, I could not believe the result myself, because then I had to sit on the call with the engineer, saying, can you please show me what went through? I was like, we have to do this on our own. I know, it happens. It's not perfect, but what it does is give us a head start.
Participant 1: If you have to roll that out across a lot of repositories, you could probably make the change once, show it to the AI and tell it to do the same thing everywhere else.
Elesin: At least what I know is that in Amazon Q Developer, you can integrate your own code and train the underlying model with your own code, but due to our own security compliance, we haven’t done that.
Participant 2: Can you give me an idea of what an AI code review looks like? You make a 2000-line code pull request, what does it say?
Elesin: One thing is that it doesn't have context of the entire application itself. What we realized is that with the new models, which say they have a 200,000-token context size, which is about a book of maybe 500 to 600 pages, you can actually give it the entire book, and then it's able to understand the context. You can give it the entire repository, and then it understands the context. What it does is say, this particular line behaves like this. This is the expected result; maybe adjust it in this way to simplify the runtime.
Then it comments one after the other that way. Very human readable and very understandable. Like I said earlier, I have a team that is using this integration as we have it, and every day it’s about four to five pull requests, anytime I open the instant messaging. On the other hand, we have teams that are not using this, so I did more like A/B testing with the team, and this is the result.
Participant 3: How much do you spend approximately on the tokens for [inaudible 00:21:13]? You might not need to measure it, because you're free on tokens, I assume. Was it more in the dollars or the tens of dollars? Because the repository can get arbitrarily long when it comes to the tokens you need to consume to even embed it.
Elesin: I didn’t see it pop up in my AWS costs. I know that if I exceed the budgets, my FinOps colleagues will reach out to me. So far, no.
Participant 4: Could you share some rookie mistakes which you did when starting to build on an AWS Bedrock. What are the things which you would recommend if I’m starting out now, not to do?
Elesin: What not to do when starting out with Bedrock? Number one, check with your security, check with your compliance. Number two, and this is my personal understanding: anything that we consider highly proprietary company information, intellectual property, don't put it in there unless you have the compliance check from your security department. I don't think it's so much about don'ts; I think it's more about do's. What I would say to do is to try it out immediately once you have some validation. We're in business travel at HRS, and we're also heavy on sustainability, which is helping companies align their company strategy, when it comes to business travel, with building a sustainable planet.
One of the things that we have is called Green Stay, which is about sustainability classification for the hotels that work with us. That information is very voluminous. One of the engineers on the team asked me a question: how can we simplify this? I said, you don't need to train an AI model, so put all this information in a Word document, because it's public information.
Let’s go to the Bedrock UI, put it in there, let’s start asking it questions. We have the version of that chatbot in a development area where someone could log into and actually play with it. This engineer had no experience with machine learning at all or AI at all. Check security. Get started immediately. Get into validation as quickly as possible. Do’s, not so many don’ts.
Losio: You mentioned before, one example is, I have Java 8, I want to bring it to Java 17, whatever it was. In that scenario, usually, you have Java 8, or whatever else it is, in a project that's been running for many years, with people who are probably quite senior, old, whatever adjective you want to use. Think of that scenario. You want to convince the team to do that, because it's easy to say, yes, Amazon did it, they migrated 10,000 projects.
Automatically, if you go to the developer team that has been playing with that code for the last 10 years, they are probably going to tell you, I'm not going to put in my entire code base and see 50 changes in every single class, all committed together tomorrow morning. How do you convince your team to follow you on your machine learning journey?
Elesin: How do you convince experienced engineers to use this tool to accelerate? It's a difficult one. In my area, we've all come to the realization that this journey is almost impossible without some automation in place. That's the first part. The second is, I had to show it. Like I said, we have a team that is using it, and the colleagues see the efficiency gain with the team using it. That's one of the proof points. The first is the fact that we came to the realization that this is almost impossible for us to do manually. The second is that we have a team that is using GenAI in one way or the other, and we're seeing accelerated value delivery with software.
The question is, why not go this route? Right now, when I'm hiring engineers for the team, I actually expect them to tell me that they've used GenAI to solve a problem, and then we go into detail. To your point, it's change management. It's difficult. For me, the communication is that, as an engineer, it's not about someone getting rid of you in five years. It's about you evolving to solve problems faster and get value into the hands of the customer quicker. It takes time, but yes, once they see the benefit, then they begin to understand. Because I myself also demonstrate how to use it. I think that's what I do.
Apart from the developer experience, I also work with product managers as well. Who loves estimations? We have refinement, now we have to do estimates. Then you get into the meeting and the product owner or the product manager says, I'm not sure, I've broken them down into user stories, but let's start discussing. Who enjoys those kinds of meetings? I ran a session, just to also explain to you that this takes time not only from an engineering perspective, but also from a product ownership perspective. I had a session with our product owners and said, I know that this is a problem, and I know that you might not trust this software, but let me show you the benefits of this.
First, map out who are the users of this and let’s take the problem that you have, put it into this large language model and say, generate user stories based on this problem, acceptance criteria, and then you’re better prepared to have discussions with the engineers in the refinement sessions. Now we have better refinement sessions because this helps better preparation.
The development experience, from that perspective, it also took time. Because I’ve mentioned it to the colleagues before, it took them time to understand, now they understand this, now it’s accelerating. The change management is difficult, but it is a journey that we keep on trying and improving over time.
Participant 5: You mentioned so many use cases for generative AI. One case you didn't mention is generation of test cases. Is that something Bedrock can't handle, or did you deliberately leave it out?
Elesin: I actually did this myself. Because I'm in business travel, like I said, we had a use case where we had so much work for our QA engineer to do, and nobody was taking care of this. Because I understood the context of the problem we wanted to solve, we simply took that and put it into the large language model, in this case I think it was Bedrock, one of the closed-source models on it, and it generated the test cases. It was then based on these test cases that we created tasks for other people, so that we could parallelize the work. It's possible, and I've actually done it myself. It does work.
Participant 5: If it does, then how can it generate test oracles for your program? How does it know that the program is doing what it's supposed to do? It doesn't know. You feed in code, which could be wrong, but the test case needs to come from the requirements. For example, if the requirement is that A plus B equals C, and you have programmed A plus B equals D, how would Bedrock or any algorithm know what test it actually needs to write?
Elesin: When it comes to generative AI, it's generative. For it to generate something, you need to give it a prompt. In my case, I understood the context of what was expected. There were policies with regard to the expected behavior of the system, of which I had a clear understanding. It was basically my clear understanding of the problem. I said, this is the expected outcome for the user, this is the expected outcome for the problem itself.
Based on this, generate test cases. Then we looked at the test cases; some of them made sense, and quite a number of them didn't make any sense. We ruled out the ones that didn't make sense. The ones that made sense, we simply took and said, let's create the work for the colleagues to do.
Participant 5: You need a human in the loop?
Elesin: Definitely, yes. As you saw, it starts from the engineers, and people are there throughout. There's always a human in there. In this case, the human was me.
Participant 6: Maybe one question on the specific topic you had with the repository and embedding it. How did you handle the generally weird structure of repositories being deeply nested? Because normally if you embed a document, you have this one embedding for this one document. Did you start building a hierarchy, or is it something that's just natively handled in Bedrock?
Elesin: I didn't care. I simply uploaded it into S3, but I excluded the irrelevant parts. What RAG, retrieval augmented generation, does is convert all the text into some vector representation and then store that in OpenSearch. For me, it doesn't matter if it's nested or not nested. It's simply making sure that it goes into the vector database, in this case, OpenSearch.
If you are planning to do something more complicated that might require hierarchical structuring, there have been techniques like graph RAG, which connects nodes and edges on top of retrieval augmented generation, and you can also try that. For me, I wanted to get it to work as quickly as possible. First phase, this. If it requires more fine-tuning based on the work, then you can add that hierarchical structure in there. First, don't care about the hierarchical structure.
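As a toy illustration of the indexing step he describes, the sketch below embeds text chunks with a Bedrock Titan embedding model and retrieves by cosine similarity; an in-memory list stands in for OpenSearch, and the chunks are invented.

import json
import math
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list:
    # Titan text embeddings on Bedrock: input text in, vector out.
    resp = bedrock.invoke_model(modelId="amazon.titan-embed-text-v1",
                                body=json.dumps({"inputText": text}))
    return json.loads(resp["body"].read())["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Index: nesting doesn't matter, every chunk becomes one flat vector entry.
chunks = ["def ingest(event): ...", "README: this service writes to S3", "terraform s3 bucket config"]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Retrieve the chunk most similar to a question.
q = embed("where do we write to S3?")
print(max(index, key=lambda item: cosine(q, item[1]))[0])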
Participant 6: I was just wondering if it handles it internally.
Elesin: Yes, it did it quite well.
Participant 7: Did you measure that? Because [inaudible 00:33:26] with installing the stuff, at least for Java code, not using Amazon but custom-built embeddings, and just started to search through the Java code, and it provided completely unsatisfactory results. Because if you just think about Java code, there are tokens that are not relevant to the language directly, and there should indeed be some techniques to overcome these limitations. Basically, yes, it's interesting. What were the results? At the end, were these new team members satisfied?
Elesin: How do you measure the accuracy of the response from large language models? Because it's not a classical machine learning problem, where you have an outcome or an output that you can predict, it's difficult to say the accuracy is this. This is another point where you need the human in the loop to do a thumbs up or a thumbs down, and say, this was relevant, this was not relevant. In this case, it was 60% relevant for the queries that we issued. The remaining 40% of the time, it wasn't relevant at all because it was missing the context.
We especially saw this in the pull request example where, when we first tried the pull request use case, we were only giving it the snippet of the code that was written, not the entire context of the classes that were involved. In about 60% of cases, there was a thumbs up that the responses were relevant.
Participant 8: You mentioned test scenario generation, which is nice; that will help during the refinement sessions. Is it also helping with scripting the tests, such as automated tests? There are some development environments which generate unit tests very easily, but for unit tests or API-layer automated tests, it would be really useful to have a level of automation where afterwards developers can make changes and make it workable. Maybe also a connected question on test automation coverage: if it is intelligent enough to detect the areas around pull request reviews, maybe it can also be useful for coverage.
Elesin: Can it generate unit tests? What's the unit test coverage? Can it also increase it?
For the team that uses GitHub Copilot, my expected outcome for them was to increase the unit test coverage of the projects that they have in their purview; the target was 70%. This was almost impossible for colleagues who had joined newly: to write unit tests and still ship value with code. They optimized by using GitHub Copilot to generate unit tests, which then increased our test coverage from around 0 to about 50%. We're still getting there, but we're not there yet. So it also increases the test coverage. I think they improved efficiency by 15% to 30%, and there's unit test generation, which is now increasing the unit test coverage of the projects in their purview.
Losio: You mentioned at the beginning that you didn’t use Amazon Q, I think you said, just for the SSO configuration. I was wondering if that’s the only reason, or actually you got better results as well with Copilot. What’s your feeling about it?
Elesin: That was the reason. Now we have it enabled. The main reason, and the only reason we didn't use it initially, was access. With Amazon SSO, we had Azure SSO already. Why maintain two things? That's a security exposure on its own. That was the reason. Now we've solved that.
The other side of GitHub Copilot is that we didn't get usage statistics, so we didn't know how many people were using it and what they were using it for. Now we're switching to Q. It gives us cost visibility. It shows us how many times engineers accept the recommendations from Q Developer itself, and the number of times recommendations are rejected. It also gives us visibility into how the cost is developing. We just started rolling that out. The main reason was the fact that we had to use AWS SSO, which at that point we didn't want to use.
Losio: I was quite curious about the integration of different services in that sense. I was curious whether there was also a choice to, say, pick the best of every provider, or integrate different providers, and how that works as well, or whether it's more of an effort for them.
Elesin: It was more of the security. Now, also to your point, we want to do a comparison to be sure of what to roll out at scale for the entire organization. Because, for us, we want to double down on this as much as possible. It wasn’t the comparison. It was more of the limitation at that point in time.

MMS • Anthony Alford
Article originally posted on InfoQ.

Google DeepMind’s AlphaGeometry2 (AG2) AI model solved 84% of the geometry problems from the last 25 years of International Math Olympiads (IMO), outperforming the average human gold-medalist performance.
AlphaGeometry2 is a new iteration of DeepMind’s earlier geometry AI, AlphaGeometry (AG1), which could only solve 54% of the IMO problems. Both models operate by using a domain-specific formal language to describe the problems and a symbolic deductive engine to generate proofs. The new model’s improvements include a more powerful LLM based on Gemini, which translates the natural language form of the problem into formal language. AG2 solved 42 of the 50 IMO geometry problems from the years 2000 to 2024, while the average gold medalist solves about 41. Flagship commercial reasoning LLMs, such as OpenAI’s o1 and Gemini Thinking, cannot solve any of the problems. According to DeepMind,
Despite achieving an impressive 84% solve rate on all 2000-2024 IMO geometry problems, there is still room for improvement…AG2 has not solved all IMO and IMO [short list] problems. We hypothesize that breaking problems into subproblems and applying reinforcement learning approaches could close this gap. Finally, in this paper we reported progress on building a fully automated geometry problem solving system, which takes input in natural language and outputs a solution reliably without any hallucinations. Despite good initial results, we think the auto-formalization can be further improved with more formalization examples and supervised fine-tuning.
AG2, like AG1, solves geometry problems by stating them in a formal language which consists of predicates: for example, acompute a b c d means “Find the angle between AB and CD.” AG2’s predicates can cover 88% of the IMO problems; the model will not attempt to solve the other problems.
But first, the problems written in natural language must be expressed in this formal language. To do this, DeepMind uses a Gemini LLM with few-shot prompting: the prompts contain “several dozens” of examples of problem translation. This approach is “very consistent and makes almost no mistakes” on the easier problems.
Once the problems are specified as formal predicates, they are solved using a symbolic engine called Deductive Database Arithmetic Reasoning (DDAR). If the engine fails to find a proof, AG2 uses a language model and tree search algorithm to generate auxiliary constructions, then it re-runs the DDAR engine; this loop is repeated until a proof is found.
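DeepMind has not released AG2's code, but the control flow described here can be sketched schematically: run symbolic deduction to a fixed point, and if the goal is not reached, add proposed auxiliary constructions and try again. The toy Python below illustrates that loop; the names are illustrative, the rules and the construction proposer are user-supplied callables, and this is not DeepMind's implementation.

from typing import Callable, Iterable, Optional, Set, Tuple

Fact = Tuple[str, ...]  # e.g. ("cong", "A", "B", "C", "D") for a congruence predicate

def deductive_closure(facts: Set[Fact],
                      rules: Iterable[Callable[[Set[Fact]], Set[Fact]]]) -> Set[Fact]:
    # Forward chaining in the spirit of DDAR: apply rules until no new facts appear.
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(facts) - facts
            if new:
                facts = facts | new
                changed = True
    return facts

def solve(facts: Set[Fact], goal: Fact, rules, propose_constructions,
          max_rounds: int = 8) -> Optional[Set[Fact]]:
    # Alternate symbolic deduction with proposed auxiliary constructions.
    facts = set(facts)
    for _ in range(max_rounds):
        facts = deductive_closure(facts, rules)
        if goal in facts:
            return facts  # goal derived; a proof trace could be recovered from here
        aux = propose_constructions(facts)  # in AG2, a Gemini-based model plus tree search
        if not aux:
            return None
        facts |= aux  # add auxiliary points/lines and deduce again
    return None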
Writing on X, Berkeley CS PhD student Yuxi Liu said,
AlphaGeometry2 is pretty cool, but clearly not bitter-lessoned. It has a very 1950s auto theorem proving feel, with handcrafted representation language, logical inference engine, etc…They are just doing autoformalization (succeeding 30/39) and proposing auxiliary constructions during tree search. Many of them require just a single auxiliary construction! Though there are cursed examples that required 12.
Oxford University ML researcher Simon Frieder also wrote on X:
AlphaGeometry2 was published, 2.5 months since we released Newclid without much fanfare (in true scientist style! :D) and two months after TongGeometry. It seems no code was provided for AG2. So now we have two closed systems, AlphaGeometry2 and TongGeometry that we cannot compare. Newclid…is fully open-source, fixed many AlphaGeometry bugs and slightly improved it in terms of performance – and we also have GeoGebra support for better input.
Although the AG2 code has not been released, the code for AG1 is available on GitHub.

MMS • Craig Risi
Article originally posted on InfoQ.

AWS has introduced a new capability for AWS Organizations members, allowing administrators to centrally manage and restrict root-user access across multiple AWS accounts. This update enhances security and governance by providing organizations with greater control over the most privileged access within their cloud environments.
Administrators can now get a consolidated view of root-user access across all accounts within an AWS Organization. This includes insights into whether multi-factor authentication (MFA) is enabled, helping security teams enforce best practices.
With the new functionality, AWS Organizations can enforce service control policies (SCPs) to regulate root-level actions, either restricting them entirely or allowing them under specific conditions. This strengthens security by preventing unauthorized use of the root user across accounts and ensures compliance by enforcing critical controls, such as requiring MFA before executing sensitive actions. By mitigating the risk of misconfigurations or accidental privilege escalations, these policies help maintain a more secure and well-governed cloud environment.
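As a sketch of what such a policy could look like, the snippet below creates and attaches an SCP with boto3 that denies actions taken as the root user; the PrincipalArn condition is a commonly used pattern for matching the root user, and the target ID is a placeholder for your organization root or organizational unit.

import json
import boto3

org = boto3.client("organizations")

# Deny any action performed as an account's root user, organization-wide.
deny_root = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyRootUser",
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {"StringLike": {"aws:PrincipalArn": "arn:aws:iam::*:root"}},
    }],
}

policy = org.create_policy(
    Name="deny-root-user",
    Description="Block actions taken as the account root user",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(deny_root),
)
org.attach_policy(PolicyId=policy["Policy"]["PolicySummary"]["Id"],
                  TargetId="r-examplerootid")  # placeholder target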
AWS recommends keeping root access to a minimum, using it only for essential operations, following the concept of least-privilege access, and preventing any user from having access to full admin capabilities.
With centralized management, organizations gain greater control and visibility over root-account activity. They can now monitor when and how root accounts are accessed, tracking usage across all accounts to detect potential unauthorized access or security threats. Security teams can also audit compliance by ensuring that root users adhere to organizational policies, such as requiring multi-factor authentication (MFA) or restricting high-risk actions. Additionally, administrators can enforce MFA and apply service control policies (SCPs) to limit root-user privileges, ensuring access is restricted to only essential actions and reducing the risk of misuse or compromise. Should a person need root access to perform a specific task, there is still a provision for a root session that grants this access temporarily, without giving anyone this level of access permanently.
Previously, organizations in AWS had to manage root-user access at an individual account level, increasing the risk of inconsistent policies and potential security gaps.
Both Azure and Google Cloud also provide hierarchical management structures and centralized identity and access management through their respective Management Groups and Identity and Access Management systems, and this update brings AWS in line with these approaches.
This feature is available to all AWS Organizations customers. Administrators can configure root access policies within AWS Organizations and use AWS IAM policies and SCPs to enforce restrictions.
Presentation: A Zero Trust Future for Applications: Practical Implementation and Pitfalls

MMS • Ashish Rajan
Article originally posted on InfoQ.

Transcript
Rajan: I’ve been in cybersecurity for a little over 14 years. For the past 7-plus years, I’ve been primarily working in the public cloud space, so AWS, Azure, Google Cloud, Kubernetes, cloud native. That’s primarily the space I’ve been. I’m fortunate enough that I’ve worked with a lot of Fortune 500 companies. I’ve worked with them on strategy or how to get those things implemented.
A lot of the things you would hear are learnings from what a lot of people have tried doing, what we have failed at. Hopefully that comes across as well. My last one was a CISO. I’m still a CISO for now in an EdTech company called Kaizenteq. I recently discovered my love for cloud security training. That’s the Cloud Security podcast I run with my co-founder.
Zero Trust (ZT) – Basics
I know we have a few different levels in terms of experience and people who've seen zero trust. I did want to start by leveling the playing field for everyone. Don't worry, I will not try to bore you with a government diagram, which is a very tiny one there.
Essentially, the way I would describe zero trust (as much as there's a negative connotation to it, and a lot of companies try to change it to something that sounds better than zero trust) is that the initial thinking behind it was that we do not trust where the communication is coming from, so I want a "trust zone" created, whether it's around my network, my identity in the network, the applications that are running in the network, or my devices that are in the network. There are different ways to describe this.
Another common term that is used quite often is ZTA, or zero trust architecture, which a lot of people talk about. The idea is that you're using the principles of zero trust to build an architecture. When you Google it, that's where most of your terms come from: ZTA, ZT, and this one called ZTNA, which is zero trust network access.
I did want to show this. This is the NSA diagram for how they describe zero trust, from the 8th of March 2024. You'll probably find, or at least when we started googling for it, because we were trying to find a consulting company that could do it, that most of the time you land on a government document. The reason for this is that America, as they do, started saying, we need zero trust everywhere, especially when the presidential order came in.
A lot of the government documentation got updated, and they had a timeline: they had to do something about it within the next two years. You'll find a lot of updated documentation, if you're looking for a reference point, primarily from the government organizations, as they're trying to do this at their end. How much of it is done? It's a work in progress. I'll go through the pillars in a bit. For anyone from the UK, I did try finding a local source; NCSC has some documentation, but it has not been updated since 2021.
The point being, they all still rely on the same basics that I called out earlier. If you are thinking in terms of where it is important, and maybe you are not in the public sector at this point in time, there's a very popular analyst firm which makes all these predictions and makes our life difficult with more acronyms. Gartner is the company that I'm talking about. They came up with a prediction that by 2025, 60% of the public sector would be at least doing something in zero trust.
In fact, if you talk to most government organizations across the U.S. and some parts of the UK, they already have some projects in line starting with zero trust. A lot of conversation just comes up: "We're doing this for zero trust. We're working towards zero trust". Mostly in the public sector. I personally have not seen a lot of the private sector talk about it as much, for different reasons. Most of them are busy with GenAI, but maybe zero trust will come in soon.
In terms of the market cap, and I feel I should have an asterisk next to it, because this is a second prediction from Gartner on how big the market would be by 2027: it's $3.99 billion. If I go back to my previous slide, the public sector usually has a lot of money, so I can imagine that's where the money is coming from. I'm sure it's a mix of private and public. That's where the number is coming from.
Zero Trust – Practical Foundations
Now that I've laid the foundation for zero trust, and at least everyone understands it, I wanted to add a few more layers onto that diagram that I was talking about earlier. This is the simplest way to understand zero trust. When people talk about zero trust, they usually talk about these five pillars: identity, device, network and environment, application workload, and data. A lot of you are already quite experienced, and I don't need to explain what each one of them means.
In the context of zero trust, this is more about how these five foundational pillars are applied. We're not going to go into the diagram. If you just focus on the middle, there's a thing called a policy enforcement point. The whole idea of zero trust is that across these five pillars, we have some policy engine or policy enforcement point that helps us make the call: is Ashish the right person to authenticate? Yes, he or she has the right username and password. Is Ashish authorized to log in to this? That's another policy call. Is Ashish coming from a trusted network? That's another policy call.
The idea is that you would be able to use a policy engine and hopefully get to a point where you can automate a lot of that policy to be able to do zero trust. I do want to call it out: it's an ideal scenario. This is an ideal diagram. I don't know how many people have policy enforcement points. We have a lot of policies and procedures that we have seen and worked with for a while. This is where they would like people to go.
Again, this is a diagram from 2021. I do want to keep you updated in terms of the timeline, in terms of how quickly things are moving. At this point in time, the idea behind those five pillars is to be able to put this through a policy engine that helps us make the call, because we don't really want to be doing manual approvals every time Ashish wants to access this HR application.
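To make the policy-engine idea concrete, here is a toy policy decision point in Python; it is illustrative only (real deployments use engines such as Open Policy Agent or vendor products), and the signals and rules are invented examples.

from dataclasses import dataclass

@dataclass
class AccessRequest:
    user_authenticated: bool
    mfa_passed: bool
    device_trusted: bool       # e.g. enrolled in unified endpoint management
    network_zone: str          # e.g. "corporate" or "internet"
    resource_sensitivity: str  # e.g. "public", "private", "confidential"

def decide(req: AccessRequest) -> str:
    # Evaluate every request against identity, device, and data signals.
    if not (req.user_authenticated and req.mfa_passed):
        return "deny"
    if req.resource_sensitivity == "confidential" and not req.device_trusted:
        return "deny"
    if req.network_zone == "internet" and not req.device_trusted:
        return "step-up-auth"  # ask for a second form of authentication
    return "allow"

# Example: a trusted device on the internet may reach a private resource.
print(decide(AccessRequest(True, True, True, "internet", "private")))  # allow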
This is the foundational piece. I’m not going to talk about the five pillars, because I think you guys are smart enough for that. I wanted to start by what they mean by what should the zero trust journey or architecture stand on? I’m assuming everyone has some understanding of IAM. Everyone knows IAM? All of us know authentication. Everyone knows username, password. Everyone knows the fact that we need to have single sign-on, if we want it to be like a federated authentication. The reason I bring this up is, identity has become the new perimeter across cloud, across your on-premise, even though network used to be the first one.
The way I would describe identity these days is how on-premise used to be, at least for people who are from a security background. The way we joke about this is that on-premise was like a castle. You have one bridge in, one bridge out. If you get in, you can access everything you want. If you have to get out, there’s this one path out to the internet. I would describe public cloud as an amusement park. Imagine Disneyland, we have multiple entries, multiple exits, you have no idea who’s coming in, but everyone wants a VIP pass as well.
They want to get on every ride that’s possible. You’re trying to go, I get it, but why don’t we just start with the limited pass first and then start adding a lot more? The point being, identity is a lot more complex. When we talk about identity in 2024 where we have cloud environments, we have on-premise environments, we even have OT, IoT devices that are out there as well.
We have started having a lot more conversation about not just human users, but non-human users as well. I'm sure people who are in the cloud space already know this. Whether you're on Azure, AWS, or GCP, you're already dealing with non-human users: machine users that are going to just do a task, servers that have identities and permissions that can be used as well. The foundational pillar for zero trust in 2024, at least the way we have envisioned it, has been more around the fact that there are human users and non-human users. For example, for humans, we know MFA works.
For humans, I want to know that, yes, Ashish has authenticated. What does that look like from a non-human user perspective? What is the machine user doing already? As I say that, I will also say, this is probably the place where most people start. A lot of us have already proved we know IAM. A lot of us have been doing IAM for a long time. We've already started on the zero trust journey without even knowing it. We wouldn't let any random person on the internet just authenticate to our application. Hopefully I'm keeping some part of my promise, where all of us are at least walking away doing zero trust, or at least starting to do it.
The second thing that we will talk about after this is: yes, I've made sure that Ashish has the right username and password, but is he coming from the right device? Do I trust the device he's coming from? That's the second layer of foundation. If you're trying to implement this in your organization, the reason people start with identity as the very first tier is because it's probably the most understood so far. It's also the place where we have the most maturity in most organizations, since we've been doing it for so long. Unless you're using a custom application. I don't know if someone works in mainframe? Those things still use a numeric password, like a 4-digit numeric password. Hopefully you're not in mainframe.
Outside of that, primarily, you'll find we have a good handle on identity as a general tech community. The next layer that is spoken about these days is non-human users. The one after that is the identity of the actual endpoint, or the device, or the server, or the laptop that we're coming in from. That's our second layer of foundation where, once we have started working on the identity piece, we have some sense of: I'm pretty confident identity is good. I've got MFA for human users. I hopefully have least privilege or role-based access control in place to have some confidence that only the right people have access to the right information.
The next layer of zero trust you would probably think about is: the device they're coming in from, is that trusted? So the second foundation is unified endpoint management. As you start on that journey, a lot of people start doing network segmentation, which is another term people use. They said, on-premise, I had this super DMZ zone that I've maintained for a while. I'm able to look at this and go, I've got a demilitarized zone. Anyone can do anything in there, but I have a private zone. The point being, you need to know what your identities are and what the endpoints are going to be that you trust. I will talk a bit more about this in a later slide when I talk about the use cases.
The next one, and this is probably the hardest one to go for: resource ownership, tools, and processes. Everyone, I'm sure, has an asset management system which is very mature, very dynamic, which is more than an Excel sheet. For context, I was on a call with a CISO for a FinTech company, and they have over 400 AWS accounts. They were using Google Excel for recording 400 AWS accounts. I'm going, "This is great. You'll be fine". My hope is you guys have a better one.
The reason I say this is the hardest one is because of the complexity of environments these days, you have on-premise, hybrid cloud, multi-cloud, cloud native. There’s Kubernetes self-hosted, Kubernetes which is managed. Complexity in compute as well. Now we have multiple CI/CD pipelines. A lot of you are dealing with multiple languages being used in the organization as well.
At the same time, you have to find the balance on being developer friendly, because you don’t want to limit their speed. These days it’s not easy for you to keep at least a log of how many real-time assets you have. If you were to go down the path of doing zero trust from a foundation perspective, I personally feel this is probably the hardest piece. Like, identity, we got this.
Endpoint, to an extent: we have all these endpoints, and at least someone in the corporate network knows how many devices we have. Someone in the DevOps team or cloud security team or cloud engineering team would know how many cloud accounts we have. It gets a bit muddy in more complex environments. At least for me personally, I found this is the hardest one. Because even if you get the resource, the next hardest part is, who's the product owner? Who's the owner for this? Is the owner still in the company? No idea. The longer the organization has been in existence, the more complex this third one gets. That's why, at least for me, it's the third one.
Data classification, of course, is a security thing, so we have to talk about data as well. At least since I moved from Australia, I find data is even more important in Europe and the UK. GDPR is actually a thing. I've been using it wisely. A lot of organizations sometimes don't even have a data classification. I was joking about the whole GenAI and AI space earlier. I've been fortunate enough, through the podcast, to get to talk to a lot of people.
A lot of people have over 200 GenAI or AI related applications already that they're working on today. Kubernetes-first, or containers-first, is a very real strategy for a lot of organizations that are trying to go fast, even on AI projects. You'll find that what is not spoken about at the moment is the incident response for it. Being security people, we're a bit paranoid. There's a reason for that paranoia sometimes: if an incident does happen, do we know the risk that we are exposing ourselves to? Is it a high risk? Is it a low risk? Is it really something that we should be worried about, if it's just public data? I'm like, yes, we have the website, as long as it's not defaced.
What if it's PII, or personally identifiable information, like my driving license or my passport number? Think about this from a developer perspective: in the application you're building, you would not want to have any secrets or anything which is sensitive customer data exposed, because that's where the trust of the actual customer comes from. This one is partially easy, because you can have a data classification for it. The simplest one, if you've never done this before, is literally: what is confidential, what is private, what is public.
Those three are the simplest data classifications to go for. Anyone can do it. As an organization, it's very easy to tell what is confidential, what is private, what is public. The hard part over here is, imagine if you are an organization that's been there for years, from before the internet, and there are a lot of companies that have done this: from before data centers were a thing, then there were data centers, then they moved to cloud, and now they're doing multi-cloud plus the data center.
One of the biggest challenges we found was that a lot of data from a certain number of years ago is no longer relevant. Would you have the time and money to spend on going back over that mountain of data that has been left for years? No one has classified it. No one knows if it's even relevant. No one wants data that is probably about something which is not even a system these days. Is the business ready to spend money on a project that's going to go through data from 25, 30 years?
At what point do you draw a line and say, actually, I only care about data from the last 10 years or 5 years, whatever that tenure is? As you can see, the practicality keeps going down as the complexity keeps going up. The intent is to at least have you informed about what is practical. Hopefully this gives you that information.
On data classification implementation: foundationally, we spoke about having a data classification, and we know confidential, private, public. How do we find out about this data? The first challenge we had was, how much data are we ok to classify? The second challenge we had was data sprawl. Maybe all of you have done a big data project, when people were talking about big data before GenAI.
A lot of that conversation was primarily around: we have a data scientist who happens to work for a university who wants access to the data, and I promise they'll delete it after. Give it to them. They said they deleted it. You have no idea if they've actually deleted it. That's just one scenario; that's called data sprawl. That's a very real thing as well. There's a whole category, again one Gartner has created, called DSPM, if people are interested. It's data security posture management, which helps you identify any sensitive information across your network. The idea being, it is a real challenge.
Even if you were to just classify data, just to identify where's my data, if you still want to call it that, that's basically the other part which a lot of people struggle with. I wouldn't even know where to start if I were to just go 20 years, 10 years, 5 years back. How many projects have we done? We have 400-plus applications, with many contractors that came in and went; is that data on their personal laptops or not? Who is going to answer that question? At least that's where I felt the complexity was for that.
The other one is unified logging. This is kind of like IAM. A lot of us already do logging for performance monitoring. A lot of us do logging for error management and troubleshooting. Some of you may already have separate security logging as well. The idea behind the zero trust foundation is also to have unified logging in a data lake. I feel like this data lake word has been thrown around everywhere. Everyone has a product called a data lake as well. I think this is where it came from: unified logging.
The idea behind that all of us, instead of just basically having these multiple sources for logs, we should probably be able to have all of that into a central storage called data lake. As we do that, we can use it for security, we can use it for performance monitoring. We should be able to use it for troubleshooting and any other unusual activity that you have to monitor on a day-to-day basis for application. That’s where the unified logging comes in for.
The biggest challenge you find here is that there is no unified framework for how do I differentiate between an application log versus a security log versus a memory log? There is no common framework that brings all of it together, so I can just type in a query for, Ashish logged into linkedin.com today, and he basically made a post, which was weird, because he’s at the moment speaking, but somehow there’s a post out. Someone has to go and find out that in the log.
Separating that information, what that query would look like, that’s a lot more complex question to answer. There are some answers. Cybersecurity specifically has an open-source cybersecurity framework for logging. Security logs can be generally categorized into a known template so you can use it.
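To make that concrete, here is a minimal sketch, not from the talk, of what a unified log envelope might look like; the UnifiedLog type and its field names are hypothetical, not any particular framework's schema:

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// UnifiedLog is a hypothetical envelope that normalizes application,
// security, and infrastructure logs into one queryable shape.
type UnifiedLog struct {
	Timestamp time.Time         `json:"timestamp"`
	Source    string            `json:"source"` // e.g. "app", "security", "infra"
	Actor     string            `json:"actor"`  // user or service identity
	Action    string            `json:"action"` // what happened
	Fields    map[string]string `json:"fields"` // source-specific details
}

func main() {
	rec := UnifiedLog{
		Timestamp: time.Now().UTC(),
		Source:    "security",
		Actor:     "ashish",
		Action:    "login",
		Fields:    map[string]string{"target": "linkedin.com"},
	}
	b, _ := json.Marshal(rec)
	fmt.Println(string(b)) // one line per event, ready for central storage
}

The point of the envelope is that the query from the example above becomes a filter on actor and action, rather than a hunt across differently shaped logs.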
The last one is probably my favorite topic these days, at least for 2024. In general, there has not been a lot of conversation about incident response in the public cloud, cloud-native world. Most organizations I speak to believe their on-premise incident response plan works one to one in the cloud context. Even though we now use Kubernetes to build the same application, and we've rebuilt it cloud-native, somehow we have complete confidence that the incident response plan from on-prem is going to work in cloud.
The idea behind including incident response in zero trust is that, if you successfully get through all the earlier pieces, then over time the number of incidents should go down. You should technically get to a point where you can connect without a VPN onto any network, and the system should be able to validate that, yes, Ashish is coming from a trusted device. It doesn't matter that he's on the internet; we know that device. We know Ashish has the right credential. We can maybe ask for a second form of authentication if we want to level up the trust.
Primarily, we trust where he or she is coming from. That level of trust, over time, should mean the number of incidents you have to respond to goes down. Initially, though, you'll find that what incident response looks like in the new world of zero trust, once you've gone through all of that, changes significantly.
Even something as simple as: there's an incident, how do I give my SOC team or incident response team access to that environment? That's a very difficult question to answer when people have multiple ways of doing zero trust. Those were the foundational pieces. Hopefully that at least gives you a sense of how real some of these foundational pieces may be in your organization.
Zero Trust – Misconceptions
I want to talk about some misconceptions as well. Some of you have tried doing zero trust; some of you may only have heard about it. The first misconception I'll talk about relates to that asset management thing I was complaining about earlier, the Excel sheet with 400 AWS accounts. It's not a bad place to start. At least you know what you're looking at. As much as I was complaining about it, the myth is that you need a perfect inventory before you can start doing zero trust.
Let's not aim for perfection, because even the zero trust people themselves don't aim for perfection. It's supposed to be a journey. If you have an inventory of the critical applications you want to enable with zero trust, that's a good place to start. You don't need a perfect inventory. The other misconception is that you can buy a product for zero trust. There are a lot of vendors, including Microsoft, who will say you can just buy zero trust.
A lot of vendors have started claiming they'll be that one solution for zero trust. We all know how realistic that is. There is no product that can solve zero trust for you. Just go back to the foundational pieces: there's no machine out there that can solve that problem for everyone. Even if one tried, only part of the problem would be solved; making sense of all of it together in one product isn't practical. That is definitely a big misconception. Hopefully people don't have that in mind.
The other one is the end-state vision. I was talking about perfection in my first point. Most of the zero trust work we have done has been anchored on having a North Star. We start from the position that we at least want identity, which is the new perimeter we're dealing with, to be zero trust. What that looks like may differ with the risk level of your organization.
Some people are ok with saying: for me, zero trust is that Ashish can log in from any device he wants, as long as he has the right username and password and has MFA. Some people may say, actually, that's not good enough: I want Ashish to come from a laptop issued by the company with the right software on it, so we can check for endpoint security as well, or I have a device log that tells me what that device is. It depends on how you want to approach it.
Depending on how flexible your organization is with that definition of what zero trust could look like for you, feel free to make the choice, because no one has really set out a prescriptive architecture for how you build zero trust. You will find various examples of people implementing their own version. The best version is the one that works for your organization: the best tool for the job.
Obviously, I keep getting caught talking about the identity perimeter, but the network perimeter is still important. We still have on-premise environments; a lot of us still work in data centers. The network perimeter hasn't gone away; it's just that now you need additional context around it as well. Some of you may be moving beyond the network perimeter soon, but if you feel it's no longer needed, it is still very much needed, because that's what your trusted zone will be. That was all for misconceptions. I'm sure there are plenty more; I just wanted to add these in. Don't buy a product which says zero trust: that's probably the theme there.
Zero Trust – Business Use Cases
Business use cases. I did promise that you'll at least be able to walk away knowing whether you can implement zero trust today, or, if you have implemented it, what else you can add to it. I'm going to go through a few use cases. The first one, again, links back to identity; as you can see, there's a theme: human to application. We've been doing human to application, username and password, for a long time. To a large extent we trust that a username and password with MFA, or some other verification of trust, is good enough. That makes this a really good use case to start building zero trust, at least with identity in mind.
As I said, based on the risk level of your organization, you may already be doing zero trust because you're doing IAM with MFA, or federated identity with MFA. Service to service is the next business use case. This is not just your application talking to another application; it's also your application talking to your cloud service provider. For people who use Terraform, it could be Terraform Cloud. It could be your CI/CD pipeline. There are a lot more services in play these days that are not custom, not a thick application installed on a server. A lot of applications these days have APIs and allow programmatic access.
Everything you do on a cloud service provider is enabled by an API. It has authentication; it allows for MFA. In a lot of business use cases, you'll find organizations start with human to application and get comfortable there first. In this context, humans are internal employees, but in your case it may also be external customers who use your SaaS application. Or maybe you're a bank and I'm a user who complains about you on the internet.
That's probably the most well-known business use case; people have been doing it for a while. It's about adding layers based on the foundation we spoke about: which layers would you want to add for zero trust? The same goes for service to service. A lot of us may already be doing mutual TLS between services; when two microservices want to talk to each other, you probably want to authenticate them.
If there's a backchannel, you can use SSL certificates for it. Or you could go down the path of saying, we probably want some user whose passwords we rotate. That's a bit more complex, but mutual TLS usually does the job. What layer would you want to add on top? Would you want a trusted network? Would you want data classification underneath, so that for any external third party, only non-personal, non-PII data goes out? Again, it's up to your organization, but that's what this business use case is about.
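As a minimal sketch of the mutual TLS option, not from the talk, assuming Go and an internal CA (the file names ca.pem, server.pem, and server-key.pem are hypothetical), a service can require every caller to present a certificate signed by that CA:

package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"os"
)

func main() {
	// Trust only the internal CA for client certificates (hypothetical path).
	caPEM, err := os.ReadFile("ca.pem")
	if err != nil {
		log.Fatal(err)
	}
	caPool := x509.NewCertPool()
	caPool.AppendCertsFromPEM(caPEM)

	// This service's own certificate and key (hypothetical paths).
	cert, err := tls.LoadX509KeyPair("server.pem", "server-key.pem")
	if err != nil {
		log.Fatal(err)
	}

	srv := &http.Server{
		Addr: ":8443",
		TLSConfig: &tls.Config{
			Certificates: []tls.Certificate{cert},
			ClientCAs:    caPool,
			// Require every calling service to present a certificate
			// signed by the internal CA: never trust, always verify.
			ClientAuth: tls.RequireAndVerifyClientCert,
		},
	}
	log.Fatal(srv.ListenAndServeTLS("", ""))
}

The extra layers the speaker mentions, trusted networks or data classification, would sit on top of this baseline rather than replace it.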
OT and IoT environments are probably the next business use case: environments that have increasingly become software-defined networks. You can have APIs there. You can communicate with devices that traditionally required someone to walk up, plug in a laptop, and update the firmware over USB. We've come a long way from that. There are sensors available on devices these days. Think about any physical thing; I'm thinking of a tractor for people in the farming industry.
Toll gates on the road are basically automated as well. I did an interesting project in Australia, where on our motorways we used to have these massive boom gates, and now you never have to tap in or tap off anywhere. Most cars these days are smart enough to have that tiny box which detects that you've passed a gate. The security part of the project was: how do you make sure the information collected at those toll gates on the highway or motorway keeps its integrity?
Because literally no one is standing at every toll gate to make sure nobody has plugged a laptop in. How do you trust that information? How do you trust that, yes, Ashish entered the motorway at the first gate and came out at the tenth gate, or the second gate? Can he bypass it? A lot of interesting scenarios came up as part of that project. For me, that was the closest experience I had with this space. I'm sure your scenario would be a bit different. We didn't end up doing zero trust there, I think, but we applied some of the principles. It was an interesting project to be part of.
Next is operator to infrastructure. This is much more about automation. Maybe NoOps is a thing, apparently, but we're very far from it; as much as I've tried to see it, it's not there. I personally feel all of us generally want to get to a point where humans don't have to intervene much in things that can be automated. Even things that are automated these days still require a human to trigger the action.
The idea is that if there is a known set of processes we can work with, we should just be able to use them on a schedule. My alarm at 7 a.m. should just go off every weekday; I shouldn't have to set it each time. The use case here, and I personally have not seen it fully worked out, though some people say they have done it, is getting to the point where the automation you already do flows through zero trust.
How do we get that automation through zero trust, so we don't have to level up our trust every single time, but can still automate? For those of you who may have tried doing automation with MFA, I can tell you, it's super hard. Any engineer will tell you it gets difficult in a complex environment. I personally haven't seen this done, but it's a business use case that has been shared publicly on the internet, so I wanted to include it.
Last one is human to data. This is more in the privacy space, as well as knowing what data we have access to. I spoke about data classification earlier, so I won't say much more; it's fairly obvious what the human-to-data piece is. The final one is probably more important: custom applications. A lot of us have legacy applications. Hopefully no Windows 2000 anywhere, but if there is, you can remove it now; I give you permission to remove Windows 2000 from your environment.
The idea being, we still carry a lot of legacy applications because they are critical for the business. A lot of what I spoke about earlier focuses on microservices, cloud environments, Kubernetes, containers, but these legacy applications are still required. They are still going to be there for another 20 or 30 years; mainframes probably even longer.
Once you've done most of the use cases above, you should eventually get to the point where you can work on the custom applications you developed internally, which may use just a username and password and may not even have MFA. How do you develop trust for those? I haven't seen anyone do it, but it's the ultimate dream for the people who get there.
Zero Trust – Where to Start
I spoke about the foundation, the misconceptions around zero trust, and the use cases. For those of you who have not started yet, this is a good way to set the foundation for where and how you want to go. The first item is a reasonable zero trust project goal.
Coming back to risk level, your organization may just say: we've got the identity pillar covered because our applications authenticate, require MFA, and no unauthorized user can access them. You're happy with that coarse grain; you don't want to go fine-grained on how many checks you perform when a user authenticates.
That can be a good enough goal. It's about articulating what the North Star would be for your organization if you were to walk the journey of becoming zero trust, or at least of getting toward that end point. What does that North Star look like for you? That is the first place I'd recommend you start. Ideally with identity; it's my personal favorite because a lot of us already understand it. It's already been done.
Human to application is another one we've done for years. Those two are the easiest. Networking and data classification come with the challenges I've described. With data, the moment you start the conversation, at least the last time we tried, it got shut down really quickly, because who's going to spend the money to go back over 20 years of data?
That basically meant a dead end, and I totally understand it. On its own it's not a business case. Is the upside of having one or two resources go through 20 years of data, classifying what we actually care about and what we don't, probably spending a whole year on it, really worth the investment? Or should we just get on the GenAI train instead? Maybe. It's a question for the organization. My personal recommendation for the big two would be the identity piece and human to application. Those are usually good ones to start with, because we've been doing them for a while.
We already talk about service to service in terms of microservices; a lot of that has already been done and the foundation is there. We just need to define what our North Star for zero trust looks like. That would be my recommendation for the big two. You could totally go down the data path and do other things as well.
The next one is probably hard for everyone: for developers and security people alike, documentation is not a thing. Hopefully GenAI makes it easier; maybe it's now possible that we don't have to document by hand anymore. The way I see it, a lot of us have started down the path where most of our environments are dynamic. A lot of us may be mature enough to make changes daily, weekly, monthly, or quarterly, much more frequently than we used to. If you are one of those organizations, you probably already need a living architecture diagram that keeps changing, or maybe it doesn't change.
For zero trust, more than the living part, it's the flexibility part: the diagram should give you the flexibility to add more layers rather than make things more stringent. That's where the living architecture piece comes in. It's not about declaring a three-tier web architecture that will do all these microservices and APIs; it's about how we can stay flexible and keep working at the speed the organization has always worked at.
The other one is a realistic scope for building authenticity. This goes back to the first point, the reasonable expectation of what you want your North Star to be, which helps you scope what you want to cover. If you start the zero trust journey, it's not a six-month project; it is definitely going to take a while. I have been involved with projects that have been running for at least two years, and we're still doing the networking part; segmentation is what we're still working on.
Have a realistic scope and a realistic timeline for how long it may take, especially if you're an enterprise that has been in the industry a long time. The business will want to know the return on investment for going down the zero trust path, because you believe in it; how quickly can you put some points on the board to show, yes, we are making progress? Have a realistic scope around that.
Retrofitting versus modernization. This is taken from the whole conversation about moving to cloud, but it's still relevant. You can retrofit zero trust into an existing environment, which again goes back to the North Star and how you define zero trust. Whatever you're doing today, you could add layers of zero trust to it, and that becomes a business use case.
This is what we understand zero trust to be in our organization, and this is what we're retrofitting it into. Or you could do what a lot of other organizations are doing, or are being forced to do in the public sector because of government mandates: completely change how the network works and how the architecture is done. The choice depends on the North Star you land on. I'll throw DevSecOps in here as well, because it's something we talk about a lot in the security community: we want to work with developers.
The idea is that all of us want great quality code written for great applications we can all be proud of. Making security integrated, rather than a stopper, is something we all believe in. The same thought process is required for zero trust. You'll find that not everyone is on board with zero trust. One organization had to reword the term because it came across as negative: "You don't trust us? We've been working together for 20 years and you don't trust us?" No, I didn't mean it in the everyday English sense.
Some people had to change the name; you may sometimes hear the term high trust as well. There are a lot of variations, but the idea is the same: we believe this means fewer incidents. It also means we can give developers a lot more flexibility. It doesn't matter if it's work from home or work from anywhere; the level of trust can still be raised easily, and the set of threat scenarios also changes quite dramatically. That is also a point I want to call out.
Zero Trust – Business Metrics
Business metrics. Some of you are leaders; some of you may want to share this with your leaders. These are some of the metrics we started working on to show the ROI of all this money being invested, especially if it's going to be invested for a long time. One metric that helped us, in terms of coverage, was: how many applications are already zero-trust enabled? Say your zero trust North Star is to have all human-to-application identity covered.
How much coverage do you have across the board for your applications? Maybe you have a data lake, maybe you don't. Is there centralized telemetry for all your applications? Do you have a security data lake, adding another layer? The other part people get interested in, at least from a security perspective, is: if zero trust means more security, does the number of threats I can detect go up, and do incidents reduce over time?
If they do reduce over time, then the ROI is that before zero trust we used to see 60 incidents a day and now we're down to 10. That's a great ROI to show the organization. A historic comparison of the number of security events goes a long way in showing how much better things got after the zero trust implementation. The last one, which for security people is probably a pain, is the number of false positives in the environment. This applies across the board to most security products: the first time you implement something, there are a lot of false positives. Over time, the intent is to reduce them.
Zero Trust – What’s Next?
What is the next step after this? My hope is that you now have some practical understanding of where you want to get to with zero trust, some idea of where you may already be on that journey, and some idea of how you can have a positive impact in your organization by implementing zero trust.

MMS • Sergio De Simone
Article originally posted on InfoQ. Visit InfoQ

The latest release of the Go language, Go 1.24, introduces several important features, including generic type aliases, weak pointers, improved cleanup finalizers, and more. It also enhances runtime performance in the default map implementation, small object allocation, and mutex handling.
A type alias in Go provides a synonym for an existing type, which can be useful for readability and conciseness. Now, Go 1.24 allows creating type aliases for generic types, that is, a type alias can specify a type parameter.
type ComparableVector[T comparable] = Vector[T] // alias for a generic type; Vector is assumed to be defined elsewhere
type ComparableVectorOfInts = ComparableVector[int]
type ThisWouldBeAnError = ComparableVector[[]int] // compile error: []int does not satisfy comparable
It's worth recalling here that Go provides a similar syntax for defining a new type based on an existing type, e.g. type NewInt int. Although the syntax only differs in the missing =, the implications are significant, since NewInt cannot be used in place of int.
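A small, self-contained illustration of the difference, using hypothetical names:

package main

type NewInt int     // defined type: distinct from int
type IntAlias = int // type alias: just another name for int

func takesInt(i int) {}

func main() {
	var n NewInt = 5
	var a IntAlias = 5

	takesInt(a)      // fine: IntAlias is int
	takesInt(int(n)) // NewInt requires an explicit conversion
	// takesInt(n)   // compile error: cannot use n (type NewInt) as int
}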
Interestingly, the discussion about whether to introduce generic type aliases, and about their implications for the language, had been going on for over three years.
Weak pointers do not keep an object alive: when an object is referenced only by weak pointers, the garbage collector is free to reclaim it. As a consequence, you should check that a weak pointer's value is not nil before attempting to use it:
var strongInt int = 5
weakInt := weak.Make(&strongInt) // weak.Pointer[int], from the new weak package
...
if v := weakInt.Value(); v != nil { // nil once the GC has reclaimed the value
	fmt.Println(*v)
}
Weak pointers may be useful when you want to implement, for example, an object cache to avoid objects being retained for the mere fact of being included in the cache.
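As a minimal sketch of that cache idea, assuming Go 1.24's weak package (the Cache type below is illustrative, not a standard library API):

package main

import (
	"fmt"
	"weak"
)

// Cache holds weak references, so cached values can still be
// garbage collected once nothing else references them.
type Cache[K comparable, V any] struct {
	entries map[K]weak.Pointer[V]
}

func (c *Cache[K, V]) Put(k K, v *V) {
	c.entries[k] = weak.Make(v)
}

func (c *Cache[K, V]) Get(k K) (*V, bool) {
	if wp, ok := c.entries[k]; ok {
		if v := wp.Value(); v != nil { // still alive?
			return v, true
		}
		delete(c.entries, k) // value was collected; drop the stale entry
	}
	return nil, false
}

func main() {
	c := &Cache[string, int]{entries: map[string]weak.Pointer[int]{}}
	v := 42
	c.Put("answer", &v)
	if got, ok := c.Get("answer"); ok {
		fmt.Println(*got)
	}
}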
Go finalizers serve the purpose of cleaning things up when an object is garbage collected. Prior to Go 1.24, this could be accomplished using runtime.SetFinalizer (https://tip.golang.org/pkg/runtime#SetFinalizer), which has several caveats, including the impossibility of defining more than one finalizer on the same object, the fact that finalizers do not work on objects involved in a reference cycle, and so on. To overcome these limitations, Go 1.24 provides a new runtime function, AddCleanup, which can be used to register a cleanup function with an object:
// Run cleanupFunc(resourceToCleanUp) once the object *objPointer becomes unreachable.
cleanup := runtime.AddCleanup(objPointer, cleanupFunc, resourceToCleanUp)
_ = cleanup // calling cleanup.Stop() would cancel the registered cleanup
...
func cleanupFunc(resourceToCleanUp CleanUpArgType) {
	// release the resource, e.g., close a handle or delete a temporary file
}
The cleanup mechanism fixes the issues with finalizers mentioned above. Additionally, it ensures all cleanup functions are called sequentially in a separate goroutine.
As mentioned, Go 1.24 improves the runtime performance of maps. In particular, it adopts SwissTable as the basis for the map implementation and uses a concurrent hash-trie for the implementation of sync.Map.
Using SwissTable brings 30% faster access and assignment of large maps, 35% faster assignment on pre-sized maps, and 10-60% faster iteration depending on the number and size of items in the map.
Similarly, adopting a concurrent hash-trie enables the new sync.Map implementation to beat the old one on almost every benchmark.
Go 1.24 includes many more improvements and changes than can be covered here, including new functions in the bytes and strings packages, the omitzero JSON tag, directory-limited filesystem access, and more. While the release notes are, as usual, quite terse, you can find great video summaries on Reddit user GreenTowel3732's YouTube channel.

MMS • Aditya Kulkarni
Article originally posted on InfoQ. Visit InfoQ

GitHub recently announced the public preview of Linux arm64 hosted runners for GitHub Actions. Free for public repositories, this update provides developers with more efficient tools for building and testing software on Arm-based architectures.
A changelog post on the GitHub Blog summarised the announcement. Arm64 runners are hosted environments that let developers execute workflows natively on Arm, eliminating the need for cross-compilation or emulation. These 4 vCPU runners, using Cobalt 100 processors, can provide up to a 40% CPU performance increase compared to the previous generation of Microsoft Azure's Arm-based virtual machines.
The addition of arm64 runners aligns with the increasing demand for Arm-based computing, driven by the architecture's energy efficiency and performance advantages.
Native arm64 execution provides benefits such as faster build times and more reliable testing outcomes compared to emulated environments. When arm64 runners first arrived on GitHub Actions in June 2024, GitHub supported Ubuntu and Windows VM images for them, making it straightforward for users building on Arm to get started. However, back then, these runners were available only to GitHub Team and Enterprise Cloud customers.
To use the arm64 hosted runners, include one of these labels in your workflow files within public repositories: ubuntu-24.04-arm or ubuntu-22.04-arm. These labels only work in public repositories; workflows in private repositories that use them will fail.
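As a minimal sketch, a public-repository workflow targeting the new runners might look like this (the workflow, job, and step names are illustrative):

name: build-arm64
on: [push]

jobs:
  build:
    # Public preview label for GitHub's Arm-based hosted runners
    runs-on: ubuntu-24.04-arm
    steps:
      - uses: actions/checkout@v4
      - name: Show architecture
        run: uname -m   # expected to print aarch64 on the arm64 runner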
Standard runner usage limits, including maximum concurrency based on your plan, apply to all runs in public repositories. Developers are advised to expect potentially longer queue times during peak hours while the arm64 runners are in public preview.
The tech community on Hacker News welcomed this development, with several interesting discussion threads. One user highlighted how this feature could encourage a broader shift toward Arm-based cloud workflows, mentioning the cost-effectiveness of Arm CPUs compared to x64.
Another thread enquired about pricing differences between arm64 and x64 instances. HN user agartner also provided an example of how to use the native GitHub Actions Arm runners to accelerate Docker builds.
This capability is particularly beneficial for projects targeting arm devices, such as IoT applications, mobile platforms, and cloud-native services. GitHub has encouraged users to share their experiences and suggestions by joining the community discussion.
For further details, interested readers can visit the documentation and also view a list of VM images from GitHub partners.

MMS • Michael Redlich
Article originally posted on InfoQ. Visit InfoQ

This week’s Java roundup for February 17th, 2025, features news highlighting: the release of Apache NetBeans 25; the February 2025 release of the Payara Platform; the second beta release of Hibernate Reactive 3.0; and the second release candidate of Gradle 8.13.
JDK 24
Build 36 remains the current build in the JDK 24 early-access builds. Further details may be found in the release notes.
JDK 25
Build 11 of the JDK 25 early-access builds was also made available this past week featuring updates from Build 10 that include fixes for various issues. More details on this release may be found in the release notes.
For JDK 24 and JDK 25, developers are encouraged to report bugs via the Java Bug Database.
Spring Framework
It was a busy week over at Spring as the various teams delivered milestone releases of Spring Boot, Spring Security, Spring Authorization Server, Spring Integration, Spring AI and Spring AMQP. There were also point releases of Spring Framework, Spring for GraphQL, Spring Session, Spring for Apache Kafka and Spring for Apache Pulsar. Further details may be found in this InfoQ news story.
Payara
Payara has released their February 2025 edition of the Payara Platform that includes Community Edition 6.2025.2, Enterprise Edition 6.23.0 and Enterprise Edition 5.72.0. All three releases provide critical bug fixes, component upgrades and a new feature that ensures Docker images shut down gracefully to allow applications to terminate cleanly without data loss or corruption.
A notable critical issue was an IllegalStateException due to Spring Boot 3 applications failing to deploy to Payara Server 6. This was resolved by ensuring proper initialization of Contexts and Dependency Injection (CDI) during deployment. More details on these releases may be found in the release notes for Community Edition 6.2025.2, Enterprise Edition 6.23.0 and Enterprise Edition 5.72.0.
Apache Software Foundation
The release of Apache NetBeans 25 delivers many improvements that include: enhanced Java code completion for sealed types in switch statements; and improved behaviour of the CloneableEditorSupport class such that it no longer breaks additional instances of the Java DocumentFilter class that may be attached to an instance of the Java AbstractDocument class. Further details on this release may be found in the release notes.
The release of Apache Tomcat 9.0.100 provides a resolution to a regression introduced in Tomcat 9.0.99 that caused an error while starting Tomcat on JDK 17. The regression was a mitigation for CVE-2024-56337, a Time-of-Check-Time-of-Use vulnerability in which a write-enabled default servlet on a case-insensitive file system can bypass Tomcat's case-sensitivity checks and cause an uploaded file to be treated as a JSP, leading to remote code execution. More details on this release may be found in the release notes.
Hibernate
The second beta release of Hibernate Reactive 3.0.0 ships with resolutions to notable issues such as: a ClassCastException from an instance of the ReactiveEmbeddableForeignKeyResultImpl class due to use of the Hibernate ORM EmbeddableInitializerImpl class instead of its reactive version, namely the ReactiveEmbeddableInitializerImpl class; and a NullPointerException when retrieving an entity using a Jakarta Persistence @ManyToOne composite table with additional properties in the Jakarta Persistence @IdClass annotation. This release is compatible with Hibernate ORM 7.0.0.Beta4 and upgrades to Vert.x SQL Client 4.5.13. Further details on this release may be found in the changelog.
JobRunr
The release of JobRunr 7.4.1 ships with bug fixes and new features such as: the ability to switch between different date styles in job table views, e.g., the timestamp when an instance of the Job class was enqueued; and an enhanced display for more complex job parameters on the job details page. More details on this release may be found in the release notes.
Gradle
The second release candidate of Gradle 8.13.0 introduces a new auto-provisioning utility that automatically downloads a JVM required by the Gradle Daemon. Other notable enhancements include: an explicit Scala version configuration for the Scala Plugin to automatically resolve required Scala toolchain dependencies; and refined millisecond precision in JUnit XML test event timestamps. Further details on this release may be found in the release notes.

MMS • RSS
Posted on mongodb google news. Visit mongodb google news

Two critical flaws in the open-source Mongoose Object Data Modeling (ODM) library for MongoDB and Node.js, along with proof-of-concept (PoC) exploits for both vulnerabilities, were detailed in a blog post by OPSWAT on Thursday.
The flaws are tracked as CVE-2024-53900 and CVE-2025-23061 and have critical CVSS 3 scores of 9.1 and 9.0, respectively.
CVE-2024-53900, which was first discovered and patched in November 2024, can lead to remote code execution (RCE) on the Node.js application server via search injection. CVE-2025-23061, fixed last month, is a bypass of the original patch for CVE-2024-53900, and can lead to RCE by slightly altering the exploit code.
Mongoose helps streamline interactions between MongoDB and Node.js applications and is widely used by application developers, having more than 27,000 stars and 3,800 forks on GitHub, and more than 19,000 dependents in the NPM package repository.
Both vulnerabilities involve the $where operator, which can be used with the populate() function to filter the data retrieved from MongoDB documents when replacing references. The $where operator allows the execution of arbitrary JavaScript code to define specific data retrieval criteria, meaning malicious code could be executed if an attacker controls the input following the $where operator.
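As a hypothetical sketch of that pattern (the Post and Author models and the userSuppliedFilter input below are illustrative, not taken from the OPSWAT write-up), the danger arises when user-controlled input reaches the match option of populate():

const mongoose = require('mongoose');

// Illustrative models; a real application would define richer schemas.
const Author = mongoose.model('Author', new mongoose.Schema({ name: String }));
const Post = mongoose.model('Post', new mongoose.Schema({
  title: String,
  author: { type: mongoose.Schema.Types.ObjectId, ref: 'Author' },
}));

async function listPosts(userSuppliedFilter) {
  // On vulnerable versions, a $where clause inside this match object was
  // forwarded to the sift library and evaluated as JavaScript on the
  // application server (CVE-2024-53900).
  return Post.find().populate({
    path: 'author',
    match: JSON.parse(userSuppliedFilter),
  });
}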
Attempting to execute malicious code on the MongoDB server using the $where operator would typically result in an error, noted OPSWAT, as execution on the MongoDB server is restricted to a predefined list of basic operations and functions.
However, OPSWAT Critical Infrastructure Cybersecurity Graduate Fellow Dat Phung, who discovered both vulnerabilities, found that malicious code under the $where operator could be passed to a function used within populate() known as sift(), causing the arbitrary code to be executed locally on the application server.
By crafting a query to ensure the request will be passed to sift(), while avoiding triggering the MongoDB server error by including a variable MongoDB does not recognize, Phung was able to achieve RCE on a Node.js application server.
The Mongoose maintainers fixed the CVE-2024-53900 vulnerability in version 8.8.3 by disallowing the use of $where within the match property passed to populate(). However, Phung found that the $where operator could still be passed to populate() if it was nested within an $or operator, constituting the flaw tracked as CVE-2025-23061.
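Illustratively, the bypass only changes the shape of the filter; the following hypothetical, simplified payloads show the idea (maliciousCode() is a placeholder, not actual exploit code):

// Rejected after the first patch (Mongoose 8.8.3): top-level $where is disallowed.
const blocked = { $where: 'maliciousCode()' };

// Accepted by versions prior to 8.9.5: the same operator nested inside $or
// escapes the single-level check (CVE-2025-23061).
const bypass = { $or: [{ $where: 'maliciousCode()' }] };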
Phung developed a new PoC exploit showing that RCE could still be achieved on an application server by nesting $where inside an $or clause, despite the initial patch. This bypass flaw was fixed in Mongoose version 8.9.5, and OPSWAT recommends developers upgrade to the latest Mongoose versions to resolve both flaws.
Article originally posted on mongodb google news. Visit mongodb google news

MMS • RSS
Posted on mongodb google news. Visit mongodb google news

Two critical-severity vulnerabilities in the Mongoose Object Data Modeling (ODM) library for MongoDB could have allowed attackers to achieve remote code execution (RCE) on the Node.js application server, cybersecurity platform OPSWAT reports.
Widely adopted in production environments, Mongoose enables the mapping of JavaScript objects to MongoDB documents, making data management and validation easier. However, a function that improves working with relationships between documents could be exploited for RCE.
The first of the critical-severity flaws in the library, tracked as CVE-2024-53900, could allow an attacker to exploit the $where value to potentially achieve RCE on Node.js. The second issue, tracked as CVE-2025-23061, is a bypass for CVE-2024-53900’s patch.
As OPSWAT explains, $where is a MongoDB query operator that enables the execution of JavaScript directly on the MongoDB server, but with certain limitations.
When processing retrieved data, one of Mongoose’s functions would pass the $where value to a function imported from an external library, which would process the queries locally on the application server, without performing input validation.
“This lack of input validation and restriction introduces a significant security vulnerability, as the ‘params’ value – directly controlled by user input – can be exploited, potentially leading to code injection attacks,” OPSWAT notes.
The patch for CVE-2024-53900 added a check to disallow passing the $where operator to the vulnerable function, thus preventing the execution of malicious payloads.
However, the patch could be bypassed by embedding the $where operator in the $or operator supported by both MongoDB and the vulnerable function.
“As a result, an attacker can nest $where under $or to evade the patch’s single-level check. Because Mongoose inspects only the top-level properties of each object in the match array, the bypass payload remains undetected and eventually reaches the sift library, enabling the malicious RCE,” OPSWAT notes.
The cybersecurity organization has released proof-of-concept (PoC) exploit code targeting both vulnerabilities and recommends updating Mongoose to version 8.9.5 or later, which contain complete patches for the two bugs.
Article originally posted on mongodb google news. Visit mongodb google news