![MMS Founder](https://mobilemonitoringsolutions.com/wp-content/uploads/2019/04/by-RSS-Image@2x.png)
MMS • Bruno Couriol
Article originally posted on InfoQ.
![](https://mobilemonitoringsolutions.com/wp-content/uploads/2025/01/generatedHeaderImage-1736559495871.jpg)
The Express.js team has released version 5.0.0, ten years after the previous major release, Express 4, in 2014. The release focuses on stability and security, with a view to enabling developers to write more robust Node.js applications.
Express 5 drops support for old versions of Node.js. The release note states:
This release drops support for Node.js versions before v18. This is an important change because supporting old Node.js versions has been holding back many critical performance and maintainability changes. This change also enables more stable and maintainable continuous integration (CI), adopting new language and runtime features, and dropping dependencies that are no longer required.
Following a security audit, the team decided to change how path route matching works. To prevent regular expression denial-of-service (ReDoS) attacks, Express 5 no longer supports sub-expressions in route parameters, for example /:foo(\d+):
// Express 4: sub-expression constraint in the route; no longer supported in Express 5
app.get('/:id(\d+)', (req, res) => res.send(`ID: ${req.params.id}`));
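Since sub-expressions are no longer supported, the constraint moves out of the route pattern and into the handler. The sketch below shows a validation-first alternative; the isNumericId helper is hypothetical, not part of the Express API, and the route registration is shown as a comment since it assumes an app instance:

```javascript
// Hypothetical helper: validate the parameter in the handler, not the route.
// A fully anchored character-class regex like this runs in linear time,
// so it carries no backtracking (ReDoS) risk.
function isNumericId(value) {
  return /^\d+$/.test(value);
}

// Express 5 style (sketch): plain named parameter plus explicit validation.
// app.get('/:id', (req, res) => {
//   if (!isNumericId(req.params.id)) return res.status(400).send('Invalid ID');
//   res.send(`ID: ${req.params.id}`);
// });

console.log(isNumericId('12345')); // true
console.log(isNumericId('12a45')); // false
```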
Blake Embrey, a member of the Express.js technical committee, gives an example of a regular expression (e.g., /^\/flights\/([^\/]+?)-([^\/]+?)\/?$/i) that, when matched against '/flights/' + '-'.repeat(16_000) + '/x', may take around 300 ms instead of running well below one millisecond. The Express team recommends using a robust input validation library instead.
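Embrey's slowdown can be reproduced with a plain RegExp, no Express required. The two lazy groups separated by a literal dash backtrack quadratically on an input made almost entirely of dashes; exact timings vary by machine, but the gap between this and the sub-millisecond normal case is the point:

```javascript
// The kind of pattern Express 4 compiled for a route like /flights/:from-:to
const pattern = /^\/flights\/([^\/]+?)-([^\/]+?)\/?$/i;

// Crafted input: thousands of '-' characters force the engine to try
// every possible split between the two lazy groups before giving up.
const attack = '/flights/' + '-'.repeat(16_000) + '/x';

const start = process.hrtime.bigint();
const matched = pattern.test(attack);
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

console.log(matched); // false: the input never actually matches
console.log(`took ${elapsedMs.toFixed(0)} ms`);
```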
Express 5 also requires wildcards to be explicitly named or replaced with (.*) for clarity and predictability. Thus, paths like /foo* must be updated to /foo(.*).
The syntax for optional parameters in routes also changes. Express 4's :name? becomes {/:name}:
// Express 4
app.get('/user/:id?', (req, res) => res.send(req.params.id || 'No ID'));
// Express 5
app.get('/user{/:id}', (req, res) => res.send(req.params.id || 'No ID'));
Unnamed parameters in regex capture groups can no longer be accessed by index; parameters must now be named:
// Express 4: positional access
app.get('/user(s?)', (req, res) => res.send(req.params[0]));
// Express 5: named parameter
app.get('/user:plural?', (req, res) => res.send(req.params.plural));
Express 5 additionally enforces valid HTTP status codes, as a defensive measure against silent failures and arduous debugging sessions.
res.status(978).send('Invalid status'); // now throws an error instead of failing silently
Express 5 also improves error handling in async middleware and routes by automatically passing rejected promises to the error-handling middleware, removing the need for explicit try/catch blocks.
// Express 4: async errors must be caught and forwarded manually
app.get('/data', async (req, res, next) => {
  try {
    const result = await fetchData();
    res.send(result);
  } catch (err) {
    next(err);
  }
});

// Express 5: rejected promises are forwarded to error middleware automatically
app.get('/data', async (req, res) => {
  const result = await fetchData();
  res.send(result);
});
While the Express team strives to keep breaking changes minimal, the new release will require developers to migrate their Express code. The migration guide is available online.
Express.js is a project of the OpenJS Foundation (At-Large category). Developers are invited to read the full release note for additional technical details and examples.
MMS • Shane Hastie
Article originally posted on InfoQ.
![](https://mobilemonitoringsolutions.com/wp-content/uploads/2025/01/logo-big.jpg)
The Agile Alliance has officially joined the Project Management Institute (PMI), forming the PMI Agile Alliance as of December 31, 2024. The partnership aims to enhance global project management by integrating Agile principles with PMI’s resources and reach. While many celebrate the expanded opportunities for collaboration, professional development, and innovation, critics express concerns about the potential dilution of Agile values and the loss of independence for Agile Alliance.
According to the announcements, this partnership is expected to have a significant impact on the project management and Agile communities. It represents a step towards integrating traditional project management approaches with Agile methodologies, potentially reshaping how projects are managed and delivered across various industries. There has been a lot of commentary in the agile community in particular; while some see this partnership as an opportunity for growth and integration in the project management field, others express concerns about the future direction of Agile practices and principles.
The rationale for the partnership is presented as:
- Evolving Project Management Landscape: The partnership recognizes that modern project delivery requires fluency across various delivery practices
- Global Reach: PMI Agile Alliance will benefit from PMI’s worldwide presence and resources
- Expanding Agile Influence: The collaboration aims to broaden the understanding of Agile and agility beyond the tech industry and software development
- Enhancing Project Success: By combining PMI’s structured approach with Agile Alliance’s adaptive principles, the partnership seeks to empower professionals to achieve greater project success
Financial considerations are not mentioned in the press releases, but Mike Cohn, one of the original founders of the Agile Alliance, has stated that the decline in revenues from conferences post-COVID is considered a key factor.
According to Teresa Foster, managing director of the Agile Alliance, key aspects of the partnership are:
- Agile Alliance will now operate as PMI Agile Alliance
- Foster and her team will transfer to PMI
- PMI Agile Alliance will maintain its existing membership structure and elected Board
- The core mission, values, and principles of Agile Alliance will be preserved
- PMI members will gain access to expanded agile thought leadership, tools, and resources, strengthening their agile mindset
Jim Highsmith, another of the Agile Manifesto signatories and a founder of the Agile Alliance, sees the partnership as a step towards "Bridging the Divide: Integrating Agile and Project Management".
Concerns raised about the partnership include:
- Fear of Dilution: there is concern that the merger might dilute Agile principles or lead to a return to more traditional project management approaches
- Loss of Independence: There are worries about the Agile Alliance losing its independent voice and becoming subsumed under PMI’s structure.
- Relevance Concerns: Some argue that Agile Alliance was already struggling to remain relevant, and this move might not address underlying issues
- Cultural Clash: There’s apprehension about potential conflicts between PMI’s structured approach and Agile’s adaptive principles
According to Dave Westgarth, there is irony in Agilists reacting to the merger with the same fear and resistance that traditional project managers once showed towards Agile.
Podcast: Intentional Culture and Continuous Compensation: An Interview with Austin Vance
MMS • Austin Vance
Article originally posted on InfoQ.
![](https://mobilemonitoringsolutions.com/wp-content/uploads/2025/01/Austin-Vance-twittercard-1734511571569.jpg)
Transcript
Shane Hastie: Good day folks. This is Shane Hastie for the InfoQ Engineering Culture Podcast. Today, I’m sitting down with Austin Vance. Austin is the CEO of a company called Focused Labs. Austin, welcome. Thanks for taking time to talk to us today.
Austin Vance: Yes, thanks for having me, Shane. I’m really excited to be here.
Introductions [00:49]
Shane Hastie: So we normally don’t talk to CEO folks on The InfoQ Podcast because we want to talk to technologists, but I’m told you are deeply a technologist. So who’s Austin?
Austin Vance: Yes, well I hope I’m still a technologist. I spend a ton of my time programming still. My firm is not massive and I kind of think the title of CEO is wonderful for a business card, but also probably CEO level decisions are not a big part of a midsize firm. We’re not making large strategic decisions all the time. Instead, a large part of my job is setting a bar for engineering interests and engineering consideration and excellence.
And so at one point I had gotten so far from programming, I got frustrated and I decided to live stream myself programming, started doing that about seven or eight months ago, and now I have 20,000 subscribers on YouTube where I livestream myself coding, and I grew up coding and talking through the code I’m writing so a lot of just random stuff. I’m like, “I want to experiment with this API or this new technology or do a RAG”, or something like that. So I try to code as much as I can, but also have a nice firm.
Shane Hastie: So one of the things that put us together was a conversation about developer onboarding. Why is it so hard and how can we do it better?
Challenges and opportunities in developer onboarding [02:09]
Austin Vance: A good question. I don’t know if it’s hard, might be my first reaction to that question. I think it requires intentionality and a lot of times I think when we bring people into a culture or a company, we are not intentional about how we want them to assimilate. And what happens is a company or group or a team assumes through some form of tribalism, series of notion docs or a wiki that someone will be able to figure out what’s going on and then the missing portions will be hit or will be covered through managerial one-on-ones. But the truth is a culture at any company, no matter the size, is the sum of all of the people that are at the company and have been at the company.
And so as soon as a new person joins, the culture shifts a little bit and it requires intentionality to drive however that person shifts the culture back towards what you want the culture to be as a leader or as someone there. And so sitting down and really understanding, first defining what you want your culture to be and what you want… And culture’s so many things, we could talk about that, but what you want your culture to be and then how you can communicate that to a person so they can be the most proficient and effective is really like that’s it. It’s really not that hard, it just requires concentration or intentionality and sometimes we just get too caught up in the day-to-day that we aren’t intentional about bringing that person in.
Shane Hastie: So what does this intentionality look like, and what is the experience of being part of an intentional onboarding?
Intentionality in the onboarding experience [03:34]
Austin Vance: I’ve worked at a handful of places and I think the best I’ve ever experienced, hopefully besides my firm, but the best I’ve ever experienced was at Braintree. I ran a division of engineering at Braintree and I had a great time. I came in, I was the first of a new layer of management and they put me through the same onboarding as all the developers. So developers had some that was theirs where you talked about how do we do CI and what’s our testing culture and how do you do a pull request and feedback on that. So there was that kind of stuff. But then there’s also a bunch of onboarding about what it means to be at Braintree.
And one of the things I really loved about that is over the course of the first, if I remember correctly, it was about a month, you spent about an hour to an hour and a half, a couple times a week with a different person who’d been at the company, and they didn’t always have a big title, but a different person had been at the company for a little while talking to you about something that they loved or cared about deeply at the company.
And for engineering, like I said, it was some of the technical norms that you would expect out of the communication between teams. But then it was also what is meeting culture like, how do we communicate with each other? And then also just like what’s the history of the company? Who are we? What do we want to be? And I thought that level of intentionality and that level of time where it was different people, different opinions, all kind of speaking towards one mission and one kind of personality, which was the personality of the organization, really made you feel like you were a part of something early, early on.
And, I mean, of course I got great swag when I started and other stuff like all companies do, but I don’t think that swag made me feel as part of a new culture as onboarding did. We’ve copied that a lot. And so we over the course of a few weeks go through a handful of presentations, decks and conversations ending with a retrospective with all of our new hires on what we expect out of them, who we are, what our values are, what our operating principles are, what it means to be a developer or a salesperson or a marketer or anything like that at Focused.
Shane Hastie: You made the point that culture is the sum of everyone who is or ever has been in the organization and that you need to deliberately shift it to where you want it to go. What does that intentional culture again look like and feel like? What’s the experience of being in that culture?
Intentional culture design [05:57]
Austin Vance: I mean, the most human answer is when you’re at a place where culture is intentional, it feels good, right? It does. It just feels like things work. When you come into a place where culture doesn’t feel intentional, it can feel chaotic, it can feel misguided. When culture is intentional, what you see is you see common traits between all of your colleagues, coworkers, professionals around you. And it’s the work ethic, it’s the ethos that is the company. It’s not that you all have the same background or you all went to Harvard or you all have an MBA. That stuff doesn’t matter as much. It’s like you all approach work with the same level of rigor, the same level of thought, care, and even more so maybe the company takes the same level of care to how each other approach each other’s work.
It doesn’t mean that there’s a good culture or a bad culture. Some places might be super cutthroat, action oriented, top dog wins, eat what you kill, and that can be okay if that’s what the company wants. And if someone comes in there and they enjoy that culture, they will feel like they belong because it’s obvious what that is to them. And other cultures could be more like team oriented, collaborative, when we win, I win kind of thing, and when I win, we win. And other people might love that, but a person who’s a top tier non-collaborative performer might feel really ostracized or feel like they’re not getting the recognition they deserve. And they would maybe select out if the culture’s intentionally towards more collaborative or more the other way, a more winner takes all kind of culture or something like that.
Shane Hastie: So what is the culture that you’ve tried to instill at Focused Labs and how have you communicated that culture?
Communicating culture [07:31]
Austin Vance: It’s always evolving. Like I said, it’s a sum of all at the company and all that have been. And part of that is the people that come and go really shift the culture too. I never really liked values as a thing that companies had. I’ve worked for companies that had values and they essentially were Wi-Fi passwords, so why do we have them? Excellence or something like that. And people are probably really familiar in this podcast with the Netflix culture deck, Netflix has this great culture deck and it starts with values are not something you just put on a wall, they are the litmus test for who you hire, who you fire and who you promote. And that really hit me in a real way. And so really early in the days at Focused, I sat down with my team of, I think at the time we were five or six people and I was like, “Who are we? And then who do we want to be?”
So values can be slightly aspirational, but they can’t be so aspirational that they’re not true. And we landed on three and so our three values ended up being love your craft, listen first and learn why. A lot of times you see companies with values that are like one word, it’ll be like excellence, integrity, that kind of stuff. And we picked those three because we thought they were a little bit more action oriented. When I see love your craft, I want to work with people that love what they do. And in tech that’s so easy or maybe not so easy, but in tech we see it a lot. I’m very familiar. I came the Ruby community and I’m really familiar with the software as a craft kind of thing, but I want my recruiters to treat talent acquisition like a craft.
I want where they’re honing how and where and why and what they’re doing to find the best talent. And I want my sales teams to treat customer acquisition like a craft where they understand messaging and communication and empathy with all their customers, and my support teams to treat support like a craft. And I want my design teams to treat design like a craft. Just every practice at my company should be treated like a craft from the top down. The leader should be a master craftsman where everybody else is learning from them. And listen first falls out of that, where in order to hone your craft, you have to be open to learning.
And it’s funny, I’m on a podcast where I spend most of the time talking and not you, but my firm were consultants and I have a story actually from Braintree, it might be one of the more embarrassing professional experiences in my life where I sat down in a room really early at Braintree and I had a large organization and a big title. And I sat down in a room full of principal engineers at Braintree and I was on cloud nine because I was a big boy on the organizational chart and they were pitching moving Braintree’s APIs from REST to GraphQL for some specific things, and it was pretty early in the GraphQL days.
And I kind of was like, “Well, that’s stupid, GraphQL’s never going anywhere. No one even knows what it is”. And I kind of talked my way into sounding like an asshole. And I left the meeting and one of the principal engineers pulled me aside and told me that I talked my way into sounding like an asshole. And over the course of the next few weeks he was like, “I actually think that GraphQL’s probably the right choice. You should spend some more time learning about it. It’s your first meeting ever hearing about it. Maybe you should spend some more time figuring out what’s going on here”.
And over the course of the next few weeks, understanding how people use the Braintree APIs, understanding what was going on, understanding more depth around GraphQL, I came to believe that GraphQL is the right decision too. And I wish I had listened first. And so not that we can’t have opinions and shouldn’t, but I think one of the important decisions that we make as individuals when we communicate on teams is like, when do we listen and when do we share our opinion? And then finally learn why is at the very end of all that is this deep and entrenched curiosity and how and who we are.
And so coming back to your question, I think the most important thing about honing a culture is really knowing what you want it to be and then being able to really clearly articulate how someone embodies those traits and then constantly repeating that, that is the most important thing. And I really do believe that if you can articulate, repeat, and share that, performance follows really quickly, actually, like the results just speak for themselves.
Shane Hastie: No, I know from our conversation before we started recording that these values are one side of it. You also then have something else that makes it more concrete.
From values to operating principles [11:52]
Austin Vance: Yes. So values, they’re still a little high level I guess. And so what we wanted to do, and this is specifically for our engineering teams, but we’ve defined a series of operating principles and these operating principles, I kind of think of them like you have the Boy Scout motto and then the things that make a Boy Scout, a Boy Scout like thrifty, kind, clean, reverent, those types of things. And our operating principles are kind of like the things, I forget what part of the Boy Scouts it is, but they’re like the things that make a Boy Scout, a Boy scout. And so we try to define a little bit more clearly what it means outside of a value, so a personality trait or what you should be doing.
And so one of my favorite values that we have is be exothermic. And so since we’re an engineering world, I get so much feedback on this operating principle because everybody’s like, “What is exothermic?” But since we have a lot of nerds on this podcast, I think most people know what it means. But it means when given energy, you create more heat than you have or you’re putting out heat. And in a professional world, what I think that means is bring energy to the situation.
In Wedding Crashers, they say fit in by standing out. And so the way you come into a meeting is not by being quiet in the corner but come with a passion and energy and a charisma that makes other people want to care as much as you do. And that becomes infectious. And so we have these series of operating principles that are guiding principles on how everybody can interact with their customers and each other, and that’s really important.
Shane Hastie: I know that you also have some pretty unusual approaches to compensation. Can we dig into that?
Continuous compensation management [13:26]
Austin Vance: Yes, I absolutely despise the way traditional compensation is managed. The majority of my career, before I started my own firm, I worked at big companies, big, big companies or small companies that were acquired eventually by big, big companies. And so we dealt with compensation in a really traditional way. And if people aren’t familiar, the way compensation is traditionally managed is some sort of raised budget is designed for departments and the whole company.
And that gets kind of passed down through tiers of upper and middle management, distributed in some kind of finger in the wind sort of way based on performance investment and being reasonable based on a 2% raise generally for the mass set. And then on the other side of that, a manager and peers do reviews of each other and they do some form of meets expectations, exceeds expectations or falling behind. And then this manager has a raised budget and they get to dole that out to say they’re 10 or so direct reports based on meeting expectations or not, and it happens once a year. That’s the last thing. Happens once a year.
And I’ve always found that to just be an absolutely horrible way to manage performance. I have spent so much time in my life sitting in rooms with leaders at enterprises, convincing them that they de-risk their software by releasing often, nightly, daily, weekly, because if there’s any failure, you can fix it or manage it quickly. The same should be true for our people: release often. And a big way a company releases often with its people is by rewarding the people that are being excellent and proactively managing the people that are not. The easiest way to do that is by giving or not giving raises. If I find out only every 12 months whether or not I’m doing well, it’s probably a pretty bad way to live and a bad way for the company to handle reward and promotion.
So the way we do it, we run a raise cycle every 6 to 12 weeks for every single person in the company. So we have managers collect feedback constantly through one-on-ones. We have peers, we do daily pair review, feedback sessions, that kind of stuff. We have well-defined job description and growth pathways as you move from senior to staff engineer or junior to normal engineer or whatever. And every 6 to 12 weeks we say, “Where’s this person fit?” And we think they’ve gotten really good, they took a leadership position over these set of features with this customer, they’re actually performing now at this level, and we give a small raise.
And that raise doesn’t have to be thousands and thousands of dollars, instead it could be a few hundred or a thousand dollars or something like that. But if you get that every six weeks, it compounds throughout the year as a good job, a good job, a good job, you’re on the right track, you’re on the right track, and you watch over the course of the year, your compensation increases dramatically, but you’ve also continued to get pushed in the right direction, “Here’s what you’re doing well”.
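As a back-of-the-envelope illustration (the figures below are assumptions for the sketch, not numbers from the interview), small raises on a six-week cycle compound into a meaningfully larger effective annual increase:

```javascript
// Assumed figures, for illustration only.
const raisePerCycle = 0.01;            // a 1% raise awarded each cycle
const cycleWeeks = 6;                  // length of one raise cycle
const cyclesPerYear = 52 / cycleWeeks; // roughly 8.67 cycles in a year

// Each raise compounds on top of the previous ones.
const annualGrowth = Math.pow(1 + raisePerCycle, cyclesPerYear) - 1;

console.log(`${(annualGrowth * 100).toFixed(1)}% effective annual increase`);
```

Under these assumed numbers the result is roughly a 9% effective annual increase, versus the flat 2% a single yearly adjustment might deliver.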
Also allows us to correct really quickly for anyone who’s not performing, “Hey, we’re not going to give you a raise this time because we were really seeing you disengage. You’ve been showing up late a lot, it’s been really hard for you. You’ve been cameras off and not talking to the customer. You had a feature that you were stuck on for a few weeks and you didn’t communicate that”. Whatever it is, let’s get that fixed and in six weeks they fix it and they come back, they get a raise, right?
Compensation follows, so you can actually correct behavior more proactively and bring back employees who might otherwise be lost, versus someone sitting in a negative cycle for up to 12 months and then being lost. There’s a really positive way to manage people through compensation and all the conversations that surround a compensation change. And so that’s how we do it. We continue to hone it. We tried to do it every four weeks; that was way too much overhead. We’ve found somewhere between 6 and 12 is right.
Shane Hastie: Completely different to I’m sure most organizations and potentially disruptive. I wonder you’ve made the point Focused is a relatively small company at the moment. How do you think that’s going to scale?
Can this approach to compensation scale? [17:41]
Austin Vance: I think it scales phenomenally. And the reason I think it scales phenomenally, and I get this feedback a lot, but the reason I think it scales phenomenally is management at scale is the formulation of abstraction layers over complex people systems, right? The same way we create an interface to talk between services, we create interfaces, which are managers, to talk between teams doing individual and important things. And if through culture, training, onboarding, onboarding of managers, everybody understands the value of the lean compensation model or continuous compensation model, then at each level the managers really understand what’s going on and it scales horizontally fairly simply.
The harder part about it, and what has been interesting through the tech cycle recently, is we anchor the midpoint compensation for each of our titles in the market average, well, the 60th or 65th percentile of the market. And so over the course of my career, the 65th percentile of the market had only gone up until about three years ago. And so watching how the mean compensation for a staff engineer has changed was always like, well, you could stay a staff engineer and your comp could continue to grow, for the last 15 years, but then the last three, maybe it plateaued, or maybe even a staff engineer off the street would make a little less than someone who was hired five years ago.
And so managing those conversations and understanding that has been a really interesting and really difficult part of lean compensation, but it’s actually gone really well, because it helped us proactively talk about whether or not the talent that we have has stagnated or is growing. And then for the people who maybe are sitting at the same level they had been for a few years, how can we push them to grow more, or is that the right talent for the firm?
Shane Hastie: Another thing that I know you have opinions on is the role of management. What is the role of a manager in a tech organization?
The role of the manager [19:41]
Austin Vance: Well, hanging off of compensation, I think a lot of places I see management fit into two buckets. One is very paternal style management where you do a lot of like how are you feeling at work kind of thing and then there’s other places that do a lot of performance style management. Are you hitting your numbers and your metrics? I think management can be both. But I see often organizations lean one way or the other. The first thing I’d say is at Focused and me in general, I believe that management, in order to be an effective manager, you must control the compensation of your direct reports. It is the primary means in which you communicate performance to them.
You cannot be a good manager or you cannot be an effective manager saying, “You’re crushing it. I’m so happy, you’re doing so well, you’ve grown so much, but I’m only able to give you a one and a half percent raise this year”. Because if that’s the case, all of a sudden what you’ve done is the manager is no longer representative of the firm. They’re just a friend of the person and they created a common enemy, which is the company.
And so to me, what management is, is the person who is most responsible or solely responsible for creating predictable attrition inside of their team. And so what I mean by that is their job is to retain the best talent and understand why they are being retained and to exit their poor-performing talent proactively and understand why they are being exited. A poor-performing manager is someone who has people leave their teams and their groups without being able to predict it and or does not remove people from the team who are not high performers. And so that is it. That is a manager’s sole distilled job, but all this stuff falls out of that.
So people are like, “Well wait, so my only job is to fire and give raises to people?” No, no, no, no, no. Your job is to retain the best talent because of the right reasons and exit the correct talent for the right reasons and know when that’s happening. And so how do you do that? Well, your best talent needs new challenges. They need new opportunities. They need growth. They need compensation. They need more leadership. They need maybe a new tech stack. Your most high potential talent needs coaching. It needs one-on-ones. They need mentorship.
Your bottom-performing talent needs direct feedback, active management, performance improvement plans. All of the stuff that managers do centers on and boils down to: I want to keep or exit the right people at the right time. And so that’s how we train our managers. But all of the tools that we learn about in the manager tools, books and podcasts and all that stuff, those are all for that, and we should be using all of those tools to do that. That’s my take on management at the end of the day.
Shane Hastie: I suspect that there’s not many management training classes that teach you about that.
Austin Vance: I think they hide from it. They don’t teach it, and we take it head-on in our first management training materials. We say that and we say it’s not heartless. I was sitting around at dinner yesterday talking about this, and people are like, “When I finally have made the decision to let someone go”, the whole team is like, “Wow, why’d it take you so long?” And managers often are the last to make that decision because they don’t have the proactive conversations with each other about how to curate and cultivate the appropriate talent. And no, I don’t think they approach it directly.
Firing is one part of it, but firing is a very dramatic and last-resort answer to a lot of other failed things. An interview failed, coaching failed, one-on-ones failed, performance improvement failed. But those things had to be tried, and eventually it's kind to allow someone to go someplace else, or it makes it so they can go someplace they can be successful, because normally they're not succeeding because the culture's not right, not because they're bad.
Shane Hastie: Austin, a lot of deep and interesting thoughts there. If people want to continue the conversation, where can they find you?
Austin Vance: I am pretty active on social, so you can find me, Austin BV on LinkedIn, X, all over the place, but any social with that handle is me. So reach out, please. I’d love to talk to you.
Shane Hastie: Thank you so much for taking the time to talk to us today.
Austin Vance: It was an absolute pleasure. Thanks for letting me ramble.
MMS • Frank Fodera
Article originally posted on InfoQ. Visit InfoQ
![](https://mobilemonitoringsolutions.com/wp-content/uploads/2025/01/frank-fodera-medium-1730999256777.jpeg)
Transcript
Fodera: I'm going to be talking about how CarGurus leverages our internal developer portal to achieve our strategic initiatives. I want to know, how many folks actually know what an internal developer portal is? An internal developer portal is really a centralized hub that allows us to improve developer experience and developer efficiency by reducing a whole bunch of cognitive load. It's usually internal, but the whole point is to centralize information into it.
We’ll talk about launching Showroom. Showroom is our internal developer portal that we built at CarGurus. How we actually achieved critical mass of adoption with our internal developers. Then, the foundation that we really built that helped us to set ourselves up for the future of our strategic initiatives that we’re trying to accomplish.
My name is Frank Fodera. I'm a director of engineering for developer experience. I've been at the company for about six years. My background is primarily in backend development, architecture, and platform engineering. Sometimes I'll jump into the frontend as needed. I always found myself on customer-facing teams, but CarGurus really helped me find my passion for improving the developer experience. That's currently where I'm at. I do love staying technical. I found enjoyment in coaching and helping others grow, as well as achieving their strategic vision. I really like making a wide impact at the company, so I unexpectedly started gravitating towards leadership roles.
Developer Experience (DevX) – Our Mission
Before we jump into the actual tool, I want to talk a little bit about developer experience and what we’re trying to accomplish with our goals. Our mission statement was really to improve the developer experience at CarGurus by enabling team autonomy and optimizing product development. We do this in a couple of different ways. We have an architecture team that really invests into making sure that we have scalable architecture system design, and they do that for both the frontend and backend.
We have Platform as a Service team which really invests into providing a platform offering which provides a great developer experience for the developer workflows, our environments, and really all the day-to-day that we’re doing with how we’re working. We have a pipelines and automation team which focuses on build and delivery and how we actually get everything into production, and then all the quality gates that we’re investing into in order to make sure that we do that very seamlessly. Then we have our tools team, which is really the focus of this talk, which is internal developer portal, and the internal tools that are helping us improve that developer experience at our company.
Launching Showroom
Launching Showroom. When we first started Showroom, it really was just a catalog. We knew we had a problem, but we found that it eventually evolved over time, into what we call an internal developer portal, so it’s our homegrown solution. In this presentation, we’re going to talk a little bit about what problems justified why we created this ourselves. What outcomes did those solutions provide? How did we actually achieve critical mass of getting people to use this product? Then throughout, we’ll talk about a lot of these strategic initiatives that we piggybacked off of in order to invest into this product. Then at the end, we’ll wrap it up with a little bit of a foundation that we really created that helps us to continue to invest into this, as well as leverage this to move faster on achieving those strategic initiatives.
Our journey really started in 2019. Many of the talks have talked about tech modernization, or trying to optimize the way that we're doing stuff in the cloud, and that's where we started with our journey. We wanted to start moving into microservices. We called it our decomposition initiative. Our monolith was starting to get slower to develop in. We wanted to develop more features, but we found it difficult. We knew something needed to change. We needed to transform the way that we approached our development. Thus, we embarked on our microservice journey. The problems we had: we knew we were going to have hundreds of services, we already had thousands of jobs, and ownership was unknown or unclear.
One of the engineers on my team actually said something in our Slack: everything that has no owner is going to eventually be owned by infrastructure. We were an infrastructure team, so it was definitely a motivation for us to make sure that we had clear ownership, because we didn't want to end up owning everything. Production readiness was another problem we had. We didn't always know what platform integrations we had, or whether we were ready for production, as we were trying to introduce these new services. There really wasn't much transparency into that. Overall, we found that it was very difficult for us to create new artifacts. It required a very heavy platform investment, took a lot of handholding to get across the line, and was very time consuming. We knew that there were some big problems that we wanted to solve.
This is actually what our monolith looks like. We actually bought a tool to try to go and look at our dependency graph within our monolithic architecture. We had all of these packages, all these modules, and we had almost no restrictions on how they could actually call each other, and we ended up with a big ball of mud. We knew we were dealing with something that was pretty bad, and we knew it was a difficult challenge for us to actually go and solve this. We thought our architecture looked like this.
We thought, yes, we have this monolith which has this frontend, it has all these packages it can call, and then it relies on a database. Nice and clean. That wasn’t the reality, as we just saw. What we were targeting was actually what we call these vertical slices. These vertical slices had everything that it needed from the frontend all the way to the database to really do what it needed to do, but it really only depended on the minimum set of things. We wanted those to be more isolated. We wanted to go into that microservice architecture, and provide a more decoupled way of operating.
We also knew that we were going to have a lot of these services getting created, so we prepared for the service explosion by making it more clear who owns things, but also enforcing that we had registration and ownership. We started very basic. We started with our service catalog. It was a JavaScript React based frontend. Developers could go in there and see who owned what; it pulled out a whole bunch of information for them. If they needed to contact the team, they even had a link to the team's channel in Slack to go and talk with them. We had an API layer that was REST based with good documentation from Swagger. It was talking to a containerized Java Spring Boot service that was running in Docker and Kubernetes. Then, at the end, we actually had a MySQL database in RDS.
That really isn't anything special. It's pretty basic. It allowed us to catalog things, but really didn't provide anything other than centralizing that information. Where the big secret came in was this concept called RoadTests. RoadTests was something that we introduced into our CI system, which is Concourse, and it ran when you actually opened a PR. What it did was use one of the APIs on our cataloging system and ask: is this a new artifact that you're generating in our monorepo? We actually have a monorepo, so that worked to our advantage here. And is that service actually registered in our catalog? We use the concept of canonical names.
Those canonical names enforce that, if it’s not registered, we’re actually going to block your PR from getting merged in. You need to go into the catalog, register your service. We made it easy. We didn’t want to actually add too much overhead to developers, so it was a few button clicks. You could register it right there. This actually helped us to maintain and enforce that ownership as we were continuing to develop hundreds of new microservices.
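The gate described above can be sketched roughly like this. It's a minimal illustration in Python with hypothetical path conventions and function names; the real RoadTests check runs in Concourse against the Showroom catalog API, and CarGurus' actual stack is Java:

```python
# Sketch of a registration gate like RoadTests (hypothetical names and
# monorepo layout; the real check queries the catalog's REST API).

def find_new_artifacts(changed_files):
    """Derive canonical artifact names from paths touched by a PR.
    Assumes a monorepo convention of services/<canonical-name>/..."""
    names = set()
    for path in changed_files:
        parts = path.split("/")
        if len(parts) >= 2 and parts[0] == "services":
            names.add(parts[1])
    return names

def check_registration(changed_files, registered_names):
    """Return the artifacts that would block the PR: present in the
    diff but absent from the catalog."""
    return sorted(find_new_artifacts(changed_files) - set(registered_names))

# Example: "search" is registered, "pricing" is not, so the PR blocks.
blocked = check_registration(
    ["services/pricing/src/Main.java", "services/search/pom.xml", "README.md"],
    registered_names=["search"],
)
```

In CI, a non-empty `blocked` list would fail the build with a message pointing the developer at the catalog's registration form.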
We talked about jobs. Jobs was another issue that we had where we actually did have a whole bunch of jobs that were cataloged, but they were spread across four different instances between regions or environments. They were in this system called Rundeck, which is where we ran a lot of these batch jobs. What we did was we leveraged the Rundeck APIs, and said, we’re going to take a different approach. Instead of actually having these manually be added in as individuals were adding new ones, we already have a system that has all these, but you have to remember which ones to look into. We scheduled a nightly job. We used the Spring Batch framework within our code base, had a few APIs that pulled out the various pieces of information that we felt like it was going to be relevant for our developers.
On a nightly basis, we had a sync. It just synced all of those and put them into our catalog. What really helped us with this is that we actually developed a way to intelligently classify ownership, and we did it with pretty good accuracy: about 90% of jobs were classified to the right owner. Then those just ended up in our catalog. If we weren't able to classify a job, we still had this banner at the top, which, as developers were going into the UI, they could see and say, there are some jobs that actually don't have ownership, maybe I should go and classify them. Click on that link, and then they could see all the ones that don't have owners, and they can manually claim them.
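A minimal sketch of what such a nightly sync with heuristic ownership classification might look like. The heuristics, field names, and data shapes here are hypothetical; the real job is built on Spring Batch against the Rundeck APIs:

```python
# Sketch of a nightly Rundeck-to-catalog sync with heuristic ownership
# classification (invented heuristics and field names).

def classify_owner(job, team_prefixes):
    """Guess an owning team from the job's group or name prefix;
    return None when no heuristic matches."""
    for prefix, team in team_prefixes.items():
        if job["group"].startswith(prefix) or job["name"].startswith(prefix):
            return team
    return None

def sync_jobs(rundeck_jobs, team_prefixes):
    """Merge jobs from all Rundeck instances into catalog entries,
    flagging the ones that still need a manual owner claim in the UI."""
    catalog, unowned = [], []
    for job in rundeck_jobs:
        owner = classify_owner(job, team_prefixes)
        catalog.append({"name": job["name"], "owner": owner})
        if owner is None:
            unowned.append(job["name"])
    return catalog, unowned

catalog, unowned = sync_jobs(
    [{"group": "pricing", "name": "pricing-refresh"},
     {"group": "misc", "name": "legacy-cleanup"}],
    team_prefixes={"pricing": "Pricing Team"},
)
```

Here `unowned` would feed the banner shown to developers, and `catalog` replaces the manual bookkeeping per instance.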
What did we accomplish with this? We knew our problem, services and jobs were unknown. We had unclear ownership. What were the outcomes? One-hundred percent of our services and jobs were now cataloged. We had zero undefined ownership, meaning infrastructure doesn’t own everything now. Service registration was enforced and job catalog was automatically synced. This is where we really started with our first two pillars. Our first two pillars of Showroom were discoverability, the ability to have that consolidated critical information in one place, so our services and jobs catalog. Then our governance, our RoadTests. Those RoadTests allowed us to ensure that we were continuing to maintain that ownership as we were continuing to develop at a pretty rapid pace.
Achieving Critical Mass
That alone didn't help us to achieve critical mass. That was all great, a great foundation, but if you're not looking for information about what services or jobs or owners, you're not really going to be enticed to go into that UI. What did we have to do in order to achieve critical mass? We focused on another problem, still within that decomposition initiative. We had manual checklists that people were going through, either Excel or wiki docs, where they're saying, have you actually done all of these checks? Are you actually going and incorporating all of these different things before you bring your service to production? We said, we know we can do better. We used that same Spring Batch framework. We introduced something called compliance rules. These compliance rules ran on a nightly basis as well.
It would check things like, have you actually documented your APIs? If you’re in a separate repository, are you using the right repo settings? Or, if you’re in the monorepo, are you actually in the right folder to make it easy to find where your service exists? Is your pipeline configured appropriately? Are you reporting your errors in all of the different environments to our Sentry system? Are you using sane memory defaults or memory settings for Kubernetes? What’s your test coverage like? Are you actually reporting test coverage? Where are your metrics going? Do you have metrics? Are you flying blind on observability? What really made this powerful was the fact that we made it extremely pluggable, so anything that had an API, you could go and easily extend this and introduce a new rule.
That made it so anybody, our own team who’s maintaining this, and even external developers to our team, were able to introduce these as we were going through and trying to come up with things that we were considering more golden path, best practices, things that you want to make sure you’re actually designing for and incorporating before you go into production. This really helped us to focus on that standardization, so now we knew who was actually following the best practices and who were not. You got this compliance score right in this UI. What we found was our developers cared about the score. They thought of it as a little bit of a game. They wanted to get to that 100%. They wanted to get to the high green 90s in here, and that helped to bring a little bit more traffic to our product.
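The pluggable rule mechanism and the compliance score could be sketched like this. The rule names and thresholds are illustrative, not CarGurus' actual checks; in the real system each rule calls some external system's API:

```python
# Sketch of pluggable compliance rules and the score shown in the UI
# (hypothetical rules; anything with an API can register one).

RULES = []

def rule(fn):
    """Register a compliance check, making the rule set extensible by
    any team, not just the portal's maintainers."""
    RULES.append(fn)
    return fn

@rule
def has_api_docs(service):
    return bool(service.get("swagger_url"))

@rule
def reports_errors(service):
    return bool(service.get("sentry_project"))

@rule
def has_test_coverage(service):
    return service.get("coverage", 0) >= 80

def compliance_score(service):
    """Percentage of registered rules the service passes."""
    passed = sum(1 for check in RULES if check(service))
    return round(100 * passed / len(RULES))

# Passes docs and error reporting, fails coverage: 2 of 3 rules.
score = compliance_score(
    {"swagger_url": "http://docs/svc", "sentry_project": "svc", "coverage": 50}
)
```

A score like this is what turns readiness into a number developers want to push into the green 90s.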
What did this actually accomplish? Integrations were now transparent. Services were automatically scored. No longer did we have to keep these checklists. Production readiness was seen upfront. You didn’t actually have to do this after you were in production and check everything. You got to see this right as you were introducing the service, and the first time that this service was actually going and being run in production hardware. This was an enhancement to our governance pillar. Really making sure that as we’re developing new features into our internal developer portal, are we actually investing into things that we feel like should be there. We were like, yes, this made sense. It’s governance. It wasn’t as much of a hard enforcement as the RoadTests. You still were allowed to merge in. You still were allowed to go into production. We found that this was very useful, because our developers cared about making sure that they could observe their systems, that they had proper logs, as they were going into production.
Next, we started off with a feature called workflows, and this was all about self-service. We had that problem where it was taking very long to get our services into production, and we wanted to make that faster. What we did was we introduced this concept of workflows where it consisted of steps. Propose your service at the beginning and bring it all the way to production in an automated fashion. What we would do is use a Spring Batch framework here as well, so that way you can keep track of the progress as you’re going through all of these different steps, because there’s a lot of things to do as we’re going through here. We’d start off with collecting information. You no longer had to worry about saving your service into the catalog, because this automatically did it for you. If you needed approval from your manager, if you wanted to make sure your service canonical name was actually accurate and that you weren’t going to change that, they would check it upfront.
If you needed additional approvals, we can go and incorporate that. Then, best of all, it would notify others that this new service is being created, so it provided visibility into all of these new services. You want to move forward. Your stuff looks good. You get your approvals. We provided some templates to go and make it a little bit easier, so you didn’t actually have to worry as much as time went on about those platform integrations, because our templates provided a lot of those out of the box. If you’re using our best practices, using our templates, you get a lot of those features automatically. We cloned that template for you. We go and update those variables, set up your development environment, and say, it’s ready to start using it. Start testing it out, make sure it works as you want. Then you can move forward when you’re ready. We’re ready to move forward, so we go into our staging environment.
We automatically generate that pipeline for you. We verify that the pipeline is going to be successful, and we let it deploy into staging. We sync a whole bunch of data, and we'll talk a little bit more about why that's important later on. Then we say, yes, go and start using staging. Make sure everything looks good. Then when you're ready, come back and move into production. We come back and ask, is your service going to be P1? P1 is priority 1, which has some additional checks that you're going to need to introduce. If the person tells us, yes, I believe my service is going to be P1, we'll add that label for you, and it triggers off a whole bunch of other processes. We'll verify that the production pipeline is set up. We'll use the portal itself to deploy your service into production and verify that it's up and running in Kubernetes. Then if you told us you needed a database, we'll actually go and create a database schema for you. Then we notify everyone: this brand-new service is in production.
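The end-to-end workflow described above amounts to an ordered set of steps with approval gates. A toy sketch, with hypothetical step names; the real implementation tracks progress with Spring Batch:

```python
# Sketch of the service-creation workflow as ordered, gated steps
# (invented step names; each step maps to the stages described above).

STEPS = [
    "collect_info",        # canonical name, owners, manager approval
    "clone_template",      # copy template, substitute variables
    "setup_dev_env",       # developer verifies in dev
    "generate_pipeline",   # CI/CD pipeline created and verified
    "deploy_staging",      # sync data, verify in staging
    "deploy_production",   # P1 checks, Kubernetes check, DB schema
]

def advance(state):
    """Move a workflow to its next step; the first step cannot be
    left until approvals have been collected."""
    idx = STEPS.index(state["step"])
    if STEPS[idx] == "collect_info" and not state["approved"]:
        raise RuntimeError("waiting on approvals")
    state["step"] = STEPS[min(idx + 1, len(STEPS) - 1)]
    return state

# An approved workflow walks every step through to production.
wf = {"step": "collect_info", "approved": True}
while wf["step"] != "deploy_production":
    wf = advance(wf)
```

Modeling it this way is what lets the portal show progress, pause for approvals, and resume later without a human shepherding each step.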
What did this accomplish? We knew that it was complex to introduce new artifacts, that it required heavy platform integration, and that it was very time consuming. We brought the time to get a new service into production from 75 days down to under 7 days. It was completely self-service: minimal handholding, no tickets. Nobody needed to depend on another team just to get a service into production. It was really great. It helped with developer happiness. You could innovate faster. You could introduce your services into production and get them rolling. This is our third pillar: self-serviceability. We wanted to make sure that our internal developer portal actually invested into team autonomy. We said what our mission was: team autonomy, productivity. This allowed for faster iteration.
Our next main initiative was our data center and cloud migrations. A bit unexpectedly, we ended up having to migrate out of our data center in 2019. However, we knew that wasn’t our long-term play. We knew we wanted to be in the cloud, so we ended up doing a lift and shift model into a new data center to get us there. Then we lifted and shifted again into getting us into AWS. That way we had that time in between 2019 and 2022, when we finally moved to AWS, to really prepare ourselves to do that. Some of our problems that we faced were, we were going to be changing host names quite often, because we were going from data center 1 to data center 2, and then eventually into the cloud.
Our developers used a lot of bookmarks and things like that, which would help them find their services, but those were going to become stale very quickly. We knew that was going to be a problem. Deployments were very error prone, and we were now going to be deploying in twice as many regions, across multiple data centers. We were going to experience even more issues with human error making deployments complicated. Then, what we realized from our data center 1 to data center 2 migration was that we really lacked dynamic configuration management, because host name changes were actually complex. We invested into that between our second data center migration and our cloud migration.
First, we started off with data collection. I talked about bookmarks. Our data collection feature is essentially a centralized bookmark repository that’s visible to everybody. What we did, we leveraged that Spring Batch framework, kicked off a nightly job, or in this case, you could actually run it on demand. You click into a service, into what we call our service details page, and you would go and find that it would collect all of this information for you. Where is my pipeline found? What’s my repo, or what folder am I in within the monorepo? Are there any associated jobs that are connected to the service that I should be aware of? Where do my Sentry errors go? Where am I reporting metrics to? How do I find my logs for all of the different configurations that I have, for all of the different environments, the regions? Where can I find that? What are my host names, my internal host names that I can use to start testing?
Once again, we made this very easy to extend so really anything with an API, we can go and start collecting this information. That wasn’t all. We could collect a whole bunch of information in an automated fashion, but we also had the ability for individuals to go into this page and add some custom bookmarks. Maybe there’s a really important documentation page that we wanted to actually have. What this allowed us to do is have those developers pay it forward. Next time your teammate was looking for that critical information, or a runbook, or what happens when the service goes down, you could have a link to that actual page that goes and tells you, here’s how you can go and start triaging things. Your mind is not really all there when you’re in this emergency situation, but if you know you have the centralized place to go to, you can find all of that critical information. It was really very helpful for our developers.
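A rough sketch of the pluggable collector idea behind that page; the collector names and URL shapes are invented for illustration:

```python
# Sketch of pluggable link collectors for a service details page
# (hypothetical systems and URL formats).

COLLECTORS = {}

def collector(name):
    """Register a named collector; anything with an API can add one."""
    def register(fn):
        COLLECTORS[name] = fn
        return fn
    return register

@collector("pipeline")
def pipeline_link(svc):
    return f"https://ci.example.com/pipelines/{svc['canonical_name']}"

@collector("logs")
def logs_link(svc):
    return f"https://logs.example.com/search?app={svc['canonical_name']}"

def collect_links(svc, custom_bookmarks=()):
    """Run every registered collector, then append team-added
    bookmarks such as runbooks."""
    links = {name: fn(svc) for name, fn in COLLECTORS.items()}
    links.update(dict(custom_bookmarks))
    return links

links = collect_links(
    {"canonical_name": "pricing"},
    custom_bookmarks=[("runbook", "https://wiki.example.com/pricing-runbook")],
)
```

The automated collectors keep links from going stale across migrations, while the custom bookmarks let teammates pay it forward with runbooks.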
What did this accomplish? We knew information was quickly becoming stale. What were the outcomes that we were able to accomplish with this? We automatically collected thousands of relevant links and provided it all in one spot, relevant to the specific service that you wanted to look at. We had all these services, thousands of links, and you could search, filter, and find which exact one you wanted. It was extremely helpful. You no longer had to bookmark things. You no longer had to worry about remembering the syntax query for which log statement you were trying to find. It was all just there for you. This was our fourth pillar, transparency. Providing transparency in a single pane of glass for awareness and visibility, and data collection was our first feature of it.
Next was deployments. Deployments, we talked about a lot of human error. What we found was that when people were trying to roll back, sometimes they chose the wrong build. When people were trying to choose build, sometimes they didn’t check if it had actually passed all of our integration tests or all of our different checks that we had, in an automated fashion. What we did was we integrated with GitHub to get all the list of the builds. You could even view your impactful commits. It got even more complicated in monorepo, because in monorepo, when you’re making a commit, it actually needed to be intelligent to know which artifacts are you impacting. We actually had a very intelligent way to determine that. Now developers could click this link, see exactly what changes they’re going to be deploying in that build, less likely to deploy something that they didn’t want to.
They could very easily check, integrating with our CI system, whether it had actually passed all of the different checks. Alex talked a little bit about a CarGurus concept called Pit Stop Days. Pit Stop Days is where we have one day, about once a month, where we really allow developers to brainstorm new ideas and innovate. The funny thing is this project actually started with that. We brainstormed this particular feature and said, this is a big problem, we know that we could do better at eliminating this human error, so we invested into a design. We got a team together, and in our next hackathon we actually invested into this. We talked about the value that it was going to provide. We talked about how it could benefit the strategic initiative that we were investing into. It was extremely successful in the hackathon; we actually got a very functional prototype. Then we were given buy-in as part of the strategic initiative to go and invest into this.
A developer comes in here and hits deploy. What happens under the covers? Once again, we use Spring Batch, but this time it was a little bit more complicated. We were now living in an environment where you had both monolith and microservices. Depending on which one you were in, you would use a different system to deploy, either through Rundeck or through Concourse. To the developer, it didn't matter. We were able to completely abstract that away and provide the exact same developer experience, regardless of whether you were working on the monolith, working on a microservice, and then later on, even working in a separate repository. We provided a lot of convenience features. If you wanted to see your logs, we knew you were deploying a canary server for this service at this time, with this build, so the log link was dynamically generated for you. You could see the logs right in the UI.
If anything was looking bad at the end of your deployment, you’d have this rollback button. It’s as simple as that. You would just click, roll back. Picks the build for you, knows exactly what build you were previously on. If everything looks good, you make sure your service is up and running in Kubernetes, you could go and proceed forward. We also had this culture at CarGurus where developers really wanted to get notified through Slack. We have a very Slack heavy culture, so we also integrated with notifications, where, as you were progressing through the various phases, we would notify you with good notifications, custom that we’ve made in Slack about what commits, who’s impacted it, how many commits are going out, with a link back to our service to actually go and help us.
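The deploy abstraction and rollback behavior could be sketched like this. Class names and build identifiers are hypothetical, and the real system drives Rundeck and Concourse rather than returning strings:

```python
# Sketch of hiding Rundeck vs. Concourse behind one deploy interface,
# with rollback to the previously running build (invented names).

class RundeckBackend:          # monolith deploys
    def deploy(self, service, build):
        return f"rundeck:{service}@{build}"

class ConcourseBackend:        # microservice deploys
    def deploy(self, service, build):
        return f"concourse:{service}@{build}"

class Deployer:
    """Same UI action regardless of backend; remembers the previous
    build per service so rollback never picks the wrong one."""
    def __init__(self, backends):
        self.backends = backends
        self.current = {}
        self.previous = {}

    def deploy(self, service, kind, build):
        self.previous[service] = self.current.get(service)
        self.current[service] = build
        return self.backends[kind].deploy(service, build)

    def rollback(self, service, kind):
        # The portal picks the previously running build automatically,
        # so a human never selects the wrong one under pressure.
        return self.deploy(service, kind, self.previous[service])

d = Deployer({"monolith": RundeckBackend(), "microservice": ConcourseBackend()})
d.deploy("pricing", "microservice", "build-41")
d.deploy("pricing", "microservice", "build-42")
rolled = d.rollback("pricing", "microservice")
```

Designing the rollback choice away from the human is how this kind of system eliminates the wrong-build class of error.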
This not only kept people in their familiar Slack workflow, but encouraged them to come back to our UI to look at the visual status, rather than just the Slack notification. What did this accomplish? We eliminated human error during deployments almost wholesale. We found that there was almost no human error because we were able to design it away. It saved us about 7000 developer hours just in the time since it launched. Really a huge success, and something we'll come back to when we talk about why it was so critical to achieving critical mass.
This is our last pillar. The last pillar was operational efficiency. We really wanted to minimize fragmentation and cognitive load. We didn’t want developers to have to remember to go to all of these different regions to deploy. We really wanted to make sure that they were deploying by just clicking a button. They had the commits that were going out. They knew that upfront. They got to choose which commits were going out, rather than just blindly picking one because it happened to be the latest version, which may or may not have actually passed its checks. Then we provided really good ease for the log statement, so that way you could actually see those right in the UI.
We talked about configuration management. We knew host names were going to be changing. We knew that it wasn't easy to manage them. We went through that painful process during our first data center migration. What we did was introduce this concept of configuration management. We introduced a service, used primarily through a CLI, called Mach5. We like our car puns in naming. The Mach5 service did really three things. It managed our environments for us; it automated our dev deployments, and actually staging and production too, so that we maintained parity; and it introduced the concept of configuration management, both static and dynamic.
We introduced a UI within our internal developer portal that allowed developers to go in and change their configuration for the things that were static, things that weren't going to change across the different environments and regions you were running in. This was just injected right into your service as it was starting up. The Mach5 service handled that for you. Then we also had dynamic configuration. The dynamic configuration really took away the whole need to even know or care about what the host name was going to be for a particular service in an environment within a region. It just said, I'm deploying in North America, I'm deploying for my production environment, I depend on service x, and it automatically knows what the host name is for service x. Completely abstracted that away.
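The dynamic half of that configuration management can be pictured as a lookup keyed by service, environment, and region. The registry contents and naming scheme here are invented; the point is only that no service ever hard-codes another's host name:

```python
# Sketch of Mach5-style dynamic configuration: declared dependencies
# are resolved to host names per environment/region at startup
# (hypothetical registry and naming scheme).

REGISTRY = {
    ("search", "prod", "na"): "search.na.prod.internal",
    ("search", "staging", "na"): "search.na.staging.internal",
}

def resolve_dependencies(depends_on, env, region):
    """Turn declared dependencies into concrete host names, so a
    migration only updates the registry, never the services."""
    return {svc: REGISTRY[(svc, env, region)] for svc in depends_on}

# A service deploying to production in North America declares what it
# depends on and receives the right hosts automatically.
hosts = resolve_dependencies(["search"], env="prod", region="na")
```

With this indirection, moving data centers or regions means repopulating one registry instead of chasing host names through every service's config.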
For our development workflows, it provided us this opportunity to make it so we had development, staging, and production all deploying in Kubernetes in a very similar way, and having environmental parity as close as we could get. We didn’t get perfect environmental parity, but as close as we could get. That allowed us to find a lot of these issues that prior to this were just, it worked in staging, it worked in development, but now it didn’t work in production. It eliminated a lot of that. Then these environments were fully managed for you, so you didn’t have to worry about it. That’s where we created the second feature here, which is the environmental visibility.
Because Mach5 was primarily CLI based, we had the ability to show visually, what services do you have deployed in your personal namespace? What services does your service depend on? You can click a button, click into your service and say, I’m experiencing an issue with my service right now. Is it actually some of my code that I wrote, or is a service that I depend on actually having an issue at the moment? Visually you could see, there’s a big red dot right there. That service is probably having an issue. Let me actually click into that service and see who owns it, so I can go and talk to them and see if their stable version that I’m depending on is not stable at the moment.
What did this accomplish for us? We now had proper configuration management enabling all host names to be dynamic. Our static configuration was all centralized in one place. We actually eliminated the pull request process; it was fully self-service as well. We launched three successful migrations: one for North America, one for the EU, and then, once again, into AWS. Three successful migrations, which is a pretty huge feat. We didn't miss anything, because we had everything cataloged and we knew what we had to lift and shift. That also was huge. This is where we enhanced two more pillars. As we were going through and developing these new features, we constantly had to ask ourselves, did it align with our vision for this internal developer portal? Yes, these did. It provided self-service ability for configurations. It eliminated the manual process where someone had to go and approve your pull requests. It provided transparency into your service, so you now had visibility into your environment, and you knew exactly what was running, what you were depending on, and whether a service you depended on was having an issue. We provided more insights to those developers.
One of the more recent initiatives that we launched: we primarily were operating in a monorepo, but we really wanted to move to what we called multi-repo, so multiple repositories. It was 2022, and we were officially in the cloud. We had many microservices at this point, but coupling remained an issue. We were like, I thought microservices were going to solve everything for us. No, that's not what happened. We actually made a good number of attempts to ensure proper build- and compile-time isolation, but it was still proving difficult while everyone was operating in a single monorepo. We found that more microservices made it difficult to find real-time information. We couldn't easily create new libraries and repositories. It was complex and time consuming.
Overall, most importantly, it was proving to be very inefficient from a build, deploy, and development perspective to operate in a single repository. We had this architecture. In the monorepo, we introduced a concept that we called embankment, which really tried to mimic a multi-repo environment inside a monorepo. It encouraged us to avoid dependencies on what we called mainline, which was where all of our existing artifacts lived. It also allowed us to introduce reusable libraries that were more properly versioned. It wasn't good enough. We wanted to move to what we call multi-repo, where each artifact is produced out of one repository. Those artifacts could depend on each other. Then you could depend on a whole bunch of reusable libraries that are properly SemVered.
We had this real-time information. Services were now spread further apart from each other, and we really wanted to provide a single pane of glass for all the information that you needed. We said, let's integrate with Prometheus to find how much CPU and memory your service is using. If we could find your service in Kubernetes, we'd tell you how many pods you have, what the status of those pods is, and whether they are restarting. We also wanted teams to feel empowered to improve their workflows and get more efficient. We implemented all four DORA metrics and provided visibility wholesale for all of the microservices, so now they could see what their change failure rate was, what their lead time for changes was, and so on. We also allowed you to see whether you were on the latest build.
Did somebody check something in that nobody remembered to deploy, because we were still deploying manually? We also cared very deeply about security and quality, so we integrated with our security system and our quality system to provide all of that upfront. Now you could see: do I have vulnerabilities? Do I have good coverage? Are there bugs that I could go and fix? Really centralizing all of this. Now you had health, DORA metrics, build, security, and quality metrics all available and easy to find, all in one place. This once again aligned with our pillars. We had our governance, where security analysis allowed us to have that visibility more upfront, shifting further left. We had those statistics, which were all about transparency, really reinforcing what we had.
Then we had a reusable workflows framework. This looks very familiar. We had steps and tasks, but what we learned this time was that we could generalize it, so we invested in refactoring it. We made it more robust and easier to extend by having options to ask any question, collect any information, catalog that information, seek any approvals, execute any type of task, and notify. You could verify that things were actually set up successfully. Really, you could run any arbitrary task, and that allowed us to introduce a whole bunch of new workflows for self-service, things like creating a new library, or creating a new backend service directly in multi-repo.
New applications, that's what Alex's talk was all about: creating those new applications in Remix. We even provided an ease-of-use option to say, during a certain period of time, we're going to give you an automated way, as automated as we could make it, to migrate yourself to multi-repo. Then, if you forgot to say that you needed a database, or you didn't know you needed one when you were introducing the service, you can now self-service that at any point in time and create a database or a dashboard for yourself.
This allowed us to further improve time to production for services and libraries. We went from that 7 days down to 2.5. Libraries were manually created, taking an average of about 10 days from start to publishing. Now it was taking one day to introduce a bare-bones, published library. Really, some huge successful wins. Once again, this went into our discoverability pillar, really helping us invest in that cataloging. We introduced library cataloging, and then team cataloging too. Overall, though, I do want to talk a little bit about what this multi-repo project provided from an outcomes perspective. We helped accomplish a lot of these with this internal developer portal.
A lot of work was put in across all of the engineers at CarGurus to really invest in this tech modernization. What did we accomplish? Our lead time for changes went down by 60%. Change failure rate dropped to 5% from 25% for our monolith. Build times were 96% faster because we didn't have to worry about that centralized pipeline on the monorepo. Deploy times were, on average, 70% faster. Best of all, we found that our developers were 220% more efficient. They were happier because they were able to move faster and accomplish more, with fewer roadblocks and less overhead. Very powerful for them.
Foundation for the Future
Now we had a foundation. We had these five pillars, and they really allowed us to continue to invest. You'll see I've added a few more features here that I haven't talked about in this talk, but we had a whole bunch of different features that we've invested in that kept aligning with these five pillars. We've stayed true to that. We're not just adding everything because we have a centralized portal, but because it aligns with our mission and our vision of providing that autonomy and that productivity boost for our developers.
I also want to talk a little bit about what we're currently working on. We're currently working on another initiative called time to market acceleration. The problem here is: yes, it's 60% faster to get services into production, but it still takes days from commit to production. Quality issues are often found too late in our development cycle. Deploying to production is still manual; you still have to click a few buttons to do it. We plan on heavily leveraging our compliance rules to determine if you're actually ready for CI/CD. We plan to leverage labels within our catalogs to track who's migrating over to this new full CI/CD model, which is the goal of this project.
Then continuing to provide a great experience by having that single pane of glass, regardless of what type of deployment model you're in. The outcomes we predict here are getting our lead time for changes down to under 60 minutes, maintaining a change failure rate of 5% despite moving multiple times faster, hopefully lowering our defect escape rate, and improving developer efficiency by eliminating that manual deploy step at the end.
Secondarily, we talked about how we lifted and shifted into the cloud from our data center. Not great, but it helped us do it very quickly and very successfully. Another initiative that we're launching is cloud maturity. We're operating fully in AWS, but we're not really fully leveraging all the offerings that we have. Our services are not always built with a cloud-first mindset, so we can do better there. It's also hard to understand the cost implications of a design decision now that we're operating in the cloud. We plan to use our catalog to know what's available, so you can reuse things instead of developing net-new. We plan to invest more in our workflows to help us self-service infrastructure provisioning, making it easier for developers, while still providing that 20% for those power users who want it.
Data collection and real-time statistics to provide cost transparency, hopefully even upfront, although we learned how difficult that might be. Then integrations with our catalog to ensure that we’re doing proper cost attribution as we’re investing into more cloud offerings. We hope to accomplish faster adoption of cloud features, more services built with a cloud-first mindset, cost transparency upfront, which hopefully should overall reduce the cost of operating in AWS. Faster time to market, by easier provisioning of that infrastructure. Then, overall, once again, our goal is always improved developer efficiency and experience.
The big question we always get, though, is: what would you have done differently? There are two things. First, find that daily feature sooner. It really wasn't until we released the deployments feature that we achieved critical mass, because that was a daily activity developers had to do, so it brought them into that UI experience every day. My recommendation would be: find the feature that makes sense to invest in, the one that's going to drive traffic in there on a daily basis. I still strongly believe that the right foundation is starting with those catalogs, because that's how you're going to actually know about all of those systems and provide that value. But I think getting to that daily feature sooner is really important.
Secondarily, minimize the usage of team names. Teams change. What we found in our experience was that the service canonical name, which we embedded everywhere, was very unlikely to change. It stayed pretty consistent. Whereas teams change: a reorg happens, people change their team names, they shift under different managers, and all of a sudden you find that the infrastructure where you're organizing things is out of sync with your actual catalog and system of record. I'd recommend really leaning into service canonical names and minimizing your dependency on team canonical names; that way it's just easier. This applies to everything from Kubernetes to even just how you organize things in folders.
Questions and Answers
Participant 1: The numbers and the outcomes that you've shown were nothing short of amazing, the testimonials too. Hindsight is 20/20, though. What was your process for handling pushback, especially at the beginning of this process?
Fodera: Early on, we started little. We actually only had one developer working on this for, I think, all of 2019, with some spot assistance from a frontend developer. If you're getting resistance to investment in this, how did we continue to do it? I think it's really important to start small. Don't try to sit there and say, this is a six-person team, we're going to invest in it all out of the gate, I need a couple million dollars to do this. That's not going to win. What we found was to leverage our innovation days. I talked about how we used the hackathon to prove the value of how important it is to eliminate human error.
Also, piggyback off of the initiatives that investment is already being made into, and show how you can accelerate those initiatives. If we're already investing in a data center migration, and you go and approach your leadership and say, I can make that a lot smoother, with a higher chance of success, or faster, by investing in this feature in parallel, that will help with getting that investment.
Participant 2: In one of the slides, you show that developer productivity increased by 220%. How do you measure that?
Fodera: We did leverage the DORA metrics pretty heavily to show, from a flow perspective, as a team, how much faster you’re working. We got a lot of developer testimonials as well, which, from a qualitative perspective, would allow us to do that. What we found was that if given the exact same task that you needed to do in the monolith or even monorepo, versus that exact same task having to be done in a multi-repo service, they were able to do it about two times more quickly. That was pretty much how we leaned into it. A lot of that was eliminating what we call developer waste. We also outlined generally, what it would look like working on a feature in a sprint, and how much faster could you accomplish that with removing a lot of that developer waste.
Participant 3: How do you incorporate operations into this? When I say operations, I’m talking about infrastructure, infrastructure of services, architecture. Do you incorporate any service templates or architecture templates into this developer platform that speeds up the teams?
Fodera: We have the advantage that our team is part of our platform and infrastructure organization, so we sit very closely with a peer of mine who runs more of the cloud infrastructure, and I'm constantly collaborating with them. That collaboration helps us really move in lockstep. What was most important is that, with our templates, we ensured that we had all those integrations out of the box. As those templates were created, we made sure they worked well from an infrastructure perspective. Really staying in lockstep: he's a peer of mine and works very closely with me. That also helped from an organizational perspective, where we were set up for success.
Participant 4: I noticed that you showed a lot of UI-based tools. However, I also know that infrastructure as code is important, especially when it comes to deployments and configuration. In which cases did you use the UI versus infrastructure as code, and how did you combine the two?
Fodera: The cloud maturity one is an initiative that we're actively working on, and that is a question that actually comes up. There's a great talk by another company about how you want to go with that 80/20 rule. What we found, and this was still true with our developers as we worked with them: first, talk with your customers, who are your developers in this case, and see what they want. It's not going to be one-size-fits-all for all companies.
What we found at CarGurus was that about 80% of people just wanted a few button clicks to introduce a new service or get a database or whatever, and they really didn't want to have to worry about learning Terraform, which is what we're using under the covers for infrastructure provisioning. Lean into that 80/20 rule: whatever 80% of your customers want, cater to that. If they want the UI, lean into that. If not, go with the approach of providing them the ability to self-service. If you have a company where everybody knows Terraform, it's probably not worth abstracting it away with a UI.
Participant 5: A part of your journey was the migration from a monolith to microservices. Can you tell us a little bit more about your journey, what went well, and lessons learned? What recommendation could you give to other people who are going through this journey right now?
Fodera: Actually, the vertical slice model that I showed didn't work well. I have a blog post on cargurus.dev that talks about how we failed a few times in our microservices journey, specifically going from the monolith to microservices. The vertical slice approach didn't work. That was trying to vertically slice and detangle that big ball of mud, which proved to be very inefficient. That's where we moved to more of the strangler fig pattern, where we made it very easy to introduce new services instead of trying to detangle the existing ones.
Then we tried to enforce a culture where we said, as you're introducing new features, do you actually need to introduce this into the monolith, or could you introduce it as a new service? We started with backend services only, and that worked really well. Then we used the embankment approach that I talked about to help with the frontend services, and that helped a little bit. Then our shift to multi-repo, where we invested in a Remix template, was really the solidifying factor that helped us decouple from a frontend perspective.
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
![ID Tech company Hopae's digital identity authentication solution is a third-generation digital identity technology that cannot be falsified and escaped privacy infringement problems, enabling easy and convenient identity authentication. [Picture = Hoppae]](https://mobilemonitoringsolutions.com/wp-content/uploads/2024/12/news-p.v1.20241231.0261a95c2ef9465c87b4861c1c4eda3a_P1.png)
Companies that once tightly guarded the technology they developed through patents have changed. Open source, which discloses important technologies, down to the detailed code, so that anyone can see them, is becoming the basis of a new industry in the software (SW) market.
Open source is free, but many companies are finding ways to build business models on top of it and generate revenue.
The U.S. company Red Hat is a representative example. Red Hat is a global provider of enterprise open-source solutions, including Linux, cloud, containers, Kubernetes, and more. Red Hat developed an enterprise-class Linux distribution, introduced a subscription model, and ensured thorough quality control and long-term support. In 1999, IBM loaded Red Hat Linux onto corporate servers, and various Wall Street financial institutions also began adopting Red Hat to reduce costs.
After launching its first premium enterprise Linux in 2002, Red Hat became the first open-source technology company to surpass $1 billion in sales, in 2012, and IBM acquired it in 2019 for about $34 billion, the largest software-company acquisition in history. Red Hat currently accounts for about 17.5% of IBM's total software sales, a significant increase from 9.2% at the time of the 2019 acquisition, meaning Red Hat makes up a growing share of IBM's software division.
MongoDB is a cloud service provider built around an open-source database. Having grown around its developer community, MongoDB established a monetization model by providing enterprises with advanced security features, audit functions, and professional support services, and by operating certification programs.
It went on to raise $150 million in its Series F round in 2013 and, by 2014, counted more than 30 of the top Fortune 500 companies among its customers. MongoDB's revenue for the third quarter of 2024 amounted to about 580 billion won.
Companies based on excellent open-source technology are also emerging in Korea. ID tech company Hopae, which provides digital identity authentication solutions, uses open source as a market-entry strategy. The open-source code created by the Hopae team has more than 2 million downloads globally and is actively used in various projects.
According to the Software Policy Research Institute, the global open-source market is expected to grow from $27.7 billion (about 38.4 trillion won) in 2022 to $75.2 billion (about 104.2 trillion won) in 2028.
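For context, those figures imply an annual growth rate of roughly 18%. A short calculation, using only the numbers quoted above, makes the arithmetic explicit:

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, an end value, and a span in years."""
    return (end_value / start_value) ** (1 / years) - 1

# $27.7B in 2022 growing to $75.2B in 2028, i.e. six years of growth.
growth = cagr(27.7, 75.2, 2028 - 2022)
print(f"{growth:.1%}")  # roughly 18% per year
```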
MMS • Anthony Alford
Article originally posted on InfoQ. Visit InfoQ
![](https://mobilemonitoringsolutions.com/wp-content/uploads/2024/12/generatedHeaderImage-1734791589464.jpg)
Researchers from InstaDeep and NVIDIA have open-sourced Nucleotide Transformers (NT), a set of foundation models for genomics data. The largest NT model has 2.5 billion parameters and was trained on genetic sequence data from 850 species. It outperforms other state-of-the-art genomics foundation models on several genomics benchmarks.
InstaDeep published a technical description of the models in Nature. NT uses an encoder-only Transformer architecture and is pre-trained using the same masked language model objective as BERT. The pre-trained NT models can be used in two ways: to produce embeddings for use as features in smaller models, or fine-tuned with a task-specific head replacing the language model head. InstaDeep evaluated NT on 18 downstream tasks, such as epigenetic marks prediction and promoter sequence prediction, and compared it to three baseline models. NT achieved the “highest overall performance across tasks” and outperformed all other models on promoter and splicing tasks. According to InstaDeep:
The Nucleotide Transformer opens doors to novel applications in genomics. Intriguingly, even probing of intermediate layers reveals rich contextual embeddings that capture key genomic features, such as promoters and enhancers, despite no supervision during training. [We] show that the zero-shot learning capabilities of NT enable [predicting] the impact of genetic mutations, offering potentially new tools for understanding disease mechanisms.
The best-performing NT model, Multispecies 2.5B, contains 2.5 billion parameters and was trained on data from 850 species of “diverse phyla,” including bacteria, fungi, and invertebrates as well as mammals such as mice and humans. Because this model outperformed a 2.5B parameter NT model trained only on human data, InstaDeep says that the multi-species data is “key to improving our understanding of the human genome.”
InstaDeep compared Multispecies 2.5B’s performance to three other genomics foundation models: Enformer, HyenaDNA, and DNABERT-2. All models were fine-tuned for each of the 18 downstream tasks. While Enformer had the best performance on enhancer prediction and “some” chromatin tasks, NT was the best overall. It outperformed HyenaDNA on all tasks, even though HyenaDNA was trained on the “human reference genome.”
Besides its use on downstream tasks, InstaDeep also investigated the model’s ability to predict the severity of genetic mutations. This was done using “zero-shot scores” of sequences, calculated using cosine distances in embedding space. They noted that this score produced a “moderate” correlation with severity.
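The zero-shot scoring idea is straightforward: embed the reference sequence and the mutated sequence, then use the cosine distance between the two embeddings as a severity proxy. The sketch below illustrates the computation with a toy bag-of-k-mers vector standing in for a real NT embedding; the k-mer featurization is purely our illustrative assumption, as the actual model produces learned Transformer embeddings:

```python
import math
from itertools import product

# Toy stand-in for an NT embedding: counts of all 64 possible 3-mers over A/C/G/T.
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=3))}

def embed(seq, k=3):
    """Embed a DNA sequence as a k-mer count vector (illustrative only)."""
    v = [0.0] * len(KMER_INDEX)
    for i in range(len(seq) - k + 1):
        v[KMER_INDEX[seq[i:i + k]]] += 1.0
    return v

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

def zero_shot_score(reference, mutated):
    # A larger distance in embedding space suggests a more disruptive mutation.
    return cosine_distance(embed(reference), embed(mutated))
```

With real NT embeddings the distance computation is the same; only the `embed` function changes.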
An InstaDeep employee, BioGeek, joined a Hacker News discussion about the work, pointing out example use cases in a Hugging Face notebook. BioGeek also mentioned a previous InstaDeep model called ChatNT:
[Y]ou can ask natural language questions like “Determine the degradation rate of the human RNA sequence @myseq.fna on a scale from -5 to 5.” and the ChatNT will answer with “The degradation rate for this sequence is 1.83.”
Another user said:
I’ve been trialing a bunch of these models at work. They basically learn where the DNA has important functions, and what those functions are. It’s very approximate, but up to now that’s been very hard to do from just the sequence and no other data.
The Nucleotide Transformers code is available on GitHub. The model files can be downloaded from Hugging Face.
Flutter 3.27 Promotes New Rendering Engine Impeller, Improves iOS and Material Widgets, and More
MMS • Sergio De Simone
Article originally posted on InfoQ. Visit InfoQ
![](https://mobilemonitoringsolutions.com/wp-content/uploads/2024/12/flutter-3-27-released-1735554656368.jpeg)
The latest version of Google’s cross-platform UI kit, Flutter 3.27, brings a wealth of changes, including better adherence to Apple’s UI Guidelines thanks to a number of improved Cupertino widgets, new features for `CarouselView`, list rows and columns, `ModalRoutes` transitions, and so on. Furthermore, the new release makes the Impeller rendering engine the default, with improved performance, instrumentation support, concurrency support, and more.
Cupertino is a collection of widgets that align strictly with Apple’s Human Interface Guidelines. Flutter 3.27 updates a few of them to increase their fidelity, including `CupertinoCheckbox`, `CupertinoRadio`, and `CupertinoSlidingSegmentedControl`. It also extends `CupertinoCheckbox` and `CupertinoSwitch` to make them more configurable, and brings `CupertinoButton` on a par with the latest customizability options introduced in iOS 15. Other improvements affect `CupertinoActionSheet`, `CupertinoContextMenu`, `CupertinoDatePicker`, and `CupertinoMagnifier`.
On the Android-native Material UI front, Flutter 3.27 extends `CarouselView` with `CarouselView.weighted`, which allows you to define more dynamic layouts using the `flexWeights` parameter to specify the relative weight each item occupies within the carousel view. Additionally, `SegmentedButton` can now be aligned vertically, and a few widgets have been fixed to align better with the Material 3 specification.
Flutter 3.27 also improves `ModalRoutes`, text selection, and row and column spacing. `ModalRoutes`, which have the peculiarity of blocking interaction with previous routes by occupying the entire navigator area, now let the exit transition of one route sync up with the enter transition of the new route, so they play nicely together. Text selection now supports the Shift + Click gesture to move the extent of the selection to the clicked position on Linux, macOS, and Windows. Rows and columns may take a `spacing` parameter, which makes it easier to offset their children from each other.
After over one year in preview, the new Impeller rendering engine has been promoted to default on modern Android devices, replacing the old one, Skia. Skia is still available as an opt-in in case of compatibility issues. Impeller attempts to do at compile time a number of tasks that Skia does at runtime, such as building shaders and reflections and creating pipeline state objects upfront, and it improves caching to make performance more predictable. It also improves debug support by labeling textures and buffers and by being able to capture animations to disk without affecting rendering performance. When necessary, Impeller can distribute single-frame workloads across multiple threads to improve performance.
Going forward we will continue to make improvements to Impeller’s performance and fidelity on Android. Additionally, we intend to make Impeller’s OpenGL backend production ready to remove the Skia fallback.
Other improvements worth mentioning are improved rendering performance on iOS, support for the Swift Package Manager, and edge-to-edge and freeform support on Android. Do not miss the official announcement for the full details.
Podcast: Key Trends from 2024: Cell-based Architecture, DORA & SPACE, LLM & SLM, Cloud Databases and Portals
MMS • Daniel Bryant, Thomas Betts, Shane Hastie, Srini Penchikala, Renato Losio
Article originally posted on InfoQ. Visit InfoQ
![](https://mobilemonitoringsolutions.com/wp-content/uploads/2024/12/the-infoq-podcast-logo-small-1734599940404.jpg)
Transcript
Daniel Bryant: Hello and welcome to the InfoQ podcast. My name is Daniel Bryant. I’m the news manager here at InfoQ, and in my day job I currently work in platform engineering at Syntasso. Today we have a treat for you, as I’ve managed to assemble several of the InfoQ podcast hosts to review the year in software technology, techniques, and people practices. We’ll introduce ourselves in just a moment and then dive straight into our review of architecture, culture and methods, AI and data engineering, and the cloud and DevOps. There are of course no surprises that AI features heavily in this discussion, but we have tried to approach this from all angles and we’ve provided plenty of other non-AI insights too. Great to see you all again. Let’s start with a quick round of intros, shall we? Thomas, do you want to go first?
Thomas Betts: Yes, sure. I don’t think I’ve done my full introduction in a while on the podcast. My day job, I’m an application architect at Blackbaud, the number one software provider for social impact. I do a lot of stuff for InfoQ, and this year a lot of stuff for QCon. So I’m lead editor for architecture design, co-host of the podcast obviously, I was a co-chair of QCon San Francisco this year and a track host for QCon London, so that kind of rounds out what I’ve been up to. Next up, Shane.
Shane Hastie: Thanks, Thomas. Really great to be here with my colleagues again. Yes, Shane Hastie, lead editor for culture and methods. In my day job, they call me the global delivery lead for Skills Development Group. I’ve written a couple of books and am deep into the people and culture elements of things. The highlights for this year: unfortunately, I didn’t get to any of the QCons, I’m sad about that, and hopefully next year we’ll be there in person. But some of the really amazing guests we’ve had on the Engineering Culture podcast this year, and we’ll talk a bit about that later on, have probably been some of my highlights. Srini.
Srini Penchikala: Thanks, Shane. Hello everyone. I am Srini Penchikala. My day job is as an application architect, but for InfoQ and QCon I serve as the lead editor for the data engineering and AI/ML community at InfoQ. I also co-host a podcast in the same space, and I am serving as a programming committee member for the QCon London 2025 conference, which I’m really looking forward to.
Real quick, 2024 has been a marquee year for AI technologies, and we are starting to see the next phase of AI adoption. We’ll talk more about that in the podcast. I also hosted an AI/ML Trends podcast report back in September, so that will be the biggest reference I’ll be going back to a few times in this podcast. Definitely there is a lot to look forward to. I am looking forward to sharing in today’s podcast the technologies and trends that we should be hyped about, and also what hype we should be staying away from. Next, Renato.
Renato Losio: Hi everyone. My name is Renato Losio. I’m an Italian cloud architect living in Germany. For InfoQ, I’m actually working in the Cloud queue, I’m an editor. And definitely my highlight of the year has to be being the chair of the first edition of the InfoQ Dev Summit in Munich. Back to you, Daniel.
Daniel Bryant: Fantastic. Yes, I’ve really enjoyed the Dev Summits. I was lucky enough to go to the InfoQ Dev Summit in Boston. I worked with a Boston-based company for a number of years, and the content there was fantastic; Staff-Plus in particular sticks in my mind. I know we’re going to dive into that, but I also did the platform engineering track at QCon London this year, which was fantastic. Great crossing paths with yourself, Srini. I think I’ve met almost everyone this year. Maybe not yourself, Shane; actually, I can’t remember exactly when. But it’s always good to meet in person, and the QCons and the Dev Summits are perfect for that kind of stuff. I always learn a ton, which we’ll give the listeners a hint at tonight, right?
Did our software delivery trends and technology predictions from 2023 come true? [04:10]
So today, as we’re recording, I just want to do a quick look back on last year. Every year we record these podcasts, we always say we want to look back and ask, “Hey, did our predictions come true?” When we pulled up last year’s predictions, we said that in 2024 we could see the use of AI within software delivery becoming more seamless, an increasing divide between organizations and people adopting AI and those that don’t, and a shift towards composability and improved abstractions in the continuous delivery space. I think we actually did pretty well this year, right? I’m definitely seeing a whole lot of AI, as you hinted at, Srini, coming in there, and as you say every year, Thomas, the AI overlords have not stolen our jobs yet. So, not completely, in terms of software engineering.
Thomas Betts: I think we’re all still employed, and I’m surprised by that quote in there about the separation; I think that’s true. We’re seeing the companies on the innovator and early adopter side still doing new things. I think the companies that are more late majority are like, “We want to use it”, but they’re not quite sure how yet. I don’t know if, Srini, you have any more insight into how people are adopting AI?
Srini Penchikala: Yes, that’s very true, Thomas. Yep, definitely AI is here, I guess, but again, it’s still evolving, right? So I definitely see some companies going full-fledged, some companies still waiting for it to happen. So, as they say, the new trend, and I will talk more about this later in the podcast, is agentic AI: AI agents that can not only generate and predict content and insights, they can also take actions. So agentic AI is a big deal. So as they say, the AI agents are coming, so whether they’ll overtake our jobs or not, that’s to be seen. But speaking of last year’s predictions, we had talked about the shift towards AI adoption, right? Adoption has been a lot more this year, but I think we still have some areas where I thought we would be further ahead and we are not. So it’s still evolving.
Shane Hastie: Yes. I see lots and lots of large organizations that are not software product companies putting their toes in and bumping against privacy, security and ethics, not sure how to go forward while knowing that they absolutely need to, and often that governance frame slows things down as they’re exploring, “Well, okay, what does this really mean for us?” And there’s a lot of conservatism in that space.
Daniel Bryant: It’s really funny, Shane, compared to what Renato and I are seeing. So I went to the Google Cloud Summit in London and I only heard AI, AI, AI, AI. And I think, Renato, you covered re:Invent for us recently. If you sort of listen to the hype, you’d believe everyone, even the more late adopters, are covering AI, Renato, right?
Renato Losio: Yes, I mean, just to give an order of magnitude, I don’t know if they changed the number during the conference, but at a conference like re:Invent, there were over 800 sessions about AI/ML. By comparison, there were just about 200 about architecture and even fewer about serverless. So that gives a bit of direction of where the conference was going.
Surprisingly, the keynote itself was not so generative-AI focused. They tried to make it different; we’ll probably go back to that later on, but I find it interesting what happened in the last year in the AI cloud space. But I don’t take responsibility for the prediction of last year because I was not there. But I have to admit that I love to start with looking back at the predictions. Actually, when I see tech predictions for 2025, I tend to go to the bottom of the article and use it as a reference to the article of the year before, because I like to see a year later what people predicted and whether it still holds. I love to go back to those topics.
What are the key trends in architecture and design? [08:08]
Daniel Bryant: That’s awesome, Renato, and you will get the privilege this time at the end to talk about your predictions, right? So we’ll hold you to it next year. So enjoy it for the moment, but I think that’s a perfect segue, you mentioned serverless there. Architecture and design is one of our sort of marquee topics. To be fair, all the things we’re going to talk about today are marquee topics, but we often look at things on InfoQ through an architect’s lens. And Thomas, you sort of run the show for us there. I know you’ve got a bunch of interesting topics that you wanted to talk about. I think the first one was around cell-based architectures, things like that.
Thomas Betts: Yes, so this was something we highlighted in the A&D trends report back in April, I think it was, and we ended up having an eMag come out of it with a lot of different topics. Some of those were from various articles or presentations at QCons. And this just happens in architecture: we have these ideas that have been around for a while, but we didn’t have the technology to make them easy to implement. And then it becomes easier and then people start adopting it. So the idea is putting all of the necessary components in one cell, and so you minimize the blast radius.
And if one goes down, the next cell isn’t affected. And how do you control it and how do you maintain it? Just like any microservices or distributed architecture, there’s extra complexity involved, but it’s becoming easier to manage, and if you need the benefits, then it’s worth the trade-offs. You’re willing to take on the extra overhead of managing these things. So like I said, that eMag is full of a lot of different insights, different viewpoints. Again, architects are always looking at different viewpoints and ways to look at a problem, so I like that it gives a lot of different perspectives.
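The cell idea Thomas describes, route each customer to one self-contained cell so that a failure only affects that cell's customers, can be sketched as a tiny routing layer. This is an illustrative sketch only; all names (`CELLS`, `route`, `handle_request`) are hypothetical, not from any specific framework.

```python
import hashlib

# Each "cell" is a self-contained deployment of all the components needed
# to serve its share of customers. These names are made up for illustration.
CELLS = ["cell-a", "cell-b", "cell-c"]

def route(customer_id: str) -> str:
    """Deterministically map a customer to a home cell via a stable hash."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return CELLS[int(digest, 16) % len(CELLS)]

def handle_request(customer_id: str, healthy_cells: set) -> str:
    """Serve the request only if the customer's home cell is healthy.

    The blast radius of a cell outage is limited to the customers routed
    to that cell; customers homed on other cells are unaffected.
    """
    cell = route(customer_id)
    if cell not in healthy_cells:
        return f"unavailable (home cell {cell} is down)"
    return f"served by {cell}"
```

Because the hash is stable, the same customer always lands on the same cell, which is what keeps a single cell's failure from spreading across the whole customer base.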
Daniel Bryant: We’ve got to name-check Rafal [Gancarz] there just quickly, Rafal did a fantastic job on that.
Thomas Betts: Yes, thanks for calling out Rafal. He did a fantastic job on that. The other thing that I remember talking about with a few people at QCon London this year, and I think QCon London last year as well, was the green software trend. A fantastic book just came out in April; I think the book signing was the release party at QCon London.
Daniel Bryant: Oh, yes, it was Anne Currie and team. Yes, fantastic.
Thomas Betts: Anne Currie, Sara and Sarah, that is Sara Bergman and Sarah Hsu, were all there together, and they actually credited InfoQ with being the force that made the book happen, because that was how they were able to all get together and collaborate. So that book is Building Green Software. Adrian Cockcroft has talked about this; he’s kind of championing it from the angle of here’s how you do it in the cloud. Going back to serverless, he advocates: make things serverless, make them small, only running when you need it. That kind of philosophy, I think we’re going to start seeing architects having to think about more and more. It’s going to become more important.
The point that I love, and Sara had a great presentation on this, is that it just makes more efficient software. It makes better software; it makes more sustainable, more maintainable software, all the other -ilities we look for. If you build with a green mindset, you get all those other benefits. So even if you say, “Oh, we need to make it green, we need to reduce our carbon footprint”, and nobody really cares about that, well, all the other things you do care about come along for the ride. So start thinking about it that way. How do you only run your software when you need to? How do you only write the code you need? So there’s a lot of ideas in there and I think we’re going to start seeing more of that hopefully in the next year. That’s definitely one of my 2025 predictions.
Renato Losio: I think you really raised a good point about green software, in the sense that it’s also seen as a proxy for other things, like cost, for example. I think if you go to your CFO and you say, “We are trying to make our cloud deployment greener”, they probably won’t care that much, even if outside the company you might sell the message that you’re generating less CO2. In reality, it’s often really a proxy for cost optimization. When you talk about serverless, running just what you need, or whatever; regions that are more carbon-effective are often also the cheapest data centers, because they’re more efficient. So it’s an interesting approach, looking through the lens of green and not just cost or architecture.
Thomas Betts: Yes, I know Anne has talked a lot about how one of the only measures we have sometimes is what’s our bill? What’s the AWS bill that we get or Azure bill? And if that’s the only measure you have, then use it, that’s better than nothing and it is actually still a pretty good proxy. The cloud providers are getting better at saying, “Here’s how that energy is produced”. But you have to dig for it. I think we’re going to start seeing that become more of a first class, “Here’s your bill and then here’s the carbon footprint of this because you ran it in this data center, which is in US East and it’s running on coal versus somewhere in Europe that’s on nuclear”. Right? So I think that’s going to be interesting to see if we can get to those metrics and people say, “Oh, we can run this somewhere else because there’s a better carbon efficiency and we save a lot of money by doing it”.
Srini Penchikala: All of that is important; I agree with you both. Green software and sustainable computing, it is a very good time to talk about them in the AI context, because as we all know, the large language models that are the main part of GenAI do require a lot of processing power. So we have to be more conscious about how much we are really spending and how much value we are getting, between the value of these solutions and what we are really spending in terms of money, energy resources and the environment, right? So how about you, Shane? What do you think about green?
Shane Hastie: Green software, ethics and sustainability have been a drum I have wanted to beat, and have beaten, for the last three years, and it’s been great to see more and more the ability to have those hard conversations. And the challenge within organizations is that, as software engineers, as the development community, we can actually start to ask, “Hey, we want to do it this way”. And now, as Thomas so smoothly says, we can actually use money as a good excuse, and that helps, because without showing the measurable benefits, we’re going to struggle.
Thomas Betts: And Srini, you brought up the AI component, and yes, the AI carbon footprint is enormous. Somebody will say it’s the next Bitcoin, it’s just spending a lot of money, but hopefully it’s producing value. The other aspect I thought was interesting, and this was a presentation at QCon San Francisco, was how GitHub Copilot serves 400 million requests per day, and it got into the details of how you actually have to implement AI solutions. So GitHub Copilot is two things. There’s GitHub Copilot Chat, where you open a window, you ask a prompt, it responds, right? It’s like ChatGPT with code. But GitHub Copilot the autocomplete is kind of remarkable, because it has to listen for you to stop typing and then suggest the next thing.
And so there are all these complications underneath that that I hadn’t considered. They had to create this whole proxy, and it’s holding open HTTP/2 requests, and if you just use native things like NGINX, after a hundred disconnects it just drops the connection entirely, and so you’ve lost it. There are all these low-level details, and I think when we see AI become more standard as part of big software, more companies that have these distributed systems are going to run into some of these problems. Maybe not at GitHub Copilot scale, but there are probably these unintended things that are going to show up in our architecture that we haven’t even thought of yet. I think that’s going to be really interesting, to see how AI creates the next level of architectural challenges.
Srini Penchikala: Also, maybe just to add to that, Thomas, maybe AI can also solve some of those challenges, right? We talk about AI hallucinations and bias, but can AI also help minimize the environmental impact and the ethical biases?
Daniel Bryant: Said like a true proponent, Srini: the cause of, and solution to, most of the world’s problems, AI, right? I love it.
Thomas Betts: Yes, I think we’re going to see how are we going to use AI as an architect? Can I use it in my day job? Can it help me design systems? Can it help me solve problems? I use it as the rubber duck. If I don’t have someone else that I can get on a call and chat with and discuss a problem, I’ll open up ChatGPT and just start a conversation and say, “Hey, I’m trying to come up with this. What are some of the trade-offs I should consider?” I haven’t gone so far as to say, “Solve this problem”. The hallucination may be completely invalid or it may be that out of the box thinking that you just hadn’t thought of yet, it’s going to sound valid either way. You still have to prove it out and make it work.
What are the key trends in culture and methods? [16:22]
So I think the other part of being an architect, I talked about using AI to do your job, but I think the architectural process has been a big discussion this year. All of the staff plus talks at QCons are always interesting. I think we have good technical feedback, but people love, I personally love, the “how do I do my job? How do I get better at my job? How do I level up?” content. So we’ve seen some of that in decentralizing decision making. I talked to Dan Fike and Shawna Martell about that; they gave a presentation and wrote up an article based on it. And, Shane, I can’t remember if you talked to them as well, or you talked to somebody else, about how to do a better job: how do you level up your staff plus, how do you become a staff plus or principal engineer?
Shane Hastie: Yes, I’ve had a number of people on the podcast this year talking about staff plus and growth, both in single-track and dual-track career paths. The Charity Majors pendulum is still swinging back and forward: when do you make your choices? How do you make your choices? Shameless plug, I had Daniel as a guest talking about people processes for great developer experience, technical excellence, weaving that into the way that we work. So all of these things: leveraging AI, challenging the importance of the human aspect, that critical thinking, and Thomas, you made the point, the hallucination is there.
Well, one of the most important human skills is to recognize the hallucination and not go down that path, to utilize the generative AI and other tools at your fingertips most effectively. Platform engineering: the myth of the 10x engineer is still around, but with great platform engineering, what we do is we make others 10 times better. And there’s a lot of the, I want to say, same old people stuff still coming up, because fundamentally the people haven’t changed.
Daniel Bryant: That’s a good point.
Shane Hastie: Human beings, we don’t evolve as rapidly as software tools. Digging into product mastery, a really interesting conversation with Gojko Adzic about observability at the customer level and bringing those customer metrics right to the fore. So yes, all of the DORA metrics and all of these others are still really important. I had a couple of conversations this year where we’ve spoken about how DORA’s great, but it is becoming a target in and of itself for many organizations, and that doesn’t help if you don’t think about all of the people factors that go with it.
Thomas Betts: There’s a law that says once you have a named goal, it stops being a useful metric; I’m botching the quote entirely. But it was useful to measure these things to know how well you’re doing, and then people see it as the thing they have to do, as opposed to, “No, just get better and we’ll observe you getting better”.
Daniel Bryant: Goodhart’s law, Thomas.
Thomas Betts: Yep, thank you.
Shane Hastie: And W. Edwards Deming, if you give a manager a numerical target, they will meet it even if they have to destroy the organization to do so.
Thomas Betts: You mentioned the DORA metrics. The reason I loved the QCon keynote Lizzie Matusov gave was that it talked about the different metrics you can measure for psychological safety and how to measure team success. And that’s how you can say: are these teams being productive? There are different survey tools out there and different ways to collect this data, but I think she focused on this: if people feel they’re able to speak up, raise issues and have open conversations, that, more than anything else, makes the team more successful, because then they’re able to deal with these challenges.
And I think that keynote went over really well with the QCon audience, that people understood like, “Oh, I can relate to that, I can make my teams better”. You might not be able to use all the new AI stuff, but you can go back and say, “Here’s how I can try and get my teams to start talking to each other better”. Like you said, Shane, the humans haven’t evolved that fast, software has.
Daniel Bryant: That’s a great quote. On that notion, are you seeing more use of other frameworks as well? Because I’m with you: in my platform engineering day job, I see DORA, every C-level person I speak to knows what DORA is, and for better or worse people are optimizing for DORA. But I’m also seeing SPACE from [Dr] Nicole Forsgren et al, I’m seeing DevEx from the DX folks, a few other things. And I mean, SPACE can be a bit complicated because there are five things, the S, the P, the A, the C and the E, but I think if you pick the right things out of it, you can focus more on the people, to your point, and the productivity, well, and the happiness as well, right?
Shane Hastie: Yes, we could almost go back to the Spotify metrics. The Spotify team culture metrics have been around for a long time and what I’m seeing is reinventions of those over and over again. And it’s fundamentally about how do we create that psychologically safe environment where we can have challenging conversations with a foundation of trust and respect. And that applies in technical teams of course, but it applies across teams and across organizations and the happiness metrics and there are multiple of those out there as well.
Ebecony gave us some good conversations about creating a joyous environment and also protecting mental health. Burnout is endemic in our industry at the moment, and I think it’s not just in software engineering, it’s across organizations. But mental health and burnout is something we’re still not doing a good job at, and we really need to be upping our organizational game in that space.
Renato Losio: I think it’s actually been a very bad year in this sense as well. What you mentioned, Shane, about a manager that, to reach the goal, might destroy a team, makes me think that this year one of the key aspects has been all the return-to-office mandates; regardless of whether it’s good or not for the company, it has been a key element at large companies, it became like the new trend.
Shane Hastie: There are cartoons of people coming into the office and then sitting on Zoom calls, because it’s real. The return-to-office mandates, and this is a strong personal opinion, almost all have nothing to do with value for the organization; they’re all about really bad managers being scared. And I’m sure I’ve just made a lot of managers very unhappy.
Daniel Bryant: Hey, people join this podcast for opinions, Shane, keep them coming.
Thomas Betts: So many times we come across Conway’s law and all the different variations of it. I think Ruth Malan is the one who said that if the organization structure and the architecture are in conflict, the organization structure is going to win. And I’m wondering, with the return-to-office mandates, how is that going to affect the architecture? I made the comment, I think it was last year, about the COVID corollary to Conway’s law: the teams that couldn’t work effectively once we all went remote weren’t able to produce good distributed systems, because they couldn’t communicate.
Well, now we’re in that world. Everyone has adapted to it. I think we’re seeing more companies doing distributed systems where the teams don’t sit next to each other. They have to form their own little bubbles in their virtual groups because they’re not sitting at the desk next to each other. If we make people go back to the office, but we don’t get the benefits of them having that shared office space, then what is that going to do to the software? I don’t know if I have an answer to that, but it seems like it’s not going to have a good impact if you’re doing it for the wrong reasons.
Srini Penchikala: Maybe I can bring a cheesy analogy to this, right? We started this conversation with serverless architecture, where you don’t need to run the software systems and servers all the time; they can be idle when you don’t need them. I think that should apply to us as humans also, right? I read this quote on LinkedIn this morning, I really liked it, it says, “Almost everything will work again if you unplug it for a few minutes, including you”. So we as humans need to unplug once in a while to avoid burnout. I mean, that’s the only way we can be productive when we need to work, if we take that break or time off.
Daniel Bryant: Yes. Plus one to that, Srini. Changing subjects a little bit. But Shane, I know you wanted to touch on some of the darker human sides of tech too.
Shane Hastie: Yes, I will go there. Cybercrime, the use of deepfakes, generative AI. I had Eric O’Neill on, and the episode is titled Spies, Lies and Cybercrime. He brings the perspective of an FBI agent, and there are some really interesting applications of technology in that space, and they’re very, very scary. Personally, I was caught by a scammer this year, and it was the social engineering that worked.
Fortunately, my bank’s fraud detection system is fantastic and they caught it, and I didn’t lose any money, but it was a very, very scary period while we were trying to figure that out. And for me, and in Eric’s conversations, it’s almost always the social engineering piece that breaks the barrier. Once you’ve broken the barrier, then the technology comes into play. But he tells a story of a real-time deepfake video of the chief financial officer of a large global company. So very perturbing, somewhat scary. And from a technologist’s perspective, how do we make our systems more robust and more aware? So again, leveraging the AI tools is one of the ways. So the potential for these tools is still huge.
What are the key trends in AI and data engineering? [27:01]
Daniel Bryant: Perfect segue, Srini, into your space. It’s slightly scary, but a well-founded grounding there, Shane. But yes, I think it’s almost a dual-use technology, for better or worse, right? And, Srini, I’d love to hear your take on what’s going on in your world, in the AI space.
Srini Penchikala: Yes, thanks, Daniel. Thanks, Shane, that’s a really good segue. This is probably a repeat of one of the old quotes, Uncle Ben from the Spider-Man movie, right? “With power comes responsibility”. I know we kind of keep hearing that, but with powerful AI technologies have to come responsible AI practices. So in the AI space, again, there are a lot of things happening. I encourage our listeners to definitely watch the 2024 trends report we published back in September; we go into a lot of these trends in detail. But just to highlight a couple of things in the short time we have in today’s podcast: obviously the language models are growing at a faster pace than ever. Large language models, it seems like there’s no end to them. Every day you see a new LLM popping out, and on the Hugging Face website you see a lot of LLMs available for different use cases.
So it’s almost like you need an LLM, a large language model, to get a handle on these LLMs. But one thing I am definitely curious about, and also seeing as a trend this year, are what are called small language models. These are definitely a lot smaller in terms of size and data sets compared to large language models, but they are excellent for a lot of use cases where, again, talking about green software, you don’t want to expend a lot of computing resources, but you can still get similar accuracy and benefits. So these SLMs are getting a lot of attention. Microsoft has something called Phi-3, Google has Gemma, there is the GPT Mini I think. So there are a lot of these small language models definitely adding that extra dimension to generative AI. And these language models are also enabling AI modeling and execution on mobile devices like phones, tablets, and IoT devices.
Now you can run these language models on a smaller phone or a tablet or laptop without having to send all the data to the cloud, which could have data privacy issues, and also, obviously, cloud computing resource costs. So this is one of the trends I’m watching; definitely keep an eye on this. And the other trend is obviously the one I mentioned earlier, agentic AI, agent-based AI technologies, right? This is where I think we are going to the next level of AI adoption. We have the traditional AI that’s been used for a long time for predicting results. Then we have GenAI, which started a few years ago with the GPT and ChatGPT announcements; generative AI not only does the prediction, but it also generates content and insights, right? It goes to the next level. So I think the agents are going to go one step further: they can not only generate the content or insights, but also act on those insights, with or without the supervision of humans.
So there are a lot of tasks that we can think of where we don’t need humans to be in the middle of the process; we can have the agents act on those. Also, one of the interesting use cases I heard recently is a multi-agent workflow application where each of the agents takes the output from the previous agent as its input and performs its own task. But in doing so, they’re also giving feedback to the previous agent on the hallucinations and the quality of the output, so the previous agent can pick a different model and rerun the task, right?
So they go through these iterations to improve the quality of the output and minimize the hallucinations and biases. These multi-agent workflows are definitely going to be a big thing next year. So that’s one other area that I’m seeing. Also, recently, just a couple of days ago, Google announced Gemini 2.0. The interesting thing about this announcement is that Google’s CEO, Sundar Pichai, actually wrote a note, so he was kind of the co-author of the article.
So it was a big deal from that standpoint. And they talk about how agentic AI is going to be impactful in the overall AI adoption, and how these agentic AI models, like Google Gemini 2.0 and other models, will help with not only content generation and insights, but also actually acting on those insights and performing the tasks. Real quickly, on a couple of other topics: RAG, retrieval-augmented generation, which we talked about last year. I’m seeing a couple of specialized areas in this.
One is multimodal RAG, where you can use the RAG techniques for text content and also audio, video and images, to make them work together for real-world use cases. And the other one is called graph RAG: basically, using the RAG techniques on knowledge graphs, because the graph data is already interconnected, so applying RAG techniques on top of that will make it even more powerful. And I think the last one I want to mention is AI-powered PCs. Again, AI is coming to everything.
So, especially in line with the local-first architectures that we are seeing in other areas: how much computing can I do locally on the device, whether it’s my smartphone or a tablet or an IoT device? This AI is going to be powering the PCs going forward. We are already hearing about Apple Intelligence and other similar technologies. But yes, other than that, like you all mentioned GitHub Copilot, AI is going to be everywhere in the software development life cycle as well.
I heard of one use case where multiple agents are used for code generation, document generation and test case generation in a software development life cycle. It’s only going to grow more next year and become even more like a peer programmer. That’s what we always talk about: how can AI be an architect, how can AI be a programmer, or how can AI be a QA software engineer? So I think we’re going to see more developments in those areas. So that’s what I’m seeing. I don’t know, Thomas, Shane or Renato, are you guys seeing any other trends in the AI space?
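The multi-agent feedback loop Srini describes, where a reviewing agent scores the previous agent's output and triggers a retry with a different model, can be sketched in a few lines. Everything here is a hypothetical stand-in: `produce` and `review` fake out real LLM calls, and the scoring rule is invented purely to show the control flow.

```python
def produce(task: str, model: str) -> str:
    # Stand-in for an LLM call; in this toy, only the "large" model
    # produces an acceptable draft. A real system would call an actual API.
    return f"[{model}] draft for: {task}" if model == "large" else f"??? {task}"

def review(output: str) -> float:
    # Stand-in quality check, e.g. a judge model scoring hallucinations.
    return 0.2 if output.startswith("???") else 0.9

def run_pipeline(task: str, models=("small", "large"), threshold: float = 0.5) -> str:
    """Try each model in turn until the reviewer accepts the output."""
    output = ""
    for model in models:
        output = produce(task, model)
        if review(output) >= threshold:
            break  # accepted; the next agent in the chain would consume this
    return output
```

The point of the sketch is the loop shape: reviewer feedback drives model selection and reruns, which is what lets the workflow iterate toward higher-quality output without a human in the middle.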
Shane Hastie: So I’m definitely seeing the different paces of adoption. As I mentioned right at the beginning, the organizations for whom software is now their core business, but who still think of themselves as not software companies, the banks, the financial institutions and so forth: they’re struggling with wanting to bring it in, wanting the benefits, and really having to tackle the ethical challenges, the governance challenges, overcoming those, recognizing those, and the limited selection of tools that are available. So in one organization, the only AI tool they’re allowed to use is Copilot. Nothing wrong with Copilot, but there are at least 79,000 other alternatives. Even in that sort of big space, there’s a dozen that people should be looking at. And I think one of the risks in that is that they end up narrowing the options and not getting the real benefits.
Thomas Betts: I saw this in my company. We did a very slow rollout of a pilot of GitHub Copilot, and they wanted those people to say, “Here’s how we use it”, but we’re not going to just let everyone go at it. And part of it, once they said everyone can use it, is that you have to go through training on here’s what’s actually happening, so everyone understood what you are getting out of it. Things like the hallucinations: you can’t assume it’s going to be correct, it’s only as good as what you give it. But if it doesn’t know the answer, its job isn’t to know the right answer. Its job is to predict the next word, predict the next code. So it’s always going to do that, even if that’s not the right thing. So you still need maybe even more oversight than if you were just doing a pull request review of someone else’s code, right?
We’ve now adopted Microsoft Copilot as the standard that the company gets to use. I think this is probably the one you’re referring to: everyone can use the generative AI tool to start doing all the things. And because we’re mostly on the Microsoft stack, there are the integrations with SharePoint and OneDrive and all of that benefit. So there are reasons to stay within the ecosystem. But again, every employee has to go through training, just like our mandatory ethics training and compliance training, and if you deal with financial data you have to go through this extra training: if you’re going to use the AI tools, here’s what you need to know about them, here’s how you stay safe with them. And I think that training is going to have to evolve a lot year to year, because the AI that we have in 2024 is not the same AI we’re going to have in 2026. It’s going to be vastly different.
What are the key trends in cloud technologies? [36:08]
Daniel Bryant: I think it’s a perfect segue from everyone talking about AI. Renato, has our cloud space lost its shine? We used to be doing all the innovative stuff, all the amazing things, and now it’s the substrate powering all this amazing innovation going on in the AI space. You mentioned already the sharp skew at re:Invent towards this AI focus as one example; there are many other conferences. But I’m kind of curious what’s going on outside of that, Renato? What are you seeing that’s interesting in general, from re:Invent and from Cloudflare and from Azure and GCP?
Renato Losio: I think the key point, which has been going on for already two, three years, is that we have moved from real innovation to a more evolutionary approach, where we have new features, but nothing really super new, and that’s probably good. I mean, we are getting maybe more boring but more enterprise-wise, and that’s the direction. Just an example: you mentioned re:Invent. People tend to come out of re:Invent saying, “This has been the best re:Invent ever”, usually because they get more gadgets and whatever else, but that’s usually the main goal, talking about sustainability. But even Amazon itself, during the keynotes and during the week before re:Invent, was highlighting the 10th anniversary of AWS Lambda and 10 years of Amazon Aurora, 10 years of… what was the third one?
I think the container service, and then even, I think, KMS. And those were all announced 10 years ago at re:Invent. If you look at re:Invent today, it was a great re:Invent, but you don’t have something as revolutionary as Lambda. You have cool things, you have a new distributed database, yes, definitely it’s cool, but you don’t have the same kind of new, world-breaking things. It’s a more evolutionary thing, as well in the AI space; that was of course a key part of it. But yes, there were some new foundation models for Bedrock that are cool, and they got better names, so that even someone like myself, who is not skilled in the area, can get when I should use a Lite, a Pro or a Micro model.
At least I know that the price probably follows that. But apart from that, it's quite, I would say, evolutionary. Probably the only exception in this space is Cloudflare, at least the way I see it, because we used to consider it just a CDN, we used to consider it mostly networking, but actually in the last few years they have become a fully fledged cloud provider, with quite interesting services out there at the moment. The other trend, I wouldn't say it's for 2025 because it is already here, at least in the data space, in the database space, in the cloud database space: I think this was the year that Postgres became the de facto standard. Any database implementation from any cloud provider has to be somehow, even just pretending to be, Postgres-compatible.
Daniel Bryant: Indeed.
Renato Losio: That's the direction. Even Amazon doesn't mention MySQL for new services anymore; for DSQL, or even Limitless Database earlier this year, MySQL used to be their first open-source-compatible database reference point, and now it's not anymore. All the vector databases are pointing to Postgres as well. So that's the direction I see at the moment.
Daniel Bryant: Fantastic.
Srini Penchikala: Quickly, Daniel, I have a comment. Renato, you are right in saying that Postgres is getting a lot of attention. Postgres has a vector extension called pgvector, and that is being used a lot to store the vector embeddings for AI programming. Also, databases are becoming more distributed, in terms of both architecture and hosting. I've been seeing a lot of databases that need to run on-prem and in the cloud, with all the transactional support and the consistency support, so distributed databases are kind of helping with this. So definitely, like you said, just as cloud is a substrate for everything else to happen, data engineering and databases are the foundation for all of these powerful AI programs to work. We don't want to lose focus on the data side of things.
What are the key trends in DevOps and platform engineering? [40:34]
Daniel Bryant: I'll cover the platform engineering aspects now. For me, 2024 has definitely been the year of the portal. Backstage has been kicking around for a while; we had folks like Roadie talking about that, and it now has its own co-located day at KubeCon, BackstageCon. I've also seen the likes of Port and Cortex emerging, with lots of funding going into this space and lots of innovation too. Folks are loving a UI, loving a portal, a service catalog, a way to understand all the services they've got in their enterprise and how these things knit together. Now, I've argued in a few of my talks that there's definitely a missing middle, which we're labeling as platform orchestration. This is the missing middle between something like a UI or a CLI, a portal, that kind of good stuff, and the infrastructure layer: things like Terraform, Crossplane, Pulumi, cloud infrastructure in general.
Now, I was super excited to see Kief Morris, in the latest edition of his book Infrastructure as Code, talking about this missing middle too, and also Camille Fournier and Ian Nowland in their platform engineering book that's just been published by O'Reilly. Fantastic read; I thoroughly recommend folks get hold of that. They were also talking about this missing middle. So I'm super excited to see how that develops over the next year. Just in general, I'm seeing platform engineering move more into the mainstream. We're seeing more events pop up. The good folks at Humanitec spun up PlatformCon in 2022, and that one is going from strength to strength. There's also Plat Eng Day at KubeCon now, and at KubeCon in London coming up next year we're going to see an even bigger Plat Eng Day, I think with two tracks. So I'm super excited about that. I'm definitely at QCon London.
We're going to dive into platform engineering again there. I've got to hat tip the good people that were on the track this year: Jessica, Jemma, Aviran, Ana and Andy did an amazing job talking about platform engineering from their worlds. Topics like APIs came up, abstractions, automation (I often say the three A's of platform engineering), really important. In particular, Aviran Mordo talked about platform as a runtime. At Wix they built this pretty much serverless platform, and it was a real eye-opener seeing, at the scale they're working at, how they've really thought about the platform APIs, really thought about the abstractions to expose to developers, and automated a whole bunch of the stuff behind the scenes. Now, it's all tradeoffs, right? But Aviran, and I saw Andy say the same thing, they're optimizing for developer experience: not just keeping people happy, but keeping people productive too.
And there's lots of great research going on around developer experience. I've got to hat tip the DX folks, Abi Noda and crew, and some great podcasts kicking off in that space. I'm just really interested in that balance, that sort of business balance, I guess, of proving out value for the platform while also making sure developers are enjoying their day-to-day work. There's a whole bunch of platform engineering FOMO that I see in my day-to-day job, with people spinning up platform engineering efforts, spinning up platforms, without really having business goals associated with them, which I think is a danger. I'll hint some more at where I think that's going later on.
What are our predicted trends for software delivery in 2025? [43:25]
Daniel Bryant: Now it's that time where we hold you to the predictions you're going to make, and next year the InfoQ bonus is based on the success or failure of these predictions. I'd love to go around the room and just hear what you're most excited about for 2025, and your predictions, and we'll go in the same order if that's all right. Thomas, we'll start with you.
Thomas Betts: Yes, I don't think there's going to be some dramatic shift; there are never dramatic shifts in architecture. But I think sustainability, green engineering, those concepts are just going to start becoming more mainstream. You look at how Team Topologies and microservices and all these things overlap: all these books start referencing each other, the presentations start talking about the same ideas in different ways. I think we're going to see architects who say, "I did this for all of these benefits, and I learned to put all these benefits together because they were the right sustainable thing to do, and it made my system better". I want to see that presentation at QCon San Francisco next year: we chose to do some architecture with sustainability in mind, and here are the benefits we saw from it. Shane.
Shane Hastie: I’m going to build on the people stuff. I think we’re going to see challenges with the return to office mandates. I hope we’re going to see some sensibility coming in that when we bring people together, we bring them together for the right reason and that we get the benefit of those human beings in the same physical space. Doing collaborative work generates innovation. You want to allow that, but you also want to give the space for the work that is more effective when we are remote.
So that combination of the two, and there's no one size fits all, and organizations shifting away from mandates to conversations: let's treat our people as responsible, sensible adults and trust them to figure out what is going to be the best way of getting stuff done. I also want to see the continuing evolution, in the team space, of generative AI and other AI tools as partners. I think agentic AI as a partner has huge potential, and I think we're going to start to see some good stuff happening in that space with the people. But again, critical thinking and the human skills become more and more important. So what's the prediction there? It's maybe more of a hope.
Daniel Bryant: No, I like it, Shane, very positive. It’s very good. Srini, on to you.
Srini Penchikala: Yes. Thanks, Daniel. Yes, definitely on the AI side, I can take a shot at a couple of predictions. I think LLMs are going to become more business-domain specific, in a sense. Just like we used to see banking industry standards and insurance industry standards, I think we'll probably eventually start seeing some finance or FinTech LLM, or a manufacturing LLM, because that's where the real value is, right? A typical ChatGPT program only knows what's out on the internet. It doesn't know exactly what my business does, and I don't want to share my business information with the outside world.
But I think there will be some of these consortiums that will come together and share at least non-sensitive proprietary information among the organizations, whether it's manufacturing or healthcare. And they will start to create some base language models that the applications in those companies can use to train, right? So on top of a large language model like Llama 2 there will be something more specific, and then applications will be built on top of that. So that's one prediction from me. And the other one is agents. I think agents, agents and agents. Just like that movie The Matrix: agents are coming, right? Hopefully these agents are not as nefarious.
Daniel Bryant: Indeed.
Srini Penchikala: They're not villains, right? Exactly. But yes, I think we'll see more. I think it's time for all this great generative AI content to be put into action, not by humans but hopefully by computer programs. So that's another area that I'm definitely looking forward to seeing. Other than that, I think AI will become, like you said, something like a boring cross-cutting concern that just enables everything; we won't even have to think about it. And maybe, just like the toys we buy that sometimes say "batteries not included", in the future the applications that are not using AI, which will be very few, will say "this application does not include AI", right? Because everything else will be AI, pretty much. So those are my couple of predictions.
Daniel Bryant: I like it, Srini, fantastic stuff. Renato, over to you.
Renato Losio: Well, first of all, I will say that these are my tech predictions in the cloud for 2025 and beyond, as many good ones say, so I will always have the chance next year to say, "Well, there's still time". But I really think that next year will be the first year in the cloud space where Intel won't be the default processor anymore. Basically, we won't consider Graviton or anything else "the alternative" anymore; it will be the de facto choice on most deployments.
And the second one, given as well how different cloud providers have now implemented distributed databases using their own proprietary networks, basically taking advantage of the speed they have: I see the cloud providers going towards distributed systems with less regional focus. As a developer, I won't care much long term about the specific region. The geographical area, probably yes, for compliance and many other reasons; but whether behind my database is Ohio or Northern Virginia or whatever else, I would probably not care so much.
Daniel Bryant: Thanks, Renato. I like the hedge there, or is it a smart move? Well done. So, on to my predictions around the platform engineering space. The good folks at Gartner are saying we're about to head into the trough of disillusionment in their adoption model, and I think this is true. My prediction for next year is that we're going to hear more failure stories around platforms, and ultimately that'll be triggered by a lack of business goals associated with the engineering going on. Now, I think this is just part of the way it goes, right? We saw it with microservices, we saw it with DevOps as well.
Ultimately, I think it leads us to a better place. You go into the trough of disillusionment, you hopefully come out the other side on the plateau of productivity, and you're delivering business value and it's all good stuff. And I think we're going to bake in a lot of learnings that we'd temporarily forgotten. This is, again, the way of the world.
We often embrace a new technology, embrace a new practice, and we sort of temporarily forget the things we’ve learned before. And I’m seeing in platform engineering, definitely a lack of focus on business goals, but also a lack of focus on good architecture practices, things like coupling and cohesion. And in particular creating appropriate abstractions for developers to get their work done and also composability of the platform.
So I think in 2025 we're going to see a lot more around golden paths and golden bricks, making it easy to do the right thing, to code, ship and run, for developers to deliver business value, and also allowing them to compose the appropriate workflow for them. And again, that'll be dependent on the organization they're working in. But I'm super excited to see where platform engineering is going in 2025. As always, it's been a pleasure. We could talk all day and all night, I'm sure, but it's fantastic just to get an hour of everyone's time to review all these things as we close out the year. I'll say thank you so much to everyone, and we'll talk again soon.
Shane Hastie: Thank you, Daniel.
Srini Penchikala: Thank you.
Renato Losio: Thank you, Daniel.
Thomas Betts: Thank you Daniel, and have a happy New Year.
Java News Roundup: Spring AI 1.0-M5, LangChain4j 1.0-Alpha1, Grails 7.0-M1, JHipster 8.8
MMS • Michael Redlich
Article originally posted on InfoQ. Visit InfoQ
![](https://mobilemonitoringsolutions.com/wp-content/uploads/2024/12/java-istock-image-01-1735514008455.jpg)
This week’s Java roundup for December 23rd, 2024 features news highlighting: the fifth milestone release of Spring AI 1.0; the first milestone release of Grails 7.0; the first alpha release of LangChain4j 1.0; and the release of JHipster 8.8.
JDK 24
Build 29 remains the current build in the JDK 24 early-access builds. Further details on this release may be found in the release notes.
JDK 25
Similarly, Build 3 remains the current build in the JDK 25 early-access builds. More details on this release may be found in the release notes.
For JDK 24 and JDK 25, developers are encouraged to report bugs via the Java Bug Database.
Spring Framework
Ten days after introducing the experimental Spring AI MCP, a Java SDK implementation of the Model Context Protocol (MCP), to the Java community, the Spring AI team has released milestone version 0.2.0. This release features: a simplified McpClient interface, such that listing operations no longer require a cursor parameter; and a new SseServerTransport class, a server-side implementation of the MCP HTTP with SSE transport specification. Breaking changes include the renaming of some modules for improved consistency. Further details on this release may be found in the release notes.
The fifth milestone release of Spring AI 1.0 delivers: incubating support for the Model Context Protocol; support for models such as Zhipuai Embedding-3 and Pixtral; and support for vector stores such as MariaDB and Azure Cosmos DB. Breaking changes include moving the MilvusVectorStore class from the org.springframework.ai.vectorstore package to the org.springframework.ai.vectorstore.milvus package. The Spring AI team plans a sixth milestone release in January 2025, followed by one release candidate before the final GA release.
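For applications upgrading to the fifth milestone, the package move amounts to a one-line import change. A minimal before/after sketch, using only the class and package names stated above (all other code elided):

```java
// Spring AI 1.0 M4 and earlier:
// import org.springframework.ai.vectorstore.MilvusVectorStore;

// Spring AI 1.0 M5: the class now lives in a Milvus-specific package.
import org.springframework.ai.vectorstore.milvus.MilvusVectorStore;
```

Builds that reference the old package will fail to compile against M5 until the import is updated.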
TornadoVM
The release of TornadoVM 1.0.9 ships with bug fixes and improvements such as: support for the RISC-V 64 CPU port to run OpenCL with vector instructions for the RVV 1.0 board; support for int, double, long and short three-dimensional arrays by creating new matrix classes; and the addition of a helper menu for the tornado launcher script when no arguments are passed. More details on this release may be found in the release notes.
Micronaut
The Micronaut Foundation has released version 4.7.3 of the Micronaut Framework, featuring Micronaut Core 4.7.10 along with bug fixes and patch updates to modules: Micronaut Logging, Micronaut Flyway, Micronaut Liquibase, Micronaut Oracle Cloud and Micronaut Pulsar. Further details on this release may be found in the release notes.
Grails
The first milestone release of Grails 7.0.0 delivers bug fixes, dependency upgrades and notable changes such as: minimum versions of JDK 17, Spring Framework 6.0, Spring Boot 3.0 and Groovy 4.0; and an update to the PublishGuide class to use the Gradle AntBuilder class instead of the deprecated Groovy AntBuilder class. More details on this release may be found in the release notes.
LangChain4j
After more than 18 months of development, the first alpha release of LangChain4j 1.0.0 features: updated ChatLanguageModel and StreamingChatLanguageModel interfaces to support additional use cases and new features; and an initial implementation of the Model Context Protocol. The team plans a GA release in Q1 2025. Further details on this release may be found in the release notes.
Apache Software Foundation
The Apache Camel team has announced that the 3.0 release train has reached end of life. The recently released Apache Camel 3.22.3 will be the final version in the 3.x series. Developers are encouraged to upgrade to the 4.0 release train via the migration guide.
JHipster
The release of JHipster 8.8.0 features: upgrades to Spring Boot 3.4, Angular 19 and Gradle 8.12; experimental support for esbuild in Angular; and improved CSRF token handling for single page applications. More details on this release may be found in the release notes.
Similarly, the release of JHipster Lite 1.24.0 ships with an upgrade to Spring Boot 3.4.1 and new features/enhancements such as: a new module for configuring a Liquibase linter; and the addition of metadata to the preprocessor to resolve an ESLint cache error. Further details on this release may be found in the release notes.
Kubernetes 1.32 Released with Dynamic Resource Allocation and Graceful Shutdown of Windows Nodes
MMS • Mostafa Radwan
Article originally posted on InfoQ. Visit InfoQ
![](https://mobilemonitoringsolutions.com/wp-content/uploads/2024/12/generatedHeaderImage-1735505854564.jpg)
The Cloud Native Computing Foundation (CNCF) released Kubernetes 1.32, named Penelope, a few weeks ago. The new release introduces support for the graceful shutdown of Windows nodes, new status endpoints for core components, and asynchronous preemption in the Kubernetes scheduler.
A key feature in Kubernetes 1.32 is the various enhancements to Dynamic Resource Allocation (DRA), a cluster-level API for requesting and sharing resources between pods and containers. These enhancements improve the ability to effectively manage resource allocation for AI/ML workloads that rely heavily on specialized hardware such as GPUs.
Alpha features in version 1.32 include two new HTTP status endpoints, /statusz and /flagz, for core components such as the kube-scheduler and kube-controller-manager. These endpoints make it easier to gather details about a cluster's health and configuration, and to identify and troubleshoot issues.
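As a rough illustration, assuming a cluster with the relevant alpha feature gates (ComponentStatusz and ComponentFlagz) enabled and authenticated access to the kube-scheduler's secure port (10259 by default, reachable here via localhost on a control-plane node; the address and credentials are illustrative), the endpoints could be probed like this:

```shell
# Probe the kube-scheduler's new alpha endpoints (hypothetical setup:
# feature gates enabled, valid client credentials supplied out of band).
curl -k https://127.0.0.1:10259/statusz   # component status: start time, uptime, version
curl -k https://127.0.0.1:10259/flagz     # command-line flags the component is running with
```

The same endpoints are intended for other core components such as the kube-controller-manager, on their respective secure ports.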
Another feature entering alpha in this release is asynchronous preemption in the scheduler. This mechanism allows high-priority pods to get the resources needed by evicting low-priority pods in parallel, minimizing delays in scheduling other pods in the cluster.
In addition, an enhancement for gracefully shutting down Windows nodes has been added to the kubelet to ensure the proper lifecycle events are followed for pods. This allows pods running on Windows nodes to be gracefully terminated and their workloads rescheduled without disruption. Before this enhancement, this functionality was limited to Linux nodes.
The automatic removal of PersistentVolumeClaims (PVCs) created by StatefulSets is a stable feature in version 1.32. This streamlines storage management, especially for stateful workloads, and reduces the risk of unused resources.
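With the feature now stable, PVC retention is controlled declaratively on the StatefulSet itself via the persistentVolumeClaimRetentionPolicy field. A minimal sketch (names and image are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db            # illustrative name
spec:
  serviceName: example-db
  replicas: 3
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete       # remove PVCs when the StatefulSet is deleted
    whenScaled: Retain        # keep PVCs for replicas removed by scale-down
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      containers:
      - name: db
        image: postgres:17    # illustrative image
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

Each retention field accepts Retain (the historical behavior) or Delete, so operators can opt in to cleanup per StatefulSet.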
This release also includes a generally available improvement to the Kubelet to generate and export OpenTelemetry trace data. This aims to make monitoring, detecting, and resolving issues related to the Kubelet easier.
Allowing anonymous authentication only for configured endpoints moved to beta in this release. This enhancement is enabled by default in version 1.32, allowing cluster administrators to specify which endpoints can be accessed anonymously.
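This is configured through the API server's structured authentication configuration file. A minimal sketch restricting anonymous access to the health endpoints (the exact endpoint list is an illustrative choice):

```yaml
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
anonymous:
  enabled: true
  conditions:          # anonymous requests are only allowed on these paths
  - path: /livez
  - path: /readyz
  - path: /healthz
```

With conditions set, anonymous requests to any other endpoint are rejected rather than falling through to authorization as an anonymous user.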
Additionally, recovery from volume expansion failure is a beta feature in the new release. This improvement allows recovery from a volume expansion failure by retrying with a smaller size, reducing the risk of data loss or corruption throughout the process.
The flowcontrol.apiserver.k8s.io/v1beta3 API, covering the FlowSchema and PriorityLevelConfiguration resources, was removed in the new release. It is part of the Kubernetes API Priority and Fairness functionality that deals with an overload of incoming requests. Users are encouraged to migrate to flowcontrol.apiserver.k8s.io/v1, which has been available since version 1.29.
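For most manifests the migration is an apiVersion bump, since the v1 schema is largely unchanged from v1beta3. An illustrative PriorityLevelConfiguration (the name and concurrency values are made up for the example):

```yaml
# Before: apiVersion: flowcontrol.apiserver.k8s.io/v1beta3 (removed in 1.32)
apiVersion: flowcontrol.apiserver.k8s.io/v1   # available since Kubernetes 1.29
kind: PriorityLevelConfiguration
metadata:
  name: example-priority-level
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 20
    limitResponse:
      type: Reject
```

Any stored manifests, Helm charts or controllers still writing the v1beta3 group version will fail against a 1.32 API server until they are updated.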
According to the release notes, Kubernetes version 1.32 has 44 enhancements, including 19 entering alpha, 12 graduating to beta, and 13 becoming generally available or stable.
For more information on the Kubernetes 1.32 release, users can refer to the official release notes and documentation for a detailed overview of the enhancements and deprecations in this version or watch the upcoming CNCF webinar by the release team scheduled for Thursday, January 9th, 2025 at 5 PM UTC. The next release version 1.33 is expected in April 2025.