Podcast: The Ongoing Challenges of DevSecOps Transformation and Improving Developer Experience

MMS Founder
MMS Adam Kentosh

Article originally posted on InfoQ. Visit InfoQ

Transcript

Shane Hastie: Good day, folks. This is Shane Hastie for the InfoQ Engineering Culture Podcast. Today I'm sitting down with Adam Kentosh. Adam is a field CTO at Digital.ai and, Adam, a good starting point is who is Adam?

Introductions [01:03]

Adam Kentosh: Thanks for having me, Shane. Really appreciate the opportunity to be here. And just in regards to who I am, I'm a technologist at heart. I've spent close to 20-some-odd years in technology, and really cut my teeth on large-scale Hadoop clusters, managing those with automation, purpose-built for custom search algorithms. From there, I went into the consulting business and had an opportunity to work with some Fortune 20 companies, especially as they were transitioning from things like VM-based architectures into cloud, containerization, Kubernetes, et cetera. From there, I went to HashiCorp. I was able to join HashiCorp very early on and got to experience the really exciting growth that happened at that organization.

And now I’m at Digital.ai as a field CTO, and I’m helping to bridge the gap between the field teams and the product and marketing teams, making sure that I’m taking back anything that I’ve learned in the field to the product teams for potential enhancements or improvements, and then also back to the marketing team as we’re thinking about how we can have messaging that resonates with our customers.

Shane Hastie: We're talking about shift left and DevSecOps and so forth. Isn't this an old, tired topic that's dealt with already?

DevOps is a well understood approach, but still not implemented in most organisations [02:27]

Adam Kentosh: It does feel that way. We’ve been talking about this, I think, for quite a while. Even back in 2009 when I was still fairly young, I was doing work and we had conversations around Agile and what does DevOps mean to us? And at a point in time in my career, DevOps was essentially me sitting next to a developer as an operations person. Certainly not probably what we would think DevOps is, but just the fact that they moved us into the same room, they thought, “Yes, we’ve done it”. So I think we have seen obviously good evolution here. Just one interesting thing that I find fascinating though is I was reading the state of DevOps report in 2023 and they mentioned that 80% of companies are still in mid-transformation. And I just find that really interesting. I mean, I think we started talking about this 2009, 2010 timeframe, and a lot of companies really kicked off those initiatives between probably 2010 and 2012.

We’re now 10 to 12 years in on DevOps transformation and we’re still talking about it. So I think that just begs the question, why are we still talking about it? And for me, that’s a very complex, I’d say, question and complex problem. I think we’ve seen DevOps transformation happen at the team level, but I think that sort of enterprise-wide transformation is still very elusive to a lot of customers. And I’d probably pin the reason on technology changes and disruption. If we think about what’s happened over the last 10 to 12 years, not only did DevOps come around, but we also had really the just maturity of cloud. And then we’ve also had mobile devices become the number one way for customers to interact with applications these days. And so in that situation, I think that we offered up some autonomy to teams and we said, “Hey, go pick your tools. Figure out how to take advantage of cloud. Go pick your tools, figure out how to create a mobile application”.

And with that, we gave them what we would consider a DevOps team, and we paired dev teams with operations teams and we said, “Go forth and conquer on the technology that you choose”. And I think that was great for a time. It really did help people pivot. It helped people accelerate and have kind of speed to market and get better velocity. However, as you think about trying to scale that across the organization, well now you’re very challenged. It works for maybe five to 10, 20 teams. But when you start talking about hundreds of teams, we have to figure out where we standardize, right? Do we standardize at the tooling level? Are there processes that we can standardize around? Is there team makeup or structure that we can standardize around? So I think we’ve just had a compounding problem that was getting worse simply by, I’d say, the introduction of new technologies over time.

Shane Hastie: How much standardization is enough? As a developer who was given this freedom, now you’re taking it away. How do I come to terms with that?

The balance of standardisation vs team autonomy [05:35]

Adam Kentosh: I see two options, and I've seen this in practice at a lot of organizations and done it personally at a few. But number one, you don't. And you build an interface layer on top of that and you've got something that basically integrates with all of these different technologies. Because really when we start thinking about standardization, it's not so much that we necessarily care about what tool they're using, it's that we care about the metrics that we can get out of it. How are my teams performing? Are they delivering at the rate that we would anticipate? Are they having more or fewer bugs than other teams? So I think the thing that really matters for me is sort of the metrics that can be driven out of that platform and then being able to level teams up, "Well, if we have a bit of an underperforming team, how do we take them from a two to maybe a five?", for instance.

And I think DORA has done a great job of setting a framework and a context for us to have a level up type methodology. And so we’ve got four or five metrics. We can take a look at how that performance is across all our organization, and then we can focus in on specific areas. So I think the first approach is you don’t bother with standardization and you just care about the metrics and you figure out a way to gather those metrics and then help your teams perform better. I think the second approach is probably what we’re seeing materialize, is something like platform engineering where, “Hey, let’s create developer-oriented tooling. Let’s create standard interfaces and APIs that can be leveraged so that we can obviously make things either more productive or we can reduce just some toil from a developer standpoint”.

I feel like either one of those is a viable option in many cases, though certainly I think as a business there's probably more viability in getting some level of tool standardization. They still have to operate as a business, they have contracts that they have to worry about. There's a lot of negotiation and renewals and things that happen to enable those contracts. So from their perspective, if we can get it down to a reasonable tool set that does get the job done, I think there's probably usefulness in that.

Shane Hastie: So finding that quite difficult balance. The other thing that certainly you and I mentioned in our conversation earlier, but also I've seen a lot, is that as we have done the shift left, we are seeing more and more cognitive load on developers today.

The impact of cognitive load on developers [08:04]

Adam Kentosh: Definitely. It's very interesting. There have been a couple of reports that have come out. GitLab has published a report around global DevSecOps and GitHub has done a lot around developer experience and what that means today. And in both cases, the numbers were somewhere between a quarter and 32% of developers' time being spent actually writing code, with the rest now spent elsewhere in terms of maybe improving existing code or doing administrative tasks or meetings or testing or securing your application. So I think as we think about shift left, what we're really doing is we're putting a lot of strain and stress on a developer community and we're asking them not only to write code, but understand how to test that code and then understand how to secure that code and then understand how to go release it and also maintain the operations of that code over time.

So for me, it puts a lot of pressure on the development teams, and I think that's why developer experience is becoming so important these days. I've heard a ton of conversations around developer experience. We work with customers that now have titles for the VP of developer experience, and their initiative is to go make sure that their developer community is productive, that they have good impact and that they're satisfied. So I think a lot of the reason that we're seeing developer experience come around today is simply because we are putting a lot of strain and stress and pressure on the team as we continue to progress through these shift left motions.

Shane Hastie: How do we reduce that but still get the benefit?

Adam Kentosh: That is a very good question. I'd say the way that I've seen it reduced, and probably the most effective means of reducing it, is truly having more of a product-centric approach. So we've seen two things, I'd say, happen over the last few years. Number one, we asked the developer to do everything, and that obviously breeds challenges. Number two, we continue to have isolated silos for testing and for security. And that again causes challenges. Maybe we see a reduction in velocity because now I have a handoff between development and testing and then there's a handoff between testing and security. So depending on which way you go, you're having issues no matter what.

And I truly feel that we probably need more of a product-oriented approach, meaning that the product teams have to have these capabilities on the team themselves. We still need testing, we still need security, we still need development, but I don’t think one person needs to do all of those things. And if we can get them closer working together, more collaborative, I think we’re going to be in a much better situation.

And if you look at some of those surveys that I mentioned, it was really interesting because a lot of what the developers were asking for was better collaboration. And I think this would breed not only better collaboration, but probably a more productive product team.

Shane Hastie: But haven’t we known that collaborative cross-functional teams are the way to do things, since 2001 at least?

Adam Kentosh: We have definitely known that. I would totally agree with you there. I still feel the challenge is getting the organizational structure to match the team structure, and so we're seeing a dedicated testing organization or we're seeing a dedicated security organization, and now we run into the people and process aspect of our challenge, where people don't want to give up control of the teams that they're working with. And so the idea of blending teams means that we have organizational changes that must take place, and that, I think, is probably what's restricting or limiting this type of approach. Inevitably we get in a situation where, like I said earlier, we either fully shift left and we have developers owning more of that delivery or we just continue to deal with our silos because it's almost easier than trying to reorganize a large enterprise.

Shane Hastie: What are the small nudges we can make that move us to a better developer experience?

Small nudges for better developer experience [12:22]

Adam Kentosh: The best thing that I’ve seen work for nudges is really still the idea of trials, targeted use cases, where we can actually take a product-centric approach to a team and let them go be successful. And if they can be successful and we can validate that their numbers are what we would anticipate, that they’ve got great cycle time, that their bugs in code are not there or they’re infrequent, I think that gives us justification then to say, “You know what? This is a working model. This is an opportunity to work together better, and it’s a way that we can improve our overall performance as an organization”.

So for me, it's always in doing, I think, in getting out there and actually trying out new models and then making sure that it's a model that can scale. So come up with some best practices and actually be able to take that back to the organization and show real results and real numbers behind it. I feel like that's probably the best way that I think we can continue to nudge, otherwise you're in a position where you're trying to nudge at an executive level, and that can be maybe far more challenging.

Shane Hastie: We've known this. As an industry, we've figured this out. We've seen organizations that have done this well, and yet 80% are still mid-transformation. We're still facing a lot of the same problems despite all of our methodology changes and everything that we've tried to do over the last couple of decades. Why don't we learn?

Finding the optimum delivery frequency [13:58]

Adam Kentosh: Well, first off, I'm racking my brain to say, "Have I seen an organization that's done it well?" And I suppose that the flagship examples are maybe your FAANG or MAANG companies, I guess now, but to me they're operating at a different level and they also seem to be producing a lot of this content around some of the more progressive types of development approaches. So if I think about that, and I'm getting off-topic a little bit, but we'll get back around to it, they're pushing and asking for continuous delivery. Let's just be able to deliver on a continuous basis, small incremental chunks. And I think for the vast majority of organizations that I work with in terms of financial services, insurance companies, even some level of gaming, definitely airlines, I mean, those companies don't necessarily care about delivery frequency.

They’re purposefully pumping the brakes on delivery because they want to make sure that they’re compliant, they want to make sure that they’re secure, and they want to make sure that they’re not introducing a poor customer experience with the next release that’s going out the door.

So in reality, I'd say there are companies, I suppose, that do it well, but there are other companies that just maybe don't need to do some of this. And so it's about being pragmatic about what is useful and what will resonate with your organization, and truly, at the end of the day, what outcomes you care about. If you care about new features and keeping pace, then certainly maybe continuous delivery makes sense for you. But if you care about stability, well, maybe it doesn't. Maybe there's a better way. And I'm not advocating that people go back to a completely waterfall style of delivery and that we take six months to get a release out the door. That's certainly not, I think, the case here, but I think technology has enabled us to take a more, let's call it reasonable, approach to delivery, and then also still be able to get better quality, more secure applications out the door.

So I know that was sort of a little bit of a segue or a long way away from what your question was, but just something I was thinking about as you mentioned that. Now back to what you were asking, “Why don’t we learn?” I think that the challenge is from a developer standpoint, and you’ll see this too, if you talk to any developer, one of the things that they really enjoy is just the opportunity to learn new things. And so when you go to a developer and you say, “Hey, we want you to take on testing”. “Well, hey, that’s interesting. I’ll go learn that new thing for a little while”. Or, “We want you to take on release pipelines”. “Oh, interesting. I’ll go take on that new thing for a little while”. So I don’t think they’re shy about saying, “No”, or I guess they’re happy to say, “Yes”, rather and say, “Yes, I want to go learn new things”.

So for me, I'm not going to pin it all on the developers. It's certainly not all their fault, but we're asking them to do more, and we're giving them an opportunity to learn new things. That means professional development, it means new skills. That's something that they're always after, so they're just going to go do it and they're going to solve a problem. And I find that true for most engineers. I talk to my wife and she always wants to just kind of talk at me, but I always want to solve the problem that she's talking about. And so it's hard for me to stop trying to solve a problem, and sometimes I have to recognize I just need to listen. But I think that's just an engineering mindset. You're always looking to solve a problem. So maybe we run into a situation where the problem's been identified and nobody's doing anything about it and they just go fix it.

So why don't we learn? I think that's probably the biggest reason from an individual standpoint. From an organizational standpoint, to me, this could be controversial, but I just feel that maybe organizational structure has not evolved the way that we need it to evolve to support IT. And I know IT is supposed to support the business, and we're supposed to ultimately support the outcomes for our customers. And I definitely recognize that. But internally, from the CIO on down, who owns testing, security, and other things, maybe there has to be an evolution there that helps support better collaboration and maybe a more product-oriented or product-focused approach.

Shane Hastie: One of the things that struck me when you were talking about the contrast is those companies that are held up as doing this well, they have a product mindset and they were built from the ground up with the product mindset. So are we asking our “traditional” organizations, these banks, insurance companies, airlines and so forth to fundamentally shift their business model?

The challenge for non-product focused organisations [18:50]

Adam Kentosh: It sort of feels that way. Now, you can say that those companies are obviously wildly successful, so maybe it makes sense for them to try to shift their business model. But to your point, when you're not built that way from the ground up, there is organizational structure, of course, organizational politics that are going to come into play, and there is certainly a level of skill, just from the perspective of the people who are working there, that you have to take into consideration. And so with that, I think that we might be asking companies to act in an unnatural way. And if we're asking them to act in an unnatural way, I think that's where you get… And we've seen people get into the situation, as I've mentioned, where shift left is now the term, and DevSecOps is definitely still a term, in trying to get everybody to work together in a meaningful way without actually maybe changing who they report to or putting them on the same team, truly together.

So the way that they’re trying to evolve is maybe not necessarily conducive to the way that some of these organizations just grew up from the ground up.

Shane Hastie: Adam, there’s a lot we’ve delved into here. What have I not asked you that you would like to get on the table?

Software engineering intelligence [20:07]

Adam Kentosh: I'd probably say, what is the role of something like platform engineering or DevSecOps, and how important is velocity truly to companies and organizations? And I think velocity has ruled for the last 10, 15 years. Everything was about getting code out the door faster. And in reality, I think taking a deliberate approach to software development is probably more meaningful. And let's focus not so much on velocity, but let's continue to focus on the outcomes that we want to deliver for our customers. And again, that kind of goes back to, what is my end goal? Is it a stable customer environment or is it a stable application environment for my customers to operate in? And if so, well then let's just be maybe thoughtful about how we deliver software. And then from a platform engineering standpoint, I think there's value in the idea of platform engineering, but I actually think there's probably almost more value. I shouldn't say more. There's equal value in the idea of something like a software engineering intelligence platform.

And Gartner coined this term back in March, and it’s kind of interesting as we start to think about it. But if you look at software development, you’ve got to plan what you’re going to build. You’ve got to code it, you’ve got to test and secure it, you’ve got to operate that. And historically we’ve purchased solutions inside of each one of those areas, and we have engineering processes inside of each one of those areas that have to happen. And that all generates data into its own database. And that’s great for business intelligence. It gets us some information from an agility standpoint or an Agile standpoint. We can look at things like cycle time and figure out how our teams are doing.

However, when I want to ask real questions as a business leader, when I want to understand how teams are performing across the organization, when I want to understand what changes are coming down the line that are really risky or when I want to understand what teams have the biggest impact to cycle time across the organization holistically, I can’t answer those questions with that type of approach.

So for me, I think the next evolution that we really need to see, and it’s going to set up AI initiatives inside of an organization, which is why this is also important, is the idea of unifying all of that data into a meaningful data lake and then putting engineering processes in place so that we can link the data together. And what I mean by that is if I’m doing a release and I’m not opening up a ServiceNow ticket or some type of tracking for that, well now I can’t tie an incident back to a specific release. But if I could, that would be very helpful. And if I can tie a release back to specific stories or features that are actually getting into that release, well that’s also very helpful. I still talk to customers every day that spend hours on change advisory board reviews, and they spend hours on tier one resolution calls that they have to jump on, and they’ve got 80 developers on there. So if it’s a situation where we can reduce some of that, that breeds value to an organization, certainly.

So I think the important thing is being able to measure where we’re at today and being able to unify that data to answer meaningful questions about my organization, then that sets us up for truly applying artificial intelligence and machine learning on top of that data set to help make recommendations. And that’s really where I think people want to get to over the next two years as we talk about AI and ML.

Shane Hastie: A lot of good stuff here. If people want to continue the conversation, where do they find you?

Adam Kentosh: So you can find me on LinkedIn. You can definitely reach out to me as well. I'm happy to have a conversation and talk anytime. I love chatting about this stuff. I'm also very happy to be disagreed with. So if you disagree with what I'm saying, I'd love to hear a counter perspective. I'm fortunate, I get to talk to probably 50 to a hundred customers every year, if not more. And a lot of those are in that large enterprise space. So hearing other perspectives outside of that is extremely valuable for me and helps me understand what other companies are doing and how they are doing it well. And so yes, LinkedIn is definitely the place to go.

Shane Hastie: Adam, thanks so much. Been a real pleasure to talk with you.

Adam Kentosh: Thanks, Shane. Appreciate the time today.



Maybe WebAssembly Is the Next Evolutionary Step From Containers: Fermyon at InfoQ DevSummit Munich

MMS Founder
MMS Olimpiu Pop

Article originally posted on InfoQ. Visit InfoQ

During her presentation at the inaugural edition of the InfoQ Dev Summit Munich, Danielle Lancashire, principal software engineer at Fermyon, hinted at WebAssembly containers as a greener alternative and a potential evolution from the current containerised approach to serverless computing.

Lancashire started her presentation by talking about the software’s carbon efficiency by referencing the Green Software Foundation’s software carbon intensity (SCI). This standard metric is based on a formula that allows the calculation of the carbon footprint of any software. It aggregates all the potential sources of carbon: the energy needed to run, the resources required to maintain the underlying computing power, and the minerals needed for developing the components they run on (e.g. CPUs or GPUs).

Even though the formula defined by the Green Software Foundation looks complicated:

SCI = ((E * I) + M) per R

  • E = Energy consumed by software in kWh
  • I = Carbon emitted per kWh of energy, gCO2/kWh
  • M = Carbon emitted through the hardware that the software is running on
  • R = Functional Unit; this is how software scales, for example, per user or device

She pointed out that, in an overly simplified manner, the SCI is just "the amount of compute over the usefulness of what it does".
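As a rough illustration of the arithmetic (the numbers below are made up for the example, not figures from the talk), the formula can be evaluated directly; here is a minimal Java sketch where the values for E, I, M, and R are purely hypothetical:

double energyKwh = 120.0;              // E: energy consumed by the software, kWh
double carbonPerKwh = 400.0;           // I: grid carbon intensity, gCO2/kWh
double embodiedCarbon = 350_000.0;     // M: embodied carbon of the hardware, gCO2
double functionalUnits = 1_000_000.0;  // R: functional units, e.g. requests served

double sci = ((energyKwh * carbonPerKwh) + embodiedCarbon) / functionalUnits;
System.out.println(sci + " gCO2 per request"); // prints 0.398 gCO2 per request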

The SCI index can never reach zero, but it should be as low as possible, and to achieve that, there are the following routes:

  • Use less electricity while still performing the same amount of work
  • Use fewer physical resources while still performing the same amount of work
  • Schedule the computing that is not time-sensitive for those moments when cleaner energy sources are available

Lancashire pointed out that using very efficient programming languages (“rewriting everything in Rust or C would not have the impact you are hoping for!”) “doesn’t tell the whole story”, as few applications use the CPU at its maximum capacity. More than that, “most servers use 30-60% of their maximum power during idle time”. Since approximately 70% of the CPU is unused in containerised applications (primarily due to overprovisioning), using Kubernetes is insufficient. She hinted that improving compute density often matters more than the efficiency of a single application.

Regardless of the apparent evolution (from bare metal to virtual machines and now containers), Lancashire stated that "it still sucks", and there is room for improvement. Wasm's initial target of replacing Flash in the browser in a safe manner also generated interest on the operational side, and the WebAssembly System Interface's implementation made it possible to use Wasm as deployable units on the server side as well. The addition of WASI allowed Wasm to safely access the underlying computer (networking, port handling, the filesystem, etc.). So, WebAssembly deployable units are small (just a bit bigger than the code's binary), portable, fast to start and stop, and safe as they are very isolated.

Lancashire believes that embracing serverless on top of WebAssembly units will allow organisations to have faster and cheaper infrastructure (financially and carbon-related). So, Wasm deployables might be a potential evolutionary step from containers.



Tencent builds NoSQL DB for multiple data models – The Register

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

Exclusive Chinese web giant Tencent has revealed it created a NoSQL database that it believes can handle multiple data models more elegantly than other attempts to do so, and has used it to consolidate its database fleet and improve resource utilization.

The existence of the database – named X-Stor – was recently revealed in a paper [PDF] published in the Proceedings of the Very Large Data Base Endowment, the journal of the non-profit organization that exists to promote and exchange scholarly work on databases and related fields.

The paper opens with observations that NoSQL databases are generally built to handle certain data models. Tencent admits it ran several of them to power its fleet of products – social networks, video streaming services, online games, and a public cloud – that collectively serve more than a billion active users.

Titled “X-Stor: A Cloud-native NoSQL Database Service with Multi-model Support”, the paper reveals Tencent used graph databases to store info about user relationships for its social networks, wide-column stores to hold user profiles, document series databases to power its advertising operations, and time-series databases to record user behavior data.

That proved less than ideal because Tencent found it hard to support novel data models in existing systems – so sometimes needed to develop a new NoSQL system from scratch. Doing so meant rebuilding functions already found elsewhere – a wasteful overlap.

Like any hyperscaler, Tencent abhors under-used resources. The web giant was therefore not thrilled to learn that “deploying multiple heterogeneous databases at scale leads to system resources isolation for different NoSQL databases, which not only complicates maintenance but also hinders efficient resource sharing among clusters.”

X-Stor addresses that issue – allowing the use of different data models by “extending the corresponding storage engine and data access interfaces within the X-Stor system.” The independent storage engines “can fully support their respective data models, with performance comparable to that of their single-model counterparts.”

The paper claims that’s a more elegant arrangement than those used by rival NoSQL databases MongoDB, Redis, and ArangoDB, each of which has its own way of accommodating multiple data models.

X-Stor is serverless and runs as multiple microservices orchestrated by Tencent’s own Kubernetes Engine. Tencent initially ran the database on hosts packed with fast SSDs to handle the needs of different data models, such as I/O-intensive key-value and time-series models. However, doing so saw under-utilization of memory in some SSD-equipped servers. X-Stor can identify which nodes have the resources needed to match a workload and the data model it employs, thus using each node to optimal extent.

Tencent’s paper offers some dense math explaining how workloads compete for and are allocated resources – enjoy its equations if that’s your thing.

The bottom line is that the Chinese giant built itself a database it claims can handle any data model – even entirely new ones – and which it has proven can scale to store 12PB of online operational data and serve 700 billion requests per day with a peak of 30 million requests per second, while handling more than 100,000 tables with multiple data models.

Sadly, it appears the database is not open source – so the rest of us can’t take it for a spin.

China’s hyperscalers are doing interesting things. We’ve recently reported Alibaba Cloud’s hardware failure detection code, modular datacenter architecture, and an advanced Ethernet scheme that sees nine NICs installed in the servers it uses for AI model training. Huawei Cloud runs an advanced network health probe. Tencent found a way to halve WAN latency. ®





NVIDIA Unveils NVLM 1.0: Open-Source Multimodal LLM with Improved Text and Vision Capabilities

MMS Founder
MMS Robert Krzaczynski

Article originally posted on InfoQ. Visit InfoQ

NVIDIA unveiled NVLM 1.0, an open-source multimodal large language model (LLM) that performs on both vision-language and text-only tasks. NVLM 1.0 shows improvements in text-based tasks after multimodal training, standing out among current models. The model weights are now available on Hugging Face, with the training code set to be released shortly.

NVLM 1.0 has been evaluated against proprietary and open-access multimodal models and performs well on both vision-language and text-only tasks. In particular, the NVLM-1.0-D 72B model shows an average 4.3-point improvement in accuracy on math and coding tasks after multimodal training. This contrasts with models like InternVL2-Llama3-76B, which lose performance in text-only tasks following multimodal training. The text improvements seen in NVLM suggest that its architecture manages multimodal data effectively without undermining its original language abilities.


Source: https://nvlm-project.github.io/

The NVLM-1.0-D 72B model is not just about text. It handles a wide range of multimodal tasks. These include object localization, reasoning, OCR (optical character recognition), and even coding tasks based on visual inputs. The model can interpret complex scenarios, such as understanding visual humor or answering location-sensitive questions in images. Its ability to perform mathematical reasoning based on handwritten pseudocode, as well as its handling of other multimodal inputs, highlights the breadth of tasks it can manage.

User Imjustmisunderstood reflected on NVLM’s potential for deeper understanding:

Extending tokenization to more ‘senses’ exponentially increases dimensionality. I would be fascinated to see whether the latent space recognizes the common temporal dimension in different modalities.

This touches on the broader implications of working with multimodal data, suggesting that models like NVLM could offer new ways of connecting different types of information.

Overall, NVLM 1.0 received very positive feedback from the community. For example, Luênya dos Santos shared the following thoughts:

NVIDIA’s NVLM-D-72B is a huge leap in AI innovation. NVIDIA’s decision to open-source the model is a game-changer, giving smaller teams access to cutting-edge technology and pushing the boundaries of AI development! Really exciting news.

John McDonald added: 

By making the model weights publicly available and promising to release the training code, Nvidia breaks from the trend of keeping advanced AI systems closed.

NVLM 1.0 is available for the AI community as an open-source AI model, with model weights accessible through Hugging Face. The training code will be released soon, allowing for further exploration of the model’s capabilities.



InfoQ Dev Summit Munich: How to Optimize Java for the 1BRC

MMS Founder
MMS Karsten Silz

Article originally posted on InfoQ. Visit InfoQ

Java applications completed the 1 Billion Row Challenge (1BRC) from January 2024 in as little as 1.5 seconds. 1BRC creator Gunnar Morling, Software Engineer at Decodable, detailed how the participants optimized Java at the InfoQ Dev Summit Munich 2024. General optimizations applicable to all Java applications cut the runtime from 290 seconds to 20 seconds using parallel loading/processing and optimized parsing. Getting to 1.5 seconds required niche optimizations that most Java applications should forego, except possibly GraalVM Native Image compilation.

Each of the 1 billion rows has the name of a weather station and a single temperature ranging from -99.9°C to +99.9°C. Participants created that data file with a generator application, which was part of the challenge code. The application had to read that entire file and calculate the minimum, maximum, and average temperatures of the 400 weather stations as quickly as possible. Using external libraries or caching was forbidden.

1BRC measured the official results on a 2019 AMD server processor with 128 GB RAM. The applications only used eight threads. The data file was on a RAM disk. Applications had about 113 GB RAM available, but most used much less. Each submission was tested five times, with the slowest and fastest runs discarded for consistency.

Morling wrote the baseline implementation, which took 290 seconds:

Collector<Measurement, MeasurementAggregator, ResultRow> collector = Collector.of(
    MeasurementAggregator::new,
    // accumulator: fold one measurement into the running aggregate
    (a, m) -> {
      a.min = Math.min(a.min, m.value());
      a.max = Math.max(a.max, m.value());
      a.sum += m.value();
      a.count++;
    },
    // combiner: merge two partial aggregates
    (agg1, agg2) -> {
      var res = new MeasurementAggregator();
      res.min = Math.min(agg1.min, agg2.min);
      res.max = Math.max(agg1.max, agg2.max);
      res.sum = agg1.sum + agg2.sum;
      res.count = agg1.count + agg2.count;
      return res;
    },
    // finisher: produce the final min/mean/max row per station
    agg -> {
      return new ResultRow(agg.min, round(agg.sum) / agg.count, agg.max);
    });

Map<String, ResultRow> measurements = new TreeMap<>(Files.lines(Paths.get(FILE))
    .map(l -> new Measurement(l.split(";")))
    .collect(groupingBy(m -> m.station(), collector)));

System.out.println(measurements);

The 1BRC is similar to back-end processing that Java often does. That’s why its general optimizations are applicable to many Java applications.

Parallel processing meant adding a single parallel() call before the map() statement in the fourth line from the bottom. This automatically distributed the following stream operations across the eight CPU cores, cutting the runtime to 71 seconds, a 4x improvement.
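Applied to the baseline above, the change looks roughly like this (the rest of the pipeline is unchanged):

Map<String, ResultRow> measurements = new TreeMap<>(Files.lines(Paths.get(FILE))
    .parallel()
    .map(l -> new Measurement(l.split(";")))
    .collect(groupingBy(m -> m.station(), collector)));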

Parallel loading replaces Files.lines(Paths.get(FILE)), which turns the file into a Stream sequentially. Instead, eight threads load chunks of the file into custom memory areas with the help of JEP 442, Foreign Function & Memory API (Third Preview), delivered in Java 21. Note that the Foreign Function & Memory API was finalized with JEP 454, delivered in Java 22.
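A minimal sketch of what such chunked loading can look like with the Foreign Function & Memory API is shown below. The class and method names are illustrative rather than taken from any particular submission, and the step of moving each chunk boundary forward to the next newline, which real entries need so that no line is split between workers, is omitted:

import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChunkedLoader {
    // Map the whole file once, then hand each worker thread its own slice.
    static MemorySegment[] chunks(String file, int workers) throws IOException {
        try (FileChannel channel = FileChannel.open(Path.of(file), StandardOpenOption.READ)) {
            Arena arena = Arena.ofShared(); // kept open for the lifetime of the processing
            long size = channel.size();
            MemorySegment mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, size, arena);
            MemorySegment[] result = new MemorySegment[workers];
            long chunkSize = size / workers;
            for (int i = 0; i < workers; i++) {
                long start = i * chunkSize;
                long end = (i == workers - 1) ? size : start + chunkSize;
                result[i] = mapped.asSlice(start, end - start);
            }
            return result;
        }
    }
}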

The final step of the general optimizations was changing the line reading from a string to individual bytes. Together with the parallelization, this achieved the 14.5x improvement from 290 seconds to 20 seconds.
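As an illustration of the byte-level idea (not code from any specific entry), a temperature such as "-12.3" can be parsed directly from the raw bytes into an integer number of tenths of a degree, avoiding String allocation and floating-point work per line:

// Parses a temperature with exactly one fractional digit, e.g. "4.5" or "-12.3",
// starting at position pos, and returns it as tenths of a degree (45 or -123).
static int parseTemperatureTenths(byte[] line, int pos) {
    int sign = 1;
    if (line[pos] == '-') {
        sign = -1;
        pos++;
    }
    int value = 0;
    while (line[pos] != '.') {                      // integer part
        value = value * 10 + (line[pos] - '0');
        pos++;
    }
    value = value * 10 + (line[pos + 1] - '0');     // single fractional digit
    return sign * value;
}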

The first step of the niche optimizations is “Single Instruction, Multiple Data (SIMD) Within a Register” (SWAR) for parsing the data. In SIMD, a processor applies the same command to multiple pieces of data, which is faster than processing it sequentially. SWAR is faster than SIMD because accessing data in CPU registers is much faster than getting it from main memory.
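As a hedged sketch of the SWAR idea (assuming the eight input bytes are packed little-endian into a long, an illustrative choice rather than what any given submission does), the classic "has-zero-byte" bit trick locates the ';' separator across all eight bytes at once instead of comparing one byte per iteration:

// Returns the index (0-7) of the first ';' in the eight packed bytes, or -1 if absent.
static int semicolonIndex(long eightBytes) {
    long match = eightBytes ^ 0x3B3B3B3B3B3B3B3BL;                        // 0x3B is ';'
    long found = (match - 0x0101010101010101L) & ~match & 0x8080808080808080L;
    // every byte equal to ';' now has its high bit set; take the lowest one
    return found == 0 ? -1 : Long.numberOfTrailingZeros(found) >>> 3;
}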

Custom map implementations for storing the weather station data provided another performance boost. Because there were only 400 stations, custom hash functions also saved time. Other techniques included using the Unsafe class, superscalar execution, perfect value hashing, the “spawn trick” (as characterized by Morling), and using multiple parsing loops.

Mechanical sympathy helped: The applications picked a file chunk size so that all chunks for the eight threads fit into the processor cache. And branchless programming ensured that the processor branch predictor had to discard less code.

The last two optimizations targeted how the JVM works. Its JIT compiler profiles the interpreted Java bytecode and compiles often-used methods into machine code. Additionally, the JVM creates a class list at startup and initializes the JDK. All that slows down application startup and delays reaching the top speed. The GraalVM Ahead-Of-Time (AOT) compiler Native Image moves compilation and as much initialization work as possible to build time. That produces a native executable which starts at top speed. GraalVM Native Image ships alongside new Java versions.

Using GraalVM Native Image comes at the price of some possibly showstopping constraints and a more expensive troubleshooting process. Many application frameworks, such as Helidon, Micronaut, Quarkus, and Spring Boot, support GraalVM Native Image. Some libraries, especially older ones, do not, which may be a showstopper. These constraints were not a factor in the 1BRC, as no external libraries were used.

Finally, the JVM garbage collector frees up unused memory. But it also uses CPU time and memory. That is why applications that did not need garbage collection used the “no-op” Epsilon garbage collector.

The niche optimizations provided a further 13x improvement, down to 1.5 seconds. The fastest application with a JIT compiler took 2.367 seconds; the fastest with GraalVM Native Image finished in 1.535 seconds.

Co-author of the 1BRC winner and GraalVM Founder and Project Lead Thomas Wuerthinger published his 10 steps of 1BRC optimizations. His baseline solution takes just 125 seconds, compared to Morling’s 290 seconds, probably because it runs on a newer 2022 Intel desktop processor. Unsurprisingly, his first step is using GraalVM Native Image. After just two steps, his solution is already down to 5.7 seconds.

Morling was positively surprised by the community that quickly formed around 1BRC. They contributed a new server, a test suite, and scripts for configuring the server environment and running the applications. Morling has not thought of a new challenge.





The Linux Kernel to Support Real-time Scheduling Out-of-the-Box

MMS Founder
MMS Sergio De Simone

Article originally posted on InfoQ. Visit InfoQ

Linux 6.12 will officially include support for real-time processing in its mainline thanks to a PR that enables PREEMPT_RT on all supported architectures. While aimed at applications requiring deterministic time guarantees, like avionics, robotics, automotive, and communications, it could bring improvements to user experience on the desktop, too.

In development since 2005, PREEMPT_RT is a set of patches for Linux implementing both hard and soft real-time computing capabilities. It makes the Linux kernel fully preemptible and able to respond to external events in deterministic time and with low-latency on the x86, x86_64, RISC-V, and ARM64 architectures.

While PREEMPT_RT could already be used on its own to patch a Linux kernel, its introduction in the mainline means it is now just a matter of enabling the CONFIG_PREEMPT* options at compile time to build a real-time Linux kernel. But, most importantly, integrating PREEMPT_RT into the mainline has also meant polishing a number of things to make it play nicely under most circumstances.

One significant bit of work concerned the printk function, which is critical for kernel development and was not entirely ready for real-time. Developed by Linus Torvalds, this function ensures developers know exactly where a crash occurred. Its old implementation, though, introduced a delay that broke the goal of low latency; that delay has now been removed.

Previous to PREEMPT_RT being part of the kernel, the easiest way to run real-time Linux was using Ubuntu Pro, available for free for personal and small-scale commercial use but at a premium for more than five machines.

It is important to stress that being real-time has nothing to do with performance but all with predictable (i.e. deterministic) task preemption, which is key for applications that depend on actions happening within a maximum time after an external event. The plain-vanilla Linux kernel is optimized instead to guarantee maximum hardware utilization and fair time allocation to all processes, but it can also be configured for example to minimize energy consumption, or to adapt to specific tasks’ requirements (aka, utilization clamping).

According to Hacker News user femto, running real-time Linux can bring interrupt latency for each CPU core down to single-digit milliseconds from double-digit milliseconds. This requires, though, that you also run the scheduler with a real-time policy (SCHED_FIFO or SCHED_RR) to prevent hardware events like trackpad touches from getting in the way of real-time tasks such as playing audio or 3D gaming.

Others also mention that using a real-time kernel seems to improve UX by avoiding the occasional freezes of Gnome, and that it makes it possible to synthesize more musical instruments while running Chrome and games. The Mixxx audio player also suggests enabling real-time scheduling (among other things) to reduce audio latency and avoid audible glitches.

The final release of Linux 6.12 is expected in mid or end of November 2024, while release candidate 2 is currently available for testing.



Piper Sandler maintains Overweight rating on MongoDB shares – Investing.com

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Piper Sandler has confirmed its Overweight rating on MongoDB shares (NASDAQ: NASDAQ:MDB), maintaining a $335.00 price target.

The firm addressed concerns about the company’s recent performance, noting a decline in growth to 13% in the last quarter from 40% in the same quarter the previous year. Despite this slowdown, the firm anticipates a potential growth rebound for MongoDB.

The investment firm highlighted the recent downturn in MongoDB’s stock value, pointing out a year-to-date drop of 33%. This decline has brought the company’s forward enterprise value to sales (EV/S) multiple for the calendar year 2026 estimate (CY26E) to 7.7 times, nearing its two-year next twelve months (NTM) low of 7.1 times. According to the firm, these figures present a compelling opportunity for large-cap growth investors to consider MongoDB as it approaches a period of renewed growth.

In other recent news, MongoDB's strong second fiscal quarter performance has led to multiple analyst firms revising their price targets upwards. DA Davidson has increased its price target to $330 while maintaining a Buy rating, following MongoDB's robust earnings, with a significant boost from its Atlas product and enterprise agreement (EA) upside. Similarly, KeyBanc Capital Markets, Oppenheimer, and Loop Capital have raised their price targets due to the company's impressive performance.

The company’s Q2 results showcased a 13% year-over-year revenue increase, totaling $478 million, largely driven by the success of its Atlas and Enterprise Advanced offerings. MongoDB added more than 1,500 new customers during the quarter, bringing its total customer base to over 50,700.

Looking ahead, MongoDB’s management anticipates Q3 revenue to be between $493 million to $497 million, with full fiscal year 2025 revenue projected to be between $1.92 billion to $1.93 billion. These projections are based on MongoDB’s recent performance and the analyst firms’ expectations.

To complement Piper Sandler’s analysis, recent data from InvestingPro offers additional context on MongoDB’s financial position and market performance. Despite the challenges noted in the article, MongoDB’s revenue growth remains positive, with a 22.37% increase over the last twelve months as of Q1 2023. This aligns with the firm’s expectation of potential growth acceleration.

InvestingPro Tips highlight that MongoDB holds more cash than debt on its balance sheet, indicating financial stability amidst the current growth slowdown. Additionally, 22 analysts have revised their earnings upwards for the upcoming period, suggesting a positive outlook that supports Piper Sandler’s growth rebound thesis.

The recent 8.89% price return over the last week, as reported by InvestingPro, may indicate that investors are beginning to recognize the potential value in MongoDB’s stock, as suggested in the article. However, it’s worth noting that the company is trading at a high revenue valuation multiple, which investors should consider alongside the growth prospects.

For those interested in a deeper analysis, InvestingPro offers 11 additional tips for MongoDB, providing a more comprehensive view of the company’s financial health and market position.

This article was generated with the support of AI and reviewed by an editor. For more information see our T&C.

Article originally posted on mongodb google news. Visit mongodb google news



Presentation: 0 → 1, Shipping Threads in 5 Months

MMS Founder
MMS Zahan Malkani

Article originally posted on InfoQ. Visit InfoQ

Transcript

Malkani: I'm Zahan. I've been an engineer at Meta for over a decade. I'm here to tell you about how we launched the Threads app last year. Let's start by talking about the opportunity that presented itself. It was January last year, and I'd just returned to work after a couple of months on leave. I was looking for something to work on, and the tech scene was buzzing about Elon's business decisions since he took Twitter private in November. We were talking about it at Meta as well. For reference, back in 2019, Mark had made the call that public conversations were not the focus, and pivoted both Facebook and Instagram to focus on private communication between friends instead. The analogies were the public town square and the living room.

Mark’s view was that communication in the living room was going to grow more over the next decade. That left Twitter as the de facto forum to debate ideas in public, a function so important, I think we all can agree, to society. Since Elon took over at Twitter, he made a number of controversial decisions, pretty lightly. Popular opinion was that it was just a matter of time before the site had an extended outage depriving millions of their favorite place online. It was increasingly obvious that there was an opportunity opening up here.

The world clearly valued a service like Twitter, and the primary value in a social network is the people in it not the product. As many people abandoned Twitter, they were looking for somewhere else to hang out. At first, we thought that maybe the differentiating feature here is the text format, so we made it easier to make text posts on our existing products. In Instagram, some double-digit percentage of posts have some form of text on them, so we figured, what if it were easy to make posts that had nothing but text in it? How would that fare? It didn’t get much usage.

Few used the format, because while Instagram and Twitter share mechanical similarities, like a directed follow graph, and a feed that’s based on interest, each product has its own culture, its own raison d’etre that’s set early in the product lifecycle. While Twitter was the public town square, Instagram was where you caught up with your friends’ lives, or where you are entertained by short-form videos. We realized that to build a competing product, we really would have to build a new product with a new set of norms.

The main concern we had starting from scratch was how long this could potentially take. We were acutely aware that we had a fleeting window of opportunity to capitalize on here, that could disappear as quickly as it appeared. People still love Twitter as a brand, though little did I know of his plans for the brand. Still, in our view, time to market was all that mattered, and we enshrined that in everything we did. Every shortcut was on the table, every bit of scope up for debate, every complication a source of concern. The ask was that we needed to be prepared to ship at very short notice.

Part 2 – The Product

Let's focus on exactly what we wanted to build. We started by laying out the basic values of this product; we had four. First, the format here was text. Every post needed to start with text. Instagram put media front and center, but not here. Two, we wanted to carry over the design language and ethos of Instagram. The simplicity, the feel of the product that so many love around the world. We felt like that was a good foundation. Three, we knew that one of the values that helped early Twitter establish itself was its openness. By this I mean that the community was free to craft their own experience with an API.

Public content was widely available on the web through embeds. People generally used these tools to share their Twitter feed all over the place. While this is a new world, with GenAI coming up, we felt like a new walled garden product just wouldn’t fly. We were paying attention to Mastodon, the fediverse, the interoperable social networks. I’ll talk more about that later. Lastly, we felt like we needed to prioritize the needs of creators. Every social network has a class of folks who produce the majority of the content that others consume. This usually follows a Zipf-like, power-law distribution. On text-based networks this is exaggerated, and it’s a smaller proportion of the user base who produce most of the content. It’s hard to be interesting and funny in 500 characters. We knew that keeping the needs of this community in mind was critical for long-term success.

With these values in mind, we sketched out what the absolute bare minimum product was and got to work building. To give ourselves goalposts, we outlined four milestones we’d like to get to. An important objective was that we wanted to have a product ready to ship as soon as possible to give us options. Each milestone was designed to be a possible end state itself, where we could ship if needed, each one gradually layered on the next most essential bit of functionality. Milestone one was just about standing up the app. You can see how bare bones that is.

Being able to log in, to make a text post that gets associated with your account. Really just the basics. Milestone two was our essentials bucket. Giving the app its familiar look with tabs for feed, notifications, and profile. Getting the basics of integrity working, being able to block another profile, to report it. Milestone three was what we called the lean launch candidate. Here we fleshed out services that had been neglected before, like a basic people search, a full screen media viewer to view photos and videos. Starting to figure out what to do with conversations, the lifeblood of the app. How do you rank them as a unit? Copying your follow graph from IG so that you can bootstrap your profile, and the ability to mute other profiles.

That’s a long laundry list. Milestone four was where we enabled interoperability with the fediverse, and was a final ship candidate. For anyone who knows the product, you might be laughing at the ambitiousness, because we are far from full fediverse interop even today. In one of my weaker moments, I promised we could have this ready within the month. This was going to be the last thing we tackled in May. We took this very seriously. As each milestone neared, we’d shift from a building-features mindset to a polish-for-launch mindset. This was honestly exhausting as an engineer, as the context switching between those two modes is taxing.

In some ways, we shipped three products in those six months. Each time we’d call a war room, burn the midnight oil, and push through to get a complete app. Then we’d decide whether or not we were ready to ship. With the benefit of hindsight, I now see there was a big upside to the strategy. It served as a strong forcing function to simplify the product. If all you get is three weeks to pick which features to add, and which will add the most incremental value, you really have to boil it down to just the basics. The actual product we shipped was something like M3.5. We never got anywhere near M4, but we did get a much-needed iteration on top of M3.

Part 3 – Shortcut

All of this would be extremely ambitious to build from scratch in five months. We started in earnest in Feb, and we were promising our leadership that we’d have it ready by the summer. We had a trick up our sleeves: we had no intention of building this from scratch. When you take a wide-angle view, broadcast sharing on Instagram is fairly straightforward. You can follow profiles, access a feed of their posts, plus some recommendations. You can respond to one another, building up communities around interests. Coincidentally, this is exactly the set of features we wanted for Threads on day one, so that’s what we reused.

We took the biggest shortcut ever and reused Instagram wholesale. For the backend, it literally just is the Instagram backend with some custom functionality for Threads. For the apps, we forked the Instagram app code base on each platform. Starting from Instagram meant that we began with a fully featured product and had to strip it down to just the functionality we needed. The screen recording shows what I mean. Our first prototype added a mod to the Instagram feed that surfaced text-only posts. We reused the ranking. For the post, the inside joke is that we just reordered the layout: we put the caption on top and the media on the bottom, and everything else is just the same. This drastically reduced technical scope.

We’d now taken the intractable problem of how to build a new text-based social network and turned it into a very specific one: how do I customize the Instagram feed to display a new text post format? As engineers, you will undoubtedly note that this approach has major downsides. You’re accumulating massive tech debt by using a code base for something new that it wasn’t designed to serve. There’s a whole series of paper cuts that you accumulate as a result. You also need to know said legacy code base inside and out. It’s millions of lines of code, but you need to customize just a few so that you can effectively repurpose it.

My favorite one-liner about this is that it’s harder to read someone else’s code than it is to write your own. With this, you’re losing your one chance to start over clean. This approach helped us stay focused on the product and keep the experience simple. A user could onboard by just logging in with Instagram. We borrowed Instagram’s design language. Your Instagram identity bootstrapped your Threads one.

We carried that spirit throughout, surgically reusing what we could and rebuilding only where necessary. It also means that the Threads team has much to be grateful for: the existence of Threads owes itself to the foundation laid by Instagram infra and product teams over the years. We’d be nowhere without it. I like to think that this focus on simplicity paid off. Much of the praise we received at launch was for the sparkling simplicity of the app, without it feeling like it lacked essential features.

Part 4 – The Details

Now to dive into a couple of areas with interesting implications. One technique we used to tune the experience quickly was to make iteration in the product itself very easy. We did this using server-driven UI. In a traditional client-server model, the client is responsible for everything about how the data is laid out on-screen; the server is just middleware pulling data out from stores. What we did instead, for the core interfaces that show a list of posts, was send down a full view model, which told the client exactly how to render it. This meant that when we were experimenting on the call to action and the reply bar, or on when and how we show different reply facepiles, we could do that with just a code change on the server, which only takes a couple of hours to roll out and takes effect on all platforms.
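To make that concrete, here is a minimal sketch of the pattern in Python. The shape of the view model and every field name are invented for illustration; this is not the real Threads schema.

```python
# A minimal sketch of the server-driven UI idea, assuming a plain JSON-ish view
# model. Every field name here is made up for illustration.

def build_post_view_model(post: dict) -> dict:
    """Compose, on the server, exactly what the client should render for a post.

    Tweaking the reply call-to-action or the facepile then only needs a server
    change, which can roll out in hours and applies to every platform at once.
    """
    replies = post.get("replies", [])
    return {
        "post_id": post["id"],
        "header": {"username": post["author"], "timestamp_ms": post["created_ms"]},
        "body": {"text": post["text"]},
        "reply_bar": {
            "call_to_action": "Reply" if not replies else f"{len(replies)} replies",
            "facepile_user_ids": [r["author_id"] for r in replies[:3]],
        },
    }

# The client receives this dict and renders it verbatim, with no layout logic of its own.
view_model = build_post_view_model({
    "id": 1, "author": "zahan", "created_ms": 0, "text": "hello threads",
    "replies": [{"author_id": 42}],
})
```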

This let us iterate on the core Threading UI very quickly in the early days. Another big focus was making the space feel safe and welcoming for all. We’d seen public social networks before and many turn into angry spaces where people just don’t hear each other out. Of course, the jury is still out on what degree Threads can live up to this aspiration, but I think there’s a few factors that are important. First is the product culture, set from the early days by those who are the seed of your network. A lot of our early users were experienced in this and so you saw a lot of callouts like, block early and often. Don’t engage with rage bait. Don’t use quote posts to dunk on people.

Then there’s the tooling that helps people maintain control over their experience. For instance, it starts with the ability to block, of course. There’s variations on this for different circumstances, like restrict, which ensures that the other party doesn’t know you took action, and mute which hides their content from your feed.

There’s the popular hide reply functionality, which gives you some small control over the conversation that your post generates. Lastly, there’s moderation. It’s tricky to get right. There’s no doubt in my mind that robust moderation is essential. To make a space acceptable to a mainstream audience, you need to do something about the extremes of speech. There’s just too much breadth in what one can find on the open internet. I’ll go as far as to say that for a new social network today, that is the unique value you’re providing. In many ways, moderation is the product people are subscribing to. Luckily, we brought a decade of experience with this from Facebook and Instagram. While I’m sure it needs further tuning to feel right, at least we started from a good place. Combined with features and culture, all of this helped set healthy new norms.

The last focus I want to touch on is creating a buzz for the product before launch. An example of this is the golden ticket Easter egg, which we had made and hidden in Instagram. You might call this holographic effect overengineered, but it looked beautiful and felt playful, which sparked interest. We had little hidden entry points, specific keyword searches, hashtags, and long presses, to let people know that something was launching soon and let them sign up for a reminder. Moreover, we had a small early access launch for journalists and creators who had shown an interest. It ended up being just a day or so, but it gave them a chance to learn and see what this new platform was like. It didn’t hurt that when they shared their impressions with their audience, the curiosity grew.

Part 5 – The Launch

If you’re tired of fuzzy stories, this is when we get to the more technical part of the talk. To understand the launch, I need to give you a quick technical overview of the Threads stack. Meta in general is big into monolithic binaries and its monorepo. There’s a number of reasons for this. The upshot is that much of Instagram’s business logic lives in a Python binary called Distillery, which talks to Meta’s bigger monolith, a massive PHP binary or Hack binary called WWW. Distillery is probably the largest Django deployment in the world, something I don’t know if we’re proud of, even if we run a pretty custom version of it.

The data for our users is stored in a variety of systems. The most important ones are TAO, a write-through cache that operates on a graph data model, and UDB, a sharded MySQL deployment that stores almost all our data. By graph data model, I mean that TAO is natively familiar with links between nodes and has optimized operations to query details of those. There are indexing systems on top that let you annotate particular links with how you want to query them; they build in-memory indexes to help. The whole model has evolved over many years of building social products at Meta, and it has served us well.
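As a toy illustration of what the objects-and-associations model buys you (this assumes nothing about TAO’s real API; it only sketches the shape of the data model), typed, time-ordered links between nodes can be range-queried the way a feed needs:

```python
# Toy illustration of an objects-and-associations graph model. Not TAO's API,
# just the idea: nodes plus typed, time-ordered links that can be range-queried.
from collections import defaultdict

objects = {}                 # id -> node data, e.g. a user or a post
assocs = defaultdict(list)   # (source_id, assoc_type) -> [(time, target_id)]

def add_assoc(src, assoc_type, dst, time):
    """Record a typed link between two nodes, kept newest-first like a feed."""
    assocs[(src, assoc_type)].append((time, dst))
    assocs[(src, assoc_type)].sort(reverse=True)

def assoc_range(src, assoc_type, limit):
    """Query a link type the way feeds do: 'latest N posts authored by user 7'."""
    return [dst for _, dst in assocs[(src, assoc_type)][:limit]]

objects[7] = {"type": "user", "name": "zahan"}
objects[100] = {"type": "post", "text": "hello threads"}
add_assoc(7, "authored", 100, time=1)
print(assoc_range(7, "authored", limit=10))   # -> [100]
```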

There’s also other systems involved, and I can’t do them all justice. There’s a big Haskell service that fronts a lot of the rules that go into our decision making around restrictions, like looking for inauthentic behavior in user actions. There’s a key-value store called ZippyDB that’s used to stash lots of transient data that’s written too often for MySQL to handle the load. There’s a serverless compute platform, async, which was critical, and I’ll cover it later. Then there’s a Kubernetes-like system for managing, deploying, and scaling all these services. All of these need to work in concert.

That’s the background into which you insert our little product going live in July of last year. The story is that we were planning for a mid-July launch when news started doing the rounds that Twitter planned to restrict consumption of tweets for free users. This started a significant news cycle and we saw an opportunity, so we decided to launch a week early, even though it meant forgoing some load testing and final prep work. The launch was set for July 6th, we opened up the Easter eggs I mentioned to generate buzz, and we started our early access program. On July 5th, anyone who wasn’t an engineer was happily hanging out on the product with the celebs who had come on for early access. This was probably the closest anyone was going to get to a personal chat with Shakira. For engineers, it was an all-hands-on-deck preparation for launch the next day, upsizing systems, sketching out the run of show, and the like. I’ve never forgiven my product manager for that.

Then in the middle of the day, our data engineer popped up in the chat and mentioned something odd. He was seeing tens of thousands of failed login attempts on our app. This was odd, because no one, certainly not tens of thousands of people, should have had access to the app yet. We pivoted quickly and ruled out a data issue. Then we noticed that all of these were coming from East Asian countries. Maybe you’ve figured it out already; it took us another beat to realize that this was time zones. Specifically, we were using the App Store’s preorder feature, where people can sign up to download your app once it’s available and you specify a date. Since we said July 6th, once it was past midnight in those countries the app became available, and people couldn’t log in because of another gate on our end.

This was an oh-shit moment. The warm, fuzzy feeling of being safely ensconced in internal-only limited testing was gone. We pulled together a war room and a Zoom call with close to 100 experts from around the company, covering all the various systems I mentioned. Given that this was the middle of the night in those countries, it was evident that demand was going to far exceed our estimations once this all went live. We had a healthy preorder backlog built up, and they were all going to get the app once the clock struck midnight in their local time.

We chose a new target launch time, specifically midnight UK time, which gave us a couple of hours to prepare. We spent that time upsizing all the systems that Threads touched. I fondly remember a particular ZippyDB cache, essential for feed to function at all, that needed to be resharded to handle 100x the capacity it was provisioned for. That job ended minutes before Mark posted that Threads was open for signups. I don’t think I’ll ever forget the stress of those final moments.

The user growth in those first couple of days has been well covered elsewhere. Broadly, a million people downloaded our app and onboarded to try it out in the first hour from when it went live. Ten million in the first day. Seventy million in the first two days, and about 100 million in the first five days. After that, the novelty effect started to wear off and the buzziness subsided, but surviving those first five days was quite the feat. While people would later opine that it was great that Threads had a relatively smooth ride with no major visible downtime, it certainly didn’t feel that way to me.

We had a number of fires going at all times, and I didn’t sleep much that week. What kept the site up was all the experts in the room, certainly not me, people who had experience dealing with larger user bases, if not this kind of explosive growth. We tweaked the network enough to stay afloat, and we furiously fought the fires to keep things from spiraling. To pick a couple of interesting fires: the first was that we were consuming more capacity than we should have been, given that we were serving a relatively small pool of posts to a smallish user base.

This was most evident on our boss’s timeline. We started getting reports that it was actually failing to load. The thing is that Mark had an order of magnitude more interactions with his posts than anyone else in that early network. More likes, replies, reposts, quote posts, you name it; Mark was at five digits when everyone else was at four. We finally root-caused this to a particular database query that we needed to render every post, but whose runtime scaled with the number of reposts. It needed an index. With a couple of tries, we nailed the problematic query, and the site got healthier.
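To show the shape of that class of problem (the table, columns, and query here are invented, not the real UDB schema), here is a tiny SQLite sketch in Python: a per-post lookup that degrades to scanning every repost until a composite index turns it into a point lookup.

```python
# Hypothetical sketch of the shape of that fix, with invented table and column
# names: a lookup that scans all reposts of a post until an index exists.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE reposts (post_id INTEGER, reposter_id INTEGER)")
db.executemany("INSERT INTO reposts VALUES (?, ?)",
               [(1, uid) for uid in range(50_000)])   # a very reposted post

# Rendering a post asks something like: has this viewer already reposted it?
query = "SELECT 1 FROM reposts WHERE post_id = ? AND reposter_id = ? LIMIT 1"
print(db.execute("EXPLAIN QUERY PLAN " + query, (1, 42)).fetchall())  # full table scan

db.execute("CREATE INDEX idx_reposts ON reposts (post_id, reposter_id)")
print(db.execute("EXPLAIN QUERY PLAN " + query, (1, 42)).fetchall())  # index search
```

Fine at four-digit repost counts, painful at Mark-scale counts, and fixed by the index.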

Another fire revolved around copying the follower graph from Instagram to Threads. When someone signed up for Threads, we gave them the option to follow, on Threads, everyone that they already followed on Instagram. That might sound daunting, but it’s actually a bounded operation: there’s a maximum number of people you can follow, in the single-digit thousands, so that works fine. Our serverless compute platform soaked up the load. I think at one point we had a backlog in the tens of millions in the job queue, but with some handholding, it worked just fine.

Again, a testament to the wonderful engineering that’s gone into these systems. The issue with graph copying was that you could also say that you wanted to follow people who hadn’t signed up for Threads yet. Maybe you can see the issue now: handling that person signing up is an unbounded operation. When big celebrities, say former President Barack Obama, signed up for Threads, they had a backlog of millions of people waiting to follow them. The system that we had originally designed simply couldn’t handle that scale. We raced to redesign it to scale horizontally and orchestrate a bunch of workers, so it could eat through these humongous queues for these sorts of accounts. We also tried to manually get through the backlog, since this was a sensitive time for the product.

We didn’t want to leave potential engagement on the table. It was a hair-raising couple of days, but I loved the design of the system we finally made, and it’s worked smoothly ever since. All told, I doubt I’ll ever experience a product launch quite like that again. I learned a bunch from others about staying graceful in stressful situations. I’ll carry that wherever I go.
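A rough sketch of that redesign, with every name invented (this is not Meta’s actual job system): split a celebrity’s pending-follow backlog into bounded batches and fan the batches out to a pool of workers, so one enormous account no longer becomes one enormous serial job.

```python
# Invented names throughout; a sketch of sharding an unbounded backlog into
# bounded batches that many workers can drain in parallel.
from concurrent.futures import ThreadPoolExecutor

def chunks(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]

def apply_follows(celebrity_id, follower_ids):
    """One worker creates a bounded batch of follow edges (stubbed here)."""
    return len(follower_ids)

def drain_backlog(celebrity_id, pending_follower_ids, batch_size=10_000, workers=64):
    """Orchestrate many workers so a multi-million backlog drains in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = pool.map(lambda batch: apply_follows(celebrity_id, batch),
                        chunks(pending_follower_ids, batch_size))
        return sum(done)

# e.g. drain_backlog("obama", list(range(3_000_000))) splits the queue into ~300 batches.
```

In the real system the batches would live in a durable queue and the workers would be horizontally scaled machines, but the shape of the fix is the same: bound each unit of work so that backlogs in the millions can be eaten through in parallel.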

Part 6 – After Launch

Past the launch, we needed to quickly pivot to address features that users were asking for. It has now been nine months. In that time, we finally opened to Europe this past December. We shipped a following feed, a feed you have full control over, with content from people you follow, sorted chronologically. We got our web client out there, which caters to our power users. We have a limited release of an API that allows you to read and write posts. We polished content search and started showing what conversations are trending on the network.

We also have big questions that loom over us. What should play a more important role in ranking? The follow graph that people choose, or understanding content and matching it with people’s revealed preferences? How do we cater to the needs of power users who’ve come over from Twitter, who have very clear ideas about what to expect from the product, while also serving all the people who are new to a text-based public conversation app? One of the developments you might find more interesting is the adoption of ActivityPub. This is an open protocol for interoperating between microblogging networks; the collection of those networks is called the fediverse. Keen-eyed listeners will note that I had promised fediverse interop on an insanely ambitious timeline pre-launch, but thankfully that wasn’t put to the test.

Today we’re approaching fediverse interop, piece by piece, and we want to build it right instead of building it quickly. The reason is that we cannot integrate it into the legacy Instagram broadcast sharing stack. It has too many integration points with the rest of Meta, and we want to make strong guarantees about how we process this important data. We essentially need to rewrite our business logic and backend stack, which is ongoing.

Takeaways

In conclusion, getting this off the ground was a unique experience for me. My biggest takeaway is the power of keeping things simple. It’s certainly not easy. The aphorism about it taking longer to write a shorter letter applies to anything creative. If you’re clear about the value you want to provide, it can guide all your hard decisions on where you want to cut scope. The other learning is that cleaner, newer code isn’t necessarily always better. All the little learnings encoded into an old, battle-tested code base add up. If you can help it, don’t throw it away. The complement to this is our saying that code wins arguments.

This is to say that building a product often answers questions better than an abstract analysis. We got through a lot of thorny questions quicker with prototypes instead of slide decks. Of course, nothing can be entirely generalized, because we need to acknowledge how lucky we are for the opportunity and the reception once this product came out. None of that was guaranteed. I feel very grateful for the community that embraced the product. We have a saying that goes back to the early days of Facebook that this journey is just 1% finished. It indeed feels that way to me, especially with Threads.

Questions and Answers

Participant 1: What was the size and maturity of the engineering team put on this task?

Malkani: I think, as we started building, it was like a rolling stone. We started with maybe 5 to 10 engineers, in Jan. By the middle of it, we were up to 25, 30 engineers. By launch, we were up to maybe 30, 40 engineers on the core product team. Again, standing on the shoulders of giants. This doesn’t count all the pieces that we were able to reuse, and platform teams we could lean on for different pieces. In the end, maybe it was more like 100 engineers who contributed in some way. Of course, we got to reuse all those bits of infra and stuff. The core product team stayed under 50.

Participant 2: Can you talk a little bit about the testing? How did you ensure that it worked, and you don’t kill yourself, Instagram and everything?

Malkani: We didn’t get to test much because of the timeline. We lived on the product, of course. From the time that it was a standalone app, and we could download it on our phones, we published all the time, we shared. Instagram itself is a fairly big organization. They’re like a couple of thousand people. For at least a couple of months, a couple of thousand people were using the app. It was all internal only content, and we wiped it all before we went public.

We got a fair amount of ad hoc testing that way. We were not prepared for the scale; that’s a lot of what this talk is about, handling that. Then we were able to react quickly and rely on these systems that had been built for that scale. On the backend, we do have a pretty good test-driven development culture where we write a lot of integration tests for APIs as we make them. There’s no religiousness about it.

Participant 3: You mentioned earlier that you were onboarding tech debt with your code, one year later, what happened to that?

Malkani: We’re paying down that debt. Reusing the code base meant that you’re reusing it with all its edge case handling and all its warts. Sometimes it’s difficult to know which context you’re operating in. Are you serving an Instagram function or are you serving a Threads function? We have a big project underway to disentangle that. This is where being opportunistic was important. That opportunity was only going to last for that time. I think Ranbir mentioned like, everything in architecture is a tradeoff. Here the right tradeoff definitely was to take on that tech debt and enable launching the product sooner. Now we have this like changing the engine of the plane while it’s in flight sort of project, to rewrite it and tease things apart and stuff, but we’re doing it.

Participant 4: If I understood you correctly, you said you started out by forking the Instagram code. Are you planning on reunifying that or do you want to split it further apart?

Malkani: It’s drifting apart over time. It was a convenient starting point. As we embed more deeply with interoperability, like ActivityPub support, and lean into more of what makes Threads, Threads, and what makes Instagram, Instagram, they’re drifting apart over time and we have to tease that apart.

Participant 5: You mentioned reusing the backend of Instagram. How does it say the data model and the APIs look for Threads versus Instagram? Are you reusing the same data model with some fields, or a new API for each operation?

Malkani: They’re similar, but we’re starting to tease them apart. Instagram had a custom REST API, which is how the clients and servers spoke to each other. We started by reusing that, but we’re moving towards adopting GraphQL more. GraphQL is used by Facebook extensively. With GraphQL, you get to do a lot more granular specification of what data the client needs, and also like how you define that on the server.

We’re moving to that more field-oriented model; it’s this whole inverting-the-data-model idea. Instead of business logic living on the server for compositing data together, the client just requests the specific pieces of data it needs. We can label those fields as being either Threads-only or Instagram-only, and tease it apart that way. We started with Instagram, and now we’re teasing it apart.
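As a hedged illustration of what that field-level granularity looks like (the query and field names are invented, not the real Threads schema), a client can ask for exactly the fields it needs, and each field’s resolver on the server can be owned and labelled per app:

```python
# Illustrative only: a made-up GraphQL query with made-up field names.
# The client specifies exactly the fields it wants; the server composites
# only those, and each field can be labelled as Threads-only or shared.
THREAD_POST_QUERY = """
query ThreadPost($id: ID!) {
  post(id: $id) {
    id
    text            # hypothetical Threads-specific field
    reply_count
    author {
      username      # hypothetical field shared with Instagram
      profile_pic_url
    }
  }
}
"""
variables = {"id": "123"}
# A client would POST {"query": THREAD_POST_QUERY, "variables": variables}
# to the GraphQL endpoint and get back only the requested fields.
```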

Participant 6: One of the differentiators you mentioned was that you could iterate quickly on the UI of the mobile applications. At Uber we specified a generic card format that the backend would return, with text first and then the other content. Is there something similar in place here?

Malkani: It was a little more product specific than that. We have frameworks to do that very generic stuff. Like, specify a shadow DOM, and that’s your API. There’s server-side rendering stuff. That was not what we were doing over here. This is more like product specific. This is the post you should put here, but stuff that usually you would do on the client, whether it’s figuring out what reply CTA to use, how to render the line. Like, is there a loop here, or stuff like that? It was like a custom data model, but it did let us iterate. We had variations on how we ran the self Threads, where you reply to yourself in a chain.

How do we show that? We tried a number of ideas, and we were able to do this with our dogfooding community of a few thousand people, on the fly, quickly, thanks to this. No, we were not doing a generic server-side rendering thing. We do use that in other places for some of the onboarding things, or like the fediverse onboarding uses that sort of a server driven rendering thing, but that’s not what this was.

Participant 7: If you have to do it again, what would you do differently?

Malkani: Reusing code to move quicker was what saved our bacon; I would do that again. I would take a few architectural decisions differently: for specific data models that we reused, I would choose to spend a little time detangling them up front. Maybe it would have pushed us back a month or something, but it would have been worth it, because untangling them now is very complicated. The overall ethos of reuse what you can, though, I wouldn’t change.

Participant 8: Coming back to the question about technical debt, how are you unwinding that? How do you address it on a day-to-day basis? What does that look like in team planning and the backlog?

Malkani: We have pretty mature frameworks for doing data migrations on this scale, because we’ve had to split things out before at Meta; think of splitting Messenger out from Facebook, or splitting Marketplace out from Facebook. There are a couple of examples of this. We have frameworks, for instance, that tail all mutations that happen to the UDB that I talked about, our main datastore in MySQL. There are systems that tail those binlogs and let you act on them, so they’re observers. I’m sure there are open source equivalents of this, like tailing a Kafka topic. You can write an observer that tails that and says, ok, double-write this, and so now we have two versions of this particular data model, say the user model, that we need to keep up to date.

Then we start migrating the product logic over to start reading from both of those places. Depending on whether it’s an Instagram context, or a Threads context, you now start reading from the new data model. Then you start doing the double reading and comparing consistency and stuff. There’s a number of techniques like that. Part of the problem is, it’s quite a deep data model. When you think about the post itself, the post itself can have a lot of things linked to it, and all of those things recursively need to now either be Threads data, or Instagram data. We need to go in deep and do those annotations and stuff, some of which can be done statically, some of which need to be done dynamically at runtime, because it depends on who created it. It’s complicated, but still the right tradeoff.
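A rough sketch of that double-write-and-compare pattern, with every name invented (this is not Meta’s actual migration framework; plain dicts stand in for the old and new stores):

```python
# Hedged sketch of tail-the-log, double-write, then shadow-read and compare.

def translate_to_new_model(row: dict) -> dict:
    # e.g. annotate whether this row belongs to the Threads or Instagram context
    return {**row, "app_context": "threads" if row.get("is_threads") else "instagram"}

def on_mutation(event: dict, old_store: dict, new_store: dict) -> None:
    """Observer tailing the mutation log: double-write the user model."""
    if event["model"] == "user":
        old_store[event["id"]] = event["row"]
        new_store[event["id"]] = translate_to_new_model(event["row"])

def read_user(user_id, old_store, new_store, log_mismatch) -> dict:
    """Shadow-read both copies and report drift; the old store stays canonical."""
    old, new = old_store.get(user_id), new_store.get(user_id)
    if old and new and any(new.get(k) != v for k, v in old.items()):
        log_mismatch(user_id, old, new)
    return old

# Example: every logged write flows through on_mutation; reads compare both sides.
old_db, new_db = {}, {}
on_mutation({"model": "user", "id": 7, "row": {"name": "zahan", "is_threads": True}},
            old_db, new_db)
print(read_user(7, old_db, new_db, log_mismatch=lambda *a: print("drift:", a)))
```

Only once the comparison runs clean for long enough does product logic cut over to reading the new model, depending on whether it is in an Instagram or a Threads context.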

Participant 9: When reusing the code, you’re also potentially reusing the technical debt which was originally in that code, and then you’re ending up with multiple places using the same mistakes or problems. How do you see that?

Malkani: It’s part of the tech debt you take on. Again, it was a tradeoff. We could either take on that tech debt and build a product in six months, or we could start from scratch and probably take a year or something to have a fully-fledged product. We knew that we needed to take advantage of this window. Yes, we were ok with taking that on and paying for it later. You’re taking out a loan, and you’re saying, I’m going to pay interest later.

Participant 10: You said that you first inherited the code from Instagram and are now decoupling it. In terms of the coding and backlog work being done by the developers, do you have a certain allocation for this decoupling? For example, is a certain team working on removing the unneeded parts and keeping only what is needed, so that in the future the code is more maintainable and easier to read, with everything that is needed there and nothing extra?

Malkani: Absolutely. We have to balance needs now. On the one hand, we need to evolve the product because we haven’t found product-market fit at all; we’re a brand-new product. On the other hand, we need to make this code base more maintainable, pay down some of the tech debt, and remove footguns, things that can catch developers unaware and lead to big outages. We need to balance that. Right now, we’re balancing it by having a separate team focus on this particular migration. At some point, it impacts all product developers and you can’t get away from that. That will temporarily slow down progress in building the product. We know that we’re going to have to pay that at some point. It’s tricky to figure out how to set this up organizationally. So far, it’s been with a separate team.

Participant 11: Are there any development decisions you’ve made in the development of Threads that have then influenced things that have gone back into the code bases and applications at Meta?

Malkani: I hope the ActivityPub stuff makes it back into other Meta products at some point; that has not happened yet, but I think it has a lot of promise, as does generally interoperating with other networks. Certainly, the lean nature of this has been lauded within Meta, the fact that we were able to do this with a small team. Meta has a playbook of building large teams around important goals; this was a counterpoint example. That organizational lesson has definitely been taken to heart. On the technical side, there’s a lot of interesting stuff we’re doing with ranking that’s new to Meta.

With other networks’ ranking, the most important thing they’re predicting, obviously with checks and balances, is how engaging a post is going to be: go look back, find the most interesting post, then the next most interesting one, and that’s how you build up your feed. With Threads, you need to pay attention to the real-time nature of the product to a much greater degree.

Some of the luxury and slack you have in your ranking stack, the latency window from the time a post is made to the time it’s eligible for ranking and starts actively getting picked up, can be much bigger for something like Instagram than we can afford here. For Threads, from the moment you post, it needs to be eligible in near real time, because this is how news breaks. If there’s an earthquake, you want to know right now. If there’s a helicopter flying overhead, you want to know why. That emphasis on recency in the ranking stack is something we’ve had to push on. It’s something Twitter does very well, for instance.
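As a toy illustration of why recency has to carry so much more weight here (this is not Meta’s ranking, just a sketch of the tradeoff), consider an engagement prediction damped by an exponential recency decay; the half-life you can tolerate is very different for a photo feed than for a breaking-news feed.

```python
# Not the actual ranking stack; a sketch of blending engagement with recency.

def score(predicted_engagement: float, age_minutes: float, half_life_minutes: float) -> float:
    """Damp an engagement prediction by an exponential recency decay."""
    decay = 0.5 ** (age_minutes / half_life_minutes)
    return predicted_engagement * decay

# A photo feed can tolerate hours of staleness; a breaking-news feed cannot.
print(score(0.9, age_minutes=120, half_life_minutes=24 * 60))  # long half-life: barely decayed
print(score(0.9, age_minutes=120, half_life_minutes=30))       # short half-life: heavily decayed
```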

Participant: On the subject of technical debt, are you working with the Instagram teams that you inherited this from and potentially have a similar solution to?

Malkani: Organizationally, the Threads team is still a subsidiary of the Instagram team. We do work with those teams. In many cases, it was not a straight fork so much as reusing the same code, so if we have a shared solution, it improves in both places. Meta has this system called better engineering, almost like a 20% time thing, similar to what Google used to have, where we incentivize people to work on tech-debt-type problems in their spare time. We fold into that. We’re working with these teams. The main short-term focus now is making the data models clear between these apps.

Participant 12: Thinking about the products of Meta, was there a specific reason to use the Instagram architecture rather than Facebook’s, maybe something else?

Malkani: I think it started from the fact that Instagram is a directed follow graph, as opposed to Facebook, which is bidirectional friends. That one decision has a lot of technical implications. We probably could have started from somewhere else, but I think that, plus the fact that there’s a bit of Conway’s Law. The team who was building this was familiar with the Instagram tech stack, and so it made sense to repurpose that. Our options were Facebook and Instagram, and Instagram seemed like a better fit.

Participant 13: How many people were involved initially in the project, boiled down to how many teams? How did you structure that based on the fact that you were just starting on a huge code base?

Malkani: It was small. It started out with 10-ish engineers, snowballed into maybe 50-ish by the end. The fact that we could lean on these platform teams made a lot of difference. In the end, maybe 100 engineers or so, touched something that made it into the Threads app. We kept it small deliberately. A, we wanted to stay a nimble team, and we wanted to pivot the product itself that we were making. We wanted to keep the product simple. B, we were certainly worried about leak risks and stuff.

Participant 14: You said you did some ranking stuff. I’m curious if you had issues with bots, and if you’re taking steps in the security area?

Malkani: Yes, bots are a problem. Again, very lucky that we can rely on a long history of teams fighting bots on Instagram and Facebook as well. They do show up in unique ways on Threads. We have had problems where it’s gotten out to users. Sometimes you get spammy replies and stuff like that, which we’re fighting hard. Integrity is very much like a firefighting domain. These are adversaries who continually evolve their patterns. You learn one pattern, you put in a mitigation against it, and then they shift to a different abuse pattern. It’s taxing, but it is something that we know how to do and we have teams who do this well.

Among the more interesting novel challenges we have here: Threads, to a much greater degree than Instagram or Facebook, is about linking out to the web. Facebook and Instagram tend to be more internally referential, without as many outbound links. We have a number of challenges there to figure out, around awareness of what’s going on on the web, redirects and link crawlers, and that fun stuff. Yes, it’s hard.

