Month: June 2022
MMS • Sergio De Simone
Article originally posted on InfoQ. Visit InfoQ
Git 2.37 brings many new and improved features, including a builtin file system monitor on Windows and macOS, better unreachable object management, improved external diff handling, faster git add, and more.
Git’s new builtin file system monitor aims to improve performance when accessing the file system to detect file changes. This may reduce the time required to execute git status and other commands. Git has supported hooking in external tools like Watchman since version 2.16, but that option was not easy to configure and not frequently used. Instead, you can now enable the builtin file system monitor with the following configuration option:
git config core.fsmonitor true
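Once enabled, Git 2.37 starts a per-repository daemon in the background. A quick sketch of managing it from the command line (the fsmonitor--daemon subcommand ships with Git 2.37; treat this as an illustration rather than a full reference):

```shell
# Enable the builtin file system monitor for this repository
git config core.fsmonitor true

# The daemon starts on demand with the next command that needs it,
# e.g. git status; it can also be inspected or stopped explicitly
git fsmonitor--daemon status   # is the daemon running for this repo?
git fsmonitor--daemon stop     # stop it for this repository
```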
According to Jeff Hostetler, the author of the patches for Git’s new file system monitor, the implementation relies mostly on cross-platform code, with custom backends leveraging OS-native features, i.e. FSEvents on macOS and ReadDirectoryChangesW on Windows. A Linux backend would probably use either inotify or fanotify, Hostetler says, but that work has not started yet.
To improve pruning performance, Git 2.37 introduces cruft packs, aimed at reducing the chance of data races when removing unreachable objects.
Unreachable objects aren’t removed immediately, since doing so could race with an incoming push which may reference an object which is about to be deleted. Instead, those unreachable objects are stored as loose objects and stay that way until they are older than the expiration window, at which point they are removed by git-prune.
Unreachable objects that have not yet left their grace period tend to accumulate and enlarge .git/objects. This can lead to decreased performance and, in extreme cases, to inode starvation and performance degradation of the whole system.
Cruft packs eliminate the need to store unreachable objects in loose files and instead consolidate them in a single packfile between successive prune operations along with a timestamp file to track grace periods.
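On the command line, cruft packs can be produced during garbage collection. A minimal sketch, with the flag names as introduced in Git 2.37:

```shell
# Collect unreachable objects into a single cruft pack instead of
# exploding them into loose files; objects keep their modification
# times so the grace period still applies
git gc --cruft --prune=2.weeks.ago

# The lower-level equivalent via repack (-d deletes redundant packs)
git repack --cruft --cruft-expiration=2.weeks.ago -d
```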
Another improvement in Git 2.37 deals with diff temp files. Instead of using loose files, diffs are now generated inside a temporary directory under the same basename, using mks_tempfile_ts. This allows the files to have arbitrary names, each in its own separate directory. The main beneficiaries are graphical diff programs, which may display nicer output.
As mentioned, Git 2.37 also includes improved performance for select commands, such as git add -i, which was rewritten from Perl into C and has been under testing for a while. The latest Git version adopts the new C implementation as the default.
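The builtin implementation had previously been opt-in via configuration; with Git 2.37 it becomes the default, and the legacy Perl implementation can still be selected as a fallback. A brief sketch:

```shell
# Interactive staging now uses the C implementation by default
git add -i
git add -p

# If the builtin causes trouble, fall back to the legacy Perl version
git config add.interactive.useBuiltin false
```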
As a final note, many developers will welcome the new git -v and git -h options, which are interpreted as git --version and git --help respectively. Interestingly, while apparently a no-brainer, this patch still required some discussion.
Git 2.37 includes many more changes than can be covered here, so do not miss the official release notes for the full details. You can also check out GitHub’s and GitKraken’s takes on what is most relevant in the new release.
MMS • Matt Saunders
Dropbox have published a detailed account of why and how they unplugged an entire data center to test their disaster readiness. With a dependency on their San Jose data center, Dropbox ran a multi-year project to engineer away this single point of failure in their metadata stack, culminating in a deliberate and successful switch-off of the San Jose data center.
Dropbox had moved away from AWS for storing data, but were still heavily centralised and dependent on their San Jose data center. The recovery time from an outage at San Jose was considered far in excess of what was desired, hence the project to improve it in case of a significant disaster, such as an earthquake on the nearby San Andreas Fault. The improvement was measured as a Recovery Time Objective (RTO), a standard measure from Disaster Recovery Planning (DRP) for the maximum time a system can tolerably be down after a failure or disaster occurs.
The overall architecture of Dropbox’s systems involves a system to store files (block storage), and another system to store the metadata about those files. The architecture for block storage – named Magic Pocket – allows block data to be served from multiple data centers in an active/active configuration, and part of this new resilience work involved making the metadata service more resilient, and eventually also active/active too. Making the metadata stack resilient proved to be a difficult goal to achieve. Some earlier design tradeoffs – such as using asynchronous replication of MySQL data between regions and using caches to scale databases – forced a rethink of the disaster readiness plan.
The disaster readiness team began building tools to make frequent failovers possible, and ran their first formalized failover in 2019. Quarterly failovers followed, until a fault in the failover tooling caused a 47-minute outage in May 2020, highlighting that the tooling did not itself fail safely. A dedicated Disaster Readiness (DR) team was formed and charged with owning all failover processes and tooling and performing regular failovers, removing the competing priorities that existed before there was a dedicated team.
Early 2020 brought a new evolution of the failover tooling – with a runbook made up of a number of failover tasks linked together in a DAG (directed acyclic graph). This made for a much more lightweight failover process, with re-usable tasks and easier regular testing of each task. Also, it became easy to see whether tasks had succeeded or not, and important actions were guarded in case of a failure in a preceding task.
Dropbox implemented other changes to help reduce risk, with a customer-first strategy:
- Routine testing of key failover procedures – regular automated small-scale tests of failover tasks
- Improved operational procedures – a formalized go/no-go decision point, checks leading up to a failover “countdown”, and clearly defined roles for people during a failover
- Abort criteria and procedures – clearly defining when and how a failover would be aborted
In addition to working on the DR processes and tools as described above, a small team also began working on true active/passive architecture. This involved improving internal services that were still running only in the San Jose data center, so that they could either run multi-homed in multiple data centers, or single-homed in a location other than San Jose. Techniques and tools used here included using the load-balancer Envoy, and using common failover RPC clients with Courier to redirect a service’s client requests to another data center.
The final steps in preparation for unplugging San Jose involved making a detailed Method of Procedure (MoP) to perform the failover, and this was tested on a lower-risk data center first – that at Dallas Fort Worth. After disconnecting one of the DFW data centers, whilst performing validations that everything was still working, engineers realised that external availability was dropping, and the failover was aborted four minutes later. This test had revealed a previously hidden single point-of-failure in an S3 proxy service.
The failed test provided several lessons to the team – significantly that blackhole tests needed to test entire regions (metros) and not individual data centers. A second test at DFW after adjusting the MoP to accommodate these learnings was successful. Finally, the team were ready to disconnect the San Jose data center. Thanks to all the planning, the new tooling and procedures, there was no impact to global availability, and the anti-climactic event was declared a success. This provided a significantly reduced RTO and proved that Dropbox could run indefinitely from another region, and without San Jose.
The key takeaways from this multi-year project were that it takes training and practice to get stronger at Disaster Readiness. Dropbox now have the ability to conduct blackhole exercises regularly, and this ensures that the DR capabilities will only continue to improve, with users never noticing when something goes wrong.
MMS • Ben Linders
When we design team and departmental processes, we want to know what’s happening in the software teams. Asking team members to provide information or fill in fields in tools adds a burden and distorts reality. Setting up observability in the software can provide alternative insights in a less intrusive way. Observability in the software can be an asset to organizing teams.
Jessica Kerr spoke about applying observability for speed and flow of delivery at QCon London 2022 and QCon Plus May 2022.
When you want to go fast, it helps to see where you’re going, Kerr stated. To deliver fast and focus on flow, teams can use observability, as Kerr explained:
Observability gives developers clues into the consequences of each change they make: did it do what we expected? Did it do anything else harmful?
Observability lets developers see performance regressions and error rates, check usage on features, and show the functionality to others in an intelligible, referenceable way, Kerr said.
Kerr explained how observability in the software can become an asset to organizing teams:
As leaders of teams, we can use observability by adding tracing to continuous integration. Then we can measure deploy frequency and build times. We can graph those the same way we measure performance in software. And when it’s time to improve lead time (from commit to production), we can see what’s taking so long in our builds and fix it.
A little bit of system knowledge plus a distributed trace gives a lot of insight, Kerr concluded.
InfoQ interviewed Jessica Kerr about how observability can be applied to increase the speed and flow of delivery.
InfoQ: How can building in observability help to see performance and cost impacts?
Jessica Kerr: When Honeycomb added the ability to store our customers’ event data for up to 60 days, instead of only what fits in local storage, lots of consequences happened. Queries over a wide range of data took minutes instead of seconds — even tens of minutes. Querying our traces, we could see exactly how much. Looking at a trace, we could see why: hundreds of fetches to S3 bogged down our database servers.
To fix this, we moved those fetches to AWS lambda functions (I gave a talk at StrangeLoop 2020 on how we used serverless to speed up our servers). This lets us scale our compute power with the scope of the query, live on demand. It also scales our AWS costs rather unpredictably. To help with this, we built observability into our lambda functions, so we can see exactly which queries (and whose queries) are costing us a lot. We got in touch with some customers to help them use Honeycomb more efficiently.
And then! when AWS released Graviton 2 for Lambda—it’s a different computer architecture, cheaper and supposedly faster—we tried it out. We easily measured the difference. At first it was less predictable and slower, so we scaled back our use of it until we made our function more compatible.
Serverless is particularly inscrutable without observability. With it, we can measure the cost of each operation, such as this database query.
InfoQ: How can developers benefit from adding observability to their software?
Kerr: Let me give an example. In one of my personal toy applications, I started with traces instead of logs. As soon as I looked at one, I found a concurrency error. That would have been really hard to find any other way, because the waterfall view of the distributed trace clearly showed an overlap where I knew there shouldn’t be one.
MMS • Vasco Veloso
Trivago’s platform was built using PHP and their Melody framework. A small number of engineers at Trivago maintained Melody, which was a continuity risk. Melody’s documentation and examples could not be as rich as desired due to a lack of capacity, making engineer onboarding and support much more difficult. Trivago then decided to rewrite its platform in TypeScript using Next.js.
To mitigate the risks that developing a home-grown framework such as Melody entails, Trivago engineering had to decide whether to assign more resources to Melody or drop it. They decided to stop using Melody.
Such a platform replacement represents a risk to the business in potentially lost revenue because new features are not introduced during the rewrite period. The project started in 2020 under circumstances that already mitigated that risk.
Another risk is the loss of development team motivation while existing features are ported over and new features remain on hold. Teams could begin to build new features once the new platform was stable enough, which reduced the risk of motivation loss.
Developer experience, hackathon results and market penetration were the deciding factors in choosing the new technology stack: Next.js with Preact, using TypeScript. Developers would benefit from having a cleaner code base employing widely used libraries with significant community support.
Many architectural decisions must be made while designing a new platform with new technologies, and agreements must be reached in a timely, pragmatic manner to ensure the project’s success. Trivago engineers used a form of architectural decision records that Tom Bartel described as a process based on the following:
- A decision document where all the relevant facts and viewpoints are collected and organised.
- A decision owner curates the decision document, prepares the decision meeting, and is responsible for reaching a decision.
- A decision meeting, where viewpoints are exchanged and discussed, and a decision is made at the end.
One of the most critical learnings from this project was that the team should not grow too quickly. The author noted that experimentation and course changes are easy when a handful of engineers work together; the team size should only rise above five once all crucial decisions are made and the foundations are stable. Doing so prevents communication overhead, frustration and wasted effort.
Being a rewrite, the new system needed to achieve feature parity with the existing platform. The team verified correctness by running both platforms in an A/B manner and comparing some indicators, such as user interaction, revenue generation and search types.
The author adds that the rewrite also brought benefits for the end-users in the form of faster application loading times.
MMS • Karsten Silz
The goal of Project Leyden is to “address the long-term pain points of Java’s slow startup time, slow time to peak performance, and large footprint.” It wanted to get there “by introducing a concept of static images” to OpenJDK. Static images result from Ahead-of-Time (AOT) compilation to native executables. After two years with no publicly visible activity, Project Leyden pivoted in May 2022 to first optimize Just-in-Time (JIT) compilation. The “resulting optimizations will almost certainly be weaker” than initially planned and reach mainstream Java developers at the end of 2025 at the earliest. Oracle’s Graal project has already achieved Project Leyden’s goal, but at a cost that the project wants to avoid for now.
The Graal project originates in Oracle Labs and is not part of OpenJDK. Its GraalVM Native Image is a Java AOT compiler that produces native executables today. They have four advantages over Java’s JIT compiler: fast startup, lower memory and CPU usage, fewer security vulnerabilities, and smaller file sizes.
But these achievements come at a cost: GraalVM Native Image enforces a so-called closed-world assumption on Java applications that eliminates a whole category of Java applications. Why? Because Java is a dynamic language and gives applications a lot of power at runtime, such as reflection, class loading, or even class construction. And some of these features don’t work in the closed world of GraalVM Native Image. That’s why Project Leyden now wants to “explore a spectrum of constraints, weaker than the closed-world constraint, and discover what optimizations they enable.” Still, Leyden “will likely […] produce fully-static images,” though only “in the long run.”
OpenJDK Has Previously Tried AOT Compilation
Project Leyden is OpenJDK’s second attempt at AOT compilation. The first effort was jaotc with JEP 295, Ahead-of-Time Compilation, delivered in JDK 9 in September 2017. Like GraalVM Native Image, it used the Graal project. Unlike GraalVM Native Image, it was highly unpopular: when Oracle removed jaotc from its Java 16 builds, “no one complained,” Oracle drily noted in JEP 410, Remove the Experimental AOT and JIT Compiler, delivered in JDK 17.
Project Leyden has had an unusual history for an OpenJDK project. Java Language Architect Mark Reinhold proposed it in April 2020, and OpenJDK approved it as a project in June 2020. But the project showed no visible progress in the two years between that approval and the creation of its mailing list in May 2022. That’s why the project is just starting, focusing “more upon concepts than code” for now. Reinhold stated that components such as “the HotSpot JVM, the C2 compiler, application class-data sharing (CDS), and the jlink linking tool” are targets for optimization. Notably missing from that list was CRaC, an OpenJDK project that reduces startup time by loading the Java application state from disk.
A back-of-the-envelope calculation shows possible delivery dates. LTS releases now have outsized importance in Java: Ben Evans, formerly of monitoring company New Relic, announced at Devoxx UK 2022 that “no non-LTS version has ever passed 1% of market share.” This shows that mainstream Java developers only migrate from one Java LTS version to the next.
Since Project Leyden is only now underway, few results will be production-ready in September 2023 for JDK 21, the next LTS release. So mainstream Java developers will likely only see Project Leyden’s first results with the LTS release after that, JDK 25, in September 2025. Based on that assumption, Project Leyden would, at the earliest, deliver AOT compilation to native executables with JDK 29 in September 2027. InfoQ will continue to monitor progress on Project Leyden.
Spring Boot Reacts to Project Leyden
At least some of the features considered for Project Leyden, such as jlink or CRaC, require application framework support to work best. That’s why InfoQ reached out to developers representing Spring Boot, Quarkus and Micronaut for their initial reactions to the Project Leyden announcement.
Spring Framework project lead, Juergen Hoeller, approves of Project Leyden:
Project Leyden is a promising initiative aligned with the general direction that we are taking in Spring Framework 6 and Spring Boot 3.
Hoeller also embraces CRaC for Spring:
CRaC heap snapshots could become a common option for improving the startup time of Spring-based applications. Taking the snapshot at the very end of the application startup phase, there would be hardly any open file or network resources at that point, in alignment with CRaC’s expectations. Spring even resets its common caches at the end of an application context refresh already, clearing startup-related metadata before dynamically repopulating the caches with request-related metadata. In terms of […] the application context specifically reacting to a snapshot event or improving the “snapsafety” of common components, we will certainly try to empower early adopters as far as technically feasible within our Spring Framework 6.x line.
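As a rough illustration of the checkpoint/restore workflow Hoeller alludes to, using the flag names from the CRaC project builds (a sketch, not Spring-specific guidance; the pid placeholder stands for the application’s process id):

```shell
# Run the application on a CRaC-enabled JDK and point it at a
# directory for the checkpoint image
java -XX:CRaCCheckpointTo=/tmp/app-checkpoint -jar app.jar

# Once startup has completed, trigger the checkpoint from another shell
jcmd <pid> JDK.checkpoint

# Subsequent starts restore from the snapshot instead of booting cold
java -XX:CRaCRestoreFrom=/tmp/app-checkpoint
```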
Hoeller thinks that Spring will support jlink and the Java Platform Module System (JPMS) soon:
The current Spring Framework 6.0 milestones do not include module-info descriptors yet. This is on the roadmap for the M6 milestone in September, re-evaluating the module system readiness of the third-party ecosystem as we move on to our 6.0 release candidate phase. With Project Leyden potentially turning jlink into a more powerful and versatile tool, we intend to prepare not only for jlink’s current capabilities but also for its further evolution.
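For context, jlink’s current capability boils down to assembling a trimmed runtime image from an explicit set of modules. A minimal sketch (the module list and application name are illustrative):

```shell
# Build a custom runtime containing only the modules the app needs
jlink --add-modules java.base,java.sql \
      --strip-debug --no-header-files --no-man-pages \
      --output custom-runtime

# Launch the application on the trimmed runtime
./custom-runtime/bin/java -jar app.jar
```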
Quarkus Reacts to Project Leyden
Quarkus Co-Founder and Co-Lead, Jason Greene, commented on Project Leyden:
We are most excited about the Leyden project’s goal to revise the Java Language Specification to better support static images, native compilation, and other technologies such as JVM checkpointing. Further, we were happy to see closed-world remaining as a likely long-term goal for the project.
Greene embraces CRaC for Quarkus:
Initial support for the CRaC research project was recently contributed to the Quarkus project by the CRaC lead. Since Quarkus performs build-time optimization no matter the run-time target type, you still see considerable savings when running on OpenJDK, not just GraalVM. Adding a checkpointing approach, such as CRaC, on top of OpenJDK further optimizes startup time. It does not bring similar memory savings as native images, but it is an interesting future option for applications that prefer or require JVM execution.
However, Greene is more reluctant about jlink and JPMS in Quarkus:
As of today, jlink only brings benefits to the storage overhead of a JVM-based application (memory overhead and startup time are essentially the same without it). However, the common practice in a container or Kubernetes application is to layer on a standard JVM base image, which already brings further savings than switching all applications to jlink (since each would bundle its own trimmed JVM). In the case of a native image, the fine-grained elements of the JVM are compiled into the image, so jlink is not helpful in this scenario either. Likewise, with JPMS, Quarkus already has the notion of modularity through Quarkus extensions, allowing you to trim your dependency set to only what you need. The approach Quarkus takes is compatible with the simple flat classpath that most of the Java ecosystem and build tools prefer today. On the cost side, moving to a pure JPMS module model as jlink requires (no auto-modules) would mean a breaking change not just to Quarkus but to many of the libraries Quarkus builds on. Before considering a switch, we’d like to see these factors balance out better.
Micronaut Reacts to Project Leyden
Sergio del Amo Caballero, Principal Software Engineer at Object Computing, Inc. (OCI), had no official Micronaut Framework statement on Project Leyden. But he pointed to a recent GitHub issue for adding CRaC support in Micronaut.
Caballero also shared a YouTube clip from July 2020 featuring Micronaut creator, Graeme Rocher, commenting on JPMS: Micronaut supports JPMS and publishes module-info files, but has “to balance that with supporting Java 8”. JPMS was added in Java 9, but Micronaut 3.5, the current version, still runs on Java 8.
Conclusion
So far, OpenJDK hasn’t addressed “the long-term pain points of Java’s slow startup time, slow time to peak performance, and large footprint.” First, its jaotc AOT compiler failed to gain traction and was retired. Then Project Leyden set out to standardize native compilation in Java but stalled for two years.
Now that Project Leyden has pivoted to first optimizing JIT compilation, things are looking up: both Spring and Quarkus embrace CRaC for startup time reduction. But when it comes to smaller Java application sizes, only Micronaut adheres to Project Leyden’s suggestion of using the JPMS. Spring plans to support the JPMS in version 6 by the end of 2022, though the Spring ecosystem may not. And Quarkus currently has no plans to add the JPMS.
Results, in the form of JEPs, from Project Leyden could reach mainstream Java developers by the end of 2025 at the earliest. So at least until then, the combination of the GraalVM Native Image AOT compiler with a framework like Quarkus, Micronaut, or the upcoming Spring Boot 3 remains the best option to avoid “Java’s slow startup time, slow time to peak performance, and large footprint.”
MMS • Rebecca Mahoney, Andy Molineux
Key Takeaways
- Growth at pace is hard(!) and it’s all too easy to form silos and stretch too thin. Organisational structure is key to mitigating this – form highly aligned teams by sharing your company vision and immediate goals, and keep all teams loosely coupled with teams formed around product modules with clear end-to-end ownership of the tech stack.
- Always put your people first – treat individuals as adults, and ensure everyone has the tools needed to have open conversations and give effective feedback. Give everyone as much freedom and self-directed growth as possible, to unleash their talents to drive your mission forward.
- Never stop communicating, with teammates and across the organisation. Ensure key messages are given in various forms: in person, in writing, and even by video recording!
- Smash the silos – allow space for free movement between teams, making upcoming opportunities visible to all; double down on guilds as a way of sharing knowledge across teams; and empower individuals to drive cross-organisational learning.
- Pay down “social debt” by making room for remote interactions that would happen naturally if you were in an office environment. Make the effort to reach out for a 5-minute debrief after a challenging meeting, or embrace the “organised fun” of team games!
Being a small EnTech disruptor in a rapidly evolving market can feel a bit daunting; add in an acquisition, a rebrand and twice the team members that you had a year ago and you have a recipe for growing pains. Here is how we leaned on our strengths and pulled experience from all directions to allow for team member fulfilment during a breakneck growth spurt.
The impact of fast growth
Manchester-based KrakenFlex, previously Upside Energy, was founded in 2014 and acquired by Octopus Energy Group in 2020. Now part of the award-winning Kraken technology platform, KrakenFlex connects with a whole host of clean energy technologies, allowing it to control, dispatch and optimise those devices to match real-time energy demand and supply. This allows technologies like batteries to charge when prices are low and discharge when prices are high, as well as participating in services to help balance the electricity grid.
At the end of 2020 we had 227MW capacity supported on our platform and 26 people working at KrakenFlex. Less than 18 months later, our capacity has grown sixfold to 1.3GW, and our headcount has nearly tripled. It’s an exciting journey but has certainly felt like a rollercoaster at times!
We’ve struggled in a few areas over the past couple of years: how to grow sustainably without stretching ourselves too thin; we’ve fallen into silos occasionally where one hand does not know what the other is doing; and we’ve seen our “veteran” and team lead level people spinning many plates. Following the acquisition by Octopus Energy Group and the rebranding to KrakenFlex, we rapidly scaled the business by forming teams around product modules, improving internal communication and giving people the space to lead their teams. Since then we’ve minimised silos, replacing them with greater collaboration and responsibility for every team member.
A key initiative that has driven this forward has been to be really clear on our business goals; the key performance indicators (KPIs) that we track to see how we’re progressing towards them; and setting quarterly company Objectives and Key Results (OKRs) to drive progress on strategic items. Our implementation of this hasn’t been perfect, and we’ve gone through a number of iterations to avoid teams independently coming up with their own OKRs that don’t align to the wider strategy! We’ve found that by focusing on company-level OKRs and sharing these widely and often with the team, we’re able to pull everyone together, and then let teams discuss and align on how their work is contributing towards our wider goals.
We have a fantastic team of talented professionals who have stepped up to every challenge we’ve thrown at them and deliver a fantastic product and service to our customers every day. We’re proud of our culture of learning.
On a human level, one thing we focus on is supporting the team as everything changes and grows. When we were a small startup, it was quite natural for everyone to be involved in everything; but that approach doesn’t scale. The journey to more focused work is uncomfortable as it often involves letting go of work, work that you feel is important to your place and identity within KrakenFlex. We coach team members, showing them that we always have their back and that there are many opportunities for them to grow and try new things within the company. A key component of this is visibility for the team on where we’re headed as a company, what our product roadmap looks like and how we think our team structures may evolve in the coming year to keep pace with demand. Our product roadmap is continuously visible to the whole company using our shared planning tools, but we also talk through highlights at our fortnightly showcase once teams have demonstrated new functionality they have built. A forecast of our team’s structure for the coming quarter is also shared at each of our “all hands” town hall meetings and published on our intranet.
Keeping growth sustainable
Sustainable growth is all about our people – hiring fantastic folks who are passionate about our mission and want to take responsibility; creating a supportive environment with lean structures and processes to ensure everyone is working together; and continually listening and iterating on what we can do better.
Communication is key to all of that – we’re on a huge and important mission to create a new energy system for a better world, and we’re growing very fast, so it’s going to be bumpy at times! We use tools like Slack to allow seamless communication and community building, both as individuals and across the whole company. No matter what you’re interested in, someone in the 3,000-strong global Octopus Energy Group team will have created a Slack channel for it, building bridges across teams and improving inter-team relationships. Charity fundraising, creating custom emoji keyboards, sharing good news or simply recommending dinner spots – it’s all available in one place.
As individuals, we do that through encouraging a continuous learning culture, regular company-wide Q&A sessions with our CEO, and an accessible structure which means anyone can speak directly with the leadership team whenever they want or need to.
Bringing people along on the journey makes growth sustainable. In our industry there are always going to be more opportunities than we can handle, and current climate change reports highlight the need for immediate global change. We found ourselves repeating the mantra of “it’s a marathon, not a sprint” and encouraging everyone to find a pace at which they can perform at their best and do the job they like best. A key part of this is allowing managers to source the training and development they need; this can come in the form of learning from business mentors, attending short courses on specific business administration topics or simply following the tracks of a manager before them. This increases the sources and points of learning for different approaches – which may be stronger than a monolithic approach.
Much like our parent company, Octopus Energy Group, we also allow free movement between teams. If there is something specific that you are working on that is really getting you excited or pushing your professional buttons, it’s encouraged to seek that speciality out and foster it, allowing colleagues to adapt and experiment to find their calling.
Build bridges across siloed teams
As a fast-growing scaleup, one of the decisions we made when building our teams was to copy what small businesses do and make managers the HR contact, general manager and go-to person for their team. This gives our managers space to learn and progress, and often means that colleagues are happier because they have a strong relationship with their managers. Plus, team members “go find” the support they need, building bridges in the process. This has lots of benefits – it aligns teams directly to the value they’re building for customers and gives new starters a smaller footprint to onboard into.
To ensure that silos didn’t form, we took the time to bring everyone together as one KrakenFlex team, and talk about our mission and our goals for the year. We encourage teams to think about what success looks like to them, how it feeds into the wider goal, and how what they’re building links back to our shared company objectives.
The other key thing we did was to double down on our guilds – communities of practice focused around specific technical or practical topics. By keeping up with our guilds we found that cross-team relationships and knowledge sharing have strengthened, which has helped to reduce silos.
Onboarding newcomers and collaborating remotely
Before the pandemic struck in March 2020, we had around 26 people in the business, all working out of our small office in the Northern Quarter in Manchester. Collaboration between teams was easy as everyone was sitting next to each other and had direct lines into senior management.
One key challenge we found as soon as we started hiring and onboarding remotely during the pandemic was that new people got to know their team and their part of the system really well, but didn’t have much visibility of the rest of the company.
By bringing teams together at every company town hall meeting and work showcase – both online and onsite – we’ve been able to combat some of that. Weekly all-hands meetings with the teams and even the whole Octopus Energy Group give a taste of different parts of the business. We also record everything and make it available on internal systems and Slack, along with lots of guidance on how we work together. Anybody can ask questions and speak up when given the opportunity, which helps new members know that their opinion is always appreciated and considered.
We are big advocates of radical candour, a feedback approach developed by Kim Scott that encourages you to care personally and challenge directly. It provides a framework for people to give honest feedback built on a foundation of mutual respect and caring for each other. We ran a cross-company book club with Kim’s book and had great engagement from the folks who joined. We met weekly for six weeks while we read the book a few chapters at a time and captured notes and insights on a Miro board, with the discussion focused on how we could take the ideas we’d read about that week and implement them in our daily work. Although it’s not easily measurable, we have seen an increase in the number of “radical candour feedback moments” across the team.
Putting people first at KrakenFlex
A big one for our team is “social debt”. We work in an industry where there are many different solutions to a single problem, and we encourage discussing potential solutions and approaches and working together to find one. When we were all in the office, occasionally these discussions could become quite heated, with folks gathering around a whiteboard arguing the case for one technical solution over another. This “constructive conflict” is often essential in order to come to the best solution to a problem! However, once the dust had settled, we’d inevitably go for a coffee or beer together to decompress and laugh about the tense technical arguments of the day. Paying down this “social debt” and restoring balance in the relationship after a good debate is far more difficult when technical discussions happen remotely.
Tackling this remotely requires carving specific time out for things that would “just happen” if we were face-to-face in an office. It can feel awkward to reach out for a quick call just after leaving a big meeting, but those one-to-one debriefs can make all the difference.
Nothing makes professionals shudder more than “organised fun”, but it has to be done! Here are a couple of things we found work: leave five minutes at the end of a group meeting to allow natural chance encounters to take place; online board games were a big success (I’m a big fan of Sushi Go!); and encourage office visits to our various other Octopus Energy Group sites on the occasions when it makes sense.
By building trust in managers and creating micro-communities among teams and across functions, we’re able to produce the strong bonds that enable you to disagree, discuss and collaborate safely. While this is more applicable to Octopus Energy Group, this approach has been so successful that it has allowed the operations team to scale at breakneck pace. One team leader looks after a ten-person team and is their IT, HR and administration support for the most part; once the team starts working successfully and becomes a high achiever, the team splits in two, with a new team leader managing the second team and both teams hiring five new starters, combining experience with learning – a sort of cell replication structure.
Our people are central to everything we do at KrakenFlex – we can’t continue providing excellent service to our customers, or drive forward our mission to transform the energy system without them!
We make this real in a number of ways.
As we mentioned, we love to take tips from the Octopus culture and bring them to life in KrakenFlex. We hire fantastic professionals and give them the freedom and responsibility to run with ideas and take ownership of their work. A fantastic example of this is our unlimited holiday policy – we expect folks to work hard and trust them to make responsible decisions with their team when they’d like time off. This mostly means making sure that whole project teams aren’t all off at the same time, but if by some force of nature they do need to be, we work out how to support everyone’s need for time off – whether that means bringing in external help or asking another team to step up.
Another area we’ve really focused on, particularly since we’ve been remote during the pandemic, is gathering feedback. We use a tool called OfficeVibe to help us do this. Every week it sends five brief questions out to everyone in our team over Slack, and maintains a dashboard showing us how our team is doing across 10 themes, and gives an overall employee engagement score. It also has a feature whereby people can send us feedback (anonymously if they prefer) which has been invaluable in staying close to what people really think and enables us to act quickly on their ideas.
Through collaboration and replication we have become an altogether more organised and scalable business. Already at KrakenFlex we are well on the way to reaching our initial target for the year: management of 100,000 devices and 6,000 MW of energy capacity by 2023. This wouldn’t be possible without having taken a real look at how we were doing things and how we wanted to do them moving forward.
MMS • Sergio De Simone
Article originally posted on InfoQ.
Uncovered by a team at MIT CSAIL, PACMAN is a new vulnerability affecting a defense mechanism available in Apple Silicon processors and known as pointer authentication code (PAC). While Apple downplayed the severity of this finding, the researchers suggest that PACMAN opens up an entirely new class of attacks.
Pointer authentication code is a mechanism available on ARM-based processors, including Apple’s, which aims to detect and guard against unexpected changes to pointers in memory.
Pointer authentication works by offering a special CPU instruction to add a cryptographic signature — or PAC — to unused high-order bits of a pointer before storing the pointer. Another instruction removes and authenticates the signature after reading the pointer back from memory.
This mechanism makes it possible to detect when a value has been tampered with between the write and the read operation by checking the signature. If the signature is invalid, the CPU will treat that value as corrupted and will cause a program crash.
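The sign-then-verify flow can be illustrated with a hypothetical Python sketch. Note that this is purely conceptual: the real mechanism uses dedicated ARM instructions (such as PACIA/AUTIA) and a hidden hardware key, and the HMAC, the 16-bit PAC width, and the 48-bit address split below are all illustrative assumptions, not Apple's implementation:

```python
import hmac
import hashlib

PAC_BITS = 16                    # illustrative; real PAC width depends on VA size
PAC_SHIFT = 48                   # assume 48-bit virtual addresses; top bits unused
KEY = b"per-process-secret"      # stands in for the CPU's hidden PAC key

def compute_pac(ptr: int, context: int) -> int:
    # Truncated MAC over the pointer plus a context value (e.g. the stack pointer)
    msg = ptr.to_bytes(8, "little") + context.to_bytes(8, "little")
    digest = hmac.new(KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little") & ((1 << PAC_BITS) - 1)

def sign(ptr: int, context: int = 0) -> int:
    # Embed the PAC in the unused high-order bits before storing the pointer
    return ptr | (compute_pac(ptr, context) << PAC_SHIFT)

def authenticate(signed_ptr: int, context: int = 0) -> int:
    # Strip the PAC and re-verify it after reading the pointer back
    ptr = signed_ptr & ((1 << PAC_SHIFT) - 1)
    if (signed_ptr >> PAC_SHIFT) != compute_pac(ptr, context):
        raise RuntimeError("PAC mismatch: pointer corrupted")  # CPU would fault
    return ptr

signed = sign(0x0000_7FFF_DEAD_BEEF)
assert authenticate(signed) == 0x0000_7FFF_DEAD_BEEF
```

If any bit of the stored value is flipped between the write and the read, `authenticate` recomputes a PAC that no longer matches the embedded one and raises, mimicking the program crash the CPU would cause.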
According to a paper by Ravichandran et al., a new attack methodology makes it possible to speculatively leak PAC verification results via microarchitectural side channels without causing any crashes. Furthermore, the attack works across privilege levels, thus allowing an unprivileged user to gain access to the OS kernel space.
For PACMAN to pose a real threat, though, it must rely on an existing software-level memory-corruption bug. In other words, since PAC is a line of defense against memory tampering, if you have a bug that makes it possible to access a memory location and modify its content after a write, then you could use PACMAN to keep the CPU from detecting the tampering and crashing the program.
As mentioned, Apple somewhat downplayed the relevance of this vulnerability. In a statement to Tom’s Hardware, an Apple spokesperson said “this issue does not pose an immediate risk to our users and is insufficient to bypass operating system security protections on its own.” Still, as the researchers point out, PACMAN hints at a new category of vulnerabilities:
We believe that this attack has important implications for designers looking to implement future processors featuring Pointer Authentication, and has broad implications for the security of future control-flow integrity primitives.
PACMAN is not the first vulnerability found in Apple Silicon processors. Recently, researchers at the University of Illinois Urbana-Champaign, the University of Washington, and Tel Aviv University described Augury, an attack that leaks data at rest on the A14 and M1 families of processors. Augury targets another novel optimization mechanism available in modern processors, the data memory-dependent prefetcher (DMP), making it possible to access out-of-bounds memory locations when prefetching data belonging to an array.
MMS • Daniel Dominguez
Article originally posted on InfoQ.
AWS, in a joint effort with Microsoft, has established PyWhy, a new GitHub organization, to integrate AWS algorithms into DoWhy, a causal ML library from Microsoft that has moved to PyWhy.
The mission of PyWhy is to build an open-source ecosystem for causal machine learning that advances the state of the art and makes it available to practitioners and researchers. PyWhy will build and host interoperable libraries, tools, and other resources spanning a variety of causal tasks and applications, connected through a common API on foundational causal operations and a focus on the end-to-end analysis process.
The majority of real-world systems, whether they be industrial procedures, supply chain systems, or distributed computer systems, may be characterized using variables that may or may not have a causal relationship with one another.
The evaluation of causal machine learning models and the formalization and integration of domain knowledge into machine learning pipelines present significant research problems. Finding the best identification technique, creating an estimator, and performing robustness checks are all phases that are often completed entirely from scratch as part of the normal procedure, and the underlying assumptions are often difficult to comprehend and validate.
DoWhy is one of the existing causality libraries that focuses on several methods of effect estimation, with the overall objective of determining the impact of interventions on a target variable.
AWS’s work enhances DoWhy’s current feature set with graphical causal models (GCMs). GCMs are a formal framework, created by Turing Award winner Judea Pearl, for modeling the causal links between the variables in a system. A crucial component of GCMs is the causal diagram, which visually depicts the cause-effect links among observed variables with an arrow from each cause to its effect.
DoWhy already integrates potential outcomes and graphical causal models, two of the most popular scientific frameworks for causal inference, into a single library for effect estimation. AWS’s contribution seeks to strengthen the relationship between the frameworks and the communities of researchers who are committed to them.
MMS • Matt Klein
Article originally posted on InfoQ.
In edge proxy use cases, North-South is a term often used to describe traffic entering a network perimeter. Envoy proxy is often used directly or as part of many other solutions when implementing these use cases. Today on the podcast, Wes Reisz speaks with Matt Klein about a recent announcement that Envoy Proxy will partner with many well-known companies in the space, including VMware, Ambassador Labs, and Tetrate, to build and maintain a new member of the Envoy family – Envoy Gateway.
Java News Roundup: Classfile API Draft, Spring Boot, GlassFish, Project Reactor, Micronaut
MMS • Michael Redlich
Article originally posted on InfoQ.
This week’s Java roundup for June 20th, 2022 features news from OpenJDK, JDK 19, JDK 20, Spring point releases, GlassFish 7.0.0-M6, GraalVM Native Build Tools 0.9.12, Micronaut 3.5.2, Quarkus 2.10.0, Project Reactor 2022.0.0-M3, Apache Camel Quarkus 2.10.0, and Apache Tika versions 2.4.1 and 1.28.4.
OpenJDK
Brian Goetz, Java language architect at Oracle, recently updated JEP Draft 828039, Classfile API, to provide background information on how this draft will evolve and ultimately replace the Java bytecode manipulation and analysis framework, ASM, that Goetz characterizes as “an old codebase with plenty of legacy baggage.” This JEP proposes to provide an API for parsing, generating, and transforming Java class files. This JEP will initially serve as an internal replacement for ASM in the JDK with plans to have it opened as a public API.
JDK 19
Build 28 of the JDK 19 early-access builds was made available this past week, featuring updates from Build 27 that include fixes to various issues. More details may be found in the release notes.
JDK 20
Build 3 of the JDK 20 early-access builds was also made available this past week, featuring updates from Build 2 that include fixes to various issues. Release notes are not yet available.
For JDK 19 and JDK 20, developers are encouraged to report bugs via the Java Bug Database.
Spring Framework
Spring Boot 2.7.1 has been released featuring 66 bug fixes, improvements in documentation and dependency upgrades such as: Spring Framework 5.3.21, Spring Data 2021.2.1, Spring Security 5.7.2, Reactive Streams 1.0.4, Groovy 3.0.11, Hazelcast 5.1.2 and Kotlin Coroutines 1.6.3. More details on this release may be found in the release notes.
Spring Boot 2.6.9 has been released featuring 44 bug fixes, improvements in documentation and dependency upgrades similar to Spring Boot 2.7.1. Further details on this release may be found in the release notes.
VMware has published CVE-2022-22980, Spring Data MongoDB SpEL Expression Injection Vulnerability, a vulnerability in which a “Spring Data MongoDB application is vulnerable to SpEL Injection when using @Query or @Aggregation-annotated query methods with SpEL expressions that contain query parameter placeholders for value binding if the input is not sanitized.” Spring Data MongoDB versions 3.4.1 and 3.3.5 have resolved this vulnerability.
Spring Data versions 2021.2.1 and 2021.1.5 have been released featuring upgrades to all of the Spring Data sub projects such as: Spring Data MongoDB, Spring Data Cassandra, Spring Data JDBC and Spring Data Commons. These releases will also be consumed by Spring Boot 2.7.1 and 2.6.9, respectively, and address the aforementioned CVE-2022-22980.
Spring Authorization Server 0.3.1 has been released featuring some enhancements and bug fixes. However, the team decided to downgrade from JDK 11 to JDK 8 to maintain compatibility and consistency with Spring Framework, Spring Security 5.x and Spring Boot 2.x. As a result, the HyperSQL (HSQLDB) dependency was also downgraded to version 2.5.2 because HSQLDB 2.6.0 and above require JDK 11. More details on this release may be found in the release notes.
Spring Security versions 5.7.2 and 5.6.6 have been released featuring bug fixes and dependency upgrades. Both versions share a new feature in which testing examples have been updated to use JUnit Jupiter, an integral part of JUnit 5. Further details on these releases may be found in the release notes for version 5.7.2 and version 5.6.6.
Eclipse GlassFish
On the road to GlassFish 7.0.0, the sixth milestone release was made available by the Eclipse Foundation that delivers a number of changes related to passing the Technology Compatibility Kit (TCK) for the Jakarta Contexts and Dependency Injection 4.0 and Jakarta Concurrency 3.0 specifications. However, this milestone release has not yet passed the full Jakarta EE 10 TCK. GlassFish 7.0.0-M6, considered a beta release, compiles and runs on JDK 11 through JDK 18. More details on this release may be found in the release notes.
GraalVM Native Build Tools
On the road to version 1.0, Oracle Labs has released version 0.9.12 of Native Build Tools, a GraalVM project consisting of plugins for interoperability with GraalVM Native Image. This latest release provides: support documentation for Mockito and Byte Buddy; prevention of builds failing if no test list has been provided; support for different agent modes in the native-image Gradle plugin, a breaking change; and support for JVM Reachability Metadata in Maven. Further details on this release may be found in the release notes.
Micronaut
The Micronaut Foundation has released Micronaut 3.5.2 featuring bug fixes and point releases of the Micronaut Oracle Cloud 2.1.4, Micronaut Email 1.2.3, and Micronaut Spring 4.1.1 projects. Documentation for the ApplicationContextConfigurer interface was also updated to include a recommendation on how to define a default Micronaut environment. More details on this release may be found in the release notes.
Quarkus
Red Hat has released Quarkus 2.10.0.Final featuring: preliminary work on virtual threads (JEP 425) from Project Loom; support for non-blocking workloads in GraphQL extensions; a dependency upgrade to SmallRye Reactive Messaging 3.16.0; support for Kubernetes service binding for Reactive SQL Clients extensions; and a new CacheKeyGenerator contract to allow for customizing the cache keys generated from method parameters.
Project Reactor
On the road to Project Reactor 2022.0.0, the third milestone release was made available featuring dependency upgrades to reactor-core 3.5.0-M3, reactor-pool 1.0.0-M3, reactor-netty 1.1.0-M3, reactor-addons 3.5.0-M3 and reactor-kotlin-extensions 1.2.0-M3.
Apache Camel Quarkus
Maintaining alignment with Quarkus, the Apache Software Foundation has released Camel Quarkus 2.10.0 containing Camel 3.17.0 and Quarkus 2.10.0.Final. New features include two new extensions, Azure Key Vault and DataSonnet, and the removal of extensions deprecated in Camel 3.17.0. Further details on this release may be found in the list of issues.
Apache Tika
The Apache Tika team has released version 2.4.1 of their metadata extraction toolkit. Formerly a subproject of Apache Lucene, this latest version ships with improved customization and configuration such as: a stop() method added to the TikaServerCli class so that it can be executed with Apache Commons Daemon; pass-through of the Content-Length header to metadata in the TikaResource class; and support for users to expand system properties from the forking process into forked tika-server processes.
Apache Tika 1.28.4 was also released featuring security fixes and dependency upgrades. More details on this release may be found in the changelog. The 1.x release train will reach end-of-life on September 30, 2022.