The Netflix Technology Blog has shared the story of the Netflix Edge Engineering team’s journey of experimenting with approaches to building and operating services, which has culminated in the creation of “Full Cycle Developers”. This approach is showing promise at Netflix, where developers are responsible for certain operational aspects of service delivery, and are supported through training and a range of self-service tooling. Centralised teams create and maintain platforms and tooling, but each team within the organisation has the freedom to deviate from the “paved path”.
The authors of the blog post, Philip Fisher-Ogden, Greg Burrell, and Dianne Marsh, state that the purpose of the software delivery life cycle is to optimize “time to value” in order to effectively convert ideas into working products and services for customers. This is similar to Dan North’s and Jessica Kerr’s proposal that modern software development should be focused on “working to sustainably minimise lead time to business impact”. Developing and running a software service involves a full set of responsibilities: design, develop, test, deploy, operate and support. Traditionally these responsibilities have been segmented and implemented as silos within an organisation. This is typified in the classic DevOps book “The Phoenix Project”.
These specialised roles create efficiencies within each segment, but potentially create inefficiencies across the entire life cycle. The Netflix Edge Engineering team, who are responsible for the first layer of AWS services that are required for video streaming, rethought their traditional approach to delivering software by drawing inspiration from the principles of the DevOps movement. In particular, the “Three Ways of DevOps”, popularised by Gene Kim, describe the importance of systems thinking, amplifying feedback loops, and cultivating a culture of continual experimentation and learning.
The Edge Engineering team’s new approach focused on “operate what you build” (much like Amazon CTO Werner Vogels’ now famous “you build it, you run it” mentality), and put DevOps principles into action by having the team that develops a system also be responsible for operating and supporting that system in production.
Distributing this responsibility to each development team, rather than externalizing it, creates direct feedback loops and aligns incentives. Teams that feel operational pain are empowered to remediate the pain by changing their system design or code; they are responsible and accountable for both functions.
The challenge with this approach is that ownership of the full development life cycle creates additional overhead for developers, and often requires new skills to be learned. There is also a potential for burnout as the responsibilities mount upon individuals and teams. To mitigate these issues, tooling can be harnessed that simplifies and automates the associated development and operational requirements. Netflix have created centralised support teams, such as “Cloud Platform”, “Performance & Reliability Engineering” and “Engineering Tools”, that have the mission of developing a common (“Paved Road”) platform and tooling to solve problems that every development team has. Many of these tools, such as the Spinnaker continuous delivery platform, have been released as open source as part of the Netflix OSS movement.
Combining the change in mindset with the creation of common infrastructure and tooling led to the creation of “Full Cycle Developers”. Full cycle developers are expected to be knowledgeable and effective in all areas of the software life cycle. Moving to a full cycle developer model requires a mindset shift; a full cycle developer thinks and acts like a software engineer (SWE), software development engineer in test (SDET), and site reliability engineer (SRE). Not all developers arrive at Netflix with the relevant skill set, and accordingly extensive training is provided. In addition, the blog post notes that not all developers want to work like this, and that there are opportunities within Netflix for more specialised roles.
The empowered full cycle developer (image from The Netflix Tech Blog)
The blog post cautions that to apply this model outside of Netflix, adaptations will most likely be necessary. This push to prevent cargo-culting and blind copying of practices from the popular “software unicorns” is echoed by industry thought leaders like Gareth Rushgrove, and was captured in his 2016 presentation “Two Sides of Google Infrastructure for Everyone Else”. As discussed by Matthew Skelton and (InfoQ editor) Manuel Pais on the “DevOps Team Topologies” website, there is a wide range of approaches and organisational structures that have been created to solve development and operations requirements.
For organisations looking to embrace a full cycle approach, the Netflix blog authors suggest starting with an analysis of the potential value and associated costs, followed by the mindset shift. There is much associated information available on the Netflix blog, and the web in general, and there also exist open source and SaaS solutions for platforms and tooling that can meet many companies’ needs. An important word of concluding advice focuses on embracing simplicity: “Evaluate what you need and be mindful of bringing in the least complexity necessary.”