How to Work Asynchronously as a Remote-First SRE
By Ben Linders
The core practices for remote work at Netlify are prioritising asynchronous communication, being intentional about remote community building, and encouraging colleagues to protect their work-life balance. Sustainable remote work starts with sustainable working hours, which includes making yourself “almost” unreachable, with clear boundaries and protocols for out-of-hours contact.
James McNeil shared lessons learned from being a remote-first SRE at QCon Plus November 2021.
Netlify employs documentation as the core of their asynchronous interaction, as McNeil explained:
I can sign in for my day in London and review the ideas and proposals circulated by my colleagues while I was sleeping.
McNeil mentioned that it can be very difficult to disconnect when you work remotely:
When you don’t have an office, Slack is your office. “Just checking your messages” at 11pm is the same thing as walking back to your desk. We remind colleagues to sign off if we know they’re around out of hours. We also have unlimited holiday and are encouraged to take as much time off as we need.
When you work remotely you are more available; you’re literally just an @ mention away, McNeil said. It is very tempting but also incredibly draining to respond to every mention or jump into every interesting conversation that occurs outside your working hours. McNeil suggested that people should make themselves almost unreachable:
As an on-call engineer, I need to be very easily reachable during my shifts. Even when I’m not on call, sometimes I’m needed as a subject matter expert during an incident. “Almost unreachable” means turning off all out-of-hours notifications but configuring PagerDuty to escalate pages to you when appropriate. It means fostering a culture where it is absolutely not appropriate to call or text engineers, but where you can feel comfortable paging them if they’re needed.
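The “almost unreachable” rule can be illustrated with a minimal Python sketch. This is a hypothetical model of the policy McNeil describes, not PagerDuty's API or Netlify's actual configuration; the `Engineer` class, the `should_notify` function, and the channel names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Engineer:
    # Hypothetical record; field names are illustrative assumptions.
    handle: str
    work_start: time
    work_end: time
    on_call: bool = False


def should_notify(engineer: Engineer, channel: str, local_time: time,
                  is_escalated_page: bool = False) -> bool:
    """Decide whether a notification should reach the engineer right now.

    channel is one of "chat", "call", "text" or "page" (assumed names).
    """
    in_hours = engineer.work_start <= local_time < engineer.work_end
    if in_hours:
        return True
    # Out of hours: chat mentions, calls and texts stay silent.
    if channel != "page":
        return False
    # Out of hours: pages get through only for the on-call engineer,
    # or when the page was escalated to a subject matter expert.
    return engineer.on_call or is_escalated_page


james = Engineer("james", time(9, 0), time(17, 0))
late_evening = time(23, 0)
print(should_notify(james, "chat", late_evening))
print(should_notify(james, "page", late_evening, is_escalated_page=True))
```

In this sketch an 11pm Slack mention is dropped, while an escalated page still reaches an engineer who is off shift, which mirrors the distinction between messaging someone and paging them.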
Remote work isn’t better or worse than in-person work and it suits some people more than others. There are limitations to how far you can push asynchronous working, as McNeil explained:
For instance, the need for code review means that you ideally don’t want colleagues to be alone for most of their day. But with a disciplined approach you can build a high performance organisation where people can be both challenged and supported.
InfoQ interviewed James McNeil about his experiences as a remote-first SRE.
InfoQ: What practices does Netlify employ to make remote work sustainable?
James McNeil: We have colleagues all over the world. We don’t expect them to be online for all meetings and discussions, but we do hope that they feel that they have a voice in decision-making.
We use Notion as our primary document store. It’s great because of how dynamic it is. You can annotate lines, rearrange sections and tag people to request their input. Above all, it has keyword search which makes it great as a knowledge base.
Slack is our office. We try to foster a sense of community around our digital interactions. To help this, we communicate in the open and discourage private channels or DMs. Anyone can create a channel and we have ones for all kinds of interests. For instance, #we-make-things, #we-grow-plants, #we-love-food, and #we-talk-mental-health. We also use emojis extensively to condense and amplify our reactions.
InfoQ: What hygiene factors help you to be more effective working remotely?
McNeil: Practice writing. The flipside of so much documentation and asynchronous work is that you can share an idea and not necessarily be around to guide people through it. Taking the time to draft and consider what you are trying to get across pays dividends in this kind of environment.
The other thing which I think is very different in remote working is the amount of personal responsibility you have to assume over your setup. In my talk I focused specifically on headsets and how important muting yourself is, particularly in incidents. But really everything from the strength of your internet connection to making sure you’ve got the right chargers and peripherals is on you. This can be very stressful if you’re not diligent in your preparations. I personally have a “go bag” with everything I need to be mobile and an unlimited data phone plan in case I need to tether for an internet connection.
InfoQ: What does working asynchronously look like and what benefits does it bring?
McNeil: When I sign in in the morning, I will first watch the recording of any company meetings that might have happened out of hours. I will also check for mentions (of my handle or my team’s) in Slack and GitHub to get context on recent developments that concern us. We might have some Zoom meetings, but most of my day will be focus time.
One of the misconceptions about our working model might be that it’s not collaborative. For one thing, plenty of my colleagues operate in similar time zones to mine and we’re in contact throughout the day. For another, because of our document-first approach, we’re very practiced in interacting with colleagues via collaboration on documentation.
The obvious benefit of working asynchronously is that it opens up a breadth of opportunities for developers who aren’t located in the big tech hubs. From the company side, it broadens your access to talent in a way that used to only be possible for a multinational corporation.
InfoQ: What tools and habits do you use for incident management and how do they support remote SRE work?
McNeil: In many ways, the way we approach SRE is an extension of our general remote practices. During an incident, we try to document everything we can in the moment. In the first instance this happens in our incident channel. The incident coordinator (IC) will translate that into the review document later. We also try to repeat the things which are said to us to confirm that we’ve heard correctly. We provide all charts in UTC during the incident and normalise our timeline to UTC in our review so there’s no confusion around what happened when. Finally, we are very explicit about who the IC is. The IC’s handle is posted at the top of the incident channel and we do an explicit verbal handover (“James, can you take IC?” “Confirmed, I’m taking IC”) whenever an IC needs to be relieved.
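Normalising a timeline to UTC, as described above, can be sketched in a few lines of Python. This is an illustrative example rather than Netlify's tooling; the function names and the sample events are assumptions, and it presumes each timestamp is an ISO 8601 string that carries an explicit UTC offset.

```python
from datetime import datetime, timezone


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # A timestamp without an offset is ambiguous, so refuse to guess.
        raise ValueError(f"timestamp has no UTC offset: {timestamp!r}")
    return parsed.astimezone(timezone.utc)


def normalise_timeline(events):
    """Convert (timestamp, note) pairs to UTC and sort them chronologically."""
    converted = [(to_utc(ts), note) for ts, note in events]
    converted.sort(key=lambda item: item[0])
    return [(when.strftime("%Y-%m-%dT%H:%MZ"), note) for when, note in converted]


# Hypothetical events reported by engineers in different time zones.
events = [
    ("2021-11-02T02:30-07:00", "Rollback started"),
    ("2021-11-02T09:15+00:00", "Alert acknowledged"),
]
for when, note in normalise_timeline(events):
    print(when, note)
```

Once every entry is in UTC, the acknowledgement at 09:15Z sorts ahead of the rollback at 09:30Z, even though the rollback's local clock time looks earlier.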
InfoQ: What benefits have you seen in remote and hybrid working settings?
McNeil: Personally, I really hated commuting. I’ve found that being remote offers me so much more time to spend with family and friends, exercise and work on side projects. I also think I’m better at my job as an engineer. I do miss interacting with colleagues in person, but I feel like I can better structure the cadence of my day in my own space.