Mobile Monitoring Solutions


Presentation: Navigating Complexity: High-performance Delivery and Discovery Teams

Conal Scanlon

Article originally posted on InfoQ.


Scanlon: I’m going to start with a quick confession, which is something that I need to get off my chest when I’m talking about high-performance discovery teams. The first thing is that I was a software developer, and I was a very bad software developer. I’m currently a product manager. Since I was a bad software developer, I’ve also been a part of some teams that I would describe as fairly low-performing. I’ve seen some things out there. I’ve seen deployment processes that take teams of people eight hours to get something out to production. I’ve seen entire codebases that don’t have any automated testing at all, teams that don’t track their work, all different sorts of things. As I transitioned my career from engineer to product, I wanted to make sure that I was a product manager that engineers were going to enjoy working for. I wanted to have a high-performing team. I wanted to make sure that people were comfortable with the work. I think that over the course of my career, I can say that I’ve done that a little bit.

I want to show a chart right here to give a quick definition of what I might consider a high-performing team. This shows a series of iterations that we’ve performed. We’re tracking the standard velocity here, both in tickets and points. I think this says that, as you see towards the end, we’re getting very consistent in how we’re delivering things. We’ve almost doubled the output of the team from when we started to where we are now over 20 or so different iterations. You see there’s a metric at the top that says greater than eight. We measure happiness and engagement, our scores are consistently very high. I feel fairly vindicated that we have a very high-performing delivery team right here. Pretty proud of this.
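The kind of velocity tracking described above can be sketched in a few lines. This is a hypothetical illustration, not the team’s real data; the iteration numbers and the three-sprint smoothing window are assumptions for the example:

```python
# Hypothetical points completed per iteration; the values are made up to
# illustrate output roughly doubling over the course of the iterations.
velocities = [15, 16, 18, 20, 24, 26, 28, 30, 32, 34]

def rolling_average(values, window=3):
    """Smooth per-iteration velocity with a trailing window to see the trend
    rather than sprint-to-sprint noise."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result

trend = rolling_average(velocities)
print(f"first iteration: {trend[0]:.1f}, latest: {trend[-1]:.1f}")
```

Tracking the smoothed trend rather than raw sprint totals is what makes a claim like “we doubled our output” defensible, since a single unusually good sprint won’t move it much.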

However, you’re probably guessing that there’s one caveat to this, and that’s that this team was mostly in charge of innovating. We were supposed to be building new products. What you see here, what these normal metrics show you is that we’re delivering working software at the end of every iteration. It’s working very well, but we’re not necessarily having that innovation. We’re not delivering a lot of value to our business. My talk is going to explore a little bit about why that is.

I’m going to give you the key learnings upfront, and then keep coming back to them, layer them in as we go through. The first is that there are fundamentally two different types of work and two different ways of thinking. I’ll break those down as basically the delivery work that we were talking about, shipping working software, and then discovery, which is basically understanding things about our users, learning different things as we go, validating hypotheses, that thing.

As we think about what makes a team successful in discovery, there are four areas that are a little bit different than for a team that’s delivering working software. The first is maximize learning. The tactics on the right side here that we use to do that are accelerate discovery and use MVPs. The other areas of focus are going to be better ideas, alignment, and metrics. Again, we’ll talk about all of these in more detail. The topics are on the left, the tactics are going to be on the right, and we’ll walk through those.

Our Team

I want to start with a bit of context and a bit of introspection. I don’t know what you all know. This is the area that I come from right now: we’re working on an atomic product team, which means that we have everything we need to do our job successfully. There’s a product manager, typically a designer maybe half time, but at least some portion of a designer is working with the team in an embedded capacity, and then engineers. We typically have three to six in the environments that I’ve worked in. There are also some other functions that aren’t represented here. Where I am now at Flatiron Health, we have quantitative scientists, which are basically data scientists. They might typically have a different background, but if someone is supposed to be building a product, they’re on that team. We normally use something like Scrum, Lean, or Kanban, but it’s typically up to the team to choose the tools that they need to actually get things done.

The example that I’m going to walk through to illustrate the points I have is from a company called Handshake, which does B2B e-commerce. I think this might be fairly relatable for a lot of the people in the room. Handshake was recently acquired by Shopify. This is background for what these examples are going to be. The context that I want to walk through here is that this team was tasked, after having that good delivery output, with building a B2B payment system. We already have a record of this team delivering software very well over a course of many different iterations. Now we’re expanding to a completely new part of B2B e-commerce that Handshake has no prior knowledge in, the B2B payment system.

As we know, we’ve been delivering this software fairly consistently; however, it hasn’t always been delivering value to the customers. I want to take a step back and, from a 30,000-foot view, talk about why that is. Basically, I like to visualize work in this way. This is an arrow pointing to the right, pretty straightforward. This is our delivery track. I mentioned we were using Scrum. We have these little iterations here; this represents how we do our work, usually two to four weeks at a time. We start with a sprint planning, we put working software out at the end. We have daily scrums, we inspect and adapt every so often. That’s all gravy.

However, there’s a different concept and area that we were completely ignoring in the process frameworks, which was the discovery side of things. This is something that Jeff Patton and Marty Cagan talk about a lot. They do a lot of product owner trainings and product management trainings. Essentially, “discovery” is the set of interactions that you have that help you learn something either about your users or about your products. It’s typically the domain of the designer, the product manager, and some members of the engineering team, who are out there learning and validating different things and figuring out what it is that they’re doing. At some point, the learnings that you have in the discovery track flow back down into the delivery track so that you can do things like reprioritize your backlog, do backlog grooming, those sorts of things.

It’s important to mention here that I think discovery is going to happen whether or not you’re planning to do it. It happens anytime a user interacts with your product or a bug report comes in, an incident, anything like that. Discovery is always happening, even if you’re not being thoughtful about actually going out there and doing it.

One of the first things is that there’s fundamentally two different types of work, as well as two different ways that this can be structured within team mandates. If you think of a perfectly balanced team who has a mandate from the company to do 50% of delivery work versus maybe 50% innovation, you may have teams that are very committed to roadmaps and projects, where I would say maybe up to 80% of their time is already built out in a structured roadmap. Maybe you have a day reserved for personal projects or innovation or something like that, but you could go up to 80%.

As a product team, a lot of what we’re doing, and especially in a new product space for something like building out a whole new part of an e-commerce system, would be really much more focused on discovery. We might have something that’s 20% delivery, 80% discovery. This is how we would try to think about balancing our time. What this says is that this team is supposed to be innovating and finding new spaces to work, and not necessarily focused on delivering software. You can see the first problem here is that we’re almost entirely describing our process in terms of the software we’re delivering, when it’s not necessarily the key output of the product team right now.

The other learning about this is that the context shifts for the team from iteration to iteration. Even though we have a team that’s focused primarily on discovery, as you see in the early iterations here, at some point, we’re going to circle around a solution. We’re going to converge on what actually needs to get built, and then we might shift into more delivery mode. Maybe two months in, we’ve done a couple of customer development interviews, we feel pretty good about what it is that we actually need to build, and we’re going to start shifting from thinking about the discovery side of things to actually building the software and getting it out the door in the typical Scrum, Agile, Lean processes that we have.

At some point, that’s probably going to shift again; maybe we actually release a feature. We’ll probably still monitor it to make sure it’s working properly. Maybe something else comes up on our roadmap, and then we have to start the process all over again. It’s a continuous cycle that we have to go through, evaluating which context it is that we’re working in. The first key learning here is that there are fundamentally two different types of work, whether we’re planning for them or not.


Next, I want to touch on Cynefin here. I think this is interesting; every talk I’ve been in so far has mentioned Cynefin. Before today, has anyone heard of Cynefin? A few. How many people after today have heard of Cynefin or seen some introduction? A lot more. A quick background, I’m not going to spend too much time – Cynefin is a Welsh word that means habitat or place of abode. It’s a framework that says there are four different types of problem areas, four different ways that problems and systems can react. The little gray box in the middle here means that we don’t know which domain we’re in right now. I’m going to talk through a couple of different examples here.

The way to read through this framework is thinking about a problem in terms of cause and effect. In the obvious quadrant in the bottom right, we might have a very straightforward relationship between cause and effect. This might be something in terms of engineering terms where we have a very simple checklist for something that has to go out, or a playbook that gets written where someone who doesn’t necessarily have domain knowledge can come in, pick it up, understand what to do and solve the problem that’s at hand.

The next level is “Complicated.” This is something that has a lot of different inputs, a lot of different variables that go into it, but it’s something that requires domain knowledge and can be solved algorithmically or heuristically. A complicated problem, we’ll walk into an example, it’s something where you can understand the current space that you’re in, all the different inputs and outputs that you have, and figure out the best path forward.

A complex problem is something where that path forward isn’t apparent until you actually encounter the problem, until you do something to change what happened and see how the system reacts to what it is that you put out there. If you think about complicated, it’s very straightforward for me to have the domain knowledge I need. I can debug, I can do something, I can fix the problem. For a complex problem, though, I’m not going to be able to do that until there’s some outside system force that gives me more information. I have to get through this process very quickly. Then chaotic is something that we don’t necessarily have to go into, but it means there’s no apparent relationship between cause and effect.
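The four domains above can be summarized as a small lookup table. This is a minimal sketch; the response patterns (sense-categorize-respond and so on) are the standard Cynefin formulations rather than anything specific to this talk:

```python
# Cynefin domains: how cause and effect relate, and the standard
# recommended response pattern for each domain.
CYNEFIN = {
    "obvious":     {"cause_effect": "clear to anyone",          "approach": "sense-categorize-respond"},
    "complicated": {"cause_effect": "clear to experts",         "approach": "sense-analyze-respond"},
    "complex":     {"cause_effect": "clear only in retrospect", "approach": "probe-sense-respond"},
    "chaotic":     {"cause_effect": "no apparent relationship", "approach": "act-sense-respond"},
}

def recommended_approach(domain: str) -> str:
    """Look up the standard response pattern for a Cynefin domain."""
    return CYNEFIN[domain]["approach"]
```

For a brand-new product space like the B2B payment system in this talk, `recommended_approach("complex")` is a reminder to probe first, with an experiment or MVP, before analyzing, rather than trying to plan the whole solution up front.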

For some concrete examples of what a complicated problem might look like, this is one of our failing builds right now. I hope it’s not still failing. I can see here that the build failed, I can see the error message. I get a stack trace in our logs. If I am someone who has the requisite domain knowledge, I can look at this and walk through what actually happened when this build failed. Then I can use my expert knowledge to figure out how to proceed from there. Whether the test case needs to be deleted because it’s obsolete, or it needs to be updated to accommodate a new use case, I should have the context to actually go through and figure this out start to finish without having to encounter the problem any further.

This is something that we use a lot, called a story map. I think this represents what a complicated domain looks like. The way to read a story map, if you’re not familiar, is that we have broad buckets of user actions that are at the top. We have different paths that they can take through the system at the bottom. We like to slice these things horizontally so we can figure out what should be in an actual release. This is essentially something that we use when we’re encountering a new problem space or trying to build out a new feature that will tell us what it is that the user cares about and maybe where we should be focusing our time.

This is actually one that we did when we were building out the new B2B payment system to figure out what our different options are, what users are going to care about and figure out how to structure those releases going forward. That’s part of the problem. We did this in a vacuum. We tried to take this complex problem of understanding what it is that a user would want to do in a brand new feature in a brand new area of our product, and we tried to distill it down into a complicated problem, where we said, “These are the features that are out there. This is the release plan that we’ve identified, and we’re going to go and plow through and build all these things. Maybe we’ll put some logging and monitoring in to make sure that people are using it. Broadly, we know best, we’re going to go ahead and do this.” Where we wanted to get to was more on the complex side. We needed to build in that uncertainty that said, “We don’t know how the users are going to react to this, so how can we get those learnings as quickly as possible?”

This is a good reference if you’re interested in going into more detail about complexity and complication in business. We already talked about the first step, which is recognizing the type of system that you’re dealing with. You also have managing and not solving. We’re not intending to solve user problems, we’re intending to manage the system and set ourselves up for success. Try, learn, adapt, or whatever your chosen OODA loop of choice is. Just making sure that we’re constantly adapting how it is that we’re thinking. Then the key point I have here at the bottom is developing a complexity mindset. All this means is that we want to be cognizant that there are areas where we don’t know best, where we have to get things in front of users as quickly as possible so that we can get these learnings and figure out what it is that we actually need to build.

Basically, these are the two key learnings of what we needed to change. There’s discovery work, there’s delivery work. We were treating both of these as the same thing, but we needed to split these out into different areas.

The first thing that we tried as we were doing this was to maximize how much learning we were having. If we understand that we’re operating in this complex environment, we want to make sure that we are encountering this problem as quickly as possible so that we’re getting a lot of feedback faster and can act on that feedback later on. This is the model that I showed before. We had infrequent integrations from the discovery track down into the delivery track. What we wanted to do was make sure that we at least doubled this. We wanted to learn as fast as we could and make sure that we were actually getting that information back to where it needed to go.

The other thing here is that now since we’re learning so quickly, it doesn’t necessarily match up very well with our process model from before. Learning is oftentimes very messy. It might not be something that fits into the Scrum process, and we had to adapt as we went to make sure that it still worked for us.


The tactic that we used to accelerate that discovery was the concept of the MVP, a minimum viable product. The goal of the MVP is to maximize learning while minimizing risk and investment. When we were thinking about this from the discovery standpoint, all of our MVPs were going to be single-use-case MVPs. We thought that paring down the functionality enough would give us enough leeway to put it out there in front of users and get that feedback. However, there are several other layers of things that we could have been doing to make sure we got that feedback a lot faster. Recognizing that we wanted to do that, we had to try some different things here.

I have an example up here of a paper prototype that our designers put together. This is something that we can easily draw out in an hour, and then we can validate the mental models we have of the problem. We can put components on the screen, present it to a user and see if how we’re thinking about the problem makes sense. If we need something more high fidelity, we can move into an InVision prototype that has nice high-fidelity wireframes, put that in front of them, and see whether that’s how they were thinking about it or not. The point of having all these different types is to make sure we’re validating the next thing that we need to do as we go through it. At the lowest level, which is validating the problem space: do we understand what it is that a user wants to do? Are we thinking about something that may be fairly foreign to us, such as a B2B payment system? Are we thinking about that in the same way that the user is? Just making sure that we’re all aligned on that.

The next stage might be validating usability. Can we put something together that they’ll actually be able to use? Then at maybe the higher stage where you have a more high fidelity HTML mockup or something, we’ll validate the interactions that we have as they go. There are really a couple different things we can learn as we go, and we need to adjust the tools that we use accordingly.

“Lean Customer Development” is really good for figuring out how you can structure some of these MVPs. I put some other types up here because I think these are interesting if you start to see them out in the wild. I actually got MVP’d the other day by Warby Parker. They have a new app on their website. It says, “Enter your phone number and it will text you a link,” but they did not actually text me a link. I’m guessing that’s an audience-building MVP that they have out there to see if there’s enough demand for that new app.

That’s the first area we focused on. We were trying to maximize the learning, so we accelerated the discovery cycles. Then, to do that, we had to use MVPs to actually get them done.

Idea Flow

Now that we’re having these learnings, we want to make sure that we’re not only having them faster, but that we’re having higher quality ones as well. This is a representation of the team again. We wanted to make sure that the ideas were bubbling up from within each of the team members and that they’re getting the most out of all the learnings that are going on. The key thing for us to do this, and I don’t think this is very new, is making sure that we have a comfortable level of psychological safety. Project Aristotle is a study that said there are a couple of key factors that determine how well a team is going to perform. At the highest level, it’s psychological safety. This is going to be the best predictor of how your team is going to do.

From within our team, we were able to focus on making sure team members felt safe to take risks. They could be more vulnerable in front of each other. We could have healthy, productive conflicts and still resolve them and still work together on the same things. This is something that was going to give us a higher quality level of idea as we were having these learnings.

However, this is really good for focusing on our individual product team. What we found is that we are also trying to validate these ideas with users, who could be people well outside of our company, if we’re building something that an admin is not going to use, or maybe an internal team: customer support or success or product operations, something like that. Psychological safety really took care of making sure the ideas were flowing within our team. Between those teams, we didn’t have a good way to actually do that. We needed to make sure we were connecting the dots a little bit better than we were doing at the time. This says that even teams outside of your walls are ones you need to make sure you have communication paths for.

Collective Intelligence

What we did next was focus on our collective intelligence as a useful framework for figuring out if we’re going to have better idea flow, and if we’re going to have more high-quality ideas. Collective intelligence is basically defined as your ability to perform a wide range of tasks as a team, and how well you do on that. It’s very highly correlated with performance on complex tasks. “Team of Teams” is a useful reference if you’re interested in building out your team’s collective intelligence, especially across an organization; super helpful as you scale.

Collective intelligence is the bar in the red. This says that it’s going to predict how we do on complex tasks up to five times better than either the maximum individual intelligence of a team member or the average team member intelligence as well. When we’re thinking about this, it’s much more important to have a team that feels safe and comfortable than it is to have A players on our team. That’s how we’re going to generate better ideas, if they’re more comfortable in taking those risks and having interpersonal conflicts when they know they’re not going to be judged personally for those.

There are three things that we did here to help raise our collective intelligence as a team. The first was removing friction in the feedback process. We did this both with external customers and internally. The next was having informal internal teams who are really focused on collaboration and not status updates, as well as having more frequent broadcasting of updates to whoever was interested in getting that sort of information.

The first one looks like this. If we’re actually broadcasting our updates, we wanted to focus on pushing status updates out, whether it’s to our customers in the form of release notes or more relevant communications, emails, marketing, anything like that, as well as to internal teams. Whatever you use at your company is probably fine, whether it’s status reports or something called snippets, whatever you want to do to get your information out there.

The other thing that we started doing is really having closer relationships with our specific customers. As we’re thinking about this learning process, it’s very helpful for us to be able to show someone a paper prototype, and then a mocked-up wireframe, and then something that might be slightly more interactive to gauge their reaction at each different level. We wanted to make sure we had very close relationships with these people. We would run a lot of programs to get them to sign on for maybe three or four weeks at a time. Where we have a weekly meeting, we can show them something a little different every time and see how they’re reacting to it.

The other area is focusing more on our internal teams as well, especially when there are dependencies on the software that you’re building. If you have to depend on an infrastructure team to maintain the code, if you have to rely on a separate team who’s going to be an internal user of it, or security, or something like that, we wanted to make sure that we were having the ideas flow earlier in the process compared to things that happened after the fact. That was really where we ended up getting into trouble.

We started to form these internal teams, we’ve called them working groups, that are really focused on a specific problem and making sure that all the voices are represented so that we don’t have any surprises at the end. Similar to how you want to integrate early and often when you’re actually building functioning software, we want to make sure we’re getting those voices together as early and often as possible too, so that later on, when we try to actually integrate the working software, we don’t have those surprises.

This was the second area we focused on. Now we know that there’s two different ways of thinking about the work that we’re delivering: the discovery work of innovation and building new products, as well as actually building the software. In order to work in the discovery area, we’re accelerating our discovery, and we’re building out lots of different types of MVPs to raise the quality of those MVPs. We’re making sure that our teams feel safe in taking risks, as well as building out collective intelligence with our customers and across the organization.


The next thing is that as we were building out all of these different MVPs and as we’re talking to customers and generating these ideas, we want to make sure that the team feels empowered to act on them, as well as making sure that they’re all working towards the same goal. We have a lot of different ideas that are flowing right now as we have these very quick learning cycles and we’re validating different hypotheses. We might end up with a team whose ideas go in different directions than what the ultimate goal of the product is. What we want to do is make sure that everyone has ideas that are very tightly aligned to what it is that we want to build in the long-term, and that they understand the why of what it is that we’re doing. There are three things that are fairly useful for this. We use OKRs at the organizational level. Below that we use product briefs to talk about the why of what it is that we’re doing. Then we shifted how we do roadmaps a little bit as well.

The first is OKRs. I’m at a company that was founded by two ex-Googlers right now, so OKRs are fairly ingrained in us. OKRs stand for objectives and key results. It’s a way to iterate on your planning process, where you say, “This objective is where I want to go in the future, and these key results are how I measure if I’m actually getting there in the way that I want to.” We do these on an annual level for our strategic planning as well as quarterly at the tactical team level. It really helps to say, “This is what it is that we’re actually doing.” OKRs are something that is very difficult to get right. The objective is supposed to be something that’s aspirational. The key results are supposed to be metrics-based, as well as maybe a couple of delivery things in there if you have hard deadlines. There are a lot of times when people try to do OKRs, especially on a quarterly cycle, and end up struggling for a little bit.

I’ll give an example of what some of our objectives were as we first tried to build out this B2B payment system. The first objective says we want to beat our competitors. Obviously, the point of building the software is that we want to make it so that our product is better than theirs. We’re going to get more money for the owners. That’s great. These key results are things like: integrate with this provider, build out this functionality, support different shipping charges. I have a metric in here because I think this is going to be something that actually reduces churn. Maybe I’ll explore integrating with Bitcoin and then PayPal as well. I know these are things that I’m going to have to do at some point, so I’m going to put them in my key results list.

I think all of these are probably wrong. The problem is that the objective isn’t very aspirational. It doesn’t give you a good understanding of why it is that we’re doing something and who it is that we want to focus on with this. The key results aren’t supposed to read like a roadmap of things that you do. You don’t want to take the delivery work that you’ve been working on and throw it into this format, because you’re not going to get the same results. What we actually settled on was something more like this. The objective is aspirational: I want to accelerate cash flow for small businesses. You can clearly understand here why it is that I’m building something, and it even has the target persona in here as well, the small business owner, and the reason why they’re going to want to use this.

One of my key results is actually delivery-focused: integrate with two payment processors by a certain deadline. I want to make sure that I have this out in time for a key busy season for us. I want to make sure that we give ourselves that leeway. I still have a delivery-focused key result here, as well as one that says, “I want to make sure my learnings are happening. I want to conduct at least 15 customer development interviews. I want to talk to 15 different users, understand the problems they face better than I do now, and make sure that I’m building something that is going to end up being very successful at the end.”

Then this metric is one that actually represents the outcome pretty well. The one before this was reduce churn, which is fairly broad. It might not be something that this team can actually impact on their own. However, this one says, “I want to process $100,000 a month in gross merchandise value.” I want at least $100,000 of transactions to flow through my system using this new payment processing stuff. This tells us the outcome of what would happen here.

These are a lot more closely tied to the actual goals of what I need. Now I can have conversations across the organization to make sure there’s alignment on what it is that I’m building, as well as report back up to the senior leadership level to say, “This is how we’ve interpreted the vision of what you’ve put out,” which was to build this broad payment system. The key thing here, from the OKR perspective, is that it helps you align your organization around what it is that teams are trying to focus on versus what it is that they’re actually building.

You see the objective here at a level higher than what this team is working on is establish ourselves as the premier B2B platform for small businesses. That’s not something that came across where we said, “We need to build this payment system, and we have this list of features that it needs to support.” This means that the team has more understanding now of the problem that it is that we’re working on. We understand who our customer is. We understand a little bit better what it is that we need to actually do to solve the objective at the strategic level. Then the senior leadership level from their perspective can easily understand what it is the team’s going to be focused on and more accurately incorporate that into their strategic planning and guidance later on.

What we do is, on a quarterly basis, we’ll measure how we did against these key results. This is also something that comes from Google. There’s a link here for a scorecard if you’re interested in trying this out for your own goals. Essentially, we want to end up somewhere between 60% and 70% of our key results being graded out as green. You see at the top, I put the objective in here: we want to accelerate cash flow for small businesses. Since my key results are very measurable, I can actually go through this list now and figure out one by one how I did against them in the quarter. I have “integrate with two payment processors by August 1st.” Maybe I only got one of those done by August 1st, so I got 50% on that, not the best. I did, however, conduct 15 customer development interviews, and I was able to process $80,000 in GMV, not $100,000, but it shows that we’re moving in the right direction.

Overall for this team, 80% means that we probably did pretty well. We could have stretched ourselves a little bit further, maybe aiming for a higher goal of GMV processed per month or for more customer development interviews.
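Since each key result is measurable, the grading itself can be mechanical. As a rough sketch, here's how that quarterly scorecard might be computed in Python; the 0.0–1.0 capped-ratio scoring follows the Google-style convention mentioned above, while the green/yellow/red thresholds (0.7 and 0.4) are illustrative assumptions, not something prescribed in the talk:

```python
from dataclasses import dataclass

@dataclass
class KeyResult:
    description: str
    target: float
    actual: float

    def score(self) -> float:
        # Attainment as a 0.0-1.0 ratio, capped at 1.0 (Google-style grading)
        return min(self.actual / self.target, 1.0)

def grade(score: float) -> str:
    # Illustrative thresholds; teams typically tune these to taste
    if score >= 0.7:
        return "green"
    if score >= 0.4:
        return "yellow"
    return "red"

# The key results from the example above, as target/actual pairs
krs = [
    KeyResult("Integrate with payment processors by Aug 1", target=2, actual=1),
    KeyResult("Conduct customer development interviews", target=15, actual=15),
    KeyResult("Process monthly GMV (USD)", target=100_000, actual=80_000),
]

for kr in krs:
    print(f"{kr.description}: {kr.score():.0%} ({grade(kr.score())})")

# Simple unweighted average across key results for the objective
overall = sum(kr.score() for kr in krs) / len(krs)
print(f"Objective overall: {overall:.0%}")
```

A team aiming to land between 60% and 70% overall can read the final line directly against that target to judge whether its goals were set at the right level of stretch.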

That's really what OKRs give you here. First, you get to focus on and commit to different priorities. You get to align and connect with your teammates to make sure that you don't have a lot of dependencies across teams that may be loosely coupled, as well as making sure that your internal product team is very focused on what it is that you're building. OKRs also help you track for accountability, using something like the scorecard you saw before, as well as the stretch goals that you can put in there. If you're supposed to be landing between 60% and 70%, it easily lets you know whether you're setting goals at the right level or whether you need to raise or lower the bar that you're trying to attain. "Measure What Matters" is a good reference for this if you aren't at a company that was founded by people who worked at Google.


Next, we have these OKRs, which are very useful for communicating with our leadership team and for setting the vision and the why behind the things we're doing. There's always going to be that lower level of things that we need to actually build: the epics, the features, whatever you want to call them. We're still actually producing software at the end of the day, and we want to make sure that we have a way to connect the high-level objectives and the mission and the vision all the way down to the team who's doing the implementation, so that they understand how that works.

This was the mental model that I think our team brought into this. When you think about alignment and autonomy, you probably think of them as two ends of a scale, where if you have a team that is fully aligned, they're probably doing everything very much in sync, but there might be someone very command-and-control who's forcing them to do those things. If you have a lot of autonomy, you might not have a lot of alignment. What people probably do when they think about this is try to drop this little slider at the appropriate level for the context that they have.

What we found, especially using “Art of Action” as a reference, is that this really isn’t the case. Alignment and autonomy are two different scales. We can have a very highly aligned team that has a full amount of autonomy, and we can have a team with no autonomy that has no alignment either. Examples of that, if I have high alignment and no autonomy, this is probably something that you’ve experienced, where someone hands you a requirements document, says, “You need to go out and build this.” What you’re really missing here is the why. Why does this matter? Why do our customers care about it? What are these requirements here going to actually give us?

If you have a team with a lot of autonomy but not a lot of alignment, you might get a very vague metric that you're supposed to move or a vague direction to go in, but you're not necessarily going to be doing it in the right way. If someone says, "You want to increase the payment processing on our platform for small businesses," and you don't have the appropriate context, engineers might come back and say, "Great, I want to play around with Bitcoin. I want to integrate it, see if small business owners are going to use it." It's probably not going to be a successful feature. You have this high degree of autonomy, but not a lot of alignment in terms of what it is that you want to do.

What we found successful in doing this is saying, “Here’s the metric, here’s why this matters, and let me also provide more context so that you understand why it is you need to build the things that you need to build.” Having that metrics focus, but also saying, “These are the reasons why,” normally results in a better idea coming from the team after they have that understanding.

The tool that we found most effective for doing this was these product briefs. A product brief is a written document, normally between two and three pages long; we try to keep them very short. It spends a lot of time building up the context and walking it all the way down from the strategic level to how it is that we got to the feature that we're going to build. What's not in here is any requirements or any technical architecture doc. That's something that comes after the fact. After the team is fully aligned on what the problem is and what the hypothesis is for this feature, we can actually figure out what it is that we want to build.

I talked about the problem space. This will include things like, "Here's all the research we've done, here are the supporting metrics that I'm going to use to justify this," as well as, "Here's the context, here's the elevator pitch for what it is that I'm building." The product hypothesis is something that we phrase like this: "We believe that doing this thing, or building this for these people, is going to have this tangible impact. If it works, we're going to do this rollout step or amplification plan, and if it doesn't, here's how we're going to roll it back." This helps us bake in that complexity mindset that we talked about before. We're recognizing that there is not a lot of certainty in what it is that we're doing until we actually get further down the line, and saying "we believe" or "we think" gives us a little bit of leeway.

We talk about the metrics, but we do this at three different levels. I'll get into more detail later on. Basically, the business metrics at the top are top- or bottom-line revenue, churn, customer satisfaction, things like that. We also have product-level metrics, which I think of as outcome metrics: how do I know that someone who is using my product is benefiting in the ways that I've expected them to benefit? Then engagement is the lowest level of metrics; it's the things that happen every day.

Then the key thing for these briefs is going to be this concept of the back brief. This is the continual cycle of updating this document and having conversations focused around it to show what I've learned as I'm discovering new things about our users or as I'm trying to build something. Initially, it's going to give us a lot of alignment around the problem space and what we need to do to actually build something. It's going to be useful ongoing as well, as we move down into the actual iterations: are we learning something new that invalidates the research that we've done before? Then it's easy to see how this changes over time, and easy to communicate up to leadership why it is that we're adjusting the strategy that we had before.


Now we have this concept of alignment around our visions and our goals using OKRs, the things that roll up into the KRs, the actual features. We have the context appropriately set using these product briefs. Now we need to put it in our roadmap, which, unfortunately, mostly still looks like a Gantt chart. This is something that we’ve typically used. I think I’ve used it at every place that I’ve worked. It says, I’m going to deliver this feature in this month, I’m going to move on to this one, it’ll be done here, on and on. This is something that I think marketing and sales typically ask for. They need a level of certainty when they’re actually out there talking to people. They want to make sure we’re committing to dates. However, knowing that we don’t fully understand the problem until we encounter it later on, we need to bake in a little bit of uncertainty with this.

The best balance we found is something that uses broader buckets of time and a more metrics-based way of communicating around what it is that we're doing. Using everything that we've pulled together before, I have my objective here. It ties to the top-line strategy. I have the key metric that I'm trying to move with these different features. Then I can still say these are the features that I'm going to be doing right now, with links back to the product briefs.

The next bucket of time is going to be maybe six months out. It probably depends on how frequently you're actually delivering things; six months might not be right for you, but this, at least, allows you to say, "There's lower certainty in these groups of things, but we still think we're going to move on to these later on." You have the ability to back out if you learn something after the fact. Then "later" is for things that are maybe good ideas that you want to explore, but that haven't been prioritized yet. It doesn't need to be a parking lot for ideas. It doesn't need to be your JIRA project, where you can throw anything in and then forget about it. All these things should still have product briefs; they might just not be key for the business at that time.

We’re generating these better ideas, and now we need to make sure that we have the alignment across the organization to do that. We use OKRs to communicate with leadership about what it is that we’re doing, as well as focusing on the key results at the team level to give more of that context for the actual things that we’re building. We use these product briefs to make sure that they’re appropriately tied to the OKRs, as well as aligned with the team around the metrics that we’re moving, and baking in the uncertainty that they need to really capitalize on ideas as they come up. Then we’ve shifted our roadmap a little bit so that we’re talking about these things in a different way.

3 Levels

The next thing is talking about these metrics. This is the final area that we’ll be spending time on. Essentially, we mentioned these three different levels of metrics before, engagement being the things that come up the most at the bottom. These are how do I know people are using my product? These are daily active users, monthly active users, different events that happen, different milestones, things like that. These are going to be the most frequent things that occur as we’re measuring how people are using our product.

At the next level up, we had the actual outcome metrics. We call them product-level metrics. This is, "How do I know my product is benefiting my users?" This is normally the key one for a team to focus on. If I build something that is supposed to be a payment system, how do I know that it's having the intended effect of helping customers get their cash back faster? Then the top level is tied to top- or bottom-line revenue, average contract value, lifetime value.

Correct Category

I think typically where we’ve been when we talk about these is that for teams that are focused on delivery and for most teams in general, you have to have some tie back to revenue or long-term value or some cash benefit to what it is that you’re doing. We found that this has actually been fairly restrictive in how it is that we do things. One thing that happens when we’re only looking at revenue is that it’s very hard to prioritize new ideas, new areas that a team should explore. If you’re only tying to things that have an impact on the top or the bottom line, you’re not really going to be able to tie that to a balance sheet maybe for the foreseeable future and maybe forever.

What's really freed up the teams is to focus on something like customer satisfaction or NPS. This gives you the leeway to say, "These are the biggest customer problems that our product has right now. We should really have a team focused on moving these metrics as compared to the other ones." If we were focused on revenue, again, it's going to really restrict what we can actually do. Whereas on the customer satisfaction side, we know that we can surface problems no matter where they are, as long as they're important.

The final thing I want to tie in here is now that we have these metrics, is back to the high-performing delivery team from before. Still doing the same delivery work, still delivering consistently, delivering more and more value every time, engaged in what they’re doing. The other thing that we need to reference now is, are we delivering that value that we said we were going to deliver? How we do that is layering in our OKRs here. You see this is basically what we had in the spreadsheet from before. We’ve delivered 65%, which is right in line with what we want to do. Now we can report on the value that we’re delivering as a discovery team, as well as the actual delivery portion of the process as well. We know the software is going out, now we can measure if the value is actually there too.

To sum everything up here, we have the delivery, the discovery work. Discovery work is something that fundamentally is going to be uncertain until we actually get through it. We have to get through it very fast, we have to maximize our learning. We use MVPs to do that. As we’re building these MVPs, we want to make sure that they’re higher quality so we make sure that the team feels safe in taking risks, and that we’re aligned across the organization and with the teamwork that we have.

For the alignment, we want to make sure that we have our OKRs set up to communicate the vision and the strategy appropriately. We use the product briefs to talk about the actual tactical level of the things that we’re doing. Then we shift our roadmaps a little bit as well. Then for metrics, we recognize that there are three levels and that we’re not going to be able to necessarily move the revenue side of things. However, we can choose a category that gives us a lot more leeway in how we innovate by focusing on customer satisfaction or NPS or effort score, something that’s reflected in how our users are feeling about our product.

This is me. This is where I've worked previously: I was at the UN, I've done government contracting, and I was at a startup called Handshake here in the city that was recently acquired by Shopify. I'm now at Flatiron Health. We're going to grow to 1,000 people by the end of this year. If you're interested in working at Flatiron, feel free to click that link or let me know. Then, as promised, here are the references that have helped shape this. This really, I think, is a good overview of how you can go from high-level strategy to actually implementing this at a team level, making sure that people feel engaged and have the proper constraints in the work that they're doing.

Questions and Answers

Participant 1: Let's say you are in a setting where you join a team; they have their roadmap, their OKRs, and they recently released something. There are constant fires. You realize that the product is unstable, and also that the testing suite is incomplete. How would you go about adjusting the roadmap and the OKRs? Renegotiating, basically, what you have committed to, trying to stabilize the team and its performance, and making sure that releases happen in a better and more stable manner.

Scanlon: I think the key thing there is that if you're not delivering properly now, then that's probably the area you need to focus on. Your primary measure of progress is going to be working software, so making sure that your engineering practices are squared away is probably where you need to start. If there are already fires consistently, it does sound like you need to renegotiate the roadmap. I would use that as your leverage for getting that renegotiation done. If you're wasting a lot of time fixing bugs, or you have a lot of technical debt that's never been prioritized, I would try to quantify that somehow and say, "Look, 50% of our time is not spent on delivery or execution, it's spent on rework. We need to make sure that we can free up the team's time later on so that we can start to do these other, more valuable things."



Migrating Two Large Robotics ROS1 Codebases to ROS2

MMS Founder
MMS Roland Meertens

In 2018, the Robot Operating System 2 (ROS2) launched as a successor for ROS1. At ROSCon 2019 several speakers shared their experience in moving from ROS1 to ROS2. Lessons were shared in two separate talks: the Autoware project, and a demo port by Rover Robotics.  

The Autoware project took the opportunity to restart the design of their software to work better with ROS2. Autoware started as Autoware.AI, which is open-source software for self-driving vehicles. They based the software on ROS1, which was great for prototyping. However, there were three reasons why Autoware was not suitable for building certifiable products. The main reason is that ROS1 is not certifiable, and achieving this would take many years and many people's effort. Determinism and memory safety of the Autoware.AI application were also not possible. Last but not least, there are fewer than six years of support life left, as the end of life for ROS1 is specified as 2025. Therefore, Autoware decided to start as new software, which was more work but proved to give better long-term results.

Moving to ROS2 provided several benefits over ROS1. One benefit was managed launching, where you can specify in what order nodes launch. Another benefit was the DDS communication protocol, which can pass messages around with zero-copy, saving both CPU and memory resources. In terms of development, they spent more effort on increased test coverage, more and better understandable documentation, and more continuous integration to get the software certified.

To ensure continued happy use of the existing Autoware stack, the engineering team added support for the new project to the old project. This was achieved by adding a ROS1 bridge. This way, new high-quality features are introduced in the new project, while keeping current contributors happy. In terms of contributors: because of the higher expected quality, Autoware needed higher test coverage, a design document for every major contribution, and writing and testing for deterministic execution. To encourage both current and new contributors, people working on Autoware mentor them: they walk through the process of contributing to Autoware with potentially interested contributors, who are also given frequent encouragement.

A second talk was given by Nick Fragal and Nick Padilla, both working at Rover Robotics. They also used ROS1 for sharing common robotics code, and to minimize rewrites of common tasks. They want to use ROS2 to share reliable robotics code. The technical steering committee for ROS2 contains many people from large companies who take reliability seriously, and thus it can be expected that ROS2 will be adopted by many companies. This provides ROS2 with a lot of promise. 

Fragal talked about their application: a t-shirt delivery robot which brings t-shirts to attendees at a conference. They made a demo using ROS1 and wanted to port it to ROS2. The initial port went smoothly, but when they gave a new demo at a conference with slow wifi, they ran into problems.

The underlying reason was the DDS protocol, which did not run well over a slow wifi link. To solve this problem, they looked at the parameters which can be tweaked to make DDS work better on slow wifi. They also compared different implementations of the DDS protocol and worked together with suppliers to improve their implementation. Eventually, multiple DDS providers could be used to bring the software on the robot up within 10 seconds. The take-home message is to choose a DDS middleware that has zero-copy to prevent issues related to moving around images in your memory. 

Overall, Rover Robotics estimate they spent approximately 60% of their time looking at the communication protocol when porting this demo. However, now that it is working better (for everyone) they hope to focus 90% of their effort on their navigation and application code. 


Presentation: Inside Job: How to Build Great Teams Within a Legacy Organization?

MMS Founder
MMS Zoe Gagnon Francisco Trindade

Article originally posted on InfoQ. Visit InfoQ


Trindade: My name is Francisco [Trindade] and here we have Zoe [Gagnon]. We'll leave the intros for a bit later in the presentation. Just to make it clear, we are here to talk about how to build high-performing teams in an existing organization.

I want to tell you a brief story. I joined Meetup one year ago, managing a few teams. In the first week that I started, there was a team presenting a project that was supposed to take three months. That was the deadline that was being given. It was clear the project was going to take more than a year, and it actually took more than a year. The teams I was managing were having discussions about how we estimate stories (are we estimating effort, or complexity, or time?) and how we work together. We were struggling to find ways to really work together effectively. I come from a background of consulting, process improvement, and building teams, and it was clear to me that something needed to change at Meetup.

Gagnon: I came to Meetup through my friend, Lisa. We used to have brunch together quite a bit – we’re a bit busy for that right now. We used to do brunch every other weekend, and I found myself, for about nine months that brunch was just consulting for Meetup. I really thought it’d be nice to go give that a try and see what I could actually do coming in. When I got there, I found myself on a team where it had just come into existence, but the stakeholders wanted immediate results, and we were nowhere close to being able to deliver. I looked at this and I really thought we need a change in how we approach this and how we build software.

Trindade: We are not these mod people coming into a big organization. This was something that everyone was seeing. We were hired as part of an effort to make Meetup better. At the time that we joined, there was a struggle to find a way to work together, a way to work effectively in software, and a way to build software effectively as a company, in the context that we had. One of the challenges the company was really trying to tackle was how to find a sustainable way of working that's both effective and keeps engineers fulfilled and happy in their jobs. How do we stop having projects that go two, three, four times over the budget and scope and deadlines that we estimated? How do we bridge this lack of trust in software delivery that leadership, from both a product and a business perspective, had in the engineering side?

What we wanted to bring to you today is in three parts. It's really to tell our story from the year that we've been in the company: to tell you what we have done that has worked well, in which context, and how it has worked well. We also wanted to tell you the things that we tried that clearly didn't work as we expected, where we failed or are still trying to improve. The last thing we want to tell you about is the blind spots, the things that we came in certain that we could solve, and really couldn't move in the time that we have been there.

The reason we're doing that is that changing and improving companies, I think, is something that's very common, and I'm sure that everyone here at the conference, and at this session, is here because you think that your team or your company or your unit could be improved. We really want to try to bring as many examples and stories, and as much context, to you as we can, so you can copy us in contexts that are similar. You can apply the things that we did, and also, hopefully, you can actually do it better and not make the mistakes that we made throughout this journey.

A bit of intro. Zoe [Gagnon] here, she is an engineering manager at Meetup. She has been there for eleven months now. She manages the discovery team, which is one of the teams that’s trying to help people find events within Meetup. Before that, Zoe [Gagnon] was three years at Pivotal, the consulting company, and she has been around seven years also, working with startups in different phases of their existence.

Gagnon: This is Francisco Trindade. He is engineering director at Meetup, working with the discovery team and the apps team. Before Meetup and his year here, he was at ThoughtWorks for seven years in the UK and Australia, doing exactly this thing as a consultant. He also ran a startup for five years, where he learned that a lot of the things he had done as a consultant didn’t really apply to the business world all the time.

Trindade: Meetup's a website; you can always check us out afterwards. Meetup has been in existence for 17 years, which is something that surprised me when I joined. Yes, it's a company that's 17 years old, but it still tries to behave like a startup. The mission for Meetup is getting people to connect in real life. We all perceive the trend that real-life communities and real-life connections are dying down: people are meeting less and connecting less. Our mission is really to reverse that trend and try to get people to connect in real life and improve their lives through that connection. We are around 100 engineers, mostly in New York, plus a few in Berlin, and a few people remote across the world.

We were recently acquired by the We company, which WeWork belongs to. With that acquisition, which draws the context of a lot of things that we’re talking about here, Meetup has been trying to get this second wind of a startup growth curve, and really multiply the size of the company and the impact the company has in the world.

What to Change

Gagnon: I want to talk a little bit about the situation that we found when we first got there, and some of the ways that we identified things we'd like to change and how they were affecting the company. The first one that we identified was project teams. We didn't really look for symptoms of this, because they were just there: project teams. What we had found in our experience was really playing out at Meetup as well: this made collaboration really hard, because people were coming together for a project, working on it for maybe three months, maybe a year (maybe it was supposed to be three months and it took a year). They would work together, and then they'd go away. It's really hard to find people that you connect with, build a relationship with, and then lose them. There was also a lot less safety in those cases, because each of those project teams had an engineering manager who would be your manager for three months, and then you'd get a different manager. When I got there, one of the people who I manage now had had four managers in one year. I think we can imagine just how uncomfortable that really is.

We also didn’t have much context across the organization because the people who built stuff wouldn’t own it – they’d go somewhere else – or the entire team would dissolve. That made it really hard to have long-term ownership and context, and just maintainability.

We also found some legacy code. We found that it was really hard to implement changes. We had high and unpredictable cycle times: some things would go really fast because they were newer, and some things would take forever. We had a lot of custom frameworks that had been built out over the years, both in the front end and the back end. That was causing people to avoid doing certain kinds of work at all. We have a custom request-handling framework in our backend for REST requests, and nobody wants to work on it, so we don't add new endpoints anymore, which was a problem.

Trindade: If I can add to that, I’d just like to create some context. One of the teams that Zoe [Gagnon] was working on last year, it was part of the legacy codebase when we had this legacy UI. You had to put some UI elements on the screen [inaudible 00:08:48] and the team had to learn four languages to add that. It took them, I think, a few weeks to actually add one element.

Gagnon: The total thing took a month and a half to add the stars to three different pages in JavaScript, Java, Scala, and Python, including using Jinja, Mustache, and Handlebars, which we had forked and merged together to create meetstash, which is a really cute name. The whole thing was JSPs, and there was a Django server running in Jython in order to do the pre-rendering, so that was pretty complex to work with. You can imagine that with that little chain, there's exactly one person in the company who knows how it works, and a lot of our time was spent waiting for him to be available.

We then found, because of this, that those three-month projects took a year, or adding stars to a page took a month and a half. The executives had lost trust in the engineers and didn't have any confidence in the view of the future. We had a lot of meetings that took a very long time and didn't result in anything. We figured this was really caused by the lack of process that we had. There was something agile-like, but it was the form without the function. There were retros, where people would come together and complain, and then leave. There were stand-ups that could take an hour, as everybody detailed the lines of code they had written the last day.

There wasn’t really a cadence or an understanding of why we would have these types of rituals in our process. That’s because there wasn’t a lot of people who were thinking, why do we do these things, what value are we trying to derive from them, and how can we improve to get more of that value out?

As we mentioned before, we had a lack of long-term ownership from these project teams. We didn't have a great understanding of the legacy code either. We had lots of on-call incidents because people were writing code that they didn't really understand. Those incidents took a very long time to resolve because they didn't know the code, they didn't write it, and they weren't allowed to change it, and that caused real problems long-term.

Finally, underpinning a lot of this was that we had a lack of engineering craft. We had models that had little to do with the actual user experience. You may be familiar with Meetup, and know that you join a group and that group hosts events. You may also be a little bit surprised to know that there is no group or event table in our database. That seems natural to me. We had a lot of browser-based Selenium tests, and they could fail for reasons that people didn't understand. Even when they failed for reasons you did understand, they didn't necessarily tell you why, just that this piece of the page was no longer working.

We had really low predictability, just overall, of whether the code would work correctly, because we had very high coupling. Sometimes you'd change this thing, but it was actually statically scoped to be globally available, and that thing over there would break: you'd get spooky failure at a distance, which is terrible. We also had a case of misapplied or misnamed patterns. Maybe not everybody's read the Gang of Four, but there's a proxy pattern that stands in for a database connection. We've got those, but they're called adapter patterns, which are completely different things in that book and in the normal nomenclature. This just led to more confusion. We found, overall, that there was just this lack of craft and deep-diving that was really slowing down the organization.

Process and Practices

Trindade: This is going to be where we start telling you about what has happened. What we did, as good ex-consultants, was apply the number one rule of consulting, which is: you assess what your sphere of influence is, and you try to change within that. We're going to try to frame our conversation now as telling the story of what happened within that context. We're going to talk about the things that we had control of, what we did with that, and what happened. We're going to talk about the things that we didn't have control of, what happened with that, and how we tried to influence and move beyond it. We'll also try to talk about how we tried to change the influence that we had within the company. To begin with, Zoe [Gagnon] is going to describe basically what we had control of, and what we have done about that.

Gagnon: For me, as the manager of a team, I had pretty decent control and very direct influence over what this team was going to do. One of the places was just the process: what steps do we follow to turn an idea into software? We started there, first by making stories much smaller so that they could actually be understood. They used to be feature-sized, and that would be two or three weeks on a story. We also got devs involved in story writing so that the acceptance criteria actually made sense to them. We ended up-front architecture and emphasized YAGNI, waiting for concrete examples before extracting abstractions.

We also re-emphasized feedback loops by taking retros from once every three weeks to once a week, and emphasizing that you have to walk out with action items and then actually do them, which really is the key: feedback matters when there's action on it. We also started doing one-on-ones with everybody on the team, in order to give them individual improvement feedback. We started collecting metrics and discussing how we could move faster, broadly across the team, in a very open way. The whole time, we emphasized that this is a system problem, not an individual problem. It's not something that any of the people on the team caused, but rather the way we were working together as a system. The basic idea there was that a bad system will beat a good person every time.

We also decided to focus a lot on the practices underlying it. We gave the devs faster feedback through test-driven development, through pairing, and through instituting trunk-based development, so that they could find out very quickly: does it work like you want it to? Is it a good idea to do it like this? Does it work with all of the other things, and can you learn from those other things at the same time? We created faster design feedback through iterative architecture rather than imposed architecture, and by emphasizing YAGNI and deferred decisions – saying, let's just wait until we have more information to make these decisions.

Engineering Lead and Engineering Manager

Trindade: Another interesting thing we did – and Nick [Caldwell] just mentioned it in the keynote – was to look at the leadership roles within the team. Meetup has these two roles, the engineering lead and the engineering manager; they were the engineering leaders on the team. As Nick [Caldwell] mentioned, if you go to five companies, you get seven answers about what they should do. At Meetup, if you went to five teams, you got seven answers about what they should do. We looked at how these people worked and how this leadership functioned within the team, and tried to make some adjustments.

One challenge we had with engineering leads was that they were very busy. They're supposed to be technical leads, the people leading the team technically, but they were very busy with meetings – coordinating meetings, coordinating retros, coordinating things across teams. What ended up happening a lot of times is that engineering leads became ivory-tower architects: people who would give advice on code but weren't really coding, who would make architecture decisions without context on the ground, and who didn't end up living with the decisions they made. We really tried to move the technical leads away from the team's meetings and from higher-level discussions they didn't need to be in, and toward being in the code and helping the team technically.

One thing I set up with all the leads I managed was that they spent half their time coding and half their time hands-on with the team, on delivery. I was speaking with one of them a few weeks ago about the last year, and he mentioned that the biggest and most positive change in his career in the last year was being able to go back to being technical and to actually having technical influence. We had this pattern where the technical leads of the teams didn't want to be technical leads anymore, because in that role they couldn't do technical work.

Then, on the other side, we had the engineering managers. Meetup had historically spread this function across disciplines. We had web managers, front-end managers, and back-end managers, and they managed people across teams. This wasn't something that we introduced, but as I joined, and as Zoe [Gagnon] joined, we were moving to a pattern of having managers within teams – managers that could actually help a team across the stack, and not just in a particular discipline.

The challenge, because there was a cultural history here, was this individual-versus-team mentality. A lot of our managers were used to thinking about how to make individuals succeed in their careers, and they focused on that, so no one was focusing on how to make this group of people work together effectively. For the things Zoe [Gagnon] was mentioning – process, how to run a team, how to improve the team – there was no one responsible or accountable. We were relying on engineers, through retrospectives, to come up with all the changes they needed, but of course it's unfair and not very effective to expect that people who work in a system can step back and figure out why the system is not working. We tried to realign that position and that direction.

Gagnon: Especially when your retros aren’t generating action items, you’re not really doing anything based on them anyway.

Did It Work?

We talked about the challenges that we've had and some of the things we tried inside our sphere of control to address them. It's good to stop and ask: did it work? Did the process emphasis, the engineering practice emphasis, the reworking of these roles into clearly defined boundaries – was that effective? What we found is mostly yes.

It turns out, this was the easiest part of all the change that we drove. From 10 years of consulting, we've definitely worked with difficult clients – people don't pay you to help them change if they're good at change. This was not that. People here wanted to change, and they were very excited and hungry to do it. In a short period of time, we got a lot of really good results. Teams were collaborating better, they were producing more, and they were more predictable.

This doesn’t mean that everything that we did worked. Particularly, we talked about building the engineering mastery around things like test-driven development, or iterable architecture, like an evolutionary architecture. Those we haven’t had a lot of success with. It turns out that if you want somebody to build up four years worth of experience in something, it takes about four years. One person can’t give eight people those skills in just one year, it’s not going to work. That’s something we’re still going to try on, but we’re going to have to change our approach on because just coming in and saying, here’s how you do the basic thing, here’s how you invert a dependency, go do it, is not successful approach.

Trindade: This is where I think our background backfired on us. Both Zoe [Gagnon] and I were used to changing teams and organizations from a consulting perspective, where you are there with multiple people, and there's a critical mass of people influencing at all levels. I think we didn't realize that that didn't exist at Meetup. There wasn't resistance to change – as Zoe [Gagnon] mentioned, a very positive aspect of the company and its culture is that everyone is happy to try something new to do a better job. At the same time, there was no expertise, so we always struggled with where to put the limited expertise that we had across the company, at which level to put it, and how to influence through it. That's something we're still trying to figure out.

We spoke about what we could change, and that was the easy part. We just got there and made those changes; these are things we didn't have to consult many people about. Within our power of influence and decision, we could actually make a move. But there were also things we started noticing and dealing with that were beyond our sphere of influence – things at higher levels of the company, or more ingrained in the company's culture, that needed more collaboration. There were aspects of those that were really hard to move, and we didn't know exactly what to do about them. What we're going to talk about now is what those aspects were, what we tried to do to make a difference, and what ended up happening.

Project Teams and Deadlines

Gagnon: One of the first things that we really wanted to address was the project teams that we were running. For the first six months I was there, I was on a team that was guaranteed to go away, because we wanted to launch exactly one feature that had stars in it. Once we figured out how to do that, the project was over. The stars were out. Or, as the case was, we saw that they didn't work, and we tore them down, which was really nice. This is not something we had control over. Me on a single team, or Francisco [Trindade] working with two teams, couldn't change the way the company allocated people just because we said so. So we started having conversations about this. This is really moving into much softer influence, where we just talked to people. We said to our VPs, to our CTO, to our incoming CEO, "This is a thing that's hurting us, and here's how we can fix it."

We also mentioned before that the executives had lost trust in the engineers, and they compensated for that lack of trust with an imposition of deadlines – really strict deadlines: "This scope must be finished by this date, or there might be consequences." I wanted to address this because it was really affecting my team directly. I went old school for this: I took some sticky notes and stuck them on a structural support column in our office. Each sticky note had a story and the points we estimated for it, top to bottom in order of importance. Then we stuck up, right next to it, the team's velocity: we've got eight points a week to work with right now, that's how fast we're going. I started counting off – every time I got down eight points, I'd put up another week. Eight more points, one more week, until the whole column was full. We brought stakeholders over and said, "Look, if you want this scope by this day, something's going to have to give. We can't do that; it's not reconcilable." Sometimes old techniques do have some effect.
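The arithmetic behind the sticky notes is simple enough to sketch – a hypothetical illustration, not Meetup's tooling: sum the estimated points in priority order and divide by the weekly velocity to get a delivery horizon.

```javascript
// Given stories estimated in points (in priority order) and a weekly
// velocity, count how many weeks the remaining scope needs.
function weeksToDeliver(storyPoints, weeklyVelocity) {
  const totalPoints = storyPoints.reduce((sum, points) => sum + points, 0);
  return Math.ceil(totalPoints / weeklyVelocity);
}

// 28 points of backlog at 8 points per week:
console.log(weeksToDeliver([3, 5, 2, 8, 5, 3, 2], 8)); // → 4 weeks
```

If the stakeholder's deadline is three weeks away, the column makes the gap visible: either scope comes off the wall or the date moves.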

Product Engineering

Trindade: Another thing that we faced quite a bit was this divide – for lack of a better word – between product and engineering. We're both engineers, so we're talking from our perspective, but this wasn't a one-sided problem; it was a systemic problem across Meetup. Because of the problems we've been talking about, the failures in projects and in delivery, we got to a point where product just didn't trust engineering to deliver. They would insert scope, trying to get as many things in as possible, and create the scope without consulting engineering at all. On the other hand, engineering would react by defining everything as very complex and very hard to do. We had this debate that wouldn't go anywhere, and we ended up with the strongest person winning, and deadlines, and whatever the consequences of that, which were never great.

We had this broken communication both within teams – between product managers and engineers – and at the company level, among product leaders, engineering leaders, and business leaders. What we tried to do, specifically on the team, was work with the PMs and with the engineers to align them on scope, and to have reasonable scope and complexity conversations. We also tried, in leadership meetings, to address this problem of scope versus time versus quality. There was a point last year when I used to go through every slide that was presented to leadership and just remove the dates, and say: we're not going to put dates, we're not going to talk about deadlines. I had that conversation again and again with people about why this was a challenge, and why we had to behave differently.

Gagnon: For me, this really pointed out the value of a very broad T-shaped skillset. Sometimes we talk about T-shaped engineers – people who are good at back end and then ok at front end and ok at mobile. I wanted to move beyond that, having been good at engineering in the past, to also do design and product management. The fact that I had previously interviewed for and gotten offers for product management roles really paid off here. I could work with product managers to cross that divide and say, "I know what you need because I've done this. Let me help you understand what we need also," and bring the two sides a little closer together.

Lack of Long Term Ownership

Trindade: The last thing, which we mentioned before and hadn't made any progress on, was this lack of long-term ownership. As Zoe [Gagnon] mentioned, project teams were started and finished based on projects. A team would come up, build some software, and disband. That software still exists – who maintains it? We don't know. Then there's a problem, there's a pager incident, and the people on call don't know who to talk to to figure out what's happening. That was a concern; it caused stress across engineering and a lot of churn in time and effort. It was an organizational problem that we didn't have any solution for. These four challenges were things we kept discussing – not just between us, of course, but with the VPs and the CTO and across engineering leadership. We really didn't know exactly what to do about them, until something happened.


Gagnon: Meetup is a company in transformation, and one kind of transformation is a reorg – and we got one, actually just when we needed it, as it turns out, because it really helped the company. We moved to long-term squads working on pieces of the product, instead of project teams adding things to the product, and to longer-term ownership, so we could start to say: here's a team that's building expertise in this thing, let's shuffle the things related to it toward them, and they can be the owners. That really helped solve those problems right off. We saw a big payoff from our push – from the conversations we had and the relationships we had built – as these changes brought relief for these big organizational problems. The investment we had made by having conversations, by being persistent, paid off here.

Trindade: I think one big thing was focus. As Zoe [Gagnon] said, as we discussed how the teams should be shaped and how we should reshape the organization, we organized the teams with the idea of creating longer-living teams focused on one area of the company. That focus enabled us to achieve a lot of things we couldn't achieve before. We have teams that actually own codebases and improve codebases over the long term. We could iterate on our practices, both within engineering and between engineering and product; the team is more independent and more autonomous, and could figure things out better. Across the board, reorganizing the teams unlocked a bunch of possibilities that changed what we could do with our team.

With this reorg happening, the opportunity we saw was: how can we use it to reshape the influence that we have within the company? How can we, with the reorganization, break the boundaries that we had before, and make our teams even more independent and more able to achieve success? The way we did that was by forcibly separating our team from the rest of the company. Meetup has a 17-year-old legacy codebase, with multiple generations of it, as we mentioned. We decided to start all new projects this year, 2019, in new codebases, with new infrastructure, new architecture, and everything else.

The reason for that was both organizational and technological. In the last year, one of the big problems we had was that, because teams were so on top of each other, there were all these distractions. Some team would deliver code that broke our pages, and then we were on call for that, so we were distracted. We had conflicts and pipeline problems because we were all entangled together, so delivering software was much harder. Practically, by separating ourselves, we made that much better. Organizationally, it meant we could make decisions that didn't impact many people apart from us. The team could move faster, and decision-making could take more risks, because the impact was contained within our scope. That enabled better practices, even in how we work together.

That wasn’t all positive, it was quite controversial within the company. It was something we did intentionally, but not sure if was the right approach without telling many people. Of course, that caused a bit of divide, so there was some miscommunication that happened that caused problems, and there was some challenges of how to deal with the consequences of that throughout the year. While this was definitely a good practice, our engineers love the code they’re working out, their practices they’re doing, and I think the quality is much better, if anyone’s thinking about that, definitely think about the communication that you do across the company when doing something like this.

Gagnon: That’s, honestly, one of the blind spots that we’ve identified. In addition, we had the mastery issue, where we thought this would be really easy based on our previous experience, and it turned out it was really hard, based on our current experience. We also found by going off into our own codebase, building something brand new, our cross-team communication and collaboration really suffered from that. We were building things, and other people also decided to do things outside of the legacy codebase.

We could have learned from each other quite a bit. Instead, we were surprised when they launched something and we realized, "We needed that thing." That was a place where we weren't doing as well as we could be. Then there's also our communication with the non-engineering parts of the company – customer support, the marketing team. Our work really does involve them quite a bit, but we're still not very fluid at that. We're not very good at that yet.

Results and Lessons

We’ve also had some results. One of the things we’ve seen right off is that we had a massive lift in our employee satisfaction. Our last survey shows us not only did we have a 30% lift from where we used to be, but that it’s now 30% higher than any other team in the company. People definitely are responding to what we’re doing here. We also see the team practices are getting better with real results. Our cycle time per story has dropped from a 10-day average to a 2-day average, which means we can ship things in one-fifth of the time we used to. Our volatility, which is how much we vary from our average velocity has gone from 120% to 20%, so we are a much more predictable team than we used to be as well. Those are places where I think we’ve really paid off. We’re now able to also leverage some of this to bring in new partners and build that critical mass of experience that will help us in the future.

There are some lessons we learned. It's hard to get experience: it takes a lot of work to build it, and you can't just drop in with some words of truth and expect those to become experience. There's also a real tension between having a great person on the ground doing work and having that person doing leadership activities. You can't really have them do both. That was, at least in our experience, a really challenging ask.

First, for the takeaways: it works to work within your sphere of influence, and then invest energy in expanding it. Start off with the things you can control, and then put in the effort to make that bubble a little bigger. People are not afraid of change; they're afraid of change happening to them. If you bring them in and invite them into the change, you can create a virtuous cycle of improvement, where the improvements themselves create more improvements. Then, don't underestimate skills and experience. Waiting to build them on your team can be very slow; it can help to hire new people in, or to bring in consultancies to help you build them.

Questions and Answers

Participant 1: You had mentioned you had that one person on the team that really knew all the tricks on how to get the code deployed, the stars and stuff. When you reorg’d the teams, was there a challenge to find the right team for that person and still make them effective?

Gagnon: The real challenge was that this person was not on a team. They were free-floating, going to every team. If you wanted help, you had to get on the list and convince him that your problem was the most important problem. That's not something we've resolved, but by moving away from the legacy codebase, we have much less dependence on that knowledge and historical context.

Participant 2: Do you mind touching a little bit on the upper management side? You talked about how to get from a top-down approach, but what were the challenges that you guys faced with the C-levels, for example, with the managers of the managers?

Trindade: When we joined, a bunch of people were hired with that intent of upskilling engineering, so I think we had some trust because of that. Within the engineering organization, Meetup is a very empowering culture. I never had to justify many of the things I wanted to do to my boss, my boss's boss, or the CTO, for example. They trusted us a lot. With product and leadership, that was a harder conversation. What has usually worked for me, both earlier in my career and at Meetup, is just having honest conversations.

When I joined, literally in the first week, we had a meeting where a big platform project was presented. They were saying it had to be done in three months, and I went to the general manager of the company and said, "I don't know what's happening – it's my first week – but I'm pretty sure this project is going to fail," and explained why. I said, "This is my experience." Of course, the project continued; I didn't have the trust then. The project eventually failed, and I gained his trust afterwards.

I had a manager in the past who gave me this tip: "In order to be successful at your job, you have to risk your job every day for what you believe in." Being honest, and just stating what you're seeing and what you're thinking – with care and with kindness, of course – helps you gain trust.

Gagnon: One of the things I also leveraged a lot was just doing things and then showing the results, and using the results to drive up adoption. I could take three people for two weeks and say, "Let's do this thing, and let's measure how it goes." At the end of it I could say, "This was a really big improvement. Look at this: we said it would come out on July 14th, and it came out on July 14th. Let us do this more broadly, and we'll get more of these results."

Participant 3: Meetup is a relatively new company. How about the companies that are very old and have very old legacy code, and that code, if you want, is what we ship to our clients, what puts food on the table. What would you do in that case? Because you cannot tell teams, “You’re not going to work on the legacy code, and that’s it,” and moving away from it takes years. How do you convince teams to keep working on that legacy code?

Gagnon: I’ve worked with some financial institutions that have some longevity like that, core business processes are still running on machines that they bought from IBM in the ’90s. What we found there is, you can do a two-part approach to that. The way that people interact with these business processes is nothing like it used to be. We are no longer using dedicated terminal machines when it’s green, and you have to know exactly all of the secret codes to put in because Y means three things on three different screens.

Now we’re using browsers. When we move into new things, browsers and mobile, we get to have a clean break there. Then, you can create adapter layers that are all right, everything in front of this is going to be modern and clean, and everything behind this can be the existing processes. Then, when the existing processes are written using tools that are amenable to it, you can also leverage those to push a little bubble of freshness back. That does take years, it’s true, but it can pay off as you move. Everywhere that that freshness is, it’s so much easier to move, that people want to expand.

"Working Effectively with Legacy Code" is a great book that details how to do this. It's going to be pretty hard if you're still in a COBOL or old-Fortran situation, or hard-linked C. A lot of companies are using just Java, but Java 1 or 2. You can still modernize that and get some freshness in there, because the language will support a lot of the tools that we have these days.

Trindade: Just adding to that: Meetup is 17 years old, and any company that lasts that long is going to have multiple generations of code. Being stuck to the legacy is a mistake, because you're always going to have to evolve your code, but you're also never going to be able to make a clean break from 17 years of code – just clean it all up and build new. As a company, you're going to have to learn to live with a continuum of code: the legacy and the new code you're building, and hopefully, if you're doing it well, that gap evolves with the company. Technologies that let you deploy things more independently – microservices, smaller systems – let you have multiple generations of code working together, so you can still be productive on the new things while maintaining the old legacy when you need to.

Gagnon: Yes. I think it’s a spectrum. Stuff that has to change a lot also affords you the opportunity to make it more changeable.

Participant 4: I have two questions. One, coming from more of an engineering individual-contributor side, I would say my sphere of influence, and those of my colleagues, tend to be more on the actual deployment, how we build code. This is based on Conway's law, knowing that communication structure tends to determine how you deploy. Have you seen, or do you think it's possible, that how we deploy can eventually influence our communication style and how our teams eventually get structured?

Then, second, our engineering organization is a lot like that, project teams that are formed and then blow up every three or four months. We’re also an incredibly seasonal company. Our business tends to happen right around August, where we start picking up into December. We can’t reorg right now because we would miss our season. With these feature teams or our project teams, we notice that there’s a lot of that forming, storming, norming, performing. Having done this with some of the teams at Meetup, have you seen or have some estimate on approximately how much time does it take for a team of 10 people to really enter that performing phase so that we can start selling it to our managers and to our VPs, that we need to stop destroying our teams, and maybe give us a timeline for when we can actually start performing?

Gagnon: I can talk about the first question because I've seen this a lot. I posted on Twitter one time about how often I've seen companies try to solve their organizational problems through a deployment strategy – namely, microservices. A lot of people say, "Microservices will fix it so that this team and this team are not going to stomp on each other in the code anymore, because it's two different things." It turns out that doesn't really work in any case where I've seen it tried, and I've seen it tried a few times. Fortunately, as a consultant, you get to see lots of people experiment with it.

The fact is, your teams are going to have to do things. If those things are deployed or owned by other teams, it's still going to stomp, even if it's in little, separate deployables. I don't have a lot of faith in a deployment strategy being able to trickle backwards and influence the incentives – the things that are really driving a team to do what it's doing. In this situation, what I've done in the past is just start having lunch with people from other teams, in order to build those bridges and rewrite that Conway diagram a little, so it's not toss-stuff-over-the-wall but rather, "Let's shake hands, and let's design systems that shake hands, too."

Trindade: In terms of how long it takes to build a team, it depends on so many things. It depends a lot on the experience your team has, on how well your organization knows how to do process and software, and on how stable things are. At Meetup we had two problems: the organization was still learning how to deliver software, and at the same time the engineers were quite junior. So it takes a long time, because a lot of things are in discussion. On the other end of the spectrum, working in consulting, you could spin up teams in days, because everyone knew how to work together, everyone was quite experienced, and a lot of people would fill the gaps and make it work. I would probably not think about time, because I don't know your context. A better way to frame it is: what are the metrics or signals you should see when a team is performing? Try to measure those, so you have a baseline of what's happening and how it's going.

Gagnon: You can see those ramp up, too. If you say low cycle time is a performing thing and we start off with a high cycle time, as people storm and norm, you can also watch the cycle times start to come down. It’s not like a light switch where it’s all of a sudden ten to two. It’s continuous. You can say, “Look, this is a trend. It hasn’t plateaued yet, and if we break up the team now, we’re just cutting off that trend.” We haven’t even gotten to the good part yet.



ReactiveConf 2019 – Writing Tests for CSS Is Possible! Don’t Believe the Rumors

MMS Founder
MMS Bruno Couriol

Article originally posted on InfoQ.

Gil Tayar, Senior Architect and Developer Relations at Applitools, recently presented at ReactiveConf 2019 in Prague the specific issues behind CSS testing and how they can be addressed through methodology and tooling.

Tayar started by emphasizing a trend which has strengthened over the last five years in front-end development: front-end developers now write automated tests for their own code. A key reason behind that trend is the confidence that tests provide when adding, removing, or refactoring code. In recent years, the front-end community has progressively built testing methodologies based on unit testing, component and multi-component testing (using for instance JSDOM), and browser-based automation testing (with tools like Cypress or WebdriverIO).

However, front-end developers still often do not know how to write automated tests for CSS, resorting to manual testing or skipping CSS testing entirely. At the same time, such testing is key to automating the testing of responsive user interfaces. A central testing technique, functional testing, i.e. testing the output produced by feeding inputs to a function under test, is not an option for CSS. Tayar asserted that testing CSS is a hard problem because it is, by nature, visual rather than functional. The CSS testing problem can thus be reformulated as a visual testing problem, and Tayar proceeded with a list of methodologies, techniques, and tools to address it.

Tayar explained that the dream methodology consists of navigating to a page, taking a screenshot, and verifying that the screenshot looks good or conforms to some design system. This is akin to a pattern-recognition problem that could be dealt with through machine learning techniques but, as Tayar lamented, machine learning algorithms are not that good (yet). The mitigating strategy suggested by Tayar is to replace visual testing with visual regression testing.

The following code illustrates the methodology (using Cypress’s cy module):

it('home page visual test', () => {
  cy.viewport(1024, 768);
  // ... (the test goes on to take a screenshot and compare it with the baseline)
});

The idea is to start with a validated (generally manually) baseline screenshot and test that what was there still is. Differences are detected automatically, and the tester manually accepts or rejects them. Manual intervention is necessary because the testing program cannot know whether a new screenshot is the result of new features, that is, a valid and expected screenshot. The tester thus must manually invalidate the baseline screenshot when that case occurs. Provided that false negatives are infrequent (i.e. program modifications do not systematically invalidate previous screenshots), the aforementioned methodology is an improvement over entirely manual visual testing.

Tayar however mentioned four issues with the previous methodology. First of all, taking a screenshot involves working around the heterogeneity of execution environments. Cypress under Chrome allows developers to take screenshots of a window, a full page, a selector, or a region; Selenium/WebDriver natively only supports screenshotting a browser’s window. That first issue may be remedied by changing the tooling when needed and possible, or by resorting to existing commercial tools which provide the necessary screenshotting options. Such tools, for instance, produce a full-page screenshot by starting with the currently visible window, repeatedly scrolling down the page, and stitching all the screenshots into one.

Comparing screenshots, the second issue, is hard. The naive approach, pixel-by-pixel comparison, often does not provide reliable enough results. What appears to be the same picture to a human will in fact be different image objects for the computer, as the exact pixels rendered may vary with the graphics card used or with anti-aliasing. Tayar gave the example of the same JPEG picture on Chrome 67 and Chrome 68, with significant differences when compared pixel by pixel. He then gave another example of the same interface on the same machine, displayed twice in the same browser five minutes apart, which also presented significant pixel-by-pixel differences.

A mitigating strategy for too-stringent pixel-by-pixel image comparisons is to manually configure an acceptable difference threshold. The difference threshold may account for slight differences in color (typically not perceptible to the human eye) or anti-aliasing, and must be tweaked regularly to keep the number of false negatives low. As before, tools exist that address this issue in a more sophisticated way, applying advanced comparison algorithms which try to look at images the way a human would. Tayar emphasized that these tools are the most significant advance the visual testing field has seen in the last few years. Most of these tools have free and OSS plans, and as such can be used by developers in a wide range of project contexts.
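The threshold idea can be sketched as a naive diff over raw pixel values. The function below is an illustration of the mechanism, not how commercial tools actually work: it flags two images as matching when the share of differing pixels stays under a configurable ratio.

```javascript
// Naive visual diff: count pixels whose channel values differ by more
// than a per-channel tolerance, then compare against a global ratio.
// `a` and `b` are flat RGBA arrays of equal length (4 values per pixel).
function imagesMatch(a, b, channelTolerance = 10, maxDiffRatio = 0.001) {
  let diffPixels = 0;
  for (let i = 0; i < a.length; i += 4) {
    const differs =
      Math.abs(a[i] - b[i]) > channelTolerance ||         // R
      Math.abs(a[i + 1] - b[i + 1]) > channelTolerance || // G
      Math.abs(a[i + 2] - b[i + 2]) > channelTolerance;   // B
    if (differs) diffPixels++;
  }
  const ratio = diffPixels / (a.length / 4); // share of differing pixels
  return ratio <= maxDiffRatio;
}
```

Real comparison engines replace the per-channel tolerance with perceptual models, but the knobs play the same role: how different a pixel may be, and how many pixels may differ.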

The third issue relates to managing the screenshot comparisons. As mentioned before, visual regression testing includes a manual part in which developers invalidate a past screenshot. This manual handling can become cumbersome when there are hundreds of comparisons to review. Tayar provided three mitigating strategies to alleviate the issue. One strategy consists of invalidating a large series of screenshots through the command line (with Cypress this would be npm run cypress:run -- --env updateSnapshots=true). A second strategy consists of going through the directories where the snapshots are stored and replacing the current snapshots with the new snapshots where needed, thus removing the false negatives. The third strategy involves using commercial tools, which often include a dashboard to speed up the manual invalidation, with configurable levels of granularity.

The fourth issue originates from the need to test against all responsive widths (like 1024 x 768, iPhone, iPad), pixel densities (e.g. Retina displays), and browsers. There again, three ways to tackle the issue co-exist. The first, obvious solution is to run the same visual test multiple times, once for each width/density/browser configuration. The second solution improves on the first by parallelizing the tests; while this may require extra infrastructure, a lot of companies use that technique. The last solution again consists of outsourcing the testing to commercial cloud testing service providers.
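The brute-force first solution boils down to running one visual test per point in the configuration matrix, which can be sketched as follows (the viewport, density, and browser lists are illustrative, not from the talk):

```javascript
// Build the width/density/browser matrix: one visual test run per entry.
const viewports = [{ w: 1024, h: 768 }, { w: 375, h: 667 }]; // desktop, phone-like
const densities = [1, 2];                                    // standard, Retina
const browsers = ['chrome', 'firefox'];

function buildConfigurations() {
  const configs = [];
  for (const viewport of viewports)
    for (const density of densities)
      for (const browser of browsers)
        configs.push({ viewport, density, browser });
  return configs;
}

// Each configuration would then drive one run of the same visual test.
console.log(buildConfigurations().length); // 2 * 2 * 2 = 8 runs
```

The matrix grows multiplicatively, which is exactly why parallelization or outsourcing becomes attractive as soon as more than a handful of configurations matter.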

Tayar concluded the talk by running the audience through a live demo of visual regression testing, illustrating some of the solutions to the four previously described issues.

ReactiveConf is a yearly conference targeted at developers with talks addressing the latest technologies and trends in software development. ReactiveConf 2019 took place from Oct. 30 until Nov. 1, 2019, and is the fifth installment of ReactiveConf.


Facebook Releases Relay Version 7

MMS Founder
MMS Dylan Schiemann

Article originally posted on InfoQ. Visit InfoQ


Relay, a JavaScript framework for building data-driven React applications with GraphQL, recently released version 7 with improvements to error handling and Relay hooks.

A new directive, @DEPRECATED__relay_ignore_unused_variables_error, was added in Relay version 7 to suppress errors after migration from GraphQL NoUnusedVariablesRule to RelayIRTransform validation. This directive temporarily suppresses errors that would not have appeared previously, allowing development teams to fix issues incrementally with a Relay upgrade.

The Relay team improved a few other features. The constraint for a @refetchable directive on a fragment no longer enforces that the argument for the node field is called id, just that it is an ID type. Developers can also now select the __id field wherever __typename can get selected to fetch the internal cache key for an entity to update records without an id.

Beyond several other bug fixes, many experimental features are available with Relay 7. Relay Hooks receive several improvements, including better performance for useFragment, correct disposal of ongoing requests with useQuery, and no longer suspending indefinitely when the server does not return all the data requested with useQuery.

A full list of updates and breaking changes may be found in the Relay 7 release notes.

Relay is a JavaScript framework built by Facebook for applications using GraphQL. Relay provides a bridge between React and GraphQL. With Relay, React components can specify what data they need and get it, allowing components to get composed while the data needs of the app get localized per component. Relay provides static queries and ahead-of-time code generation.

Relay is open source software available under the MIT license. Contributions and feedback are encouraged via the Relay GitHub project and should follow the Relay contribution guidelines.


ReactiveConf 2019 – Backpressure: Resistance Is Not Futile

MMS Founder
MMS Bruno Couriol

Article originally posted on InfoQ. Visit InfoQ

Jay Phelps, RxJS core team member, recently presented at ReactiveConf 2019 in Prague what backpressure really is, when it happens, and the strategies which can be applied to deal with it.

To illustrate what backpressure is, Phelps gave the example of a hard drive which can read data at 150 MB/s but write data at only 100 MB/s, leaving a deficit of 50 MB every second. A 6 GB file is read at full speed in 40 s and, if immediately written to another section of the disk, will lead to 2 GB of memory being used to keep track of the data that was read but could not yet be written. That memory may or may not be available. This is a typical backpressure problem, caused by a producer (the reading process) which is faster than a consumer (the writing process).
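Phelps’ figures can be checked with a few lines of arithmetic (the numbers below are the ones from his example, not a general model):

```javascript
// Producer/consumer speeds from the hard-drive example (MB per second).
const readSpeed = 150;   // producer: reading from disk
const writeSpeed = 100;  // consumer: writing to disk
const fileSizeMB = 6000; // a 6 GB file

const readTimeS = fileSizeMB / readSpeed;   // 40 s to read at full speed
const deficitPerS = readSpeed - writeSpeed; // 50 MB piles up every second
const bufferedMB = deficitPerS * readTimeS; // 2000 MB (~2 GB) held in memory

console.log(readTimeS, deficitPerS, bufferedMB);
```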

Phelps defined backpressure in a software context as a resistance or force opposing the desired flow of data. In the previous case, the resistance comes from the mitigating strategy of reading data only as fast as it can be written, thus limiting (resisting) the producer’s speed. In most I/O libraries (like Node.js streams), backpressure mechanisms operate under the hood, and library users need not concern themselves with them.

Backpressure may also occur in the context of UI rendering. The producer here can be the user producing keyboard inputs, which may lead to some computations ultimately resulting in UI updates. When the stream of UI updates is slower than the stream of keyboard inputs, backpressure may be necessary. This is often achieved by throttling/debouncing keyboard input.
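Debouncing keyboard input can be implemented in a few lines; the sketch below forwards only the last value once the input has been quiet for a given delay (a generic utility, not tied to any specific library):

```javascript
// Debounce: forward only the last call after `delayMs` of silence,
// relieving pressure on the (slower) UI-update consumer.
function debounce(fn, delayMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);                            // drop the pending call
    timer = setTimeout(() => fn(...args), delayMs); // reschedule with new args
  };
}

// Usage sketch: recompute search results at most once per quiet period.
// input.addEventListener('input', debounce(e => search(e.target.value), 250));
```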

As Phelps mentioned in an example, updating the DOM faster than the screen refreshes (60 fps) may be wasteful. The window.requestAnimationFrame method can help synchronize DOM updates with the screen update cycle.
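A common companion pattern coalesces many state changes into a single DOM write per frame. The sketch below takes the scheduler as a parameter (it would be window.requestAnimationFrame in a browser) so the idea is visible without browser APIs:

```javascript
// Coalesce rapid updates: remember only the latest value and flush it
// once per frame, instead of touching the DOM on every input event.
function createFrameBatcher(render, schedule) {
  let latest;
  let scheduled = false;
  return value => {
    latest = value;            // newer values overwrite older, unflushed ones
    if (!scheduled) {
      scheduled = true;
      schedule(() => {
        scheduled = false;
        render(latest);        // one render per frame, with the newest value
      });
    }
  };
}

// In a browser: const update = createFrameBatcher(draw, requestAnimationFrame);
```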

After the introductory examples, Phelps summarized the four main strategies for handling backpressure. The first strategy is to scale up resources so the consumer is at least as fast as the producer. The second strategy, as previously illustrated, is to control the producer: the consumer decides at which speed the producer sends data. Phelps provided a quick example in which the consumer pulls data from the producer instead of being pushed data:

const source = connectToSource();

source.pull(response1 => {
  source.pull(response2 => {
    // each pull requests exactly one response; the consumer sets the pace
  });
});

The next strategy, which was used in the aforementioned hard drive example, is buffering, i.e. temporarily accumulating incoming data spikes. Such a lossless backpressure strategy may however lead to unbounded buffers, or to buffers outgrowing the available memory.
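A bounded buffer makes the trade-off explicit: instead of growing without limit, it refuses writes once capacity is reached, signalling the producer to back off (an illustrative sketch, not from the talk):

```javascript
// A fixed-capacity FIFO buffer between a fast producer and a slow consumer.
// push() returns false when the buffer is full, telling the producer to
// back off rather than letting memory grow without bound.
class BoundedBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
  }
  push(item) {
    if (this.items.length >= this.capacity) return false; // apply backpressure
    this.items.push(item);
    return true;
  }
  shift() {
    return this.items.shift(); // consumer drains at its own pace
  }
}
```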

Another strategy consists of allowing data loss by dropping part of the incoming data. Sampling, throttling, and debouncing strategies adjust the incoming flow of data to the consumer by disregarding excess data.
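Lossy strategies can be as simple as keeping every nth item; the count-based sampler below drops the rest (time-based sampling and throttling follow the same shape, just with a clock instead of a counter):

```javascript
// Count-based sampler: forward one item out of every `n`, drop the others.
function sampler(n, fn) {
  let count = 0;
  return item => {
    count++;
    if (count % n === 0) fn(item); // only every nth item reaches the consumer
  };
}
```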

Phelps subsequently zeroed in on available libraries which may help implement backpressure strategies. Full-fledged push-based stream libraries (like RxJS) may be used to implement lossy backpressure (RxJS’s throttle, debounce, audit, or sample operators). However, push-based streams do not natively allow controlling the producer, which means lossless backpressure requires potentially unbounded buffering. Pull-based streams (like Node.js streams, Web Streams, and asynchronous iterators; cf. Repeater.js and IxJS, the pull-based counterpart of RxJS) allow controlling or pausing the producer. Note that asynchronous iterators are inherently push-pull mechanisms: the consumer pulls by requesting data, and the data, once computed, is pushed to it.
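The push-pull nature of asynchronous iterators shows up in a minimal example: the consumer’s for await loop pulls, and the producer only computes the next value when asked (a generic sketch, not tied to any of the libraries named above):

```javascript
// Pull-based producer: values are generated only when the consumer asks.
async function* producer() {
  for (let i = 1; i <= 3; i++) {
    yield i; // suspended here until the consumer requests the next value
  }
}

async function consume() {
  const received = [];
  for await (const value of producer()) {
    received.push(value); // the consumer sets the pace
  }
  return received;
}
```

Because the producer is suspended between pulls, no buffer is needed: backpressure is built into the control flow itself.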

Push and pull solutions are not exclusive and may be used together in the same application. This commonly happens server-side, but it is also true on the client. RxJS can, for instance, be used in some contexts in conjunction with iterators to provide lossless backpressure without resorting to unbounded buffers.

Which strategy to adopt depends on the problem at hand, e.g. whether it is possible, or makes sense, in a specific context to control the producer. In the case of user inputs, for instance, it is possible neither to scale up resources nor to control user behaviour, while lossy backpressure strategies often help. Understanding the backpressure concept and the strategies to address it helps developers deliver an optimal user experience.

ReactiveConf is a yearly conference targeted at developers with talks addressing the latest technologies and trends in software development. ReactiveConf 2019 took place from Oct. 30 until Nov. 1, 2019, and is the fifth installment of ReactiveConf.


The Robot Operating System (ROS) Can Make Hospitals Smarter

MMS Founder
MMS Roland Meertens

Article originally posted on InfoQ. Visit InfoQ

The ROSCon 2019 conference kicked off with a keynote from Selina Seah from Changi General Hospital and Morgan Quigley from Open Robotics. In their talk, they outlined the need for robotics and automation in hospitals. To support robotics, the Open Robotics foundation works actively to create tools to support multiple robotics platforms, fleets working together, and tools for QA and simulation.

Currently, and in the future, there will be multiple challenges in healthcare: an aging population, a workforce shrinking because of that aging population, and rising healthcare costs as people expect more of their healthcare. This makes the market for automation and assistance in elderly care potentially very large, as it is a skilled trade that requires a long training time (a nurse spends four years in school and two years on the job before being considered skilled enough). There are challenges in all areas of healthcare, from patient walk-in, through consultation and treatment, to observing patients both in the ward and at home.

To solve the health care problem we have to look at multiple facets. One of them is connecting current systems to robots and medical devices. Another facet relates to the data analytics and intelligence for clinical decision support. Smart facilities support this by creating more data, and the analysis of this data for insight into the health of patients. A big challenge is automating parts of healthcare with robots. One goal is to individualize patient care to provide better support. Seah noted that no healthcare workers will have to be replaced, as we desperately need them. The Centre for Healthcare Assistive and Robotics Technology approaches these challenges in Singapore.

Robotics would allow healthcare workers to increase the amount of clinical work they perform, and would reduce the time spent on dull tasks such as data processing and bringing things (such as drinks) to patients. Tasks that can be automated include packaging medication, ad-hoc delivery of items, and smart actuated beds. Robots currently in use include surgery robots, the Pepper humanoid robot, which entertains people (and, according to Seah, receives even more attention from patients than nurses do: they think the robot is a lot of fun), and gait assessment devices.

In terms of deploying robotics, having robot fleets from multiple vendors is a big challenge. Different applications come from different vendors that must operate in the same space, e.g. a meal delivery robot has to go to the same locations as the goods delivery robots. New companies bring useful delivery robots to market every year, and a hospital ideally wants to add these to the existing fleets in its buildings.

Every vendor comes with its own traffic editor. In this editor, the hospital has to indicate where robots can drive and where supporting infrastructure is (e.g. elevators or automatic doors). Open Robotics is creating an open-source traffic editor to define platform-agnostic robot traffic zones, traffic lanes with an indicated direction of travel, no-go zones, the location of automatic doors, the elevator topology, and the location of walls (which is useful for simulation). The traffic editor will be made available later.

Fleets from a single vendor can already achieve impressive densities that work well. Robots from the same fleet already keep a large space between them and will queue nicely to go into specific rooms, such as a kitchen. However, multi-vendor fleets do not communicate, which can cause robot traffic jams in elevator lobbies and doorways. Robots will stop and stare at each other until the hospital staff shuts one of them down and moves it out of the way.

Quigley showed that most of the potential robotic throughput can be reached if the fleet manager API can set paths for robot waypoint control. However, many robotics providers do not have an existing fleet manager and only support simple commands to start and stop the robots. Open Robotics is working on a multi-fleet federated integration system called FreeFleet. The user can dispatch a robot to a location through a smart fleet adapter, which then talks to the vendor-specific fleet managers.

At the moment it takes three years to connect and hook up all the interfaces for one fleet of robots when a new platform is deployed. The goal is to scale automation with a robotics middleware framework that connects multi-fleet deployments and creates packages that are reusable between platforms. To that end, the Open Robotics foundation is creating ROS Health. This platform provides secure messaging between existing applications, ships packages with user interfaces, can handle traffic, has a command-line interface, and contains path planning that can be shared between multiple applications. The benefits are interoperability of fleets, optimization, and scalability. This package is called the System of Systems Synthesiser (SOSS).

ROS Health includes two types of user interfaces. One is a streaming interface that shows the location of robots from multiple robot fleets. They create this using web sockets. A second user interface is an interface for mobile devices, where nurses can get quick access to data they need for treating patients, and which allows them to give commands to the robots.

Most existing hospital applications use the HL7 protocol. Open Robotics created a layer that converts messages in this protocol to ROS messages, which means that ROS 2, which communicates over the DDS protocol, can connect to existing hospital applications.

To learn more about ROS for your health supporting robotics, look at the SOSS, HL7 bridge, and RMF scheduling.


Developing Cultural Sensitivity in Working with Other Cultures

MMS Founder
MMS Ben Linders

Article originally posted on InfoQ. Visit InfoQ

Cultural differences can be a challenge in an international workplace, but at the same time cultural diversity can also be fascinating, said Rachel Smets. At Positive Psychology in Practice 2019 she suggested we prepare ourselves when working with other cultures or moving abroad, and develop our cultural sensitivity by learning about new cultures as much as we can.

Smets stated that among the people she interviewed, including many digital nomads like herself, the one thing they all mentioned as a key ingredient of a successful move was to “prepare well”. She suggested doing research in advance:

Depending on your current situation, your research will be based on your goals, but let’s assume you’re a digital nomad and able to work from anywhere; you need to decide about your new destination, your house or other accommodation to rent or buy, the family that will join you or stay behind, your finances, the new language, but also, the required visa and paperwork. Adding to these basics, there are topics like pets, schools, vaccinations, insurance, health care etc…

We can learn about a new culture by doing research and reading about their habits, foods, festivals, language, history, etc. Smets mentioned that it doesn’t have to be complicated; just put in some effort to inform yourself:

As a digital worker, doing business with many different cultures and showing your knowledge about certain local culture customs will demonstrate an interest in their culture, which can be very advantageous in business.

Positive psychology can be applied to help people adjust to or deal with a culture that’s new to them; it is really the answer and solution in everything here, said Smets. The science is all about happiness, optimism, well-being, satisfaction, flow, and so on, and in the pursuit of “good feeling” this also relates to your interpersonal relations with other cultures.

How are you adjusting to others? How are you communicating with others? Maybe they use a different language, but there’s much more to communication than that, said Smets. In the Netherlands people are very direct, blunt and may come across as “rude” to those from another country where language is indirect, and the message must be understood through cues like body language, facial expressions, or eye contact.

Smets gave the example of how some Asian people say “yes”, which can have five different meanings: “yes”, “maybe,” “I’ll try,” “no,” and “I’m not sure.” She mentioned her video on deadlines across cultures, which provides examples and solutions to communicating across cultures.

Positive psychology can also be related to the “Golden Rule”, as Smets explained:

We’re used to the golden rule, as in “Treat others like you would like to be treated,” however, when it comes to cultural differences, the Golden Rule that is most effective is “Treat others the way THEY want to be treated”. In essence, adjust to them, and harmony will be created.

“Working abroad has been a great experience in every country I’ve lived in,” said Smets. She mentioned that the paperwork is always a challenge: the tax systems, getting the utility contracts done, work contracts, a visa if needed, internet at home, and registrations in the new country. The administrative part is always a challenge, faster in some countries than in others. Patience and persistence are the key here, Smets said.

Rachel Smets, author of Living Abroad Successfully, presented on how to take the “shock” out of culture shock at Positive Psychology in Practice 2019. InfoQ held an interview with her.

InfoQ: In your talk, you mentioned that cultural diversity can actually be fascinating. Can you give some examples?

Rachel Smets: Absolutely, I love observing other cultures; the different habits, or how people dress differently, eat different foods, and eat at different meal times. In the Netherlands people tend to eat dinner around 6pm, whereas in Spain, dinner is between 8.30 and 10.30pm.

When doing business with other cultures, greeting is the first thing you do; that’s why it’s important to greet in the correct manner and avoid an awkward moment. If you’re meeting face-to-face, make sure you observe people around you; do they shake hands, bow, hug, kiss? Then, copy them.

In business, the most common international way to greet is to shake hands, but in Asian countries this can differ; this is why observation and researching ahead of time (a quick Google search) can help you greatly.

If you are doing business online, I recommend looking up a few local words in order to say “Hello” in the local language of your business partner, colleague or client. You will notice how much they appreciate your effort.

InfoQ: What challenges did you have to deal with when traveling abroad?

Smets: Traveling abroad brings challenges as well as opportunities. When I travel to different countries, I’m on my own, so the usual challenges are having to carry everything alone, and when I need to use the toilet, I have to drag my hand luggage, trolley, and jacket with me. I have learned to travel light and be practical with what I carry.

Traveling solo, I receive two very frequent questions: about safety and about loneliness. Safety is addressed by picking destinations, accommodations, and locations that are safe (of course, one never knows 100%). Loneliness is a frequent issue for many people, but I focus on all the benefits of being solo, such as meeting people more easily and doing what I want, when I want, and where I want. Here’s a video that says it all: the best reasons for traveling solo.

InfoQ: How can people work on their cultural sensitivity and get better at understanding cultural differences and working with people from different cultures?

Smets: According to research and studies from experts like Hofstede, Hall and Trompenaars, there are mainly two categories: relationship-oriented cultures and task-oriented cultures.

The former are based on the relationship or bond between different employees and business members. Goals and tasks are accomplished through relationships: you spend time building the relationship before you dive into the business. Japan, Africa, and the Middle East are examples of relationship-oriented cultures.

Task-oriented cultures on the other hand are more focused on the information and technology needed to achieve maximum productivity. For example, the United States is a typical information-oriented culture, whereby business can be completed without knowing much about your business counterpart.

Also, talk to someone who’s familiar with living and moving abroad; you will learn a LOT.

If you’re working with other cultures, I recommend you take a course or workshop on cultural sensitivity. I usually spend half a day explaining basic culture dimensions and giving practical examples for specific countries. This is enough to create cultural awareness, become open to the new, and certainly take the SHOCK out of the culture shock.

To summarize being culturally sensitive: don’t deny differences but rather accept them, recognize them, and cherish them.


High-Performance Data Processing with Spring Cloud Data Flow and Geode

MMS Founder
MMS Srini Penchikala

Article originally posted on InfoQ. Visit InfoQ

Cahlen Humphreys and Tiffany Chang spoke recently at the SpringOne Platform 2019 Conference about data processing with Spring Cloud Data Flow, Pivotal Cloud Cache and Apache Geode frameworks.

Humphreys talked about the difference between Spring Cloud Data Flow and Spring Cloud Stream frameworks. Spring Cloud Stream is the programming model used to develop applications to deploy with Spring Cloud Data Flow. It’s easier to switch middleware components (Binders) when using Spring Cloud Stream without having to rewrite the application code.

He talked about what type of projects are good candidates to use Geode. If you have large volumes of data and need high throughput with low latency, then Geode may be a good choice for data processing. Apache Geode, which was open sourced in 2017, provides a database-like consistency model, reliable transaction processing and shared-nothing architecture.

Chang discussed horizontal scalability and how to configure a Geode cluster with Locators and Servers using the GemFire shell (gfsh) tool. For high availability, you should have at least three Locators configured. The Geode data store cluster scales independently of the application’s scaling needs.

Geode supports fault tolerance using partitioned and replicated regions. The region is the core building block of Apache Geode cluster and all cached data is organized into data regions. The regions are part of the Cache which is the entry point to Geode data management.

For developers using SpringBoot, Geode offers several annotations out of the box to leverage the data caching, including @ClientCacheApplication, @Region, @EnableEntityDefinedRegions, @EnableGemfireRepositories, and @EnablePdx.

She also showed a demo application with a data pipeline using Apache Kafka, Geode, Prometheus, and Grafana. The demo app runs on a local Kubernetes cluster using minikube, and deploys a pipeline that extracts data from a file source and enriches the payload with data from Geode. The app, which is based on Spring Boot and Spring Geode Starter, also uses Micrometer to capture throughput and count metrics and send them to the metrics server. The data pipeline architecture includes a Source, a Processor, and a Sink. The sample pipeline uses Spring Cloud Stream, which makes it easy to switch between different messaging infrastructures like RabbitMQ or Kafka.

Chang showed some sample metrics from Geode versus a relational database like Postgres.


ReactiveConf 2019 – Indecisions Are Not All Bad

MMS Founder
MMS Bruno Couriol

Article originally posted on InfoQ. Visit InfoQ

Boris Litvinsky, Tech Lead at Wix, recently explained in a talk at ReactiveConf 2019 in Prague why he thinks deferring decisions in the software development process can result in a better codebase. He also discussed design and coding practices which support delaying or reversing decisions.

Litvinsky started by describing how, on a greenfield React-based project at Wix, his team struggled for three weeks over which state management solution (Redux or MobX) to use. The three weeks were spent in internal debate, followed by individual research, chained with interviews of React experts at Wix, and ending in a decision matrix. The understanding was that this decision was so critical that it had to be taken there and then, and that if they picked the wrong library, reversing the decision would be impossible. It turned out that no other React-based project in production in the company at the time was actually using any state management library.

Litvinsky reminisced that they thought they had to choose between a right and a wrong decision, while there was actually a third option: take no decision. This comes from the fact that there may not be enough knowledge at the time a problem is analyzed to make any kind of informed decision. Especially in the case of decisions with a high impact on the project, Litvinsky suggested that postponing a decision until enough information is gathered about the problem at hand (and the possible solutions) is the best option.

After this introduction, in the first of the four sections of the talk, Litvinsky explained in detail the rationale behind decision postponement. Litvinsky quoted Martin Fowler’s definition of architecture:

Architecture is the decisions you wish you could get right early in a project.

Early decisions have a prominent impact, as they are much harder to revert and form the foundations of the whole application. Yet developers tend to begin a project or product with a large series of questions: monolith vs. microservices, which front-end UI framework to adopt, which database(s) to adopt for which service(s), Sass or LESS, and more. Not all of those decisions are architectural decisions which must be made early on. Litvinsky lamented that, in the absence of proper information and a holistic understanding of the context, early decisions are often biased by previous experiences with specific technologies, by hype, or by ego. Instead, the wisdom expressed in the YAGNI principle should lead developers to make a decision only when it is necessary to make it, when enough information is available about the problem the decision is solving, and when the envisaged solutions actually solve that problem.

Litvinsky justified the benefits of delaying decisions about technologies by reviewing with the audience the Hype Cycle, a concept popularized by Gartner. According to that concept, a technology evolves over time through a series of distinct phases: innovation trigger, peak of inflated expectations, trough of disillusionment, slope of enlightenment, eventually reaching the plateau of productivity. As technologies go through the cycle and mature, the use cases that they excel at addressing become clear. This means that the longer a decision related to a specific technology is delayed, the more certainty there is that the technology will be picked for the right reason.

As an example, Facebook’s CEO Mark Zuckerberg declared in 2012 that betting on HTML5, as opposed to native mobile application development, was a mistake, for a series of reasons which are easy to understand in retrospect. In 2018, Airbnb explained how, after getting a better understanding of their use cases, requirements, and the limitations of React Native, they decided to move off React Native by 2019 in favor of native mobile development tools.

Another argument for delaying some decisions is that architecture is there to support requirements. Delaying until requirements stabilize gives increased confidence that the architecture is optimizing for the right thing.

For all these reasons, Litvinsky argued that no decision is sometimes the optimal decision. In the second section of the talk, Litvinsky discussed four concrete things that can be deferred.

First are estimations. Early estimations, made before a proper understanding of the problem and the technical domain, are often unrealistic and may be best deferred. Second, developers should avoid installing any package before the need arises and is validated, and before it is assessed that the need for the package does not actually stem from a design issue.

Litvinsky mentioned creating abstractions as the third activity which should not be rushed, and quoted the Rule of Three. The rule appeared in the 2004 edition of Facts and Fallacies of Software Engineering which stated:

There are two “rules of three” in [software] reuse:

  • It is three times as difficult to build reusable components as single use components, and
  • a reusable component should be tried out in three different applications before it will be sufficiently general to accept into a reuse library.

The rule of three, applied to component decomposition in component-based UI frameworks, thus recommends isolating a reusable component only when there are at least three different uses of that component.
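As a minimal sketch of the rule of three (the names and markup are hypothetical, not from the talk), a piece of rendering logic is extracted into a reusable helper only once the same logic is needed in a third place; plain functions stand in for UI components to keep the example framework-free:

```typescript
// Hypothetical illustration of the rule of three: after the same status
// label markup is needed in three places (orders, users, invoices), it
// is worth extracting a reusable helper.

type Status = "active" | "pending" | "archived";

// Extracted once the third use case appeared:
function statusLabel(status: Status): string {
  const colors: Record<Status, string> = {
    active: "green",
    pending: "orange",
    archived: "gray",
  };
  return `<span class="badge badge-${colors[status]}">${status}</span>`;
}

// The three call sites that justified the extraction:
const orderBadge = statusLabel("pending");
const userBadge = statusLabel("active");
const invoiceBadge = statusLabel("archived");
```

Extracting the helper after only one or two uses would have forced a guess about which parts need to vary; the third use makes the required generality visible.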

Similarly, the decision to break a monolithic application into microservices can often be deferred. Microservices have clear advantages, specifically when an application is experiencing scalability issues, but come with increased complexity and organizational and testing overhead. Litvinsky advised waiting for the domain and organizational boundaries to emerge before considering that option. This is in line with the advice given by Jan de Vries, Microsoft MVP on Azure, in his talk at MicroXchg Berlin, in which he argued that a properly built monolith is in many cases superior to a microservices-based system.

The third section of the talk introduced real options as part of a thinking framework that lets practitioners decide when to defer a decision. Real options in the context of software engineering were introduced by Chris Matts and Olav Maassen. A real option is the right, but not the obligation, to take some action prior to an expiration date. Real options have a price, which is the cost of the flexibility provided by the option. They also have a value, which is the benefit provided by delaying a decision. A typical example is a booking cancellation option, which gives the buyer the right to cancel a booking up to one day before arrival.

A presentation at Real Options Agile Tour Brussels presented the expected benefits of options:

A good architecture creates option for your team, your organisation and your customer. Creating and maintaining the options, is continuous, daily work in small steps. Otherwise you create legacy systems that contain fewer and fewer options.

In a software development context, experimentation systems that use feature toggles allow product owners to postpone or reverse the decision of when a certain feature or piece of code will be exposed to all users.
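A feature toggle can be sketched in a few lines (the toggle and function names here are illustrative, not from the talk): the code path ships with both variants, and the decision of which one users see is deferred to configuration rather than fixed at development time.

```typescript
// Minimal feature-toggle sketch. The decision of when to expose the new
// behavior is kept open: flipping the toggle reverses it without a
// code change or redeploy of the calling code.

type Toggles = Record<string, boolean>;

const toggles: Toggles = {
  newCheckoutFlow: false, // flip to true to expose the new feature
};

function isEnabled(feature: string): boolean {
  return toggles[feature] ?? false; // unknown toggles default to off
}

function checkout(): string {
  return isEnabled("newCheckoutFlow") ? "new checkout" : "legacy checkout";
}
```

In a real experimentation system the toggle values would come from a remote configuration service and could be varied per user segment, which is what lets product owners roll a decision back after release.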

In the last section of the talk, Litvinsky focused on coding techniques that support deferring decisions. The first technique mentioned is Test-Driven Development, which is both a design and a testing technique. TDD allows developers to delay two decisions: whether or not the functionality under test will have dependencies, and how those dependencies will be implemented. Spikes can be used to take an actual problem, create a naive and fast solution to explore it, and then decide how to continue. Lastly, the modular monolith pattern may allow a team to delay, until it becomes necessary, the decision to move towards a microservices architecture.
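How TDD defers a dependency decision can be sketched as follows (the interface and function names are hypothetical): the test drives out a narrow interface and an in-memory stub, so the choice of the real dependency, whether Redux, MobX, or a REST client, stays open.

```typescript
// Sketch of TDD deferring a dependency decision: the code under test
// depends only on a narrow interface, not on a concrete store.

interface CounterStore {
  get(): number;
  set(value: number): void;
}

// The functionality under test; it neither knows nor cares how the
// store is ultimately implemented.
function increment(store: CounterStore): number {
  const next = store.get() + 1;
  store.set(next);
  return next;
}

// An in-memory stub is enough to write the test first; the decision
// about the real implementation is postponed.
function inMemoryStore(initial = 0): CounterStore {
  let value = initial;
  return {
    get: () => value,
    set: (v) => { value = v; },
  };
}
```

When enough is known to pick a real store, it only has to satisfy `CounterStore`; the tests and the `increment` logic remain unchanged.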

As mentioned previously, deferring decisions has a value, but also a cost. Postponing too long may eliminate alternatives that would have been optimal. It may also negatively affect other teams, and making things easy to change necessarily adds complexity. Litvinsky thus concluded that developers should never commit early unless there is a good reason, and should strive to create options that allow decisions to be deferred or reversed, while constantly evaluating the cost of the deferral.

ReactiveConf is a yearly conference targeted at developers, with talks addressing the latest technologies and trends in software development. ReactiveConf 2019 took place from October 30 to November 1, 2019, and was the fifth installment of the conference.

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.