Month: August 2022
MMS • Johan Janssen
Article originally posted on InfoQ.
Just over two years since it was introduced to the Java community, Spring Authorization Server 1.0 is planned for a GA release in November 2022. The Spring Authorization Server project replaces the Spring Security OAuth project that has already been declared as end-of-life. The project is led by the Spring Security team and delivers support for OAuth 2.1 Authorization Server for Spring applications.
The project is based on Spring Security 6.0 which depends on Spring Framework 6.0 and requires at least Java 17 and Tomcat 10 or Jetty 11. The public APIs and the configuration are still being improved, which will result in breaking changes for consuming applications.
GitHub’s Milestones display the various upcoming milestone releases and release candidates leading to the release of Spring Authorization Server 1.0. Additionally, Spring Authorization Server 0.4.0 will be released based on Spring Security 5.x and Java 8.
First introduced ten years ago, Spring Security OAuth evolved into a popular project supporting a large portion of the OAuth specification. It was the basis for OAuth solutions in various projects, both for the consumer and provider side, such as the CloudFoundry User Account and Authentication (UAA). Both OAuth 1.0 and 2.0 were supported, although 1.0 is now obsolete. Unfortunately, the implementation didn’t support some user scenarios and a large part of the implementation was written by the Spring team.
Written from scratch solely for OAuth 2.0, Spring Authorization Server is based on the Nimbus library, supporting more features such as JSON Web Token (JWT) claims, OpenID Connect (OIDC) and reactive programming.
VMware Tanzu offers both Open Source Software Support and Commercial Support for Spring Authorization Server.
The Spring project welcomes contributions and recommends reading the contributing documentation for Spring Authorization Server.
New Stanford Compute-In-Memory Chip Promises to Bring Efficient AI to Low-Power Devices
MMS • Sergio De Simone
Article originally posted on InfoQ.
In a paper recently published in Nature, Stanford researchers presented a new compute-in-memory (CIM) chip using resistive random-access memory (RRAM) that promises to bring energy efficient AI capabilities to edge devices.
The fundamental idea behind the new chip is eliminating the separation between the compute and memory units, thus enabling AI processing within the memory itself. This makes it possible to eliminate data transfers between memory and CPU and increase the amount of predictive work that can be carried out with limited battery power.
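To make the compute-in-memory idea more concrete, here is a minimal Python sketch of an idealized resistive crossbar, a textbook picture of how memory cells can perform a matrix-vector multiplication in place. It is an illustration under stated assumptions, not the NeuRRAM circuit: the function name `crossbar_mvm`, the conductance values and the input voltages are all hypothetical, and real devices add noise, non-linearity and their own sensing schemes.

```python
# Toy model of an idealized resistive crossbar (an illustrative assumption,
# not the NeuRRAM design). Each weight is stored as a conductance G[i][j],
# in siemens, at the crossing of input row i and output column j.


def crossbar_mvm(conductances, voltages):
    """Return the per-column currents I[j] = sum over i of G[i][j] * V[i]."""
    if not conductances:
        raise ValueError("the crossbar needs at least one row")
    if len(conductances) != len(voltages):
        raise ValueError("one input voltage is needed per crossbar row")

    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for row, v in zip(conductances, voltages):
        if len(row) != n_cols:
            raise ValueError("all crossbar rows must have the same width")
        for j, g in enumerate(row):
            # Ohm's law gives each cell's current (g * v); Kirchhoff's
            # current law sums the cell currents along each column wire.
            currents[j] += g * v
    return currents


# Hypothetical 3x2 weight array (conductances) and 3 input voltages.
G = [
    [1e-6, 2e-6],
    [3e-6, 4e-6],
    [5e-6, 6e-6],
]
V = [0.1, 0.2, 0.3]

print(crossbar_mvm(G, V))  # one output current (in amperes) per column
```

In this picture the weights never leave the array: the multiply-accumulate happens where the values are stored, which is the data movement that a conventional design spends energy on.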
Dubbed NeuRRAM, the Stanford chip stores model weights in a dense, analogue non-volatile RRAM device. This approach is not new, say the researchers, but such devices must still prove they meet their promise.
“Although efficiency, versatility and accuracy are all indispensable for broad adoption of the technology, the inter-related trade-offs among them cannot be addressed by isolated improvements on any single abstraction level of the design.”
To address those conflicting requirements, NeuRRAM provides high versatility in reconfiguring its cores to better adapt to diverse model architectures, according to the researchers. Additionally, energy efficiency is twice that of previous RRAM CIM chips with accuracy comparable to existing software models across various AI tasks:
“We report fully hardware-measured inference results for a range of AI tasks including image classifications using CIFAR-10 and MNIST datasets, Google speech command recognition and MNIST image recovery, implemented with diverse AI models including convolutional neural networks (CNNs), long short-term memory (LSTM) and probabilistic graphical models.”
Admittedly, those are not the largest models in use nowadays, but it should not be overlooked that NeuRRAM is addressing applications running on the Edge or on low-power devices. Typical applications for such devices are wake-word detection for voice-enabled devices and human detection for security cameras. In the future, though, with the further evolution of this technology, “it could power real-time video analytics combined with speech recognition all within a tiny device”, says Weier Wan, first author of the paper.
The NeuRRAM chip is currently fabricated using a 130-nm CMOS technology and Stanford researchers are at work to understand how they can scale this technology while keeping it energy efficient and reducing latency. Their hope is that scaling the current design from 130-nm to 7-nm technology could deliver higher energy and area efficiency than today’s state-of-the-art edge inference accelerators.
MMS • Kate Wardin
Article originally posted on InfoQ.
Transcript
Wardin: My name is Kate Wardin. I’ll be talking about onboarding to a new team during a global pandemic: a houseplant’s story. I want to start by telling you a story of Sig the Fig. I purchased this plant in 2019 before the pandemic hit. The greenhouse employee as we were shopping warned me several times that this was not a great fit, a great starter plant for someone like me who is new to plant ownership. In fact, they referred to figs as the divas of the plant world. They are high maintenance and require a bit of knowledge and care to keep healthy. I was confident in my abilities and from the tips I was reading from my fellow plant mom bloggers that I was up for the challenge. Shortly after we brought Sig home, I actually ended up leaving for a couple weeks in December. I was devastated when I discovered that Sig didn’t make it. It turns out the combination of sitting near a drafty window in the harsh Minnesota winter mixed with no water for almost a month was not the loving welcome that I had intended.
A Better Experience: The Monstera Deliciosa
Let’s talk about a better experience, Miss Swiss, my pride and joy. After Sig passed away, I decided to take a couple months off plant parenting. The pandemic hit, of course, at that point, and like several others across the world, I needed a new hobby that didn’t require leaving my house. After doing a bunch of research on what types of plants were easy to take care of, especially in my climate, I carefully selected the Swiss cheese plant. I’m happy to say that after one and a half years, she’s still thriving. I learned a lot from the experience acclimating Miss Swiss into a new home. In 2020, like many others, I found myself working remotely for the first time. This experience brought many new challenges and also silver linings. For one, I was able to nurture that plant and also welcome several other plants into our home. During this experience, I had several opportunities to also hire and onboard several engineers. Also, I onboarded myself to a new organization. In reflecting back to Sig and Miss Swiss, I started really just contrasting those experiences, and also ways that I could translate those learnings to the experience of onboarding to a remote team.
Tip 1: Prepare the Onboarding Experience in Advance (Pre-Boarding)
I’m going to walk through seven tips for you to onboard a person to your remote team, along with hopefully a helpful tip with each for acclimating a house plant to its new home. Number one is to prepare that plant’s environment before bringing it home. How much sun does it need? Do you have the right pots? Do you have the right location? Are there any drafty windows or doors to avoid? Don’t just bring it home and plop it in the most convenient spot, do that research on that plant and the type of environment that it prefers. On the human side, of course, that means preparing the onboarding experience well before that person starts working, aka pre-boarding. As soon as the offer letter is accepted, you should continue validating that person’s decision to join your team and make them feel really welcome leading up to their first day.
One good idea would be to send an email about two weeks before their first day, including an itinerary for their first week so that they know what to expect. Ideally, setting up meetings already on their calendar so that they have a little bit of structure that first week. If possible, mailing a box of customized swag with their laptop and equipment, ideally early enough so that it arrives before day one, along with login details for email and how to connect with the team. Then a thoughtful reminder letter of why they were hired and why you are excited for them to join. Also, in that email, a request for them to send a quick bio so that you can welcome them to the team properly on their first day. Then perhaps also, a fun video of team members saying welcome just to show again that excitement of them joining.
How about preparing with the team? What can we do to make sure that the fellow team members are ready for that person to onboard? People just want to usually start coding right away, and so this means trying to speed up the amount of time it takes to onboard new team members. Like I said, let’s mail them that laptop ideally before their first day with instructions for getting set up. As a team, you can also take this as an opportunity to do an audit of your documentation. Everything from, do we have the right access request? To, do we have our release process documented? Do we have local environment? Can we walk through how to set that up in a nice documentation? Take this opportunity again to audit the things in place and make it better before that person joins.
I know a lot of teams who save specific tasks as onboarding tasks. Perhaps those things that are not necessarily in the critical path, or super urgent to get done that you could save for that new hire, such as bug fixes, small enhancements, maybe adding tests or upgrading a library. Again, it just really helps that person to build confidence, get familiar with the code base quickly, and also understand that full developer workflow. One thing that could work really well is to create a really repeatable onboarding task that embodies actually a mini-version of your app, or maybe a real feature that they can contribute, that, again, your team would create, so that they can get to know that full developer workflow.
Tip 2: Introducing the Person to Mentors and Key Partners
Number two is to introduce that plant to its housemates. I have a 1-year-old, so I’m teaching her, we don’t pull on the leaves of Miss Swiss, or to our dog, Ralph, that it’s unacceptable to use Miss Swiss as a fire hydrant. On the people side, introducing that person to key mentors and partners. Being a remote employee can be super isolating, and so it’s important that we help them build relationships immediately. I’ll talk now about this onboarding trio, which includes, of course, the new hire, a peer buddy, and a technical mentor. This peer buddy is the person who is really going to be accountable for that person’s onboarding experience. Perhaps they were the person who onboarded most recently, so that they’re really an expert in what that experience of joining this team is like. What did they wish were different from when they onboarded? They’re the go-to person for any question. They’re going to block out time specifically to touch base whenever needed for the first couple months.
Then the technical mentor. A mentor is someone on that same engineering team who is going to work with the same technology as the new hire. This person is going to be more hands-on with them to perhaps help with specific technical onboarding, as opposed to that peer buddy who’s going to maybe focus more on the cultural onboarding. That technical mentor is going to be, again, maybe a more seasoned engineer to help them with perhaps their first more complex feature that they pick up. This person is knowledgeable about the system, processes, domain, has also demonstrated that they can give constructive feedback and also explain complex concepts. The last thing we can do just as a team to really speed up that get to know you process is to perhaps record an interview with that new person, and then share it with the team so that they can all get to know some of those introductory questions about that person.
Tip 3: Introducing the Person to the Team’s Ways of Working
Number three is introducing that plant to its new environment, or introducing the person to the team’s ways of working. If we think about some of those things that in-person might be easier to pick up on, let’s make sure that we explicitly communicate and write out these types of things so that the person isn’t left guessing. What are those teams’ core hours? That time that the people agree to be online and available to collaborate? When are those recurring meetings? Is it customary to have your video on? How do people keep each other updated on their work in progress?
Tip 4: Share As Much Context without Overwhelming
Number four is to feed your plant to help it grow, so getting on that watering and fertilizing schedule. Or, sharing as much context as possible without overwhelming the person. Of course, we want to set up that person for success, and that means providing the right amount of context to help them get up to speed on the organization, their role, and where it fits within the organization. We don’t want to overwhelm them. We already know it can be overwhelming. It can feel like you’re drinking from the fire hose, as we start at a new company, especially when onboarding virtually, because it can be so easy to send hundreds of links and readings and documents with a couple clicks. Let’s be really mindful and consider what topics, resources, and artifacts are going to help this person get up to speed without overwhelming them or distracting them with unnecessary detail their first week.
In my experience, I got this really nice, well-curated, prioritized reading list that contained details specific to my role. You could also set up automated emails to share key milestones and the associated reading lists, reminding that person of helpful links, what is important for them to consume when, and learn about at that specific time during their onboarding experience. Again, this is going to help to not overwhelm them on that first week. What else could this list include? The first week, it’d be helpful at a minimum to know who’s who. Again, we’re not going to be in an office to be able to more casually and actually get introductions. Include the names of that onboarding trio, HR business partners, the leader of that person, also a snapshot of the teams and names of the organization, and maybe the top level groups in engineering and how that person might interact with them. Then, additionally, a list of people, a recommendation of who to start meeting with in maybe that first week or the first month. Also, can you visualize the major pieces of the infrastructure that that team owns or supports? Perhaps that includes links to other doc sites or information, as well as the point of contact on a team that they can go to for questions in the future.
Also, do you have a list of other memos that person should get up to speed on, whether they’re high level objectives over the next couple months, tech 101, again, release processes? What are those specific things to your organization that will be good to know right away? Lastly, just a list of all those bookmark worthy links, such as team Google Drives, Slack channels, acronyms, HR systems, travel portals, code repos, project documentation. How does that team track progress? How do we communicate? What are some key milestones and goals for the next couple months? Communication is going to be of paramount importance when working remotely. We have to make sure that whenever possible, we can document decisions both verbally and written. Those water cooler chats that sometimes lead to a decision, make sure we document that really well so that other people can weigh in or ask questions asynchronously. Because that new hire isn’t going to be able to simply turn around and ask someone a question. Also making sure that they know that it’s ok to reach out for help, even if setting up a call requires a little bit more friction.
Tip 5: Make It Easy for Them to Do Their Jobs
Number five, as I learned the hard way, my plant was not set up for success sitting next to that drafty window during a brutal Minnesota winter. Making it easy for that plant to thrive in its new environment, or making it easy for that person to do their job at home. If possible, maybe you have an allowance to make sure that they can afford the equipment they need to do their best work, such as noise cancelling headphones, a second monitor, upgrading their home internet connection, or even maybe subscribing to a coworking space.
Tip 6: Setting Clear Expectations
Number six is to set clear expectations. As part of my onboarding experience, I was given these really useful milestones to help me understand what my journey might look like so that I could set expectations and feel really confident that I was meeting them at that time. At the end of that onboarding experience, this person should know where and how they can access information that they need on an ongoing basis. They also should know what is expected of them. What should they be working on? How do they know that they’re making the appropriate impact at that time? Also, on the lines of setting expectations, discussing team signals, again, some of these things that are just more difficult to pick up on if you’re not sitting in an office with each other. Documenting really well those consistent boundaries and signals for each other, for everyone in the team to know each other’s availability. How do you signal to your team if you’re available, connected, or unavailable and offline?
Also, we want to know of course that flexibility isn’t a one-size-fits-all solution, so focusing on establishing expectations with that person, as an individual, providing the best options for them with the team, and of course, the organization. Making sure that people know that if you’re in a different time zone, remind them that you don’t have to respond immediately to those off-hours emails or communication. Of course, the results that you deliver are going to be more important than the time that you spend online.
Tip 7: Facilitate Ongoing Connection Points
We have to continue feeding that plant to help it grow, of course. This onboarding journey is going to last a lot longer when onboarding remote. There are cultural learnings, getting to know people, those interpersonal interactions that just feel a little bit slower when we’re getting to know a new team and an organization virtually. I cannot speak highly enough about the Planta app. It allows you to set up each of your plants and determine which room is best to put it in your house, depending on the lighting and temperature, maybe it is outside. Then it tells you when to water, when to fertilize the plant. I highly recommend the app, Planta.
Back to that human tip. We want to make sure that we’re facilitating ongoing connection points. I’ll talk through some of these in detail, so those one-on-ones. That time to check in on that person. How are they managing work, home, and wellness? Where do they need help? Office hours, so perhaps engineers on the team can host office hours where they can know that they can hop on to ask any question that they have. Then also, team connection. We want to make sure that we are setting business aside and talking about topics of that team’s choosing to help them get to know each other and build camaraderie. One thing my leader did was started a thread in our team channel when I joined to have my peer colleagues share a tidbit of advice for me as I was onboarding. This was a really helpful way to get to know folks and feel really welcome on the team. One-on-ones are an essential way to take a pulse on that person, make sure that they have everything they need. Week one, here are some examples of questions that you could ask. Of course, we’re having one-on-ones, ideally, on a weekly basis with that individual, so here are some questions to ask for the next couple weeks.
As important as those one-on-ones, of course, we want to make sure that that person gets acclimated to the team and gets to know the team, starts to build those trusting relationships. How do we do that? A lot of it is to, again, set that work chat aside and focus on things outside of work. What are their hobbies? Can we create water cooler type channels, such as dogs, pets, for them to exchange pictures of their pets or house plants, gardening? What are those hobbies that they enjoy, and that they can connect over? Also, every Monday, I like to do a photo of my weekend thread. It just really helps to humanize our fellow colleagues and see, and connect about hobbies, and what team members are doing. Of course, these are all optional options for team members not to feel like they have to participate. Virtual games, another fun way is to start team meetings with an icebreaker question for us just to get to know each other.
I just want to note that onboarding shouldn’t end after the first week, month, or of course even year. It can be an easy way just to take a pulse on that person via a survey. A survey that includes questions of things that we care about. We want to make sure that the person is confident in those tools and technologies, has an opportunity to establish relationships, have quality one-on-ones, know expectations of their role. Know what the goals of that team in the organization are, the short and long-term goals. Also, of course, just feeling welcomed and productive. We can take a pulse by sending surveys, and then monitoring and measuring that over time. Then, lastly, we can involve that person who just onboarded to help us improve the experience for folks who are onboarding in the future. A great way to add value is to contribute to that onboarding experience of the team.
Key Takeaway
I hope you’re able to leave with some ideas for making the onboarding experience more enjoyable on your team. I also want to note that while this presentation, of course, aims to bring some joy and make a little bit of light of the situation, a lot of people are really struggling in their situation. Regardless of the tips, if you take away one thing, please just give each other grace and patience, as everyone has different levels of challenges that they are working through as a result of this pandemic.
Questions and Answers
Verma: Would you choose a different set of questions based on new hire seniority? You talk about the questions you should have. Let’s touch upon that, give us more detail.
Wardin: I think it’s always good to cater your questions and discussions based on what that person needs as an individual. As you’re getting to know them, their seniority could come into play based on the questions that you ask. For example, perhaps it’s someone who maybe it’s their first role in the industry. Or probably asking questions like, can I help connect you with any mentors, or what are you interested in learning about? Of course, the learning about can be for anyone with any seniority because we want to make sure folks are constantly learning in their roles. I would say getting to, what are those fundamentals and those essentials for that person to really feel successful? That’s really going to change based on their seniority, whether it’s someone brand new to the industry, or just brand new to your organization who’s getting acclimated to the different tools that your team uses? Yes, that is absolutely something to take into consideration.
Verma: Also implicit to seniorities, this is new. If you’ve worked in industry for a long time, then you’re probably used to certain ways. Does that spark any thoughts? Do you do something different? If you know people are coming from a non-remote background, how do you approach that versus somebody who’s been there?
Wardin: Folks who maybe remote is newer to them, or they’ve had several years of working co-located with a team, you could share some things that have been successful for your team, those team norms that I walked through, like here’s our ways of working. Here’s what’s worked well for us. Here’s maybe where we would love to hear your insight for where we could do better. Then maybe someone who’s coming to your organization, who, again, has maybe a lot of experience working remote, invite them to help your team continue to find what that looks like, based on what’s worked well for them. A lot of times we’re trying to break some habits of just the symptoms of working co-located for so long. I think that is one thing, is like just being really explicit and setting those expectations, and also inviting them to help you improve your team processes too.
Verma: That’s true. It’s a team effort and involving them early on makes a lot of sense, just by asking questions, it seems like.
Since we are talking about tenure, a slightly related question is the time based progress. Let’s say somebody did join your team, what questions do you ask them along the way, as they complete their first month, let’s say three months, six months? Do you have a framework you use to set them up for success there?
Wardin: I had one slide on after one month, and then after two months, and after three months, and some of the questions after of course we get the foundations of like, I’m confident with my ecosystem. Who are the people I work with? What are our ways of working? After maybe that month or two months, like, do you have any questions or need any more context on our organizational goals, or the specific things or priorities that we’re going after as a team? Now that they have that foundation and can start looking into like, how does my work tie into the bigger picture? Or after that three months like, do you have a good idea of what you can do to have a really successful career here? What does success look like to you? Or, does your experience match your expectations when you joined? You could ask that at the one-month, two-month, three-month time period. Then also like, how productive do you feel? What’s getting in the way of your productivity? Do you feel welcome? Do you feel like you have an equal voice compared to the rest of your colleagues? Do you understand what’s expected of your role? How are those relationships that you’re working to build? I think progressing through the months after they’ve onboarded, to get into some of those questions to understand, how are they embedding into the organization and also starting to provide value? How confident do they feel? Do they feel like they’re supported?
Verma: All this points to the preparation. As a hiring manager, bringing people into your organization, you’re well prepared to set them up for success. How much should the company lean into creating a generic pre-boarding or preparation for onboarding somebody, versus how much should it rely on individual teams to do things, in your opinion?
Wardin: I think that really depends on the organization. There’s always going to be a minimum of links, like, here’s our company mission, company values. Those things that are going to be consistent across teams. At least 80% of that onboarding template or plan will be customized for that person, their seniority. What you think that they’ll need support in. As you’re learning more about them, you’re going to be tweaking it. Saying, based on our conversations last week, here are some additional things that you could read, your team, that organization that they sit in. I think that’s going to be a little bit more customized based on their role, at least in my experience. I think 20% of it could be, here’s the company, here’s our priorities for the year, tech setup 101 courses, and links like that, that you could absolutely reuse.
Verma: One thing that stood out to me as you were introducing your particular situation, like you were a manager, there’s a duality to it. You are onboarding as an employee, and you are then onboarding into a team also and getting to know the team. Maybe, can you point to things your team did to onboard you? In other words, what are the things you would do to onboard your boss remotely?
Wardin: My team was amazing. All of my one-on-ones for the first month, I just had a list of questions that I was asking, and they would help really onboard me, provide context. Each of them are leading somewhat like independent projects, or at least are captains for those projects. Within the first one-on-one, they would provide a recap, or like, what’s going on right now, overview of the mission and what that project was aimed to accomplish. That was a wonderful thing that the team could help me as a manager get up to speed on, as well as just like a high level overview of themselves, their career, what they were looking for. As a manager, especially, those are things that I really want to do immediately is get to know my team and how I can best help them. It’s the irony of like, they’re helping me but I’m helping them. To me success is like, how can I help my team? Being so new I’m like, “I don’t know what to do here.” I think it is the accountability of the team to of course help onboard their manager when they are new. Then of course, I leaned on my peers, my colleagues, and then my leader a lot to help provide me the context of my role in my team too.
Verma: I’m going to switch to the topic of peer buddies. Let’s talk about how you set up peer buddies or train them. What coaching do you provide them, so that they can be really effective? Of course, in the beginning part, we were all scrambling, and we did it as we went. Do you have any suggestions for how somebody could set up peer buddies for success?
Wardin: For the peer buddy role, that one, I look to someone who either onboarded most recently onto the team. Maybe even they onboarded remotely, within the last year and a half or two years, and so that they understood what were those gaps that they wish they had, so that they were really familiar with how was that experience onboarding, personally, and so that they can help that person. Also, that peer buddy, in your question about getting them ready, making sure that they’re really familiar, of course, with the code base, all the tools used, because they’re going to be the go-to person that that person is going to want to reach out to. Also, that they are feeling really prepared and confident to give feedback. Making sure that that person is willing and also coached to be able to practice some of that with you, perhaps, before the new hire joins. Also, that they have time carved out. They’re not on a project, perhaps, like in the critical path of like, we have a deadline for this. Maybe that person, if that person is in the critical path isn’t the best person for a peer buddy knowing that they’ll probably be pinged throughout the day, perhaps a lot of context switching as they are really dedicated to helping that other person join the team.
Verma: The last point is actually super important. You cannot just add one more task and then make them feel like, this is maybe an aside task you have to do. It’s probably a part of their main job. It’s really important for them to bring somebody on board and make them feel welcome, set up for success and all that.
Wardin: Maybe it’s that peer buddy is working on something that would also make sense for that onboarding activity, like, why don’t you just pair up? You could do some pair programming, and that person could just observe you as you’re working, if it’s something that’s straightforward, and they can also demonstrate various areas of the code base as they’re working on it, but again, isn’t super time sensitive so they don’t make it a stressful experience.
Verma: Make the time and space, always be aware of that.
Let’s take the same peer buddy thing and talk in the context of maybe you as a manager having peer buddies, or maybe think about if your boss were to have a peer buddy, what would that look like? Does your advice change or modify in any way?
Wardin: I had a peer buddy type role, a mentor, because he, of course, wasn’t on my direct team, he was a leader of another team. He had onboarded within the last six months, and so that was wonderful because he could even say, “I went through the same thing. Here’s what I did.” The advice was very timely. He was also still seeking to understand how the organization works. Sometimes we’d go together and say, here’s something we’re a little confused about, how could we clarify this for the next person? He was super approachable and always available for me to reach out to.
Verma: With leadership positions, obviously, decision making and culture are super important, how you, for example, give direction to the team. How far do you go in terms of leaning in towards providing context or actually telling people what needs to be done? There’s a continuum there. As a leader, how can peer buddies help them find that balance, that, at this company, we do things this way, and that will be the best way for your teams to take off?
Wardin: Culture specifically is something I did lean on my peer buddy for a lot, because at Netflix we do exactly what you said. We want to lead with context and not control. There’ll be some situations in which I’d say, there’s a lot of chaos here. I want to apply a process or I just want to say, we should do this this way. Especially being new, I think that I should do a lot more observing, and then leaning on my peer buddy to say, “This is my gut reaction. Can you just check my thoughts and feelings and let me know if there was some type of ‘handbook?’ How would I go about this like the X organization way?” Or, “What would you do in this situation?” That was very helpful to have someone just to bounce ideas off of. Also, one thing as you’re onboarding folks: have a lot of patience. Especially with remote teams, that culture is going to be a little bit more difficult to pick up on, unless it’s explicitly written out, or verbalized, or communicated. I think being really direct about the cultural norms that are special and valuable to those teams, and then to the organization, is going to be that much more important too, as you’re helping people join that new culture.
Verma: As a peer buddy sometimes, I end up thinking of myself as providing the director’s cut of whatever happens. Like a meeting happens later, you could tell the person, you noticed this happened, and this is why that is. Replaying the events and giving them more context. It can be really interesting.
Wardin: I love that because you can say, yes, I didn’t want to call you out right there, but we could go through this meeting or this occurrence that I witnessed, and here’s some feedback, or here’s what I would have done differently, or this went really well and here’s why. I like that, director’s cut.
Verma: Let’s talk about hiring people and onboarding people. One of the things that struck me was this effort around good onboarding starts before the onboarding. I would just take it all the way to the extreme, which is, when you advertise your open job, so maybe it even starts there. Can you speak to, like now when you go out and hire people, what signals do you start sending at the very beginning? Walk through that continuum or timeline?
Wardin: Retention starts at that hiring process, or that like posting a job, because we want people to really understand and start imagining themselves in those roles, to your point, as early as that job description. How can you bring that culture to light, so that people as they’re applying for the role know like, this sounds like I would enjoy working here, or I fit in, or I would add to this culture. What barriers could they imagine that they could then ask in the interview? Or, how does this go? It could just really provide some really good context to have a lot more productive conversations throughout the recruiting process. Then, of course, the onboarding process. I think everything should be set up to make sure that it’s as inclusive also as a hiring, recruiting, interviewing, onboarding process as possible. Also, so that person just knows what to expect. I don’t know if that’s just like a, my personality thing, but I loved being prepared for interviews too, like, here’s what you can expect. Why not? I think that’s so wonderful to help folks feel as prepared as possible, and welcome, and included throughout that process.
Then also, sending signals. I got a lot of really useful information just to help understand exactly what that first week would look like, so that I could mentally prepare. Like, the first day I might have quite a few meetings, whether it’s the formal onboarding meetings or meetings with these different people. I loved being able to mentally prepare for what that week would look like, before I even had access to my calendar that first day. What helped me lessen those nerves, and the anxiety that comes with joining a new company, was just knowing as much as possible about what I was going to face that first week. That can start with job postings, like, here’s what it means to be on this team. You can just start to prepare, do some research.
Verma: Why not? We should go out there and let people know, what is it to work in the team explicitly? Give them examples of things you do, or even like, have you considered making videos or other artifacts available outside for candidates or potential hires to consume?
Wardin: I think that’d be so cool. Yes, if you did a demo of what your team does. Maybe it is not proprietary at all, and you could share a demo of the app, or at least the tech stack, things like that, or just like, welcome the different members of your team. I think that would be great, in the job posting.
Verma: There’s a question around Zoom. Socializing over Zoom feels really awkward, especially for new people. For people who have been in the team, you can form rapport and you’re probably ok with it after some time. How do you get around that awkwardness of initial socializing over Zoom?
Wardin: I would say more one-on-ones than if you were in-person, so every week, depending on the size of your team. Also, just like the icebreaker questions. I like to do a photo of your weekend. Like this morning, we all got to see what each other did this weekend in Slack. We could talk about it. That brings up topics that you could discuss. Like, “I see you dressed up as this for Halloween, how did it go?” Stuff like that. Coming prepared with questions helps me try to get over that awkwardness of, you can’t pick up on the nonverbals. Being as prepared as possible with questions, but also, just acknowledging that this isn’t human nature to talk to a screen, and so just have patience and grace for each other and know that they’re likely feeling the same awkwardness and that’s ok, too. We don’t have to make it totally natural because it’s never going to be.
MMS • Alyssa Ransbury
Article originally posted on InfoQ. Visit InfoQ
Transcript
Ransbury: My name is Alyssa Ransbury. I’m going to talk about how we’re protecting user data via extensions on metadata management tooling at Square. I’m currently a security engineer on the data security engineering team at Square. I work on libraries and services that enable strong data security practices across the data lifecycle. Prior to my current position, I spent almost two years on our sister team, privacy engineering. There, I worked on laying the groundwork for Square’s current data inventory tooling. I also spent a lot of time thinking about compliance and how metadata could be used to help guide development of preventative data security tooling.
Outline
In the first part, I’ll start broadly with an overview of metadata management tooling. In part two, I’ll talk a little bit about how we use this type of tooling at Square. Then I’ll transition into talking about some very specific work my team and I have done to use metadata to prevent data leaks via printed protocol buffers.
Metadata
To get started, let’s go back to the basics. When we talk about metadata, what do we mean? In a basic obvious sense, it’s data about data. There’s nuance and complexity in how we can mentally compartmentalize this information. When we talk about data, we can mean a single item, an aggregate of many items, or an entire database. At each of these three levels, we can expect the data to have three main features, content, what the data contains or is about. Context, the who, what, when, where, and how associated with the data’s creation. Structure, a formal set of associations within or among data. I’ve also seen this broken down into structural, formatting, and storage. Descriptive, the description, usage and context of the data. The relationship, the linkages or lineage of the data.
Why Care About Metadata?
Why should we care about metadata? First, over the last few years, we’ve experienced an increasing need for data governance to help manage regulatory and compliance requirements, while still enabling teams to use data safely, what some call data enablement. Laws like GDPR and CCPA have given companies the opportunity to review their data holistically from a data privacy and protection perspective. Metadata can be used to guide whether internal data can be used or shared. Rather than blanket denying or allowing certain data use patterns, companies can use metadata to allow specific actions in regards to specific data, while disallowing or blocking everything else. Second, metadata can provide increased business value for a company. Extra contextual information can help data teams choose the most high quality data, leading to more trustworthy and accurate analytics. Last, metadata can help companies mitigate risk. If we know who the data is about, how it’s secured, and what kind of data it is, we can insert additional controls at various stages of the data lifecycle to ensure that it is handled properly.
Metadata Management Tooling
We know what metadata is, and we know why it’s useful. Now let’s talk about tooling built specifically to help with metadata management. At a high level, these tools do a couple of things. They oversee collected data across its lifecycle and help us track associated metadata. They also provide a common interface for people to interact with metadata, collaborate, and effectively manage their work. They link automated metadata and human added metadata effectively.
Common Capabilities
More specifically, metadata management tools often provide some or all of the capabilities listed on this slide. I’ll quickly run through what each of these means, even if you don’t choose a metadata management tool that provides each of these things. It’s also possible to mix and match. First on this list is data inventory. Data inventory is the act of ingesting and translating data so that we can understand and store answers to questions like who, what, where, when, and how. Data enrichment is the act of making the data more meaningful. This could mean writing code to automate gaining a deeper understanding of the data, like checking actual data values, and adding a PII tag if it’s a phone number in plaintext, so that the data is subject to more rigorous privacy controls. Data lineage is the act of understanding the origin of data, making it easier to identify downstream effects of changes. For example, data lineage would help us understand which tables were derived from some parent table. If we have metadata stored about the parent table, we can make educated assumptions about what’s in the children tables.
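The lineage point above, that metadata on a parent table supports educated assumptions about its children, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table names, the tag name `PHONE_NUMBER`, and the function `suggest_tags` are hypothetical and do not come from the talk or from any specific tool.

```python
from collections import deque


def suggest_tags(children_of, known_tags):
    """Propagate tags from annotated tables to every table derived from them.

    children_of: dict mapping a table name to the tables derived from it.
    known_tags:  dict mapping a table name to a set of tags already recorded.
    Returns a dict of *suggested* tags for descendants; these are guesses to
    review, not confirmed metadata.
    """
    suggestions = {}
    for parent, tags in known_tags.items():
        queue = deque(children_of.get(parent, []))
        seen = set()
        while queue:
            child = queue.popleft()
            if child in seen:
                continue  # guard against revisiting tables (and cycles)
            seen.add(child)
            suggestions.setdefault(child, set()).update(tags)
            queue.extend(children_of.get(child, []))
    return suggestions


# Hypothetical lineage: users -> users_daily -> users_report, users -> signups
LINEAGE = {"users": ["users_daily", "signups"], "users_daily": ["users_report"]}
KNOWN = {"users": {"PHONE_NUMBER"}}
SUGGESTED = suggest_tags(LINEAGE, KNOWN)
```

A breadth-first walk is used here only because it is simple to read; any graph traversal would do for the same purpose.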
Active metadata management is when we augment the metadata we have with human knowledge, and do things like use metadata to drive automation with machine learning. User experience can be a really important element of metadata management tooling. Storing metadata is only useful if it is usable. The best metadata management tools offer an efficient way for people in various roles across an organization to interact with and use metadata to work faster and more effectively. Business semantics are variations in terminology across teams. A good metadata management tool provides an easy way to link data that is referred to slightly differently across the company. Business rules should also be made easily visible and accessible. Rules should be tied to actual data so that it’s clear what pieces of data the rules do and do not apply to. Depending on use case, and consistent with our internal and external policies governing privacy, it’s sometimes also useful to exchange metadata with third party tools. Last, a good metadata management tool will provide support for security and privacy by making it easy to visualize and manage rules and policies for specific data.
Ecosystem Overview
Over the last few years, we have seen metadata management solutions come to market with increasing maturity. Today, there are a multitude of paid, open source, and in-house examples. I’ve included some logos in each of these three categories. This is not a full list by any means. The in-house logos I’m including here are for companies who have written about their strategies for metadata management.
Initial Drivers
That was the overview. Now let’s jump into how we handle metadata management at Square. I’m on the data security engineering team. This talk skews towards security rather than business intelligence or data analytics. I’m going to speak only to how my team uses tooling to understand and manage metadata. We had a couple of initial drivers that pushed our security team in this direction. When I first started at Square, we relied mostly on manual work by individual teams to understand our data, but we had a lot of data and it was only growing. We wanted to be able to scale and automate insights into what data we stored, collected, and processed. This would allow us to not only continue to meet legal requirements from laws like GDPR and CCPA, but also aim for broader goals related to data privacy at Square. We ended up forking Amundsen, an open source project originally developed by Lyft, and became one of its early users. We added a lot of custom functionality to support metadata that could power privacy and data security initiatives. We also added some functionality that we ended up contributing back to Amundsen. In particular, we added support for using AWS Neptune as a backing store. We introduced three new metadata types through our Amundsen fork, with the express purpose of affording greater privacy protection for our data. We introduced PII semantic type, data subject type, and data storage security.
PII Semantic Type
PII semantic type is a set value describing the general contents of data in a column with the focus on discovering data that fits into one of three buckets. Sensitive by itself, could be sensitive when taken together with other data, or it links to sensitive data like an internal identification token. Our goal is not to categorize every possible type of data. With this metadata, we only wanted to categorize potentially sensitive data and bucket everything else as not PII. This information would allow us to better understand individual data for our specific privacy purposes. We developed a list of possible PII semantic types based on Square’s data policy, plus conversations with our legal team and product teams. We developed values that were specific enough to be useful, but broad enough that we didn’t end up confusing ourselves and our teammates with hundreds of different options. If two different pieces of data should be handled the same way, for example, a cell phone and a home phone, then they should be the same PII semantic type, phone number.
Data Storage Security
The next metadata type we introduced was data storage security. Examples of this could be plaintext, encrypted, or adapted. This piece of metadata specifically refers to ways the data has been manipulated to improve its security. If we had a column that held the last four digits of some sensitive information, the PII semantic type would describe the type of sensitive data. Since we were only talking about part of the original data, the data storage security value would be truncated.
Data Subject Type
Data subject type is the third metadata type we introduced. It describes the type of people the data could be about. We worked with product teams across Square to define data subject types that were specific enough to be useful and broad enough to be easy to use and apply. If we take the example from the last slide, some truncated sensitive data, we’d also want to understand what type of user this data refers to. If we had a product called Test product, the data subject type would be test product user. Just to clarify here, we expect that some columns can have multiple PII semantic types and data subject types. The column in this example may actually hold data for two different products, which would be stored as two separate relationships between the column and two separate data subject types.
Example
In this example, two columns hold data that is the same PII semantic type for the same type of user, but one is stored in plaintext, and one is encrypted. With this setup, we can now write a simple query to uncover data that we deem to be less protected based on our risk model, and make quick changes as necessary. We can also write queries that tell us which datastores contain the most sensitive data. These queries are preventative. They help us increase awareness, reduce surprises, and allow us to provide greater security around these hotspots.
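To make the example concrete, here is a minimal Python sketch of the three metadata types attached to columns, plus the two kinds of query described above. It is an illustration, not Square’s schema or query language: the class `ColumnMetadata`, the value names such as `PHONE_NUMBER` and `PLAINTEXT`, and the `LESS_PROTECTED` risk model are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    table: str
    column: str
    pii_semantic_types: frozenset  # empty set stands in for "not PII"
    data_subject_types: frozenset  # a column may relate to several subject types
    storage_security: str          # e.g. "PLAINTEXT", "ENCRYPTED", "TRUNCATED"


# Hypothetical risk model: storage states treated as less protected.
LESS_PROTECTED = frozenset({"PLAINTEXT", "UNKNOWN"})


def less_protected_columns(columns):
    """Sensitive columns whose storage security falls in LESS_PROTECTED."""
    return [
        c for c in columns
        if c.pii_semantic_types and c.storage_security in LESS_PROTECTED
    ]


def sensitive_column_counts(columns):
    """Number of sensitive columns per table, to highlight hotspots."""
    counts = {}
    for c in columns:
        if c.pii_semantic_types:
            counts[c.table] = counts.get(c.table, 0) + 1
    return counts


# Two columns with the same PII semantic type and data subject type,
# one stored in plaintext and one encrypted, plus a non-sensitive column.
COLUMNS = [
    ColumnMetadata("orders", "contact_phone", frozenset({"PHONE_NUMBER"}),
                   frozenset({"TEST_PRODUCT_USER"}), "PLAINTEXT"),
    ColumnMetadata("profiles", "phone", frozenset({"PHONE_NUMBER"}),
                   frozenset({"TEST_PRODUCT_USER"}), "ENCRYPTED"),
    ColumnMetadata("orders", "item_count", frozenset(), frozenset(), "PLAINTEXT"),
]
```

With this data, `less_protected_columns(COLUMNS)` returns only the plaintext phone column; the non-sensitive `item_count` column is ignored even though it is also stored in plaintext.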
Flagging Possibly Sensitive Data Locations
Aside from the additional metadata types, we also tried to find ways to automate data inventory where possible. Because we have a lot of data, we still face the tall challenge of keeping the information up to date, and ensuring that information given to us by data owners remains correct over time. New data sources also get created all the time. Reducing the burden on engineers and analysts at Square who own the data source to properly annotate their tables when they’re created can lead to less work for them and better metadata for us. Mistakes happen. We take privacy quality control very seriously, which means a lot of double checking. While a data owner might tell us that something is not sensitive, if the name of the column is user phone number, it might make sense to take another look. Checking the first few rows of a randomly named column could also reveal sensitive data that someone missed.
Just to give a really rough example. Let’s say we have a table column and it has the data storage security type of encrypted. I’m a data user and I know that this recently changed. I know that we’re actually storing this information truncated instead of encrypted. To reduce manual work and with future scale in mind, we introduced a concept of being able to flag a column if metadata is missing or incorrect. Flags are meant to highlight data storage locations that we’ve judged as likely to hold sensitive data, but that are possibly missing the correct metadata. They can be created both manually by humans and through an automatic process.
Flags
What exactly is a flag? A flag is a piece of metadata linked to some specific data. It contains information about the metadata type it relates to, like PII semantic type, or data storage security. The possible value, which might be tokenized, if we use our example from the last slide, and the human readable description. We also store information about the reason the item was flagged, the person who eventually reviewed the flag, and the result, which would be true or false depending on whether the reviewer decided the flag was correct or not. This allows us to perform checks to ensure that flags were resolved appropriately by authorized reviewers. This also allows us to run queries on our database to understand the accuracy of our flagging system over time.
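The flag described above maps naturally onto a small record type. The following Python sketch is a hedged illustration: the class name `Flag`, its field names, and the helper `flag_accuracy` are hypothetical, chosen to mirror the attributes listed in the talk rather than to reproduce Square’s implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flag:
    data_location: str               # e.g. "table.column" the flag is linked to
    metadata_type: str               # e.g. "PII_SEMANTIC_TYPE" or "DATA_STORAGE_SECURITY"
    possible_value: str              # the value the flag proposes
    description: str                 # human-readable description
    reason: str                      # why the item was flagged
    reviewer: Optional[str] = None   # filled in once someone reviews the flag
    result: Optional[bool] = None    # True if the reviewer agreed with the flag

    def resolve(self, reviewer: str, is_correct: bool) -> None:
        """Record who reviewed the flag and whether they judged it correct."""
        self.reviewer = reviewer
        self.result = is_correct


def flag_accuracy(flags):
    """Share of reviewed flags judged correct, or None if none are reviewed."""
    reviewed = [f for f in flags if f.result is not None]
    if not reviewed:
        return None
    return sum(1 for f in reviewed if f.result) / len(reviewed)


FLAGS = [
    Flag("orders.contact", "DATA_STORAGE_SECURITY", "PLAINTEXT",
         "Column appears to be stored in plaintext", "sampled values"),
    Flag("orders.contact", "PII_SEMANTIC_TYPE", "PHONE_NUMBER",
         "Column may hold phone numbers", "partial keyword match"),
    Flag("orders.notes", "PII_SEMANTIC_TYPE", "PERSON_NAME",
         "Column may hold names", "partial keyword match"),
]
FLAGS[0].resolve("data-owner", True)    # reviewer accepts the storage flag
FLAGS[1].resolve("data-owner", False)   # reviewer denies the phone number flag
```

Keeping the reviewer and result on the flag itself is what makes the later accuracy query possible: unreviewed flags are simply excluded from the ratio.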
Automatic Checks
We worked with product teams to understand basic heuristics people looked for when they applied our new metadata types to some data, and came up with automated checks. After receiving the first set of metadata from our data sources, in this case, table schema information, we run it through a first level of scanning, looking for low-hanging fruit. We ask questions like, is the column name an exact or a partial keyword match for a value we expect? We keep a list of keywords associated with various PII semantic types. In this system, a column named first name is an exact match for one of our person name keywords, while a column named account first name, or just name, would return a partial keyword match. Does the column type match what we expect? A column named email status triggers a partial keyword match for email address. However, since its type is Boolean, and we only expect a string for this PII semantic type, we don’t flag it. Is the column name literal PII? There are some cases where data gets ingested from a spreadsheet, and it’s technically possible in a very edge case that the headers are actually the first row of data.
Is the table name an exact or a partial keyword match for a value we expect? A table called some sensitive data type sends a good signal that the data in the table might be sensitive, even if the columns aren’t named in a way we expect. We also have a job that samples values from specific columns we’re interested in, and sends the actual data to the Google DLP API. If this API tells us that the values are likely sensitive, we flag the column. If not, we tag the column with the date it was last checked against this API, so that we can wait before checking the same column again. We’ll still flag the column based on the answers to the questions we asked earlier. This check gives us additional signal into whether something is sensitive, whether or not we answered yes to an earlier question. If some things match up exactly, we actually skip flagging altogether and set some additional metadata instead.
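The column name and column type questions above can be sketched as a small Python function. This is an assumption-laden illustration: the keyword lists, the type names, and the simple substring rule for partial matches are invented for the sketch, and the DLP sampling step is deliberately left out.

```python
# Hypothetical keyword lists per PII semantic type.
KEYWORDS = {
    "PERSON_NAME": {"first_name", "last_name", "full_name"},
    "EMAIL_ADDRESS": {"email", "email_address"},
}

# Hypothetical column types we expect for each PII semantic type.
EXPECTED_TYPES = {
    "PERSON_NAME": {"string"},
    "EMAIL_ADDRESS": {"string"},
}


def keyword_match(column_name, keywords):
    """Return "exact", "partial", or None for a column name.

    A partial match here means the name contains a keyword or is contained
    in one; this is a deliberately crude rule for illustration.
    """
    normalized = column_name.strip().lower().replace(" ", "_")
    if not normalized:
        return None
    if normalized in keywords:
        return "exact"
    if any(kw in normalized or normalized in kw for kw in keywords):
        return "partial"
    return None


def check_column(column_name, column_type):
    """Return (pii_semantic_type, match_kind) pairs worth flagging."""
    hits = []
    for pii_type, keywords in KEYWORDS.items():
        kind = keyword_match(column_name, keywords)
        # Only flag when the column type is one we expect for this PII type.
        if kind and column_type.lower() in EXPECTED_TYPES[pii_type]:
            hits.append((pii_type, kind))
    return hits
```

Under these assumptions, `check_column("email_status", "boolean")` returns an empty list: the name is a partial match for an email keyword, but the Boolean type rules it out, as in the example from the talk.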
Example – User Flow
In this example, a column was flagged for two different things. A data owner reviewed these flags and accepted that the data storage security is plaintext, but denied that the data is a phone number. After a data owner reviews a flag, we set the metadata appropriately in the background. In this case, we set the accurate data storage security value for this column. The reviewed flags contain a history of actions taken to verify metadata by individual Square employees.
Mitigating Risk – Protocol Buffers/Protos
In this next section, I’ll talk about how we applied flagging and those new metadata types to address a different kind of data altogether. If you don’t work regularly with protocol buffers, or protos, I’ll give you a quick overview. Protocol buffers are a language-neutral, platform-neutral mechanism for serializing structured data. They’re like XML but with schemas. In structured data like JSON, YAML, and XML without schemas, data can take any structure and you’re on your own to make sure a specific instance matches what you expect. Here’s an example proto message called Person. It contains three typed fields. If you use this proto schema for a message, you could use a compiler to generate a language-specific implementation to serialize, deserialize, and represent the message. For example, a Java class. Before my time at Square, someone added a special annotation to protocol buffers so that engineers could annotate when a field contained something that should be redacted when printed or logged. Square has maintained a fork of the proto compiler for over five years to ensure that we actually honor this special annotation. Our version of the compiler modifies the generated proto message code to respect the annotation. Any time a proto is printed, the fields with a redacted annotation are obfuscated. This comes into play, in particular, when protos are printed for logging purposes.
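The effect of a redaction annotation can be imitated in plain Python, without protocol buffers, to show the idea. This is an analogy under stated assumptions and not Square’s compiler fork: the `redacted` metadata key, the `[REDACTED]` placeholder, and the field names of `Person` are hypothetical.

```python
from dataclasses import dataclass, field, fields


def redacted_repr(obj):
    """Build a printable form that hides fields marked as redacted."""
    parts = []
    for f in fields(obj):
        if f.metadata.get("redacted"):
            value = "[REDACTED]"
        else:
            value = repr(getattr(obj, f.name))
        parts.append(f"{f.name}={value}")
    return f"{type(obj).__name__}({', '.join(parts)})"


@dataclass(repr=False)
class Person:
    # The metadata flag plays the role of the redaction annotation on a field.
    name: str = field(metadata={"redacted": True})
    id: int = 0
    email: str = field(default="", metadata={"redacted": True})

    # Used by print(), str(), and most logging calls.
    __repr__ = redacted_repr


PERSON = Person(name="Ada", id=7, email="ada@example.com")
PRINTED = repr(PERSON)  # "Person(name=[REDACTED], id=7, email=[REDACTED])"
```

The values are still available to code through normal attribute access; only the printed form is obfuscated, which matches the logging use case described above.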
Adding Redaction Annotations
We wired up our metadata management tooling to ingest proto definitions the same way we ingested other types of data, and ran the definitions through the same flagging logic I described before. The one change we made was that if a proto message field already had a redaction annotation, we automatically resolved flags for the data storage type. We knew based on the annotation that the type would be redacted. With a fully flagged set of proto schemas, we now had the ability to guess when a field might be missing a redaction annotation. If a field was flagged as being anything other than not PII, and had either an unknown or plaintext data storage security, we could guess that it was missing a redaction annotation. At this point, our flagging logic also hadn’t been seriously tested or challenged. We were adding flags to different data locations, but we didn’t have an effective way of tweaking our flagging logic other than to keep an eye on how flags were getting resolved. Since this was a new feature, we didn’t get a lot of feedback.
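The rule described above for guessing when a field is missing a redaction annotation can be written as a short predicate. The dictionary keys and value names below (`pii_semantic_type`, `NOT_PII`, `REDACTED`, and so on) are assumptions for illustration only.

```python
def resolve_storage_security(proto_field):
    """An existing redaction annotation settles the storage security value."""
    if proto_field.get("has_redaction_annotation"):
        return "REDACTED"
    return proto_field.get("storage_security", "UNKNOWN")


def likely_missing_redaction(proto_field):
    """True if the field looks sensitive but is not known to be protected."""
    pii_type = proto_field.get("pii_semantic_type", "NOT_PII")
    storage = resolve_storage_security(proto_field)
    return pii_type != "NOT_PII" and storage in {"UNKNOWN", "PLAINTEXT"}


# Hypothetical flagged proto fields.
FIELDS = [
    {"name": "email", "pii_semantic_type": "EMAIL_ADDRESS"},
    {"name": "phone", "pii_semantic_type": "PHONE_NUMBER",
     "has_redaction_annotation": True},
    {"name": "retry_count", "pii_semantic_type": "NOT_PII",
     "storage_security": "PLAINTEXT"},
]
CANDIDATES = [f["name"] for f in FIELDS if likely_missing_redaction(f)]
```

Only `email` is a candidate here: `phone` already carries the annotation, and `retry_count` is not PII.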
Our mission here was twofold. We needed a way to add redaction annotations to protos defined in code bases across Square without requiring too much effort from product teams. Relatedly, we did not want to burn anyone out with false alarms from faulty flagging logic. We needed to make sure our flags had a very low false positive rate. The result was a strong customer-first approach to design and rollout. We started with five fields that our logic told us were probably missing an annotation. I handcrafted PRs for each field and manually followed up with code owners to interview them about a potential automated feature process. We had a test that checked the false positive rate for a set of data with each change to our flagging logic, and we continued to drive the false positive rate down with each PR. Once our false positive rate was low enough, and it felt like PRs were useful, we wrote code to automatically generate PRs on offending repos.
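A test of the kind described above, checking the false positive rate against a labeled data set whenever the flagging logic changes, might look roughly like this in Python. The labeled field names, both flagger functions, and the threshold are hypothetical; the rate is computed with the standard definition, false positives divided by all non-sensitive items, which is an assumption about what the team measured.

```python
def false_positive_rate(flagger, labeled):
    """False positives divided by all items labeled as not sensitive.

    flagger: function taking a field name and returning True if it would flag.
    labeled: list of (field_name, is_sensitive) pairs from human review.
    """
    negatives = [name for name, is_sensitive in labeled if not is_sensitive]
    if not negatives:
        return 0.0
    false_positives = sum(1 for name in negatives if flagger(name))
    return false_positives / len(negatives)


# Hypothetical labeled set built from reviewed flags.
LABELED = [
    ("first_name", True),
    ("email", True),
    ("file_name", False),
    ("hostname", False),
    ("retry_count", False),
    ("status", False),
]


def naive_flagger(name):
    """Early logic: flag anything containing a sensitive-looking substring."""
    return "name" in name or "email" in name


def stricter_flagger(name):
    """Revised logic: flag only exact keyword matches."""
    return name in {"first_name", "last_name", "email"}


MAX_FALSE_POSITIVE_RATE = 0.1  # hypothetical threshold for enabling automation
NAIVE_RATE = false_positive_rate(naive_flagger, LABELED)        # 2 of 4 negatives
STRICT_RATE = false_positive_rate(stricter_flagger, LABELED)    # 0 of 4 negatives
```

Running the same labeled set against each version of the logic gives a simple regression signal: a change that raises the rate above the threshold would fail the check.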
Automated PR Flow
We created a job that would query our metadata using the criteria I mentioned earlier. It would then check the database to see if we already had tried to fix that proto field in Git. If there was no existing branch, the job would create a branch, update the appropriate files, make a commit, and open a pull request. If there was an existing branch and the branch was closed or deleted, the job would update the PR status in our database, and note that data owners had decided the flag was incorrect. Data owners also often left useful feedback comments for us on the fields that they thought were flagged incorrectly. If the PR was still open, the job would comment a reminder if no one had taken an action in some amount of time. The job was also smart. If the contents of our database had changed since the PR was opened, it would update the existing PR to include additional code changes. We parsed and updated proto files in Python using an EBNF grammar.
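The decision logic of the job described above can be sketched as a pure function that picks the next action for one repository. Everything named here is hypothetical (the `Action` enum, the shape of the `pr` record, the seven-day reminder window), and the Git and pull request calls themselves are intentionally left out.

```python
from enum import Enum


class Action(Enum):
    OPEN_PR = "create branch, update files, commit, open pull request"
    RECORD_REJECTED = "mark PR closed in the database; flag judged incorrect"
    UPDATE_PR = "push additional changes to the existing pull request"
    REMIND = "comment a reminder on the pull request"
    WAIT = "do nothing for now"


def next_action(pr, flagged_fields, days_idle, reminder_after_days=7):
    """Choose what the job should do next.

    pr: None if no earlier attempt exists, else a dict with "state"
        ("open", "closed", or "deleted") and "fields" (fields the PR covers).
    flagged_fields: fields currently judged to be missing redaction.
    days_idle: days since anyone acted on the PR.
    """
    if pr is None:
        return Action.OPEN_PR
    if pr["state"] in {"closed", "deleted"}:
        return Action.RECORD_REJECTED
    if pr["state"] != "open":
        return Action.WAIT  # other states (e.g. merged) are not covered in the talk
    if set(flagged_fields) != set(pr["fields"]):
        return Action.UPDATE_PR
    if days_idle >= reminder_after_days:
        return Action.REMIND
    return Action.WAIT
```

Separating the decision from the side effects keeps the rules easy to test and makes it simple to adjust them as feedback from code owners comes in.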
There were some challenges with this. For example, it was usually not enough to just add the redaction annotation. To prevent red CI builds, we had to make sure helper files were updated correctly, and that the file had the correct imports to support the redaction annotation. On top of this, we ran into challenges in making sure that we could properly parse and update fields that already had annotations or were multi-line. We also had to work to support both proto2 and proto3 files. We stress tested this parsing code on all protos at Square and adjusted our grammar until we had full coverage. We also eventually expanded into adding redaction annotations on protos defined in language-specific files like Golang. We started automating four to five PRs per week. We didn’t generate more PRs until we had closed out the ones that were open. For each of these PRs, we followed up manually with teams and made incremental improvements to the design each week. Improvements included updates to the PR descriptions, changes to the flagging logic, and tweaking the query we had for finding fields missing redaction.
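To illustrate why fields that already carry annotations need special handling, here is a deliberately simplified Python sketch that adds an option to a single-line proto field. It uses a regular expression rather than the EBNF grammar mentioned in the talk, it does not handle multi-line fields, `map` fields, or imports, and the option name `(redacted)` is a hypothetical stand-in for the real annotation.

```python
import re

# Matches a single-line field such as:
#   optional string email = 3 [deprecated = true]; // comment
FIELD_RE = re.compile(
    r"^(?P<head>\s*(?:(?:optional|required|repeated)\s+)?[\w.]+\s+"
    r"(?P<name>\w+)\s*=\s*\d+)"
    r"\s*(?:\[(?P<options>[^\]]*)\])?\s*;(?P<tail>.*)$"
)

ANNOTATION = "(redacted) = true"  # hypothetical option name


def add_redaction(line, field_name):
    """Return the line with the annotation added to the named field.

    Lines that do not declare that field, or that already carry the
    annotation, are returned unchanged.
    """
    match = FIELD_RE.match(line)
    if not match or match.group("name") != field_name:
        return line
    options = (match.group("options") or "").strip()
    if "(redacted)" in options:
        return line
    new_options = f"{options}, {ANNOTATION}" if options else ANNOTATION
    return f"{match.group('head')} [{new_options}];{match.group('tail')}"
```

The optional `optional`/`required`/`repeated` label lets the same pattern accept both proto2-style and proto3-style single-line fields; a grammar-based parser, as used in the talk, is the more robust choice once multi-line fields and nested options are involved.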
Wrap-Up
Our approach to mitigating potential risk in protocol buffer code gave us time to improve our logic and feel more confident in our sensitivity checking code. In the future, we have the opportunity to extend these checks to other data, anything that can be checked for sensitivity and updated in some way to handle sensitive data more effectively. This could mean JSON objects, GraphQL APIs, or YAML files. We are continuing to build tooling to make this possible.
Questions and Answers
Anand: Another question I had was about data ownership. I’ve, in the past, seen the case where there are datasets, and then there are teams and individuals in those teams, and not every dataset is owned by somebody. Sometimes people leave, teams fold, and that responsibility somehow doesn’t get carried over. Has that been a problem?
Ransbury: That was definitely a problem, especially when we were first rolling out the solution that I was describing with the protocol buffers. We have a sister team at Square who has been working on solving this problem of how do we as a security org make sure we always know who’s responsible for pieces of code, pieces of data. That has been an ongoing project, somewhat manual of trying to make sure that we have all code accounted for. If it’s not accounted for, it always has to roll up to someone. It’s not a perfect solution right now.
Anand: Typically, every company has a head of product. That head of product creates a roadmap, and then every team has to execute on this roadmap in their own areas. Then you have a security team that’s trying to influence everyone to make sure they’re complying with your needs but it’s not necessarily stated in their OKRs. How do you handle this?
Ransbury: This is also part of my presentation, I was talking about how slowly we were rolling things out. Part of that was because, yes, it’s on nobody’s roadmap, or it wasn’t at the time when we were rolling this out. What does it mean for us to actually make progress while not annoying people, and in fact, making people feel like they have successfully helped you solve the security issue without putting them out basically? I think that’s been our answer so far is just to automate as much as we can. We literally had our own code make the PR for them, we opened it for them. All they had to do was just merge the PR, or say, looks good, or say, this doesn’t look good, we’re going to close this PR. I think that was really helpful. We always followed up with people if people were upset or concerned. Especially in the beginning, people were like, what are you doing? We tried to just have as much communication as we possibly could.
Anand: You mentioned a bit about your forking of Amundsen. Can you talk a little bit about that, about the system you have and how it differs, and how do you see it evolving over time?
Ransbury: When we first forked Amundsen, it was maybe two years ago now. At that time, the project was just not as mature. It was definitely hard. It didn’t start off mature. Compared to what it is today, it was really different. At that time, coming from a security team where we didn’t really have any frontend people, and knowing that we wanted to have a UI component, the fact that Amundsen came and shipped with a UI was really important to us. We forked Amundsen because we are not a data team, or we didn’t start as a data team. We wanted to be able to ingest information via jobs that we could run programmatically and write a bunch of code, and we wanted to really get into the weeds there. We didn’t necessarily want to set things up the way that Amundsen had already. The default was that you were using ETLs basically.
What I was talking about in my presentation with the different metadata types, in our version of Amundsen, we surfaced all of that information. We also surfaced in our UI things like, who has been accessing this data? We also brought in different data sources like Snowflake, which wasn’t supported at the time when we were working with Amundsen. Overall, it’s been a really positive experience. I think that probably everyone has the same experience, that data inventory is hard. We’re trying as much as we can to still make progress and do things with our inventory, even though it’s still ongoing because we have so many data sources.
Anand: Was your team the one that brought in the first metadata catalog to Square?
Ransbury: Yes, we were.
Anand: What does that mean for your team? Did it become a data team and a security team?
Ransbury: What it’s meant is more in the last year, we’ve been working with the data infrastructure teams to share the load. We were the first ones to do a proof of concept with metadata management tooling.
Anand: What are some of the other challenges that you currently face? Maybe not at the time of thinking about this talk, but maybe things you didn’t have time to add to the talk that you’d like people to know.
Ransbury: The solutions that I’m presenting to you are not some crazy, ML-backed magic solution. The things that I’m presenting are pretty much, here are things you can do tomorrow. Here’s the script you can write. I think that sometimes we can lose sight of these easy fixes. We all have a lot of data. We don’t need every solution to be a magic solution. Sometimes that’s ok. I think when I’ve been doing this work, I’ve had to continue to come back to, how do we just get full coverage? What is the baseline? We don’t need to jump ahead of ourselves.
Anand: Is every dataset under your management? Do you have a coverage goal?
Ransbury: Yes, we do. Our coverage goal is basically making sure that we know about all the data, because the reality is Square is huge. We acquire people all the time. What does it mean to onboard these acquisitions to the data inventory? What’s the timeline? That’s all still being ironed out.
Anand: Your coverage goal. For example, someone said, do you review every single schema change for every dataset that Square owns? Then you’ve mentioned, I think, for the ones that you own, you do?
Is your coverage goal for just the datasets you own? Let’s say it’s 100. You said you check every day if there are any compliance or changes, or things that need to be fixed. Let’s say overall, there are 1000 datasets, is your goal to cover the 1000? How do you think about coverage? Ignoring acquisitions, which is yet another challenge on its own.
Ransbury: Just for scope and understanding our scale right now, we are currently looking at hundreds of thousands of tables. When I say we look at them daily, this is not manual. Our flagging logic runs, basically, and checks to make sure things are what we expect. Eventually, we would like to get full coverage so that we can understand all the data that exists and how it’s protected. Then we can do additional things based on that. Right now, we’ve started by saying, let’s at least get full coverage of this one type of database. We’ve got Snowflake covered. We’ve got MySQL covered. We have the on-prem MySQL. The way that things work at Square is that we let teams choose whatever backing store they want, and so that makes security harder, because then we have to be like, you’re using Dynamo? You’re using whatever random database? That is where we start getting into the weeds of how we know what’s there, and making sure we’re understanding what exists in that database.
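As a rough illustration of the kind of daily flagging check described here, the sketch below scans column metadata and flags columns whose names look sensitive but carry no annotation. Everything in it is an assumption made for illustration: the `Column` type, the `SENSITIVE_NAME` pattern, and the sample inventory are hypothetical and are not Square’s actual rules or tooling.

```python
import re
from dataclasses import dataclass

# Hypothetical name patterns that suggest sensitive data; a real rule set would be broader.
SENSITIVE_NAME = re.compile(r"(email|phone|ssn|address|birth)", re.IGNORECASE)


@dataclass(frozen=True)
class Column:
    table: str
    name: str
    annotated_sensitive: bool  # True if the owning team already tagged the column


def flag_columns(columns):
    """Return columns whose names look sensitive but that carry no annotation."""
    return [
        c for c in columns
        if SENSITIVE_NAME.search(c.name) and not c.annotated_sensitive
    ]


# Hypothetical inventory rows, standing in for metadata pulled from a catalog.
inventory = [
    Column("customers", "email_address", annotated_sensitive=True),
    Column("customers", "phone_number", annotated_sensitive=False),
    Column("orders", "total_cents", annotated_sensitive=False),
]
flagged = flag_columns(inventory)
```

A scheduled job could run a check like this per data source and hand the flagged columns to whatever opens the follow-up pull request.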
Anand: That is very challenging, and it’s mostly social outreach within the org. You ping them and say, “I’d like to just learn what you’re doing, what tables you use, what databases.” Then they ask, “Who are you?” You say, “I’m in the security org.” They say, “I don’t want to talk to someone in the security org.”
Ransbury: We do have this concept of a security partner at Square, and they’re almost like their own mini-CISOs. Each security partner is responsible for a different business unit, and they really help us understand like, what data exists and where it exists. Help us make that timeline of making sure we can integrate with their data sources, basically.
Anand: That’s good to know. It’s always a challenge. The other question was you were saying, what other things you think about. You said, over the weekend, you thought, in this world, you have to scan and then pattern match, and then write tagging rules or flagging rules, and then follow up. There’s this follow-up case, which is, your data is not clean. Now it’s up to you to figure out how to fix it. How much of your time do you spend on these four, or these different aspects? I think one is, once you write the automated job, and things start getting flagged, there’s that last mile of bringing it into compliance. Can you speak to that?
Ransbury: We actually have a team of project managers within security. They help us do that last mile follow-up. In general, we try to make it as automated as possible, for example, making those PRs for them. You just have to click the button, the bot will literally message you and be like, “Hello, can you please check this?” If someone’s still not responding, it’s not owned, whatever, those project managers help us follow up. That sometimes looks like then going to the highest level of the hierarchy and saying, you need to help us convince people on your team. I don’t have to do as much of that.
Anand: I saw one question here about the original agreement text version, but I didn’t totally grok this topic. Could you speak to it a bit more?
Ransbury: Do you track the agreement text/version that was used when a given piece of data was originally stored by the user?
At Square, we have a separate service that tracks compliance. That means that if I go to the Square website, or I use a Square product and I agree to some terms of service, it does track the exact version of the text of the agreement and whatever. That is carried and follows your user. Remember how I talked about this idea of a data subject. You as a data subject now have this agreement you’ve agreed to, then we can follow your data like that.
Anand: How do you protect PII data from rogue employees, so internal attacks?
Ransbury: This is something that we have to think about. Because there’s definitely an issue if I’m a rogue employee, and I go on and I say, this isn’t PII, but it is. That goes into what I was talking about, where we still check those things. We’re still doing random checks, because, not only could somebody be rogue, but they could have just made a mistake. We want to make sure that we’re encompassing that. Also, things like these decisions are public. We store in our database, we know who told us that this is not PII, and we know the time and we know what you were doing. You can only make those decisions if it’s data that your team owns, which is scoped pretty specifically. I couldn’t just go to some random team and be like, I’m going to say this isn’t PII. No, that’s not even possible. We have a trail of that, basically.
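The two controls mentioned in this answer, decisions scoped to data your team owns and a stored record of who decided what and when, can be sketched as follows. This is a minimal sketch under stated assumptions: `DecisionLog`, `Decision`, and the column and team names are hypothetical, and an in-memory list stands in for the database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Decision:
    column: str
    is_pii: bool
    decided_by: str
    team: str
    decided_at: datetime


class DecisionLog:
    """Append-only record of PII decisions, limited to columns the team owns."""

    def __init__(self, owners):
        self._owners = owners  # maps column name -> owning team
        self._records = []     # audit trail; entries are never edited or removed

    def record(self, column, is_pii, user, team):
        # Reject decisions about data that the caller's team does not own.
        if self._owners.get(column) != team:
            raise PermissionError(f"{team} does not own {column}")
        decision = Decision(column, is_pii, user, team, datetime.now(timezone.utc))
        self._records.append(decision)
        return decision

    def history(self, column):
        return [d for d in self._records if d.column == column]


log = DecisionLog({"payments.card_note": "payments"})
log.record("payments.card_note", False, "alice", "payments")
```

Random re-checks of recorded "not PII" decisions, as described above, would then read from `history` rather than trusting the latest answer.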
Anand: For example, let’s say my team has read access to a column. That means that I have the permissions in a team that has read access to that PII column. What I do with it, there’s no way to know. If I’m rogue, I already have permissions. There’s nothing beyond that that you can do.
Ransbury: Then it becomes something for our detection and response teams to deal with.
Anand: You do have those teams that deal with it?
Ransbury: Yes. Those are the people who are managing your corporate machine and who are looking at what data you’re looking at.
Anand: I worked at PayPal, and they definitely know if you engage a USB drive, and copy from a cloud storage, they know that. They also check all the keystrokes. Everybody’s keystrokes are captured and checked against patterns. There are a lot of things checked for this case, especially in, I think, anything having to do with payments.
Ransbury: We’re continuing to find ways that we can be passing data to these detection teams so that they can have more information, and respond appropriately.
Anand: How do you know about all the existing datasets? You mentioned the mini-CISOs. These mini-CISOs are outside of your team, but they talk to the project manager somehow and try and give datasets and stuff?
Ransbury: One part is there’s these security partners, and they’re within our org, but they’re working with the higher level of every department. The reality is we are a payments company, and we have a lot of compliance things that pertain to us. We have data security governance teams whose whole job is making sure they know where the data lives. I think a lot of what we’re solving for is, how do we make this less manual? Of course, for compliance reasons they already had to know what all the data sources are, and what data we expect to be stored in those data sources.
Anand: That’s their job. You’re really the proper engineering team that helps make this whole thing more automated. It’s easier to stay within compliance.
What’s the next big challenge for you and your team, something on the horizon?
Ransbury: These protos, I think, open up a lot of opportunities for us. We’ve had a couple teams here moving to GraphQL, and so what does it mean to make sure those API schemas are secured properly? We can use some of the same flagging logic there. Where securing the protos also comes up a lot is in logs, so people might log a full proto. We are doing additional checks and research into what we can do to continue to make logging more secure, not just by flagging and adding those redaction annotations, but also by addressing logs at different levels of the log lifecycle.
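To make the idea of redaction annotations concrete, here is a small sketch that masks annotated fields before a message is logged. It uses a Python dataclass with field metadata as a stand-in for annotated protos; the `redact` metadata key, the `PaymentRequest` type, and the `loggable` helper are assumptions for illustration, not Square’s implementation.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PaymentRequest:
    merchant_id: str
    amount_cents: int
    # Hypothetical annotation marking this field as unsafe to log.
    card_number: str = field(metadata={"redact": True})


def loggable(message):
    """Build a dict that is safe to log, masking fields annotated for redaction."""
    return {
        f.name: "[REDACTED]" if f.metadata.get("redact") else getattr(message, f.name)
        for f in fields(message)
    }


safe = loggable(PaymentRequest("m1", 500, "4111111111111111"))
```

Applying the mask at the point where a message is turned into log output means a developer who logs the full message still never writes the annotated field.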
MMS • Steef-Jan Wiggers
Article originally posted on InfoQ. Visit InfoQ
Amazon EventBridge is a serverless event bus that allows AWS services, Software-as-a-Service (SaaS), and custom applications to communicate with each other using events. The service now supports integrations with GitHub, Stripe, and Twilio via webhooks using Quick Starts.
AWS made Amazon EventBridge generally available in July 2019, and since then the service has evolved through several updates and new features such as event replay and archive capability, schema registry, support for Cross-Region Event Bus Targets, API Destinations, and Amazon S3 Event Notifications.
Through AWS Quick Starts, developers can now use AWS CloudFormation templates to create HTTP endpoints for their Amazon EventBridge event bus for GitHub, Stripe, and Twilio. From their respective accounts, developers can configure their GitHub, Stripe, and Twilio webhooks by simply selecting the events they want to send to the newly generated endpoint, and then begin receiving events on the event bus.
To receive the events from GitHub, Stripe, and Twilio, the Amazon EventBridge event bus uses an AWS Lambda function URL created by the AWS CloudFormation template. According to the AWS documentation:
With function URLs, the event data is sent to a Lambda function. The function then converts this data into an event that can be ingested by EventBridge and sent to an event bus for processing. Once the event is on an event bus, you can use rules to filter the events, apply any configured input transformations, and then route it to the correct target.
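The flow in that description can be sketched as a small Lambda handler. This is an illustrative sketch, not the code the Quick Start templates deploy: the source name `example.webhook`, the bus name `example-bus`, and the use of a `type` key in the payload are assumptions, the body is assumed to be plain (not base64-encoded) JSON, and webhook signature verification is omitted. The `put_events` call and its `Entries` fields are the standard boto3 EventBridge API.

```python
import json


def to_entry(webhook_body, source, bus_name):
    """Convert a raw webhook payload into one EventBridge PutEvents entry."""
    payload = json.loads(webhook_body)
    return {
        "Source": source,
        # Assumption: the provider puts its event name under "type".
        "DetailType": payload.get("type", "webhook"),
        "Detail": json.dumps(payload),
        "EventBusName": bus_name,
    }


def handler(event, context, events_client=None):
    """Function URL handler; events_client is injectable so it can run locally."""
    if events_client is None:
        import boto3  # only needed when running inside AWS
        events_client = boto3.client("events")
    entry = to_entry(event["body"], "example.webhook", "example-bus")
    result = events_client.put_events(Entries=[entry])
    # put_events reports per-entry failures instead of raising for them.
    status = 200 if result.get("FailedEntryCount", 0) == 0 else 500
    return {"statusCode": status}
```

Once the entry is on the bus, rules can match on `source` and `detail-type` to route it to a target.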
On Twitter, Richard Simpson, a lifelong programmer, remarked on it in a tweet:
On the surface, it looks like direct support for webhooks into EB. What’s actually happening, however, is AWS has built (and built into CloudFormation?) prebuilt lambdas to forward webhooks to EB. Weird.
With the addition of GitHub, Stripe, and Twilio, Amazon EventBridge now supports over 30 SaaS Partners like Shopify, Segment, SugarCRM, Zendesk, OneLogin, or Auth0 – some available through the quick start integrations web page.
Nick Smit, a senior product manager for Amazon EventBridge at AWS, tweeted:
We’ve just launched an easy way to receive webhook events from Stripe, Github, and Twilio in Amazon EventBridge. These are some of our most asked-for events, so I’m really excited they’re now easily accessible in EventBridge.
In addition, he tweeted:
You’ll notice the new “Quick Starts” navigation in the EventBridge console. We plan to add more common patterns, like these webhook ones, directly to the Quick Starts section of the EventBridge console.
Currently, Quick Starts are available in the following AWS regions: US East (Ohio and N. Virginia), US West (Oregon and N. California), Canada (Central), Europe (Stockholm, Ireland, Frankfurt, London, and Milan), Asia Pacific (Mumbai, Tokyo, Seoul, Singapore, Hong Kong, Osaka, and Jakarta), Middle East (Bahrain) and South America (São Paulo).
Lastly, Amazon will charge customers for the number of events published to the event buses in their account, billed at $1 for every million events. Note that Amazon will not charge for events published by AWS services. For pricing details, see the pricing page.
MMS • Anthony Alford
Article originally posted on InfoQ. Visit InfoQ
Researchers at Amazon Alexa AI have announced Alexa Teacher Models (AlexaTM 20B), a 20-billion-parameter sequence-to-sequence (seq2seq) language model that exhibits state-of-the-art performance on 1-shot and few-shot NLP tasks. AlexaTM 20B outperforms GPT-3 on SuperGLUE and SQuADv2 benchmarks while having fewer than 1/8 the number of parameters.
The model and experiments were described in an Amazon Science whitepaper. Unlike other large decoder-only language models such as GPT-3 and PaLM, AlexaTM 20B is a seq2seq model; that is, it contains an encoder as well as a decoder. The encoder stage gives AlexaTM 20B better performance on summarization and machine translation (MT) tasks than larger decoder-only models such as PaLM. The model is multilingual and achieves state-of-the-art performance on few-shot MT tasks on the Flores-101 dataset, even on low-resource languages. According to co-author Saleh Soltan,
All in all, we demonstrated in our work that the proposed style of pretraining enables seq2seq models that outperform much larger decoder-only LLMs across different tasks, both in a few-shot setting and with fine-tuning. We hope our work presents a compelling case for seq2seq models as a powerful alternative to decoder-only models for LLM training.
The Alexa research team noted that their work is subject to several constraints that do not generally apply to language models. The Alexa digital assistant supports multiple languages, and the input text is “spoken-form” which can be different from the written form of text used in training datasets. Further, because their work is intended to be used in an edge device, memory is at a premium and the model inference must be low-latency; both of these favor smaller models.
To further reduce model size, the Amazon team investigated knowledge distillation. In a paper to be presented at the upcoming Knowledge Discovery and Data Mining Conference (KDD), the researchers demonstrated using a large model as a teacher. The team then trained smaller student models which were only 0.2% the size of the teacher (for example, 17M parameters vs 9.3B).
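The article does not state the exact training objective used, so the following is only a generic sketch of teacher-student distillation: the student is penalized by the KL divergence between the teacher’s and the student’s temperature-softened output distributions. The function names, the temperature value, and the sample logits are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for a three-class output.
matching = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
differing = distillation_loss([2.0, 0.5, -1.0], [0.1, 0.2, 0.3])
```

The loss is zero when the student reproduces the teacher’s distribution and grows as the two diverge, which is what lets a much smaller student be trained against the teacher’s outputs.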
The researchers evaluated the 20B teacher model on several NLP benchmarks. On the MLSum benchmark, AlexaTM outperformed the state-of-the-art for 1-shot summarization in German, Spanish, and French and on 1-shot MT tasks for most language pairs. In particular, on low-resource languages like Telugu, Tamil, and Marathi, the improvement was “significant.” The model outperformed GPT-3 on MT tasks “in most English centric cases.” Although the model outperformed GPT-3 on most SuperGLUE NLP tasks, it trailed behind Google’s much larger PaLM model.
Several users discussed the work in a thread on Hacker News. One pointed out the advantages of AlexaTM 20B over GPT-3:
Building a model downstream of GPT-3 is difficult and usually yields suboptimal results; however, 20b is small enough that it would be easy to finetune this on a smaller dataset for a specific task. You could then distill that model and end up with something that’s a fraction of the size (6b parameters for example, just under 1/3, would fit on commercial GPUs like 3090s).
The AlexaTM 20B model has not yet been publicly released, but the researchers created a repository for it on GitHub and note that it will be released soon.
MMS • Arpit Mohan
Article originally posted on InfoQ. Visit InfoQ
Transcript
Shane Hastie: Good day folks. This is Shane Hastie for the InfoQ Engineering Culture Podcast. Today I’m sitting down with Arpit Mohan. Arpit is in Bangalore, India. I’m still in little Ōtaki in New Zealand, and we are going to talk about people and culture. Arpit, welcome. Thanks for taking the time to talk to us today.
Arpit Mohan: Absolutely. Thank you so much, Shane, for having me on.
Shane Hastie: My normal starting point with guests on the show is, who’s Arpit?
Introductions [00:33]
Arpit Mohan: I’m a software developer and I’ve been in this industry for about 13 or 14 years, and for a good part of the previous decade I’ve largely dabbled in and around startups. By virtue of being in these startups, I’ve had the fortune of seeing a lot of different products and a lot of different industries, ranging from AI, FinTech, Mobile Gaming, and Telecom to eCommerce. This time around, with Appsmith, which is our third venture, we are building in the area of Developer Tools. We are building a Low-Code Application Builder platform. It’s an open-source project and has been for the past two and a half years. I’m super excited that I finally get to do open-source in a paying job. And throughout my history, I’ve largely led most of these engineering teams.
And while I got into software quite accidentally, if you will, I’ve always been fascinated with technology, right from childhood. I’ve pretty much unscrewed most gadgets that my dad bought for the house, right from the video recorder, the VCR, to Tamagotchis, which were little toys that you had in the early nineties, to computers. So while I’ve always been fascinated with technology, with hardware, et cetera, I learned the skill to put everything back together much later in life. So while I could disassemble stuff, I couldn’t assemble things again.
And each time I disassembled something, I was always fascinated by that little green chip that always came out of each of these devices, which was the little board, the little microcontroller that came in. And I remember as a child, being fascinated that this green chip is what powers this fantastic world of Road Rash or Tamagotchis or televisions, et cetera. And ever since then, I’ve been fascinated with robotics, with powering and building stuff and creating stuff on that green chip. So I decided that this is super powerful, and most of my life since then has been based around creating worlds, creating products that run on this little green chip.
Shane Hastie: Moving away from the little green chip to building the software, but going beyond the software to the people behind the software. One of the things that intrigued me in our earlier conversation, you made a wonderful statement. Code is easy to change, people not so much. Let’s explore communications and communication skills in teams.
The history of software is the history of teams [02:58]
Arpit Mohan: So most engineers, when we get started, whether it’s during our internships or early jobs, tend to focus a lot on the actual coding. What does the Java syntax look like? What are our design patterns? What does architecture look like? Et cetera. So there’s a lot of emphasis over there. On the other hand, interpersonal relations, being able to communicate with your team, all of these are termed “soft skills”, but these skills are much harder to learn and master. Secondly, as a principle, the history of software is the history of teams. There is no great product that has been built by an individual developer. So if you and I want to build great lasting software, we have to build it with teams.
We have to build it with other people. That is why interacting with other people and team members is a lot more important. Code can be changed super easily. You just change a line of code, commit it to Git, raise a pull request, and you’re done. But on the other hand, if I don’t like something about you, Shane, I can’t just do a git commit. I can’t just raise a pull request on Shane and say, hey, you know, Shane, I don’t like this aspect about you. Here’s a pull request. And bada bing bada boom, here’s an improved Shane, right? Or a different Shane. That’s not the way the world works. The way we navigate this as humans, as people, as team members, is that at each given point of time we keep trying to find that bridge, that connector between each of us.
And that is a very unique connector between each team member. So if you’re a team of, say, 10 people, as an individual you will have nine individual connections, and you’ll have a unique way, a unique idiosyncrasy, a unique thing that you will do with each of these nine. And this comes very intuitively to humans. Human brains are geared for this. But I think just being able to develop that, being able to identify that and say, oh, Shane does this really well, so maybe Shane likes predictability a little more. So when I’m talking to him, I’m going to focus a lot more on predictability. While somebody else prefers a more wild west way of working. So with them, I’m going to let them take a free hand at the problem. I’m not going to keep a tight leash on the problem statement.
People change over time and relationships need to change along with that [05:22]
Arpit Mohan: And I’m just going to let them run wild, because that’s when they perform best. So everybody has a unique way of operating. And the interesting part is that this way of working changes, because people change. So if I figured out a way of working with you today, a couple of years down the line we’ll be re-evaluating our entire way of working, the way we talk to each other, the way we work with each other. And people who have life partners see this day in and day out. There are very rare occasions where you work with a team member for 30 years or 40 years, but having a life partner for 30 or 40 years is quite common. So this is very apparent in those scenarios where, with your partner, you keep re-evaluating your relationship every few years because you see that the person has evolved, they have changed.
And so have you, and you have to find that new normal. And this happens with every team that you work with. So if it’s a new team, it’s about establishing that normal, but if it’s a slightly longer-term relationship, where you’ve been in the team for a couple of years, you’ll keep trying to find that new normal. And all of this comes under soft skills. But to be honest, I think that’s much, much harder. Imagine if engineering colleges or coding camps ran a three month boot camp like this: forget about Java or Python, you learn that on the job. That’s the easy bit. Here’s a three month boot camp on how to work with people. This is how you speak. This is how you document. This is how you interact in a very stressful environment. So on and so forth. If people did that, I think we would have much more empathetic, much calmer, more productive, and happier people overall.
Shane Hastie: I’m not aware of a coding boot camp that puts people first. I am aware of some that do today, bring in the focus, the understanding on, you are going to be working in a collaborative team and with the adoption of agile approaches, every organization claims to be using an agile approach today. So we see more of that emphasis, but how do we help people build those skills?
Helping people build interpersonal skills [07:31]
Arpit Mohan: The first thing is the will to actually invest time and effort over here. So it starts from obviously the will. The second is being conscious about the language and the tone of voice that is being employed in a given situation. So just being conscious about, hey this is a stressful environment, let’s say production went down. So everybody’s under stress and being conscious about the way you are expressing yourself. The type of words that you’re using, the tone of voice that you’re using at that given moment. And if it’s written communication, I would highly recommend people to go back and reread what they said a day later. Once they’re out of that situation, once they’re out of maybe that stressful situation, or you’re trying to get multiple stakeholders on the same page, so on and so forth, so just reflecting upon, hey I said this in the meeting, could I have done anything differently?
Did I do well? Did I not do well, et cetera. So just being able to reflect. That’s a very critical piece of being able to communicate better. The next is something that I have done personally, and I would recommend. I asked myself, who’s the best communicator out there? And one of the names that popped in my head was Barack Obama, the president of the United States. And I was like, okay, what can I learn from him? And I literally went and watched, I think, about 70 or 80 of his speeches, back to back. I just watched him through and through to get a sense of, how does the president of an entire country communicate? What are they saying? Where are they laying emphasis? What point are they driving home? So find that person you think is the gold standard in your industry, in your field, in your team, et cetera, and just be very aware of what they’re saying and how they’re saying it.
And the more you either read or listen to a particular person, the more you start writing or speaking like that, because your brain implicitly starts to reproduce what it just heard. So, that’s the other way to get much better at this. And the last bit I would say is, when it comes to written communication especially, there are some very basic principles which we tend to miss out on, and one is consistency of language and terminology. If I say server and you say instance, then while we might mean the same thing, we are probably not on the same page. We are probably talking about things that are slightly different, and they may diverge.
The importance and value of consistent language [10:05]
Arpit Mohan: So being aware of this, and it’s much easier to do this in written format than in a verbal format, but once you start looking for those inconsistencies that, hey let’s get on the same page with terminology, let’s get a consistent language going first. Just having and drilling down on that, the consistency, goes a very long way in establishing that relationship because the person in front of you also knows what to expect. There is a predictable behavior, and once it’s predictable on both ends, life becomes much easier. Versus you’re constantly second-guessing, they said X, but did they really mean X? I’ll deliver in one week. Okay. So if Arpit said one week, it’s probably going to be 10 days or it’s probably going to be one month. You’re not second-guessing. If I say one week, it just means one week.
I know I said last, but sorry. The other bit I would also recommend is, there are some really good books around this. A few that I’ve found useful were Managing Humans, and then there is the other one by Camille Fournier, who was the CTO of Rent the Runway. Sorry, I’m forgetting the name, but it’s Technical Manager’s Path or The Manager’s Path. So these two books are really good. So yeah, I would highly recommend reading about this.
Shane Hastie: And we’ll make sure we include the links to those books in the show notes. And that’s the thing that you were saying to me earlier is, you are a distributed systems engineer, but you would advise people not to build distributed systems. And in fact, Appsmith has been built as a monolith. So why is this?
Don’t start with building a distributed system [11:29]
Arpit Mohan: I’ll nuance that answer a little bit. I’d say 90% of the systems that we build do not require a distributed system, because when you are doing a startup or you’re getting started with a new project, you should largely, as an engineer, focus on just getting to product market fit. Just getting the job at hand completed. That is a lot more important than massaging some ego or some skill set around, hey, I know how to do Kubernetes or I’m a Kubernetes engineer or I’m a Distributed Systems Engineer. Those are much more complicated pieces of software. And while Kubernetes or Apache Spark or any distributed system is an absolute beast, a fantastic piece of software, it’s highly unlikely that you need a distributed system that auto-scales up and down in the early days of any project or any product in your startup or team.
So unless you’re Netflix or Google, it’s very unlikely that you need a distributed system for, I don’t know, your blog. If you’re writing a blog, just put it on a single VM, like a small, tiny VM, and if it’s serverless, even better. You don’t need to do all of that and take on the overhead of complexity. And this comes from a lot of experience. This comes from having burned my fingers with distributed tracing, with debugging in a distributed systems environment. I really don’t want to take on all of that overhead for some product that’s maybe not yet reached that particular scale. And the philosophy over here is, do what works. Do what you know, and just do what works, instead of constantly overcomplicating the solution at each point of time.
Shane Hastie: And the other thing that we were talking about was taking the ideas from distributed systems and applying them to managing distributed teams and working in distributed teams. How does that play out?
Taking the ideas of distributed systems into working with distributed teams [13:28]
Arpit Mohan: Very early in Appsmith’s life cycle, we became a distributed team. We were an open-source project, and being an open-source project we said, hey, we are going to be a distributed team, because Conway’s law applies. So we said, if we are distributed, we’ll be able to work with contributors better. We’ll be able to work with our community a lot better. But this was the first time that we were doing a globally distributed team. And I was struggling a little with the modalities of operating across different regions, across different time zones, and even if you’re in the same region or time zone, just not being in the same room. I did try to read up on how GitLab has done it in the past and how Basecamp has been doing it for many, many years.
While I was reading up on some of those best practices, one thing that helped me get onboarded on this journey was asking, okay, what do I know? Do what you know. And I started from the principle, okay, what do I know about distributed teams? And I was like, okay, I know the word distributed. Okay. So let’s start there. And I looked at a lot of the engineering principles that we’ve developed over the years to manage and develop these systems and asked, okay, can this be applied to teams as well? So the first one, for example, like I was just saying earlier, is having a consistent language. In an engineering context, you would use an interface file, a Thrift file, a Protobuf file, or published documentation to say, hey, this is my API interface and this is how you will interact with me.
Similarly, if you’re working in a team, just having that consistent language and terminology and saying, this is the way I work, or these are the keywords that we will use within our product, within our team. These are the acronyms. So if I’m in AWS, for example, I will call it an EC2 Instance. If I’m in Azure, I’ll call it a VM. They mean the same thing, but they’re slightly different again, because I will not call something in AWS, a VM. I’d call it an Instance. So those little, little things, so just establishing the common base of consistency in language, that’s the first thing. In case there is anybody who is using different keywords, then you immediately know that they’re probably not on the exact same page as you. They’re probably in the vicinity, but they’re not at the exact same page.
So, that’s the first thing. So first is, establish an interface of communication. And this interface of communication by the way, is for humans, that’s the interesting part, is different for Arpit to Shane, is a slightly different person than Arpit to my co-founder, versus Arpit to my partner. So humans have this unique way of having many, many different interfaces to the exact same thing, the exact same person. And the fun part about distributed teams or people is that you get to figure this out without documentation. So, that’s one. The other is, there’s a concept called Fail Early. In an engineering context, if a particular service is misbehaving or it’s not doing too well, more often than not, what we would like to do is just to remove that service, like get it to fail, get it to crash and we respawn it in some other VM, or we just restart the service.
But that’s the concept of Fail Early. And this applies to teams as well, where if we are going through a sprint, we are going through a project cycle and we see that something is not working as per our timelines, something is not going just right. By default, we should try to fail early, term the project a failure much earlier rather than going to the absolute end and then saying, oh the project was a failure in this quarter. I mean, everybody probably knew within the first month of the quarter that we are so not making it. So calling out failures a lot earlier in the life cycle, and this applies to people as well, that in case for example, if I’m not doing too well in the team, instead of letting me coast along for a while, before we come to a point where I’m put on a performance plan or et cetera, or I’m asked to leave the team, just fail early. Just call out that failure much earlier in the life cycle.
And it might be a much smaller failure and it’s much easier to correct, to course correct, to just respawn the project. It’s much easier to correct that if it’s called early versus much later in the life cycle. So, that’s the other one, is fail early. The third one is around, what should I say? Service SLAs. In any engineering system, reliability and latency are two metrics that are looked at for any service. And they are typical, hey, I have a microservice, how reliable is it? Out of 100 times, how many times does it return the correct result? That’s reliability. And latency is how quickly did it return that result? And any service that is super fast that has very low latency, but is super unreliable, is a crappy service. It does not matter. You might as well have taken a lot more time, but just be correct about what I’m asking.
The way to establish credibility is to be reliable [18:26]
Arpit Mohan: And that applies to teams as well. Reliability is the first thing that you want to establish within the team. Latencies is much lower down that order. So when we add any new engineer or any new person in the team, one of the things that we are going to tell them is, any new person, there’s a human tendency to establish credibility by doing work, that hey I’m being a part of this tribe, a part of this group. And for me to establish my credibility, I’m going to try and do a lot more things than I probably can, or I’m going to push myself in the first few days, few weeks, few months to show that Arpit is a great guy. But what we tell a lot of team members within Appsmith today is, the way to establish credibility is to be reliable.
So commit to fewer things, do fewer things, but be reliable about them. So attend fewer meetings, participate in fewer projects. It’s okay. We are not going to judge you on the number of GitHub issues you closed, or the number of things, what everybody innately judges other people on is, if Arpit said that he’s going to get X done, did he get X done in that timeframe that he committed? So establishing the reliability, especially when you’re coming in as a new member in the team that is the first and foremost requirement that you should do. That’s the best way to get credibility and then comes latency. Where, are you consistently slow? Are you consistently fast? Or are you like spiking up and down, up and down up and down. So if you’re consistently slow, everybody in the team knows how to work with you.
If you’re consistently fast, again, everybody knows how to deal with you. And obviously as humans, we will go through some cycles, like some slight curves where you’re productive some days you’re not really, and that’s okay, but at a two week timeframe, in a month’s timeframe, are you by and large consistent? But if you see a lot of spikes and that unpredictability again, becomes very hard to deal with. And with any team member again, that we see where we see maybe a spike in productivity, and then we see a dip. So, the next time we see a spike, we actually call it out. At times, we’ve called out people who are being super productive, we reach out to them and say, hey, what’s happening? How did you suddenly become so productive? Are you on your path to burnout? Are you doing something that is unsustainable?
If so, please scale back. It’s okay to be again, instead of closing 10 issues this week, if you consistently try to close three to five, it just makes things easier versus you closing 10 this week and then going AWOL for the next two weeks. I might as well have you close at a more consistent pace. So we’ve actually called out productive people as well saying, you’re too productive. Please, please stop. And there’s something else that’s up. So those are the latency requirements. And with everybody in the team, I’ve learned to write down their latency numbers, what I call latency numbers, where I literally, after each call, they say, oh, I’m going to do X issue or whatever X problem in so much time. I just literally, for my personal thing, I just write it down. And now I have a multiplier factor for everybody in the team.
So now when you say two weeks, I have historical data to know that two weeks, what does it really mean? And that’s what, then I go ahead and commit to my co-founders or I commit to any other stakeholder. This way, everybody is on the same page. Again, I know what the latency numbers are for everybody. Sorry, I’ve gone on for a little bit over here, but the last bit is having a single source of truth. Where in any system you want to have a database or a centralized source of truth because when you have different stakeholders or different people who are operating on the same information, you’d ideally like to have a centralized place where you can refer to if there are any discrepancies, that’s one. So having a centralized documentation. So that’s one piece of it and the other is having a leader.
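The "latency numbers" habit described above is simple arithmetic: compare what someone said a task would take with what it actually took, and scale future estimates by that ratio. The sketch below is one possible way to do that; the function names, the sample history, and the choice of the median ratio are illustrative assumptions, not something stated in the interview.

```python
from statistics import median
from typing import List, Tuple


def estimate_multiplier(history: List[Tuple[float, float]]) -> float:
    """Return a per-person multiplier from (estimated_days, actual_days) pairs.

    Assumption: the median ratio is used so that one unusually late
    or unusually early task does not dominate the multiplier.
    """
    ratios = [actual / estimated for estimated, actual in history if estimated > 0]
    if not ratios:
        return 1.0  # no history yet: take the estimate at face value
    return median(ratios)


def adjusted_commitment(stated_days: float, history: List[Tuple[float, float]]) -> float:
    """Scale a stated estimate by the person's historical multiplier."""
    return stated_days * estimate_multiplier(history)


# Hypothetical history for one teammate: (estimated, actual) in days.
sample_history = [(10, 15), (5, 10), (4, 6)]
```

With this hypothetical history, a stated "two weeks" (14 days) would be passed on to stakeholders as 21 days, because past tasks typically took 1.5 times the stated estimate.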
The role of leadership in groups [22:25]
Arpit Mohan: So in any distributed system, you have a leader and then you have followers. And in case there is any discrepancy in results, the leader takes precedence, the leader chooses which answer to actually pick, and every system needs to have that leader. So even if you’re a small team of three people, in case of any discrepancies, whose voice will finally be accepted? And establishing that again, within the team is important. And it can’t just be pure anarchy, or even in a democracy, you choose a leader. So everybody gets to choose a leader, but then you actually still choose a leader. And this should happen in distributed teams as well, regardless of how big or small that group is. Is having a voice or a leader that you might agree and commit to, or even disagree and commit to. But the important part is committing and following that voice that you’re on.
So yeah, so these are some of the distributed systems principles that helped me get at least started and establish some protocols within the team and get my head around it. Been trying to get much better and trying to start looking at it from a more human perspective as well. But this helps me bridge that gap between the engineer’s mindset that I’ve had to a more people first mindset.
Shane Hastie: What advice do you have for the newly promoted technologist who is now in a people leadership position? What are the things that they should look at?
Advice for new leaders [23:50]
Arpit Mohan: The first is a lot of people who are new to technology leadership, they believe that the important part is technology and the less important is leadership. So I would A, switch that in your head, the title around and say, Leadership of Technology, rather than Technology Leadership, or instead of saying an Engineering Manager, I would say Manager of Engineers, because the more important part is the leadership and the management. So letting go of the technology bit in your head and saying I’m no longer a developer, I’m not a coder, I’m not a designer anymore. I am actually a leader. I’m a manager. Making that your identity, that’s the first switch that I would make. I would recommend people to do. It’s not an easy switch that happens, but that’s important for it to happen. So, that’s the first thing. The reason that the switch is important is because if people need to get better at leading teams or managing teams, I have seen very, very rare occurrences of people doing coding or technology bits well, as well as people leadership well.
So there’s only one of these two that you can do really well. And if you want to get really good at either one of them, so either you stay as an individual contributor, there are companies, there’s an engineering track that you can take, you can become a distinguished engineer or a whatever staff engineer, or et cetera. But if you want to get into the leadership or management side of things, which leads to maybe VP Engineering or Head of Engineering, et cetera, then let go of the day to day technology bits and focus on just learning about the people, learning about the teams and how you update and focusing a lot more on the processes and how things are moving. That’s one. The other, I would say is a lot of people who get into technology leadership early on, they think about what can I do in the engineering?
Or what can I do over here? They look at engineering as a little bit of a microcosm because they’re Head of Engineering or they’re an Engineering Manager, but ideally what they should be doing is starting to zoom out and starting to look at the larger picture around what is blocking my team. Just start from that question, rather than looking at it from an engineering perspective. What is happening in engineering is less important than what is blocking my team from delivering value. And this could be a myriad of other things. It could be that maybe the designs are coming in too late. The product requirements are not clear. And just starting with asking that question is a very easy and quick way to define what your OKRs for the next three months or six months should be. Otherwise, a lot of people who are new to this role, find it hard to define, what should I be doing?
What should I be judged for? What should my OKRs be? And an easy way is just keep your OKR, what is blocking my team from delivering value? And this can be measured, quantifiably measured, like do we go to production 10 times a day or 10 times a year? So whatever that number is, oh, I want to improve that number. And just look at that one or two numbers and say, okay, now this number needs to go up. That’s it. Now I will do everything which is, either engineering or nonengineering for that matter and try to improve that metric. So, zoom out.
Shane Hastie: Arpit, thank you very, very much indeed. If people want to continue the conversation, where do they find you?
Arpit Mohan: I think the easiest way to find me is on Twitter. My handle is @mohanarpit. And otherwise you can always reach out to me on email. My email is arpit@appsmith.com. So these are the two easiest ways to reach out. Apart from that, I’m super easy to find otherwise on the internet.
Shane Hastie: Wonderful. Thanks so much for taking the time to talk to us.
Arpit Mohan: Thank you so much, Shane. This has been an absolute pleasure.
The Announcement of Discontinuing Google Cloud IoT Core Service Stirs the Community and Customers
MMS • Steef-Jan Wiggers
Article originally posted on InfoQ. Visit InfoQ
Google Cloud IoT Core is a fully-managed service that allows customers to connect, manage, and ingest data from millions of globally dispersed devices quickly and securely. Recently, Google announced discontinuing the service – according to the documentation, the company will retire the service on the 16th of August, 2023.
The company released the first public beta of IoT Core in 2017 as a competing solution to the IoT offerings from other cloud vendors – Microsoft with Azure IoT Hub and AWS with AWS IoT Core. In early 2018, the service became generally available. Now, the company emailed its customers with the message that “your access to the IoT Core Device Manager APIs will no longer be available. As of that date, devices will be unable to connect to the Google Cloud IoT Core MQTT and HTTP bridges, and existing connections will be shut down.” Therefore, the lifespan of the service is a mere five years.
The decision to discontinue the service is remarkable considering the current State of IoT blog post stating:
Growth in the number of connected devices slowed in 2021 but is expected to re-accelerate in 2022 and beyond. While new headwinds, such as inflation and prolonged supply disruptions, have emerged for the IoT market, the overall sentiment continues to be relatively positive, with the number of connected IoT devices expected to reach 14.4 billion by the end of 2022.
In addition, over the years, various companies have even shipped dedicated hardware kits for those looking to build Internet of Things (IoT) products around the managed service. Cory Quinn, a cloud economist at The Duckbill Group, tweeted:
I bet @augurysys is just super thrilled by their public Google Cloud IoT Core case study at this point in the conversation. Nothing like a public reference for your bet on the wrong horse.
Last year, InfoQ reported on Enterprise API and the “product killing” reputation of the company – where the community also shared their concerns and sentiment. And again, a year later, Narinder Singh, co-founder and CEO at LookDeep Health, expressed a similar view in a tweet:
Can’t believe how backwards @Google @googlecloud still is with regards to the enterprise. Yes, they are better at selling now, but they are repeatedly saying through their actions you should only use the core parts of GCP.
In addition, on a Reddit thread, a respondent wrote:
I really don’t understand Google Cloud sometimes. The number one argument wielded against them by Azure and AWS proponents is that they shut things down. All they need to do is to not do this kind of thing to give them some credibility.
And a Hacker News thread on the discontinuation of Google IoT Core also contains a lot of comments with a mix of sentiments ranging from distrust to somewhat understanding of the position the company is taking.
Lastly, ClearBlade, already a Google Partner, announced a full-service replacement for IoT Core, including a migration path from Google IoT Core to ClearBlade. This gives customers an option; however, in the Hacker News thread, a respondent, patwolf, stated:
I’ve been successfully using Cloud IoT for a few years. Now I need to find an alternative. There’s a vendor named ClearBlade that announced today a direct migration path, but at this point, I’d rather roll my own.
MMS • David Bator
Article originally posted on InfoQ. Visit InfoQ
Key Takeaways
- The key to retention in the workforce is meaningful recognition. Meaningful recognition requires stating how an employee specifically moved the needle, embodies unique characteristics and made a monumental difference.
- Employees rank social recognition as more important than frequent low-monetary recognition and infrequent high-monetary recognition.
- Workers who are regularly recognized by their managers in a way that makes them feel valued are more likely to recognize others, contributing to an overall culture of recognition.
- Companies need to train leaders to be high-impact recognizers and advocates for the organization as a whole.
- Investing in HR tech can be the true and mighty retention driver to ensure praise doesn’t get lost on the to-do list.
It’s been 28 months since the world of work was turned on its head – and still, professionals are embracing remote work with 75% of software developers saying that they’d like to work remotely at least three days a week.
While many employees are basking in the joy of signing on from their bedrooms and home offices, some are using this newfound freedom to reevaluate their roles and life priorities, causing millions of people to jump ship from their long-standing jobs.
Today, HR leaders are often starting their morning with a resignation letter sitting in their inbox. Too frequently, their top performers are leaving for better pay and the cycle of recruitment, hiring and onboarding continues. With the impending recession making counter offers harder and harder to lay on the line, HR departments are searching for a hail mary and answers to their problems.
Achievers’ Workforce Institute recently released its 2022 State of Recognition Report which recognized and analyzed the high levels of turnover.
This report surveyed over 4,200 employees and 1,600 HR leaders across the globe. After sifting through the raw data, we found that the key to retention in the workforce came down to two words: employee recognition. And now, we welcome you to the “Great Recognition.”
How to Get Recognition Right
It’s human nature to yearn for a sense of appreciation and belonging – and if that feeling is missing, employees will look elsewhere. So much so, more than half (57%) of employees say feeling recognized would reduce the likelihood that they would take a call from a headhunter. Today, recognition even outweighs the perception of a fair salary as a driver of employee advocacy, job commitment, and productivity. While offering employee recognition has incredible potential, it must be done right to reap the optimal results. Here’s how to build the groundwork for a culture of recognition:
Be intentional
Growing up, we were continually rewarded for saying, “thank you.” While a “thank you” is always nice to hear, the bliss doesn’t linger quite as long as thoughtful words of affirmation. In fact, 64% of workers admit they would prefer to receive more meaningful recognition, as opposed to more frequent recognition. This kind of recognition can be achieved by stating how an employee:
- Specifically moved the needle
- Embodies unique characteristics
- Made a monumental difference
Given this, next time a coworker writes clean code with a never-seen-before sense of confidence or leads a project with admirable poise, make sure you sing their praise. The feeling of accomplishment will be felt for a lasting period of time.
Make praise public
The economy is on the rocks and most companies are prepping for the potential storm. While many managers now congratulate colleagues for a job well done with gift cards or free lunches, the budgets may no longer have the wiggle room for these types of rewards. The good news: employees ranked social recognition as most important (42%) before frequent low-monetary recognition and infrequent high-monetary recognition. So instead of a fancy lunch on the town, leaders can simply rave about their employees’ work in the company staff meeting. It will go further than a crab cake.
Start at the top
Let’s face it: Great managers are the backbone of great teams. Between the laundry list of daily to-dos, it’s vital that leaders are (virtually) patting their team members on the back – and often. Interestingly, workers who are regularly recognized by their managers in a way that makes them feel valued are more likely to recognize others, contributing to an overall culture of recognition. Undeniably, managers are the linchpin to an organization’s culture – and the data illustrates their potential impact on a new level.
Invest in training
Though managers can make or break a company culture, most are not being set up for success by their organization. Shockingly, 90% of HR leaders claim they offer recognition training while only 41% of employees say they’ve received it. Further, only one-third of those trained were instructed on how to send a meaningful recognition. It is pertinent for companies to proactively set up training sessions, empowering leaders to be high-impact recognizers and advocates for the organization as a whole.
Lean on HR tech
Technology can turn a blueprint into a moving vehicle and a waning company culture into a never-want-to-quit workplace. Our research showed the deep value of HR tech and how finding an optimized recognition platform drives real results, whereas a manual or ad hoc platform is destined to weakly support business objectives. Providing a dedicated platform allows employees to give recognition with ease. Investing in HR tech can be the true and mighty retention driver to ensure praise doesn’t get lost on the to-do list.
Having a strong company culture is a non-negotiable for many employees and job seekers, and once you put employee recognition at the heart of the business, the cheerful environment will come together with grace and longevity. For many, like PointClickCare Technologies, creating a desirable culture began with investing in an HR platform.
How to Get Recognition Right
PointClickCare Technologies is driven to transform healthcare via a sophisticated cloud-based software that supports senior care. The CEO is a firm believer in the power of company culture, dubbing it as the singular competitive advantage for organizations of all shapes and sizes. Given this, the leadership team was on the hunt to better celebrate and encourage their 1,800+ employees – and to do so, looked to Achievers for support.
The Challenge
Like any forward-thinking company, PointClickCare Technologies surveys their employees regularly, gathering honest feedback and making proactive changes. After reviewing the employee responses, a clear theme was revealed: people overwhelmingly valued recognition and yearned for more of it.
Almost immediately, the organization rolled out an in-house recognition system, which had limitations and operational challenges. Despite the drawbacks, the employees’ response was overwhelmingly positive. Knowing this recognition program was here to stay for the long haul, the leaders sought out a more formal system and selected Achievers’ Employee Success Platform.
The Solution
PointClickCare’s program, dubbed iCare, focused on peer-to-peer recognition and celebrating employee milestones like service anniversaries and birthdays. The iCare initiative was rolled out to all employees in the U.S. and Canada, with some departments even creating their own dedicated modules to reinforce meaningful behaviors specific to that particular team’s success.
The love of the iCare initiative was quickly felt across the whole company. So much so, PointClickCare created a new employee-wide annual event – the iCare Awards. This award was created to celebrate employees who have given and received the highest number of recognitions over the past year, taking the initiative to the next level. When the awards were around the corner, employees would up their praise, with a 1,209% increase in recognitions the week prior. The event was flashy and red-carpet style, featuring in-house musical talent and an opportunity to really hone in on employee connections.
In addition to the iCare awards, the organization also hosted company-wide town halls that celebrated service anniversaries. Also, for Employee Appreciation Week, PointClickCare chose the theme “Donut Know What We’d Do Without You” and handed out donuts to every employee at a large-scale event. The recognition platform became a cornerstone in PointClickCare’s company culture – and still remains that way to this day.
The Impact
iCare was felt across the office and even seen in the cafeteria – with a screen displaying the recognition newsfeed daily. The Achievers platform made employees feel seen, appreciated and inspired, which bumped employee engagement by three percentage points in the first year (moving from 86% to 89%). The noteworthy results don’t stop there, given the platform has:
- 66% active users
- 99% of employees with activated accounts
- 100% activation across managers
- 99% activation among individual contributors
It’s easy for a business to get stuck in their ways and be laser focused on results, but PointClickCare knew there was more to work than metrics met. Since partnering with Achievers, the organization has been able to create a strong culture of recognition. Work is often taxing, but when your impact is expressed regularly, it makes the 40-hour work weeks more manageable (and even, more enjoyable).
The bottom line: the battle to attract and retain talent doesn’t just lie in how much money can be offered but instead, can be won by timely and meaningful recognition. Half of employees looked for a new job in 2021 and almost as many (41%) say they will job hunt in 2022, but when your employees feel supported and appreciated, they’re unlikely to have one foot out the door.
We’re days or months away from a recession, which has the potential to shake up the workforce once again. As we look ahead to a cloudy future, this is my call to action for leaders: tell your employees they matter, show your employees you care and don’t overlook an opportunity to reach out to a coworker to tell them you feel their impact.
MMS • Wavell Watson
Article originally posted on InfoQ. Visit InfoQ
Key Takeaways
- Cloud native networks are not SDN reborn but a fundamentally different way to look at networks.
- While SDN seemed to take physical network machines and virtualize them, CNFs are not merely containerized network virtual machines. The need to split network functions into services is one difference here.
- CNFs are network functionality that lives somewhere on the stack of the OSI model (and the lower in the stack, the more difficult the implementation seems to be) and that is implemented using cloud native practices.
- While SDN dataplanes (here we are talking about what forwards packets) resided on the hardware ASIC or in a virtualized box with traditional kernel network forwarding, CNFs explore user plane forwarding or the newer eBPF datapath forwarding.
- In the cloud native data center, there is a bias toward layer 3 solutions, but a big driver for CNFs is the Telecom service providers which often drop down to layer 2 functionality.
Of the three types of cloud resources (compute, storage, and network), the network seems to be the most resistant to cloud native non-functional requirements. Compute elasticity, for instance, is reasonably allocated with virtual machines, containers, and orchestrators and managed with CI/CD pipelines. Network elasticity seems to be lacking in implementation. In this article, we show that cloud native network functions are an attempt to bring network applications into the cloud native world. But just what are CNFs exactly, and why are they important?
SDN Reborn? Haven’t we tried this before?
Software-defined networks (SDN) were and are an attempt to automate the provisioning of networks. Cloud native network functions are not SDN reborn but a fundamentally different way to look at network provisioning. In one sense, cloud native network functions are like SDN in that they are software-based and not hardware-based solutions. But cloud native networks have an entirely new set of non-functional requirements separate from SDN. Cloud native non-functional requirements prioritize elasticity and by extension, automation 1, a lot more than SDN. The implementation of this requirement leans on declarative configuration. In other words, cloud native configuration should prefer saying “what” it wants done, not “how” it wants it done. For example, one implication of declarative configuration for networks would be the prohibition of hard-coded IP addresses. Declarative configuration allows for the whole system to be self-healing2 because it makes it easier to read and respond to what the system should look like. The system can then be made to continuously correct itself. Other non-functional requirements of cloud native systems are resilience and availability but implemented with scale-out redundancy instead of scale-up techniques. Cloud native systems try to address reliability by having the subcomponents have higher availability through higher serviceability and redundancy. For example, in the cloud native world having a top-level component with multiple redundant subcomponents where several components are available but a few have failures, is more reliable than a single tightly coupled but “highly reliable” component3.
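The declarative, self-healing idea described above can be sketched as a small reconciliation loop: the configuration states only what should exist (a logical service name and a replica count, with no IP addresses), and the system repeatedly compares that with what it observes and corrects the difference. The class, function, and instance names below are hypothetical and are not taken from Kubernetes or any other orchestrator; this is a minimal illustration under those assumptions.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class DesiredState:
    """Declarative config: says what is wanted, not how to get there."""
    service: str   # logical name, deliberately not an IP address
    replicas: int  # scale-out redundancy instead of one "reliable" box


def reconcile(desired: DesiredState, healthy: Set[str]) -> List[str]:
    """Return the actions that move the observed state toward the desired one."""
    actions = []
    # Too few healthy instances: start replacements.
    for _ in range(desired.replicas - len(healthy)):
        actions.append(f"start {desired.service}")
    # Too many instances: stop the surplus (empty when there is a shortfall).
    for name in sorted(healthy)[desired.replicas:]:
        actions.append(f"stop {name}")
    return actions


def run_until_converged(desired: DesiredState, healthy: Set[str], max_rounds: int = 5) -> Set[str]:
    """Apply reconcile() repeatedly, simulating the effect of each action."""
    healthy = set(healthy)
    started = 0
    for _ in range(max_rounds):
        actions = reconcile(desired, healthy)
        if not actions:
            break  # observed state matches desired state
        for action in actions:
            verb, target = action.split(" ", 1)
            if verb == "start":
                started += 1
                healthy.add(f"{target}-new{started}")
            else:
                healthy.discard(target)
    return healthy
```

For example, with a desired state of three replicas and only one healthy instance observed, `reconcile` returns two `start` actions; once three instances are healthy it returns an empty list, which is the steady state the loop converges to.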
Beyond Virtualized Network Boxes
There is a sense in which a “network function” is not decoupled. Virtual network functions (VNFs) started as the virtualization of network hardware. VNFs had a one-to-one correspondence of hardware to virtualized hardware, down to the network card, application-specific integrated circuit (ASIC), or a whole switch. While SDN seems to take physical network machines and virtualize them, CNFs are not merely containerized network virtual machines. CNFs are about decoupling network functionality even further. CNFs group networking functionality into components that have similar rates of change, based on the release cycle of an agile product team, which moves away from the large release cycle of big companies. Software that is released by a product team4 could be thought of as a “thick” definition of microservices. A “thin” definition of a microservice would be software delivered as a single process type5 inside of a container. By developing software as a product team, we find that the thick microservices often look like thin microservices in practice.
Orchestrators have emerged to help manage microservices. Orchestrators are in charge of the lifecycle of the microservice: scheduling, starting, stopping, and monitoring. There are many orchestrators, with Kubernetes (K8s) being the most popular, but there are also domain-specific orchestrators, such as those in the telecommunications domain. One of the early commitments of the cloud native ecosystem was to keep the orchestrator, K8s, from being “balkanized.” This was done by having an official K8s certification, maintained by the CNCF, which makes sure that any forked version of K8s will support the APIs and best practices that the community mandates.
What exactly is a Cloud Native Network Function?
A cloud native network function is functionality that lives somewhere on the OSI6 stack that has been brought down the cloud native trail map. The lower down the stack the CNF is, the more difficult a good cloud native implementation seems to be. This may be because the networking needs to be integrated with the orchestrator and underlying host while retaining its cloud native properties. It also may be because separating previous network functionality, such as that of the forwarding plane, into a shared-nothing process model7 from a shared memory/threading model reduces performance when not done carefully.
To understand the impact of decoupling network functionality, it helps to know a little bit about the reasoning behind network layers. The development of the OSI layers allowed for network innovation to occur while keeping interoperability between layers up and down the stack. At the network layer, the IP protocol ended up being a big winner. At the data link layer, ARP emerged. Multiple vendors iterate at the protocol level within each layer, creating new protocols and new implementations of protocols. Cloud native network functions have the opportunity of being implemented as a protocol within a library, within a microservice, or even being implemented as a group of microservices within a network application.
Ed Warnicke of the Network Service Mesh project once stated that for network services the “packet *is* the payload.” This means that network applications or services actually operate on (transform, route, or analyze) the network packet or frame. Here are some examples of network functionality at the various layers of the OSI model:
- Layer 7: CoreDNS
- Layer 6: NFF packet inspector
- Layer 5: Rsocket
- Layers 4 and 3: Envoy/Network Service Mesh/Various CNI plugins
- Layer 2: VPP-based VSwitch
Examples of cloud native network applications, or higher-order cloud native network functions that span multiple layers, include the 5G Converged Charging System by MATRIXX Software and the BGP server by PANTHEON.tech.
The cloud native trail map describes a maturity path for cloud native applications. Things get more complicated when we dig into one of the stops on the road to cloud nativeness, as is the case with networking, policies, and security. This is to say that there is a cloud native reflexiveness within the tools that help you to be cloud native: those tools are themselves expected to be cloud native. When applying this to cloud native network functions, we end up having to implement the network function just like any other cloud native application. A summary of this is as follows:
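To make the idea that the “packet *is* the payload” concrete, here is a small sketch of a function that analyzes a raw layer 3 packet instead of treating it as opaque transport. It decodes only the fixed 20-byte IPv4 header; the sample bytes and the `parse_ipv4_header` name are illustrative assumptions, not taken from any of the projects listed above.

```python
import ipaddress
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte IPv4 header from raw bytes."""
    if len(packet) < 20:
        raise ValueError("packet shorter than the minimum IPv4 header")
    # Network byte order: version/IHL, TOS, total length, identification,
    # flags/fragment offset, TTL, protocol, checksum, source, destination.
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "header_bytes": (version_ihl & 0x0F) * 4,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }


# Hand-built sample packet (assumed values, checksum left as zero).
sample = bytes([
    0x45, 0x00, 0x00, 0x1C,  # version 4, IHL 5, total length 28
    0x00, 0x00, 0x00, 0x00,  # identification, flags, fragment offset
    0x40, 0x11, 0x00, 0x00,  # TTL 64, protocol 17 (UDP), checksum
    10, 0, 0, 1,             # source 10.0.0.1
    10, 0, 0, 2,             # destination 10.0.0.2
]) + b"\x00" * 8             # 8 bytes of placeholder payload

header = parse_ipv4_header(sample)
```

A routing or inspection function would branch on fields such as `dst` or `protocol`, which is what distinguishes it from an application that merely sends data over the network.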
- The first step starts with coarse-grained deployments, usually implemented as containers.
- The second step is having the service or application deployable in a CI/CD pipeline with stateless and declarative configuration.
- The third step is to support an orchestrator (e.g., K8s) deployed on homogeneous nodes which manages the lifecycle of the service.
- The fourth step ensures that the network function has telemetry. This includes metrics (e.g., OpenMetrics-compatible Prometheus), tracing (e.g., OpenTracing-compatible Jaeger), and event-stream-compatible logging (e.g., Fluentd).
- The fifth step of cloud native maturity, service discovery, allows the network service to be discovered by other consumers inside or even outside of the cluster.
- In order to facilitate declarative configuration, the sixth step outlines the importance of policies, especially network and security policies, as being applicable and supported through the service.
- The seventh step is distributed storage, applicable where stateful workloads are used, to ensure compatibility with cloud native environments.
- Cloud native messaging, registries, runtimes, and software distribution are other stages of cloud native maturity that round out an application’s journey.
The CNF Dataplane
With CNFs, the dataplane 8 (also known as the forwarding plane) moves even further away from traditional hardware. Since cloud native principles value scaling out instead of scaling up, having more homogeneous commodity nodes is preferred over having fewer heterogeneous and specialized nodes. Because of this, there is a disaggregation movement that uses commodity servers in place of the application-specific integrated circuits (ASICs) of a specialized network switch. One benefit of this is the emergence of dataplanes that support a more agile rate of change. While SDN dataplanes (here we are talking about what literally forwards packets) resided on the hardware ASIC or in a virtualized box with traditional kernel network forwarding, CNFs have begun to explore technologies like userspace dataplanes (e.g., VPP), extended Berkeley packet filters (eBPF) with the eXpress Data Path (XDP), and SmartNIC forwarding.
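Whatever technology implements it, a core dataplane operation is deciding which output port a packet should leave on, usually by picking the most specific matching prefix. The sketch below shows that longest-prefix-match logic in its simplest form. The `RouteTable` class and port names are hypothetical, and the linear scan stands in for the specialized lookup structures a real dataplane would use for speed.

```python
import ipaddress


class RouteTable:
    """Toy forwarding table; a linear scan stands in for a real lookup structure."""

    def __init__(self):
        self._routes = []  # list of (network, output_port) pairs

    def add(self, prefix: str, port: str) -> None:
        self._routes.append((ipaddress.ip_network(prefix), port))

    def lookup(self, destination: str):
        """Return the port of the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(destination)
        best = None
        for network, port in self._routes:
            if addr.version == network.version and addr in network:
                # Prefer the longer prefix, i.e. the more specific route.
                if best is None or network.prefixlen > best[0].prefixlen:
                    best = (network, port)
        return best[1] if best else None


# Assumed example routes: a default route plus two nested prefixes.
table = RouteTable()
table.add("0.0.0.0/0", "uplink")
table.add("10.0.0.0/8", "port-1")
table.add("10.1.0.0/16", "port-2")

specific = table.lookup("10.1.2.3")    # matches all three, /16 wins
broader = table.lookup("10.2.0.1")     # matches /0 and /8, /8 wins
fallback = table.lookup("192.0.2.1")   # only the default route matches
```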
Layer 3 Ascension
In the cloud native data center, there is a bias toward layer 3 solutions. Being able to declaratively specify and automate the configuration of layer 3 networks has been a determining factor in the development of the Kubernetes networking model. These new cloud native networks rely on IP addresses to connect the cluster’s nodes and applications, not layer 2 MACs and VLANs. However, this is mainly the networking story of the orchestrator and its applications. The data center has multiple moving parts, each with a different rate of change. They can be described as three layers: below the orchestrator (with network operating systems like SONiC and provisioning tools like Terraform), within the orchestrator (e.g., Kubernetes) itself, and above the orchestrator but within containers (e.g., CNFs). The network infrastructure fabric below the orchestrator, such as a (possibly disaggregated) top-of-rack switch in the data center, continues to have layer 2 configuration. The telecom space, a big driver for the adoption of CNFs, also continues to have layer 2 use cases that can’t be avoided, such as Multiprotocol Label Switching (MPLS). The story for the layer 2 fabric is still being written with new implementations of switching software, such as SONiC.
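One reason layer 3 lends itself to declarative automation is that an address plan can be computed from a single declared range. As a rough sketch, assume a design in which each node receives its own slice of a cluster-wide pod address range. The function name, the `10.244.0.0/16` range, and the node names are illustrative assumptions, not the configuration of any particular cluster.

```python
import ipaddress


def assign_pod_subnets(cluster_cidr: str, nodes: list, node_prefix: int = 24) -> dict:
    """Give each node its own slice of the cluster-wide pod address range."""
    cluster = ipaddress.ip_network(cluster_cidr)
    subnets = cluster.subnets(new_prefix=node_prefix)
    plan = {}
    for node in nodes:
        try:
            plan[node] = next(subnets)
        except StopIteration:
            raise ValueError("cluster CIDR too small for the number of nodes") from None
    return plan


# Assumed inputs: one declared range and three node names.
plan = assign_pod_subnets("10.244.0.0/16", ["node-a", "node-b", "node-c"])

# First usable address a workload on node-b could be given.
first_pod_ip = next(iter(plan["node-b"].hosts()))
```

Because every address is derived from the declared range, routing between nodes reduces to one prefix per node, with no per-workload VLAN or MAC configuration.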
Conclusion
The configuration, deployment, and automation of networks are some of the reasons why elasticity, a staple of cloud native environments, is hard to achieve. It can be the deciding factor for moving to a hyperscaler, such as Amazon, even when a more customized deployment is warranted. This is particularly relevant to the telco space because they have custom network protocols they may want to support for their enterprise customers (e.g., MPLS). Cloud native network functions address these deployment concerns by decoupling network functionality based on the rate of change, down to the coarse-grained image and process (e.g., container) level. This avoids the traditional deployment-in-lockstep problems that networks are prone to have.
CNFs are network functionality, traditionally thought of as being located on the OSI stack, that is implemented following cloud native practices and coupled with the cloud native ecosystem. Networks, and especially telecommunication networks, have a long history of non-functional requirements, such as resilience. Telecommunication service providers use the example of a 911 call as a mission-critical system that demands extreme resilience and availability. Even so, the cloud native ecosystem has non-functional attributes that have gained the attention of service providers. These attributes, such as availability (the cloud native type), ease of deployment, and elasticity, have driven telecommunication service providers to put pressure on the telecommunication equipment vendors (both physical and software) to be more cloud native. This requires that these new network components follow cloud native infrastructure best practices in order to become mature solutions within the cloud native ecosystem. This is not easy, as it is exceedingly difficult to take traditionally tightly coupled components that have demanding performance requirements, such as a networking dataplane, and decouple them.
Dataplanes in the CNF space are a work in progress and have many solutions. The mere concept of dataplanes complicates the understanding of CNFs, given that CNFs are not just a virtualized representation of a physical box. At a trivial level, networking in a cloud native data center could avoid this complication by concentrating on default kernel networking and layer 3 IPv4/IPv6 networking. This is often not feasible for telco use cases or the implementation of network fabric. These problems are part of the natural progression of decoupling network software, so there isn’t a way to avoid them. CNFs done right promise a new level of deployability, elasticity, ease of configuration, and resilience not previously realized.
To learn more about cloud native network functions, join the CNCF’s cloud native network function working group, or look into CNCF’s CNF certification program.
1. “Cloud native is about autonomous systems that do not require humans to make decisions. It still uses automation, but only after deciding the action needed. Only when the system cannot automatically determine the right thing to do should it notify a human.” Garrison, Justin; Nova, Kris. Cloud Native Infrastructure: Patterns for Scalable Infrastructure and Applications in a Dynamic Environment. O’Reilly Media. Kindle Edition.
2. “A self-healing infrastructure is an inherently smart deployment that is automated to respond to known and common failures. Depending on the failure, the architecture is inherently resilient and takes appropriate measures to remediate the error.” Laszewski, Tom. Cloud Native Architectures: Design high-availability and cost-effective applications for the cloud (pp. 131-132). Packt Publishing. Kindle Edition.
3. “Intuitively it may seem like a system can only be as reliable as its least reliable component (its weakest link). This is not the case: in fact, it is an old idea in computing to construct a more reliable system from a less reliable underlying base.” Kleppmann, Martin. Designing Data-Intensive Applications. O’Reilly Media. Kindle Edition.
4. Cross-functional teams put all of the people responsible for building and running an aspect of a system together. This may include testers, project managers, analysts, and a commercial or product owner, as well as different types of engineers. These teams should be small; Amazon uses the term “two-pizza teams,” meaning the team is small enough that two pizzas are enough to feed everyone. The advantage of this approach is that people are dedicated to a single, focused service or small set of services, avoiding the need to multitask between projects. Teams formed of a consistent set of people work far more effectively than those whose membership changes from day to day. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 6457-6462). O’Reilly Media. Kindle Edition.
5. “The best way to think of a container is as a method to package a service, application, or job. It’s an RPM on steroids, taking the application and adding in its dependencies, as well as providing a standard way for its host system to manage its runtime environment. Rather than a single container running multiple processes, aim for multiple containers, each running one process. These processes then become independent, loosely coupled entities. This makes containers a nice match for microservice application architectures.” Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1708-1711). O’Reilly Media. Kindle Edition.
6. In an effort to minimize proprietary solutions, to create an open market in network systems, and to enable management of communications complexity, the International Organization for Standardization (ISO) has developed a reference model for open communications. This reference model, called the ISO Open Systems Interconnection (OSI) Reference Model, proposes an abstract and layered model of networking. Specifically, it defines seven layers of abstraction and the functionality of each layer. However, it does not define specific protocols that must be used at every layer, but gives the concepts of service and protocol that correspond to each layer. Serpanos, Dimitrios, and Wolf, Tilman. Architecture of Network Systems (The Morgan Kaufmann Series in Computer Architecture and Design) (p. 11). Elsevier Science. Kindle Edition.
7. “Processes do not share memory, and instead communicate with each other through message passing. Messages are copied from the stack of the sending process to the heap of the receiving one. As processes execute concurrently in separate memory spaces, these memory spaces can be garbage collected separately, giving Erlang programs very predictable soft real-time properties, even under sustained heavy loads. […] Processes fail when exceptions occur, but because there is no shared memory, failure can often be isolated as the processes were working on standalone tasks. This allows other processes working on unrelated or unaffected tasks to continue executing and the program as a whole to recover on its own.” Cesarini, Francesco, and Vinoski, Steve. Designing for Scalability with Erlang/OTP: Implement Robust, Fault-Tolerant Systems (p. 29). O’Reilly Media. Kindle Edition.
8. The data plane of a router implements a sequence of operations that are performed for typical network traffic. As discussed earlier, these steps include IP processing of the arriving packet, transmission through the switch fabric to the output port, and scheduling for outgoing transmission. One of the key operations in the dataplane is to determine to which output port to send the packet. This process is known as route lookup […] Serpanos, Dimitrios, and Wolf, Tilman. Architecture of Network Systems (The Morgan Kaufmann Series in Computer Architecture and Design) (p. 117). Elsevier Science. Kindle Edition.