Mobile Monitoring Solutions


Podcast: Idit Levine Discussing Gloo, Service Mesh Interface, and Web Assembly Hub

MMS Founder
MMS Wesley Reisz

Article originally posted on InfoQ. Visit InfoQ

Today on The InfoQ Podcast, Wes Reisz speaks with Idit Levine, CEO and founder of Solo. The two discuss the three pillars of Solo’s work: Gloo (their API gateway), interoperability of service meshes (including the work on the Service Mesh Interface), and extending Envoy with WebAssembly (including the recently announced WebAssembly Hub).

Key Takeaways

  • Gloo is a Kubernetes-native ingress controller and API gateway. It’s built on top of Envoy and at its core is open source.
  • The Service Mesh Interface (SMI) is a specification for service meshes that runs on Kubernetes. It defines a common standard that can be implemented by a variety of providers. The idea of SMI is it’s an abstraction on top of service meshes, so that you can use one language to configure them all.
  • Autopilot is an open-source Kubernetes operator that allows developers to extend a service mesh control plane. 
  • Lua has been commonly used to extend the service mesh data plane. Led by Google and the Envoy community, WebAssembly is becoming the preferred way of extending the data plane. WebAssembly allows you to write Envoy extensions in any language while remaining sandboxed and performant.
  • WebAssembly Hub is a service for building, deploying, sharing, and discovering Wasm extensions for Envoy.
  • Wasme is a Docker-like, open-source command-line tool from Solo that simplifies building, pushing, pulling, and deploying Envoy WebAssembly filters.


Show notes will follow shortly.

About QCon

QCon is a practitioner-driven conference designed for technical team leads, architects, and project managers who influence software innovation in their teams. QCon takes place 8 times per year in London, New York, Munich, San Francisco, São Paulo, Beijing, Guangzhou & Shanghai. QCon London is in its 14th edition and will take place Mar 2-5, 2020. 140+ expert practitioner speakers, 1,600+ attendees and 18 tracks will cover the topics driving the evolution of software development today. Visit qconlondon.com for more details.

More about our podcasts

You can keep up to date with the podcasts via our RSS feed, and they are available via SoundCloud, Apple Podcasts, Spotify, Overcast, and Google Podcasts. From this page you also have access to our recorded show notes. They all have clickable links that will take you directly to that part of the audio.




Presentation: Do’s and Don’ts: Avoiding First-time Reactive Programmer Mines

MMS Founder
MMS Sergei Egorov

Article originally posted on InfoQ. Visit InfoQ

Sergei Egorov discusses some of the problems encountered when creating a reactive system.

By Sergei Egorov



Apple Acquires Edge-Focused AI Startup Xnor.ai

MMS Founder
MMS Anthony Alford

Article originally posted on InfoQ. Visit InfoQ

Apple has acquired Xnor.ai, a Seattle-based startup that builds AI models that run on edge devices, for approximately $200 million.

GeekWire first reported the story, based on information from “sources with knowledge of the deal.” Because Xnor.ai’s technology focuses on running AI models on low-resource edge devices without sending data to the cloud, many observers speculate that the acquisition ties into Apple’s strategy for data privacy. However, GeekWire and others report that Apple has given its standard response to inquiries:

Apple buys smaller technology companies from time to time and we generally do not discuss our purpose or plans

Xnor.ai was spun out of the Allen Institute for AI’s (AI2) incubator program in 2017, with funding from Seattle VC Madrona Venture Group. Company co-founder Ali Farhadi was featured in a 2016 New York Times article discussing his team’s research on reducing compute requirements for neural networks. Xnor.ai’s latest products include a solar-powered FPGA chip that runs a computer-vision neural network as well as a person-detection algorithm embedded in Wyze security cameras—a partnership that was recently and unexpectedly terminated.

The key idea behind Xnor.ai’s technology is quantization of neural network weights and operations—reducing the number of bits needed to represent the numbers in the model. Quantization is a feature of most deep-learning frameworks, including TensorFlow and PyTorch; however, Xnor.ai’s team took the extreme approach of using just a single bit to represent each weight and activation value in the convolutional layers, the main component of state-of-the-art computer vision models. Their result was a neural network that achieves an accuracy only 2.9% less than a full-precision model. In addition to a 32x memory savings, representing the data as single bits also allows the computations to be performed with a simple logic operation, XNOR. Besides giving the company a name, this also means that the vision models can run 58x faster on a regular CPU, obviating the need for a GPU.
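
As a rough illustration of that idea (a minimal sketch in C#, not Xnor.ai’s actual code), here is how a dot product over bit-packed ±1 values reduces to an XNOR followed by a bit count, using the convention that a 1 bit stands for +1 and a 0 bit for -1:

using System.Numerics;

static class BinaryDot
{
    // Dot product of two ±1 vectors packed into the low `bits` bits of a ulong.
    public static int Compute(ulong weights, ulong activations, int bits)
    {
        // XNOR marks the positions where the two signs agree.
        ulong agreement = ~(weights ^ activations);
        // Keep only the bits that are actually in use.
        ulong mask = bits == 64 ? ulong.MaxValue : (1UL << bits) - 1;
        int matches = BitOperations.PopCount(agreement & mask);
        // matches - mismatches = 2 * matches - bits.
        return 2 * matches - bits;
    }
}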

The memory and compute savings also mean that the models can be used in devices with resource and power constraints, such as security cameras or mobile phones. This on-edge processing is attractive to the privacy-minded, as it eliminates the need for data to be sent to the cloud for AI processing, which is a major concern for consumers and regulators. Apple CEO Tim Cook’s 2019 commencement address at Stanford University stressed the importance of privacy, prompting observers to speculate that this is a motivation for the acquisition. On the other hand, Cook mentioned in a 2019 CNBC interview that Apple acquires a company “every two to three weeks on average,” having acquired nearly two dozen in the first half of that year. Apple recently overtook Google as the leader in the number of AI acquisitions among the “tech giants,” and has made several high-profile hires of ex-Google AI researchers, including Ian Goodfellow, the inventor of generative adversarial networks (GANs).

Xnor.ai is the second startup from AI2 Incubator to be acquired by a large tech company. In 2017, Baidu acquired Kitt.ai, an AI2 startup that developed a chatbot framework. The Incubator website lists three other funded AI startups, along with alumni Kitt.ai and Xnor.ai.



C# Futures: Covariant Return Types

MMS Founder
MMS Jonathan Allen

Article originally posted on InfoQ. Visit InfoQ

A frequent API design problem is the inability to use a more specific return type when overriding a method. A good example of this is your typical Clone method.

public abstract Request Clone();

In a subclass, you may wish to implement it like this:

public override FtpRequest Clone() { ... }

Since FtpRequest is a subclass of Request, logically this makes sense. But you can’t actually do it in .NET, because overrides have to be an exact match. Nor can you have an override and a new method that differ only by return type. So usually you end up with something complex such as:

public Request Clone() => OnClone();
protected abstract Request OnClone();

Then in the subclass:

public new FtpRequest Clone() => (FtpRequest)OnClone();
protected override Request OnClone() { ... }

The ability to change the return type of an overridden method is being explored in Proposal 49, Covariant Return Types.

When originally proposed in 2017, this feature would have been implemented using some “compiler magic”. As of October 2019, the focus has shifted towards making this a first-class feature of the CLR.
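
With that runtime support in place, the Clone example above could be written directly. The following is a sketch of the proposed behaviour (it does not compile under the current exact-match rule):

public abstract class Request
{
    // The base contract returns the base type.
    public abstract Request Clone();
}

public class FtpRequest : Request
{
    // Covariant return: the override narrows Request to FtpRequest,
    // so callers holding an FtpRequest no longer need to cast.
    public override FtpRequest Clone() => new FtpRequest();
}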

In the Covariant Return Types – Draft Specification, the rule for the IL directive .override will be changed to:

The override method must have a return type that is convertible by an identity or implicit reference conversion to the return type of the overridden base method.

Currently, the rule is:

The override method and the overridden base method have the same return type.

Properties and Indexers

Properties and indexers are included in this feature, but only if they are read-only. There will not be matching support for contravariant property and index setters.
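
For example, a get-only property could narrow its type in a derived class, while a settable property could not. A sketch using hypothetical Node/XmlNode types (not taken from the proposal text):

public abstract class Node
{
    // Read-only property: eligible for a covariant override.
    public abstract Node Parent { get; }
}

public class XmlNode : Node
{
    private readonly XmlNode _parent;

    public XmlNode(XmlNode parent) => _parent = parent;

    // Narrowed return type; a matching contravariant setter would not be supported.
    public override XmlNode Parent => _parent;
}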

Interfaces

Methods on interfaces can covariantly override methods on base interfaces, following the same rules as for subclasses and base classes.

When a class implements an interface, the implementing method can be covariant with the interface method.

For purposes of interface mapping, a class member A matches an interface member B when:

A and B are methods, and the name and formal parameter lists of A and B are identical, and the return type of A is convertible to the return type of B via an identity or implicit reference conversion.

This rule change for implicitly implemented interfaces could result in a breaking change. It would happen in the unusual situation where a subclass re-implements an interface already implemented by its base class: under the new rule, the subclass’s method with the narrower return type would now be chosen as the interface implementation, changing which method is invoked through the interface.

interface I1 { object M(); }
class C1 : I1 { public object M() { return "C1.M"; } }
class C2 : C1, I1 { public new string M() { return "C2.M"; } }

Andy Gocke proposed a slight modification to the rule to avoid the breaking change,

Could we change the search for mapping members to consider implicit implementations with different covariant returns iff there is no other implementation (including a default one)?

Unfortunately, this is not compatible with default implementations on interfaces. Neal Gafter writes,

I don’t see how that would work in binary compatibility scenarios. If a new version of the interface is released with a default implementation then the runtime would change to use that one instead of the implementation from the base class?

Prioritization of the necessary runtime support for covariant return types is being tracked internally at Microsoft.



QCon London – Keynotes & Workshops on Kubernetes, Apache Kafka, Microservices, Docker

MMS Founder
MMS Diana Baciu

Article originally posted on InfoQ. Visit InfoQ

QCon London is fast approaching. Join over 1,600 global software leaders this March 2-4. At the event you will experience:

  • Talks that describe how industry leaders drive innovation and change within their organizations;
  • A focus on real-world experiences, patterns, and practices (not product pitches); and
  • Implementable ideas for your projects and your teams.

Keynotes for QCon London

The Internet of Things Might Have Less Internet Than We Thought?
Alasdair Allan, Scientist/Maker/Hacker.

Alasdair will look at the implications of machine learning on the edge, and the possible impact on privacy and security.

Interoperability of Open-Source Tools: The Emergence of Interfaces.
Katie Gamanji, Cloud Platform Engineer @condenastint.

Katie will explore the evolution of interfaces within the Kubernetes landscape, including networking, storage, service mesh, and cluster provisioning.

The Apollo Mindset: Making Mission Impossible, Possible
Richard Wiseman, author of The Luck Factor, and previously a Consultant for National Geographic Channel’s Brain Games, History’s Your Bleeped Up Brain, & Discovery’s MythBusters.

Richard’s closing keynote will discuss the history of NASA’s Apollo space missions.

10 Workshops in 2 Days

Following the 3-day conference, there will be an optional 2 days of 10 workshops:

  • Kubernetes Intensive Course with Jérôme Petazzoni (Staff Container & Infrastructure Engineer @enixsas)
  • Debugging Microservices Applications with Christian Posta (Global Field CTO @soloio_inc), Nic Jackson (Developer Advocate @HashiCorp)
  • Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline! with Robin Moffatt (Developer Advocate @confluentinc), Sven Erik Knop (Solutions Architect @ConfluentInc)
  • Microservices: Mapping & Implementation with Susanne Kaiser (CTO @JustSocialApps)

Heads up: workshops are selling out! These workshops are for senior engineers who are looking to adopt new technologies. If you’re interested in a workshop, be sure to save your place soon.

New: AI and ML Learning Paths

New for QCon London 2020 are AI and ML Learning Paths, taking place March 5-6. These learning paths offer a structured, intensive, two-day session focused on this emerging area to help you develop your knowledge and skills. Get hands-on experience and engage in practical assessments to apply your learning.

Registration is £1,670 (£320 off) for the 3-day conference if you register before Jan 25th.



Presentation: Designing a Reactive System

MMS Founder
MMS Stephane Maldini Ryland Degnan Andy Shi

Article originally posted on InfoQ. Visit InfoQ

Stephane Maldini, Ryland Degnan, and Andy Shi discuss the options available for building reactive systems.

By Stephane Maldini, Ryland Degnan, Andy Shi



Reducing Build Time with Observability in the Software Supply Chain

MMS Founder
MMS Ben Linders

Article originally posted on InfoQ. Visit InfoQ

Tools commonly used in production can also be applied to gain insight into the CI/CD pipeline to reduce the build time. Ben Hartshorne, engineer at honeycomb.io, gave the presentation Observability in the SSC: Seeing into Your Build System at QCon San Francisco 2019.

Honeycomb is a company and product that is built around trying to understand the workings of complex production processes. They chose to apply their own tool to their build process. Hartshorne said that “the visibility into this normally opaque black box has been stunning”. He mentioned two areas where adding instrumentation has really changed his understanding of how the build was behaving: variety over time and running in containers.

Hartshorne mentioned that he knew their build was getting slower as they added more code. What they didn’t realize was that the slowdown was unequally distributed: some areas of the build fluctuated dramatically (over the course of months), going both up and down:

As we had added and removed code while building Honeycomb, we shifted frameworks and made architectural changes to code that dramatically influence the amount of time some areas of our build took. My naïve expectation that all build stages got slower over time was wildly inaccurate – some stayed the same, some went down, and others went up by a little, with some going up by a lot. Understanding this variety changed my mental model of how the build was behaving and helped us focus efforts where it would have the most impact.

Hartshorne mentioned that they had switched SaaS providers with a side effect of changing the build from running in VMs to running in containers. This was mostly irrelevant to their decision to change providers (the main reason being for easier parallelization), but they figured it’d be nice to have the reduced startup time you get in a container:

While our numbers did show that the median time per build step dropped by a bit, the 95th percentile increased dramatically! This was not something we had expected and is not something we would have noticed without powerful tools to visualize performance over time. We dug in to it a little bit and talked with our CI provider but the culprit for increased time eluded us. The closest we’ve come is recognizing increased co-tenancy issues due to tighter container packing, but one limitation of keeping instrumentation in user-space was our inability to get numbers to confirm the idea. Thankfully, the time we gained from running independent steps in parallel far outstripped the increased variance per step and the overall build times still dropped significantly. In this case, the instrumentation identified a previously unknown characteristic of our build system – the wider variance in per-step time – and including that in our model for the process lets us make better decisions about the architecture of the system and future work.

Those two examples illustrate the part of this experiment that’s been the most interesting, said Hartshorne. It’s taking the tools that they normally apply to production infrastructure (whether it’s tracing or metrics or anything else) and using those to influence more of the software supply chain, and more of the build and test processes. Every step along the path from commit to deploy could benefit from using the same toolset that they (as operators) are already experienced with for running complicated applications, Hartshorne said.

InfoQ interviewed Ben Hartshorne about the challenges they faced, what they have done to get insight into the performance of their build process, the benefits they have gained and what they have learned.

InfoQ: What challenges did you face when developing Honeycomb?

Ben Hartshorne: As a young startup, we hit a number of issues along the way that are totally common. We have our automated builds and as any codebase grows, the build slows down. We spend a certain amount of time making them better.

Most of these changes were rather boring (when you’re just starting out even the obvious things get skipped), but to give you a sense of the first steps:

  • Run tests on more capable hosts (scale up)
  • Run independent tests in parallel instead of serially (scale out)
  • Increase parallelism within tests (for languages that support multiprocessor builds)
  • Cache dependent libraries that don’t change between builds
  • Re-use built results instead of rebuilding at each step

Each of those can be done without really understanding why your builds are slow and they’ll almost certainly have a positive impact. Different SaaS providers may make some easier than others, but eventually you’ll exhaust the obvious easy answers and need better data in order to choose where to invest your time.

InfoQ: How did you get a deeper insight into the performance of your build process?

Hartshorne: Some of the tools you might use in the software supply chain export some metrics. Github has its commit history, and build systems have a small amount of timings and their status around the builds. We have other really fancy tools, and as developers and operators, we know how to understand very complex systems using these tools and we should take advantage of that.

The key to fitting these pieces together is to insist that our vendors have APIs that expose this data, both in realtime and after the fact. APIs are the glue that let us push a commit to GitHub that triggers a build in CircleCI that pushes release artifacts that trigger a deploy… APIs are how we create a software supply chain – and they will be how we instrument it as well. The APIs that expose timing and performance data are less complete than those that provide direct functionality because not enough people want them.

As our industry realizes that some of the data around how code moves from development to production can be closely correlated with the ability of a business to quickly respond to a changing environment, I think more people will want access to the numbers that let you better see these processes go by. Two of the factors identified by the State of DevOps DORA report as correlating to high performing teams are directly tied to how long it takes code to move from concept to production (deploy frequency and change lead time) – there’s no question in my mind that this will be an area of growth in the near future.

InfoQ: What benefits have you noticed?

Hartshorne: By hooking these processes together and understanding patterns in the data flow, we can really improve the lives of our own developers. We can improve the efficiency of our development process. This is an exciting area because there hasn’t been a whole lot of work focused there for the most part. I’m not really sure why this focus has been missing, but hazarding a guess I’d say that the CD portion of the CI/CD world seems to be the part that’s hanging on to custom patchwork setups the hardest. Seeing developers hook up their code to a continuous testing environment is standard fare these days. Automatically moving the artifacts built by that continuous process into production seems to be far less standardized. Folks that work in PaaS environments might be the first to see progress here (in something like Heroku or JenkinsX or Kubernetes, the deploy step is much more likely to be easily integrated with the test step), but we’ll see.

InfoQ: What have you learned?

Hartshorne: First, it became clear that build systems are not as impenetrable as they appear to be. The purpose of a build system is to run a bunch of commands, and by hooking into that and using some normal operating system and process tricks, you can get an enormous amount of insight into your build processes very easily. Second, you can use this insight to focus your developer actions around maintaining your build system. And third, that there are huge areas of software supply chain outside of build and test that will benefit from this same kind of extra analysis.
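
As a rough illustration of that idea (this is not Honeycomb’s actual instrumentation, and the event format here is invented), a build step can be wrapped by a small program that records wall-clock time and exit status and emits them as a structured event:

using System;
using System.Diagnostics;

class BuildStepTimer
{
    // Usage: BuildStepTimer <command> [arguments...]
    static int Main(string[] args)
    {
        var timer = Stopwatch.StartNew();
        var process = Process.Start(new ProcessStartInfo
        {
            FileName = args[0],
            Arguments = string.Join(" ", args, 1, args.Length - 1),
            UseShellExecute = false
        });
        process.WaitForExit();
        timer.Stop();

        // One structured event per build step; in practice this would be sent
        // to an observability backend rather than printed to the console.
        Console.WriteLine(
            $"{{\"step\":\"{args[0]}\",\"duration_ms\":{timer.ElapsedMilliseconds},\"exit_code\":{process.ExitCode}}}");
        return process.ExitCode;
    }
}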



JetBrains Releases Coding Typeface, Mono

MMS Founder
MMS Erik Costlow

Article originally posted on InfoQ. Visit InfoQ

JetBrains has released a new font, Mono, that strives for legibility of code that is presented in Integrated Development Environments (IDEs). The font maximizes the space used for each letter and visually differentiates between characters like the number one, lower-case L, and capital I.

Mono is a fixed-width font that joins the ranks of other open-source coding fonts, such as SourceFoundry’s Hack and Mozilla’s Fira. What differentiates Mono is its clear-cut strokes terminating each letter and curves that keep many lower-case letters at the same height. By keeping letters at the same height with similar curves, developers can more easily move their eyes across a line of code without drifting up or down. The Mono page showcases a side-by-side comparison with several fonts, including Fira and Consolas; compared with each, Mono’s letters are larger and have fewer unique curves.

The new font comes in several varieties designed to work together, including regular, italic, and various bold combinations. Beyond these families, it also includes ligatures that merge common multi-character sequences into single glyphs, such as the arrow -> used for Java lambdas, the HTML comment delimiters <!-- -->, and boolean comparisons like less-than-or-equal-to.
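
For instance, character sequences like the ones below are left unchanged in the source file but drawn as single combined glyphs when JetBrains Mono’s ligatures are enabled in the editor (a small C# illustration):

using System;

// "=>", "<=", and "!=" are among the sequences Mono renders as ligatures;
// the underlying characters in the file are unchanged.
Func<int, bool> isSmall = x => x <= 10;
bool differs = 1 != 2;
Console.WriteLine($"{isSmall(3)} {differs}");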

Mono is a fully free and open-source (Apache 2.0) font that can be downloaded directly, and it will also ship in future versions of the IntelliJ IDE. The font can be used within other IDEs, such as Apache NetBeans, by following JetBrains’ instructions to install the font in the operating system. This widespread availability can assist coders outside IDEs as well. Readers of the book Developer, Advocate! may speak at conferences or showcase code in presentations on slides with no IDE available. Using the Mono font within presentation software like PowerPoint can help set code apart from other text and add familiarity for technical audiences hearing about the problem the code is intended to solve. A crucial reason for putting code into slides is to draw attention to a small section rather than the entire document or surrounding material. Similarly, developer advocates who perform live-coding to write applications with the audience can use the font to make code easier to read for audience members sitting further from the stage.

Another benefit of Mono is its support for international audiences, covering 143 languages, including Malay, Afrikaans, and Scottish Gaelic. Developers from across the world can create strings or name variables in ways that fit their vocabulary, rather than restricting their thinking to the letters that happen to be available.

The Mono font is available directly from JetBrains.



Presentation: Building a Successful Remote Culture at Scale With Strong Ownership

MMS Founder
MMS Sushma Nallapeta

Article originally posted on InfoQ. Visit InfoQ

Transcript

Nallapeta: Before I get into the actual content, I want to start off by telling a story. This is a story of how I learned about ownership in my career for the first time, and it was through a situation of bankruptcy – not of my own, but of the company that I worked for.

Ownership Through Bankruptcy

I think you all are familiar with Kodak. I used to work for a startup that was acquired by Kodak, called Ofoto. It was renamed Kodak Gallery. They were into photo-based products, so we made photo mugs, photo books. In December of 2011, Kodak filed for bankruptcy. We read the news online and found out. It was a very shocking moment. We didn’t know what was going to happen. Were there going to be layoffs? Will the company be sold to someone else? We had no idea. Kodak definitely made a lot of memories but not money.

I was four months into my job as a first-time manager and I kept asking myself, what did I get myself into? The leadership got into a closed-door meeting at Kodak Gallery. We are expecting the worst to come out. We think they’re going to tell us so many people would be laid off and here is the timeframe in which we’ll be laid off. They come out and they say something very astounding. They said, “We are going to launch a brand new product and we’re gonna launch it in two months. It’s going to be the photo magnets.” At that time, none of the competitors had photo fridge magnets. They said, “You have all the autonomy and ownership. Go figure out how do you execute this and how you want to get this done.”

We didn’t know if this was good news, bad news. We had no idea. We just said, “Ok. Let’s go with excitement. Let’s go focus on this and execute this.” Everyone from product design to engineering were busy figuring out how to get this out to production. We were thinking about, “Should we build a responsive site? How do we overlay the photo on the magnet?” We just went into a full-on execution mode. While the company Kodak was trying to shop us off and trying to sell us off for peanuts, there we were trying to make a decision about how to wrap a thin film around a photo on a magnet to protect the photo. It was an incredible journey and an exciting moment of my career.

We launched exactly on time, a couple of months later. It was a huge success among our customers. They loved it. There was an outpour of messages. They said, “This is the best product ever.” If only we had survived, maybe we would have become a profitable company. A month later, we got the news that Kodak has sold us off to Shutterfly, and the company was going to shut down. We didn’t feel sad. We had actually got on through such an incredible journey that we built a strong bond. We had a strong sense of purpose. We had autonomy and we were all self-motivated. This was my story of how I learned ownership. This definitely was my Kodak moment.

I am Sue Nallapeta. I’m currently head of engineering at Apartment List. When I joined Apartment List, we were a team of 25 engineers. Now, we are about 45 – double the team in about a year. I have worked at startups as well as large companies and across different domains. The most exciting domain that I worked in was my previous job in the dating industry.

Reflecting on my story of ownership, one of the key ingredients to achieving ownership is having a clear goal. What happens in startups is that the distance between the employee – that’s me – and the goal is very short. You can easily connect yourself to the goal. Whereas as companies become bigger and bigger, the distance increases, because there are a lot of people in between and these people need to communicate and trickle down these goals to the larger org. It’s very hard in larger companies for everyone to actually be communicating a goal clearly. A lot of times we go into an all-hands and we have to repeat the same thing over and over, and the fifth or sixth time you repeat it, it really resonates with people. Then, how do you actually bridge the gap between the employee and the goal? I have a story.

Is Remote Hard?

I had a team member who worked with me for two years. Meet Sam. He grew from an individual contributor to a manager, and then a director. He and his team launched several successful products. They were known to be a highly productive team. Sam always communicated the goals very clearly. Whenever I had one-on-ones with people on his team, they said, “There’s nothing that I can tell you that I can’t tell Sam.” That was the culture that Sam had built.

What did he look like? He looked like that. Yes, he was a remote employee. He used to dial in through a Beam robot. It’s already hard enough to communicate clear goals when you’re co-located. It’s even harder to become a successful bridge while being remote. Today, remote has become a necessity because we are in a highly competitive market. We are competing against the FAANGs, we are competing against the unicorns. We are competing against the next big cool thing. Not everybody, and not everything, is considered cool.

Is remote hard? The good news is, it’s only a 55% problem. What do I mean by it’s a 55% problem? There’s a famous psychology professor called Albert Mehrabian. He did a lot of research into what constitutes the perception of a person. He found that interestingly, 7% weight is given to the words that you use, 38% is given to the tone of your voice. How often haven’t you been in a situation where someone appears fine when you have an in-person conversation but they appear rude on Slack or another platform? Tone of voice is important.

The biggest of them all, that carries the most weight is the body language. Body language carries 55% weight. It’s one thing that does not exist in a remote setting because you only get to see a headshot of a person. How do you actually fix this problem? This is an engineering conference, and I’m an engineer. I love to talk about protocols. What better protocol than TCP? It only means trust, culture, and process. You need a combination of trust, you need to build the right culture, and you need to build a process that supports that culture to be able to succeed in scaling remote organizations.

Drake Equation

How do we build trust? It’s very important to have a very clear goal and purpose, like my story at Kodak – we had a very strong purpose, and it was to launch a new product in two months. It’s very important to communicate that goal and purpose. Then, it is very important to provide autonomy to people so they can execute on that goal. Let’s talk about goal, and how we can articulate the goal, and how can we actually reduce the distance between goal and an employee.

Let’s talk about Drake. Drake was an astronomer and an astrophysicist. He was very excited to work on identifying extraterrestrial life outside of our galaxy, and even within the Milky Way galaxy. Why am I telling you about aliens at an engineering conference? I have a reason. I took a general management course with Harrison Metal, and I found out that you can actually use this equation to articulate the goal clearly to the company. What Frank Drake did was he wanted to communicate the likelihood of finding such a civilization to the other members at SETI. He broke the problem down and found that this was a good way to start the discussion.

If I have to break down a Drake equation for my company at Kodak, I would break it down this way. What makes up the company’s revenue? Number of new users, times those who registered, times number of photos uploaded, etc. You have an equation at the company level, and then the magic happens when you break it down into OKRs – objectives and key results – for each part of the equation, and hand those off to each and every team at the company. This way, each and every team can relate to the larger vision and the larger equation, and understand the part they play in making that equation come alive.

We did a similar exercise at Apartment List. We just launched our Drake equation. We are going to start using it in our quarterly planning moving forward to plan out our goals. There are other problems that companies normally face. One, I think people don’t know how to connect their work to the larger goal. It’s also important that people understand how their goals interact with those of the rest of the teams in the org, because when you increase one thing, you might actually be decreasing something else. How do you keep that larger picture in your head while focusing on executing your immediate goal? This technique definitely helps.

Communication

We talked about communication being a very important factor. It is. It’s very important to get teams to not have emotional conflicts about personalities, and instead have decision-based conflicts – I disagree with a certain decision and I want to have a conversation about why. How do you get teams to do that? One thing that we do at Apartment List is, as we started scaling and adding more remote engineers, we started using this app called Donut. It is a Slack app. It pairs two people every week, or on whatever cadence you set. These two people either grab coffee or they get into a conference room and chat. This really helps, because you need to connect with every individual on a personal level to be able to understand them and let them be vulnerable.

The second thing is, the medium of communication is also very important. Some people love to communicate through visual representations, like on a board, like drawing big diagrams. Some people prefer words and explaining their architecture in the form of words. Some people are just feelers, they go with the gut feeling. These are the people who are huggers. How do you actually tailor your communication to resonate with different types of audiences is also very important as you start to think about building trust. One size does not fit all and you need to have a tailored approach to each of the type of people you have.

Autonomy

Next is the autonomy. How do you actually give people autonomy to execute towards the goal? One thing we did at Apartment List is, we have one day reserved for every two weeks, one day every sprint for engineers to work on an idea that they’re really passionate about. It was somewhat similar to Google’s 20% time but a little bit different. The rule is that the engineers have to work in groups, and the work that they do has to help the company in some form or another. This actually has been a huge success because you’re letting engineers bring something to the table by giving them autonomy and saying, “This is your time. Go and build something and us, in product and design, we will support you to get that to production.” We actually have shipped a couple of features to production with this approach. Some of them have even been metric movers, which has been great.

Trust is both a noun and a verb. Trust exists. I trust you. The more of the verb you do, the more of the noun you get.

Inclusive Environment

Next, how do we build culture? I think the key part of building a culture is to build a very inclusive environment. What does inclusive environment even mean? I used to work for a company that did payments and gift cards, and we had a remote team across a 12-hour time zone difference. I had three engineers here and I had four engineers in India. I wanted to operate this team as one single Scrum team. I brought the whole team together. We started discussing what the goals for the team are going to be, who’s going to be the product manager or the product owner, and what are the different roles each team is expected to provide.

Then we got into the discussion of how we could actually organize the meetings that needed to happen. We were using Scrum. What the team did is, they discussed and decided that if they have meetings either early in the morning or late in the evening, it’s inconvenient for one party or the other, and you cannot have those meetings in the middle of the day because those are off hours for the other side. They decided to compromise. They said, “Two days a week, the U.S. team will have stand-ups at 7:00 AM, in the morning, and two days a week the team will have stand-ups in the evening.” We did the same thing with grooming. We said, one sprint we will have grooming in the morning, and the next sprint we’ll have grooming in the evening. This way we split up the meetings and there was a balance of power. This really helped get everybody’s voice heard and also made people feel that they were on equal ground.

This was a very high-performing team. I worked with this team for about two years, and they really developed respect for one another. It’s not enough to just talk about building an inclusive environment; you have to find creative ways to get people to voice their opinions and ideas, and think about what forums we as leaders are creating to make that happen.

Career Growth

Let’s talk about career growth. This is a topic that’s very close to my heart, because I don’t believe in performance reviews – they feel like a reflection on the past. I do believe that we need to be professionally developing people and looking ahead. At Apartment List, when we started thinking about developing a career ladder, one thing we decided was that it’s very important for the ladder to match our values and what we want from people. We were no longer an engineering team of 25. We were going to be doubling, and the way we operate today was not going to be enough.

We came up with this funnel approach. Think of it like a marketing funnel, where you have more audience at the top and the conversion is lower at the bottom. When engineers are early in their career, they focus on mastery. They’re focused on mastering a particular technology, a particular codebase, or even a product feature. As they become more seasoned, they start to focus on ownership – ownership of a codebase, ownership of a feature. It could be executing on a feature end to end, collaborating with product. As they become even more senior, they should focus not only on their own work but also on that of the team, and ensure the team’s success, so they move towards leadership. Leadership is a very important aspect even among individual contributors, because there is so much that you can do and you have a lot of power in your hands. We focus on things like mentoring other engineers. Feedback is also a key aspect: you need to be able to seek feedback at this level and provide feedback as well. It’s not just the responsibility of the manager. Then we move towards impact. You create an impact not just at the team level, but across the whole company, interfacing with all the other teams.

This split up our career ladder into these levels. We have six levels. Two levels focus on each of these different parts of the funnel. If you can see the colors at each level, they’re always trying to either touch upon the next level or master it. Having this approach really helped us because, one, the engineers were super excited. Two, it defined something that goes beyond just your technical skills. It talks about soft skills and what’s important at each of the levels. We also define anti-patterns, what not to do at each level, not getting your code reviewed and not taking feedback from code reviews is a big no-no at SE1 level. Whereas, a senior engineer, you should be able to build consensus across the team, whether if it’s an architecture decision or a very complex technical problem.

Recognition

Recognition is a very key important aspect as well. There are a lot of companies tackling the problem of recognition. There are companies that are just built around incentivizing employees to recognize one another. I used to actually work for a company that did that, where you hand out points. Every employee is given a set of points and then you just hand it out to people that you really enjoy working with.

One other thing that we did at Apartment List is, our Scrum teams do something called [inaudible 00:20:07]. In the team retrospective, every sprint, they take five minutes to recognize each other for the work that they did. Recently one of the engineers said, “I’m going to [inaudible 00:20:20] Aden because he was very helpful to me in getting this feature completed and he helped me do the good code reviews on time.” It’s very important at every step of the way to recognize people for the work that they do and you’re building their career along the way as opposed to the reflecting in the past.

It’s even more important to sit down and have a conversation with employees as they’re developing and say, “These are the things that you’re doing that are really great and I want you to get to this level. These are some more things that I want you to do and I can help you get there. I’m going to put you in this position so you have more visibility to prove what you can do at each level.”

Hiring

Next, how do we build process? I think, as you start to build remote teams, it’s very important that you look at a strategy across the board in terms of your hiring, in terms of your architecture, and your dependencies. We had to tailor our hiring process to match the needs of a remote team. Not everyone can work remotely. Not everyone is good at expressing themselves and expressing their ideas when there is not a lot of feedback loop. You need people who are very self-motivated. You need people who can actually be ok with silence on the other end and take that initiative for the first time.

Our hiring process included bringing all these candidates into different Slack channels with a group of engineers. We would give them an assignment and basically tell them, “Go ahead and execute this, and you can ask whatever questions you have. You can ask all these engineers on the channel.” When we started this process, we didn’t get any questions, because people were just quiet and awkward. They didn’t know what to do. We were questioning ourselves: “Is this process even going to work? Are we looking for the right things?” Then we started finding people who had a lot of questions, who really collaborated with engineers, who started having architectural discussions with them. That proved that we really need to hire people who can work remotely well, and that’s very hard to find.

We also did another thing in our architecture interviews. We told people, “You’ll be tested on architecture. You can decide what medium you use to communicate your architectural decisions and choices.” Some people wrote it on a piece of paper and held it up to the camera. Some people just used words and described it really well. People actually brought up the camera to a whiteboard that they had and then they started drawing on it. There are different ways of doing this. No one’s way is better than the other, but you need to find people who can engage on their own.

Architecture

The second thing is about architecture. Imagine having a monolith and having a distributed team across the globe, and expecting them to deliver on that monolith. It’s going to be a nightmare. We need to evolve our architecture as our team structure changes as well because you need to make people successful. The way that you can make people successful is like providing them with ownership and autonomy. That’s not going to work if you don’t have a scalable architecture to support it. That’s another key aspect.

Dependencies

Then managing dependencies. When I joined the company, we were functionally operating teams. We had a web team, we had a native team, we had an API team because that is what was needed at that stage of the company, because we were just building out these platforms from scratch. As we started growing, we really needed to structure ourselves in the form of cross-functional teams. We started structuring into product-based teams. Now, that’s not enough either because we are working on a lot more strategic initiatives and going after new areas of business. Now, we are moving towards mission-driven teams. I think the point there is, the team structure will evolve as the company evolves. You implement the process, don’t get stuck to it. Just know that you have to change it in six months or one year. I think knowing to reset yourself constantly will really help guide your progression.

I get a lot of questions about, do you go remote-friendly? Do you go remote first? What culture works best? Just for everyone in the room, remote only is where all your employees are distributed. There is no headquarters and you just all operate in that fashion. Remote first is, you could have employees in headquarters and you might have some remote engineers, but you operate with the mindset that it doesn’t matter where I am, but I’m going to use the same processes as I would when I’m remote. Then remote-friendly is, you primarily work out of headquarters. You start adding few engineers or a few employees, and then you start to figure out how to bring them in and include them into your process.

Definitely, there’s an argument that remote-first is a better culture than remote-friendly, but here’s a food for thought. If you just had two employees who are remote, do you actually want to go through and implement a whole bunch of processes to accommodate two people? Is there another way to actually build the right level of processes for the team that they’re part of so that they can feel included?

According to me, this is a cycle. Because companies keep changing and evolving, there could be an M&A that happens. We could be merging with another company and acquiring a new line of business. We might just be creating a new office. It doesn’t matter what it is; it’s a cycle where you will end up in one of these scenarios at some point in the life cycle of the company. You just have to remember to always reset and adapt to the new processes. Trust, culture, and process will guide you through that journey.

I was reading a book by Annie Duke called “Thinking in bets.” One of the things that she talks about is that facts are not final, facts evolve over a period of time. If I asked my 91-year-old grandmother six years ago how many planets there were in the solar system, she would say nine. Now, if I asked the same question to my four-year-old, she’ll say eight. Both of them are not wrong. It’s just important to know that the facts will change over a period of time. Then, if facts themselves can change, why can’t companies change? Why can’t people and their mindset change when needed?

Facts evolve over time, and problems change over time. We have to remember to reset often and reset our mindset. Otherwise, we will not be able to scale to the level of the company. Going back to my first story about ownership: ownership is about taking the initiative and also seeing it through. It’s very important to be that initiator. A lot of people in this room are ICs and managers. Some of you are on the initiator side and some of you are on the executor side. Regardless of where you are, you need to figure out, “I need goals. I need a purpose. I need autonomy.” Those will bring self-motivation. That’s how I can operate.

Questions and Answers

Participant 1: I wanted to emphasize the recognition part that you mentioned. For example, one team is in headquarters and the others are remote. From my experience, it’s a challenge for remote teams to have access, because most of the top managers and senior engineers are in headquarters. For the people on the remote side, it’s harder to access them, and it’s going to be harder for them to be promoted or to get more challenging projects. In headquarters, people can meet over coffee or something, they can talk. They have easy access to the top managers and senior engineers. How can you actually mitigate this issue, or is there a way to handle these kinds of scenarios?

Nallapeta: When you have people who are in headquarters and remote, I think us as managers, it’s our responsibility to make sure that you create the right team structure for people to function really well. You mentioned, how do we even recognize people who are remote? What if there is a conversation that happens by the water cooler that ends up resulting in a project? That happens more often than you think. You have to expect it to happen, and figure out what’s the best way that you can push other people towards those opportunities.

One thing that I have done that has worked out for me is, when these water cooler conversations happen, and if someone’s suggesting, “Can I get this engineer to work on this? This person did a great job and I really want this person to work on my next project,” I sometimes question back and I say “Tell me what outcome you want and tell me what the goal of the project is and what type of seniority do you need, and I will figure out who will be on as part of your team.” I think you need to actually figure out how to push your team towards opportunities because opportunities will not push themselves towards your team.

You’ll have to find a way to blend your team into the process. A lot of times, people are not visible because they are remote, and then they become an afterthought for a lot of people. You just have to find a way to engage them. We use Slack. One of the things that we do is have a shout-outs channel, so we are constantly giving shout-outs to people who did something incredible. They’re always on people’s minds: “This person actually did a great job, so maybe I want to talk to this person about how to do this.” Once you start doing that, you will shift the culture to actually start thinking about remote employees as well.

Participant 2: This is a question about the other side of that spectrum. At the beginning of the talk, you mentioned Sam, who was able to rise from the IC to the director and a couple of years. Are there specific strategies that were effective for Sam, especially as a manager and a director to work remotely, and be effective?

Nallapeta: In that situation, Sam was the only manager in the company who worked remotely. There was one other engineer who was remote, but we only had two people who were remote. Some of the things that Sam did go back to my three techniques in the protocol trust, culture, and process. He did it in a way that was very implicit. He would actually call people, use 15 minutes of his time and call one member of his team every day of the week, just to get to know them and figure out what are some of the struggles that they have, just to establish that relationship. Every day, maybe during commute time or in the evening when they have some time, he would always try to establish that relationship. That was one thing that he did.

The second thing he also did was any communication from leadership, he always found a way to translate that back to his team. If there were things that were sensitive, he would ask the leaders, “I want to be able to communicate this in some form or another to my team because they need to know and they need to be prepared for it as well. How can you enable me as a leader to communicate this?” Then people actually start asking those questions and try to push for transparency. You will really start thinking about, “How can we be more transparent across the board?” That was another thing that he was really good at doing, taking what he heard from the top and translating it all the way to the bottom. I think that goes back to the example that I gave. Anytime I had a skip level one-on-one with the members of his team, the response I always got was, “There’s nothing I have to tell you that I can’t already discuss with Sam.” I think that’s a validation for his capabilities to provide that forum.

Participant 3: We had a team where we only have one remote. Any guidelines on knowing whether that team member wasn’t cut out for working remote versus that our team wasn’t helping him out or accepting him? Do you have any strategies for drawing that line?

Nallapeta: Let me repeat your question. Is your question, one person was remote and the rest of the team was co-located, how did they include that one person to be part of the team?

Participant 3: More like it didn’t end up working out and he went away. I’m wondering whether it was us, that we didn’t help him. He just didn’t seem to go out and grab our attention. Then again, we didn’t go out of our way to include him either. Any strategies to know where that line is – either this person isn’t cut out for it, or it’s the team’s deficit?

Nallapeta: I think it’s a bit of both. The person can only reach out and connect to what they can see and witness. If there is a Slack conversation or an email, then the person at least knows it is happening and can engage. Otherwise, it is the team’s responsibility to engage that person. It’s even more important for the manager to make that person feel included. Let’s say the team decides they want to start talking about some architecture problem. They just go into a room and start whiteboarding. The first thing that they think of is, “Sam is remote and I need to include him in this discussion.” They reel him into the conversation, because otherwise nobody will know what happened in that conversation, or that it even happened. Now, if a conversation does occur where people just turn to each other, we need to have a process where people document decisions somewhere so the larger team can see them. I think this goes back to building the right kind of documentation, at least trickling information down the right way to people. It could even be in your sprint demos that you talk about certain decisions that were made during the sprint – architecture decisions, product decisions. At least the whole team come

See more presentations with transcripts



Presentation: The Common Pitfalls of Cloud Native Software Supply Chains

MMS Founder
MMS Daniel Shapira

Article originally posted on InfoQ. Visit InfoQ

Transcript

Shapira: They’re debugging it, building it with various tools, and then submitting it to some kind of a version control system. After that, our CI is pulling that code from the version control system, performing various tests on it, compiling it, and basically providing some kind of binary artifact. Those binary artifacts are later pushed over to the universal package managers where they are stored, and these universal package managers are basically serving as one single source of truth for our binary packages. After that, they are pushed to the publishing infrastructure, which is Docker, Kubernetes, OpenShift, or the public cloud that you know, AWS, GCP, and so on.

We’ll cover the problems that I’ve encountered with these components, but before that, let me talk about supply chains in general, and specifically about the supply chain of a whole nation.

Supply Chains

I’d like to introduce you to this fellow. His name is Vladimir Vetrov, and he was an electrical engineer in the early ’50s for the U.S.S.R. In the early ’60s, he became a KGB agent, and in ’65, he was sent to France with one mission on his mind. He was sent to recruit various agents who were already deployed in R&D centers throughout the NATO countries. He didn’t have any specific action that they would have to perform; he just wanted to recruit as many agents as he possibly could. He spent five years in France, and then he was sent back to the U.S.S.R., where he got promoted. Now, he is the commander of a new initiative, which is called Line X. Line X is a new initiative of the Soviet Union whose sole mission was to steal information and R&D projects from the NATO countries. They actually used those agents that he recruited to build a supply chain that provided them with artifacts from the NATO countries.

After he is promoted, he is sent to Canada. In Canada, Canadian intelligence is very quick to pick up on him, and they uncover that he is, in fact, a KGB agent. He cannot perform his duties anymore, he is burnt, and he is sent back to the U.S.S.R. against his will, because he loves the Western way of life. He loves the money, he loves the blue jeans, and he doesn't really want to go back. He gets mad at the KGB, because he wants to stay. Nevertheless, they don't just pull him back, they also demote him, because now he is burnt; everybody knows about him. They send him to some remote village around Moscow, and so he begins to drink. He drinks, and he spends five years brewing his anger against the KGB. Then he decides to execute a plan and take revenge upon them.

He turns to his French friends, and he exposes the names of the Line X agents. French intelligence quickly shares this information with the CIA. The CIA's first thought was to simply arrest these agents. Then they decided to think about it a little more and play a bigger game. What they did was filter out which agents were responsible for stealing the most important information, the most important technology: for instance, technology developed for the automation of utilities, such as gas pipelines or electricity, and space programs. They selected these few agents, and they planted buggy software for those agents to steal.

The Russians – or the Soviets – stole these infected artifacts and then implemented them in their products. This led to one of the most spectacular explosions that ever happened on Soviet territory, in 1982: the trans-Siberian gas pipeline project basically exploded. The pressure in the pipes rose to levels they had no way to manage, and it led to an explosion. To summarize what actually happened here and how this explosion occurred: the Soviets deployed agents throughout the NATO countries who actively infiltrated those countries' supply chains. They stole their artifacts and built their products with them, and because the artifacts were buggy, the products later broke.

Common Problems of the Modern Software Supply Chain

Now, I will cover the most common problems of the modern software supply chain, and we'll see that the issues that led to this gas explosion are not unique to supply chains in general; they are also present in modern software supply chains. As I said before, we begin our journey at the source code, and at the source code stage we basically have two components: our developers, who actually write the code, and the various tools that are used to compile the code and the version control systems that we commit the code to.

When we talk about version control and build tools, we basically talk about a few tools that are considered mature and safe today, because they have already passed their testing phases and have been covered by a lot of security researchers who uncovered the vulnerabilities in them. It's not to say that they are completely safe – there is no tool that is completely safe – but they are considered mature and relatively safer than other tools. That's true only as long as they are actually authentic. If I take some tool that you're using and modify it in some way, maybe I take your IDE and modify it so that it injects malicious code into each and every project you compile with it, then there is nothing you can do about it. The tool is not authentic anymore, and basically, I will infect any product that you compile with it.

Now, we'll move to the developers; the devs are basically doing three main things. They write the code, they debug it and build it through the various tools, and then they commit it into our version control system. When we use our build tools, as I said before, we need to make sure that they're actually authentic. For example, there is a very well-known incident that occurred in 2015 that involved fake tools. Specifically, it was a fake distribution of Xcode, the IDE for OS X and iOS. It was called XcodeGhost, and it alone led to an infection of over 600 million users, because it spread and big companies got this tool into their toolkits. One of these companies was the company behind WeChat, and WeChat alone had around 500 million users. They compiled WeChat with that tool, so every user of WeChat was infected through this incident.

How can we actually battle this situation and solve it? The first thing we can do is utilize some kind of single source of truth. We have our universal package managers, whatever kind of repository you use today. These repositories can store not only your binary artifacts but also your tools: your installers, the tools that you want your developers to use. You can create a repository of tools for your developers and then supply your developers only with tools from that source, and not allow them to go out over the internet and download possibly malicious tools.

The other thing that you can do is educate the devs to manually verify the things that they download. Today, it's common practice to publish some kind of hash sum on the download page that you can verify when you download the package. This is provided because things can be modified in transit, even as you download them, so it's very important to verify that what you have downloaded is what you expected to download. Last but not least, if you are creating this kind of repository for your tools, you have to make sure that it is completely secure. If you use such a tool to store all of your binary artifacts and your build tools, and it is not secure, it is open to the public, or maybe you grant anonymous access to it, then anybody can go in and modify your tools, and everybody who pulls these tools from the repository will be infected. We will talk about it in a couple of minutes.
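A minimal sketch of that manual verification step, in Python, might look like the following; the expected hash and the file name are hypothetical placeholders for the values published on a real download page.

```python
import hashlib

# Hypothetical checksum copied from the tool's download page.
EXPECTED_SHA256 = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a downloaded file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    actual = sha256_of("xcode-installer.dmg")  # hypothetical downloaded file
    if actual != EXPECTED_SHA256:
        raise SystemExit(f"Checksum mismatch: got {actual}, expected {EXPECTED_SHA256}")
    print("Checksum verified; the download matches the published hash.")
```

The same kind of check can be wired into an internal tool repository so developers never install an unverified binary.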

We move on to the coding stage. There are three main things our developers are involved in here. When we code, we sometimes end up with bugs in our code, and some of these bugs may lead to very serious security issues along the way. To battle that, I propose implementing fuzzing in your continuous integration, continuous testing, and so on.

What exactly is fuzzing? Fuzzing is when you take a program and provide various unexpected, invalid, or, I would say, malicious inputs to it. Then you monitor the program and look for crashes, memory leaks, and any misbehavior. Once you find one, you know there is a possibility that someone from outside will be able to provide a more elaborate input that leads to code execution or some other kind of vulnerability. What is more important for you is that now, in the cloud-native ecosystem, there is the possibility to implement continuous fuzzing.

Continuous fuzzing is when you run fuzz tests continuously and only restart them once you add new code to your codebase. This gives you higher code coverage and the ability to uncover vulnerabilities and other kinds of bugs in your product before they land in my hands, before you release them to the public. This means less work for me, because I will have fewer vulnerabilities to uncover, it will make your customers safer, and eventually, you will be happier.

There are some tools that you can utilize for that. Probably the most famous of them is ClusterFuzz, a Google tool that they use to fuzz almost all of the OSS projects that you are using today. There are others built for specific languages, like go-fuzz, which is used only to fuzz Go code, or american fuzzy lop, which is built to compile C code with it; that compilation adds instrumentation to the code and gathers coverage statistics from the fuzzing process.
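To make the idea concrete, here is a deliberately naive fuzzing loop in Python. The target function is a hypothetical stand-in for your own code, and unlike ClusterFuzz, go-fuzz, or AFL, this sketch is not coverage-guided; it only shows the basic shape of feeding random inputs and watching for crashes.

```python
import random
import string
import traceback

def parse_config(text: str) -> dict:
    """Toy target standing in for the code under test (hypothetical)."""
    result = {}
    for line in text.splitlines():
        key, value = line.split("=", 1)  # crashes on any line without '='
        result[key.strip()] = value.strip()
    return result

def random_input(max_len: int = 64) -> str:
    """Generate a random printable string of random length."""
    alphabet = string.printable
    return "".join(random.choice(alphabet) for _ in range(random.randint(0, max_len)))

def fuzz(iterations: int = 10_000) -> None:
    """Feed random inputs to the target and record the first crash-inducing input."""
    for i in range(iterations):
        data = random_input()
        try:
            parse_config(data)
        except Exception:
            print(f"Crash on iteration {i}; offending input: {data!r}")
            traceback.print_exc()
            break

if __name__ == "__main__":
    fuzz()
```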

We move on to the third stage, which is committing the code to our version control system. Usually, we commit to private repositories, which are secure. They're on-prem, no one can access them, which is good, but sometimes private repositories become public repositories, or sometimes we commit to public repositories directly. In any case, committing secrets to these repositories is a big problem. Why is it a problem? Because once you commit something to a repository with public access, you should consider it compromised.

For example, if you commit a password or a key, you should generate a new key and change the password. Why is that? Because the repository tools are saving the history. They are saving logs about your commits. Even if you’re committing a password and later deleting that commit or swapping it with something else, the history will still contain the secrets that you have committed. Anybody with access to that history will be able to gain these credentials and later on use them to maybe log into your Git account, maybe log into your CI, and so on.

What you actually need to do to prevent that is actively monitor what is committed to your version control systems, and there are various tools that give you the ability to do that. You can monitor whether secrets are being committed: API keys, credentials, anything you want to prevent from being committed. Some of these tools are Git-secrets, Git-Hound, and TruffleHog. All of them can monitor your commits in real time, but it's also important to scan what your history already contains, because your repositories may already contain a lot of secrets. You should actively scan the history you already have to prevent any of those secrets from being leaked.
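A stripped-down sketch of that kind of history scan, assuming Python, a local Git checkout, and a handful of illustrative regex rules (real scanners such as TruffleHog ship far larger rule sets), could look like this:

```python
import re
import subprocess

# A few illustrative patterns; dedicated scanners use far more comprehensive rules.
PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Private key header": re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
    "Password assignment": re.compile(r"password\s*[:=]\s*\S+", re.IGNORECASE),
}

def scan_git_history() -> None:
    """Walk the full commit history ('git log -p --all') and flag lines matching secret patterns."""
    log = subprocess.run(
        ["git", "log", "-p", "--all"],
        capture_output=True, text=True, check=True, errors="replace",
    ).stdout
    for lineno, line in enumerate(log.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                print(f"[{name}] history line {lineno}: {line.strip()[:120]}")

if __name__ == "__main__":
    scan_git_history()
```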

Tools for Continuous Integration

Now, let's assume that we got our source code secured and our version control secured. We move on to our continuous integration tools, which pull the code from version control and use it for various things. You can see some of these tools on the screen right now, and I will cover some of the most common problems with them, and how I found a lot of these instances exposed and a lot of companies in a very dangerous situation.

The first thing is CVEs. For those of you who don't know what a CVE is, it's an ID for a known vulnerability. For instance, once I uncover a vulnerability in one of your products, I write an email to an organization called MITRE, which is funded by the U.S. Department of Homeland Security and provides a database of known vulnerabilities. Once I report it to them, they give me back a CVE ID, which now identifies this vulnerability. Anybody with this ID can go online, put it into Google or any other search engine, and find exploits for it, find very detailed technical analysis of the problem, and then use that data to go further and exploit instances that are still vulnerable to this vulnerability.

These tools have over 300 known vulnerabilities as of today. Almost 60% of those vulnerabilities were uncovered this year alone. You saw the tools I showed you; you know for a fact that they were not released yesterday, and you know for a fact that they were not released in 2018. That tells you two things. One is that these tools are only now gaining traction with security researchers; we were not interested in these tools before, which is why you see vulnerabilities in them right now. Those vulnerabilities were there before, and someone with knowledge of them could have hacked you at any moment. The other thing that I saw personally, which is very worrying for me, is that people and companies tend not to have a good upgrade process, which leads them to expose their instances with various known vulnerabilities.

A CVE is your cue to upgrade your product. I know that sometimes when you upgrade a product, it may break something, maybe something will not work anymore, but I urge you, you should do it. If you leave your instances outdated, you're basically opening your door and letting anybody walk in.

The other thing that I saw in these tools is secrets. I mentioned that your secrets may leak through your version control system, but if they are not leaking through version control, they might leak through your CI. The CI tools work in the same manner as the version control systems: they also collect logs about your build processes and about what credentials you are using, so anybody with access to your build history will be able to uncover the credentials that were used throughout your CI processes. We saw numerous instances that provided anonymous access to the system's logs, from which you could pick out these credentials and basically log into other components of the software supply chain.

Specifically on Jenkins: it's much more secure now than it was two years ago, but still, anybody with access to create a job on Jenkins will be able to uncover all of the secrets that Jenkins stores. You should really think about what access you're granting to which users, because we saw a lot of instances with users granted over-permissive permissions. In some cases, personnel who really didn't need the access got access to build or read things that they were not supposed to.

The last thing that we saw is misconfigurations throughout these tools, and we saw tens of thousands of them across the internet. The most common problem that I saw is anonymous access. Most of these tools provide anonymous access by default, and you have to actively disable it. What happens most of the time is that people forget to disable it, and they leave it out in the open. Maybe they think that their IP address is secret information or something like that, but let me just emphasize how little protection an obscure IP address gives you.

Let's say, for instance, that you are spinning up some instance in the cloud to test your product, and you're giving it maybe five hours of lifetime. You're thinking, “In these five hours, nobody is going to know this IP address, nobody is going to hack me.” In reality, hackers are not looking for you; they are not looking for your IP address. What they do is pick a product, for example, Jenkins. They study this product, and they find out which port it listens on. They understand what vulnerabilities the product contains, and then they scan the whole internet for the product. They look through the whole IPv4 range for this specific port that responds with the specific header that identifies Jenkins, for example. If they are scanning the internet in that five-hour window, they will hit your IP address and they will hack your instance in less than 10 minutes, because everything is automated today.

The other thing is that instances are, in fact, publicly exposed. If you really don't need an instance to be publicly accessible, block it. Hide it behind a VPN. Don't let unknown users access it. There is no need for that; you are just opening another door for people to attack you. As I said before, over-permissive privileges are given out. You should really consider what you are giving to your users. If a user really doesn't need something, then take it away from them.

Another huge problem in these products is that authentication is not enforced across all components of the product. When we talk about these products, it's not just the web UI that the product has. Sometimes there is an API, sometimes there is more than one API. Sometimes they expose four different ports with four different APIs, and they provide authentication only for their web interface. You should actively check whether any product you use is listening on anything other than the web interface, and whether authentication is, in fact, implemented on those interfaces, because a lot of them do not do that, and it gives you a false sense of security.
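One simple way to start that audit is to check which ports on a host actually accept connections, and then verify that each open one enforces authentication. A minimal Python sketch, with a hypothetical host name and port list, might look like this:

```python
import socket

# Hypothetical host and ports to audit; replace with the ports each component
# of your product is documented to listen on.
HOST = "ci.example.internal"
PORTS_TO_CHECK = [80, 443, 8080, 8443, 50000]

def is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for port in PORTS_TO_CHECK:
        state = "OPEN" if is_open(HOST, port) else "closed"
        print(f"{HOST}:{port} is {state}")
```

An open port is not automatically a problem; the point is to know it exists and to confirm it does not accept unauthenticated requests.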

All you have to do is take a few steps. Hide your instance behind a VPN if you don't really need public access. Regularly update and upgrade your systems to eliminate the danger of CVEs; then you'll only be exposed to unknown vulnerabilities. That's not to say this makes you perfectly secure, but the people with knowledge of unknown vulnerabilities are a very small group, and you will cut something like 90% of your attack surface just by updating your products. Try to avoid using secrets throughout your CI projects and your build systems. What I would advise is pushing the use of secrets to the latest step, which is the runtime. There are tools that can help you with that; I'm not at liberty to discuss which tools, but you can look it up. Limit the access scope. Last but not least, you should expect to be hacked: you should actively monitor your systems and understand that the danger is there.
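As a minimal illustration of deferring secrets to runtime (not one of the tools the speaker alludes to), the sketch below assumes the orchestrator injects the secret into the environment at deploy time; the variable name is hypothetical, and nothing sensitive has to live in the CI configuration or build logs.

```python
import os
import sys

def require_secret(name: str) -> str:
    """Read a secret injected into the runtime environment; fail fast if it is missing."""
    value = os.environ.get(name)
    if not value:
        sys.exit(f"Required secret {name!r} was not injected at runtime.")
    return value

# Hypothetical secret name; the deployment platform, not the CI config, supplies it.
DB_PASSWORD = require_secret("DB_PASSWORD")
```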

Universal Package Managers

We covered our CIs and we secured them, and we move on to our universal package managers. Universal package managers come in various tools: some run on-prem, some run in the cloud as a managed service. I'm concentrating on the tools that run on-prem; I didn't have the opportunity to test the managed services. What I have found in these tools is that authentication is still their Achilles heel. Most of these tools provide dangerously over-permissive default settings, and I mean really dangerous. For example, a lot of these tools come with default accounts, and some of them have hidden default accounts that you are not aware of. You go to the UI of the product, look at the user list, and delete the default accounts. Let's say you have deleted all the default accounts that you can see, but in reality, the product's database still stores an internal user, and someone with that knowledge will be able to log in to your product.

Ask yourself: is there a universal package manager tool that you are using that is publicly accessible today? Yes. Did you get the chance to change the default accounts? No. Maybe you just enabled SSO and thought you were safe, but in reality, that's not true. If this is your situation, you should consider your artifacts to be infected, you should consider your artifacts to be stolen, and you should really check the logs of the system and see whether there has been unknown access to it. This actually happens a lot today; for these tools alone, I found 25,000 instances exposed to the public. That's not to say the users of these products made bad decisions, because a lot of them actually enabled SSO. The products just didn't provide adequate security once you enabled SSO, because you could bypass the SSO if you knew the default accounts and fired direct requests at the products' APIs. In that case, the SSO is no longer providing any security at all.

Publishing Infrastructures

We move on to our publishing infrastructure, which is the final stage that our artifacts end up in. You can see a lot of these products on the screen: Docker, Kubernetes, and so on, and the public clouds that we're using.

In 2017 I published a paper about Docker and escaping Docker containers. It was the first paper that showed it is possible to escape from a Docker container by exploiting a vulnerability in the underlying Linux kernel. From that point on, we uncovered at least a dozen other vulnerabilities that you can use to escape from a Docker container, whether in the Linux kernel or in the Docker engine itself.

In these products, or in these infrastructures, I would like you to think about a few specific points. The first one is who's actually listening on these products, and my example is, again, Docker. If you are familiar with Docker, you are aware that there is a Docker daemon. Today it listens on a UNIX socket, and you execute Docker commands through that socket. Before the UNIX socket, for the first four years of the Docker product's lifetime, it listened on all available interfaces on your system; it would listen on any possible IP that your system provides. In addition to that, it didn't require any kind of authentication on that socket. Anyone who discovered that the Docker daemon was listening on that TCP port (2375, if I'm right) would be able to execute any kind of container in your Docker environment, and I mean any kind, including privileged containers, which can later lead to a full host takeover.
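As a hedged sketch of how you might audit your own hosts for that failure mode, the Python snippet below asks the Docker Engine API for /version over plain HTTP on 2375, the conventional unencrypted Docker TCP port; the IP address is a hypothetical placeholder for one of your own machines.

```python
import json
import urllib.request

# Hypothetical host to audit; 2375 is the conventional unencrypted Docker TCP port.
URL = "http://203.0.113.10:2375/version"

def check_exposed_docker_api(url: str, timeout: float = 3.0) -> None:
    """If the Docker Engine API answers /version without credentials, the daemon is wide open."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
        print(f"UNAUTHENTICATED Docker API exposed, engine version: {info.get('Version')}")
    except OSError:  # urllib.error.URLError is a subclass of OSError
        print(f"No unauthenticated Docker API reachable at {url}")

if __name__ == "__main__":
    check_exposed_docker_api(URL)
```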

The thing is that Docker is secure today from that point of view. They changed the default so that the daemon listens only on the UNIX socket, which is local to the host. Other products are still happily listening on all available interfaces, and some products are composed of various tools or components. You might have a product from this list that is built from 13 other products: for example, maybe there is some kind of interface to execute containers, then an interface to monitor those containers, then another product to provide logging for these systems, and so on. You would expect these products to be secure from every angle. You would expect the logging to be secure. You would expect the execution environment to be secure.

In reality, they provide security only for their main component and forget about all the other components they have throughout the product. Some of these products have a situation where the main window, the main UI, is secure (you can enable SSO, you can close down the API, and so on), but six other components expose an unauthenticated API through which you can go and execute code. What this leads to is a situation where companies take the product, implement it throughout their systems, and think they are secure, because the main component is secure, and that is the component you actually interact with. You manage everything through that component, so you forget about all the other components.

In reality, all the other components are still listening on all available interfaces without authentication. I can actually look for this secure component in order to find your instances. Then, I will just need to know about all the other API ports that are actually exposed without authentication. There are thousands of these instances online. You can actually check that. It’s very easy to find instances like that. Even a seven-year-old would be able to hack these instances, and I’m serious.

The other point is, who's actually talking throughout these environments? By default, when you execute two Docker containers, for example, they can communicate with each other. Sometimes you actually need that; sometimes you have containers communicating with each other, passing messages. If you are doing that, you should make sure that this communication is secure, because if it isn't, you may leak a lot of sensitive information. I might take over one of your containers and then just listen to who it's talking to. Just by listening on the Docker interface, I will get a lot of other information that gives me the ability to gain lateral movement throughout your container environment, maybe break into other containers, and later on take over perhaps the whole Kubernetes cluster. It depends on the information that is leaking through these communications.

Another thing is that metadata throughout these instances is actively leaking sensitive information, and you should check that out. It is very interesting to take a look: for example, spin up an instance in the cloud, just run tcpdump, and see what is going on there. You will see that even if you are not exposing any application, even if you are not executing any application, there will still be a lot of communication going on. This communication leaks information related to your clusters and your instances, and again, it gives me the ability to collect it and later use it to move from one exposed instance that I have hacked throughout your whole cluster. What you should really do is explicitly allow the communications that you need and block everything else. If you don't need it, just block it.

The last thing about these tools is who's actually accessing them. The most common hacking activity in the cloud today is crypto-mining. That is because of the power of the cloud: the hackers are aware that when you spin up an instance, it is running on a cluster that can run maybe 1,000 more instances like it. They'll take over a Kubernetes cluster and spin up a service that spins up 500 instances of a miner. They will do that today instead of hacking for other gains, because it provides very ripe ground to collect money from you without you realizing it.

Another thing that we noticed is that some attackers are more intelligent than just spinning up miners. For example, we had an instance that we used for testing and exposed over the internet, and it caught a miner after four hours of being live. Once we noticed that, we tried to execute various tools and see what would happen. The attacker noticed our activity, probably by analyzing the CPU utilization. Once he noticed it, he dropped the number of instances he used for mining so that we would not pay attention to the CPU utilization. This is very interesting: they are actually thinking about how you look at the system, how you use it, and how you would discover whether you have been hacked.

Another thing about who is accessing is the configuration of the systems. One tricky configuration that we have seen is in AWS, specifically, and it is about authenticated users. There is a group in AWS called Authenticated Users. This group represents any AWS user, not just your organization's users but any authenticated AWS user. If you grant any access to this group, it means that I can now open a new AWS account with no relation to your organization, log into your resources, and use them. You should really pay attention to that. Today it is documented in the AWS docs, but about a year and a half ago it was not documented, and a lot of people used it because the name is suggestive of security. We saw a lot of instances exposed that way; people could just open accounts and log into other organizations' systems.
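A hedged sketch of how you might look for that misconfiguration on S3, assuming boto3 with credentials already configured, is below; it flags any bucket whose ACL grants something to the global AuthenticatedUsers group (the URI shown is the one AWS uses for that group).

```python
import boto3

# The URI AWS uses for the "any authenticated AWS user" group.
AUTHENTICATED_USERS_URI = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"

def find_buckets_open_to_any_aws_account() -> None:
    """Flag S3 buckets whose ACL grants anything to the global AuthenticatedUsers group."""
    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        acl = s3.get_bucket_acl(Bucket=name)
        for grant in acl["Grants"]:
            grantee = grant.get("Grantee", {})
            if grantee.get("URI") == AUTHENTICATED_USERS_URI:
                print(f"{name}: grants {grant['Permission']} to ANY authenticated AWS account")

if __name__ == "__main__":
    find_buckets_open_to_any_aws_account()
```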

What you should do to avoid these situations is actively monitor your clusters and your execution environments, and see what is getting executed and what services are running. Most of the time, you will see a new service created for your cluster that spins up the miner instances. Most of the time they will use a service, as opposed to executing a single container, because a single container is not much for mining bitcoin, but 500 containers, that's a nice number.

The cloud providers give you the ability to monitor CPU utilization, so you can set up alerts on it. If you are sure that your product never utilizes more than, let's say, 40% of the CPU, then set up an alert that fires once the CPU rises to 60% or 70%. Alerting on anything unusual for your environment gives you the ability to stop an attack from going further, maybe at the right time, before it actually uses your resources to make money off you.
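On AWS, for example, such an alert could be created with a boto3 call along the lines of the sketch below; the instance ID, threshold, and the commented-out SNS topic are hypothetical, and other clouds expose equivalent alerting APIs.

```python
import boto3

def create_cpu_alarm(instance_id: str, threshold: float = 70.0) -> None:
    """Alarm when average CPU on one EC2 instance exceeds the threshold for 10 minutes."""
    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        AlarmName=f"unexpected-cpu-{instance_id}",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistic="Average",
        Period=300,                # 5-minute datapoints
        EvaluationPeriods=2,       # two consecutive breaches, i.e. 10 minutes
        Threshold=threshold,
        ComparisonOperator="GreaterThanThreshold",
        # AlarmActions=["arn:aws:sns:..."],  # hypothetical SNS topic for notifications
    )

if __name__ == "__main__":
    create_cpu_alarm("i-0123456789abcdef0")  # hypothetical instance ID
```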

What to Ask Yourself

We have covered the components of the software supply chain, and I would like you to leave this talk with a few questions on your mind. The first question is: can anyone access anything, and I mean anything? You should know what “anything” is, because some of you are probably not aware of which components are actually employed in the tools you're using. First of all, check which interfaces are actually exposed, and then check whether anyone can access them without authentication at all. We saw too many instances that were exposed that way, without any authentication, without anything that would provide any kind of security.

Another thing is that you should check what permissions you are granting to your users, your testers, and your developers. Use the least-privilege principle: if someone does not need a specific privilege, do not grant it. There is no need for that. I would also advise categorizing your privileges into groups: for example, one set of privileges for devs, another for QA personnel.

You should check out if there is any sensitive information in your commit history, build history, or any kind of a log system that you’re exposing. I have talked about version control systems and CI tools, but there are other components that may be exposing log information and any other information that may be sensitive. You should actually scan this and look for this stuff.

You should check how and where your secrets are stored, because we have seen a lot of instances of secrets being stored in cleartext, or maybe in Base64 encoding, which is not secure. You should really try to utilize some kind of tool, such as Vault or maybe Google's secret management service, or any kind of tool that gives you a secure environment to manage these secrets. Do not manage them by yourself, because you will probably do it the wrong way.
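To underline why Base64 is not protection, the short Python example below decodes a "hidden" value in a single call; the stored string is a made-up example.

```python
import base64

# A secret "protected" only by Base64 encoding, as often found in config files.
stored = "cGFzc3dvcmQxMjM="

# Anyone who can read the file can recover the plaintext in one call.
print(base64.b64decode(stored).decode("utf-8"))  # prints: password123
```

Encoding is reversible by design; only a proper secrets manager, or real encryption with access control, actually protects the value.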

Last but not least: is there any internet-accessible component anywhere in your infrastructure? You should really go and check that, because this is the most common problem today. I see some badges over here, and I can tell you that some of the badges, some of the companies, are names that I have encountered throughout my research, so I would advise you to look into that.

Questions and Answers

Participant 1: I have a question regarding the fuzzing. [inaudible 00:42:46] Will it be on the integration side or will it be on the unit testing side?

Shapira: On the testing side.

Participant 1: When we accept our build, when our build passes all the checks and our flow steps, and then when we go to the integration level, when we are accepting our product – the fuzzing, is it right that we have to introduce it there, or at level one, where we are doing our unit testing?

Shapira: I would introduce the fuzzing where you do the unit testing, or right after the unit testing that you perform. That gives you a kind of unified methodology rather than pushing the testing into different phases.

Participant 1: My next question relates to CircleCI. When we introduce session tokens for our builds, is that secure?

Shapira: Do they expire?

Participant 1: They do expire, but we have to renew them every time they expire. We do have jobs running which can tell us whether a token has expired or not, but within our internal company resources, when we are building and releasing our product, we do have session tokens in our CircleCI steps, and then we are encrypting them at our own product level. Is this the right way to do it?

Shapira: It's a little bit of a tricky question. Once you use tokens, you still need to make sure that those tokens are secure. The thing with tokens is that if you give a token a very short expiry date, you get a much safer token. For example, I talked about exposing secrets in log systems and things like that; you could be exposing tokens through those logs, but the timing of the exposure is very sensitive in that case. Maybe I will need to actively sit on your system and monitor it until you execute a build, let's say, and the token actually gets used. Then I will need to rush to use that token to hack you. If you give the token a lifetime of less than, let's say, five minutes, I will not have enough time to utilize it effectively.
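As one possible illustration of that advice, the sketch below uses the PyJWT library (an assumption; the speaker did not name a tool) to mint a signed token whose exp claim makes it useless a few minutes after the build that needed it.

```python
import datetime
import jwt  # PyJWT

SIGNING_KEY = "replace-with-a-real-secret"  # hypothetical key; never hardcode in practice

def issue_build_token(subject: str, ttl_minutes: int = 5) -> str:
    """Issue a token that expires shortly after the build that needs it."""
    now = datetime.datetime.now(datetime.timezone.utc)
    claims = {
        "sub": subject,
        "iat": now,
        "exp": now + datetime.timedelta(minutes=ttl_minutes),
    }
    return jwt.encode(claims, SIGNING_KEY, algorithm="HS256")

def verify_build_token(token: str) -> dict:
    """Validation fails automatically once 'exp' has passed."""
    return jwt.decode(token, SIGNING_KEY, algorithms=["HS256"])
```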

Participant 1: TTL really matters here.

Shapira: Yes.

Participant 1: Got it. My last question is, when we are setting up our instances, is a security group the right option for everything? When you mention the authenticated group, I know it's a default pool and people use it very often. When you go to the enterprise level, we care about our security groups. Are security groups the current solution for making things secure?

Shapira: Yes.

Participant 1: There is no other way, as of now, in which we can secure our instances?

Shapira: There are other ways; you can actually go and be more explicit about it, I think.

Participant 1: I'm not going cloud-specific, I'm just talking about multi-cloud.

Shapira: Yes, I understand. You could be more specific with your configurations and not just depend on the defaults that are provided, whether they are secure or not.

Participant 2: One of the challenges that we have in our company is transitive dependencies. Developers just like seeing [inaudible 00:46:22], seeing an npm package, whatever package, and that package comes with other dependencies. All those things get introduced into the supply chain. Would you have a word of advice for that, or some sort of guidance?

Shapira: Sure. I didn’t cover package vulnerability management, because I felt like this topic has been covered a lot already. To answer your question, there are various tools that are able to actually scan the packages that you’re using, both free tools and paid. There are various options that you could utilize for that.

Participant 2: We do use those tools. You talked about upgrade [inaudible 00:47:08]. It’s not that easy.

Shapira: Yes, but it is out of your control in that case. All you can do is hope for the maintainers to upgrade and fix it for you in the end. What we do in that case is actively talk with the maintainers and provide them with guidelines on how to fix the problems, and so on. There are cases where you're not able to do anything about it. What I can tell you is that vulnerabilities in the packages you are utilizing, most of the time, will not lead to your product actually being vulnerable. Most of the time, these vulnerabilities depend heavily on how you actually use the package, and we saw throughout our research that, most of the time, the usage is OK.

See more presentations with transcripts
