Wavell Watson

Article originally posted on InfoQ.
Key Takeaways
- Cloud native networks are not SDN reborn but a fundamentally different way to look at networks.
- While SDN effectively took physical network machines and virtualized them, CNFs are not merely containerized network virtual machines; one difference is the need to split network functions into services.
- CNFs are network functionality that lives somewhere on the stack of the OSI model (and the lower in the stack, the more difficult the implementation seems to be), implemented using cloud native practices.
- While SDN dataplanes (here we are talking about what forwards packets) resided on a hardware ASIC or in a virtualized box with traditional kernel network forwarding, CNFs explore user-space (user plane) forwarding or the newer eBPF datapath forwarding.
- In the cloud native data center, there is a bias toward layer 3 solutions, but a big driver for CNFs is telecom service providers, which often drop down to layer 2 functionality.
Of the three types of cloud resources (compute, storage, and network), the network seems to be the most resistant to cloud native non-functional requirements. Compute elasticity, for instance, is reasonably well served by virtual machines, containers, and orchestrators, and is managed with CI/CD pipelines. Network elasticity, by contrast, is still lacking in implementation. In this article, we show that cloud native network functions are an attempt to bring network applications into the cloud native world. But just what are CNFs exactly, and why are they important?
SDN Reborn? Haven’t we tried this before?
Software-defined networks (SDN) were and are an attempt to automate the provisioning of networks. Cloud native network functions are not SDN reborn but a fundamentally different way to look at network provisioning. In one sense, cloud native network functions are like SDN in that they are software-based rather than hardware-based solutions. But cloud native networks have an entirely new set of non-functional requirements separate from SDN. Cloud native non-functional requirements prioritize elasticity, and by extension automation1, far more than SDN does. The implementation of this requirement leans on declarative configuration. In other words, cloud native configuration should say "what" it wants done, not "how" it wants it done. For example, one implication of declarative configuration for networks is the prohibition of hard-coded IP addresses. Declarative configuration allows the whole system to be self-healing2 because it makes it easier to read and respond to what the system should look like. The system can then be made to continuously correct itself. Other non-functional requirements of cloud native systems are resilience and availability, but implemented with scale-out redundancy instead of scale-up techniques. Cloud native systems address reliability by giving subcomponents higher availability through greater serviceability and redundancy. For example, in the cloud native world, a top-level component with multiple redundant subcomponents, where most are available even though a few have failed, is more reliable than a single tightly coupled but "highly reliable" component3.
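To make the declarative, self-healing idea concrete, here is a minimal sketch in Go of a reconciliation loop, the general pattern behind self-correcting cloud native systems: read the declared ("what") state, observe the actual state, and converge the two. The `DesiredState`, `observe`, and `correct` names are hypothetical placeholders for illustration, not any real orchestrator API.

```go
package main

import (
	"fmt"
	"time"
)

// DesiredState is a hypothetical declarative spec: the "what" we want,
// e.g., three replicas of a packet-processing service, with no
// hard-coded IP addresses.
type DesiredState struct {
	Replicas int
}

// ActualState is what the running system currently looks like.
type ActualState struct {
	Replicas int
}

// observe stands in for querying the orchestrator for current state.
func observe() ActualState {
	return ActualState{Replicas: 2} // pretend one replica has failed
}

// correct nudges the actual state toward the desired state.
func correct(desired DesiredState, actual ActualState) {
	if actual.Replicas < desired.Replicas {
		fmt.Printf("scaling out: %d -> %d replicas\n", actual.Replicas, desired.Replicas)
	}
}

func main() {
	desired := DesiredState{Replicas: 3} // read from declarative config

	// The reconciliation loop: continuously compare and converge.
	// Bounded here so the example terminates.
	for i := 0; i < 3; i++ {
		if actual := observe(); actual.Replicas != desired.Replicas {
			correct(desired, actual)
		}
		time.Sleep(100 * time.Millisecond)
	}
}
```

Because the loop only compares declared intent against observed reality, the same configuration can be applied repeatedly and the system keeps correcting itself, which is what makes the configuration declarative rather than imperative.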
Beyond Virtualized Network Boxes
There is a sense in which a "network function" has traditionally not been decoupled. Virtual network functions (VNFs) started as the virtualization of network hardware. VNFs had a one-to-one correspondence of hardware to virtualized hardware, down to the network card, application-specific integrated circuit (ASIC), or a whole switch. While SDN effectively took physical network machines and virtualized them, CNFs are not merely containerized network virtual machines. CNFs are about decoupling network functionality even further. CNFs group networking functionality into components that have similar rates of change, based on the release cycle of an agile product team, moving away from the long release cycles of big companies. Software that is released by a product team4 could be thought of as a "thick" definition of a microservice. A "thin" definition of a microservice would be software delivered as a single process type5 inside of a container. In practice, when software is developed by a product team, the thick microservices often end up looking like thin microservices.
Orchestrators have emerged to help manage microservices. Orchestrators are in charge of the scheduling, starting, stopping, and monitoring (the lifecycle) of the microservice. There are many orchestrators, with Kubernetes (K8s) being the most popular, but there are also domain-specific orchestrators, such as those in the telecommunications domain. One of the early commitments of the cloud native ecosystem was to keep the orchestrator, K8s, from being "balkanized." This was done by having an official K8s certification, maintained by the CNCF, which makes sure that any forked version of K8s supports the APIs and best practices the community mandates.
What exactly is a Cloud Native Network Function?
A cloud native network function is functionality that lives somewhere on the OSI6 stack and has been brought down the cloud native trail map. The lower down the stack the CNF is, the more difficult a good cloud native implementation seems to be. This may be because the networking needs to be integrated with the orchestrator and underlying host while retaining its cloud native properties. It may also be because moving existing network functionality, such as the forwarding plane, from a shared-memory/threading model into a shared-nothing process model7 reduces performance when not done carefully.
To understand the impact of decoupling network functionality, it helps to know a little bit about the reasoning behind network layers. The development of the OSI layers allowed network innovation to occur while keeping interoperability between layers up and down the stack. At the network layer, the IP protocol ended up being a big winner. At the data link layer, ARP emerged. Multiple vendors iterate at the protocol level within each layer, creating new protocols and new implementations of protocols. Cloud native network functions can be implemented as a protocol within a library, as a microservice, or even as a group of microservices that make up a network application.
Ed Warnicke of the Network Service Mesh project once stated that for network services the "packet *is* the payload." This means that network applications or services actually operate on (transform, route, or analyze) the network packet or frame; a minimal parsing sketch after the list below illustrates what that looks like. Here are some examples of network functionality at the various layers of the OSI model:
- Layer 7: CoreDNS
- Layer 6: NFF packet inspector
- Layer 5: Rsocket
- Layers 4 and 3: Envoy/Network Service Mesh/Various CNI plugins
- Layer 2: VPP-based VSwitch
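Because the packet is the payload, even a minimal CNF works directly on frames and headers rather than on application data. The Go sketch below, a hypothetical illustration rather than code from any of the projects above, parses the Ethernet and IPv4 headers of a raw frame, the kind of operation a layer 2 or layer 3 function performs before deciding how to transform or route the packet.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// parseFrame pulls the destination/source MACs and, for IPv4 frames,
// the source/destination IP addresses out of a raw Ethernet frame.
func parseFrame(frame []byte) {
	if len(frame) < 14 {
		fmt.Println("frame too short for an Ethernet header")
		return
	}
	dstMAC := net.HardwareAddr(frame[0:6])
	srcMAC := net.HardwareAddr(frame[6:12])
	etherType := binary.BigEndian.Uint16(frame[12:14])
	fmt.Printf("L2: %s -> %s (EtherType 0x%04x)\n", srcMAC, dstMAC, etherType)

	// 0x0800 is IPv4; a layer 2 function might stop here, while a
	// layer 3 function keeps going and inspects the IP header.
	if etherType == 0x0800 && len(frame) >= 14+20 {
		ip := frame[14:]
		srcIP := net.IP(ip[12:16])
		dstIP := net.IP(ip[16:20])
		fmt.Printf("L3: %s -> %s (TTL %d)\n", srcIP, dstIP, ip[8])
	}
}

func main() {
	// A hand-built frame for illustration: broadcast destination MAC,
	// IPv4 EtherType, then a minimal 20-byte IPv4 header with made-up
	// addresses.
	frame := []byte{
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, // dst MAC
		0x02, 0x00, 0x00, 0x00, 0x00, 0x01, // src MAC
		0x08, 0x00, // EtherType: IPv4
		0x45, 0x00, 0x00, 0x14, 0x00, 0x00, 0x00, 0x00, // ver/IHL, len, id, flags
		0x40, 0x11, 0x00, 0x00, // TTL 64, UDP, checksum (unset)
		10, 0, 0, 1, // src 10.0.0.1
		10, 0, 0, 2, // dst 10.0.0.2
	}
	parseFrame(frame)
}
```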
For cloud native network applications, or higher-order cloud native network functions that span multiple layers, some examples are the 5G Converged Charging System by MATRIXX Software and the BGP server by PANTHEON.tech.
The cloud native trail map describes, roughly, a maturity progression for cloud native applications. Things get more complicated when we dig into one of the stops on the road to cloud nativeness, as is the case with networking, policies, and security. This is to say that there is a cloud native reflexiveness: the tools that help you become cloud native are themselves expected to be cloud native. When applying this to cloud native network functions, we end up having to implement the network function just like any other cloud native application. A summary of this is as follows:
- The first step starts with coarse-grained deployments, usually implemented as containers.
- The second step is having the service or application deployable in a CI/CD pipeline with stateless and declarative configuration.
- The third step is to support an orchestrator (e.g., K8s) deployed on homogenous nodes which manages the lifecycle of the service.
- The fourth step ensures that the network function has telemetry: metrics (e.g., OpenMetrics-compatible Prometheus), tracing (e.g., OpenTracing-compatible Jaeger), and event-stream-compatible logging (e.g., Fluentd). A minimal metrics sketch follows this list.
- The fifth step of cloud native maturity, service discovery, allows the network service to be discovered by other consumers inside or even outside of the cluster.
- In order to facilitate declarative configuration, the sixth step outlines the importance of policies, especially network and security policies, as being applicable and supported through the service.
- The seventh step is distributed storage, applicable where stateful workloads are used, to ensure compatibility with cloud native environments.
- Cloud native messaging, registries, runtimes, and software distribution are other stages of cloud native maturity that round out an application’s journey.
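As an example of the telemetry step, the sketch below exposes a Prometheus-scrapable counter from a hypothetical CNF process in Go using the official client library. The metric name `cnf_packets_processed_total` and the port are made up for illustration; this is a minimal sketch, not a full instrumentation strategy.

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A counter a CNF might increment for every packet it processes.
// The name is a hypothetical example, not a standard metric.
var packetsProcessed = promauto.NewCounter(prometheus.CounterOpts{
	Name: "cnf_packets_processed_total",
	Help: "Total number of packets processed by this CNF instance.",
})

func main() {
	// Simulate the data path doing work in the background.
	go func() {
		for {
			packetsProcessed.Inc()
			time.Sleep(100 * time.Millisecond)
		}
	}()

	// Expose /metrics so Prometheus (or anything OpenMetrics-compatible)
	// can scrape the counter.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```

In a cluster, this endpoint would typically be found through the orchestrator's service discovery rather than a hard-coded address, which ties the telemetry step back to the declarative configuration and service discovery steps above.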
The CNF Dataplane
With CNFs, the dataplane8 (also known as the forwarding plane) moves even further away from traditional hardware. Since cloud native principles value scaling out instead of scaling up, this means that having more homogeneous commodity nodes is preferred over having fewer heterogeneous and specialized nodes. Because of this, there is a disaggregation movement that uses commodity servers in place of the application-specific integrated circuits (ASICs) of a specialized network switch. One benefit of this is the emergence of dataplanes that support a more agile rate of change. While SDN dataplanes (here we are talking about what literally forwards packets) resided on the hardware ASIC or in a virtualized box with traditional kernel network forwarding, CNFs have begun to explore technologies like user dataplanes (e.g., VPP), extended Berkeley packet filters (eBPF) with the eXpress Data Path (XDP), and SmartNIC forwarding.
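To illustrate what a user-space dataplane means at the lowest level (well below what VPP or an XDP program actually does), the sketch below is a hypothetical Go forwarding loop: it opens raw AF_PACKET sockets on two Linux interfaces and copies every frame received on one to the other, entirely in user space. The interface names are placeholders, and a real user dataplane would use batched I/O, polling, and zero-copy techniques rather than one syscall per packet.

```go
//go:build linux

package main

import (
	"log"
	"net"

	"golang.org/x/sys/unix"
)

// htons converts a uint16 to network byte order for the AF_PACKET API.
func htons(v uint16) uint16 { return v<<8 | v>>8 }

// openRaw opens a raw AF_PACKET socket bound to the named interface,
// receiving every EtherType (ETH_P_ALL). Requires root or CAP_NET_RAW.
func openRaw(name string) (int, *unix.SockaddrLinklayer) {
	iface, err := net.InterfaceByName(name)
	if err != nil {
		log.Fatalf("lookup %s: %v", name, err)
	}
	fd, err := unix.Socket(unix.AF_PACKET, unix.SOCK_RAW, int(htons(unix.ETH_P_ALL)))
	if err != nil {
		log.Fatalf("socket: %v", err)
	}
	addr := &unix.SockaddrLinklayer{Protocol: htons(unix.ETH_P_ALL), Ifindex: iface.Index}
	if err := unix.Bind(fd, addr); err != nil {
		log.Fatalf("bind %s: %v", name, err)
	}
	return fd, addr
}

func main() {
	// "eth0" and "eth1" are placeholder interface names.
	in, _ := openRaw("eth0")
	out, outAddr := openRaw("eth1")

	buf := make([]byte, 65536)
	for {
		// One read and one write per frame: simple, but this per-packet
		// syscall cost is exactly what VPP/XDP-style dataplanes avoid.
		n, _, err := unix.Recvfrom(in, buf, 0)
		if err != nil {
			log.Fatalf("recv: %v", err)
		}
		if err := unix.Sendto(out, buf[:n], 0, outAddr); err != nil {
			log.Fatalf("send: %v", err)
		}
	}
}
```

Even this naive loop shows the appeal: the forwarding behavior is ordinary user-space software that can be containerized, versioned, and deployed like any other CNF.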
Layer 3 Ascension
In the cloud native data center, there is a bias toward layer 3 solutions. Being able to declaratively specify and automate the configuration of layer 3 networks has been a determining factor in the development of the Kubernetes networking model. These new cloud native networks rely on IP addresses to connect the cluster's nodes and applications, not layer 2 MAC addresses and VLANs. However, this is mainly the networking story of the orchestrator and its applications. The data center has multiple moving parts with different rates of change, which can be described as three layers: below the orchestrator (network operating systems like SONiC and provisioning tools like Terraform), the orchestrator itself (e.g., Kubernetes), and above the orchestrator but within containers (e.g., CNFs). The network infrastructure fabric below the orchestrator, such as a (possibly disaggregated) top-of-rack switch in the data center, continues to have layer 2 configuration. The telecom space, a big driver for the adoption of CNFs, also continues to have layer 2 use cases that can't be avoided, such as Multiprotocol Label Switching (MPLS). The story for the layer 2 fabric is still being written with new implementations of switching software, such as SONiC.
Conclusion
The configuration, deployment, and automation of networks are some of the reasons why elasticity, a staple of cloud native environments, is hard to achieve. This difficulty can be the deciding factor for moving to a hyperscaler, such as Amazon, even when a more customized deployment is warranted. This is particularly relevant to the telco space because telcos have custom network protocols they may want to support for their enterprise customers (e.g., MPLS). Cloud native network functions address these deployment concerns by decoupling network functionality based on its rate of change, down to the coarse-grained image and process (e.g., container) level. This avoids the deployment-in-lockstep problems that networks are traditionally prone to have.
CNFs are network functionality, traditionally thought of as living somewhere on the OSI stack, implemented following cloud native practices and coupled with the cloud native ecosystem. Networks, and especially telecommunication networks, have a long history of non-functional requirements, such as resilience. Telecommunication service providers use the example of a 911 call as a mission-critical service that demands extreme resilience and availability. Even so, the cloud native ecosystem has non-functional attributes that have gained the attention of service providers. These attributes, such as availability (of the cloud native kind), ease of deployment, and elasticity, have driven telecommunication service providers to put pressure on telecommunication equipment vendors (both physical and software) to be more cloud native. This requires that these new network components follow cloud native infrastructure best practices in order to become mature solutions within the cloud native ecosystem. This is not easy: it is exceedingly difficult to take traditionally tightly coupled components that have demanding performance requirements, such as a networking dataplane, and decouple them.
Dataplanes in the CNF space are a work in progress, and there are many competing solutions. The very concept of a dataplane complicates the understanding of CNFs, given that CNFs are not just a virtualized representation of a physical box. At a trivial level, networking in a cloud native data center could avoid this complication by concentrating on default kernel networking and layer 3 IPv4/IPv6 networking. This is often not feasible for telco use cases or for implementing the network fabric. These problems are part of the natural progression of decoupling network software, so there is no way to avoid them. CNFs done right promise a level of deployability, elasticity, ease of configuration, and resilience not previously realized.
To learn more about cloud native network functions, join the CNCF's cloud native network function working group, and see the CNCF's CNF certification program for more information.
1. “Cloud native is about autonomous systems that do not require humans to make decisions. It still uses automation, but only after deciding the action needed. Only when the system cannot automatically determine the right thing to do should it notify a human.” Garrison, Justin; Nova, Kris. Cloud Native Infrastructure: Patterns for Scalable Infrastructure and Applications in a Dynamic Environment. O’Reilly Media. Kindle Edition.
2. “A self-healing infrastructure is an inherently smart deployment that is automated to respond to known and common failures. Depending on the failure, the architecture is inherently resilient and takes appropriate measures to remediate the error.” Laszewski, Tom. Cloud Native Architectures: Design high-availability and cost-effective applications for the cloud (pp. 131-132). Packt Publishing. Kindle Edition.
3. “Intuitively it may seem like a system can only be as reliable as its least reliable component (its weakest link). This is not the case: in fact, it is an old idea in computing to construct a more reliable system from a less reliable underlying base.” Kleppmann, Martin. Designing Data-Intensive Applications. O’Reilly Media. Kindle Edition.
4. Cross-functional teams put all of the people responsible for building and running an aspect of a system together. This may include testers, project managers, analysts, and a commercial or product owner, as well as different types of engineers. These teams should be small; Amazon uses the term “two-pizza teams,” meaning the team is small enough that two pizzas are enough to feed everyone. The advantage of this approach is that people are dedicated to a single, focused service or small set of services, avoiding the need to multitask between projects. Teams formed of a consistent set of people work far more effectively than those whose membership changes from day to day. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 6457-6462). O’Reilly Media. Kindle Edition.
5. “The best way to think of a container is as a method to package a service, application, or job. It’s an RPM on steroids, taking the application and adding in its dependencies, as well as providing a standard way for its host system to manage its runtime environment. Rather than a single container running multiple processes, aim for multiple containers, each running one process. These processes then become independent, loosely coupled entities. This makes containers a nice match for microservice application architectures.” Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1708-1711). O’Reilly Media. Kindle Edition.
6. In an effort to minimize proprietary solutions, to create an open market in network systems, and to enable management of communications complexity, the International Organization for Standardization (ISO) has developed a reference model for open communications [78]. This reference model, called the ISO Open Systems Interconnection (OSI) Reference Model, proposes an abstract and layered model of networking. Specifically, it defines seven layers of abstraction and the functionality of each layer. However, it does not define specific protocols that must be used at every layer, but gives the concepts of service and protocol that correspond to each layer. Serpanos, Dimitrios, and Wolf, Tilman. Architecture of Network Systems (The Morgan Kaufmann Series in Computer Architecture and Design) (p. 11). Elsevier Science. Kindle Edition.
7. “Processes do not share memory, and instead communicate with each other through message passing. Messages are copied from the stack of the sending process to the heap of the receiving one. As processes execute concurrently in separate memory spaces, these memory spaces can be garbage collected separately, giving Erlang programs very predictable soft real-time properties, even under sustained heavy loads. […] Processes fail when exceptions occur, but because there is no shared memory, failure can often be isolated as the processes were working on standalone tasks. This allows other processes working on unrelated or unaffected tasks to continue executing and the program as a whole to recover on its own.“ Cesarini, Francesco, and Vinoski, Steve. Designing for Scalability with Erlang/OTP: Implement Robust, Fault-Tolerant Systems (p. 29). O’Reilly Media. Kindle Edition.
8. The data plane of a router implements a sequence of operations that are performed for typical network traffic. As discussed earlier, these steps include IP processing of the arriving packet, transmission through the switch fabric to the output port, and scheduling for outgoing transmission. One of the key operations in the dataplane is to determine to which output port to send the packet. This process is known as route lookup […] Serpanos, Dimitrios, and Wolf, Tilman. Architecture of Network Systems (The Morgan Kaufmann Series in Computer Architecture and Design) (p. 117). Elsevier Science. Kindle Edition.