Month: March 2020
MMS • Anthony Alford
Article originally posted on InfoQ. Visit InfoQ
Google Cloud Platform (GCP) recently announced the beta launch of Cloud AI Platform Pipelines, a new product for automating and managing machine learning (ML) workflows, which leverages the open-source technologies TensorFlow Extended (TFX) and Kubeflow Pipelines (KFP).
By Anthony Alford
MMS • Steef-Jan Wiggers
Article originally posted on InfoQ. Visit InfoQ
Amazon recently announced Bottlerocket, a new Linux-based open-source operating system (OS) purpose-built to run containers. Bottlerocket is currently in public preview as an Amazon Machine Image (AMI) for Amazon Elastic Compute Cloud (EC2) for customers to try out.
The tech giant designed and optimized Bottlerocket specifically for use as a container host, and it comes with a single-step update mechanism. Furthermore, Bottlerocket only includes essential software to run containers. Jeff Barr, chief evangelist for AWS, stated in a blog post on Bottlerocket:
Bottlerocket reflects much of what we have learned over the years. It includes only the packages that are needed to make it a great container host, and integrates with existing container orchestrators. It supports Docker image and images that conform to the Open Container Initiative (OCI) image format.
Source: https://aws.amazon.com/bottlerocket/
Bottlerocket comes with several benefits for its users:
- Higher uptime with lower operational cost and management complexity, as the OS has a smaller resource footprint, shorter boot times, and a smaller security attack surface than general-purpose OSes.
- Improved security from automatic OS updates, as it uses a simple, image-based model that allows for a rapid and complete rollback if necessary.
- Open source and universal availability, since it provides an open development model that enables customers, partners, and others to make code and design changes to Bottlerocket. The code is currently available on GitHub.
- Premium support, as AWS-provided builds of Bottlerocket on Amazon EC2 are covered under the same AWS support plans that cover AWS services such as Amazon EC2 and Amazon EKS.
Amazon launched Bottlerocket in cooperation with several partners, including Alcide, Armory, CrowdStrike, Datadog, New Relic, Sysdig, Tigera, Trend Micro and Weaveworks. Chanwit Kaewkasi, DX engineer at Weaveworks, states in a recent company blog post:
Our Fork Clone Run model works nicely to enable GitOps on a Bottlerocket cluster. Bottlerocket OS simplifies and speeds up Kubernetes cluster creation, providing a seamless, secure GitOps user-experience.
Bottlerocket includes support for use with Amazon EKS, and according to the announcement, support for Amazon ECS will follow soon. Furthermore, the tech giant is aiming to make Bottlerocket generally available later this year.
Lastly, customers can start using Bottlerocket now by launching Amazon EC2 instances with the Bottlerocket AMI and joining them to an Amazon EKS cluster, following the QuickStart guide.
MMS • Martin Rixham
Article originally posted on InfoQ. Visit InfoQ
Pieces, a new JavaScript library I have created, takes these two problems of routing and page transitions and tackles them together. After all, they’re both concerned with what happens when the app changes from one view to another. The idea is that the developer creates the individual pages and lets Pieces worry about everything involved in changing between them.
By Martin Rixham
MMS • Helen Beal
Article originally posted on InfoQ. Visit InfoQ
Ryan Landry, the senior director for TechOps at edge cloud platform Fastly, has shared how network automation enables the company to manage traffic peaks during popular live-streamed events such as Super Bowl LIV.
Fastly are directly connected to numerous ISPs across the US and endeavour to keep their live video traffic on these direct paths with their partners to deliver video streams as close to the end-user as possible. However, when traffic demand increases, these interconnection points can become congested and impact quality. The live streaming viewers may experience performance issues such as video buffering or reduced stream quality as a result of packet loss. When users have a poor online experience, a majority of them will abandon the broadcast within a couple of minutes.
Fastly has built-in network automation (known internally as Auto Peer Slasher, or APS, and underpinned by StackStorm) that activates when the interconnection points become congested and link utilisation is nearing full capacity. APS automatically diverts a small portion of traffic in order to keep the link under congestion thresholds. This traffic is then automatically rerouted via alternate best paths to the given ISP, typically via IP transit. With very large volumes of live streaming traffic, this can happen multiple times in a matter of minutes, causing the platform to shed traffic from interconnect partners to IP transit repeatedly. In most cases, the connection state is maintained, eliminating the need for the player to restart a session from scratch. Towards the end of a live event, when peak traffic declines, APS unwinds those actions and returns the network to its starting position.
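The shed-and-unwind behaviour described above can be illustrated with a small threshold controller. This is only a sketch under stated assumptions, not Fastly's APS implementation: the `Link` class, the `rebalance` function, the 90% and 70% thresholds, and the 5 Gbps step are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A hypothetical interconnect link to one ISP (illustrative only)."""
    name: str
    capacity_gbps: float
    traffic_gbps: float
    diverted_gbps: float = 0.0  # traffic currently moved to alternate paths

    @property
    def utilisation(self) -> float:
        return self.traffic_gbps / self.capacity_gbps


def rebalance(link: Link, high: float = 0.90, low: float = 0.70,
              step_gbps: float = 5.0) -> str:
    """Run one control-loop pass and return the action taken.

    Assumed policy: shed a small step of traffic when utilisation exceeds
    `high`; restore a step once utilisation drops below `low`. The gap
    between the two thresholds prevents a restore from immediately
    triggering another shed.
    """
    if link.utilisation > high:
        shed = min(step_gbps, link.traffic_gbps)
        link.traffic_gbps -= shed
        link.diverted_gbps += shed
        return "shed"
    if link.diverted_gbps > 0 and link.utilisation < low:
        restore = min(step_gbps, link.diverted_gbps)
        link.traffic_gbps += restore
        link.diverted_gbps -= restore
        return "restore"
    return "hold"
```

Calling `rebalance` repeatedly on a link at 96% utilisation sheds traffic in small steps until the link sits under the upper threshold; once demand falls, later calls restore the diverted traffic step by step until nothing remains diverted.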
Link utilisation is one measure but it doesn’t necessarily highlight potential congestion deep inside certain backbones or ISP networks. Rates of loss and retransmissions are other measures that Fastly observe and take real-time action on using a technique they call Fast Path Failover (FPF). Their edge caches monitor the forward progress of individual end-user TCP flows. If the flow appears to stall via one given path, the cache triggers an automatic attempt to forward the flow via an alternate path, hoping to maintain a stable state and connection quality. When the amount of automatically diverted traffic exceeds the available capacity of alternate paths, or if FPF is unable to find uncongested alternate paths, Fastly makes a human-based decision about how to reroute traffic next.
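The idea of watching a flow's forward progress and switching paths when it stalls can be sketched as follows. This is a simplified model under assumptions, not Fastly's FPF code: the `Flow` class, the `observe` function, the path names, and the rule of three consecutive samples without progress are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flow:
    """A hypothetical end-user TCP flow with an ordered list of paths."""
    flow_id: str
    paths: List[str]          # candidate paths, preferred first
    path_index: int = 0       # index of the path currently in use
    last_acked: int = 0       # highest acknowledged byte count seen so far
    stalled_checks: int = 0   # consecutive samples without progress

    @property
    def path(self) -> str:
        return self.paths[self.path_index]


def observe(flow: Flow, acked_bytes: int, stall_limit: int = 3) -> str:
    """Record one sample of acknowledged bytes and return the path to use.

    If the acknowledged byte count has not advanced for `stall_limit`
    consecutive samples, move the flow to the next candidate path, if any.
    """
    if acked_bytes > flow.last_acked:
        # Forward progress: remember it and clear the stall counter.
        flow.last_acked = acked_bytes
        flow.stalled_checks = 0
        return flow.path
    flow.stalled_checks += 1
    if flow.stalled_checks >= stall_limit and flow.path_index + 1 < len(flow.paths):
        flow.path_index += 1
        flow.stalled_checks = 0
    return flow.path
```

When no alternate path is left, the sketch simply keeps the flow where it is, which corresponds to the point at which the article says engineers step in to decide how to reroute traffic.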
Fastly have learned through experience that using an ‘all hands on deck’ approach to traffic engineering adds complexity. Whilst the network engineering team at Fastly is a lean and efficient group, they further reduce the number of engineers at the controls for major live events, to around twelve members on average. They break the geography into quadrants and assign a lead engineer to each. Each lead engineer is partnered with a co-pilot engineer who monitors alerts and thresholds and feeds information to their quadrant leader as necessary, while providing secondary validation and verification of changes made by the lead. When their automatic shifting of traffic from direct ISP links begins to reach upper limits of available point of presence (POP) capacity, the engineering pair works together to decide how and where to migrate traffic next, usually by altering Fastly’s Border Gateway Protocol (BGP) anycast announcements or influencing end-user POP selection via their Domain Name System (DNS) management platform.
The automation and systems run 24/7. During one recent major multi-day event, over a forty-eight hour period, the team observed APS performing a total of 349 actions against the network across the ten most active POPs and interconnect partners. While APS handles much of the heavy lifting, the team spends their time tuning the system and attending to other elements of the edge cloud platform’s performance. In February 2020, APS carried out more than 2,900 automated actions across the global network in response to changing internet conditions, while the next closest on-call engineer carried out just over 500.