Java News Roundup: JEPs for JDK 21, Spring Cloud AWS 3.0, OptaPlanner to Timefold

MMS Founder
MMS Michael Redlich

Article originally posted on InfoQ. Visit InfoQ

This week’s Java roundup for May 1st, 2023 features news from OpenJDK, JDK 21, Spring Boot 3.1.0-RC2, Spring Modulith 0.6, Spring Cloud for Amazon Web Services 3.0.0, Spring Cloud Data Flow 2.10.3, Spring Tools 4.18.2, Infinispan 14.0.9.Final, Open Liberty 23.0.0.4, Quarkus 3.0.2 and 2.16.7, Helidon 3.2.1, Apache Camel 4.0.0-M3, Arquillian 1.7.0 and OptaPlanner transitions to Timefold.

OpenJDK

JEP 448, Vector API (Sixth Incubator), has been promoted from Candidate to Proposed to Target for JDK 21. This JEP, under the auspices of Project Panama, incorporates enhancements in response to feedback from the previous five rounds of incubation: JEP 438, Vector API (Fifth Incubator), delivered in JDK 20; JEP 426, Vector API (Fourth Incubator), delivered in JDK 19; JEP 417, Vector API (Third Incubator), delivered in JDK 18; JEP 414, Vector API (Second Incubator), delivered in JDK 17; and JEP 338, Vector API (Incubator), delivered as an incubator module in JDK 16. This feature proposes to enhance the Vector API to load and store vectors to and from a MemorySegment as defined by JEP 424, Foreign Function & Memory API (Preview). The review is expected to conclude on May 9, 2023.

JEP 445, Unnamed Classes and Instance Main Methods (Preview), has been promoted from Candidate to Proposed to Target status for JDK 21. This feature JEP, formerly known as Flexible Main Methods and Anonymous Main Classes (Preview) and Implicit Classes and Enhanced Main Methods (Preview), proposes to “evolve the Java language so that students can write their first programs without needing to understand language features designed for large programs.” This JEP moves forward the September 2022 blog post, Paving the on-ramp, by Brian Goetz, Java language architect at Oracle. Gavin Bierman, consulting member of technical staff at Oracle, has published the first draft of the specification document for review by the Java community. The review is expected to conclude on May 12, 2023. InfoQ will follow up with a more detailed news story.
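
To illustrate the intent, the following minimal sketch (a hypothetical HelloWorld.java, compiled and run with preview features enabled on a JDK 21 early-access build) is a complete program: an instance main method in an unnamed class, with no explicit class declaration, access modifiers or String[] parameter.

// HelloWorld.java -- hypothetical example; requires --enable-preview on JDK 21
void main() {
    System.out.println("Hello, World!");
}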

JEP 441, Pattern Matching for switch, has been promoted from Candidate to Proposed to Target for JDK 21. This JEP also finalizes this feature and incorporates enhancements in response to feedback from the previous four rounds of preview: JEP 433, Pattern Matching for switch (Fourth Preview), delivered in JDK 20; JEP 427, Pattern Matching for switch (Third Preview), delivered in JDK 19; JEP 420, Pattern Matching for switch (Second Preview), delivered in JDK 18; and JEP 406, Pattern Matching for switch (Preview), delivered in JDK 17. This feature enhances the language with pattern matching for switch expressions and statements. The review is expected to conclude on May 11, 2023. InfoQ will follow up with a more detailed news story.
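
As a brief illustration of the finalized feature (the Shape hierarchy here is hypothetical), the sketch below uses type patterns in switch case labels; because the sealed hierarchy is covered exhaustively, no default label is required.

// Hypothetical sealed hierarchy used only for illustration.
sealed interface Shape permits Circle, Square {}
record Circle(double radius) implements Shape {}
record Square(double side) implements Shape {}

// Each case label tests and binds a type pattern; the switch is exhaustive over Shape.
static double area(Shape shape) {
    return switch (shape) {
        case Circle c -> Math.PI * c.radius() * c.radius();
        case Square s -> s.side() * s.side();
    };
}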

JEP 440, Record Patterns, has been promoted from Candidate to Proposed to Target for JDK 21. This JEP finalizes this feature and incorporates enhancements in response to feedback from the previous two rounds of preview: JEP 432, Record Patterns (Second Preview), delivered in JDK 20; and JEP 405, Record Patterns (Preview), delivered in JDK 19. This feature enhances the language with record patterns to deconstruct record values. Record patterns may be used in conjunction with type patterns to “enable a powerful, declarative, and composable form of data navigation and processing.” Type patterns were recently extended for use in switch case labels via: JEP 420, Pattern Matching for switch (Second Preview), delivered in JDK 18, and JEP 406, Pattern Matching for switch (Preview), delivered in JDK 17. The most significant change from JEP 432 removed support for record patterns appearing in the header of an enhanced for statement. The review is expected to conclude on May 11, 2023. InfoQ will follow up with a more detailed news story.
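
The following minimal sketch (with hypothetical Point and Line records) shows how a nested record pattern deconstructs record values in a single instanceof test, which is the style of data navigation the JEP describes.

// Hypothetical records used only for illustration.
record Point(int x, int y) {}
record Line(Point start, Point end) {}

// The nested record pattern deconstructs the Line and both Points in one step.
static int horizontalDistance(Object obj) {
    if (obj instanceof Line(Point(int x1, int y1), Point(int x2, int y2))) {
        return Math.abs(x2 - x1);
    }
    return 0;
}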

JEP 439, Generational ZGC, has been promoted from Candidate to Proposed to Target for JDK 21. This JEP proposes to “improve application performance by extending the Z Garbage Collector (ZGC) to maintain separate generations for young and old objects. This will allow ZGC to collect young objects, which tend to die young, more frequently.” The review is expected to conclude on May 10, 2023. InfoQ will follow up with a more detailed news story.

JEP 404, Generational Shenandoah (Experimental), has been promoted from Candidate to Proposed to Target status for JDK 21. This JEP proposes to “enhance the Shenandoah garbage collector with generational collection capabilities to improve sustainable throughput, load-spike resilience, and memory utilization.” Compared to other garbage collectors, such as G1, CMS and Parallel, Shenandoah currently requires additional heap headroom and has a more difficult time recovering space occupied by unreachable objects. The review is expected to conclude on May 12, 2023. InfoQ will follow up with a more detailed news story.

JEP 450, Compact Object Headers (Experimental), has been promoted from its JEP Draft 8294992 to Candidate status. Under the auspices of Project Lilliput, this JEP proposes to reduce the size of Java object headers from 96 or 128 bits to 64 bits. Project Lilliput, created by Roman Kennke, principal engineer at Amazon Web Services, reached Milestone 1 in May 2022 by achieving 64-bit headers.

Daniel Smith, Programming Language Designer at Oracle, has announced that JEP 401, formerly known as Null-Restricted Value Object Store (Preview) and Primitive Classes (Preview), has been renamed to Flattened Heap Layouts for Value Objects. Smith has provided an updated specification document for review by the Java community.

JDK 21

Build 21 of the JDK 21 early-access builds was also made available this past week featuring updates from Build 20 that include fixes to various issues. Further details on this build may be found in the release notes.

For JDK 21, developers are encouraged to report bugs via the Java Bug Database.

Spring Framework

The second release candidate of Spring Boot 3.1.0 ships with new features such as: changing the default shutdown in the DockerComposeProperties class to stop; automatically applying the TestcontainersLifecycleApplicationContextInitializer class for context tests; and the addition of Docker Compose service connection support for SQL Server, Oracle Database, Liquibase, Flyway and Cassandra. There was also a deprecation of the Couchbase SSL keystore properties, spring.couchbase.env.ssl.key-store and spring.couchbase.env.ssl.key-store-password, in favor of SSL bundle support in Couchbase. More details on this release may be found in the release notes.

The release of Spring Modulith 0.6 delivers bug fixes, dependency upgrades and notable new features such as: auto-configuration for MongoDB transactions if the event publication registry is used; the event publication registry now enables asynchronous processing and shutdown behavior; the @EnableScenario annotation for using the Scenario Testing API with @SpringBootTest integration tests; and support for jMolecules architecture stereotypes in the Application Module Canvas. The Spring Modulith team has also decided to elevate this project into a top-level, non-experimental Spring project. The plan is to release a 1.0-M1 version after the GA release of Spring Boot 3.1. Further details on this release may be found in the release notes.

Version 3.0.0 of Spring Cloud for Amazon Web Services has been released with new features: compatibility with Spring Boot 3.0; a foundation built on top of the AWS SDK for Java v2; a completely rewritten SQS integration module; and a new DynamoDB integration. More details on this release may be found in the release notes.
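
As a rough sketch of how the rewritten SQS module is typically used (assuming the Spring Cloud AWS 3.0 SQS starter from io.awspring.cloud is on the classpath; the queue name and listener class here are hypothetical), a message listener can be declared with the @SqsListener annotation:

import io.awspring.cloud.sqs.annotation.SqsListener;
import org.springframework.stereotype.Component;

@Component
public class OrderQueueListener {

    // Invoked for each message received from the hypothetical "order-queue" SQS queue.
    @SqsListener("order-queue")
    public void onOrder(String payload) {
        System.out.println("Received order: " + payload);
    }
}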

The release of Spring Cloud Data Flow 2.10.3 primarily addresses security issues in transitive dependencies such as: spring-security-oauth2-client-5.4.2; spring-expression-5.2.11; spring-webmvc-5.3.25; json-smart-2.3; and jettison-1.51. There were also dependency upgrades to Spring Boot 2.7.11 and Spring Cloud sub-projects. Further details on this release may be found in the release notes.

Spring Tools 4.18.2 has been released featuring enhancements such as: support for navigating to a Spring property file when inspecting an @Value annotation; support for the @ConditionalOnProperty annotation in property navigation; and early access to Eclipse 2023-06 milestone builds. The Spring Tools team anticipates version 4.19.0 to be released in late June 2023. More details on this release may be found in the release notes.

Infinispan

Infinispan 14.0.9.Final has been released with notable changes such as: a fix for the failure of Infinispan third-party integration tests with JDK 17; documentation on how to monitor cross-site replication; removal of the dependency on Jaeger test containers; and a fix for the port number in the properties file. Further details on this release may be found in the changelog.

Open Liberty

IBM has released Open Liberty 23.0.0.4 featuring: container images for the ARM64 architecture along with the existing AMD64, PPC64LE and S390X architectures; and a resolution for CVE-2023-24998, a vulnerability in Apache Commons FileUpload in which an attacker can trigger a denial of service with malicious uploads because the number of processed request parts is not limited.

Quarkus

Quarkus 3.0.2.Final, the second maintenance release, ships with notable changes such as: renaming the server-list file to hosts in the Infinispan Dev Services guide; a fix for Dev UI 2 displaying the wrong Java version; proper documentation of the k3s flavor name in the Kubernetes Dev Services guide; and a fix for RESTEasy Reactive streaming resource methods leading to a NoSuchMethodException in native mode. More details on this release may be found in the release notes.

Quarkus 2.16.7.Final has also been released featuring: a fix for the algorithm comparison bug in the OIDC code loading the token decryption key; a minor update addressing the OIDC UserInfo class throwing a NullPointerException if a string or boolean property with a given name does not exist; a fix for Quarkus dev mode not working with a certain type of project directory tree when using the @ApplicationScoped annotation; and throwing an exception if the OIDC client fails to acquire a token. Further details on this release may be found in the release notes.

Helidon

Oracle has released Helidon 3.2.1 with new features such as: an enabled flag to the JpaExtension class to permit subsequent refactoring and replacement; integration changes with the MicroProfile Rest Client and Fault Tolerance specifications to handle async calls due to an issue with the default invocation context in the Weld specification; and support for different propagators with integration of Jaeger OpenTelemetry. More details on this release may be found in the release notes.

Apache Software Foundation

The third milestone release of Apache Camel 4.0.0 features bug fixes, dependency upgrades and improvements such as: change the default Micrometer meter names to follow the Micrometer naming conventions; support for Micrometer Observation; directly use the HTTP server in the implementation of Spring Boot; and add a listener for added/removed HTTP endpoints that make it easier for runtimes, such as Spring Boot, to use platform-http with Camel and its own HTTP server. Further details on this release may be found in the release notes.

Arquillian

Arquillian 1.7.0.Final has been released featuring: support for Jakarta Servlet 6.0; support for HTTPS in URLs injected with the @ArquillianResource annotation; and a fix for a NoClassDefFoundError exception from the LoggerFactory class when using TestNG 7.5+. More details on this release may be found in the changelog.

OptaPlanner Transitions to Timefold

OptaPlanner, an open source AI constraint solver for software developers, will transition to Timefold, a new planning optimization company created by Maarten Vandenbroucke, co-founder and CEO, and Geoffrey De Smet, co-founder and CTO. Created by De Smet, OptaPlanner has matured under the auspices of Red Hat, where he worked as a senior principal software engineer and the company provided its own build of OptaPlanner. InfoQ will follow up with a more detailed news story.



.NET Community Toolkit 8.2: MVVM Toolkit Attributes, Performance Enhancements, and More

MMS Founder
MMS Almir Vuk

Article originally posted on InfoQ. Visit InfoQ

Microsoft has released the latest version of its .NET Community Toolkit, version 8.2, with several new enhancements and features. According to the release, this new version brings performance enhancements both at runtime and in the MVVM Toolkit source generators, new code fixers aimed at boosting productivity, and a number of user-requested features.

The first notable addition to the MVVM Toolkit is the support for custom attributes with RelayCommand. This was suggested by users on GitHub and builds on the work done in the previous release. The new version leverages the native field: and property: C# syntax to indicate targets of custom attributes, giving users full control over attributes for all generated members when using RelayCommand to generate an MVVM command. This feature is particularly useful when using a view model that needs to support JSON serialization and requires the ability to explicitly ignore generated properties.

The original blog post provides the following code examples regarding the custom attributes with RelayCommand.

[RelayCommand]
[property: JsonIgnore]
private void DoWork()
{
    // other code.
}

As a result, the previous code sample will produce the following generated members.

private RelayCommand? _doWorkCommand;

[JsonIgnore]
public IRelayCommand DoWorkCommand => _doWorkCommand ??= new RelayCommand(DoWork);

Furthermore, in the 8.2 release of the Toolkit, developers can now take advantage of two new property change hooks that have been added for ObservableProperty fields. These two new hooks aim to address a common scenario in MVVM, where an observable property such as a “selected item” needs to be modified, requiring changes to both the old and new instances.

Sergio Pedri, Software Engineer II, Microsoft Store client team and author of the original blog post states the following:

Previously, this was a scenario where using [ObservableProperty] wasn’t ideal, as it didn’t have the necessary infrastructure to easily inject such logic to perform the necessary state changes on the old and new values being set. To fix this, starting from the 8.2 release of the MVVM Toolkit there are two new property change hooks being generated for all [ObservableProperty] fields.

The blog post also provides detailed code examples regarding the usage of the ObservableProperty attribute and it is worth checking out.

By using the ObservableProperty attribute, developers can ensure that the selected view model will always be correctly reported as being selected. The attribute now includes built-in support for this functionality, eliminating the need for fallback methods. The MVVM Toolkit also automatically detects usage of this attribute to optimize code generation. Additionally, the Roslyn compiler will remove calls to any partial methods that are not implemented.

According to the original blog post, the MVVM Toolkit has introduced new diagnostic analyzers in its latest release, which can detect and warn users when they access a field marked with the ObservableProperty attribute incorrectly, or when they declare a type with similar attributes without using inheritance. Moreover, the latest release of the toolkit also includes built-in code fixers for these two analyzers. As per the report, users can now easily fix their code by selecting the code fix suggested by the IntelliSense light bulb whenever the analyzers produce a warning. Additionally, the code fixers support bulk fixes, enabling users to rectify all their errors with just one click.

Regarding the MVVM Toolkit, this version also brings some performance improvements to its source generators. The primary focus was on optimizing the incremental pipelines to minimize memory usage and ensure that no unnecessary objects would be kept alive across concurrent executions. Several pull requests were made, including moving two additional diagnostics to a diagnostic analyzer, which can run concurrently and out of process. Another change removes some Roslyn symbols from the incremental pipeline and resolves the necessary analyzer symbols early, during the initial callback setup, to speed up callback executions in each compilation instance.

Lastly, this release includes enhancements and fixes, such as resolving build errors in VB.NET projects and fixing forwarded double attribute parameters. The release now supports partial methods with RelayCommand and open generic types in ToTypeString. In addition, MemberNotNull is now emitted in ObservableProperty setters, and complete XML docs are available for all generated types and members for better understanding.

The complete list of changes made in this release can be viewed on the GitHub release page.



Vercel Announces New Storage and Security Offerings for the Edge

MMS Founder
MMS Steef-Jan Wiggers

Article originally posted on InfoQ. Visit InfoQ

Vercel recently announced a suite of serverless storage offerings for their cloud platform with Vercel KV, Postgres, and Blob, powered by the company’s infrastructure partners, Neon and Upstash. In addition, the company also launched Vercel Secure Compute, Visual Editing, and Spaces.

The new storage offerings are intended to provide frontend developers building applications for the edge with fast and efficient access to data. In particular, these offerings are:

  • Vercel KV (Key-Value) is a serverless Redis solution powered by Upstash, allowing developers to create Redis-compatible databases that can be written to and read from Vercel’s Edge Network in regions they specify.
  • Vercel Postgres is a serverless SQL database built for the front end powered by Neon, providing developers with a fully managed, highly scalable, fault-tolerant database that delivers high performance and low latency for web applications.
  • Vercel Blob is a solution to upload and serve files at the edge currently in preview. The offering provides an API built entirely on top of web standards without the need to configure buckets or implement SDKs.

Source: https://vercel.com/blog/vercel-storage

In a company blog post, the authors explain the rationale behind these storage offerings:

As the world moves away from monolithic architectures to composable ones, there’s no shortage of options for backends and databases. But for new projects, the choice can still be paralyzing. In the spirit of being the end-to-end solution for building on the web, we are introducing solutions that are open, easy to use, and scale as efficiently as our frontends.

Furthermore, the company has additional offerings, which include:

  • Vercel Secure Compute allows developers to create private connections between serverless functions and protect their backend cloud, helping them meet security, compliance, and privacy obligations on the web. Accompanying this offering is Vercel Firewall.
  • Vercel Visual Editing resulted from a partnership with Sanity to introduce a new open standard for content source mapping for headless CMSs, which allows live visual editing of content directly on a website and provides a tunnel directly to the content’s source.
  • Vercel Spaces provides powerful tools and conventions designed to integrate with a developer’s monorepo setup to help scale efficiently while retaining quality.

Malte Ubl, CTO at Vercel, told InfoQ:

Developers need tools to build and deploy applications at scale. Our new offerings, which include Storage databases, Vercel Secure Compute, Live Editing, and Vercel Spaces, provide the frontend ecosystem with durable and seamless solutions allowing them to innovate faster. Our expanded suite of cloud-native tools ladders up to our vision for the Frontend Cloud, empowering developers to build, test and deploy high-quality web applications efficiently.

Various people responded to announcements from Vercel; Yehuda Fruchter, an internet entrepreneur and investor, tweeted:

Vercel is becoming a monolith.

With Guillermo Rauch, CEO at Vercel, responding:

Everything about the platform remains composable. In fact, today’s announcements are about precisely that: meeting you where you are backend-wise, securely.

Followed by Fruchter’s response:

Yeah, sure I get that, but practically-speaking this will ultimately be Vercel controlling everything end-to-end. I don’t think that’s a problem, frankly, as the industry needs one-player to “consolidate” all these microservices b/c there r too many options.

Lastly, the pricing details of the offerings are available on the respective pricing pages: the Vercel KV pricing page, the Vercel Postgres pricing page, and the Vercel Blob pricing page, with Vercel Secure Compute, Vercel Visual Editing, and Spaces available via Enterprise plans.



Diving into AWS Databases: Amazon RDS and DynamoDB Explained – The New Stack

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts


A look at the differences between these popular options, and between relational and nonrelational databases.



The increasing demand for data processing and manipulation has led organizations to prioritize efficiency, security, scalability and availability in their data management strategies. Companies that leverage top tools in the market enjoy a higher chance of success, which is why Amazon Web Services (AWS) should be a top contender when considering database storage solutions.

AWS offers an extensive range of database services for various use cases. In this article, we’ll explore the primary differences between two popular options:

  1. Relational Database Service (RDS)
  2. DynamoDB

Before we delve into the specifics of each service, let’s first look at the distinctions between relational and non-relational databases.

Relational (SQL) vs. Nonrelational (NoSQL) Databases

Relational databases use predefined schemas and store data in rows and columns, resembling a spreadsheet. In contrast, nonrelational databases like DynamoDB feature dynamic schemas, are document-oriented and scale horizontally.

Relational (SQL) Databases

Relational databases employ Structured Query Language (SQL) for interaction and use predefined schemas. SQL is a widely recognized query language familiar to most database administrators (DBAs).

SQL databases contain tables with columns (attributes) and rows (records) and use keys with constrained logical relationships to maintain consistency and integrity. These databases can be scaled vertically by enhancing processing hardware power (for instance increasing RAM, CPU or solid-state disks).

Advantages of relational (SQL) databases:

  • Use of SQL language
  • Atomicity of database operations
  • Flexible query capabilities

Disadvantages of relational (SQL) databases:

  • Requires careful upfront design
  • Schema changes may cause downtime
  • Limited horizontal scalability

Nonrelational (NoSQL) Databases

NoSQL databases are nonrelational database management systems with dynamic schemas for unstructured data. These databases can be categorized based on their data models:

  • Document
  • Key value
  • Wide column
  • Graph

NoSQL databases are suitable for large volumes of data or frequently changing data sets. Document databases are particularly useful for handling vast amounts of unstructured data. Unlike SQL databases, which scale vertically, NoSQL databases scale horizontally, making it easier to expand capacity by adding more servers or nodes.

Advantages of nonrelational (NoSQL) databases:

  • Easy scalability and high availability
  • Flexible database models
  • High performance

Disadvantages of nonrelational (NoSQL) databases:

  • Some databases lack atomicity and data integrity
  • Absence of standardization

Now that we’ve covered the basics of relational and nonrelational databases, let’s examine the database options provided by AWS.

Amazon RDS (SQL) vs. DynamoDB (NoSQL)

Both RDS and DynamoDB are fully managed by AWS, meaning the company handles the underlying operating system and core components. AWS automates routine tasks such as provisioning, patching, backup, recovery, failure detection and repair, reducing administrative overhead.

Let’s examine the details of each service.

Amazon RDS

Amazon RDS enables users to set up, operate and scale relational (SQL) databases on AWS. It simplifies replication to improve availability and reliability for production workloads. AWS offers six SQL-based database engine options:

  • Amazon Aurora
  • MySQL
  • MariaDB
  • PostgreSQL
  • Oracle
  • Microsoft SQL Server

AWS provides various instance types with differing combinations of CPU, memory, storage and networking capacity to suit workload requirements.

Amazon RDS features:

  • Multi-availability zone deployment for high availability
  • Read replicas for read-heavy workloads
  • Automatic backups and patching
  • Monitoring

Both AWS RDS and DynamoDB provide businesses with fully managed cloud-service options. AWS or a managed cloud-service company, such as Mission, takes care of routine tasks like provisioning, patching, backup, recovery, failure detection and repair. Ultimately, the choice between these two services depends on your specific needs and preferences.

RDS is often favored for enterprise resource planning (ERP), customer relationship management (CRM), financial data and transactional applications. It enables you to establish, operate and scale relational (SQL) databases on AWS, offering a variety of instance types to choose from.
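
Because the RDS engines speak their standard wire protocols, applications connect to an RDS instance the same way they would to a self-managed database. The sketch below is a minimal example assuming a hypothetical RDS MySQL endpoint, schema and credentials, using plain JDBC (the MySQL driver must be on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class InventoryUpdate {
    public static void main(String[] args) throws Exception {
        // Hypothetical RDS MySQL endpoint, database name and credentials.
        String url = "jdbc:mysql://mydb.example.us-east-1.rds.amazonaws.com:3306/shop";
        try (Connection conn = DriverManager.getConnection(url, "app_user", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                     "UPDATE inventory SET quantity = quantity - 1 WHERE product_id = ?")) {
            stmt.setString(1, "iphone-14");
            stmt.executeUpdate();
        }
    }
}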

DynamoDB

On the other hand, AWS DynamoDB is a serverless solution that automatically adjusts table capacity to accommodate demand, requiring no administrative effort from you. Typical use cases for DynamoDB include real-time bidding, shopping carts, mobile applications and high I/O requirements.
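
As a minimal sketch of that serverless model (assuming the AWS SDK for Java v2 and a hypothetical ShoppingCart table keyed by cartId), writing an item requires no server or capacity management in the application code:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

import java.util.Map;

public class CartWriter {
    public static void main(String[] args) {
        try (DynamoDbClient dynamoDb = DynamoDbClient.create()) {
            // Put a single item into the hypothetical ShoppingCart table.
            PutItemRequest request = PutItemRequest.builder()
                    .tableName("ShoppingCart")
                    .item(Map.of(
                            "cartId", AttributeValue.builder().s("cart-123").build(),
                            "product", AttributeValue.builder().s("iphone-14").build(),
                            "quantity", AttributeValue.builder().n("1").build()))
                    .build();
            dynamoDb.putItem(request);
        }
    }
}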

When you need help deciding what database to migrate to, an AWS Premier Tier Partner like Mission Cloud can advise you in determining the right database for your needs. Contact Mission Cloud to set up your complimentary one-hour session with a Mission solutions architect to discuss your machine learning questions.



Open Liberty 23.0.0.3 Unveiled: Embracing Cloud-Native Java Microservices, Jakarta EE 10 and Beyond

MMS Founder
MMS A N M Bazlur Rahman

Article originally posted on InfoQ. Visit InfoQ

IBM unveiled Open Liberty 23.0.0.3, boasting support for Java SE 20, Jakarta EE 10 and MicroProfile 6.0. This significant release introduces the Jakarta EE 10 Core Profile, Web Profile, and Platform, as well as enhancements to various features that comprise the profiles. Additionally, the release includes the new MicroProfile Telemetry 1.0 specification and updates to the Metrics, OpenAPI, and JWT Authentication specifications. Open Liberty 23.0.0.3 marks a milestone in the runtime’s development since its inception over five years ago.

The Jakarta EE 10 release signifies a major milestone, being the first Jakarta update since Java EE 8 in 2017 and the first since Oracle donated Java EE 8 to the Eclipse Foundation. The release includes numerous updates to existing specifications and introduces the Core Profile, tailored for lightweight runtimes like Open Liberty, to optimize the operation of cloud-native Java microservices.

The Jakarta EE Core Profile, new for Jakarta EE 10, features Context and Dependency Injection 4.0 Lite, JSON Binding 3.0, RESTful Web Services 3.1, JSON Processing 2.1, Annotations 2.1, Interceptors 2.1, and Dependency Injection 2.0. Jakarta Contexts and Dependency Injection (CDI) 4.0 Lite further enhances support for lightweight runtimes and microservices. This streamlined version of CDI 4.0 provides developers with the essential features for building cloud-native Java applications while minimizing resource consumption, improving startup times, and optimizing overall performance. With CDI 4.0 Lite, developers can now leverage the power of CDI in a more efficient and agile manner to meet the ever-evolving demands of modern Java development.
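
To give a feel for the programming model (a minimal sketch with hypothetical bean names; note that Jakarta EE 10 uses the jakarta.* namespace), the core CDI pattern of declaring a scoped bean and injecting it looks like this:

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

// Hypothetical beans used only for illustration.
@ApplicationScoped
class GreetingService {
    String greet(String name) {
        return "Hello, " + name;
    }
}

@ApplicationScoped
class GreetingClient {
    @Inject
    GreetingService service; // resolved by the CDI container at startup

    String welcome() {
        return service.greet("Jakarta EE 10");
    }
}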

This release also includes Jakarta EE Web Profile 10, encompassing Jakarta EE Core Profile 10, Authentication 3.0, Context and Dependency Injection 4.0, Concurrency 3.0, Expression Language 5.0, Faces 4.0, Security 3.0, Servlet 6.0, Standard Tag Library 3.0, Persistence 3.1, Server Pages 3.1, and WebSocket 2.1. This release also presents Jakarta EE Platform 10, which includes Jakarta EE Web Profile 10, Authorization 2.1, Activation 2.1, Batch 2.1, Connectors 2.1, Mail 2.1, Messaging 3.1, XML Binding 4.0 (optional), and XML Web Services 4.0 (optional). To enable the Jakarta EE Platform 10 or Web Profile 10 features, developers need to add the respective feature to their server.xml file, for example:


<featureManager>
    <feature>jakartaee-10.0</feature>
</featureManager>

For the Core Profile, enable its equivalent by adding specific features to your server.xml file. To run Jakarta EE 10 features on the Application Client Container, developers need to add an entry in their application’s client.xml file. For more information, developers can leverage the Jakarta EE 10 specifications, Javadoc, and content on the differences between Jakarta EE 10 and 9.1.

MicroProfile 6.0 is also a part of Open Liberty 23.0.0.3, bringing with it Jakarta EE Core Profile 10 and enhancements to the MicroProfile ecosystem. The new MicroProfile Telemetry 1.0 feature, along with updates to MicroProfile Metrics 5.0, OpenAPI 3.1, and JWT Authentication 2.1, are all included in this release, ensuring compatibility with the latest industry standards.

Java SE 20 support is another key addition to the Open Liberty 23.0.0.3 release, offering developers access to the latest features and improvements in the Java ecosystem. Additionally, this update includes numerous bug fixes, further enhancing the stability and performance of the runtime.

Developers can get started with Open Liberty 23.0.0.3 using Maven, Gradle, or container images. The release is available for download on the official Open Liberty Downloads page, where the Jakarta EE 10 and MicroProfile 6 packages have been added. Developers seeking assistance can ask questions on Stack Overflow, where the community actively provides support and guidance.



Presentation: On Beyond Serverless: CALM Lessons and a New Stack for Programming the Cloud

MMS Founder
MMS Joe Hellerstein

Article originally posted on InfoQ. Visit InfoQ

Transcript

Hellerstein: My name is Joe Hellerstein. I’m a professor at UC Berkeley in computer science, and a fellow at Sutter Hill Ventures. I’m going to be talking about serverless computing, CALM lessons, and a new stack for programming the cloud. There’s a story we like to tell in computing that with every new generation of platform that gets invented, a programming model emerges that allows third party developers to unlock the unique properties of that platform in new and unexpected ways. This story goes back at least as far as the minicomputers of the early ’70s, with Unix and C, for which Ritchie and Thompson were given Turing Awards. We see similar patterns as we look over time at the disruptive platforms that have been invented. What’s interesting is what’s missing from this slide, the biggest platform for computing that humankind has ever assembled, the cloud. We don’t have a programming environment that is suited to its unique physics.

The Big Question

The big question that I have that we’re going to talk about is how will folks program the cloud in that way that fosters the unexpected innovation that takes advantage of its properties. I’ll say before we get into it, that distributed programming is hard. It’s harder than the other challenges we faced on the previous platforms. It entails issues of parallel computing, of the consistency of data that’s replicated across multiple geographies. Of the possibility that pieces of the system have failed while it’s still running. Modern autoscaling challenges where things are supposed to grow as usage grows and shrink as usage shrinks to save money, that only makes it harder. In my mind, programming the cloud is one of the grand challenges for computing over the next decade. It’s something I’ve been working on and I’m very passionate about.

Outline

In this talk, we’re going to have four chapters. In the first we’ll talk about serverless computing, which is an early leading indicator of cloud programming. We’ll talk about state and coordination, which are the hardest things to do in a distributed programming language. Then we’ll talk about some foundations, the CALM theorem that help us figure out when it’s easy to deal with state and when it’s hard. Then finally, I’ll talk about work we’re doing in my group right now on a project called Hydro, which is building a language and compiler stack for programming the cloud.

Serverless: Signs of Interest from Cloud Vendors

Let’s begin with serverless. I view serverless computing as a leading indicator of interest from the cloud vendors in making the platform truly programmable to third parties. Work on serverless computing goes back at least as far as 2014. It’s not brand-new. Lambda was introduced on AWS in 2014. It’s really come to the fore in the last four to five years. The idea of serverless computing particularly as instantiated by something called Functions as a Service, or FaaS. The idea is really simple. The idea is you write a little function, you can write it in your favorite language, Python, JavaScript, Java, you launch it into the cloud. This little function can take an input and produce an output. Once you launch it to one of these FaaS platforms, clients from the internet can send inputs to this thing, and it’ll produce outputs for them at whatever scale of clients you manage to recruit. If you have a very popular function, the FaaS platform will scale up to meet all your demand. If you have a function that’s not being used, the FaaS platform will scale down so you’re not billed for any usage. It’s a really attractive way to think about the cloud and how to program it. The promise of FaaS then, is that it’s like a boundless computer. It’s programmable. It’s as big as you need it to be or as small as you need it to be. It knows no bounds on compute and storage. The reality of FaaS, however, is much less attractive. It’s what I like to call an elastic army of incommunicado amnesiacs. Let me tell you what I mean by that.

Serverless Function Limitations

Serverless functions have the positive aspects that they have this boundless compute. What’s negative about it is the functions themselves have to be simple enough to run on a single computer. They’re basically a laptop’s worth of computation. You can have as many laptops’ worth of computation in the cloud as you need, but each of those computations is completely isolated. This is caused by the fact that serverless platforms don’t let functions talk to each other, there’s no network messages. That means no distributed computation out of a platform that is fundamentally the biggest distributed computer you can imagine. We really handicapped ourselves in terms of the physics of the cloud, we’re not taking advantage of the ability to have functions that span multiple computers.

A second problem is there’s no low latency data storage or access from these functions. They can access remote cloud services like S3 storage, object storage, or remote databases, but they don’t have any local storage for very fast persistence, which means that typically, they’re stateless. They have no information that’s kept across invocations. Then the third big missing piece with these functions and Functions as a Service platforms is that they’re made to reboot every few minutes. Because they have no storage, they didn’t squirrel away in memory what they knew before they were rebooted. They’re basically reborn with no memory, they’re amnesiacs. This is why I call it an army of incommunicado amnesiacs. They have no ability to talk to each other, no ability to remember anything. We wrote about this in a paper that I’d encourage you to read. The spirit of this is not so much that serverless computing is bad, but rather that it skipped some fundamental challenges, and if we solve those challenges, we would really have a programmable cloud in our hands. Let’s get to work.

Serverless as a Leading Economic Indicator

Before we get into that work, though, I do want to point out that serverless computing is really interesting as a leading economic indicator. The first 15 to 20 years of the cloud is what I call the boring revolution, surely a revolution in the sense that folks like Jeff Bezos were able to overturn the industry and shift billions of dollars of revenue from legacy enterprise vendors like IBM and Oracle, to cloud vendors that didn’t previously exist, like AWS and Google Cloud. Of course, Microsoft was able to weather this transition internally. That’s a big deal. The truth is, it’s still legacy enterprise software. What can you get in the cloud today? Mostly, you get databases, and queuing systems, and application services, and load balancers, and all the stuff we already had before. From a fundamental computing perspective, it’s not that exciting. It seems somehow, we haven’t taken advantage of what the cloud could let us do in terms of innovation.

For 15 years or so, the likes of Jeff Bezos were off pulling off this revolution, they weren’t worried about enabling third-party developers to take advantage of the new platform. They were busy taking advantage of it themselves. Now they’ve won that battle. In 2022, they start to have incentives to grow by exposing the platform and fostering more innovation. I think the time is ripe to really start answering this question, how do we program the cloud? I think there’s economic incentives for it now. All we need is for the infrastructure folks who build the programming models, the systems that do the runtimes behind those programming models, we need to roll up our sleeves and say yes to the hard challenges that earlier generations like Functions as a Service took a pass on.

The good news is that in that 15 years, when the cloud vendors were stealing business from the enterprise vendors, a bunch of us were off doing research. We’ve been quietly able to do a bunch of research. Now it’s time to start to harvest that research, put it together into artifacts that people can really use to leverage the power of the cloud. We’re going to roll up our sleeves. I think we can do this. The goal really is general-purpose cloud computing, without compromises, without really narrow corner cases. How do you let most people write programs that harness the full power of the cloud? Three main goals, simplicity. It should be easy for developers to get on-ramped into this new programming platform: easy to learn, easy to debug, easy to operate. Correctness, we want the programs to run as intended. That is tricky in the cloud, as we’re going to talk about. Then, the dynamics of cost and performance. We want efficient code by which we mean, yes, it should run fast. It should also only consume the resources that it needs. It should be efficient in terms of my cloud bill, not just in terms of my time.

Toward Generality: Embracing State

To get to that general-purpose computing, the thing we’re going to have to do, the hard thing is to deal with what’s called state. In the animation, I’m embracing the state where I live, and you might want to embrace yours. What I mean by state here is the data that functions are computations, generate and manage within and across invocation. If you call a function, it generates some information that it keeps in RAM, and then if you call it a second time, it might need to remember some things from the past. I call all of that state. It’s really data that’s either in memory or could be on storage devices as well. There’s two challenges with state, one is hard, one is less hard. The really hard one is a correctness challenge called distributed consistency. We’re going to talk about it quite a bit. It’s a correctness problem. It’s a barrier to simplicity. It really makes programming distributed systems hard to get right, and it’s unavoidable. The second challenge is data placement and data movement. Which data should be where, when, for high performance? This performance problem is the stuff that we as engineers are quite good at. You do a prototype, it’s a little slow. You profile it, you realize the data that was over here should be over there. You do some adaptations, and over time your system gets fast. Challenge number one is the one I want to talk about.

The Challenge: Consistency

What is the challenge of consistency? The challenge is to ensure that agents that are separated by space, agree, or will agree on common knowledge. We have this lovely couple in the upper right, and they are an example of what I call simple data replication. All we’ve replicated is x, a variable. We haven’t replicated a program and all its meanings, we’ve just replicated some state, x equals heart. They both agree that this variable x, which could change, it’s mutable. Right now, it equals heart, and everything’s happy. What happens if they’re separated? They can’t talk to each other. One of them changes the value of that variable, so now unfortunately, the young woman on the left believes that x is poop. While the young man on the right believes that x is heart, what is going to happen? The problem is that with their beliefs, they will move forward and make more fateful decisions, and step down a forking path of these decisions to the point where if we have to put them back together later, they’ve made so many decisions based on different assumptions, that we can’t put them back together in any sensible way. This is what’s sometimes called split brain divergence. The computer brain across these two people has two different parts that can’t be put back together in a sensible way.

This is not a new problem. This is a classic problem in distributed computing, and it’s been solved at some level. On the right you see Leslie Lamport. He won the Turing Award for inventing a consensus protocol called Paxos. What’s a consensus protocol? It’s a protocol that allows multiple computers to agree on a value, like x equals heart. There’s similar protocols from database systems, like the two-phase commit protocol that gets a network of computers to agree on whether or not to commit a transaction. These protocols are well known. I will say, though, that they are tricky. Every time I have to teach Paxos, I really need to bone up on it before I go into the classroom, because otherwise students will get very lost. There’s a ton of details, and it’s not particularly intuitive. Sometimes in engineering, in math, in computer science, stuff is tricky, but you got to do it. It’s worth it because it’s cool. It solves a real problem. This stuff, unfortunately, is both tricky and also bad. We shouldn’t use this stuff. Don’t take my word for it. What do I mean by this?

Coordination

In the upper right of this slide is a picture of James Hamilton. James has been around forever. He was involved in the early IBM databases. He was involved in the Windows file system. Over the last 20, 25 years, he’s been involved in architecting two of the major clouds, both at Microsoft, and at Amazon. He is one of the key architects of the modern cloud. Some 13 years ago, he gave a talk, and the quote is so good I like to read it like poetry. He talks about it like this. “The first principle of successful scalability is to batter the consistency mechanisms down to a minimum, move them off the critical path, hide them in a rarely visited corner of the system, and then make it as hard as possible for application developers to get permission to use them.” It’s as if he’s saying, “Thank you Dr. Lamport for all the Paxos but we’re not going to use that.”

Why is he saying this? Why is he tying our hands and not letting us use the solution to the problem at hand? Coordination of the forms that he’s calling consistency mechanisms, so things like Paxos, two-phase commit, they involve computers waiting for each other to come to agreement. There’s a lot of messages that go back and forth that you have to wait for. That waiting causes other machines to wait, and it builds up queues. I may be waiting for you. Other services may be waiting for me. Services way up the chain may not even know what they’re waiting for anymore. What happens is if some junior developer at a big cloud vendor decides to call Paxos in the middle of their program, because they think they might have a bug otherwise, it can cause a cascading effect of queues throughout the cloud that bring things to their knees. This has been documented in some of the major clouds, especially in the first many years. James’s advice is very pragmatic, we don’t want to use coordination. If we have to use it, only the experts get to use it, and only in a corner of the system.

The problem with coordination mechanisms is that they're reasoning in a very conservative way, way down at the bottom of the system, at the level of memory accesses, things like memory access, read and write, or disk access, read and write. Databases are classic for this in terms of their transaction semantics. My former student, Peter Bailis, who is now at Stanford and has Sisu, a startup of his, had this great cartoon in his PhD thesis. It's an adaptation of The Far Side. It goes like this, what we say to databases is, "Ok, database, now be good. Please move one iPhone out of inventory and into Bob's cart." What the database storage manager actually hears is, "Blah, blah, blah, read, blah write, blah read, blah write." It knows nothing about all that application-level semantics about products and inventories and what it means for there to be less than zero things in inventory or too many things in a cart. It doesn't know any of that. It just knows reads and writes. It's very cautious about what reads and writes it will allow.

What we’re seeing is a generational shift in the way we’re going to solve the problem. The Lamport-era 20th century approach was to reason at the storage level, and make worst-case assumptions. What we’re going to talk about, and what we’re going to see in more modern research in the 21st century is going to look at application specific assumptions. What do we know about our application, and how it uses its data and its state, and how the computations in the application affect that state? At some level, the stuff about memory access is really not where the action is anymore. If we want to do this well, we’re going to be reasoning about programs, not about I/Os.

When and Why Do We Need Coordination?

The big question, when you start walking down this path of application semantics is, when do we truly need stuff, like Paxos, like Lamport’s work? Why do we need it, and when can we avoid it? This question, when do I need coordination and why? If you ask a traditional computer scientist, ask most computer scientists, why do we need a lock in our program? That’s a little example of a local form of coordination. Why do I need Paxos? They’ll tell you, “We have conflicts around some resource, and if we don’t coordinate the order in which people access that resource in space and time, bad things will happen.” A typical example in the physical world is this intersection, hasn’t been coordinated, and two cars can’t be in the same place at the same time, and so you get this problem. There’s an obvious solution to this thing, which is fix the stoplights. Stoplights are great, because then the north-south guys can go for a while. Then the east-west crowd can go for a while. Then the north-south again. At any given time, at least we’re utilizing that resource of the intersection. The problem with that, of course, is if the arrival rate of cars is faster than the drain rate that happens while the light is green, we get infinite pileups at these intersections, even with stoplights. What are we going to do? We have to have stoplights. Physics says you can’t be in the same place at the same time. Maybe there’s a clever engineering solution that doesn’t break physics. In fact, there is one and you see it all the time. It’s to use another dimension in space, we’ll build overpasses, and then we don’t have to have stoplights. These overpasses can go at full bandwidth all the time. Of course, you’ve been stuck in traffic. There’s other reasons why traffic backs up besides coordination. The idea that we can avoid coordination in cases where we might have thought we have a contention or a race condition makes the question really hard, when do we need coordination really? When is there a clever solution to work around it and not have coordination?

It’s a theory question really, at the end of the day. Let’s move to our whiteboard. Suppose you understand your program semantics, which programs have a coordination-free implementation? If we could just find it, a clever programmer could build it without using coordination. Those are the good programs, we put them in the green circle. The rest of the programs are the programs that require coordination. There is no programmer in the world who could come up with a solution to this problem without using coordination. It’s absolutely intrinsic to the statement of the problem. I want to know, what’s the green stuff, because the green stuff, I can make that autoscale and run faster in the cloud and all these good things. The red stuff, I’m going to have to bite the bullet and use coordination. That’s the stuff that James Hamilton doesn’t want the junior programmers to write. What is that green line? Can you give me a test that will separate the red programs from the green programs? This is a computational complexity or computability problem? What are the programs computable without coordination versus the programs that require coordination? What we’re going to do next is look at that theory. It’s called the CALM theorem. It’s going to inform how we can actually build systems. What we’re going to see is that monotonicity is going to be that green circle, the bright line test for whether coordination is needed. We’re going to talk about that, some lessons, and some performance payoffs.

CALM Theorem

Here’s the statement of the CALM theorem as it was conjectured in a talk I gave back in 2010, and then proved the very next year by an extremely brilliant graduate student at the University of Hasselt, in Belgium, Tom Ameloot. The theorem says this, a distributed program has a consistent and coordination-free distributed implementation, if and only if it is monotonic. The green line test is monotonicity of the problem. If it’s monotonic, we can get a consistent coordination-free implementation. If it is not monotonic, it’s outside the green circle, there is no consistent coordination-free distributed implementation. If you want to dig into this research, I recommend you start with a short six-page overview paper that Peter Alvaro and I wrote in CACM, just a couple years ago. It’s pretty accessible, and it points to the relevant papers where you can dig deeper. The papers are beautiful.

Let me give you some intuition. We’re going to need some definitions, particularly of our key words, consistency, monotonicity, and coordination. We won’t get formal definitions, that’s what the proofs are for, but we will get some intuition. For consistency, we’re not going to use an I/O consistency like Lamport did, we’re going to use an application-level consistency. I want the outcomes of the code of the application to be the same everywhere in the network, regardless of whatever shenanigans are going on with the network. Messages getting delayed, or reordered, or having to be resent multiple times, so we receive them lots of times. Regardless of all those shenanigans, I want everybody to compute the same thing. The test we’re going to apply is monotonicity. Monotonicity is a property of a function that says if you give it a bigger input, you get a bigger output. Formally, on the left, it says that f’s monotone is defined as if x is a smaller thing than the y, then the function on x is a smaller thing than the function on y. Put in a bigger thing, namely y, get a bigger outcome, namely f of y. The cool thing about this is you can think of this as a streaming environment, start by computing at x, all of its outputs are correct. Then if you grow to include the stuff that’s in y that wasn’t in x, you can compute that, and all the stuff that was from x will still be correct.
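
Written in notation, with set containment as the ordering (the common case in this line of work), the definition sketched above is:

f \text{ is monotone} \;\iff\; \forall x, y:\ x \subseteq y \implies f(x) \subseteq f(y)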

With monotonic programs, the early results are guaranteed to be in the final results, so you can stream output to the user without regret. This should give you a hint at why this is coordination free. Every computer in the system, the minute it knows something is true, is guaranteed that thing will be true in the final outcome, and so it can emit the things it knows true right away without asking anybody, no coordination. Which leads to the hardest question actually, turns out, how do we define what coordination is? That’s where Ameloot’s proof is really brilliant. He shows that the way to think about coordination is, there are the messages that we have to wait for, even if it turns out we have all the data. Suppose I know everything about the answer, and the only thing I don’t know is that no one else knows anything about the answer. Then I’m going to run around to everybody saying, do you know anything else I should know? Only after everybody else have no idea what you’re talking about, then I can output things. That work I was doing with everybody, that counts as coordination. The actual work to compute the answer is not coordination, the work to check that nobody knows anything that I need to know, that’s coordination.

Easy and Hard Questions

Again, the CALM theorem says that the monotonic programs are exactly the set of programs that can be computed coordination free with a consistent outcome. Which leads to some easy and hard questions that we can ask of you. The first question is, are any of you over 18? If any one of you sent me a text message or an email or whatever that said, I’m over 18. Then I’d have my answer and I’d be done. What about if I asked the question, who is the youngest person to watch this video? That one, first of all, suppose you are the youngest person who watches the video, the only way you’ll know that you’re the person who should send me a text is by asking everybody else, are you older than me? Worse than that, suppose somebody younger than you comes along tomorrow and watches the video, then you’re not the right answer after all. You shouldn’t have said you were the right answer. It’s non-monotonic. The more people who come along, the answer might change. We can’t stream outputs. We might have to retract outputs that we released early.

Let’s look at a rephrasing really of the right-hand question. All we’re going to do is we’re going to rephrase it to say, who is the person that nobody is younger than? I’ve done the thing where we do a double negation, where we do De Morgans Laws. Nobody is younger than, so let’s read it out. There exists an x such that there does not exist any y, where y is less than x. There is no person younger than this person. The reason I rewrote this is it’s really easy to look at this stuff in logic now and see what’s monotone and what’s not. The thing on the left there exists. The minute we see something that exists, we’re done. Anything more that comes along, only makes that thing more true. It was true, and it remains true no matter what happens. The thing on the right, it’s got that not sign in front of the exists. When you see not, it’s non-monotone, because we might start with there not existing a y. As we introduce more y’s, we might find one that does exist, and so we’ll start out by saying, Mary is the youngest person watching this video. There’s nobody younger than Mary until somebody younger than Mary comes along. That not exists is something we can’t test locally. We need to know a lot of everything to test not exists. That negation is the key to non-monotonicity. Very easy to distinguish monotone from non-monotone in these logic expressions.

The CALM Theorem

That gives you a flavor for the CALM theorem. Monotonicity is the bright line test for whether coordination is needed. You might be asking a pretty natural question at this point, which is, why do you need to know this? Very nice, it was good to go to a talk and learn a little computer science, but what's it for? I'm going to try to argue in four different ways that you should understand this, and you should ingest it and somehow use it in your work. First, we'll use it to look back at an idea in distributed computing that's familiar to some of you, the CAP theorem. Then we'll talk about what you can walk away with from this talk right now as a design pattern for thinking about how to build systems that go real fast and are monotone. Just over the horizon, stateful serverless computing: how can we have serverless functions, like we talked about at the beginning, that have state, that have data, and keep them correct? Then finally, we'll transition to the last part of the talk and talk about languages and compilers that can help all of us program in ways that take advantage of monotonicity when we can, and then try to move coordination, as James Hamilton suggested, into the corner, into the background.

A CALM Look Back at CAP

The CAP theorem was a conjecture by my colleague, Eric Brewer at Berkeley. It says that you only get two out of the following three things: consistent outcomes, availability of your system, that is that it's on and live and accepting updates, and partitioning in the network. Partitioning means that at least one computer can't talk to at least one other computer. A way to think about this is if you have a partition, if there's two computers that can't talk to each other, you have two choices: either you can turn off the service, and then it's not available, or you can let the service run with these two different computers that can't talk to each other, and you'll end up with a lack of consistency. You'll end up with split brain. You won't be able to put it back together when the network heals. That's the CAP theorem. For 15 years, people talked about the CAP theorem: it just doesn't seem quite right, I bet I can beat it. There was a proof of the CAP theorem that came out of MIT, by Gilbert and Lynch. An unspoken assumption of that proof, and in a lot of the discussion by CAP theorem advocates, is that the kind of consistency we mean is that low level I/O consistency that requires coordination. The definition of consistency is one that requires coordination in its definition.

As I said, that’s tired in today’s world. What the CALM theorem does, is it explains when and why we can beat the CAP theorem and get all three of consistency, availability, and partitioning. Remember that we can have consistency that’s coordination free in the green circle. In the green circle, we can have split brain for a while, while things are partitioned. When they come back together, because they’re monotone, they’ll just exchange messages and start outputting more good stuff. We can put the Humpty Dumpty back together, in a coordination-free environment. Coordination-free consistency is possible, even if we’re available under partitioning. What are the coordination-free consistency programs? The monotone programs, that’s the CALM theorem. CALM is explaining where the happy case is. It’s the coordination-free programs. We can get all three of the CAP things, and one of the sad cases. The CAP theorem is really a theorem about that outer red circle, or about a worst-case assumption. This definition of coordination is at the heart of the technical differences in the formal proofs of these theorems. What I can tell you is that while the theoreticians chose their formalisms, Eric and I are on the same page about how this relates to what you and I might want to build in the field. Eric and I are pretty connected to practical use cases, both of us involved in the industry.

Design Patterns from CALM

What I want to talk about now are things you can build in languages like C++ and Java that would benefit from the design patterns of the CALM theorem. What we did in my group is we applied it to the context of traditional software. We picked a little petri dish of a problem, which is a key-value store. We built a key-value store called Anna. I should mention, on the upper right, this is Anna's hummingbird. It's a kind of hummingbird that was clocked as being the fastest animal for its size on the planet. The research was done at UC Berkeley by colleagues, so we like the name. The idea with Anna is that it's very lightweight, very fast, just as fast as it needs to be for its size, because it's very serverless and autoscaling. The key things with Anna are twofold. First of all, it does give you many kinds of consistency guarantees that are coordination free, so things in the green circle. It'll give you causal consistency if you know what that is. It'll give you read committed transactions if you're familiar with transaction levels. It does this with no coordination, which means it is always running full tilt parallel, and it's able to autoscale. Here's a couple of different papers on the Anna system that you can go chase down. Great work by Chenggang Wu, the lead author. He won the ACM SIGMOD Dissertation Award for this work, so a landmark in database systems.

CALM Autoscaling: The Anna KVS

The key thing with Anna is not only is it fast, but it's autoscaling. We started out with the goal of overcoming conventional wisdom on scaling from the first decade of the cloud, in essence, or at least of network services. Jeff Dean, around 2009, after about a decade at Google, was giving talks saying that when you build a service, you should design for 10x growth, but plan to rewrite it before you get to 100x growth. That's what we've had to do at Google every time. Those were good lessons for Google back then, but it's a terrible idea today. Serverless computing is all about: my application might not be popular now, but maybe next month it'll take off like a hockey stick, and it'll be 10,000 times more popular than it is today. I as a programmer don't want to rewrite that application three times between today and tomorrow. I want to write it once. I want it to take up as much compute as it needs, up and down.

What does CALM tell us? What we're going to do in Anna is we're going to take the idea of coordination-freeness, and we're going to use it to enforce certain consistency levels. We're going to enforce levels like causal consistency. The way we're going to do it is we're going to use data structures that only grow. Data structures that are monotone, like sets that we union things into, or counters that only get bigger and bigger. These are what are called semilattices, or if you've heard of CRDTs, which I'll talk about later, these are like simple CRDTs, but we're composing them into hierarchies so you can have a set of counters of sets, and so on. When you put these things together, each one of them is super easy to reason about, but they're all data structures that just get bigger. We never remove from or lower them. We only make them bigger over time. In doing that, we have monotonicity. The data structures themselves are just C++ libraries. They're just sets and counters and things. Because we're using them only with methods that grow, like union and increment, then we're cool.
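As a rough illustration, here is a minimal sketch in Java (not the actual Anna libraries, which are C++) of what grow-only data structures look like: a set whose only operations are add and union, a counter whose merge is max, and a composition of the two whose merge is still monotone.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of grow-only (monotone) structures in the spirit of Anna's lattices.
class GrowOnlySet<T> {
    private final Set<T> items = new HashSet<>();
    void add(T item) { items.add(item); }                              // only grows
    void mergeIn(GrowOnlySet<T> other) { items.addAll(other.items); }  // union only grows
    Set<T> snapshot() { return Set.copyOf(items); }
}

class GrowOnlyCounter {
    private long value = 0;
    void increment() { value++; }                                      // only grows
    void mergeIn(GrowOnlyCounter other) { value = Math.max(value, other.value); } // max is a lattice join
    long get() { return value; }
}

// Composition: a map of grow-only counters, merged key by key, is still monotone.
class CounterMap {
    private final Map<String, GrowOnlyCounter> counters = new HashMap<>();
    void increment(String key) {
        counters.computeIfAbsent(key, k -> new GrowOnlyCounter()).increment();
    }
    void mergeIn(CounterMap other) {
        other.counters.forEach((key, counter) ->
            counters.computeIfAbsent(key, k -> new GrowOnlyCounter()).mergeIn(counter));
    }
}

Because every method only makes the state bigger, replicas can apply updates in any order and merge whenever they happen to exchange messages; no replica ever has to wait for another before accepting a write.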

What this is going to allow us to do is scale across threads without coordination. Here we have three objects in the Anna key-value store and one operating thread. We can scale it up, replicate the objects in certain ways. In this case, there’s two copies of every object, but each node has a different pair. This is good for fault tolerance. Any node can go down, and we still have all three objects. We can also scale up and down a memory hierarchy. Here, what we’re doing is we’re putting cold objects like blue, on disk drives. There’s no point allocating RAM in the cloud for those objects if no one’s accessing them right now. Hot objects like green and red will promote up into the memory tier where things are fast access. We can choose to have things that need to go low latency and fast in memory. Things that are not used very often and cold can be far away. This is the control we want over state. We want state to be close to us, if we need it to be close to us, and far away if it doesn’t need to be close to us. We also have the ability to scale back down when the system changes usage, and we have fewer users. Anna can do all this really easily.

CALM Performance

The performance we get out of Anna is phenomenal. What you're seeing up here: the Anna architecture essentially shares no memory. Even individual threads just talk to each other with messages, and computers talk to each other with messages. What we're seeing is as we add threads to this thing, which are the individual ticks on the x-axis, it scales linearly in this very smooth way. When we saturate a machine with all its threads, so we have a 32-core machine here, we just go to the next machine, and it just keeps scaling, with almost no interruption in the smoothness of that. If you know anything about building parallel and distributed systems, you know that getting smooth scaling like this is extremely difficult. When you're doing coordination-free computing, it's often very easy. Anna has this beautiful scaling phenomenon. More impressive, though, is that it is crazy fast even under contention. That's the case where there are a few objects that are very hot, everybody wants to read them and write them, and lots of objects that are pretty cold. In most systems, those hot objects determine your throughput. In Anna, you can have lots of copies of those hot objects, lots of people can be updating them, and it'll just keep going at maximum performance. What that gives you is ridiculous performance improvements over the competition.

Masstree was the fastest in-memory multi-threaded key-value store we could find at the time from research; it came out of Harvard. Anna was 700 times faster in these experiments with contention. It was 10 times faster than Cassandra in a geo-distributed deployment with this contention. It was 350 times the cost-performance of DynamoDB: 350 times faster for the same price. The reason for this is very clear, it's CALM and coordination free. That means that as we're inserting things into this key-value store and reading them, we never have to do any waiting. We have no atomic instructions, no locks, no Paxos, no waiting ever. What that causes in performance, when you do your performance breakdowns, is that the competition, on the lower lines of the chart on the right, TBB, which is Intel's Threading Building Blocks library, its fastest hash table library, spends 95% of its time in this workload retrying atomic instructions. These aren't locks, so it's lock-free TBB. It is atomic, so it needs to successfully update the thing before somebody else peeks at it. It fails, and so 95% of its time, over and over, it's trying to do atomics. Atomics are coordination. They're just not locks, they're a different kind of coordination. Anna, on the other hand, spends 90-plus percent of its time just doing PUTs and GETs. This is what's called goodput: Anna is doing only good PUTs. Ninety-plus percent of its time is spent doing good stuff. Anna is giving you consistency; this is not just an anything-goes key-value store, it's guaranteeing you things like causal consistency, or read committed.
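To see why contended atomics hurt, here is a tiny hypothetical sketch in Java, not taken from the Anna paper: the first version makes every thread hammer one shared atomic counter, a hot key, while the second lets each thread grow its own counter and merges the per-thread counts at read time, so writers never wait on each other.

import java.util.concurrent.atomic.AtomicLong;

// Contrast a contended shared counter (atomic instructions, a form of
// coordination) with per-thread monotone counters merged at read time.
public class HotKeyDemo {
    static final int THREADS = 8;
    static final int INCREMENTS = 1_000_000;

    public static void main(String[] args) throws InterruptedException {
        // Contended: every thread updates one hot AtomicLong; under contention,
        // the time goes into atomic instructions fighting over that one location.
        AtomicLong hot = new AtomicLong();
        runAll(id -> { for (int i = 0; i < INCREMENTS; i++) hot.incrementAndGet(); });

        // Coordination-free: each thread only grows its own slot; the reader
        // merges (sums) the slots afterwards.
        long[] perThread = new long[THREADS];
        runAll(id -> { for (int i = 0; i < INCREMENTS; i++) perThread[id]++; });
        long merged = 0;
        for (long v : perThread) merged += v;

        System.out.println("contended total = " + hot.get() + ", merged total = " + merged);
    }

    interface Body { void run(int threadId); }

    static void runAll(Body body) throws InterruptedException {
        Thread[] threads = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final int id = t;
            threads[t] = new Thread(() -> body.run(id));
            threads[t].start();
        }
        for (Thread t : threads) t.join();
    }
}

This is obviously a toy, but the breakdown it illustrates is the one in the talk: the contended version spends its time on atomic operations against one location, while the merged version spends its time doing useful work.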

Lessons from Staying Monotonic

Some lessons from staying monotonic. If you can figure out how to have your data structures, and what you do with them, only get bigger over time, monotonicity, then you can update those things at any time, in any place, and have copies at will. The system will have maximum goodput even under contention, and you have the ability to be really profligate about your replication, so you can do things like replicate horizontally to your peers for fault tolerance, or for load balancing, or for latency. You can have copies of the data close to your users geographically. You can also replicate vertically to faster caches. If you want data in your serverless functions, for example, you want them to be stateful. Or you can move the data to slow storage if it's cold. You pay more dollars only for the things that are hot. That's how you beat the heck out of something like DynamoDB.

Stateful Serverless Computing

We talked about what we can do, now let's talk a little bit about what's just over the horizon. A couple of years ago at Berkeley, we already prototyped the idea of Functions as a Service, but with state. The trick was to use the ideas of Anna in our Functions as a Service environment, to do updates locally and remember them, as long as the data types are monotone. One of the things that was really interesting about Ameloot's CALM theorem papers is he showed a third equivalence. I said consistency is logical monotonicity, CALM. He pointed out that the monotonic programs and the coordination-free programs are the same, and they're also the same programs that can be computed by oblivious actors. What's an oblivious actor? It's an actor that doesn't know who all the other actors in the system are. It doesn't even know its own identity. It's just one of an army of clones, not knowing who's in the army, that just computes when it gets messages. It sends messages, gets messages, but it doesn't really know who's around. It's just doing its thing. Obliviousness is just the property we want for autoscaling. It means that if I don't need to know the population, I can add nodes at will, I can remove nodes at will, and the individual nodes just keep doing their thing. The programs you can execute in that environment are the monotonic, coordination-free programs. It's all the same. It is possible, and we've shown it in Cloudburst, to build a Functions as a Service environment that's oblivious, so the functions don't know who all is running. It's stateful and communicating, so it's removing those restrictions that we worried about early in the talk with serverless systems. It's got the freedom to scale up: we can add another node and start copying state to it at any time. Scale down: we can stop taking updates at a node, copy state elsewhere, and decommission it.

We’re using this architecture that Chenggang and Vikram, the leads on this work, have gone off and started a company called Aqueduct, which is doing serverless model serving, so take machine learning models. They need to do inference or prediction at scale, and you want those things to scale up and scale down with use. That’s what Aqueduct is all about. It’s an open source library. I encourage you to go check it out if you do machine learning or MLOps. What I was going to close with those, that the shortcuts for Functions as a Service, the statelessness and the lack of communication aren’t necessary if you commit to monotonic code. Monotonicity is the key property that was missing.

What More Could We Want?

So much for the CALM theorem. Now I want to talk about Hydro and what we're doing to make it easier to program the cloud. First, let's ask the question, what more could we want? I told you all these great things about monotone programming, but what I didn't focus on is the idea that non-monotone stuff happens. James Hamilton acknowledges that sometimes, in the corner of the system, you need to use coordination. We really do want to have a programming environment for general-purpose computing that has a mix of monotone and non-monotone stuff. We might want strong consistency. We might want transactions, sometimes. I/O guarantees, maybe we want that; we should be able to have it. Simple thing: I might want to know when my monotone program terminates. Let's go back to my question of, is anyone here over the age of 18 watching this talk? I'm sitting here recording the talk and I don't know the answer yet. No one's texted me. I don't know if the answer is no, nobody over the age of 18 is watching this talk, this talk is strictly for kids, or just, I have to keep waiting. That idea of termination detection, even for monotone programs that haven't finished, requires coordination. I could keep waiting, and somebody might come along and say I'm over 18, and that's good, then I'm done. Until that happens, I just have to wait. I might like to know, is there really nobody over 18? That's a non-monotone question. For all the people in the world who are going to watch this, are they under 18? That's a non-monotone question. Termination detection requires coordination. Then there are some algorithms that computational complexity just tells us require coordination. The monotone programs don't cover all of polynomial time, which is most of what we do in algorithms. They cover a large fragment of it, but there are algorithms you might want to compute that require coordination, or that use coordination to get efficiency gains, so I want that option.

The second thing I want is a language, a compiler, a debugger that's going to help me with these problems. It's going to address the real concerns of distributed computing. What are the concerns of distributed computing? Here's one: is my program consistent, even though maybe I'm not sure it's monotone? Can I take the state in my program and partition it across lots of machines by sharding it? Some of the state's on one machine, some of the state's on a second machine, and so on. If I do that, will my program still be correct? What about if I replicate the state, particularly if it's non-monotone, is it going to work out or not work out? Do I need to use coordination? Do I not need to use coordination? Where in the program should I put the coordination? What about failures? What failures can my program tolerate? What if a machine goes down? What if a network message gets dropped? How many failures can I tolerate before the system goes offline, or starts giving wrong answers? Then, what data is moving around in the system, and where? Who can see it? Is that ok? What if I wanted to control it, how would I control where the data goes? Take any one of these questions and go to your favorite traditional compiler, like the LLVM stack that we all use for compiled programs these days, and they'll look at those questions and pretty much shrug their shoulders. Modern compilers were built to handle the problems you get on your PC, the problems you get on your phone. They were not built to answer the programming problems that arise in the cloud. I want a language, a compiler, a debugger that can answer these kinds of questions. Then, of course, locally on the individual machines we'll call LLVM to compile the local code. That's fine. It does a lot of good things. Hydro is going to try to address these kinds of challenges.

Inspiration is a Query Away: SQL

The inspiration for this is something we're all familiar with. It's just a query away. It's SQL. SQL, the database query language, is the single biggest success story in auto-parallelized programming. Take an SQL query, run it on your laptop, take it to the cloud, run it on a million machines. It's that simple. Not only that, we've been doing this since the 1980s. There's a project at Wisconsin called Gamma, led by Dave DeWitt, who should win a Turing Award for this work, that showed that you could take SQL queries and parallelize them like crazy. He proved it by building it on a very early parallel computer that was incredibly hard to work with. Around the same time, a company called Teradata started to do the same thing. Teradata is still around, they're still cooking. They can run stuff really fast across lots of machines. This has been known since the '80s. A lot of it was reinvented in the big data craze in the 2000s. Honestly, most of the innovation around SQL has been there since the beginning.

Relational queries scale like crazy. Serializable transactions do not scale like crazy. As we take lessons away from databases, we should focus on the query languages, the query processors, the query optimizers, and not worry about these low-level I/O things, like transactions. That's a separate concern that we talk about separately.

I want to highlight a thing that goes back to the dawn of relational databases, and Ted Codd, who also won the Turing Award for this. The idea was that data moves around on disks over time, and queries stay the same. We need to build database systems, namely relational systems, that hide how data is laid out and hide how queries are executed. All you do is say your query, and the system is in charge of worrying about how the data is laid out and therefore how to run the query. The cloud was invented to hide how computing resources are laid out, and how general-purpose computations are executed. The cloud in a lot of ways is in the state that we were in with database systems in, like, 1969. It's waiting for somebody to come by and invent a programming model that empowers us to hide this stuff. There's lots to learn from database systems.

Our First Approach: Relational, Data-Centric Programming

The first approach we took in my group to solving this problem was a very relational, data-centric programming language that we called Bloom. Bloom is still available. You can play with it. It’s an embedded language inside of Ruby, actually, about 10 years old now. We took a theoretical language for relational queries called Datalog, we extended it to deal with time and space, so distributed systems problems. We call that Dedalus. Then, Bloom was the practical programming language we were showing to developers that took Dedalus into something that was programmable. We demonstrated many of the benefits that I’m alluding to in prototypes of the Bloom language that weren’t particularly fast implementations and they weren’t particularly easy to learn. There they are, and we wrote papers about them. Folks who are really interested in this area, I encourage you to have a look. What we had built in the Bloom era was a walled garden. We really didn’t focus on making this easy for people to learn, incrementally adoptable from existing languages. We didn’t worry about how it integrated with other languages, and we didn’t really worry about whether it was that fast, just that it was coordination-free and scalable.

Then the other thing I’ll point out about Bloom that we started fixing, and we’re continuing to fix in new projects, is that a lot of common constructs we want in computing are pretty clumsy to do with SQL or another relational language. The classic one is counters, which are monotonic and just be going up. They’re super commonly used in distributed systems, and we use them in a lot of our coordination-free tricks. They’re very hard to do in SQL or a relational language like Datalog. Another thing that’s hard to do with those languages is data that really wants to be ordered. I have a list of things, it’s ordered, and I want to preserve its order. That’s very clumsy to do in relational languages. It can be done, but it’s ugly. The programming model is not always friendly to the programmer in this environment.

Competing Approach: CRDTs

There was a competing approach around the same time called CRDTs. These are an object-oriented idea. The idea is that you have objects that have one method, the method is called merge. It’s an object class with a merge function that takes two CRDTs and merges them to make one big CRDT. Things merge together like amoebas, so these objects can get bigger. The only requirement is that your merge function can be anything you like, as long as it obeys the following rules. It is associative, which means that you can batch these merges up into batches of any size, and it still gives you the same answer. It’s commutative, so the order in which you merge things doesn’t matter to the outcome. It’s idempotent, so if you merge the same object in twice, it doesn’t change the outcome. In mathematics, there’s a name for a mathematical object that is a type with such a merge function that’s associative, commutative, and idempotent, it’s called a semilattice. This is an object-oriented interface to semilattices. Because we’re computer geeks, we give it a new acronym, we call it CRDTs.
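As a minimal sketch of what that interface looks like, here is a hypothetical grow-only set CRDT in Java, not taken from any particular CRDT library: the only combining operation is a merge, and the merge is set union, which is associative, commutative, and idempotent, that is, a semilattice join.

import java.util.HashSet;
import java.util.Set;

// Sketch of the CRDT idea: an object whose only combining operation is a
// merge that is associative, commutative, and idempotent (a semilattice join).
interface Crdt<T extends Crdt<T>> {
    T merge(T other);
}

// Simplest example: a grow-only set, whose merge is set union.
final class GSet<E> implements Crdt<GSet<E>> {
    private final Set<E> elements;

    GSet(Set<E> elements) { this.elements = Set.copyOf(elements); }

    @Override
    public GSet<E> merge(GSet<E> other) {
        Set<E> union = new HashSet<>(elements);
        union.addAll(other.elements);   // union is associative, commutative, idempotent
        return new GSet<>(union);
    }

    Set<E> elements() { return elements; }
}

Merging replicas in any order, in any grouping, any number of times gives the same result, which is what makes replication safe. The catch, as the next paragraph points out, is that reading elements() at an arbitrary moment tells you nothing about what you will see later.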

Unfortunately, CRDTs are broken. I like to call them the Hotel California of distributed state, because they only have a merge method. You can merge state anytime you like, but you can never read safely. If you observe the internal state of a CRDT, you freeze it in time, you say what's in there, you open the package and you try to read from it. That breaks the guarantee of eventual consistency that they're giving you. Depending on when you read, what you see may be non-monotone and change over time. CRDTs are broken. They sound like they're promising you correctness. They promise you correctness if you never look at your data. They don't promise you correctness if you do look at your data. Still, in the spirit of Anna, they're inspiring. They allow us to start thinking about data structures that only get bigger. CRDTs are data structures that only get bigger in a formal sense. They're getting a little bit of adoption. People are starting to pay attention to CRDTs. They're even beginning to have a little impact in some commercial products. Of course, they're being used by developers in ways that hopefully avoid the bad cases that they don't guarantee to handle. I view them the way I view serverless functions or NoSQL: they're flawed, but they're a leading indicator that people want something and that we could build something even better.

Another Influence: Lessons from Compiler Infrastructure

I talked about relational languages. I talked about CRDTs. Another big influence on the work we're doing is just compiler infrastructure the way it's done today, which is basically to have stacks of languages. The canonical example is LLVM, which is a very influential piece of software. The key idea in LLVM, and there are many ideas, but the one whose architecture is most important, is that it has its own language in the middle, an intermediate representation, an IR as it's called in the programming languages community. You can have many languages at the front of LLVM, like C and FORTRAN and Haskell in this picture, but since then Rust and Swift and a bunch more. First, you translate those to LLVM's internal IR, which is a language. You could actually type it. Then the LLVM optimizer works on that IR to generate an appropriate machine language for the backend, which could be one of many machine languages. It's really three languages. There's the input on the left, there's the IR in the middle, and there's the machine code on the right, all of which, in principle, you can type in text. If you're a performance-sensitive person, you can jump in at these different layers and hack. There are people who change their machine code, not recommended. There are people who will change the LLVM IR that's generated from the frontend, which might sometimes be worth it.

Hydro’s Inspiration

Hydro is a new programming language for the cloud. It’s inspired by logic languages like SQL, because they give us the analytic power to look for correctness and optimization. If you remember back to our example with the questions, the exists and not exists questions, the for alls, those are logic languages. You saw those expressions where we could just look for the not side and be able to tell that something was non-monotone. Logic languages are like that, SQL is like that. Inspired by lattices, because really what lattices are, is they’re generalizations of what’s good about relations to other data types. They allow us to have Grow-Only things like sets of tuples, which is what relations are. They don’t have to be sets or tuples, they could be things like counters, they could be things like maps of other kinds of lattices. Lattices are composable, and they’re quite general. It generalizes what we already knew about from database systems, if we use them in sophisticated ways. We’re inspired by functional dataflow, things like MapReduce, Spark, and Pandas. Actually, if you crack open any SQL database, there’s a functional data flow runtime inside of it, so it’s no surprise that MapReduce, Spark, and Pandas look a lot like SQL systems, or even support SQL as Spark does, because really, they’re just the inside of a database like Gamma. Then finally, compiler infrastructure stacks, as we talked about, things like LLVM. We have these multiple languages targeting both generalists and experts. We have a vision paper that we wrote last year, “New Directions in Cloud Programming,” that lays out our agenda for what we want to do and why we think we can do it, given what’s known in the field as of a year ago.

Hydro: A Programming Stack for the Cloud

This is Hydro in a block diagram. It's a programming stack for the cloud. At the top, many languages that you can program in, much like the left-hand side of the LLVM picture. In the middle, an IR that's declarative and hides the resources and the execution that's in the cloud; very much like an SQL-style language, it just says what you want, not how to do it. Our challenge is to compile into a declarative IR. Then we have a compiler that translates that into a Dataflow program, and, in particular, into a Dataflow program of lattices. In essence, we're taking the idea from CRDTs, but we're turning it into a composable, rich, low-level programming model that can build extensions to the ideas of things like Spark. Then that Hydroflow program actually generates Rust code. The Rust code goes into LLVM, which turns it into executables. Then the challenge is to deploy them and scale them up and down. That's the Hydro stack.

Initial Wins from Hydro

What I’ll tell you is we’re starting to get some early wins in the stack after a year of work. The first one that’s come out as a paper is surprising to me, from the team, because it’s the one I thought was the biggest science fiction. It’s the idea of taking a traditional sequential program written in a language like C or C++, and translating it automatically into a language like HydroLogic. In particular, we simply translated it into the CRDT. All we’re doing is we’re taking sequential data structures, and converting them into semilattices. This uses techniques called verified lifting that my colleague at Berkeley, Alvin Cheung invented. The lead student on this, Shadaj Laddad, is just a rockstar, amazing guy. He knocked this out so quickly, it was really impressive. I’m quite optimistic now that getting things into a IR like HydroLogic is going to be within our grasp in the next year or two.

Another piece of the puzzle, compiler optimizations for distributed programs. My student David Chu won an award at the ACM SOSP competition last year for Best Student Research Pitch, when he pitched this problem. The idea is to take techniques like CALM analysis, and apply them to protocols in distributed systems like Paxos. Actually, what you find is even though Paxos is a coordination protocol, inside of it, there’s monotone components that can be scaled up. Even inside the classic protocol for coordination, there’s things we can do underneath to scale it. The theme here is that, sure, not all programs are monotone, but most programs are mostly monotone, even programs like Paxos. The challenge here is not only to replicate with CALM, but also to shard the program intelligently. He’s using ideas essentially from SQL to do sharding. Then the goal here is to get compiler guarantees of correctness.

Then, finally, at the bottom of the system is this high-performance kernel for distributed execution, local data flow on local machines with network ports at the sides that communicate with low latency. In essence, it’s distributed flows of lattices. Then Rust is being used not only to generate really fast code, because it’s an LLVM language, but also to do the type checking for the distributed properties that we talked about in the previous slide. We’re beginning to make progress on the type system part. The Hydroflow runtime, however, is quite mature. I encourage you to have a look at it. It’s got a low-level language that can be used to program it in terms of data flows. We’ve got examples of programs that are written in Hydroflow today.

Hydroflow Example

Here’s a little example of a program that I wrote. It’s a program to build a chat server. This is the entire code in Hydroflow for that server, and the first six lines are really just about setting up the network ports. All the logic of the program is in the last four lines. This model of simple data flows or simple lattice flows at the bottom of the system is actually very expressive and compact. It’s really cool that you can get programs like this to be so small.

Key Takeaways

Embrace state in serverless computing, don't settle for statelessness. Avoid coordination. You can do both, even though currently the vendors won't let you. We can embrace state and avoid coordination in the monotone case. Move to the 21st century: when we talk about consistency, let's stop talking about reads and writes or stores and accesses, let's talk about the consistency of our application outputs. Then, the centerpiece of this whole thing, foundationally, is the CALM theorem, which says: if your program is monotone, then there is an implementation that's coordination free and is consistent; if your program is not monotone, no such implementation can exist. Finally, I showed you the example of how you can build a system where the data structures only grow, the monotone data structures, and that system will be coordination free, and it'll just go like crazy. This is the Anna programming model, and the way that it was architected, I encourage you to look at that paper. The architecture is super simple. The thing just scales like crazy. It's super-fast. Finally, the blue lollipop is the one that's not quite ready yet, but I can see it over the horizon: a new language stack for the cloud. The Hydro project at Berkeley is exploring this, and based on years of prior research, I expect that we'll have really usable results coming up in the next few years. Programming the cloud, from my perspective, is a grand challenge for computing. I hope more folks get involved, kicking the tires, generating ideas, building competitors. Let's go solve this problem.

Resources

For more information, you can look at the Hydro website, hydro.run. All our code is on GitHub at hydro-project, particularly the Hydroflow kernel. It's quite mature at this point. It will probably be released as v0.1 towards the end of 2022. The Sky Lab at Berkeley is at sky.cs.berkeley.edu, where you can learn more about it. The Aqueduct serverless scalable machine learning platform is at aqueducthq.com. If you have to pick three papers, you want to read the serverless paper, “One Step Forward, Two Steps Back,” the explainer paper in CACM on the CALM theorem, six pages long, and the vision paper for the Hydro project. Those are the three you might want to pay particular attention to.



Podcast: Creating a Developer-Centric Culture and Building Platform as Runtime

MMS Founder
MMS Aviran Mordo

Article originally posted on InfoQ. Visit InfoQ

Transcript

Shane Hastie: Good day, folks. This is Shane Hastie for the InfoQ Engineering Culture podcast. Today I’m sitting down with Aviran Mordo, who is the VP of Engineering of Wix. Aviran, welcome. Thanks for taking the time to talk to us today.

Aviran Mordo: Hello and good evening.

Shane Hastie: Thank you, and good morning to you. I'm in New Zealand, Aviran is in Israel, so through the miracles of modern technology we are able to communicate in real time. Aviran, the place I'd like to start with our guests is: who's Aviran?

Aviran Mordo: I’m VP of Engineering for Wix I’ve been at the company for 12 years. Before that, I worked at a lot of big companies like Lockheed Martin. I had my own startups. So I’ve been around for quite a while. I’ve been a very vocal advocate of the DevOps culture at Wix. We call it the developer-centric culture where we put the developers in the middle and then the organization helps developers to ship software faster, and that’s our goal, to try to figure out ways of how to ship software faster.

Shane Hastie: So what does a developer-centric culture look like and feel like?

In a developer-centric culture, the whole organisation is optimised to help ship software faster [01:24]

Aviran Mordo: If we think about the development lifecycle like an assembly line for cars, for instance, the developers are the assembly line, because they build the product. So the software is the product, and they know the product the best. They know it better than QA and testing, they know it better than the product managers that try to define it, because they actually coded the actual product. So the whole developer-centric idea is that the entire organization is a support circle, with specialized professions that support developers and help developers basically ship software faster.

Shane Hastie: And for the individual developer, how do we prevent this from feeling just like, “Work harder?”

Aviran Mordo: Because it’s not work harder, it’s work smarter. You do more things in the same amount of time that would’ve taken you before this whole continuous delivery movement or finding ways to do smarter things. If we’re talking about platform engineering or low code development where you codified a lot of the things ahead and you basically have to write less code and achieve much more in a lower amount of time with fewer people, which we know developers are very scarce and hard to recruit. So if you can do more, you just basically need less developers.

Shane Hastie: So one of the things you mentioned before we started recording was the concept of platform as runtime. What do you mean by that?

Platform as runtime [02:52]

Aviran Mordo: Platform as a runtime, this is a new concept that we have been working on internally at Wix. It relates to platform engineering. However, a lot of the talks about platform engineering, and a lot of the companies talking about platform engineering, are about CI, Kubernetes, all those areas of the platform. Wix is basically past that, because we built this platform many years ago, and we took platform engineering to the next level, to codify basically a lot of the concerns that developers have in their day-to-day life.

But if we’ll take our own areas of business, which is basically business models, we build com platforms, we build blogs, we build events platforms or booking platforms. So those are all business applications. And for all business applications, you have basically the same concepts or the same concerns that you need to do, like throw domain events, model your entities in a way that is multi-tenant, you know, GDPR concerns, GRPC communications and how all those things are working together.

So usually, like most companies, we build our own frameworks and libraries, the common libraries that developers just build into their microservice deployables. And that creates a problem, because while we build those libraries, we also build tons of documentation that developers need to understand in order to use those libraries. Okay, for GDPR, you have to be an expert in privacy. And how does the A/B testing system work? Each company has their own A/B testing or feature flag systems. And how do you communicate via gRPC? What are the headers that you need?

So there's tons of documentation that developers need to learn and understand. And what we did is we codified a lot of these concepts into our platform. So basically we looked at the number of lines of code that developers have to write, and by analyzing them, we saw that 80% of the lines of code are basically wiring stuff and configuring things, and not actually writing the business logic that you have to write, the business value. That is what we get paid for, to bring business value to the business, not to wire things.

And that’s 80% of our work is wiring stuff. So what we did is we coded it into a very robust framework or platform. We took this framework and instead of having developers to compile this loaded framework, which is basically 80% of your deployable is the framework. And we build a serverless cloud, our own serverless infrastructure, and we put the framework on the serverless cloud. And now developers, they don’t need to compile this code themselves, they just build against an interface of some sort, and they deploy their deployable into the cloud where they get the framework.

Benefits from platform as runtime [05:58]

Aviran Mordo: So the platform is basically their runtime. It runs their business logic for them. What that also gives us is the ability to have a choke point and to update. If you think about it, Wix has about 3000 unique microservice clusters. Think about the Log4j security vulnerability that was discovered a few months ago now. What we had to do, before we had the platform as a runtime, was ask all the teams at Wix to rebuild 3000 microservices: “Okay, we updated the framework, we updated the dependency. Now everybody has to rebuild, recompile and redeploy everything into the cloud.”

And that takes a lot of time. And of course, a lot of things get missed on the way. So by taking the platform, the framework, out of the microservice itself and putting it outside the microservices as a runtime, we can control this whole set of common libraries that appears everywhere in all the microservices, and control it from a central place, the runtime. And that gives us a huge velocity gain, because we codified all this logic. So now developers don't need to read dozens of documents just to understand how things work. Things just work for them.

And during this process, we were able to eliminate between 60 to 80% of the lines of code that developers have to write to achieve the same goal, and reduce the time for development tremendously. It used to take us like two to three weeks for a new microservice that doesn't really do much, basically basic CRUD operations, because you really have to work hard to do all these wiring things. Now it's a matter of hours. In a matter of two or three hours, developers can have a new microservice running on the cloud with all the legal requirements, the business requirements, the infrastructure and ecosystem requirements that are out there.

Shane Hastie: Who are the team that maintain this platform?

Maintaining the platform [07:59]

Aviran Mordo: So we have an infrastructure team, and they work just like any other product team. We look at our platform as a product, and we have technical product managers that go into the different teams at the company. They sit with them, they analyze their code, they build code with them, and try to extract and find commonalities between different business units. They say, “Okay, this is something that is common to a lot of developers, so it's common across Wix's business areas.” We extract those things, and we create a team of people from different parts of the company. We try to understand, “Okay, what is the common code?” We put that into the platform and have that as a product. Before we did this mind shift, infrastructure teams tended to build things that are cool, and say, “Okay, this is what developers should know and understand.”

And then a lot of times the products that the infrastructure teams are building are not being used, or are being used in very small parts of the organization. So what we did in our platform engineering team is we put one main KPI for our infrastructure team: when you do something, you commit on adoption. You're not succeeding just by finishing this new shiny infrastructure. It's just like when you develop a business product, you want customers to come and buy your product. It's the same for infrastructure teams. We commit on adoption. Our success is how many developers in the organization actually use this new infrastructure. And if they don't use it, it means that we didn't succeed in building the right product for them.

Shane Hastie: How do you prevent this from becoming a single point of failure? How do you keep quality in this platform?

Maintaining quality in the platform [09:52]

Aviran Mordo: There is always a single point of failure in every system. At Wix, we put a lot of emphasis on quality, especially in the infrastructure team. We are doing the best practices of test-driven development and constantly monitoring things. But also, the single point of failure is not really single, because it is a distributed system. There are hundreds of servers running. So unless there is a bug, and then you have to fix it. But for that, you've got tests, you've got gradual rollout, you've got A/B testing, feature flags.

So all the best practices in the industry to prevent it. But if a service goes down, then that's just one cluster out of hundreds or thousands of clusters coming down. It happens rarely, but it does happen. But we look at this runtime as critical infrastructure, just like you would look at an API gateway: critical infrastructure that essentially is a single point of failure. So we look at it just as another layer, just like a load balancer or a gateway server. This is the platform, this is the runtime, it has to be at the highest quality and the best performance that we can get.

Shane Hastie: Another thing that you spoke about before we started recording was shifting left in the design space. What are you doing there? It sounds pretty interesting.

Shifting left in design [11:13]

Aviran Mordo: We talked about how we're constantly trying to find ways of improving our development velocity. So up until now we talked about the backend side. Let's talk about the frontend side. Since Wix is a website building platform, we employ the largest number of frontend developers in Israel and have the largest design studio in the country. So we have a lot of experience seeing how frontend developers and designers actually work together.

And what we saw that they don’t really work together, they work alongside each other. So in most cases, when you want to design a new web application or a website, you have the designer design in whatever design tool that they want via Photoshop or Figma or any design tool that they feel comfortable with. And then they hand over the designs to the developers and they try to copy to the best of their ability and their tools into HTML and CSS.

It’s a web browser. So it’s a two different tools that are not always compatible in their capabilities. And there is always this back and forth ping-pong between designers and developers. And then for every change, even during the life cycle of a product, for every change that marketing want to do or the business product want to do, they have to go back to the developers, “Hey, I need to move this button from here to there. Please change this color or we want to do a test.” Or something like that. And that is kind of wasting the time and talent of developers because if I need to move something on the screens, it’s something that you don’t really need a developer to do. So we want developers and designers to work on the same tools. So this is where the movement of flow code comes into place.

So Wix is one of the players in this area. We have our own product, it's called Editor X, where designers and developers can basically work on the same tools. While designers move components around and put them on the screen, developers go into the same IDE, it's basically a visual IDE, and codify their components. So instead of having developers … I wouldn't call it wasting their time, but investing their time in trying to move things around or doing the pixel perfect, messing around with CSS and browser compatibilities, developers and designers work together as one team.

So the designer becomes just another developer with their own responsibility, and developers codify the actual business logic that runs behind the component. So we created a holistic team that can actually work faster and better. And if product managers want to change something on the screen, they don't need to go to developers, which is basically the most expensive resource in the organization. They can have the designers or the UX experts just play around with the design and change the design without affecting the code, because you are working in the same environment.

Shane Hastie: So this is a tighter collaboration. These are often people with very different skill sets, but also perspectives. How do you bring them together into this one team? How do you create the one team culture?

Enabling one-team culture [14:50]

Aviran Mordo: So this is something that is ingrained in the way that Wix is working. Our organizational structure is basically defined as what we call companies and guilds. So companies, think about them as a group of people that are responsible for a business domain. So they actually build the product, and they have their own one, two, three, however many teams they need. So in this team, you have developers, you have QA, you have product managers, you also have UX experts and designers, and they're all part of the same team.

And then you have the guilds, the professional guilds, that are responsible for having the standards and the professional expertise and building the tools for each profession. So having them working on the same team, having the same business goals, creates a holistic team, instead of throwing the work over the fence. This is tied to the whole DevOps, continuous delivery culture: instead of throwing the responsibility around, “Okay, I'm done with my work now. Okay, the product manager is done with the work, now go developers, develop. Now, QA, test it.” In our system, we create this whole developer-centric culture where it's one team with subject experts, all working together as a team to basically help the developers ship the product. So designers just become an integral part of the development team.

Shane Hastie: That structure makes sense with the companies and the guilds. What you're calling the company there, I would possibly have called a value stream. How are these companies incentivized?

Incentivizing internal companies [16:33]

Aviran Mordo: Well, they have their own business KPIs, and it's not a P&L, especially at Wix, because we have companies whose product earns money. For instance, the e-commerce platform, this is a product that makes money. You have people going and buying and stuff. But we have another company that's, for instance, the blog company. The blog is an add-on. It's a necessity for the whole Wix ecosystem, because most websites need a blog. But the blog is a free product. So each company has their own KPI. The blog KPI might be adoption or the number of customers; they're not paying customers.

But we see that there is adoption of the product, a lot of websites have installed blogs on their website. While, let's say, the e-commerce platform or the payments company, which is responsible for receiving and processing payments and so on, they have their own KPIs. Payments is not something that you sell, it's something that you process. So their payments KPI will be the number of transactions that we processed, or how many customers decided to install, for instance, Wix payments as opposed to just using PayPal or any alternative payment method. So they have their own KPIs, and the e-commerce platform has their own KPIs, right? The number of stores, or within the stores, the number of transactions or the GPV. Each company has their own metrics.

Shane Hastie: And how do you keep those aligned with the overall organizational goals?

Aviran Mordo: That comes back to our organizational structure. We specifically call them companies and not value streams, because we treat them as startups within the big organization. So each company has their own quote unquote CEO. We call them head of company, and they have their own engineering manager. And basically they get funded by Wix as a whole. And while they have their own KPIs, if you think about startups, startups have a board of directors that aligns them with strategic goals. So each company has what we call a chairman, which is basically the board of directors.

The chairman is one of the Wix’s senior management. And between the head of company and chairman, they can decide like 95 of the things they can decide together without involving all of Wix’s management. So the chairman, which is serves as the board of director, aligns the company with the needs of Wix and also considers the needs of this mini-company itself within Wix. So it’s just like any other startup.

Shane Hastie: Interesting structure and sounds like an interesting place to work. Thank you very much. Some great examples of tools, techniques to improve velocity, and I think there’s a lot that our audience will take away from this. If people want to continue the conversation, where do they find you?

Aviran Mordo: They can find me on LinkedIn mostly. I’m also on Twitter, but mostly on LinkedIn.

Shane Hastie: Wonderful. Thank you so much.

Aviran Mordo: Thank you very much. It’s been a pleasure.



Presentation: Log4Shell Response Patterns & Learnings From Them

MMS Founder
MMS Tapabrata Pal

Article originally posted on InfoQ. Visit InfoQ

Transcript

Pal: I’m Topo Pal. Currently, I’m with Fidelity Investments. I’m one of the vice presidents in enterprise architecture. I’m responsible for DevOps domain architecture and open source program office. Essentially, I’m a developer, open source contributor, and a DevOps enthusiast. Recently, I authored a book called, “Investments Unlimited.” It is a story about a bank undergoing DevOps transformation, and suddenly hit by a series of findings by the regulators. The book is published by IT Revolution and available on all the major bookselling sites and stores, so go check it out.

Interesting Numbers

Let’s look at some interesting numbers. By 2024, 63% of world’s running software will be open source software. In AI/ML, specifically, more than 70% software is open source. Top four OSS ecosystems have seen 73% increase in download. Python library downloads have increased by 90%. A typical microservice written in Java has more than 98% open source code. If you look at the numbers, they’re quite fascinating. They’re encouraging, they’re interesting, and scary, all at the same time. Interesting is just the numbers, 63%, 70%, these are all just really interesting numbers. Encouraging is big numbers, just seeing where open source has come to. The scary is, again, those big numbers from an enterprise risk standpoint, which basically means that most of the software that any enterprise operates and manages are out of their hand, out of their control. They’re actually written by people outside of their enterprises. However, about 90% of IT leaders see open source as more secure. There’s another argument against this, the bystander effect. Essentially, everyone thinks that someone else is making sure that it is secure. The perception is that it is more secure, but if everybody thinks like that, then it may not be the reality, in whatever it is, just the fact that 90% leaders trusting open source is just huge. Until this happens, everything goes in smoke when a little thing called Log4j brings down the whole modern digital infrastructure.

Log4j Timeline

Then, the next thing to look at is the timeline of Log4Shell. I took this screenshot from Aquasec's blog. This is one of the nicest timeline representations of Log4j. This is just for reference. Nothing was officially public until the 9th of December 2021. Then, over the next three weeks, there were three new releases and, correspondingly, new CVEs against those, and more. It was a complete mess. It continued to the end of December, and then bled into the next year, January, which is this year. Then February, and maybe it's still continuing. We'll see.

Stories About Log4j Response

Let’s start with the response patterns. I’ll call them stories. As part of writing this paper, we interviewed or surveyed a handful of enterprises across financial, retail, and healthcare companies of various sizes. These companies will remain anonymous. The survey is not scientific by any means; our goal was not to create a scientific survey. We just wanted to get a feel for what the industry and the people in it experienced during this fiasco and how they were impacted. I think most companies will fit into one of these stories. These stories are not individual enterprises that we picked from our sample; we collected all the stories and grouped them into three different buckets, and that’s how we are presenting them. Even though it may sound like I’m talking about one enterprise in each of these stories, I’m not. I also thought about naming the stories the good, the bad, and the ugly, but I don’t think that’s the right way to put it. We’ll do number one, number two, and number three, and as you will see, there is nothing bad or ugly about it. It’s just about these enterprises’ maturity. The stories are important here, because you’ll see why things are the way they are.

The question is, what did we actually look at when we built the stories and created these response patterns? These are the things we looked at. Number one, detection: when did someone in a particular enterprise find out about this, and then, what did they do? Number two is declaration: at what point is there a formal declaration across the enterprise saying that there is a problem? Typically, this comes from a cyber group or information security department, or from the CISO’s desk. Next is impact analysis and mitigation: how did these enterprises actually go about impact analysis and mitigation, mitigation as in stopping the bleed? Then the remediation cycles. We call them cycles intentionally, because there was not just one single remediation, there were multiple, since multiple versions came along in a period of roughly three weeks. Some of those cycles were also generated by the enterprise risk management processes, which go after this set of applications first, then the next set, then the third set. We’ll talk about those. The last one is the people. How did the people react? What did they feel? How did they collaborate? How did they communicate with each other? Overall, how did they behave during this period of chaos? We tried to study that. Based on these, we formed the three patterns, and I’m calling them three stories.

Detection

With that, let’s look at the first news of the Log4j vulnerability, the power of social media. It came out at around 6:48 p.m. on December 9, 2021. I still remember that time because I was actually reading it when it popped up on my Twitter feed, and I called our InfoSec people. That’s the start of the Log4j timeline. With that, let’s go to detection, on December 9, 2021. Note that InfoSec was not the first to know about it. Maybe they knew about it, but they were not the first. Right from the get-go, there’s a clear difference between the three with regard to just handling the tweet. Number one, cyber engineering: some engineers in the cyber security department noticed the tweet and immediately alerted their SCA, or software composition analysis, platform team. Then other engineers noticed it too in their internal Slack channel. The SCA platform team sent an inquiry email to the SCA vendor right then and there, that same evening.

Number two, similarly, some engineers noticed the tweet and sent the Twitter link to the team lead. Then the team lead notified the information security officer, and it stops there; in most cases we don’t know what happened after that. Number three, on the other hand, has no activities to report: nobody knows if anybody saw it or did anything. Nothing to track. Notice the difference between number one and number two. Both of them are reacting; their engineers are not waiting for direction. We don’t know what the ISO did in number two, actually. The team lead in the number two enterprise notified the ISO, and we don’t know what the ISO did after that. Let me now try to represent this progress in terms of status so that we can visualize it and understand what’s going on here. In this case, number one is on green status on day zero. Number two is also green. Number three is kind of red because we don’t know what happened. Nobody knows.

Declaration

With that, let’s go to the next stage, which is declaration, or in other words, when InfoSec or cyber called it a fire and said we have a problem. Number one: on the morning of December 10th, the SCA vendor confirms the zero day. At almost the same time, engineers start seeing alerts in their GitHub repositories. Then the cyber team declares a zero day, informs the senior leadership team, and the mass emails go out. This is all in the morning. I think they are on green on day one; they reacted in a swift manner that put them in the green status. Number two: the ISO who was informed about this the previous day notifies the internal red team. The red team discusses with some third-party vendors, and they confirm that it is a zero day. The CISO is informed by midday, and there is a senior leadership meeting in the late afternoon where they discuss the action plan. They’re on yellow because it is late in the day before the senior leadership team is actually discussing an action plan; they should have done it early in the morning, I think, as number one did. Number three, on the other hand: the red team receives the news in its daily security briefing feed in the morning. The Enterprise Risk Office is notified before noon. Then the risk team actually met with the cyber and various “DevOps” teams. The first thing they do is enter the risk in their risk management system. That’s why they’re yellow.

Impact Analysis and Mitigation

The impact analysis and mitigation process is about, first, trying to stop the bleed. Second, finding where the impact is: which application, which platform, which source code? Then, measuring the blast radius, finding out what the fix should be, and going about it. Number one: before noon on December 10th itself, right after they sent out that mass email, they actually implemented a new firewall rule to stop attackers from coming in. By noon, the impacted repositories were identified by the SCA platform team, so they had a full list of repositories that were impacted by this Log4j usage, or at least needed to be looked at. By afternoon, external facing applications started deploying the fix via CI/CD. Note that these are all applications that are developed internally, not vendor supplied or vendor hosted applications. Overall, I think number one did a good job. They responded really quickly, and on the same day they started deploying the changes to external facing applications. Number two, on the other hand, implemented the firewall rule at night. You may ask why they did that at night, or waited until night; I do not know the answer. Probably they’re not used to changing production during the day, or maybe they needed more time to test it out.

Number two, however, struggled to create a full impact list. They had to implement custom scripts to detect all Log4j library usage in their source control repositories. I think they are a little behind, because they actually took a couple of extra days to create the full impact list; they really did not have a good one to start with, so all those custom scripts had to be created. Number three, on the other hand, went to their CMDB system to create a list of Java applications. There are two assumptions in that. One is that they assume all Java applications are impacted, whether or not they used that particular version of Log4j. The second is that they assumed their CMDB system is accurate. The question is, how did the CMDB system know which applications are Java applications? Essentially, developers manually mark their CMDB entries as using the Java programming language. Based on that CMDB entry list, email was sent out to all the application owners, and then a project management structure was established to track the progress. Nothing is done yet. Nothing is changed. It’s just about how to manage the project and getting a list. I think they are red, on the fourth day.

Remediation Cycle

Let’s look at the start of the remediation cycle. Of course, there are multiple rounds of remediation. One thing I’ll call out is that all the enterprises we spoke to had the notion of internal facing versus external facing applications. Any customer facing application is an external facing application; the rest are internal facing. Every enterprise seemed to have a problem identifying these applications and actually mapping them to a code repository or a set of code repositories, or to an artifact in their artifact repository. It is more of a problem if you use some shared library or shared code base across your applications across the whole enterprise. Then, we also noticed that all enterprises have a vulnerability tracking and reporting mechanism that is fully based on these application IDs in their IT Service Management System, where the applications are marked as internal versus external, and they have to fully rely on that. We may or may not agree with this approach, but it is a common approach across all the big enterprises that we talked to. I think it’s a common struggle.

Back to the comparison. Number one, during December 11th and 12th, fixed all their external facing applications and then started fixing their internal facing applications on December 13th. I think they’re a little bit yellow there; they struggled a bit in figuring out these two sets of applications, internal versus external, and spent some cycles on that. Number two, on the other hand, reacted to the new vulnerability on the 13th: now they are in the middle of fixing the first one, the second one comes in, and then there is confusion. On December 14th, they opened virtual war rooms, all on Zoom because people were still remote at that point. 50% of external facing applications had fixed their code; some of those fixes are in production, but most of them are not. Number three, on the other hand, is manually fixing and semi-manually deploying over the period of December 14th to 16th. They have incorrect impact analysis data, so basically they’re fixing whatever they can and whatever they know about. The project is of course in red status, there’s no disputing that, so we put a red status on this remediation cycle.

The remediation cycle continues. Number one, through the period of the 13th to the 19th, even fixed two new vulnerabilities, two new versions, Log4j 2.15 and then 2.16. That’s two more cycles of fixes, all automated, and then all external facing applications are fixed. Number two: the war room continues through the whole month of December, and external facing applications are fixed multiple times via a semi-automated pipeline. Internal facing applications must be fixed by the end of January 2022. They actually determined that with the firewall mitigation in place and the external facing applications fixed, they didn’t have that much risk on the internal facing side, so they took the approach of giving developers one month to fix all the internal facing applications. Number three: some applications were actually shut down due to fear. This is quite hard to believe, but it is true. They got so scared that they actually shut down some customer facing applications so that no hacker from outside could come in. At that time, they basically did not have a clear path in sight to fix everything that was really impacted. At this point, I’d say number one is green, and number two, on the 20th day, is still red, because they’re not done yet. Number three, who knows? They’re red anyway.

Aftershock Continues

The aftershock continues. Vendors started sending their fixes in January 2022 for number one, but I think overall, within 40 days they fixed everything, so I think they’re green. For number two, it took double the time, but they got there, so that’s why they’re green; they at least finished it off. For number three, most known applications are fixed; the unknowns, of course, are not. As I said, at least number one and number two claimed to have fixed everything. The interesting part, maybe because of the number threes out there, is that from a Maven Central standpoint, Log4j 2.14 and 2.15 are still getting downloaded as I speak. Which enterprises are downloading them? Why? Nobody knows, but they are, which basically tells me that not all Java applications that were impacted by Log4j 2.14 actually got remediated, so they’re still there. Maybe they have custom configurations to take care of that, or they’re not fixed. I don’t know; I couldn’t tell. At this point, if you look at the status, as I said, number one is green, number two is green, and number three is red.

Effect on People

Let’s talk about the effect on people. Most of us, the Java developers and the people around the Java applications, want to forget those few weeks and months. It was the winter vacation that got totally messed up. Everyone saw some level of frustration and burnout and had to change their winter vacation plans. Number one: the frustration was mostly around these multiple cycles of fixes. You fixed it, you thought you were done, you were packing up for your vacation, and then your boss called and said, no, you need to delay a little bit. Overall, a few had to delay their vacation by a few days, but there was no real impact on any planned vacation; everybody who had planned a vacation actually went on it happily. Number two: people got burnt out. Most Java developers had to cancel vacations, and many others around the Java development teams, like management, scrum masters, release engineers, and so on, canceled their vacations too. The other impact was that for most of these enterprises, January and February became their vacation months, which basically means that the full year of 2022’s feature delivery planning got totally messed up. Number three: people really got burnt out. Then their leadership basically said, ok, this is the time we should consider DevSecOps, as a formal kickoff of their DevOps transformation. We don’t know; we did not keep track of what happened after that.

Learnings – The Good

That wraps up the stories, or the response patterns. Let’s go through the learnings. The learnings have two sides. First, let’s understand the key characteristics of the three groups we just talked about; then, from those characteristics, we will try to form the key takeaways. Let’s look at the characteristics of the good one, or number one. They had a software composition analysis tool integrated with all of their code repositories, so every pull request got scanned, and if any vulnerabilities were detected in a repository, they got feedback through their GitHub Issues. They had a complete software bill of materials for all repositories. I’m not talking about the SPDX or CycloneDX form of SBOM; these are just pure, simple dependency lists, and they have metadata for every repository. They also had a fully automated CI/CD pipeline. As you noticed here, unless a team has a fully automated CI/CD pipeline, manually fixing and deploying three changes in a row within two weeks across the whole enterprise is not easy.

They had mature release engineering and change management processes, integrated with their CI/CD pipeline, which helped them carry out this massive amount of change within a short timeframe across the whole enterprise. They also had a good engineering culture and collaboration, as you saw during the detection and declaration phases. The developers were empowered to make decisions by themselves. As I mentioned, during the declaration phase the developers knew that it was coming and it was going to be bad. They did not wait for the mass email that came out from the CISO’s desk; they were trying to fix it right from the get-go. They noticed it first and took it upon themselves to fix the thing. They did not wait for anyone. That’s developer empowerment, and also a culture where the developers can decide by themselves whether to fix it or not, and not wait for InfoSec. Then, of course, this resulted in good DevOps practices and good automation all the way through. Most of the applications in these enterprises resided in the cloud, so no issue there.

Learnings – The Not-So-Good

On the other hand, the not-so-good, or number two. They had SCA coverage, but it was limited and not shifted left. For most of their applications, they scanned right before the release, which was too late; they were on the right side of the process, not on the left side. They did not have full insight into the software bill of materials or even a simple dependency tree; they had some. They had a semi-automated CI/CD pipeline, which basically means complete CI automation but not so much on the CD, or deployment, side. They did have a good change management process. What I’m saying is that these are actually very mature enterprises; they just got hit with Log4j in a bad way. They do have good team collaboration: the cyber engineers saw the tweet, immediately passed it on to the team lead, and the team lead passed it on to the ISO. That’s a good communication mechanism and good team collaboration. They did have a bit of a hero culture. Since they did not have full SCA coverage, some of their developers stood up and said, we are going to write a script and scrape the whole heck out of our internal GitHub and report on all the repositories that have a notion of Log4j in their dependency file, pom.xml, or Gradle file. With that, they struggled quite a bit.
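
To make that kind of hero script concrete, here is a minimal sketch of what such a repository scan could look like. It assumes the internal repositories have already been cloned under a local directory and simply flags Maven and Gradle build files that mention log4j-core; the root path, the file-name checks, and the matched string are illustrative, not the enterprise’s actual script, and matching build files this way is not a substitute for a full SCA scan of transitive dependencies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Minimal sketch: walk a directory of cloned repositories and flag any
// pom.xml or *.gradle file that references log4j-core. This is an
// illustrative stand-in for the custom scripts described above.
public class Log4jBuildFileScan {

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "repos"); // assumed clone directory
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> {
                     String name = p.getFileName().toString();
                     return name.equals("pom.xml") || name.endsWith(".gradle");
                 })
                 .filter(Log4jBuildFileScan::mentionsLog4jCore)
                 .forEach(p -> System.out.println("Possible Log4j usage: " + p));
        }
    }

    private static boolean mentionsLog4jCore(Path buildFile) {
        try {
            return Files.readString(buildFile).contains("log4j-core");
        } catch (IOException e) {
            return false; // unreadable file: skip it rather than fail the whole scan
        }
    }
}
```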

Learnings – The Bad

Let’s talk about number three: they did not have anything. No SCA, no tooling. As I said, classic CMDB-driven risk management: look at your CMDB, look at all the Java applications, just inform the owners, and it’s their responsibility; we’re just going to track it from a risk management perspective. They did have DevOps teams, but they were not so effective. They’re just called DevOps teams; probably they’re renamed Ops teams, not so much DevOps. Of course, as you can tell, they have a weak engineering culture and silos. Basically, even if somebody saw something, they would not call cyber, because they don’t know that they’re supposed to do that, or that they’re empowered to do that, or whether anybody would entertain that kind of communication. We don’t know, so let’s not make any judgment call. I think, overall, what we found is that they had a weak engineering culture across the enterprise.

Key Takeaways

Now the key takeaways. I would like to take a step back and try to answer four questions. Number one, what are the different types of open source software? Two, how do they come into your enterprise? Three, once we bring them in, how should we keep track of their usage? The last question is, what will we do when the next Log4Shell happens? Are we going to see the same struggle, or are things going to improve? Let’s go through them one by one. First, open source comes in many forms. Some are standalone executables. Some are development tools and utilities. Some are dependencies; that’s where the Log4js are. These are the things that get downloaded from artifact repositories and packaged with the actual application. There is even source code that gets downloaded from github.com and then lands in the internal source code repositories or source code management system. They come in different formats too, like jar, tar, zip, and Docker images. These are the different forms of open source; Log4j is just the third bullet point, the dependencies. Most of the time, we don’t look at the other types. They also come in via many paths. First, developers can directly download from public sites. Some enterprises have restrictions around that, but that creates a lot of friction between the proxy team and the developer or engineering team. Some enterprises chose not to stop it, which basically means developers can download any open source from public sites.

Build dependencies get downloaded by the continuous integration pipeline. In all the enterprises that we saw in the number one and number two groups, most of these build dependencies were actually downloaded from an internal artifact repository like Artifactory or Nexus. For standalone software tools and utilities, many enterprises have an internal software download site. Then production systems sometimes directly download from public sites, which is pretty risky, but it happens. And open source code gets into internal source code. We don’t usually pay attention to all of these paths; we just think about the CI pipeline downloading Log4j and all that, but the other risks are already there. All of these paths are known, and if we dug into it, we could actually write down even more. Then, what are the risks involved with each of them? We need to do the risk analysis and mitigation on all of them anyway.

To track how they’re used, or if you want to manage this usage, you first need to know what to manage, and what to manage comes from what we have. The first thing is, for all applications, whether they are internally developed applications, vendor software, or standalone open source software, we need to gather all of them and their bills of materials and store them in a single SBOM repository. This could cover, as I said, internally developed applications, runtime components, middleware, databases, tools, utilities, anything that is used in the whole enterprise. Then we need to continuously monitor that software bill of materials, because without continuous monitoring it’s hard to tell whether there is a problem, or when there is one. Along with continuous SBOM monitoring, we need to establish a process to assess the impact: if that continuous monitoring detects something and creates an alert, what is going to happen across the enterprise? What is the process for analyzing that impact? Who is going to be in charge of it? All of this needs to be documented. Then, establish a process to declare a vulnerability: at which point, who is going to declare that we have a big problem, and what is the scope of the problem? Who is going to get notified, and how? All of this needs to be written down, so that we do not have to manufacture it every time there is a Log4Shell. Without it, we will run into communication problems and communication breakdowns. It will unnecessarily cause a lot of pain and friction for a lot of folks, which we do not want.
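
As a rough illustration of the “continuously monitor the bill of materials” idea, here is a minimal sketch that checks stored dependency lists against a set of known-vulnerable coordinates. The directory layout, the one-coordinate-per-line file format, and the hard-coded vulnerable versions are all assumptions made for illustration; a real setup would pull the vulnerable coordinates from a CVE feed and route any hit into the impact-analysis and declaration processes described above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

// Sketch of a periodic SBOM check: every dependency list stored in an "sbom"
// directory (assumed format: one "group:artifact:version" coordinate per line)
// is compared against a set of vulnerable coordinates.
public class SbomMonitor {

    // Illustrative examples only; in practice this set would come from a vulnerability feed.
    private static final Set<String> VULNERABLE = Set.of(
            "org.apache.logging.log4j:log4j-core:2.14.1",
            "org.apache.logging.log4j:log4j-core:2.15.0");

    public static void main(String[] args) throws IOException {
        Path sbomDir = Paths.get(args.length > 0 ? args[0] : "sbom");
        try (Stream<Path> lists = Files.list(sbomDir)) {
            lists.filter(Files::isRegularFile).forEach(SbomMonitor::check);
        }
    }

    private static void check(Path dependencyList) {
        try {
            List<String> hits = Files.readAllLines(dependencyList).stream()
                    .map(String::trim)
                    .filter(VULNERABLE::contains)
                    .toList();
            if (!hits.isEmpty()) {
                // A real monitor would raise an alert into the documented impact-analysis process.
                System.out.println(dependencyList.getFileName() + " is impacted: " + hits);
            }
        } catch (IOException e) {
            System.err.println("Could not read " + dependencyList + ": " + e.getMessage());
        }
    }
}
```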

In terms of mitigation and remediation, we need to decide whether we can only mitigate, or whether we need to remediate and fix the root cause as well. In some cases, mitigation alone is good enough. Next, do we need to remediate manually, or can we automate it? That depends on what it is, how big it is, and whether we have the other pieces in place to support an automated process. Prioritization is another big thing, because, as I said, external versus internal, or what kind of open source it is, the prioritization will depend on that and on its risk profile. That prioritization itself can be complex and time consuming too. Then, for deploying the remediation, automation is the only way to go.

The last thing, probably the most important tool that we have, is the organizational culture. I’m convinced that the Log4Shell response within an enterprise says a lot about the enterprise’s DevOps and DevSecOps technical maturity and culture, the engineering culture, leadership style, organizational structure, how people react and collaborate, and the overall business mindset. Do they have a blameless culture within the organization? That is also something that comes out of these kinds of events. We often use the term shift-left security, and we most likely mean that having some tools in our delivery pipeline or code will solve the problem, but it is more than that. We need to make sure we empower the developers to make the right decisions while interacting with these tools. Each of these tools will generate more work for the developers. How do we want the developers to prioritize these security issues against feature delivery? Can they decide? Are they empowered to decide themselves? Lastly, practicing chaos. Just like all other chaos testing, we need to start thinking about practicing CVE chaos testing. Practice makes us better. Otherwise, every time there is a Log4Shell, we will suffer, spend months chasing it, and end up with demotivated associates, and that will not be good. Because the question is not if, but when, will there be another Log4Shell?

Questions and Answers

Tucker: Were there any questions in your survey that asked about the complete list of indicators of compromise analysis?

Pal: Yes, partially, but not a whole lot. We wanted to know the reaction more than the action. We did ask about how you make sure that everything is covered, and we did ask whether you completely mitigated, or remediated, or both, but that was not the main intent of the questions.

Tucker: As a result of all the research that you’ve done, what would you prioritize first to prepare for future vulnerabilities, and why?

Pal: First, we have to know what we have. The problem is that we don’t know what we have. That is because, first of all, we don’t have the tools, nor do we have a collective process for identifying the risks we need to focus on. There are many forms of open source that we don’t have control over, and there are many ways they can get into our environment. We only focus on the CI/CD part, and even that in a limited way. We have a lot further to go. The first thing is: know what you have before you start anywhere else.

Tucker: Are you aware of any good open source tools or products for helping with that, managing your inventory and doing that impact analysis?

Pal: There are some good commercial tools. In fact, on GitHub itself you have Dependabot readily available, plus a few other tools. Yes, start using those. I think we do have some good tools available right away; just pick one and go.

Tucker: Once you know what you have that you need to deploy, and you’re able to identify it, I know that you talked about automation for delivery and enabling automated deployment release. What technical foundations do you think are really important to enable that?

Pal: In general, to enable CD, continuous delivery, in a nice fashion, I think the foundational block is the architecture. If it is a big monolithic application, yes, we can get to some automated build and automated deployment, but not so much continuous integration in the classic sense, or continuous delivery in the classic sense. There can be automation, but that’s about it. To achieve full CI and CD, and I’m talking about continuous delivery, not continuous deployment, we actually need to pay attention to what our application architecture looks like, because some architectures will allow you to do that and some will not, period.

Tucker: What are some other patterns that you would recommend to allow you to do that?

Pal: For good CI/CD, decoupling is number one, so that I can decouple my application from everything else, independently deploy my changes, and not have to depend on anything or anybody else around me. Number two, as soon as I do that, the size of the thing comes into the picture. The ideal case is, of course, the minimal size, which is the microservice; I’m heading toward that term without using it. Essentially, keep your sizes small, be lean and agile, and deliver as fast as possible without impacting anybody else.

Tucker: One of the things that I love that you brought up in your talk was the idea of practicing chaos. In this context, what ideas do you have for how that would look?

Pal: If we had the tools developed and we had an inventory of what we have in terms of open source, then, just like chaos testing, let’s find out how many different ways we can actually generate a problem, such as declaring a dummy CVE as a zero-day vulnerability with a score of 10, and seeing what happens in the enterprise. Those kinds of things, so that you can formulate a few scenarios like that and see what happens. You can do that at multiple levels: at the level where it’s just developer tools on somebody’s laptop, versus a library like Log4j that is used almost everywhere. There are various levels where you can do that.
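
As one way to picture such a drill, here is a small, hedged sketch of a “CVE fire drill” harness. The Advisory record and the publish and awaitAcknowledgement hooks are entirely hypothetical; in practice they would be wired to whatever internal advisory feed and incident channels an enterprise actually uses. The point is simply to inject a clearly labelled dummy advisory and time how long detection and acknowledgement take.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical "CVE fire drill": publish a clearly labelled dummy advisory for a
// widely used library, then measure how long it takes until someone acknowledges it.
// Both publish() and awaitAcknowledgement() are stand-ins for real integrations.
public class CveFireDrill {

    record Advisory(String id, String coordinate, double cvssScore, boolean drill) {}

    public static void main(String[] args) throws InterruptedException {
        Advisory drill = new Advisory("DRILL-0001",
                "org.apache.logging.log4j:log4j-core:2.17.2", 10.0, true);

        Instant start = Instant.now();
        publish(drill);              // stand-in for pushing into the internal advisory feed
        awaitAcknowledgement(drill); // stand-in for waiting on the incident channel or ticket queue
        System.out.println("Time to acknowledgement: " + Duration.between(start, Instant.now()));
    }

    private static void publish(Advisory a) {
        System.out.println("[DRILL] zero-day declared for " + a.coordinate()
                + " (CVSS " + a.cvssScore() + ", drill=" + a.drill() + ")");
    }

    private static void awaitAcknowledgement(Advisory a) throws InterruptedException {
        Thread.sleep(1_000); // placeholder: a real drill would wait for a human or bot response
    }
}
```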

Tucker: I love that idea, because these are things we don’t practice very often, and practicing could give us a muscle to improve and measure our response over time.

See more presentations with transcripts



SQL vs NoSQL – Ashley Biddle – Medium

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts



Global NoSQL Database Market to Witness Exponential Rise in Revenue Share during …

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

New Jersey, United States – The Global NoSQL Database Market is comprehensively and accurately detailed in the report, taking into consideration various factors such as competition, regional growth, segmentation, and market size by value and volume. This is an excellent research study specially compiled to provide the latest insights into critical aspects of the Global NoSQL Database market. The report includes different market forecasts related to market size, production, revenue, consumption, CAGR, gross margin, price, and other key factors. It is prepared with the use of industry-best primary and secondary research methodologies and tools. It includes several research studies such as manufacturing cost analysis, absolute dollar opportunity, pricing analysis, company profiling, production and consumption analysis, and market dynamics.

The competitive landscape is a critical aspect every key player needs to be familiar with. The report throws light on the competitive scenario of the Global NoSQL Database market to know the competition at both the domestic and global levels. Market experts have also offered the outline of every leading player of the Global NoSQL Database market, considering the key aspects such as areas of operation, production, and product portfolio. Additionally, companies in the report are studied based on key factors such as company size, market share, market growth, revenue, production volume, and profits.

Get Full PDF Sample Copy of Report: (Including Full TOC, List of Tables & Figures, Chart) @ https://www.verifiedmarketresearch.com/download-sample/?rid=129411

Key Players Mentioned in the Global NoSQL Database Market Research Report:

Objectivity Inc, Neo Technology Inc, MongoDB Inc, MarkLogic Corporation, Google LLC, Couchbase Inc, Microsoft Corporation, DataStax Inc, Amazon Web Services Inc & Aerospike Inc.

Global NoSQL Database Market Segmentation:  

NoSQL Database Market, By Type

• Graph Database
• Column Based Store
• Document Database
• Key-Value Store

NoSQL Database Market, By Application

• Web Apps
• Data Analytics
• Mobile Apps
• Metadata Store
• Cache Memory
• Others

NoSQL Database Market, By Industry Vertical

• Retail
• Gaming
• IT
• Others

The report comes out as an accurate and highly detailed resource for gaining significant insights into the growth of different product and application segments of the Global NoSQL Database market. Each segment covered in the report is exhaustively researched on the basis of market share, growth potential, drivers, and other crucial factors. The segmental analysis provided in the report will help market players to know when and where to invest in the Global NoSQL Database market. Moreover, it will help them to identify key growth pockets of the Global NoSQL Database market.

The geographical analysis of the Global NoSQL Database market provided in the report is just the right tool that competitors can use to discover untapped sales and business expansion opportunities in different regions and countries. Each regional and country-wise Global NoSQL Database market considered for research and analysis has been thoroughly studied based on market share, future growth potential, CAGR, market size, and other important parameters. Every regional market has a different trend, and not all regional markets are impacted by the same trend. Taking this into consideration, the analysts authoring the report have provided an exhaustive analysis of the specific trends of each regional Global NoSQL Database market.

Inquire for a Discount on this Premium Report @ https://www.verifiedmarketresearch.com/ask-for-discount/?rid=129411

What to Expect in Our Report?

(1) A complete section of the Global NoSQL Database market report is dedicated for market dynamics, which include influence factors, market drivers, challenges, opportunities, and trends.

(2) Another broad section of the research study is reserved for regional analysis of the Global NoSQL Database market where important regions and countries are assessed for their growth potential, consumption, market share, and other vital factors indicating their market growth.

(3) Players can use the competitive analysis provided in the report to build new strategies or fine-tune their existing ones to rise above market challenges and increase their share of the Global NoSQL Database market.

(4) The report also discusses the competitive situation and trends and sheds light on company expansions and mergers and acquisitions taking place in the Global NoSQL Database market. Moreover, it brings to light the market concentration rate and the market shares of the top three and top five players.

(5) Readers are provided with the findings and conclusions of the research study in the Global NoSQL Database Market report.

Key Questions Answered in the Report:

(1) What are the growth opportunities for the new entrants in the Global NoSQL Database industry?

(2) Who are the leading players functioning in the Global NoSQL Database marketplace?

(3) What are the key strategies participants are likely to adopt to increase their share in the Global NoSQL Database industry?

(4) What is the competitive situation in the Global NoSQL Database market?

(5) What are the emerging trends that may influence the Global NoSQL Database market growth?

(6) Which product type segment will exhibit high CAGR in future?

(7) Which application segment will grab a handsome share in the Global NoSQL Database industry?

(8) Which region is lucrative for the manufacturers?

For More Information or Query or Customization Before Buying, Visit @ https://www.verifiedmarketresearch.com/product/nosql-database-market/ 

About Us: Verified Market Research® 

Verified Market Research® is a leading Global Research and Consulting firm that has been providing advanced analytical research solutions, custom consulting and in-depth data analysis for 10+ years to individuals and companies alike that are looking for accurate, reliable and up to date research data and technical consulting. We offer insights into strategic and growth analyses, data necessary to achieve corporate goals, and help to make critical revenue decisions.

Our research studies help our clients make superior data-driven decisions, understand market forecasts, capitalize on future opportunities and optimize efficiency by working as their partner to deliver accurate and valuable information. The industries we cover span a large spectrum including Technology, Chemicals, Manufacturing, Energy, Food and Beverages, Automotive, Robotics, Packaging, Construction, Mining & Gas, etc.

We, at Verified Market Research, assist in understanding holistic market indicating factors and most current and future market trends. Our analysts, with their high expertise in data gathering and governance, utilize industry techniques to collate and examine data at all stages. They are trained to combine modern data collection techniques, superior research methodology, subject expertise and years of collective experience to produce informative and accurate research. 

Having serviced over 5000+ clients, we have provided reliable market research services to more than 100 Global Fortune 500 companies such as Amazon, Dell, IBM, Shell, Exxon Mobil, General Electric, Siemens, Microsoft, Sony and Hitachi. We have co-consulted with some of the world’s leading consulting firms like McKinsey & Company, Boston Consulting Group, Bain and Company for custom research and consulting projects for businesses worldwide. 

Contact us:

Mr. Edwyne Fernandes

Verified Market Research®

US: +1 (650)-781-4080
UK: +44 (753)-715-0008
APAC: +61 (488)-85-9400
US Toll-Free: +1 (800)-782-1768

Email: sales@verifiedmarketresearch.com

Website:- https://www.verifiedmarketresearch.com/
