Trust-Driven Development: Accelerate Delivery and Increase Creativity

MMS Founder
MMS Ben Linders

Article originally posted on InfoQ. Visit InfoQ

By building trust you can break silos, foster collaboration, increase focus, and enable people to come up with creative solutions for products and for improving their processes.

Tomasz Manugiewicz spoke about how building trust impacts software delivery and creativity at ACE conference 2022.

The DevOps movement was created to break down the silos in organizations, Manugiewicz explained:

We want Dev people to talk and work with Ops team, breaking the opposition of “we” and “they”. In order to establish smooth cooperation between them and eliminate blaming culture, we need these two teams to trust each other.

Trust can be built in such circumstances by organizing pair programming across different functions and teams, Manugiewicz mentioned.

When people trust each other, they can focus their energy on creating a product, improving processes and, of course, on programming, Manugiewicz said. That’s why Google pays attention to psychological safety within and between teams, he mentioned.

Manugiewicz explained what can happen when there’s a lack of trust:

If people don’t trust each other and don’t feel safe, they invest their energy and time to secure themselves by various corporate actions that we all know.

Manugiewicz mentioned that once people’s energy is focused on the actual work, they can be creative and improve not only the product, but also the process of producing it. By this, he meant all the automation that can be done to make the process smoother.

A creative process assumes that we come up with many various ideas. In a safe environment we are not afraid to generate those which are not perfect and ideal, which can inspire us to find more creative solutions, Manugiewicz concluded.

InfoQ interviewed Tomasz Manugiewicz about applying trust-driven development to accelerate delivery and increase creativity.

InfoQ: How can building trust help to break silos in organizations?

Tomasz Manugiewicz: Building trust means training people first to have a cross-functional crew. In those circumstances, people can build cognitive aspects of trust as they show that they are able to deliver results. They also can understand each other and do exercises like pair programming. And once they start talking to each other, the emotional aspect of trust can be established.

It’s as simple as giving them knowledge and tools and letting them cooperate.

InfoQ: How can we accelerate delivery by increasing trust?

Manugiewicz: The DevOps evolution model shows how it can be done. It starts with the manual execution of ad-hoc tasks, then moves on to more planned tasks which can be scripted.

Moving further down the line we can observe some groups of tasks (let’s call them activities) which can be grouped and orchestrated by a specific tool; in this case people need to agree that their tasks will be automated by DevOps tools.

There are also some actions that still need to be done in a manual or semi-manual manner, so autonomy is needed. And this is where trust pops up, because to give someone autonomy, the manager needs to trust this person or team.

The last stage of the DevOps evolution model is the self-learning part, so we are coming back to the cognitive aspect. It is not only about increasing human skills; it is also about the algorithm itself learning, or machine learning as we call it.

I encourage you to read further about the DevOps evolution model in Puppet’s resources, such as the article Puppet’s new Scaling DevOps Service helps orgs scale DevOps practices.

InfoQ: How do trust and relationships influence creativity?

Manugiewicz: I have seen this in retrospectives. I had one team which was really good at identifying all the things that needed to be improved, but once we were discussing them, it was difficult for the team to come up with solutions. It was quite a new team, and trust hadn’t yet been established between team members. Once they built trust over time, they started sharing ideas for solutions; they challenged each other and as a result produced great, creative solutions.

Earlier InfoQ interviewed Tomasz Manugiewicz about Building Cognitive and Emotional Pillars.

About the Author

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.

  • This field is for validation purposes and should be left unchanged.


PostgreSQL Interface for Cloud Spanner Now Generally Available

MMS Founder
MMS Renato Losio

Article originally posted on InfoQ. Visit InfoQ

Google Cloud recently announced the general availability of the PostgreSQL interface for Cloud Spanner. The new interface increases the portability of workloads to and from Spanner and provides a globally distributed option to developers already familiar with PostgreSQL.

Announced in preview last autumn, the new functionality adds another option to the Google Standard SQL interface on Google Cloud. While the syntax supported is similar to standard PostgreSQL, workloads relying on stored procedures, extensions, triggers, or non-serializable isolation require rework to run on Spanner.

Justin Makeig, senior product manager at Google Cloud, explains how PostgreSQL is now the de facto standard for operational databases:

Enterprises and digital natives alike are standardizing on PostgreSQL as the common “API” for their operational databases (…) As organizations modernize in the cloud, they are looking to avoid the onerous lock-in associated with last generation’s databases and to leverage the industry skills and tools they already have. An increasing number of them are standardizing on PostgreSQL.

To see the differences between the two interfaces, developers should refer to the article Dialect parity between Google Standard SQL and PostgreSQL. Andi Gutmans, VP/GM databases at Google, comments:

This reinforces our commitment to being the most open cloud so customers have flexibility and choice on when and where to run. Spanner delivers virtually unlimited scale with market-leading 5 9s availability and no maintenance windows.

Makeig explains how portability helps regulated industries:

The schemas and queries that you write against Spanner’s PostgreSQL interface will run mostly without modification in another PostgreSQL environment, either in Google Cloud or elsewhere. This portability is especially important for industries like financial services where emerging regulations and industry guidelines require critical services to demonstrate exit strategies from essential vendors to ensure business continuity.

Google Cloud recently announced the preview of AlloyDB for PostgreSQL, a managed PostgreSQL-compatible service targeting enterprise deployments. CockroachDB, Yugabyte, and Amazon Aurora also offer PostgreSQL compatible distributed database-as-a-service. Reassuring existing customers, Makeig adds:

Google is fully committed to continued support and evolution of Google Standard SQL. Spanner’s ANSI SQL dialect and ecosystem are the best choice for teams already familiar with Google Cloud. Along with a wide range of functionality, it provides compatibility with BigQuery’s SQL.

The new PostgreSQL interface is configured per database at creation time. Administrators can provision and manage PostgreSQL databases by using the existing console, APIs, and gcloud CLI.

With the new granular instances, customers can run a Spanner database starting at 65 USD/month, or at 40 USD/month with a three-year commitment. There are no additional costs associated with the new PostgreSQL interface.



Article: The Parity Problem: Ensuring Mobile Apps Are Secure Across Platforms

MMS Founder
MMS Alan Bavosa

Article originally posted on InfoQ. Visit InfoQ

Key Takeaways

  • Implementing a multi-layered defense that is broad and deep is critical for mobile app security, but nearly impossible to achieve using traditional approaches.
  • A broad defense covers the many different categories of attack a hacker can employ to compromise a mobile app.
  • A deep defense employs multiple means to detect and protect against each category of threat.
  • No third-party library, commercial SDK, or specialized compiler can provide a sufficiently broad and deep defense across both iOS and Android, plus the multitude of different devices — the complexity grows exponentially.
  • Automation must be built into the development process to implement broad and deep security defenses for apps across operating systems and devices.
     

It’s been common knowledge for some time that Android is less secure than iOS as a mobile platform; everyone “knows” it. Everyone except consumers, it seems. A global survey of 10,000 mobile consumers from August 2021 found that the security expectations of iOS and Android users are essentially the same.

However, despite consumer expectations, and while neither mobile platform is necessarily inherently less secure than the other, mobile apps rarely achieve security feature parity between Android and iOS. In fact, many mobile apps lack even the most basic security protections. Let’s examine why.

Mobile App Security Requires a Multi-Layered Defense 

Most security professionals and 3rd party standards organizations would agree that mobile app security requires a multi-layered defense consisting of multiple security features in the following core areas: 

  • Code Obfuscation & Application Shielding to protect the mobile app binary and source code against reverse engineering
  • Data Encryption to protect the data stored in and used by the app.
  • Secure Communication to protect data as it moves between the app and the app’s backend, including ensuring the authenticity and validity of the digital certificates that are used to establish trusted connections.
  • OS Protection to protect the app from unauthorized modifications to the operating system, such as rooting and jailbreaking.

Developers should implement a balanced mix of these features in both the iOS and Android versions of their app to form a consistent security defense, and they should add these features early in the development cycle – a concept known as “shift-left” security. Sounds easy enough, right? In theory, yes; in practice, it’s actually quite difficult to achieve a multi-layered mobile app security defense using ‘traditional’ approaches.
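To make the idea of a balanced, consistent defense concrete, the four core categories can be modeled as a simple audit checklist. This is a hypothetical sketch, not a real tool; the category and feature names below are invented for illustration.

```python
# Hypothetical sketch: auditing an app's iOS and Android builds against the
# four core protection categories, and checking feature parity between them.
CORE_CATEGORIES = {"obfuscation", "data_encryption", "secure_communication", "os_protection"}

def missing_layers(features_by_platform):
    """Return, per platform, which core categories have no feature at all."""
    return {
        platform: sorted(CORE_CATEGORIES - set(features))
        for platform, features in features_by_platform.items()
    }

def parity_gaps(features_by_platform):
    """Return protection categories present on one platform but absent on the other."""
    ios = set(features_by_platform["ios"])
    android = set(features_by_platform["android"])
    return {"ios_only": sorted(ios - android), "android_only": sorted(android - ios)}

# Invented example app: each platform implements a different, incomplete subset.
app = {
    "ios": {"obfuscation": ["control_flow"], "secure_communication": ["cert_pinning"]},
    "android": {"obfuscation": ["class_renaming"], "os_protection": ["root_detection"]},
}
```

Running `missing_layers(app)` on this example would show that each platform is missing two whole categories, and `parity_gaps(app)` would expose the asymmetry between the builds — exactly the parity problem this article is about.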

For years, mobile developers have attempted to implement in-app mobile app security using the traditional collection of tools available to them, including 3rd party open-source libraries, commercial mobile app security SDKs, or specialized compilers. The first major challenge is that mobile app security is never achieved via a ‘silver bullet’. Because mobile apps operate in unprotected environments and store and handle lots of valuable information, there are many ways to attack them. Hackers have an endless supply of freely available and very powerful toolsets at their disposal, and all the time in the world to study and attack the app undetected.

Mobile security requirements

So to build a robust defense, mobile developers need to implement a multi-layered defense that is both ‘broad’ and ‘deep’. By broad, I’m talking about multiple security features from different protection categories, which complement each other, such as encryption + obfuscation. By ‘deep’, I mean that each security feature should have multiple methods of detection or protection. For example, a jailbreak-detection SDK that only performs its checks when the app launches won’t be very effective because attackers can easily bypass the protection.

Or consider anti-debugging, an important runtime defense that prevents attackers from using debuggers to perform dynamic analysis – running the app in a controlled environment to understand or modify its behavior. There are many different types of debuggers – some based on LLDB for native code like C++ or Objective-C, others that inspect at the Java or Kotlin layer, and many more. Every debugger works a little differently in how it attaches to and analyzes the app. For the anti-debugging defense to be effective, therefore, your app needs to recognize which of these debugging methods is being used and dynamically engage the correct defense, since hackers will keep trying different debugging tools or methods until one succeeds.
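As an illustrative sketch only (the detector names and responses below are invented, not a real anti-debugging library), a “deep” defense can be modeled as a registry of independent checks, each paired with the response appropriate to that debugging technique:

```python
# Hypothetical sketch of "deep" anti-debugging: several independent checks,
# each paired with its own response. Real checks would probe ptrace status,
# JDWP ports, timing anomalies from single-stepping, and so on.
def make_defense(checks):
    """checks: list of (name, detect_fn, respond_fn) triples."""
    def run():
        triggered = []
        for name, detect, respond in checks:
            if detect():          # each check targets a different debugger technique
                respond(name)     # engage the defense matched to that technique
                triggered.append(name)
        return triggered
    return run

events = []
checks = [
    ("native_debugger", lambda: True,  events.append),   # e.g. an LLDB-style attach check
    ("jvm_debugger",    lambda: False, events.append),   # e.g. a JDWP-layer check
    ("timing_anomaly",  lambda: True,  events.append),   # e.g. single-stepping slows execution
]
run_defense = make_defense(checks)
```

The point of the structure is that bypassing one detector leaves the others in place, which is what makes the defense “deep” rather than a single easily-patched check.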

Anti-tampering

The list of security requirements doesn’t stop there. Every app needs anti-tampering features like checksum validations and protection against binary patching, app repackaging, re-signing, and emulators and simulators. It would not be a stretch to assume that researching and implementing each one of these discrete features or protection methods alone would require at least several man-weeks of development per operating system – and that’s being very generous in assuming the mobile developer already possesses expertise in the specific security domain, which is often not the case. This gets complicated quickly, and so far we are only talking about a single protection category: runtime or dynamic protections. Imagine if each of the features mentioned required one or two weeks of development.
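A checksum validation, at its core, reduces to hashing a code region and comparing it against a digest recorded at build time. The sketch below is a toy illustration with stand-in bytes, not production anti-tampering code:

```python
import hashlib

# Toy sketch of a checksum validation: at build time the expected digest of a
# code region is recorded; at runtime the region is re-hashed and compared.
# Real implementations hash segments of the app binary itself.
def digest(region: bytes) -> str:
    return hashlib.sha256(region).hexdigest()

def is_tampered(region: bytes, expected_digest: str) -> bool:
    return digest(region) != expected_digest

original = b"\x55\x48\x89\xe5"   # stand-in bytes for a code segment
expected = digest(original)
patched = b"\x55\x48\x90\xe5"    # same segment with one byte "patched"
```

Even a one-byte binary patch changes the digest, which is why attackers target the check itself — and why such checks need to be obfuscated and duplicated.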

Jailbreak/Rooting Prevention 

Next, you also need OS-level protections like jailbreak/rooting prevention to protect the app if the mobile operating system has been compromised. Jailbreaking/rooting makes mobile apps vulnerable to attacks because it allows full administrative control over the OS and file system, and thus compromises the entire security model. And merely detecting jailbreak/rooting is no longer enough, because hackers are constantly evolving their tools. Among the most advanced jailbreak and rooting tools are Checkra1n for iOS and Magisk for Android, and there are many others. Some of these tools are also used to hide or conceal activity and to manage superuser permissions – often granted to malicious apps. Net net, if you implemented jailbreak or rooting detection using an SDK or 3rd party library, there’s a good chance the protection is already obsolete or easily bypassed, especially if the app’s source code is not sufficiently obfuscated.
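One layer of root detection can be sketched as probing for well-known root artifacts. The paths below are common examples; a real defense would combine many such checks, precisely because tools like Magisk actively try to hide these artifacts:

```python
import os

# Hypothetical sketch of one layer of Android root detection: probing for
# well-known su binary locations. On its own this is easily defeated; a
# "deep" defense layers it with other checks (mount flags, props, packages).
SU_PATHS = [
    "/system/bin/su",
    "/system/xbin/su",
    "/sbin/su",
]

def root_artifacts(exists=os.path.exists):
    """Return which known root artifacts are present on the filesystem."""
    return [p for p in SU_PATHS if exists(p)]

def looks_rooted(exists=os.path.exists):
    return bool(root_artifacts(exists))
```

The `exists` parameter is injected here only so the check can be exercised without a rooted device; a shipped check would also need to run at multiple points in the app’s lifetime, not just at launch.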

Code obfuscation

If you use an SDK or 3rd party library to implement a security protection, it’s pretty much useless inside an un-obfuscated app. Why? Because hackers can simply decompile or disassemble the app to find the SDK’s code using reverse-engineering tools like Hopper or IDA Pro, or use a dynamic binary instrumentation toolkit like Frida to inject their own malicious code, modify the app’s behavior, or simply disable the security SDK.

Code obfuscation prevents attackers from understanding mobile app source code. It’s always recommended to use multiple obfuscation methods, including obfuscating native and non-native code and libraries, as well as obfuscating the application’s logical structure or control flow. This can be accomplished, for example, by using control-flow obfuscation or by renaming functions, classes, methods, and variables. And don’t forget to obfuscate debug information as well.
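As a toy illustration of the renaming idea only (real obfuscators operate on bytecode or an AST and also transform control flow; the identifiers here are made up):

```python
import itertools
import re

# Toy sketch of name obfuscation: mapping meaningful identifiers to opaque
# ones, the way class/method renaming works in real obfuscators.
def build_rename_map(identifiers):
    short = (f"_{i}" for i in itertools.count())
    return {name: next(short) for name in identifiers}

def rename(source, mapping):
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], source)

mapping = build_rename_map(["check_license", "is_premium_user"])
obfuscated = rename("if check_license(): is_premium_user = True", mapping)
```

The result strips the semantic hints an attacker relies on — which is also why renaming alone (the “table of contents” level) is insufficient without structural obfuscation of the code itself.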

It’s clear from real-world data that most mobile apps lack sufficient obfuscation, obfuscating only a small portion of the app’s code, as this research study of over 1 million Android apps clearly illustrates. As the study suggests, the reason is that traditional obfuscation methods relying on specialized compilers are simply too complex and time-consuming for most mobile developers to apply comprehensively. Instead, many developers implement a single obfuscation feature or obfuscate only a small fraction of the codebase. The researchers found that most apps implemented class-name obfuscation only, which by itself is very easy to defeat. To use a book metaphor, class-name obfuscation alone is like obfuscating a book’s table of contents while leaving all of its actual pages and content un-obfuscated. Such superficial obfuscation is easily bypassed.

Data protection and encryption

Moving on to data protection, you also need encryption to protect the app and user data – there are lots of places where data is stored in mobile apps, including the sandbox, in memory, and inside the code or strings of the app. To implement encryption on your own there are lots of tricky issues to navigate: there’s key derivation, cipher suite, and encryption algorithm combos, key size, and strength. Many apps use multiple programming languages, each of which would require different SDKs or introduce incompatibilities or dependencies on code you may not control or have access to. And data-type differences can also increase complexity and the risk of performance degradation. 

Then, there is the classic problem of where you store the encryption keys. If keys are stored inside the app, they could be discovered by attackers who reverse engineer it, and once found they could be used to decrypt the data. This is why dynamic key generation is such an important feature. With dynamic key generation, encryption keys are generated only at runtime and never stored in the app or on the mobile device. Further, the keys are only used once, preventing them from being discovered or intercepted by attackers. 
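The idea of dynamic key generation can be sketched with stdlib primitives: derive a fresh key from an ephemeral random value at runtime and never persist it. This is an illustrative HKDF-style construction, not a production scheme; a real app would anchor the derivation in a server handshake or a hardware keystore rather than pure local randomness:

```python
import hashlib
import hmac
import secrets

# Illustrative sketch of dynamic key generation: a one-time key is derived at
# runtime from an ephemeral random value and is never written to the app
# bundle or device storage. Extract/expand steps follow the HKDF pattern,
# built here from stdlib HMAC-SHA256.
def derive_session_key(context: bytes) -> bytes:
    ephemeral = secrets.token_bytes(32)                                # exists only in memory
    prk = hmac.new(b"\x00" * 32, ephemeral, hashlib.sha256).digest()   # extract
    return hmac.new(prk, context + b"\x01", hashlib.sha256).digest()   # expand (one block)

k1 = derive_session_key(b"encrypt-user-cache")
k2 = derive_session_key(b"encrypt-user-cache")
```

Because each derivation starts from fresh randomness, two calls with the same context yield different keys — there is no static key for a reverse engineer to extract from the binary.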

And what about data in transit? TLS alone isn’t sufficient, as there are lots of ways to compromise an app’s connection. It’s important to inspect and validate TLS sessions and certificates to ensure that all certificates and CAs are valid and authentic, protected by industry-standard encryption. This prevents hackers from gaining control over TLS sessions. And then there’s also certificate pinning to prevent connections to compromised servers, or to protect the server side against connections from compromised apps (for instance, if your app has been turned into a malicious bot).
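Certificate pinning reduces to comparing the fingerprint of the presented certificate against values shipped with the app. Here is a minimal sketch with dummy DER bytes; real code would take the bytes from the TLS handshake and typically pin the public key (SPKI) rather than the whole certificate:

```python
import hashlib

# Minimal sketch of certificate pinning: the app ships with the SHA-256
# fingerprint(s) of its expected certificate and refuses connections whose
# presented certificate doesn't match. The DER bytes below are dummies.
def fingerprint(cert_der: bytes) -> str:
    return hashlib.sha256(cert_der).hexdigest()

def connection_allowed(presented_der: bytes, pinned: set) -> bool:
    return fingerprint(presented_der) in pinned

good_cert = b"dummy-der-bytes-for-our-backend"
pins = {fingerprint(good_cert)}
```

A man-in-the-middle certificate, even one signed by a CA the device trusts, produces a different fingerprint and is rejected — which is exactly the gap pinning closes over plain TLS validation.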

Fraud, Malware, Piracy Prevention 

And finally, there’s anti-fraud, anti-malware, and anti-piracy protections that you can layer on top of the aforementioned baseline protections to protect against highly advanced or specialized threats. These protections may include features that prevent app overlay attacks, auto-clickers, hooking frameworks, and dynamic binary instrumentation tools, memory injection, keyloggers, key injection, or abuse of accessibility features, all of which are common weapons used in mobile fraud or by mobile malware. 

Just think about the sheer amount of time and resources required to implement even a subset of the above features. And so far, I’ve only talked about the feature and function coverage required for a strong security defense. Even if you had the resources and required skill sets in-house (you don’t, but humor me), what about the operational challenges of cobbling together a defense? Let’s explore some of the implementation challenges your dev team will likely encounter.

Implementation differences between platforms and frameworks

The next problem developers would face is how to implement each of those security features for Android and iOS given the endless number of framework differences and incompatibilities between SDKs/libraries and the native or non-native programming languages used by developers to build mobile apps. While software development kits (SDKs) are available for some standard security features, no SDK covers all platforms or frameworks universally.

A major challenge developers face when attempting to implement mobile app security using SDKs or open-source libraries stems from the fact that these methods all rely on source code and require changes to the application code. As a result, each of these methods is explicitly bound to the specific programming language the application is written in, and is also exposed to the various ‘dependencies’ between those languages and frameworks. Let’s double-click on that for a moment.

iOS apps are typically built in Objective-C or Swift, while Android apps are typically written in Java or Kotlin, along with C and C++ for native libraries. For example, let’s say you wanted to encrypt the data stored in your Android and iOS apps. If you found some 3rd party Android encryption libraries or SDKs for Java or Kotlin, they won’t necessarily work for the portion of your app that uses C or C++ code (native libraries). 

In iOS, it’s the same deal. You might visit Stack Overflow and find that the commonly used CryptoKit framework for Swift won’t work with Objective-C.

And what about non-native or cross-platform apps? These are an entirely different ballgame as you’re dealing with web technologies like JavaScript and non-native frameworks like React Native, Cordova, Flutter, or Xamarin which won’t work out of the box (or at all) with SDKs or libraries built for native languages. In addition, for non-native apps, you may not have access to the relevant source code files to implement encryption in the first place. 

For a real-world example of this problem, check out this Stack Overflow post by a developer who needs to build code obfuscation into an iOS app where there are multiple dependencies between React Native (a non-native framework) and Objective C (a native coding language). Because there is no built-in library in the iOS project that will obfuscate React Native code, the developer needs to use an external package (dependency #1). Furthermore, that external package has an additional downstream dependency on yet another library or package to obfuscate the JavaScript code (dependency #2). Now what happens if the developer of the 3rd party library decides to deprecate the solution? One of our customers was facing this very issue and it caused their app to fall out of PCI compliance. 

So how many developers do you think it would take to implement even a fraction of the features I just described? How long would it take? Do you have enough time to implement the required security features in your existing mobile app release process? 

DevOps is agile & automated, traditional security is monolithic & manual

Mobile apps are developed and released in a fast-paced, flexible, and highly automated agile paradigm. To make build and release faster and easier, most Android and iOS DevOps teams have optimized pipelines built around CI/CD and other automated tools. Security teams, on the other hand, do not have access to or visibility into DevOps systems, and most security tools are not built for agile methodologies because they rely heavily on manual programming or implementations, where an individual security feature may take longer to implement than the release schedule allows. 

In an attempt to bridge these shortfalls, some organizations use code scanning and pen testing before publishing apps to public app stores to provide insight into vulnerabilities and other mobile application concerns. When vulnerabilities are discovered, organizations face a difficult decision: release the app without the necessary protections, or delay the release to give the developers time to address the security issues. When this happens, the recommended security protections all too often get overlooked.

Developers aren’t lazy. The systems and tools they use for security implementation simply cannot match the rapid cadence of modern Agile / DevOps development.

Five steps for strong mobile app security and platform parity

Automation is the key to achieving security parity and strong mobile app security, in general. Here’s a five-step playbook for building mobile app security into apps during the app’s release cycle: 

Step 1: Understand clearly what security outcome is desired

The development, operations, and security teams must all agree on their expectations for mobile security. There needs to be a common understanding of the security goals that organizations can use as a starting point, such as the OWASP Mobile Top 10, the TRM Guidelines for Mobile App Security, and the Mobile AppSec Verification Standard (MASVS). Once the goals are set and the standards are chosen, all team members need to know how they will affect their workflows. 

Step 2: Mobile App Security implementations must be automated

Security is immensely complex, and coding it manually is slow and error-prone. Evaluate and take advantage of automated systems that leverage AI and machine learning (ML) to integrate security into a mobile app. Typically, these are no-code platforms, which can build security into mobile apps automatically, commonly known as a security-build system. 

Step 3: Include security as part of the development cycle – Shift-Left-Security

The shift-left approach to mobile app security says that mobile developers need to build the security features at the same time as they are building the app.

Once an automated security implementation platform is chosen, it should be integrated into the team’s continuous integration (CI) and continuous delivery (CD) processes, which will speed up the development lifecycle, and all teams — development, operations, and security — should continue to collaborate closely throughout the sprint.  Additionally, organizations can come closer to achieving platform parity by creating reusable mobile security templates for the specific security features required in each Android and iOS app.

Step 4: Ensure instant validation and verification 

Without a means to instantly verify that the required security features are included in the release, conflicts can arise at release meetings that may delay the publication of the app or its update. Verification and validation should be documented automatically to prevent last-minute release confusion.

Step 5: Keep security development to a fixed cost

Development teams need predictability and budget certainty. By taking an automated approach to security, app development teams can reduce unexpected changes in headcount and development expenses, because it eliminates the uncertainty inherent in coding security into mobile apps manually.

Conclusions

The problem of security parity is a big one, but it’s part of a larger problem: a general lack of security in mobile apps, period. By embracing automation for security implementation to the same or greater degree than it has been adopted for feature and function development, mobile app development organizations can ensure that every app they release for every platform will protect end-users and the publishers themselves from hackers, fraudsters, and cybercriminals. 



Amazon Unveils ML-Powered Coding Assistant CodeWhisperer

MMS Founder
MMS Daniel Dominguez

Article originally posted on InfoQ. Visit InfoQ

Amazon AWS recently launched CodeWhisperer, an ML-powered coding companion that provides code recommendations based on developers’ comments in natural language and their code in the integrated development environment (IDE). The machine-learning-powered service aims to increase developer productivity.

CodeWhisperer bases its recommendations on a variety of contextual cues, including the cursor’s placement in the source code, the code that comes before it, comments, and code from other files in the same project. The recommendations can be used exactly as-is or improved upon and altered as necessary. CodeWhisperer is trained on billions of lines of code from forums, internal Amazon repositories, open-source repositories, and API documentation.

According to Amazon, developers can use CodeWhisperer to accelerate development by merely adding a comment to the code in their IDE, rather than having to keep up to date with the many programming languages, frameworks, software libraries, and cloud services. With CodeWhisperer, developers can accelerate frontend and backend development with automatic code recommendations, save time and effort generating code to build and train ML models, speed up development with code recommendations for AWS APIs across the most popular services, including Amazon EC2, AWS Lambda, and Amazon S3, and offload writing repetitive unit test code.

CodeWhisperer also places a strong emphasis on security: it offers scans for Python and Java to assist programmers in finding vulnerabilities in their work and creating apps responsibly. Additionally, it has a reference tracker that can determine whether a code recommendation resembles a specific set of training data. Developers can then quickly locate the code example, examine it, and choose whether to use it in their project.

According to Amazon, CodeWhisperer was not created as an alternative to Copilot; the company laid the groundwork for the launch years ago with services like CodeGuru and DevOps Guru.

For the time being, CodeWhisperer is compatible with Python, Java, and JavaScript. According to Amazon, it interfaces with a variety of IDEs, including JetBrains, Visual Studio Code, AWS Cloud9, and the AWS Lambda console.

Developers who wish to test out Amazon’s new code completion tool can sign up for the waitlist by submitting a request form. Developers can install the AWS IDE Toolkit, activate the CodeWhisperer functionality, and begin using the tool after receiving a preview access code.



Article: API Friction Complicates Hunting for Cloud Vulnerabilities. SQL Makes it Simple

MMS Founder
MMS Jon Udell

Article originally posted on InfoQ. Visit InfoQ

Key Takeaways

  • Developers spend too much time and effort wrangling APIs. When APIs resolve automatically to database tables, it frees devs to focus on working with the data.
  • SQL is the universal language of data, and a great environment in which to model and reason over data that’s frictionlessly acquired from diverse APIs.
  • Postgres is ascendant: more than just a relational database, it has become a platform for managing all kinds of data.
  • SQL has evolved! With features like common table expressions (CTEs) and JSON columns, it’s more capable than you might think if you haven’t touched it in a while.
  • The ability to join across diverse APIs, in SQL, is a superpower that enables you to easily combine information from different sources.

Pen testers, compliance auditors, and other DevSecOps pros spend a lot of time writing scripts to query cloud infrastructure. Boto3, the AWS SDK for Python, is a popular way to query AWS APIs and reason over the data they return.

It gets the job done, but things get complicated when you need to query across many AWS accounts and regions. And that doesn’t begin to cover API access to other major clouds (Azure, GCP, Oracle Cloud), never mind services such as GitHub, Salesforce, Shodan, Slack, and Zendesk. Practitioners spend far too much time and effort acquiring data from such APIs, then normalizing it so the real work of analysis can begin.

What if you could query all the APIs, and reason over the data they return, in a common way? That’s what Steampipe is for. It’s an open-source Postgres-based engine that enables you to write SQL queries that indirectly call APIs within, across, and beyond the major clouds. This isn’t a data warehouse. The tables made from those API calls are transient; they reflect the live state of your infrastructure; you use SQL to ask and answer questions in real time.

The case study we’ll explore in this article shows how to use Steampipe to answer this question: Do any of our public EC2 instances have vulnerabilities detected by Shodan? The answer requires use of an AWS API to enumerate EC2 public IP addresses, and a Shodan API to check each of them.

In the conventional approach you’d find a programming-language wrapper for each API, learn the differing access patterns for each, then use that language to combine the results. With Steampipe it’s all just SQL. These two APIs, like all APIs supported by Steampipe’s suite of API plugins, resolve to tables in a Postgres database. You query within them, and join across them, using the same basic SQL constructs.

Figure 1 illustrates the cross-API join at the heart of our case study. The aws_ec2_instance table is one of the hundreds of tables that Steampipe builds by calling AWS APIs. The shodan_host table is, similarly, one of a dozen tables that Steampipe constructs from Shodan APIs. The SQL query joins the public_ip_address column of aws_ec2_instance to the ip column of shodan_host.

Figure 1: Joining the aws_ec2_instance and shodan_host tables
Before we dive into the case study, let’s look more closely at how Steampipe works. Here’s a high-level view of the architecture.

Figure 2: Steampipe architecture

To query APIs and reason over the results, a Steampipe user writes SQL queries and submits them to Postgres, using Steampipe’s own query console (Steampipe CLI) or any standard tool that connects to Postgres (psql, Metabase, etc). The key enhancements layered on top of Postgres are:

  • Postgres foreign data wrappers 
  • Per-API plugins
  • Connection aggregators

Postgres foreign data wrappers

Postgres has evolved far beyond its roots. Nowadays, thanks partly to a growing ecosystem of extensions that deeply customize the core, Postgres does more than you might think. Powerful extensions include PostGIS for geospatial data, pglogical to replicate over Kafka or RabbitMQ, and Citus for distributed operation and columnar storage.

One class of Postgres extension, the foreign data wrapper (FDW), creates tables from external data. Postgres bundles postgres_fdw to enable queries that span local and remote databases. When Steampipe runs, it launches an instance of Postgres that loads another kind of FDW, steampipe-postgres-fdw, an extension that creates foreign tables from APIs with the help of a suite of plugins.

These foreign tables typically map JSON results to simple column types: date, text, number. Sometimes, when an API response includes a complex JSON structure such as an AWS policy document, the result shows up in a JSONB column.
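
A minimal Python sketch of that mapping, using a hypothetical EC2-style API response: scalar fields become plain columns, while a complex nested structure stays JSON-encoded, as it would in a JSONB column.

```python
import json

# Hypothetical EC2-style API response: scalar fields map to simple
# columns, while the nested profile document stays as a JSON column.
api_response = {
    "InstanceId": "i-0518f0bd09a77d5d2",
    "State": {"Name": "stopped"},
    "IamInstanceProfile": {"Arn": "arn:aws:iam::123:instance-profile/web"},
}

def to_row(resp):
    """Map one API result to a flat row, JSON-encoding complex values."""
    return {
        "instance_id": resp["InstanceId"],           # text column
        "instance_state": resp["State"]["Name"],     # text column
        # complex structure kept whole, as a JSONB column would be
        "iam_instance_profile": json.dumps(resp["IamInstanceProfile"]),
    }

row = to_row(api_response)
print(row["instance_state"])  # stopped
```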

Per-API plugins

The plugins are written in Go, with the help of a plugin SDK that handles backoff/retry logic, data-type transformation, caching, and credentials. The SDK enables plugin authors to focus on an essential core task: mapping API results to database tables. 

These mappings may be one-to-one. The aws_ec2_instance table, for example, closely matches the underlying REST API.

In other cases it’s helpful to build tables that consolidate several APIs. A complete view of an S3 bucket, for example, joins the core S3 API with sub-APIs for ACLs, policies, replication, tags, versioning, and more. Plugin authors write hydrate functions to call these sub-APIs and merge their results into tables.
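
The hydrate pattern can be sketched roughly like this; the bucket fields and stub sub-API responses below are hypothetical stand-ins, not the real AWS calls:

```python
# Each hydrate function calls one sub-API and contributes columns that
# are merged into a single consolidated row for the foreign table.

def get_bucket_core(name):
    # stand-in for the core S3 list/get API
    return {"name": name, "region": "us-east-2"}

def hydrate_versioning(name):
    # stand-in for GetBucketVersioning
    return {"versioning_enabled": True}

def hydrate_tags(name):
    # stand-in for GetBucketTagging
    return {"tags": {"env": "prod"}}

def build_row(name, hydrators):
    row = get_bucket_core(name)
    for hydrate in hydrators:
        row.update(hydrate(name))   # merge each sub-API's columns
    return row

row = build_row("my-bucket", [hydrate_versioning, hydrate_tags])
print(sorted(row))  # ['name', 'region', 'tags', 'versioning_enabled']
```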

A basic Steampipe query

Here’s how you’d use Steampipe to list EC2 instances.

  1. Install Steampipe
  2. Install the AWS plugin: steampipe plugin install aws
  3. Configure the AWS plugin

The configuration relies on standard authentication methods: profiles, access keys and secrets, SSO. So authenticating Steampipe as a client of the AWS API is the same as for any other kind of client. With that done, here’s a query for EC2 instances.

Example 1: Listing EC2 instances

select
  account_id, 
  instance_id, 
  instance_state,
  region
from aws_ec2_instance;

+--------------+---------------------+----------------+-----------+
| account_id   | instance_id         | instance_state | region    |
+--------------+---------------------+----------------+-----------+
| 899206412154 | i-0518f0bd09a77d5d2 | stopped        | us-east-2 |
| 899206412154 | i-0e97f373db22dfa3f | stopped        | us-east-1 |
| 899206412154 | i-0a9ad4df00ffe0b75 | stopped        | us-east-1 |
| 605491513981 | i-06d8571f170181287 | running        | us-west-1 |
| 605491513981 | i-082b93e29569873bd | running        | us-west-1 |
| 605491513981 | i-02a4257fe2f08496f | stopped        | us-west-1 |
+--------------+---------------------+----------------+-----------+

The documentation for the referenced foreign table, aws_ec2_instance, provides a schema definition and example queries.

Connection aggregators

The above query finds instances across AWS accounts and regions without explicitly mentioning them, as a typical API client would need to do. That’s possible because the AWS plugin can be configured with an aggregator that combines accounts, along with wildcards for regions. In this example, two different AWS accounts – one using SSO authentication, the other using the access-key-and-secret method – combine as a unified target for queries like select * from aws_ec2_instance.

Example 2: Aggregating AWS connections

connection "aws_all" {
  plugin = "aws"
  type = "aggregator"
  connections = [ "aws_1", "aws_2" ]
}

connection "aws_1" {
  plugin    = "aws"
  profile = "SSO…981"
  regions = [ "*" ]
}

connection "aws_2" {
  plugin    = "aws"
  access_key  = "AKI…RNM"
  secret_key  = "0a…yEi"
  regions = [ "*" ]
}

This approach, which works for all Steampipe plugins, abstracts connection details and simplifies queries that span multiple connections. As we’ll see, it also creates opportunities for concurrent API access.

Case Study A: Use Shodan to find AWS vulnerabilities

Suppose you run public AWS endpoints and you want to use Shodan to check those endpoints for vulnerabilities. The logic is simple: enumerate the EC2 instances that have public IP addresses, then ask Shodan about each address.

A conventional solution in Python, or another language, requires you to learn and use two different APIs. There are libraries that wrap the raw APIs, but each has its own way of calling APIs and packaging results. 

Here’s how you might solve the problem with boto3.

Example 3: Find AWS vulnerabilities via Shodan, using boto3

import boto3
from shodan import Shodan

aws_1 = boto3.Session(profile_name='SSO…981')
aws_2 = boto3.Session(aws_access_key_id='AKI…RNM', aws_secret_access_key='0a2…yEi')
aws_all = [ aws_1, aws_2 ]
regions = [ 'us-east-2','us-west-1','us-east-1' ]

shodan = Shodan('h38…Cyv')

instances = {}

# collect the public IP address of every instance in every account/region
for aws_connection in aws_all:
  for region in regions:
    ec2 = aws_connection.resource('ec2', region_name=region)
    for i in ec2.instances.all():
      if i.public_ip_address is not None:
        instances[i.id] = i.public_ip_address

# ask Shodan about each public address
for k in instances.keys():
  try:
    data = shodan.host(instances[k])
    print(k, data['ports'], data['vulns'])
  except Exception as e:
    print(e)

When APIs are abstracted as SQL tables, though, you can ignore those details and distill the solution to its logical essence. Here’s how you use Steampipe to ask and answer the question: “Does Shodan find vulnerable public endpoints in any of my EC2 instances?”

Example 4: Find AWS vulnerabilities using Steampipe

select
  a.instance_id,
  s.ports,
  s.vulns
from
  aws_ec2_instance a
left join
  shodan_host s 
on 
  a.public_ip_address = s.ip
where
  a.public_ip_address is not null;

+---------------------+----------+--------------------+
| instance_id         | ports    | vulns              |
+---------------------+----------+--------------------+
| i-06d8571f170181287 |          |                    |
| i-0e97f373db42dfa3f | [22,111] | ["CVE-2018-15919"] |
+---------------------+----------+--------------------+

There’s no reference to either flavor of API; you just write SQL against Postgres tables that transiently store the results of implicit API calls. This isn’t just simpler; it’s also faster. The boto3 version takes 3-4 seconds to run for all regions of the two AWS accounts I’ve configured as per example 2. The Steampipe version takes about a second. When you’re working with dozens or hundreds of AWS accounts, that difference adds up quickly. What explains it? Steampipe is a highly concurrent API client.

Concurrency and caching

If you’ve defined an AWS connection that aggregates multiple accounts (per example 2), Steampipe queries all of them concurrently. And within each account it queries all specified regions concurrently. So while my initial use of the query in example 4 takes about a second, subsequent queries within the cache TTL (default: 5 minutes) take only milliseconds.
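
The fan-out can be sketched with a thread pool; `list_instances` below is a hypothetical stand-in for a per-account, per-region API call, not Steampipe's actual plugin code:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical stand-in for one per-account, per-region API call.
def list_instances(account, region):
    return [f"{account}/{region}/i-001"]

accounts = ["aws_1", "aws_2"]
regions = ["us-east-2", "us-west-1", "us-east-1"]

# Fan out every (account, region) pair to the thread pool at once,
# rather than looping over them serially as the boto3 version does.
with ThreadPoolExecutor(max_workers=8) as pool:
    batches = pool.map(lambda pair: list_instances(*pair),
                       product(accounts, regions))
    instances = [i for batch in batches for i in batch]

print(len(instances))  # 6: 2 accounts x 3 regions
```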

It’s often possible, as in this case, to repeat the query with more or different columns and still satisfy the query in milliseconds from cache. That’s because the aws_ec2_instance table is made from the results of a single AWS API call.
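
The caching behavior amounts to a TTL cache keyed by table. A minimal sketch (Steampipe's real implementation is more sophisticated; 300 seconds mirrors the default 5-minute TTL):

```python
import time

CACHE = {}
TTL_SECONDS = 300  # mirrors Steampipe's default 5-minute cache TTL

def cached_call(key, fetch):
    entry = CACHE.get(key)
    if entry and time.monotonic() - entry[0] < TTL_SECONDS:
        return entry[1]                      # cache hit: milliseconds
    result = fetch()                         # cache miss: a real API call
    CACHE[key] = (time.monotonic(), result)
    return result

calls = []
fetch = lambda: calls.append(1) or ["i-001"]  # counts real API hits
cached_call("aws_ec2_instance", fetch)
cached_call("aws_ec2_instance", fetch)        # served from cache
print(len(calls))  # 1: the API was only hit once
```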

In other cases, like the aws_s3_bucket table, Steampipe synthesizes many S3 sub-API calls including GetBucketVersioning, GetBucketTagging, and GetBucketReplication. And it makes those calls concurrently too. Like any other API client, Steampipe is subject to rate limits. But it’s aggressively concurrent so you can quickly assess large swaths of cloud infrastructure. 

Note that when using a table like aws_s3_bucket, it’s helpful to request only the columns you need. If you really want everything, you can select * from aws_s3_bucket. But if you only care about account_id, instance_id, instance_state, and region, then asking explicitly for those columns (as per example 1) avoids unnecessary sub-API calls.
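
The idea can be sketched as a lookup from requested columns to required sub-API calls; the mapping below is illustrative, not Steampipe's actual column-to-hydrate wiring:

```python
# Hypothetical mapping from bucket columns to the sub-APIs that
# populate them; core columns like name and region need no sub-API.
SUB_APIS = {
    "versioning_enabled": "GetBucketVersioning",
    "tags": "GetBucketTagging",
    "replication": "GetBucketReplication",
}

def calls_needed(requested_columns):
    """Return which sub-API calls a query for these columns requires."""
    return sorted({SUB_APIS[c] for c in requested_columns if c in SUB_APIS})

print(calls_needed(["name", "region"]))  # []: no sub-API calls at all
print(calls_needed(["name", "tags"]))    # ['GetBucketTagging'] only
```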

Case Study B: Find GCP vulnerabilities

If your endpoints only live in AWS, example 4 solves the problem neatly. Now let’s add GCP to the mix. A conventional solution requires that you install another API client, such as the Google Cloud Python Client, and learn how to use it.

With Steampipe you just install another plugin: steampipe plugin install gcp. It works just like the AWS plugin: it calls APIs and puts results into foreign tables that abstract API details so you can focus on the logic of your solution.

In this case that logic differs slightly. In AWS, public_ip_address is a core column of the aws_ec2_instance table. In GCP you need to combine results from one API that queries compute instances, and another that queries network addresses. Steampipe abstracts these as two tables: gcp_compute_instance and gcp_compute_address. The solution joins them, then joins that result to Shodan as in example 4. 

Example 5: Find GCP vulnerabilities using Steampipe

with gcp_info as (
  select 
    i.id,
    a.address
  from
    gcp_compute_address a
  join
    gcp_compute_instance i
  on 
    a.users->>0 = i.self_link
  where
    a.address_type = 'EXTERNAL'
  order by
    i.id
)
select
  g.id as instance_id,
  s.ports,
  s.vulns
from 
  gcp_info g
left join
  shodan_host s on g.address = s.ip;

This query makes use of two language features that can surprise people who haven’t looked at SQL in a long while. The WITH clause defines a common table expression (CTE): a transient, named result set. Queries written as a pipeline of CTEs are easier to read and debug than monolithic queries.

a.users is a JSONB column, and the ->> operator extracts its zeroth element as text. Now that JSON is a first-class citizen of the database, relational and object styles mix comfortably. That’s especially helpful when mapping JSON-returning APIs to database tables. Plugin authors can move some pieces of API data into conventional columns and others into JSONB columns. How to decide what goes where? That requires an artful balance of concerns, but the key point is that modern SQL enables flexible data modeling.
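
As a rough Python analogy for the `->>` extraction (the users value below is made up for illustration):

```python
import json

# The Postgres expression  a.users->>0  extracts the zeroth element of a
# JSON array as text. A hypothetical value for the users column of
# gcp_compute_address, and its Python equivalent:
users = json.dumps(
    ["https://www.googleapis.com/compute/v1/projects/p/zones/z/instances/vm-1"]
)

zeroth = json.loads(users)[0]   # what users->>0 would return
print(zeroth)
```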

Case Study C: Find vulnerabilities across clouds

If you’ve got public endpoints in both AWS and GCP, you’ll want to combine the queries we’ve seen so far. And now you know everything you need to know to do that.

Example 6: Find AWS and GCP vulnerabilities

with aws_vulns as (
  -- insert example 4
),
gcp_vulns as (
  -- insert example 5
)

select * from aws_vulns
union
select * from gcp_vulns;

+-------+---------------------+----------+--------------------+
| cloud | instance_id         | ports    | vulns              |
+-------+---------------------+----------+--------------------+
| aws   | i-06d8571f170181287 |          |                    |
| aws   | i-0e97f373db42dfa3f | [22,111] | ["CVE-2018-15919"] |
| gcp   | 8787684467241372276 |          |                    |
+-------+---------------------+----------+--------------------+

We’ve arranged example 4 and example 5 as a CTE pipeline, adding a literal cloud column to each so the combined rows are labeled by source. To combine them requires nothing more than a good old-fashioned SQL UNION.

You also now know everything you need to know to expand the pipeline with CTEs for the Oracle or IBM clouds. While you’re at it, you might want to bring more than just Shodan’s knowledge to bear on your public IP addresses. There are plugins that do reverse DNS lookup, map IP addresses to geographic locations, and check addresses for reported malicious activity. Each of these maps another API that you don’t need to learn how to use, models it as a collection of database tables, and enables you to work with it using the same basic SQL constructs you’ve seen here.

It’s just Postgres

We’ve said that Steampipe isn’t a data warehouse, and that API-sourced tables remain cached for only a short while. The system is optimized for rapid assessment of cloud infrastructure in real time. But Steampipe is just Postgres, and you can use it in all the same ways. So if you need to persist that realtime data, you can.

Example 7: Persist a query as a table

create table aws_and_gcp_vulns as 
  -- insert example 6 

Example 8: Persist a query as a materialized view

create materialized view aws_and_gcp_vulns as 

  -- insert example 6
  -- then, periodically: refresh materialized view aws_and_gcp_vulns

Example 9: Pull query results into Python

import psycopg2, psycopg2.extras
conn = psycopg2.connect('dbname=steampipe user=steampipe host=localhost port=9193')
cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
cursor.execute('select * from aws_and_gcp_vulns')
for row in cursor.fetchall():
  print(row['cloud'], row['instance_id'], row['vulns'])

Example 10: Connect with psql

psql -h localhost -p 9193 -d steampipe -U steampipe

You can use the same connection details to connect from Metabase, or Tableau, or any other Postgres-compatible tool. 

Bottom line: Steampipe’s API wrangling augments the entire Postgres ecosystem. 

Skip the API grunt work, just do your job

For a DevSecOps practitioner the job might be to inventory cloud resources, check for security vulnerabilities, or audit for compliance. It all requires data from cloud APIs, and acquiring that data in a tractable form typically costs far too much time and effort. With fast and frictionless access to APIs, and a common environment in which to reason over the data they return, you can focus on the real work of doing inventory, security checks, and audits. The requisite API wrangling is a distraction you and your organization can ill afford. Don’t let it get in the way of doing your real jobs, which are plenty hard enough even when you have the data you need.



VMware vSphere+ and vSAN+ Promise to Bring the Benefits of the Cloud to On-Premises Workloads

MMS Founder
MMS Sergio De Simone

Article originally posted on InfoQ. Visit InfoQ

Recently announced, VMware vSphere+ and vSAN+ integrate Kubernetes with VMware virtualization technology to help transform on-premises workloads into SaaS-enabled infrastructure and simplify its management and evolution, says VMware.

On-premises deployments have a number of benefits, including locality, low latency, performance, and predictable cost. Where they fall short is usually on the side of flexibility and maintenance.

In many instances, customers’ vSphere environments are distributed across siloed locations, edge sites, and clouds, leading to operational complexity and an inefficient maintenance experience.

According to VMware, vSphere+ makes it possible to provision infrastructure on-premises with the same ease as in the cloud, for example by scaling services in and out based on demand. Central to vSphere+ is the integration between vCenter and the Cloud Console, which enables metadata collection and management in a centralized location.

This is made possible by vSAN+, which delivers vSAN storage services for on-premises deployments and represents the connection point between vCenter instances and the VMware Cloud for centralized management. Thanks to this connection, you can use higher-level services to access your on-premises as well as Cloud deployments, including admin, developer, and add-on services.

Admin services aim to simplify and streamline the overall management of the system, including for example lifecycle management to distribute and install updates; an inventory service to track all available resources such as clusters, hosts, VMs, and so on; an event viewer for alerts and other kinds of events; VM provisioning, to quickly create new VMs, and more.

Developer services, says VMware, bring the integration of vSphere with Kubernetes beyond what is available in VMware Tanzu, enabling the unification of VMs and Kubernetes containers. This means, for example, that you can create VMs using Kubernetes commands and APIs, run containerized apps using a Kubernetes distribution integrated with vSphere, manage network connectivity for VMs and Kubernetes workloads, and so on.

Finally, add-on services provide extended capabilities, such as VMware Cloud Disaster Recovery, a solution to protect and recover mission-critical applications, which will be available soon.

VMware says they have defined an incremental and non-disruptive way to adopt vSphere+ which does not require migrating or moving any vCenter instances, both for vSphere and vSphere Enterprise Plus customers.



Adobe Researchers Open-Source Image Captioning AI CLIP-S

MMS Founder
MMS Anthony Alford

Article originally posted on InfoQ. Visit InfoQ

Researchers from Adobe and the University of North Carolina (UNC) have open-sourced CLIP-S, an image-captioning AI model that produces fine-grained descriptions of images. In evaluations with captions generated by other models, human judges preferred those generated by CLIP-S a majority of the time.

The model and experiments were described in a paper submitted to the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). CLIP-S uses a Transformer model to generate captions given an input image. During training, the model uses CLIP to determine how well the generated caption describes the image; this score is used as a reward signal for reinforcement learning (RL). To improve the grammar of the generated captions, the team fine-tuned CLIP with negative caption examples, which were generated by randomly modifying reference captions. To address the shortcomings of existing image-captioning evaluation methods, the team also developed a new benchmark dataset, FineCapEval, which includes more fine-grained image captions describing image backgrounds and relations between objects. According to the research team,

The reference captions of public datasets often describe only the most prominent objects in the images. This makes models trained to maximize textual similarity with reference captions tend to generate less distinctive captions that ignore the fine detailed aspects of an image that distinguishes it from others.

Many image captioning models are trained on datasets consisting of input images and reference captions; the training objective measures the similarity of the generated caption to the reference caption, using metrics such as BLEU. However, this often results in models that generate generic captions that describe only the prominent objects in the image, ignoring fine details that make the image distinctive.

To address this problem, the Adobe team chose to use OpenAI’s CLIP model to measure the accuracy of the generated captions. CLIP measures the similarity between an image and a text string; the more closely the text describes the image, the higher the similarity. The researchers used this CLIP score to create a reward function, CLIP-S, for RL training to produce their captioning model.
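
As a rough sketch of that reward idea, with stand-in vectors rather than real CLIP embeddings: embed the image and a candidate caption, then use their cosine similarity as the RL reward.

```python
import math

# Stand-in sketch: cosine similarity between an image embedding and a
# caption embedding serves as the reward. Vectors below are made up,
# not real CLIP output.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

image_embedding = [0.2, 0.9, 0.1]
good_caption = [0.25, 0.85, 0.05]    # closely describes the image
generic_caption = [0.9, 0.1, 0.4]    # generic, poorly aligned

reward_good = cosine_similarity(image_embedding, good_caption)
reward_generic = cosine_similarity(image_embedding, generic_caption)
assert reward_good > reward_generic  # better captions earn higher reward
```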

However, the team found that this model often generated grammatically incorrect captions, for example, by repeating words: “several rows of planes parked outside a terminal window area with fog outside a terminal window motion position area motion.” Their solution was to fine-tune the text-encoder portion of CLIP, by providing negative examples with randomly repeated, inserted, or shuffled tokens. They also introduced a two-layer perceptron classifier head that detects whether a sentence is grammatically correct, training this jointly with the text-encoder fine-tuning.
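
A hedged sketch of that negative-example generation; the paper's exact perturbation scheme may differ, but the idea is to corrupt a reference caption by repeating or shuffling its tokens.

```python
import random

# Produce a grammatically-broken negative caption from a reference
# caption by either duplicating a random token or scrambling the order.
def make_negative(caption, op, rng):
    tokens = caption.split()
    if op == "repeat":
        i = rng.randrange(len(tokens))
        tokens.insert(i, tokens[i])      # duplicate a random token
    elif op == "shuffle":
        rng.shuffle(tokens)              # scramble word order
    return " ".join(tokens)

rng = random.Random(0)
print(make_negative("several rows of planes parked outside a terminal", "repeat", rng))
```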

The team also created FineCapEval, a new benchmark dataset for evaluating fine-grained image captioning models. This dataset contains 500 images from the MS COCO test split and the Conceptual Captions validation split. For each image, five human workers wrote descriptions of: the image background; the objects in the image, including shape and color; the relationships among the objects, such as spatial relationships; and a detailed caption including all the first three aspects. The dataset contains a total of 1k images with 5k captions for each of those four criteria.

To evaluate their model, the team compared its captions to those from several baseline models, using the COCO dataset as a benchmark. Although a baseline model outperformed CLIP-S on text-based metrics such as BLEU, CLIP-S outperformed on image-text based metrics as well as text-to-image retrieval metrics. It also “significantly” outperformed baselines on the team’s new FineCapEval benchmark. Finally, human judges “strongly” preferred captions generated by CLIP-S to those generated by baseline models.

Multimodal image-text AI models are an active research topic. InfoQ recently reported on DeepMind’s Flamingo model, which exhibits state-of-the-art few-shot learning capability on several image-text tasks, including image captioning. Last year InfoQ reported on Google’s ALIGN model and on AliBaba’s M6 model, both of which can perform a variety of image-text tasks.

The CLIP-S code and the FineCapEval dataset are available on GitHub.



Java News Roundup: Payara Platform, JReleaser, Quarkus, Hibernate, Spring Cloud, Apache Beam

MMS Founder
MMS Michael Redlich

Article originally posted on InfoQ. Visit InfoQ

It was relatively quiet in the Java community during the week of June 27th, 2022, which nevertheless featured news from JDK 19, JDK 20, Spring Cloud 2020.0.6, Quarkus 2.10.1, Payara Platform Enterprise 5.40.0, JReleaser 1.1.0, Hibernate ORM 6.1.1, Apache Beam 2.40.0, and Apache Camel 3.14.4.

JDK 19

Build 29 of the JDK 19 early-access builds was made available this past week, featuring updates from Build 28 that include fixes to various issues. More details may be found in the release notes.

JDK 20

Build 4 of the JDK 20 early-access builds was also made available this past week, featuring updates from Build 3 that include fixes to various issues. Release notes are not yet available.

For JDK 19 and JDK 20, developers are encouraged to report bugs via the Java Bug Database.

Spring Framework

Spring Cloud 2020.0.6 has been released, delivering bug fixes and upgrades to all of the Spring Cloud subprojects, notably Spring Cloud Commons, Spring Cloud OpenFeign, and Spring Cloud Netflix. This release also backports fixes for various issues related to the 2021.0 release train. More details on this release may be found in the release notes.

Quarkus

One week after the release of Quarkus 2.10.0, Red Hat has provided a maintenance release, Quarkus 2.10.1.Final, that ships with bug fixes and documentation improvements along with dependency upgrades such as SmallRye Fault Tolerance 5.4.1, Keycloak 18.0.1, Scala Maven Plugin 4.6.3, and Flyway 8.5.13. Further details on this release may be found in the changelog.

Payara

Payara has released the June 2022 edition of their Payara Platform as an enterprise-only release. Payara Platform Enterprise 5.40.0 edition delivers three bug fixes, one component upgrade and two improvements that include: enhancements to the Jakarta Concurrency 3.0 specification that increase the functionality of the ManagedExecutorService interface; improvements in the Enterprise edition documentation; increased security and stability; and a dependency upgrade to Smack 4.4.6. This release also includes backports for Payara 5 Enterprise. More details on this release may be found in the release notes.

JReleaser

Version 1.1.0 of JReleaser, a Java utility that streamlines creating project releases, has been made available featuring: adding active properties to the assemble, announce and download sections; an option to download assets required for assembly or release; authentication to HTTP; and FTP support for download and upload. Further details on this release may be found in the changelog.

Hibernate

Hibernate ORM 6.1.1.Final, a maintenance release, was made available featuring bug fixes, a memory optimization of the resolveDirtyAttributeIndexes() method in the AbstractEntityPersister class, and lifting of the limitation in selecting to-one associations with embedded IDs or ID classes.

Apache Beam

The Apache Software Foundation has released Apache Beam 2.40.0 that ships with: new features targeted to the Go SDK; a dependency upgrade to Apache Hive 3.1.3; and a new RunInference API, a machine learning inference for Apache Beam. Breaking changes include a minimal requirement of Go SDK 1.18 to support generics. More details on this release may be found in the release notes and a more in-depth introduction to Apache Beam may be found in this InfoQ technical article.

Apache Camel

Apache Camel 3.14.4 has been released featuring bug fixes, a module upgrade to camel-spring-boot 2.6.8, a dependency upgrade to Jakarta Mail 1.6.7, and a correction to a wrong definition in the camel-azure-storage-datalake feature within the camel-karaf module. Further details on this release may be found in the release notes.



Podcast: Jessica Kerr on Software Teams and Software Products as Learning Systems

MMS Founder
MMS Jessica Kerr

Article originally posted on InfoQ. Visit InfoQ

Subscribe on:






Transcript

Shane Hastie: Good day folks. This is Shane Hastie for the InfoQ Engineering Culture Podcast. Today I’m sitting down across the miles with Jessica Kerr. Jessica, welcome. Thanks for taking the time to talk to us today.

Jessica Kerr: Thank you, Shane. It’s great to talk with you.

Shane Hastie: If I knew, we could be here.

Jessica Kerr: Right. Exactly. It’s fine. It’s fine. We have removed a lot of disincentives to connect with people around the world.

Shane Hastie: We have indeed. The silver lining is this opportunity to at least be in the same virtual space with more people more frequently.

Jessica Kerr: Yes.

Introductions [00:35]

Shane Hastie: So a lot of our audience are probably aware of your content on InfoQ. You’re a frequent contributor to QCon conferences, and there’s a lot of stuff that we’ve got that you have both written and contributed to and spoken about, but there are some that probably don’t know. So give us the one minute overview. Who’s Jessica?

Jessica Kerr: Okay. A lot of that content on InfoQ is various languages. So I’ve spoken about Scala, Clojure, Ruby, Elm, TypeScript, Java. There’s more. I almost feel like I kind of do a survey of the industry sometimes. But more recently I’ve really gotten into wider systems. So over the last few years, I’ve keynoted several conferences with my best talk. This one’s really good. If you watch nothing else of mine, this one matters because it’s about symmathesy: a learning system made of learning parts. Once I found this word, I really see that in our software teams in particular, and more in software teams than in other symmathesies. So every forest is a symmathesy. Every ecosystem is a system that as a whole grows and changes because all of its parts are growing and changing. And every team is a symmathesy, because its people are constantly learning and that changes our interactions and changes how the team works. But software even more so, because the code is learning, because we teach it and we can learn from our code.

The importance of accelerating learning in software development [02:06]

Jessica Kerr: I mean, we already do in a lot of ways from tests and logs and databases. Much more so if you have good observability, that’s all about learning from your code. But what matters is that we are learning systems. And in particular, with software, we get to really accelerate this learning and accelerate its impact on the world because software changes the world. It changes what happens when I push a button on the screen. It changes physics in that sense. It changes the world that many people live in so we have this huge impact and these learning systems, and we have not figured out how to do this well. So over the last couple years, I’ve done systems thinking workshops with Kent Beck. I did domain driven design with Eric Evans, because that has a lot of domain driven design and the language and it’s about the learning about the business that we infuse into the software being so important.

The value of observability in software as it enables rapid learning [03:02]

Jessica Kerr: And now I work at Honeycomb, which is the OG of observability, state of the art of observability I think. And that’s also about learning from our systems. So yeah, there’s a lot for us to know. The beauty of this field is that we haven’t figured it out. And we get to experiment because software teams move so much faster, have an impact so much faster and we can learn so much faster than, well, at any other industry that I know of, which isn’t saying much. I’m sure there are others that I’m just not aware of.

Shane Hastie: So observability, this is the culture and methods space. This is the Engineering Culture Podcast. Observability into culture, how do we get that right? And how do we use that?

Observability into culture – don’t measure the easy things [03:47]

Jessica Kerr: So first of all, observability is being able to see inside a system to see what it’s doing and why. So to set up the analogy, in software that means getting every service to emit traces to say, “Hey, I got this request and I got this request because this other service sent it to me.” And then we can weave those together and you can get a distributed trace, which is nice. But it’s about the software being able to say, “I made this decision because this attribute and that attribute and blah, blah, blah, blah, blah.” It is data emitted purely for us as operators to get a better understanding of the software that we’re coexisting with, co-evolving with even.

Jessica Kerr: So how do we do that in our teams? How do we get a view of what’s happening? And this overlaps with legibility. How do we make our teams report out such that we can understand what’s happening at scale? And when you’re talking about this with software, it’s easy, not always fast, but you insert some code to do that. In our teams I think there’s a lot that changes based on the questions we ask. As a manager of a team, for instance, or as a lead or as a product person, we’re probably asking, “Is it done yet?” Not literally. We’re probably asking, “How’s it going? What’s getting behind?” But what we really mean is, “When is it going to be done?” What else do we ask about? Do we ask about, “Oh, were there any security considerations with that change you just made?” Do we ask about, “Oh, how’s the error handling for that? How does this impact the user experience? And did the testers find any bits of it frustrating?” If you ask about those things, then you’re going to find out about them.

And these are all emergent system properties that we care about. We care about the software staying up. We care about it not letting hackers do something we didn’t want it to do. We care about people who use it not getting frustrated and having a delightful experience. None of which are, “When is it going to be done?” So it starts with what you ask about. You can also look for clues. Sometimes you can make the software give you those clues, like a big influencer over both stability and speed of delivery, all the DORA metrics. So mean time to recover. No, not mean time to recover. Time to recover, because mean is a BS metric on an asymmetrical distribution. Also, change failure rates, deploy frequency, and lead time to production. So how long between commit and available in prod? And if you look at even the easy ones to measure, which are lead time, how long do your builds take and how long does your code review take and then your next build and then how long till the deploy? You can measure that. And deploy frequency, you can definitely measure.

If you just look at those, you’re getting a clue. Especially if you look at the change in those. As we grow, are we getting more deploys or are deploys getting more scary and people are doing fewer of them? Major danger alert. Right, so there are some numbers that you can look at as an indicator, but I really want to discourage people from looking at the numbers that are easy to get, whatever JIRA will spit out for you because often what’s easy to measure is not what’s most important.
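The point above, that the mean is "a BS metric on an asymmetrical distribution", is easy to demonstrate with a small sketch. The recovery times below are hypothetical, and the class name is mine:

```java
import java.util.Arrays;

// Why "mean time to recover" misleads on a skewed distribution: four quick
// recoveries plus one all-day outage (values in minutes, hypothetical).
public class RecoveryStats {

    public static double mean(long[] xs) {
        return Arrays.stream(xs).average().orElse(0);
    }

    public static double median(long[] xs) {
        long[] s = xs.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        long[] recoveryMinutes = {5, 7, 9, 11, 480};
        System.out.println("mean   = " + mean(recoveryMinutes));   // 102.4, dominated by the outlier
        System.out.println("median = " + median(recoveryMinutes)); // 9.0, the typical recovery
    }
}
```

The mean suggests recovery takes over an hour and a half; the median shows the typical incident is resolved in about ten minutes, which is why the one long outage deserves its own investigation rather than being averaged away.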

You can count the number of tickets a team has completed in a week. But what does that say about the… We usually say code quality. What I really think we mean is the malleability of the code that they’re writing. Are we going to be able to change it in the future? What does it say about security? Have we updated our libraries lately? There’s a lot more to it. Is our team becoming more or less able to work together? Are the new people on the team getting their skill level up closer to the experienced people? Are they able to kind of even out the work? Or are the experienced people changing the code so fast that the new people are just floundering? There is no sufficiently smart engineer to get integrated into a code base that’s changing under them without a lot of help, a lot of pairing usually.

Things you can’t measure but you can notice [07:46]

Jessica Kerr: So some of these things you can’t measure, but you can notice. Notice conversations, notice who is super helpful, questions in slack, notice who is writing the documentation. Who is doing that glue work of answering questions from customer support and maintaining relationships with other teams and digging into, “Hey, there’s this field we need to add to the API”? Who is skipping ahead and adding the field and just making a guess at the value and calling it closed? Who is deeply investigating what does this data mean, what security validations do I need on it, where do I get it, is it safe to store, is it safe to log and really gaining business domain knowledge? There’s a lot. And most of it you can’t measure, but you can ask about and you can notice these things if you try. So a lot of observability is about consciously deciding what’s important and opening your eyes and ears for it.

Shane Hastie: The stereotypical technical lead hasn’t been trained in observing culture in teams.

Jessica Kerr: Oh, that’s so true.

Shane Hastie: How do we help them?

The need for technical leaders to build empathy [08:54]

Jessica Kerr: Well, first, do they want to be helped? Because if they don’t want to be helped, we can’t help them. How do you help a technical lead acquire this kind of empathy? I don’t know. That’s something I would look to. I think Sarah Mae and other people are working on that question. I don’t… Okay. The only thing from my perspective that I can contribute is that as you get to be a technical lead… Or no. The reason you get to be a technical lead is because you’re thinking about the system more widely. So as a junior dev, you solve the puzzles that are presented to you by more senior members of the team who are doing that thing of investigating what is this field that I’m asking you to add to the API and what requirements are around it and what should we watch out for? And then as you get to be like a mid-level dev, you understand the whole piece of software that you’re working on, or at least know where to go to get the information and you start to have familiarity with adjacent systems with your own interfaces.

And then as a team lead, you should be thinking about at least all of those adjacent systems and the ones that might be adjacent in the future and caring about the impact that our changes have on software that we talk to and teams that we talk to. So that is a widening. The trick is that when you widen your view of the system, you need to include the people because that software doesn’t get that way by itself and it doesn’t stay that way by itself. Oh, oh, Charity Majors has a great one. An individual developer can write software, but not deliver it. The unit of delivery is a team. And I think that’s really important because as a developer, I don’t want to just deliver features; that’s not in fact useful to the world. My objective as a software team is to provide valued capabilities to customers. And that involves coding the software to provide those capabilities. It also involves that software being up, that software continuing to be malleable and secure, and a lot of different things that are delivery and operating that software, not just writing code.

Shane Hastie: Changing direction, what are the limitations of business aligned product teams?

The risks of business aligned product teams [11:05]

Jessica Kerr: So business aligned product teams are all the rage right now. People want product centricity. Project to product is the next agile, which is great because software’s not a project. It’s not “Deliver this feature.” It is a product. It is an ongoing providing of a capability. But then where do you set the team’s responsibility? This can go everywhere from, “We will tell you what capabilities you provide. And then we will ask you to provide more as time goes on” to “You own the product and your job is to provide business value, read: money, to the business with your product and you have complete autonomy over that.” The word autonomy implies responsibility for the everything else in the system that really it takes a human to perceive. And when you go to the extreme of that, when each team is responsible for providing business value, how do you account for the value that one team adds to another?

So if you have like… I don’t have an example off the top of my head. But if you have one team responsible for maybe it’s a travel site and one team is in charge of selling flights, another team is in charge of selling hotels, and another team is in charge of selling rental cars and you want each of these to be profitable, okay, that makes sense but I could do this in a couple different ways. I could make the hotel part of the page so obnoxious that you focus on that and ignore the flights hypothetically or I could make the hotels part of the page direct you to flights, or I could make the flights part be like, “Oh, here are hotels that are available for that date range.” We can make these things work together or conflict with each other if each team has a number that it’s responsible for. That number could be money. It could also be increased engagement, more clicks or something.

Then there’s nothing to stop them from competing. How do we measure the systemic effects of your team? And also how do we increase your ability to provide capabilities by having a self-service platform? Platform teams are definitely one of the key teams in Team Topologies. I love the book Team Topologies. But how do you justify that when each team is supposed to make a fixed amount of money? I think we are not good at measuring systemic contributions. I don’t have an answer for you on that for how do we do that. We can notice them. We can notice systemic contributions, but if we’re data driven, then we’re going to reward the teams that are hogging the page space or the load time or whatever. This is why I don’t like data driven work. I like data informed decisions.

Shane Hastie: So let’s pick that one apart if we can. Data driven is very fashionable and it’s very easy. How do we interpret that from data to knowledge?

Make data informed decisions, don’t be data driven [13:59]

Jessica Kerr: Right. From data to useful information, which we can then use to decide on useful action. Yeah, when I hear data driven, I just think it’s not my fault. This is, “Blame the data. It wasn’t me.” But data informed means we turn that data into knowledge and then we put it in context, because when you look at a number like clicks or “Did people spend how much time with their mouse over this part of the page?” indicating some level of attention, we can get focused on the number. But the thing is that that has taken the data out of context. This is a property of all metrics. Everything that is legibility, everything that we can add up and sum and aggregate and divide and blah, blah, blah, it’s all out of context. So you don’t know whether my mouse was remaining over this page because I was reading it and pointing to it or because actually I just got distracted and went somewhere else.

So when we look at that metric as information… First of all, if you ask a team to focus on it, then they’ll naturally game it. You’ve asked them to. And then it becomes not information. But if we haven’t done that and we have this information of, “I observed that people have their mouse over the hotel portion of the page more than the flight portion of the page,” we can ask why. We could do a little user research and maybe it’s because the hotel part is more confusing. Maybe it would help to combine that information with how many people are smoothly making it through reserving a hotel. How many people give up on the page while they’re looking at the hotel part and leave versus engaging further? Funnels and drop offs are all attempts to go a little deeper. And then you have the extreme and you can use Intercom to record everything that a user does on the screen and try to get ultra details. And maybe you want to sample that a little bit, but also ask people.

Customer support is really good for this if you can just ask them what users struggle with. Yeah, so a little bit of context can go a long way in turning data into actionable information. Maybe the action we want to take is actually reduce the time they spend on the hotel portion by making it clearer, to increase their real engagement of actually reserving a hotel rather than their difficulty, their time spent. And the problem with this is, it is absolutely different in every case. I mean, you can learn heuristics. You can learn heuristics of when I see a number, I always ask why and see if people know. Always have more than one number. For instance, if you have an OKR, always have multiple key results per objective. It keeps us from narrowing in on the beauty of a number, the value clarity that we get by that pristine, precise definition of good, which is also garbage because it’s ignoring the everything else, all the emergent properties of we actually want people to enjoy being on our site or whatever.

Shane Hastie: Emergent properties.

Many qualities of a software system are emergent properties [17:06]

Jessica Kerr: Right. Emergent properties are properties of a system that are not isolated to one part. They exist as the result of interactions between the system of all the parts together. For instance, availability is an emergent property of software. It means not going down. It means no part of the system is crashing and taking everything else out. And all parts of the system are dealing with the errors that do happen. Security is an emergent property. It means we’re not doing anything we’re not supposed to do. Really tough one. User experience is highly emergent because it’s about the consistency, the expectations that you set up for people, and then how you fulfill those. Super dependent on all the different parts working smoothly together.

I like a lot of decoupling in the back end. We really want to decouple our code. But in the front end, there’s a problem because every part of the front end is coupled at the user, at the person who’s looking at both of those parts. So yes, we really do need that UI to be consistent even though I would love for the teams to be able to change at different rates. Very tricky.

Yeah, so these emergent properties are what make our software valuable. They would allow it to provide capabilities to our internal or external customers. But we can’t measure them directly. We can only get little clues. You can measure uptime and call that availability. But really, if your learning platform is down at midnight in the time zone of the university that is using it, it might impact some students who are trying to hit a last minute deadline. But if it’s down at 10:00 AM when professors are trying to give tests, that’s a much bigger thing. That’s a different level of availability. So it’s better to measure events and which ones were good and which ones were bad than to measure uptime. But it’s still just a clue. All of these numbers are clues. And if we treat them like that, they’re information. And if we treat them as a goal, there is some use to that and there is danger. There’s danger in spoiling the information. And more importantly, there’s danger in trampling over these emergent properties.

Shane Hastie: Shifting away from the software, culture is an emergent property of the teams and the organization.

Culture is an emergent property of teams and organisations and you can only shift it slowly [19:26]

Jessica Kerr: Yes. Culture is the sum of everything we actually do and actually say. I struggle to think of culture as a property. I feel like we are just putting a label on something that is many things. But yeah, that is the word that we use to describe the overall feeling of a place, what is acceptable there.

Shane Hastie: So if I want to change some of these elements, the big ones at the moment, diversity and inclusion, consciousness of the impact that we are having on society as a whole…

Jessica Kerr: The part where, “Do we value security in this organization?” for instance. It’s rarely stated in your quarterly goals, but some managers ask about it and some teams always take it into consideration. That’s totally a culture thing. My theory about culture is that you don’t change it. You do shift it. Culture is constantly changing. It’s changing itself. Can you shift the direction of that change? And if you think about it that way, that you’re trying to shift the direction of the change that’s always happening, then you recognize that it has to be slow. You can’t get diversity and inclusion. You can only slowly shift the trend for more or less. For, “Do we think about diversity and inclusion when we have a meeting, when we talk over each other? Do we think about it when we’re…” Hiring is the obvious one, because hiring is one way to shift the culture, but not really, because everyone you hire will immediately be absorbed into the much wider system.

Shifting toward caring about security could be, “Do you ask about it?” And how high up the organization do those questions go? What abilities do you give teams to know whether their software is secure? Do you give them the business knowledge that they need to validate the data properly? Do they have quick deploys and permission to upgrade libraries as soon as the new version comes out? There’s a lot of things you can do to remove obstacles to the culture you want to have and to shift the inherent cultural change in that direction. And it’s always going to be slow and you’re always going to have to be shifting it for years and years and years. It’s not something you’re going to accomplish in a quarter. You’re not going to get anywhere in a quarter. Because if you shift the direction in a tiny bit for a quarter and then you stop, it goes right back.

Shane Hastie: Culture is an elastic band.

Jessica Kerr: Yeah, it’s boingy.

Shane Hastie: Jessica, thanks so much for taking the time to talk to us today. If people want to continue the conversation, where do they find you?

Jessica Kerr:  I’m jessitron on Twitter, J-E-S-S-I-T-R-O-N. Also, jessitron.com. If you really want to chat with me, I have a Calendly and I have open office hours at honeycomb.io/office-hours.

Shane Hastie: Thank you so much.

Jessica Kerr: Thanks, Shane.



Presentation: Making Sense of Application Security

MMS Founder
MMS Adib Saikali

Article originally posted on InfoQ. Visit InfoQ

Transcript

Saikali: Welcome to my talk, making sense of application security. This talk is an application developer’s guide to all of the foundational things that you should know about security. The kinds of things that help you debug tricky exceptions without having to visit lots of blog sites and Stack Overflow and trial and error your way to eliminating an exception, for example.

My name is Adib Saikali. I am a software developer, been a software developer since 1995, and a Code Janitor since 2014. I’m currently a Global Field Principal at VMware Tanzu. That basically means I spend enormous amounts of time doing things with Kubernetes, Cloud Foundry, Spring, and a lot of time in customer code bases that are multiple millions of lines long and make their companies a billion dollars a year, and I advise customers on how to go towards modular monoliths or microservice architectures. I’ve accumulated a lot of patterns on how to refactor and modernize applications. Security has been a big part of that. One of the things I’ve observed is that developers struggle with security a lot. Because it’s one of those things that one person on the team does, and then everybody else doesn’t have to do it. You do it once a year, and so every time you have to remember and re-Google how to do it. I’m also currently working on a book called Securing Cloud Applications for Manning Publications. You can buy it and start reading the chapters as I finish them.

Why We Should Care About Application Security

Why should you care about security? You should care about it because it’s a CEO level problem. What I mean by that is, if there’s a large enough security incident, either your company is going to go bankrupt and shut down, or potentially more likely, the CEO is going to get fired. A good example of that is the Equifax CEO who did resign after that massive data breach they had in 2017, and so did a bunch of other senior executives. What are CEOs doing about this? CEOs, for the most part, are not developers or technical people. What they’re doing is they’re actually creating this, they have this role of chief information security officer. This is the individual responsible for the security of the enterprise, and they report directly to the CEO so that they can drive the transformation of the organization’s security practices to whatever is needed to make it more secure. Even the U.S. president is getting in on the action. President Biden has issued two executive orders in the past year or so around how to improve security and requirements for secure infrastructure and applications in the U.S. government.

What AppSec Means for App Developers

With this extreme focus from executives on security, what does that mean for you as an application developer? I think it actually means four things. Number one, is you’re expected to use all product security features. For example, if a database you’re using has an option to allow you to encrypt data at rest, turn that on. If you are making a call from one microservice to another, you can make that call over mutual TLS as opposed to over a plain, regular non-encrypted channel. The other thing that’s expected of you is to follow corporate security standards. More organizations have published standards that say: when a service calls another service, or when you have a password for a database you need to access, where does that password get stored? How is it managed? Thirdly, there’s just a general expectation that you’re going to be able to design and implement secure applications, which I think is a reasonable assumption in 2022. In order to design and implement secure applications, there’s foundational things about security that you need to be familiar with. Lastly, in all of these organizations I talk to, there is a transformation program around doing more DevSecOps. You are the developer in the DevSecOps tool chain or process.

What Senior Business Leaders Want

If we take a step back and say, let’s double click on those four things, how do we go about doing those? If you look at it from the point of view of the senior business leaders at the top of the pyramid, they just want a secure application. In order to do a secure application, you as the developer need to do two things. Number one, is you need to write your code using some programming language using the libraries of that language, and that framework. For example, maybe you’re a Java developer building Spring Boot applications running on Kubernetes, which is then running on a public cloud. The second part is that you follow the corporate standards for secure software development lifecycle. When you start doing that middle layer, people usually get stuck. They get stuck because there’s something about the underlying layer that they don’t understand.

For example, let’s say you’re configuring Spring Security to allow you to log in with your Google account into the application. You have to provide it with this thing called a client ID and a client secret, because it implements something called the OpenID Connect protocol. The documentation for Spring Security is not going to teach you the OpenID Connect protocol, you’re expected to show up to the party knowing that protocol. If you want to avoid getting stuck, what you want to focus on as a developer is the bottom layer of the pyramid. If you learn the bottom layer of the pyramid, you can always more easily learn the second layer, the middle layer of the pyramid whenever you need to. That’s really what I want to talk to you a little bit about, in particular, that standards, protocols, and patterns that’s in the bottom left corner of the pyramid.

Key Security Skills for Application Developers

If we extract from that bottom left corner, what are the four things that you may want to focus on? It’s going to be, number one, secure your communication channels between your application and everywhere else using TLS. Number two is implement single sign-on in your applications, and try to get rid of passwords. You don’t want to store them in your app. You don’t want your users to have them either. Number three is you want to manage your credentials securely, all those API keys, database passwords, that your application might have. Number four, is when you have a microservice architecture, and you have one service calling another service, calling another service, you need a way to secure that call chain. We’ll talk about some of these. Again, my purpose here isn’t to solve these problems, but more to guide you as a developer on the things and the direction that you need to go.

Typical Legacy Enterprise App

What does securing communication channels mean? Let’s take a look at a classic typical legacy enterprise application, the simplest thing of all. We have a three-tier application with some JSON HTTP coming in from a mobile app or a web browser, going to a monolithic backend, and going to a SQL database. When you think in terms of security from the enterprise, people will come back and they will say something like, I’m going to put firewalls on here. What does putting firewalls do? It assumes that you have this network zone that usually people call the red zone, where that’s exposed to the internet, maybe bad actors could get in. We’re going to have a firewall to basically say, the application server will only accept requests from a load balancer, and the database will only accept requests from the application server. This is the idea of segmenting the network into zones, so that you can say the green zone is where I keep my data, the yellow zone is where I keep my business logic, and the red zone is that edge to the internet that I have.

The key thing here is that this is not enough, because you can have a bad actor internally, you’re not the only application on the network. There could be other applications that have security issues, and they break down. There’s this move towards doing everything with zero trust. We don’t want the application server to trust the network, we want it to assume that its communication from the load balancer has to come in over mutual TLS, so the application server can make sure that the request only came from the load balancer. The load balancer can make sure that it sent the request to the application server. Similarly, the application server wants to know that it connected to the correct database. The database wants to know that the application server is the right application server talking to it. This means as a developer, you’re expected to be in more situations where TLS is being used. It used to be that TLS was just something you did at the edge of the network between the internet and your load balancer.

How to Learn Cryptography as a Developer

What do you have to know in order to understand this? There’s a lot that you need to know. This mind map here shows you how you need to approach this. There’s a lot of stuff on here. It’s possible when you go through this to get lost in all of the cryptography rules. If you’re looking at this and you’re like, I got to learn encryption. There’s this thing called authenticated encryption. What does that mean? There’s this elliptic curves, and key exchange, and certificates, and perfect forward secrecy, and all the different tradeoffs that cipher suites represent? I’m overwhelmed as a developer, what do I do? My advice to you is to think about cryptography not in terms of the math that it’s implemented in, although that’s very fascinating, but to think of cryptography as a set of primitives that are like black boxes, tools in your toolbox. You bring out different cryptographic primitives for different purposes. For each cryptographic primitive, it always provides you some security guarantees. For example, AES with authenticated encryption allows you to make sure that data is not only confidential, but you can detect if somebody has modified the encrypted data. You have to know how to configure it correctly. Because, unfortunately, some of these standards have settings on them that are no longer secure, because there were breakthroughs in attacking these algorithms. Or maybe just the computers got faster, and they’re no longer considered secure.

These primitives can also be misused, so it’s important that you understand the wrong ways of using them so you can avoid using it incorrectly. Lastly, you got to write some code using these primitives. It turns out that some cryptographic libraries out there are hard to use, because they assume that you’re an expert, and it’s easy to do the wrong thing with them. Then the end result is an insecure application. It’s always important to find a high quality implementation of one of those libraries. Finally, you got to learn how these cryptographic primitives come together into a protocol like TLS.

Problem 1: Detect Accidental Data Corruption

My most important recommendation to you is to actually wrap your head around these cryptographic primitives by looking at some simple problems that are realistic, things that you may actually encounter in the real world. I’ve got four problems for you to think about, and try to implement in your favorite programming language. On subsequent slides I also include links to a GitHub Org that has solutions to these in Java and Spring. Number one is, think about it as I want to detect accidental data corruption. In this scenario, you’re a shoe retailer who sells shoes online. People return shoes they don’t like. They arrive at your warehouse, and the warehouse employees look at the shoes that have been returned, check that they meet the return criteria. Then use the warehouse management application to approve a refund. That’s awesome. Then, what happens is that the warehouse management system generates a refunds.json. Think of it as saying an order number, say order number 123, is going to get a $50 refund. Then that file, the refunds.json, goes on to the Payment Service, where the Payment Service is able to look at the Order ID and the amount to refund, and actually return the money to the credit card of the customer.

The problem we’re solving here is not even a security problem. We just want to basically say, how does the Payment Service make sure that the refunds.json is actually correct? It wasn’t accidentally data corrupted because there was a disk error, for example, or a network transfer error? The answer to that is you could use something called a secure hash function to do that, and calculate a digest. That’s a problem for you to go to figure out how to solve on your own.
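A minimal sketch of the Problem 1 exercise in plain JDK Java (the class name, method names, and the JSON payload are mine, not taken from the linked repo; `HexFormat` needs Java 17 or later):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

// Problem 1 sketch: detect accidental corruption of refunds.json.
// The warehouse system ships the file plus its SHA-256 digest; the
// Payment Service recomputes the digest and compares before refunding.
public class FileChecksum {

    // Compute the SHA-256 digest of the file bytes, hex-encoded.
    public static String sha256Hex(byte[] data) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(md.digest(data));
    }

    // Payment Service side: recompute and compare to the published digest.
    public static boolean isIntact(byte[] data, String expectedHex) throws Exception {
        return sha256Hex(data).equals(expectedHex);
    }

    public static void main(String[] args) throws Exception {
        byte[] refunds = "{\"order\":123,\"refund\":50.00}"
                .getBytes(StandardCharsets.UTF_8);
        String digest = sha256Hex(refunds);   // shipped alongside the file
        System.out.println("digest = " + digest);
        System.out.println("intact = " + isIntact(refunds, digest));
    }
}
```

Note that this detects only accidental corruption: anyone who can modify the file can also recompute the digest, which is exactly the gap Problem 2 addresses.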

Problem 2: Detect Tampering and Validate Identity of File Creator

Number two is exactly the same scenario, but this time the Payment Service wants to know that the warehouse management service was the one that generated the refunds.json file, and it wasn’t some rogue employee or hacker. Now what we say is that, the warehouse management application is going to sign the refunds.json file with a secret key that it knows. The Payment Service is going to read that refunds.json, and use that secret key to not only detect corruption of the data but also check that the only way this refunds.json could have been created was by somebody who possessed that secret key. That takes you to what’s called the Hash-based Message Authentication Code (HMAC) that was on my mind map earlier. Really simple to implement in something like Java, and incredibly useful as a foundational concept to wrap your head around.
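A possible shape for the Problem 2 exercise using the JDK's built-in HMAC-SHA256 support (class, method names, key, and payload are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Problem 2 sketch: the warehouse app tags refunds.json with an
// HMAC-SHA256 under a secret key shared with the Payment Service, so the
// Payment Service can detect both corruption and tampering.
public class RefundsHmac {

    public static byte[] sign(byte[] data, byte[] key) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data);
    }

    // Constant-time comparison avoids leaking where the tags differ.
    public static boolean verify(byte[] data, byte[] key, byte[] tag) throws Exception {
        return MessageDigest.isEqual(sign(data, key), tag);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "demo-shared-secret".getBytes(StandardCharsets.UTF_8);
        byte[] refunds = "{\"order\":123,\"refund\":50.00}".getBytes(StandardCharsets.UTF_8);

        byte[] tag = sign(refunds, key);   // warehouse side
        System.out.println("valid    = " + verify(refunds, key, tag));

        byte[] forged = "{\"order\":123,\"refund\":5000.00}".getBytes(StandardCharsets.UTF_8);
        System.out.println("tampered = " + verify(forged, key, tag));
    }
}
```

Only a party holding the shared key can produce a tag that verifies, which is the property the plain hash in Problem 1 could not give you.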

Problem 3: Detect Corruption, Tampering, and Privacy of File Contents

The next requirement is, ok, I’ve got this refunds.json file, but it’s in plaintext. What I’d like to do is to encrypt it, so if somebody intercepts it, they cannot see what’s in the refunds.json. That’s where this idea of the Advanced Encryption Standard will come in, and you can do this with the Advanced Encryption Standard.
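One way to sketch Problem 3 with the JDK's AES-GCM (authenticated encryption) support; the class layout, constants, and payload are mine:

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Problem 3 sketch: AES in GCM mode keeps refunds.json confidential
// AND detects tampering, all in one authenticated-encryption primitive.
public class RefundsEncryption {

    private static final int IV_BYTES = 12;   // 96-bit nonce, the usual GCM choice
    private static final int TAG_BITS = 128;

    // Returns IV || ciphertext so the receiver has everything it needs.
    public static byte[] encrypt(byte[] plaintext, SecretKey key) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);     // never reuse an IV with the same key
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(plaintext);
        byte[] out = new byte[IV_BYTES + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
        return out;
    }

    public static byte[] decrypt(byte[] ivAndCiphertext, SecretKey key) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key,
               new GCMParameterSpec(TAG_BITS, ivAndCiphertext, 0, IV_BYTES));
        return c.doFinal(ivAndCiphertext, IV_BYTES, ivAndCiphertext.length - IV_BYTES);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey key = kg.generateKey();
        byte[] refunds = "{\"order\":123,\"refund\":50.00}".getBytes(StandardCharsets.UTF_8);

        byte[] sealed = encrypt(refunds, key);
        System.out.println(new String(decrypt(sealed, key), StandardCharsets.UTF_8));

        sealed[sealed.length - 1] ^= 0x01;    // flip one bit of the ciphertext
        try {
            decrypt(sealed, key);
        } catch (AEADBadTagException e) {
            System.out.println("tampering detected");  // GCM rejects the modified data
        }
    }
}
```

The key distribution question this raises, how both sides get the same AES key, is exactly what Problem 4 is about.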

Problem 4: Solve Problem 1, 2, 3 without Using a Shared Secret

The next complication of the problem is, I have this shared secret, the warehouse management needs to know the password that was used to encrypt the refunds.json file, and the Payment Service needs the password in order to decrypt that. Sharing keys between things is a really hard problem. This is where public key cryptography comes in, and you can learn by solving this with a sample I’ve linked from a GitHub repo. You can learn how to actually do this using what’s called a Diffie-Hellman key exchange with the JOSE suite of libraries. I built it in such a way that it’s simple for a developer to follow along, again, with the goal of just understanding the concept, not of solving it the way I actually would, because you would just use TLS for that.

Standards to Learn

This is a list of all the things that you might want to know as a developer. For Java libraries, I recommend the Google Tink library. It's a developer-friendly API for doing cryptography in general, in Java and other languages, which is used by Google in production. It's designed so that you can't accidentally do the wrong thing.

Logging in Human Users

The next problem we run into is: we have all these users, and we need to log them in. How do we do that, given that we really don't like passwords anymore? The answer starts with looking at how people actually want to log in. Think of the shoe retailer ACME's web shopping application. You might have a user who wants to log in with the thumbprint reader on their MacBook, another one who wants to log in with Face ID on their iPhone. Somebody else wants to log in with their Facebook account, and somebody else wants to just use a plain old username and password. Those are all ways that users want to log in. Do you have to actually support all these ways to log in inside your app? The answer is no. What you should be doing is delegating all of that to a single sign-on service. You can get these single sign-on services from the cloud as a SaaS offering, things like Auth0 and Okta. Or you can write your own single sign-on service, maybe on top of something like Spring Authorization Server, or you could use a prepackaged authorization server that's built into a larger platform you're using, maybe one based on Kubernetes. The key thing here is that you as a developer only learn one protocol, which is the OpenID Connect protocol. That allows you to interact with all single sign-on servers, regardless of who wrote them, as long as that single sign-on server supports the OpenID Connect protocol.

Use a Phishing Resistant Hardware Key

However, even when you do have that single sign-on server, you probably want to configure it to be phishing resistant. Hackers are getting very sophisticated, and they'll create a fake website that looks just like your bank's website. They'll send you a text message, and this and that, to try to trick you into going to this fake website and actually entering your real username and password, and potentially your real one-time password. One of the ways to get around that is to use something like this guy here: a hardware security key. Let me just demonstrate how that works. I'm going to plug it into my MacBook. I'm going to switch over here to my GitHub. When I try to sign in, it's going to say, your password was correct but I need you to use a security key. I'm going to say, yes, use a security key. Now it pops up and says, which security key do you want to use? I want to use the USB key that I just plugged in. It's flashing, and when I press it, I'm in. What that security key is doing is checking that I'm actually talking to GitHub, so I don't accidentally say yes to a website that is a phishing site.

Web Authentication Protocol

Another really exciting technology that you want to know about as a developer is something called web authentication. This is a protocol that allows you to register users without ever asking them for a password. For example, I have my YubiKey plugged into my laptop right now. If I go here, enter Adib Saikali, and click Register, it's going to say, ok, how are you going to authenticate yourself? I'm going to use my hardware key. I press the button, and success. Let me log in now. When I log in, it asks, how would you like to log in? I'd like to use my hardware security key, please. I'm in. You can see here it actually knows who I am; it's given me a user ID and a public key. This Web Authentication Protocol works with all web browsers. It works on your iPhone, Android, Mac, Windows. It's pretty prevalent these days. As you can see, this is wonderful for user experience, because you don't even have to sit there and come up with a password. What are the password rules? How many characters, special characters, and capital letters? You take that frustration out of the process. Bottom line is, as a developer, go ahead, learn OpenID Connect and learn web authentication.

Managing Credentials Securely

The next thing we want to talk about is credentials. How do you store them? You've got your monolithic app, and you've got a database password and an API key for the credit card processing API. Where do you store those keys? You should most definitely not store them in a text file on a server, because that's easy for somebody to steal. You should put them in what's called a credential service. Credential services are available from the cloud providers, like Google KMS or Azure Key Vault, or you can install your own vault, like HashiCorp Vault. My advice is very much: learn the credential vault that you have access to in your organization first. Once you learn how to use one of those vaults, it's easy to learn other vaults later on. Bottom line, three things you should know about as a developer: know about OpenID Connect, know about web authentication, and learn how to use the credential service that's part of your organization.

Securing the Service-to-Service Call Chain

The last problem we want to talk about is the service-to-service call chain. Let's break it down with a concrete example. Let's say I have a webpage that looks like this, a product catalog. I've highlighted in the squares where some of the microservices might exist on the page. For example, you might have a price discount calculation service that factors in what marketing promotions are going on right now to assemble the page. If we take a step back, we've got a product service page. It calls a book detail service to get the details of the book. It calls the pricing service, which in turn calls the buying habits service to find out what your buying habits are, and factors that into how much discount you're getting, along with whatever current marketing promotions are happening. This example can be generalized into this thing over here, whereby you have some external clients, requests enter at the edge microservice layer, and the microservices at the edge call other microservices down the chain. As you can see here with service J in the bottom right corner, service J might want to know, who is the actual human at the other side of this application? Is it Adib or not? Whereas microservice H there in the middle basically says, I don't care who Adib is, who the user is, but I do need to know who called me. Was it microservice B, or was it microservice E? This is the idea of service identity versus user identity. It's important to actually know both at all times.

This gets a little bit more complicated, because the lines on the diagram earlier were not really one protocol. I could have a situation where from the user to my edge microservice, it's over HTTP, and that edge service is a Java service which uses OpenID Connect and web authentication to log the user in. You can then have that edge microservice in Java making an HTTP REST call to an internal microservice written in C#, which uses gRPC to call an internal service written in Go, which posts a message on a RabbitMQ broker using the AMQP protocol, which is then picked up by a JavaScript Node.js service that is also using AMQP. Looking at propagating user identity or service identity down the call chain, you run into the question of how to do that across the variety of protocols that exist.

I have some bad news for you. There is no industry-standard way of doing this. There are, however, patterns that can be used to solve this problem. The most important of those patterns is that you should use mutual TLS everywhere. Every service-to-service call should be over mutual TLS. Even if it's going over messaging, you need to do it twice, on both ends. That's why you need to be really good at TLS as a developer. Then, for the user identity, you've seen lots of ways to do it. People typically will say, I'm going to take a JWT token that describes the user and is signed by the login server, and I will just pass that down the chain, and maybe bind it to a request with another signature or something like that. There are lots of patterns that you can learn. However, if you learn TLS, you will have the cryptography background to more easily understand this type of thing. Practically, there are lots of infrastructure pieces that can help you with it. For example, you might have a service mesh available in your Kubernetes cluster, which could be very helpful for implementing this, or an API gateway.
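To make the JWT idea concrete, here is a minimal sketch of how a downstream service can check a token's signature. I use the symmetric HS256 variant for brevity (real systems typically use an asymmetric algorithm like RS256, so downstream services only need the login server's public key); the key string and claims are hypothetical.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class JwtCheck {
    // Issue an HS256 JWT: base64url(header) + "." + base64url(payload) + "." + base64url(tag).
    static String sign(String headerJson, String payloadJson, byte[] key) throws Exception {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String signingInput = enc.encodeToString(headerJson.getBytes(StandardCharsets.UTF_8))
                + "." + enc.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return signingInput + "."
                + enc.encodeToString(mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII)));
    }

    // A downstream service recomputes the HMAC over header.payload and compares it
    // in constant time to the signature part of the token.
    static boolean verify(String jwt, byte[] key) throws Exception {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) return false;
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        byte[] expected = mac.doFinal((parts[0] + "." + parts[1]).getBytes(StandardCharsets.US_ASCII));
        return MessageDigest.isEqual(expected, Base64.getUrlDecoder().decode(parts[2]));
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "login-server-shared-key".getBytes(StandardCharsets.UTF_8);
        String jwt = sign("{\"alg\":\"HS256\",\"typ\":\"JWT\"}", "{\"sub\":\"adib\"}", key);
        System.out.println("token valid: " + verify(jwt, key));
    }
}
```

In practice you would use a vetted library such as Nimbus JOSE rather than hand-rolling this, and you would also validate the claims (expiry, issuer, audience), not just the signature.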

Summary

I want to take a step back and say, as a developer looking to upskill yourself in security, here's what I would do if I were you. I would start on this pyramid in the bottom left corner and focus on learning the standards and protocols. Set the goal for yourself to get really good at TLS, because that will force you to learn a lot of basic cryptography. Then, after you've figured that out, get good with a particular framework or language. For example, maybe you learn Spring Authorization Server, or Spring Security, or Spring Cloud Gateway, and you learn how to put something together in Java, if you're a Java dev. Then you've also got to focus on learning some industry best practices, like, how do I containerize my workloads securely?

Standards to Learn

Next, I just have some technologies and suggestions for things you may want to learn. Here's my list of standards to learn to wrap your head around cryptography. Number one is what a secure hash function is: SHA-2 and SHA-3. Then learn about the Advanced Encryption Standard, in particular the mode called authenticated encryption with associated data, and what it does. Then learn a bit about JSON Object Signing and Encryption, because that's used in the OpenID Connect protocol, so it helps you learn that later. Plus, it gives you some cool practical technologies you can use. You can't escape knowing X.509 digital certificates; they're part of TLS. You obviously have to learn TLS. You can learn the OpenID Connect protocol for the purpose of logging users in, and the Web Authentication Protocol, so you can do these passwordless logins. Then finally, the Secure Production Identity Framework for Everyone, or SPIFFE, is an emerging standard for how to bootstrap trust. That really helps with solving some of the difficult problems in the service-to-service call chain scenario.
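The first item on that list, a secure hash function, can be tried directly with the JDK. A small sketch with hypothetical inputs, showing that a one-character change in the input produces a completely unrelated digest:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class HashDemo {
    // SHA-256 maps any input to a fixed 32-byte digest; even a tiny change
    // in the input produces a completely different-looking digest.
    static String sha256Hex(String input) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sha256Hex("refunds-v1"));
        System.out.println(sha256Hex("refunds-v2")); // bears no resemblance to the line above
    }
}
```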

Frameworks for Java Developers

For Java frameworks, I recommend Google Tink, and Nimbus for JOSE. Spring Security is a general Java security framework. Spring Authorization Server is a brand new project from the Spring team, which gives you all the infrastructure you need to build your own custom SSO server. This can be handy when you're trying to integrate with legacy environments where you have a non-standard internal service. There's also Spring Cloud Gateway, which is wonderful for implementing a lot of patterns around the service-to-service call chain.

Cloud Infra

Finally, for cloud infrastructure, I'm assuming you're doing stuff on Kubernetes, because that's emerged as the industry standard these days for cloud native. How do you containerize securely? I highly suggest you learn about something called Cloud Native Buildpacks, for building a process that keeps your containers patched all the time. Obviously, you've got to learn how to run applications on Kubernetes in secure ways. There are concepts like service mesh, for example Istio. SPIRE is the runtime environment for SPIFFE, so that's useful to know. And of course, whatever key vault you happen to have available in your environment.

Questions and Answers

Losio: The first question was really about TLS. It was, if I run two different services inside the same container, or the same cluster, or whatever, what's really the benefit of doing encryption using TLS or any kind of encryption? Here we're assuming that everything keeps running inside the same cluster for its entire life. Any thought or any feedback?

Saikali: In my line of business, I spend most of my time with customers, and there's always this idea of the magical tool that's going to mean you need to know less as a developer. I have yet to meet this tool. It doesn't matter what it is. I remember back in the day when Hibernate came out, it was like, you don't need to know SQL. Of course, you sometimes needed to know SQL even better, because, yes, I don't have to spend all my time writing SQL statements, but I need to know what my tools are doing. The perspective I run into is a whole lot of people basically saying your infrastructure is going to take care of things for you, and you don't have to know anything as a developer. You don't have to know anything about TLS. You don't have to know anything about mutual TLS. A lot of security people have really given up on developers learning security. I'm the opposite of that. I actually have a very strong belief in the potential of developers to increase their security skills and what that means for the industry.

From my point of view, you don't have to be an expert in TLS to learn the basics of TLS, and then you can configure your service mesh better, if you have one. You can participate more meaningfully in interactions with customers. I've seen insane stuff happen, where people thought that certain things gave them security when they didn't, and where certain things really did give them security and they thought they weren't secure. I think it's a worthy goal for developers to learn TLS to the level a developer needs to understand it, not to the level where you're implementing the protocol. Just like when you drive your car, you learn the rules of the road. It's the same thing: why not learn TLS?

Losio: Actually, you raised another very good point, something that I was thinking about when you started your presentation, talking about cloud services and things that are very easy to use. Sometimes the cloud providers make things almost too easy: with the click of a button, or just a true/false flag in an API call, I can encrypt data. Think about Amazon S3: when I store my data, I can say, encrypt by default with the default key, whatever. The problem is, on one side, if it's easy enough, you should really do it, why not? On the other side, it's easy enough that you might not even know what you're doing, and assume it's doing much more than what that service actually does. You think your data is secure just because you enabled that. I think it comes back to your point: you need to know, more or less, not just that the service or tool is there, but what it's supposed to do.

Saikali: I 100% agree. In all of the different customer engagements I've been on for the last seven years, there's always a point in the process where you have to go talk to InfoSec. They want to know how the application is secured. That point is where you run into the issue of, can you speak InfoSec's language, or is this all a black box? If they see that you are knowledgeable about security, you can have a more fruitful conversation, and you can more easily get your application approved to go to production. If you can't, then you're going to struggle just getting through that internal review process, and it may cause project delays. I've seen both of those situations pan out. For me, as a developer, I look at technologies from the point of view of, what should I focus my energy on? What should I learn?

I use this model from Scott Ambler, where he talks about how, if you learn a paradigm, that knowledge is good for 25 years, but if you learn a particular platform or technology, it's only good for 10 years. For example, if you learned Kubernetes in 2015, you could charge whatever hourly rate you wanted, but you could bet that within a few years, everybody that needed to know Kubernetes would know Kubernetes. Over time, a lot of these technologies lose the value of their knowledge, but some of the things at the core of security are never going to lose their value. If you learn what a cryptographic hash function is, how AES works, or TLS, these things are going to be around for a very long time. The details will change. There'll be a new version of the protocol. There'll be a new version of the algorithm. But that's timeless knowledge, in my opinion, for a developer far more useful than learning how to use a service mesh right now. If you have 10 hours to learn something and your choice is to learn TLS or to learn a service mesh, go learn TLS. You're going to use that TLS knowledge for the rest of your career. Service mesh, who knows how long that'll be useful for?

Losio: Actually, there was something you mentioned in the beginning that I found quite interesting, when you said basic application security is a CEO-level problem, because the CEO also defines how concerned management is about security. I was thinking of two different scenarios. One is, I'm a software developer in a large corporation. In reality, I might be worried about the reputation of my company, but not too much about the CEO. I think, it's his problem; if he gets fired, that's not the reason why I'm implementing security. That's the first problem. On a more realistic level, I was thinking of the startup scenario, where I'm not saying that people don't care about security, but it's usually not high on the agenda, because even the concept of risk is on a different level. If you tell the CEO of a startup that is trying to survive, to make it through in a very short time, that there's a risk your application gets hacked and you're going to lose everything, he is probably going to tell you, there's a very high risk that I'm going to lose everything anyway. How do you get that mindset into the startup world, in a new company, when the pressure is just to move fast, get big, and so on?

Saikali: Let's break down those two scenarios. Let's say you're the enterprise developer working in a Mega Corp with 5000 other developers, and you're like, CEO, who cares? The CEO is going to go buy a private island; I don't care if they get fired. They have lots of money to play with. That's actually true, so as a developer I don't care about the fate of the CEO. What I do care about is the quality of the tools that I'm using. What I see in a lot of these Mega Corps is really archaic processes, where you would like to use something that makes your life better as a developer, but you're not allowed to. Why aren't you allowed to? Because information security needs to approve it. Why doesn't information security approve it? Because they're scared to approve anything that isn't already approved, because that senior vice president who runs the security team doesn't want to get fired. It has an impact on you. My call to action is, there's a cultural change that we need to go through as an industry around security. That's what we have to do.

I'll give you a really interesting story. On Sunday, I hosted a Matrix Resurrections watching party and invited a bunch of family and friends, including the electrician who did the renovation in my house, because he helped me set up the TV. When he showed up, he said, my Google account got hacked, and all these people started buying laptops with my credit card and all the payment details that were saved. The mistake he made was that he signed in on some unknown computer with his personal Google ID, and the hack started. He was showing me the Google hardware key he had bought, the equivalent of the YubiKey. I had to explain to him, yes, never log in on a machine you don't trust. These things have real-world consequences. You should care about this as a developer. Just change your attitude around security. That's number one.

In the startup world, startups are starting to get more complicated. The cost of launching a startup in 2022 is significantly higher than it was in 1995, or even 2000 or 2012. I look at it from the point of view of, how many hours do I need to invest to learn the security stuff as a developer, and how much more valuable am I afterwards? Security isn't hard to do if you know how to do it. It's difficult to do when you don't know how to do it. The other side of not knowing the security stuff as a developer in a startup is, how much time are you going to waste on Stack Overflow with "I got a TLS handshake error when I called this thing or that thing"? You waste more time because you didn't take the time to learn it. This is why I'm putting this book together. I am not "a security professional." I am basically saying, I want to explain the basics of security to software developers in a way that makes sense to software developers, so they can have a seat at the table with the wider InfoSec community and enable that higher, more meaningful communication, where the InfoSec people don't just say developers can't be trusted because they don't know anything about security.
