Temporal on AWS Aims to Ease Building Resilient Distributed Systems

By Steef-Jan Wiggers

Article originally posted on InfoQ.

Temporal, an open-source microservices orchestration platform focused on durable execution, is now available in AWS Marketplace. By integrating with AWS, the company aims to simplify the development of resilient distributed systems for large-scale applications.

In a recent blog post, Sai Kotagiri, senior partner solutions architect at AWS, and Neil Dahlke, staff solutions architect at Temporal, detailed how the combination of Temporal’s platform and AWS infrastructure enables organizations to build fault-tolerant applications that maintain consistency and automatically recover from system failures.

The post addresses the inherent challenges of distributed systems, particularly for businesses processing high volumes of transactions. It uses a hypothetical e-commerce outage during a peak sales period as a prime example. Issues like database failures, network problems, API timeouts, and queue overflows can lead to degraded service and data inconsistencies. The authors argue that Durable Execution, a paradigm that ensures transaction completion despite disruptions by tracking each step’s state and allowing resumption from the point of failure, is crucial for overcoming these challenges.

Elaborating on the importance of durable execution, Sewen, a participant in a Hacker News discussion about the related durable execution engine Restate, explained:

The way we think about durable execution is that it is not just for long-running code… But durable execution is immensely helpful for anything that has multiple steps that build on each other. Anytime your service interacts with multiple APIs, updates some state, keeps locks, or queues events… Almost all backend logic that changes state ultimately benefits from a durable execution foundation… The question is then: Can we make the overhead low enough and the system lightweight enough such that it becomes attractive to use it for all those cases? That’s what we are trying to build here.

Temporal’s platform provides durable execution through its SDKs, allowing applications to maintain state across failures and distributed processes. Developers define Workflows to orchestrate application logic and break them into individual Activities with configurable retry policies. Workers poll task queues on the Temporal Service and execute these Workflows and Activities. The Temporal Service manages the execution and state of the Workflows, with deployment options including self-hosting on AWS or using Temporal Cloud.

(Source: AWS Partner Network blog post)
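To make the idea of an Activity with a configurable retry policy more concrete, here is a minimal sketch in plain Python. It does not use the Temporal SDK: `RetryPolicy`, `run_activity`, and `make_flaky_charge` are hypothetical names chosen for illustration, and the exponential backoff shown is one common way such a policy can be shaped, not a description of Temporal's internals.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical retry settings; names are illustrative, not an SDK API."""
    initial_interval: float = 1.0      # seconds to wait before the first retry
    backoff_coefficient: float = 2.0   # multiplier applied after each failure
    maximum_interval: float = 30.0     # cap on any single wait
    maximum_attempts: int = 5          # total tries, including the first

    def delay_before_retry(self, failed_attempts: int) -> float:
        """Wait time after `failed_attempts` consecutive failures (>= 1)."""
        raw = self.initial_interval * self.backoff_coefficient ** (failed_attempts - 1)
        return min(raw, self.maximum_interval)


def run_activity(activity, policy, sleep=time.sleep):
    """Call `activity()` until it succeeds or the attempts are used up."""
    for attempt in range(1, policy.maximum_attempts + 1):
        try:
            return activity()
        except Exception:
            if attempt == policy.maximum_attempts:
                raise  # out of attempts: surface the last error
            sleep(policy.delay_before_retry(attempt))


def make_flaky_charge(failures_before_success):
    """Build a stand-in activity that fails a fixed number of times."""
    calls = {"count": 0}

    def charge_card():
        calls["count"] += 1
        if calls["count"] <= failures_before_success:
            raise TimeoutError("payment API timed out")
        return "charged"

    return charge_card
```

Passing `sleep=waits.append` with `waits = []` records the backoff intervals instead of actually waiting, so `run_activity(make_flaky_charge(2), RetryPolicy(), sleep=waits.append)` returns `"charged"` and leaves `waits` as `[1.0, 2.0]`.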

A key advantage highlighted is Temporal’s event-driven architecture, which maintains a detailed, immutable event history of every workflow action. This, coupled with persistent state management, allows Temporal to systematically recover from failures by detecting the issue, assessing the current state, reviewing the event history, and resuming execution from the last valid point. According to the authors, the platform guarantees exactly-once execution semantics, ensuring data integrity even during disruptions.
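The recovery sequence described above can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not Temporal's implementation: `Event`, `run_workflow`, `SimulatedCrash`, and the order steps are hypothetical, and the history is an in-memory list where a real system would use durable storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One immutable entry in a workflow's history (illustrative shape)."""
    step: str
    result: str


class SimulatedCrash(Exception):
    """Raised by the demo to imitate a worker dying mid-workflow."""


def run_workflow(steps, history, crash_before=None):
    """Run `steps` in order, appending one Event per completed step.

    Steps already recorded in `history` are replayed from the log
    instead of being executed again.
    """
    recorded = {event.step: event.result for event in history}
    results = []
    for name, action in steps:
        if name in recorded:
            results.append(recorded[name])   # replay: reuse the stored result
            continue
        if name == crash_before:
            raise SimulatedCrash(name)
        result = action()
        history.append(Event(name, result))  # record before moving on
        results.append(result)
    return results


executed = []  # every real (non-replayed) execution, kept for the demo


def make_step(name):
    """Build a (name, action) pair whose action logs that it really ran."""
    def action():
        executed.append(name)
        return name + ":done"
    return name, action


ORDER_STEPS = [make_step("reserve_stock"), make_step("charge_card"), make_step("ship_order")]
```

Running `run_workflow(ORDER_STEPS, history, crash_before="ship_order")` with an empty `history` raises `SimulatedCrash` after two events are recorded. Calling `run_workflow(ORDER_STEPS, history)` again replays those two results and executes only `ship_order`. In this toy, a crash between running a step and recording it would cause that step to run twice on recovery, which is one reason individual steps in such systems are usually written to be idempotent.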

As one respondent on a Reddit thread noted:

Yeah, I’ve used Temporal.io. It’s a solid framework if you need to manage complex, long-running workflows with reliability. Unlike AWS Step Functions, it handles retries, state persistence, and failure recovery out of the box without you having to write tons of custom code.

The blog post also emphasizes the efficiency gains of running Temporal Workers in containerized environments on AWS services like Amazon Elastic Kubernetes Service (Amazon EKS) and Amazon Elastic Container Service (Amazon ECS). This approach automatically scales processing capabilities based on real-time demand, ensuring consistent performance even during peak loads.

(Source: AWS Partner Network blog post)
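The post does not spell out the scaling rule involved, so the following is only a sketch of one widely used approach: the proportional formula documented for the Kubernetes Horizontal Pod Autoscaler, applied here to a hypothetical "tasks waiting per worker" metric. The function name, the metric, and the default bounds are assumptions made for illustration.

```python
import math


def desired_workers(current_workers, backlog_per_worker, target_backlog_per_worker,
                    min_workers=1, max_workers=20):
    """Proportional scaling: desired = ceil(current * metric / target), clamped."""
    if target_backlog_per_worker <= 0:
        raise ValueError("target_backlog_per_worker must be positive")
    ratio = backlog_per_worker / target_backlog_per_worker
    wanted = math.ceil(current_workers * ratio)
    # Keep the result inside the configured bounds.
    return max(min_workers, min(max_workers, wanted))
```

With four workers each facing 50 queued tasks against a target of 25, `desired_workers(4, 50, 25)` returns `8`. When the backlog drains to zero, the result falls back to `min_workers`.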

The integration between Temporal and AWS extends to security and operational excellence. AWS Certificate Manager handles mTLS certificate management for secure communication, while AWS PrivateLink provides private connectivity between customer VPCs and Temporal Cloud. Amazon Kinesis integration streamlines real-time audit logging for enhanced operational visibility. Customers can deploy their Temporal-based applications on Amazon EC2 instances or in containerized environments on ECS or EKS, taking advantage of AWS’s scalability and efficient resource utilization.

The blog post concludes that by combining Temporal’s durable execution capabilities with AWS’s comprehensive infrastructure and managed services, organizations can focus on business logic rather than the complexities of managing infrastructure failures, ultimately leading to improved developer experience and reduced operational overhead.

Lastly, examples and tutorials are available on Learn Temporal.
