EDU-3813: Updates Encyclopedia to add Durable Execution landing #3298

Closed
wants to merge 2 commits into from
206 changes: 206 additions & 0 deletions docs/encyclopedia/durable-execution/index.mdx
@@ -0,0 +1,206 @@
---
id: index
title: Durable Execution
sidebar_label: Durable Execution
slug: /durable-execution
description: Build scalable and reliable applications with the Temporal Durable Execution orchestration framework.
toc_max_heading_level: 3
keywords:
- concepts
tags:
- Durable Execution
- Temporal
- Concepts
---

import PrettyImage from '@site/src/components/pretty-image/PrettyImage';


How would you change the way you code if your app _couldn't_ fail?
What if you could opt into crash-proof execution?

Durable Execution keeps your apps running, even under the worst scenarios.
It records the progress and state of your workflows, so disruptions won't lose or corrupt your work.
Whether your app is facing a service outage or unexpected shutdown, Durable Execution makes sure it picks up where it left off and you don't repeat work that was already done.
This reliability lets your app handle disruptions and deliver results as if the issue didn't happen in the first place.

## What's Durable Execution?

Durable Execution lets systems keep running and making forward progress even when things go wrong.
It uses state persistence and automatic task retries to create a fault-tolerant environment that ensures reliable execution.
Most commonly used for long-running and distributed systems, Durable Execution separates application state and progress from an application's hardware or cloud-based execution.
If one of your computers suddenly dies, Durable Execution can transfer its running application workflow to another computer or processing center and pick up where it left off with little or no data loss.
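
To make "pick up where it left off" concrete, here is a minimal sketch in plain Python. It is a toy illustration, not Temporal's implementation or API: the `run_steps` helper, the JSON checkpoint file, and the step names are all assumptions made for this example.

```python
import json
import os
import tempfile


def run_steps(steps, checkpoint_path):
    """Run named steps in order, skipping any step already recorded as done."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    executed = []
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier run, so do not repeat it
        action()
        done.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)  # persist progress after every completed step
        executed.append(name)
    return executed


def demo_resume():
    path = os.path.join(tempfile.mkdtemp(), "progress.json")
    log = []
    crash = {"armed": True}

    def charge():
        log.append("charge")

    def ship():
        if crash["armed"]:
            crash["armed"] = False
            raise RuntimeError("simulated crash")
        log.append("ship")

    def notify():
        log.append("notify")

    steps = [("charge", charge), ("ship", ship), ("notify", notify)]
    try:
        run_steps(steps, path)
    except RuntimeError:
        pass  # the first "process" died after the charge step completed
    resumed = run_steps(steps, path)  # a fresh run, as if on another machine
    return log, resumed
```

Because progress lives in the checkpoint file rather than in the crashed process, the second run skips `charge` and continues with `ship`.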

Durable Execution platforms are resilient and support high levels of data integrity.
They're built to run jobs that are as short as moments or as long as years.
They'll keep running even if the underlying infrastructure changes over time.
Adopting Durable Execution makes your code simpler and your deployments more observable.

## Business logic focus {#focus}

Durable Execution shrinks your code by letting you move the handling of external dependency failures out of your apps.
With Durable Execution, you can focus on your workflows and business logic, not on handling errors.
The following code is real and it works:

![Sample showing minimal code for a long-running process](/img/encyclopedia/durable-execution/remind-user-workflow.png)

Adopting the Durable Execution paradigm produces streamlined code:

- **Cleaner code**.
Move abnormal condition handling out of your logic.

- **Run forever.**
Don’t worry about crashes or system outages, even over years or decades.

- **Runs under every condition.**
Durable Execution separates oversight like progress tracking from your running code instances.
When things go wrong, you can wait for them to resolve or move processing to other systems, regions, or processing centers.

- **Deploy and run at the same time.**
Durable Execution makes sure that each time your code runs, it follows the original logic and pathway.
Ship updates and patches without changing outcomes for your existing long-running processes.

You gain these advantages by adopting Durable Execution into your applications.

## Temporal and Durable Execution

When using Temporal, Durable Execution separates your work's state and progress (called your "Event History") from its code.
This abstracted oversight (called "orchestration") takes place on a central server.
It uses a persistent state and progress data store, so if your computing breaks, your workflows won't.

Temporal's approach offers specific advantages:

- **Separation of management and execution.**
The Temporal Service isn't tied to specific task workers or computing platforms.

- **Scale as needed.**
Durable Execution scales with your business.
Each execution is a unique progress abstraction.
Add more computing resources to match your needs.
This lets you manage additional work without affecting the consistency or reliability of your execution process.

- **Reduce latency**.
Durable Execution is fast and reliable.
It processes tasks quickly and efficiently, ensuring short and predictable response times.

These features combine to provide responsive and reliable services.

## Self-healing and catastrophes {#issue-types}

Imagine developing a system to handle reimbursements for your employees.
Now, consider ways your process might get blocked -- and resolved.
For example:

- **Your finance manager goes on vacation and can't approve a reimbursement**.
What do you do? You can set a time-out policy ("it's been more than 3 business days") and use alternate routing (redirect the approval to another coworker) or messaging ("Hey, I'll be out of the office until _date_") so every reimbursement gets addressed in time or delayed with full clarity.

- **Your direct deposit with the reimbursed funds failed**.
For example, there might be an outage at the recipient's bank.
After setting a retry policy that won't overload the API provider’s capacity, your process can keep trying until the deposit works.
After giving the provider time to recover, you can run your code again and succeed.

- **A printer for paper checks is jammed or out of paper**.
Not every employee opts into direct deposit.
You may need someone to manually walk over and take care of the printer issue before the check can be cut and sent.
Once resolved, they can sign off to confirm the check printing task was completed.

These examples cover both hybrid human-technology situations (approval and the printer) as well as fully automated ones (the bank).
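
The retry policy from the direct deposit example can be sketched as follows. This is a simplified illustration in plain Python, not Temporal's retry policy API: the function name, the parameter names, and the default values are assumptions chosen for the example.

```python
import time


def retry_with_backoff(task, max_attempts=5, initial_delay=1.0,
                       backoff=2.0, max_delay=30.0, sleep=time.sleep):
    """Call task until it succeeds, waiting longer after each failure."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts, so surface the failure
            sleep(delay)  # give the provider time to recover
            delay = min(delay * backoff, max_delay)


def demo_retry():
    calls = {"n": 0}
    waits = []

    def deposit():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("bank unavailable")
        return "deposited"

    # Record the waits instead of sleeping so the demo finishes instantly.
    result = retry_with_backoff(deposit, sleep=waits.append)
    return result, calls["n"], waits
```

Growing the delay between attempts is one way to avoid overloading a provider that is already struggling, and capping it with `max_delay` keeps the wait bounded.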

With Durable Execution, any problem that recovers over time isn’t really a problem.
You have a built-in way to retry your task later.
Durable Execution keeps your tasks alive and moving, whether they're fully automated or integrated with human actions.
It doesn't matter if your problems originate with computing, API calls, machinery, or personnel.
Durable Execution is built to keep processes moving forwards, regardless.

To be clear, not all tasks heal over time.
For example, one of your service providers might go out of business.
Retrying your API calls won't get you anywhere if that happens.
That's why Durable Execution is designed to handle catastrophes as well as intermittent issues.

When you run into outlier cases where something is truly broken, you need a solution like Temporal.
With Temporal, you can patch your code to use a new provider and safely deploy your fixes.
You can "replay" your flow's execution history to pick up real-world changes.
This allows it to complete your process without losing or repeating work.

Temporal capably handles both the self-healing and catastrophic scenarios.
To opt in, you need to be aware of the restrictions that allow Temporal to work its magic.

## Temporal requirements {#temporal-requirements}

Temporal's use of Durable Execution depends on a few critical factors to ensure you won’t lose or repeat work.
Temporal uses a technique known as History Replay, which depends on the following:

- **A durable store**:
Event History must be saved durably using your server's persistent store.
A workflow run, or its abstract execution, must persist until you explicitly decide you no longer need it.

- **Idempotency**:
Idempotency means that performing a task more than once has the same effect as performing it once.
An idempotent approach prevents duplicated effects, like withdrawing money twice or accidentally shipping extra orders.
Because a retry can invoke the same task again, idempotent tasks maintain data integrity and prevent costly errors.
Idempotency keeps repeated operations from producing additional effects, which protects your processes and keeps execution reliable.

- **Determinism**:
Durable Execution stores and tracks every workflow as an abstract entity.
If you need to restart the process under extreme circumstances, that process must align with the original run.
You can't change a random number or a real measurement (like temperature, time, or location) from the first run.
If you do, you can't just pick up from where you left off because the work no longer matches the earlier history.
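
To illustrate the idempotency requirement listed above, here is a minimal sketch that uses an idempotency key. The `PaymentService` class and its method names are hypothetical and exist only for this example. The idea is that a retried request carrying the same key returns the recorded result instead of withdrawing again.

```python
class PaymentService:
    """Toy payment service that deduplicates requests by idempotency key."""

    def __init__(self, balance=100):
        self.balance = balance
        self._results = {}  # idempotency key -> recorded receipt

    def withdraw(self, idempotency_key, amount):
        if idempotency_key in self._results:
            # Same request seen before: return the earlier result, change nothing.
            return self._results[idempotency_key]
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        receipt = {"key": idempotency_key, "amount": amount, "balance": self.balance}
        self._results[idempotency_key] = receipt
        return receipt


def demo_idempotency():
    service = PaymentService()
    first = service.withdraw("reimbursement-42", 30)
    retried = service.withdraw("reimbursement-42", 30)  # a retry of the same request
    return first, retried, service.balance
```

A real service would keep the recorded results in durable storage so that deduplication survives restarts.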

Durable Execution requires your workflow code to be deterministic.
Every time it runs or is replayed, the outcomes must be the same.
This is the only way centralized control can provide all of Durable Execution's features.

Does this mean you can’t use random numbers or run your work on different days or in different environments?
Of course not.
It means your code must reliably pick up from where it left off without changing the past in any logical way.
This is called determinism.
It ensures that given the same starting conditions, your workflows behave identically during each execution.
Your results are reliable and assured.
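
One way to picture how random values and replay can coexist is to record each non-deterministic result the first time it is produced and reuse the recorded value on replay. The sketch below is a toy model in plain Python, not Temporal's History Replay implementation: the `History` class, its `side_effect` method, and the `workflow` function are assumptions made for illustration.

```python
import random


class History:
    """Toy event history: record results on the first run, reuse them on replay."""

    def __init__(self, events=None):
        self.events = list(events or [])
        self._cursor = 0

    def side_effect(self, fn):
        if self._cursor < len(self.events):
            value = self.events[self._cursor]  # replay: reuse the recorded value
        else:
            value = fn()  # first run: execute and record the result
            self.events.append(value)
        self._cursor += 1
        return value


def workflow(history):
    # The random draw happens once. Replays read it back from the history.
    discount = history.side_effect(lambda: random.randint(1, 50))
    coupon = history.side_effect(lambda: f"SAVE{discount}")
    return discount, coupon


def demo_replay():
    first = History()
    original = workflow(first)
    replayed = workflow(History(first.events))  # a new run over the same history
    return original, replayed, first.events
```

The workflow logic makes the same decisions on every replay because it never sees a fresh random value for a step that already ran.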

With Temporal's prerequisites in place, you're ready to adopt Durable Execution into your applications.

## Temporal and Durable Execution {#value}

Durable Execution offers a powerful solution for building reliable and scalable applications.
It ensures that your workflows continue seamlessly, even when facing failures or disruptions.
Durable Execution is:

- **Stateful and persistent**:
Durable Execution tracks progress and maintains state even when your service restarts or experiences failures.
It stores checkpoints in external databases and logs, ensuring your system handles outages or crashes without losing progress.

- **Fault tolerant**:
Durable Execution handles failures automatically, keeping tasks running even when parts of your system go down.
When a failure occurs, it recovers tasks without interrupting your entire application.

- **Designed to separate concerns**:
Durable Execution splits oversight (task orchestration) from infrastructure management.
Focus your app's code on business processes and application-level logic, like managing fraud alerts or insufficient funds in a banking app, and not on status recovery.
Durable Execution handles state and errors related to platform issues, such as network outages or infrastructure failures, so you don't have to.

- **Won't repeat work**:
Durable Execution ensures tasks are not repeated unnecessarily.
When a task fails, it retries it using policies designed to ensure success without duplicating work.
This keeps the process consistent, eliminating redundant work even when errors arise.
You won't be sending out seven pizzas when the customer ordered just one.

- **Naturally recoverable**:
Even in worst-case scenarios, Durable Execution recovers execution without losing progress.
Moving to new hardware or service center deployments won't interrupt your workflows.

- **Inherently observable**:
Durable Execution makes the state, health, and progress of your app fully visible.
It tracks tasks in real time, so you see progress, failures, and retries as they happen.

These features work together to make sure your process will keep moving forward and complete successfully.
Temporal's implementation of Durable Execution, whether you're self-hosting or using our world-class Temporal Cloud service, provides the solution.

Durable Execution helps you build reliable and scalable applications.
It keeps your workflows running smoothly, even through system failures or disruptions.
By separating your application logic from task orchestration, Durable Execution ensures that your processes are consistent, reliable, and error-free.

With automatic recovery, Durable Execution guarantees that tasks complete without losing or repeating work.
It simplifies your code, lets you scale easily, and ensures that your app can handle any challenges along the way.
Durable Execution makes sure your critical processes keep moving forward, no matter what.

Getting started with Temporal helps ensure your work is reliable, efficient, and scalable.
@@ -1,9 +1,10 @@
---
id: temporal-sdks
-title: About Temporal SDKs
-sidebar_label: About the SDKs
-description: Temporal SDKs are open-source tools enabling scalable and reliable application development. They feature APIs for Workflow and Activity execution, automatic retries, and resilience mechanisms, making it easier to build fault-tolerant applications.
-toc_max_heading_level: 4
+title: Develop with Temporal SDKs
+sidebar_label: Develop with Temporal SDKs
+slug: /encyclopedia/temporal-sdks
+description: Temporal SDKs are open-source frameworks enabling scalable and reliable application development with automatic retries, and resilience mechanisms for fault-tolerant applications.
+toc_max_heading_level: 3
keywords:
- components
- developers guide
111 changes: 111 additions & 0 deletions docs/encyclopedia/durable-execution/temporal.mdx
@@ -0,0 +1,111 @@
---
id: temporal
title: What is Temporal?
sidebar_label: What is Temporal?
slug: /temporal
description: Temporal is a scalable platform that ensures the Durable Execution of application code, allowing reliable and resilient Workflow Executions even in the face of failures like network outages or server crashes.
toc_max_heading_level: 3
keywords:
- durable execution
- explanation
- temporal
- term
tags:
- Durable Execution
- Temporal
- Concepts
---

Temporal is a scalable and reliable runtime for [Durable Executions](/durable-execution).
Temporal helps you build applications as if failures don’t exist.
Your application runs reliably, even when facing issues like network outages or server crashes, which would typically disrupt other applications.
The Temporal Platform handles these problems, allowing you to focus on your business logic instead of writing code to detect and recover from failures.

The Temporal System abstracts away failure handling and mitigation.
It's an inherently scalable solution.
It handles millions, or even billions, of workflow processes, and supports processes that last for decades:

<div className="tdiw">
<div className="tditw">
<p className="tdit">The Temporal System</p>
</div>
<div className="tdiiw" height="740">
<img
className="img_ev3q"
src="/diagrams/temporal-system-simple.svg"
alt="The Temporal System"
/>
</div>
</div>

## The Temporal Platform {#temporal-platform}

The Temporal Platform consists of the [Temporal Service](/clusters) and [Worker Processes](/workers#worker-process).
The [Temporal Service](/clusters) supervises the system, while application code is bundled with the [Worker Processes](/workers#worker-process).
These components work together to create a runtime for your application.

<div className="tdiw">
<div className="tditw">
<p className="tdit">The Temporal Platform</p>
</div>
<div className="tdiiw" height="740">
<img
className="img_ev3q"
src="/diagrams/temporal-platform-simple.svg"
alt="The Temporal Platform"
/>
</div>
</div>


## The Temporal Service

A Temporal Service consists of the Temporal Server and a database.
Our software-as-a-service (SaaS) offering, [Temporal Cloud](https://cloud.temporal.io), provides an alternative to hosting the Temporal Service yourself.

You host and operate your Worker Processes, which execute your code.
Workers run using our SDKs.

<div className="tdiw">
<div className="tditw">
<p className="tdit">Basic component topology of the Temporal Platform</p>
</div>
<div className="tdiiw" height="1121">
<img
className="img_ev3q"
src="/diagrams/temporal-platform-component-topology.svg"
alt="Basic component topology of the Temporal Platform"
/>
</div>
</div>

## Temporal Applications {#temporal-application}

A Temporal Application is a set of [Temporal Workflow Executions](/workflows#workflow-execution).
Each Temporal Workflow Execution has exclusive access to its local state, runs concurrently with other Workflow Executions, and communicates with them and the environment via message passing.

A Temporal Application can consist of millions to billions of Workflow Executions.
Workflow Executions are lightweight and consume few compute resources.
If a Workflow Execution is suspended (for example, when waiting), it uses no compute resources at all.

A Temporal Workflow Execution is a **re-entrant process**.
It is *resumable*, *recoverable*, and *reactive*:

- **Resumable**: It can continue after being suspended on an _awaitable_.
- **Recoverable**: It can continue after being suspended due to a _failure_.
- **Reactive**: It can respond to external events.
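
As a rough analogy for the resumable and reactive properties, a Python generator also suspends at a wait point and continues when the outside world sends it a value. The sketch below is a toy model with hypothetical names and is not how Workflow Executions are implemented. In particular, a generator lives only in memory, so it does not show the recoverable property.

```python
def reimbursement(amount):
    """A process that suspends at each yield until an external event arrives."""
    approved = yield ("await_approval", amount)
    if not approved:
        return "rejected"
    confirmation = yield ("await_deposit", amount)
    return f"paid:{confirmation}"


def demo_reentrant():
    process = reimbursement(120)
    waiting_on = [next(process)]  # runs until the first suspension point
    waiting_on.append(process.send(True))  # external event: approval arrives
    try:
        process.send("TX-1")  # external event: deposit confirmed
    except StopIteration as finished:
        return waiting_on, finished.value
```

While the generator is suspended it does no work, which loosely mirrors a waiting Workflow Execution that uses no compute resources.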

A Temporal Workflow Execution runs a [Temporal Workflow Definition](/workflows#workflow-definition), also known as a Temporal Workflow Function, executing your application code exactly once and to completion—whether your code runs for seconds or years, even under heavy load or failure conditions.

## Failures and resilience {#failure}

[Temporal Failures](/references/failures) represent different types of errors in the system, seen in both the SDKs and Event History.

Handling failure is a key part of development.
For more details on the difference between application-level and platform-level failures, check out [Handling Failure From First Principles](https://dominik-tornow.medium.com/handling-failures-from-first-principles-1ed976b1b869).
For how to apply these concepts in Temporal, see [Failure Handling in Practice](https://temporal.io/blog/failure-handling-in-practice).

In languages that throw errors (or exceptions), throwing a non-Temporal Failure causes the Workflow Task to fail.
This results in the Task being retried until it succeeds.
Throwing a Temporal Failure (or allowing one to propagate from Temporal calls, such as an [Activity Failure](/references/failures#activity-failure) from an Activity) causes the Workflow Execution to fail.
For more details, see [Application Failure](/references/failures#application-failure).
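
The distinction above can be sketched with a toy runner in plain Python. Nothing here is Temporal SDK code: `WorkflowFailure` is a hypothetical stand-in for a Temporal Failure, and the attempt cap is an assumption added only so the demo always terminates.

```python
class WorkflowFailure(Exception):
    """Hypothetical stand-in for a Temporal Failure raised by Workflow code."""


def run_workflow(workflow_fn, max_task_attempts=5):
    """Toy runner: retry the task on ordinary errors, fail the run on WorkflowFailure."""
    for attempt in range(1, max_task_attempts + 1):
        try:
            return ("completed", workflow_fn(attempt), attempt)
        except WorkflowFailure as failure:
            # Deliberate failure: the whole execution ends as failed.
            return ("failed", str(failure), attempt)
        except Exception:
            # Unexpected error (for example, a bug): only this task attempt fails.
            continue
    return ("still retrying", None, max_task_attempts)


def demo_failures():
    def flaky(attempt):
        if attempt < 3:
            raise KeyError("unexpected bug on this attempt")
        return "done"

    def declined(attempt):
        raise WorkflowFailure("card declined")

    return run_workflow(flaky), run_workflow(declined)
```

In this model an unexpected error costs a retry, while a deliberately raised failure ends the run with a recorded reason.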