Part 2: Simulate Failures
In this part, you'll simulate failures to see how Temporal handles them. This demonstrates why Temporal is particularly useful for building reliable systems.
The key concept here is durable execution: your Workflow's progress is saved after every step. When failures and crashes happen (network issues, bugs in your code, server restarts), Temporal resumes your Workflow exactly where it stopped. No lost work, no restarting from the beginning.
You'll crash a server mid-transaction and see zero data loss, then inject bugs into code and fix them live while your application continues running.
Experiment 1 of 2: Crash Recovery Test
Temporal is designed with failure in mind. You're about to simulate a server crash mid-transaction and watch the Workflow pick up where it left off once a Worker is available again.
The Challenge: Kill your Worker process while money is being transferred. In a system without durable execution, this could leave the transfer half-finished or lose it entirely.
What We're Testing
Before You Start
What's happening behind the scenes?
The Temporal Server acts like a persistent state machine for your Workflow. When you kill the Worker, you're only killing the process that executes the code - but the Workflow state lives safely in Temporal's durable storage. When a new Worker starts, it picks up exactly where the previous one left off.
This is fundamentally different from traditional applications where process crashes mean lost work.
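To make the idea concrete, here is a toy model of recorded progress surviving a crash. It is only a sketch under stated assumptions: the JSON history file, `run_transfer`, and `CrashError` are hypothetical stand-ins, and Temporal's real event history and replay are far more involved than this.

```python
import json
import os
import tempfile


class CrashError(Exception):
    """Stands in for the Worker process being killed (hypothetical)."""


def run_transfer(history_path, crash_after=None):
    """Run withdraw then deposit, skipping steps already recorded in the history file."""
    # Load whatever a previous "Worker" managed to record before it died.
    history = []
    if os.path.exists(history_path):
        with open(history_path) as f:
            history = json.load(f)

    executed = []  # steps this particular run actually performed
    for step in ("withdraw", "deposit"):
        if step in history:
            continue  # already completed earlier, so it is not redone
        executed.append(step)
        history.append(step)
        with open(history_path, "w") as f:
            json.dump(history, f)  # persist progress before moving on
        if crash_after == step:
            raise CrashError(f"worker killed after {step}")
    return executed


history_file = os.path.join(tempfile.mkdtemp(), "history.json")

try:
    run_transfer(history_file, crash_after="withdraw")  # first Worker dies mid-transfer
except CrashError:
    pass

resumed = run_transfer(history_file)  # a fresh Worker reads the same history
print(resumed)  # ['deposit']
```

The second run performs only the deposit because the withdrawal was already recorded. The state lives in the file, not in the process that was killed.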
Instructions
Step 1: Start Your Worker
First, stop any running Worker (Ctrl+C) and start a fresh one in Terminal 2.
python run_worker.py
go run worker/main.go
mvn compile exec:java -Dexec.mainClass="moneytransferapp.MoneyTransferWorker"
npm run worker
dotnet run --project MoneyTransferWorker
bundle exec ruby worker.rb
Step 2: Start the Workflow
Now in Terminal 3, start the Workflow. Check the Web UI - you'll see your Worker busy executing the Workflow and its Activities.
python run_workflow.py
go run start/main.go
mvn compile exec:java -Dexec.mainClass="moneytransferapp.TransferApp"
npm run client
dotnet run --project MoneyTransferClient
bundle exec ruby starter.rb
Step 3: Simulate the Crash
The moment of truth! Kill your Worker while it's processing the transaction.
Go back to Terminal 2 and kill the Worker with Ctrl+C.
Jump back to the Web UI and refresh. Your Workflow is still showing as "Running"!
That's the magic! The Workflow stays open because Temporal saved its state, even though we killed the Worker.
Step 4: Bring Your Worker Back
Restart your Worker in Terminal 2. Watch Terminal 3 - you'll see the Workflow finish up and show the result!
python run_worker.py
go run worker/main.go
mvn compile exec:java -Dexec.mainClass="moneytransferapp.MoneyTransferWorker"
npm run worker
dotnet run --project MoneyTransferWorker
bundle exec ruby worker.rb
Mission Accomplished! You just simulated killing the Worker process and restarting it. The Workflow resumed where it left off without losing any application state.
Try killing the Worker at different points during execution. Start the Workflow, kill the Worker during the withdrawal, then restart it. Kill it during the deposit. Each time, notice how the Workflow resumes from the last recorded step instead of starting over.
Check the Web UI while the Worker is down - you'll see the Workflow is still "Running" even though no code is executing.
Experiment 2 of 2: Live Bug Fixing
The Challenge: Inject a bug into your production code, watch Temporal retry automatically, then fix the bug while the Workflow is still running.
Live Debugging Flow
Before You Start
What makes live debugging possible?
Traditional applications lose all context when they crash or fail. Temporal maintains the complete execution history and state of your Workflow in durable storage. This means you can:
- Fix bugs in running code without losing progress
- Deploy new versions while Workflows continue executing
- Retry failed operations with updated logic
- Maintain perfect audit trails of what happened and when
This is like having version control for your running application state.
Instructions
Step 1: Stop Your Worker
Before we can simulate a failure, we need to stop the current Worker process. This allows us to modify the Activity code safely.
In Terminal 2 (where your Worker is running), stop it with Ctrl+C.
What's happening? You're about to modify Activity code to introduce a deliberate failure. The Worker process needs to restart to pick up code changes, but the Workflow execution will continue running in Temporal's service - this separation between execution state and code is a core Temporal concept.
Step 2: Introduce the Bug
Now we'll intentionally introduce a failure in the deposit Activity to simulate real-world scenarios like network timeouts, database connection issues, or external service failures. This demonstrates how Temporal handles partial failures in multi-step processes.
Find the deposit() method and uncomment the failing line while commenting out the working line:
activities.py
@activity.defn
async def deposit(self, data: PaymentDetails) -> str:
    reference_id = f"{data.reference_id}-deposit"
    try:
        # Comment out this working line:
        # confirmation = await asyncio.to_thread(
        #     self.bank.deposit, data.target_account, data.amount, reference_id
        # )
        # Uncomment this failing line:
        confirmation = await asyncio.to_thread(
            self.bank.deposit_that_fails,
            data.target_account,
            data.amount,
            reference_id,
        )
        return confirmation
    except InvalidAccountError:
        raise
    except Exception:
        activity.logger.exception("Deposit failed")
        raise
Save your changes. You've now created a deliberate failure point in your deposit Activity. This simulates a real-world scenario where external service calls might fail intermittently.
Find the Deposit() function and uncomment the failing line while commenting out the working line:
activity.go
func Deposit(ctx context.Context, data PaymentDetails) (string, error) {
	log.Printf("Depositing $%d into account %s.\n\n",
		data.Amount,
		data.TargetAccount,
	)
	referenceID := fmt.Sprintf("%s-deposit", data.ReferenceID)
	bank := BankingService{"bank-api.example.com"}
	// Uncomment this failing line:
	confirmation, err := bank.DepositThatFails(data.TargetAccount, data.Amount, referenceID)
	// Comment out this working line:
	// confirmation, err := bank.Deposit(data.TargetAccount, data.Amount, referenceID)
	return confirmation, err
}
Save your changes. You've now created a deliberate failure point in your deposit Activity. This simulates a real-world scenario where external service calls might fail intermittently.
Find the deposit() method and change activityShouldSucceed to false:
AccountActivityImpl.java
public String deposit(PaymentDetails details) {
    // Change this to false to simulate failure:
    boolean activityShouldSucceed = false;
    // ... rest of your method
}
Save your changes. You've now created a deliberate failure point in your deposit Activity. This simulates a real-world scenario where external service calls might fail intermittently.
Find the deposit() function and uncomment the failing line while commenting out the working line:
activities.ts
export async function deposit(details: PaymentDetails): Promise<string> {
  // Comment out this working line:
  // return await bank.deposit(details.targetAccount, details.amount, details.referenceId);
  // Uncomment this failing line:
  return await bank.depositThatFails(details.targetAccount, details.amount, details.referenceId);
}
Save your changes. You've now created a deliberate failure point in your deposit Activity. This simulates a real-world scenario where external service calls might fail intermittently.
Find the DepositAsync() method and uncomment the failing line while commenting out the working block:
MoneyTransferWorker/Activities.cs
[Activity]
public static async Task<string> DepositAsync(PaymentDetails details)
{
    var bankService = new BankingService("bank2.example.com");
    Console.WriteLine($"Depositing ${details.Amount} into account {details.TargetAccount}.");
    // Uncomment this failing line:
    return await bankService.DepositThatFailsAsync(details.TargetAccount, details.Amount, details.ReferenceId);
    // Comment out this working block:
    /*
    try
    {
        return await bankService.DepositAsync(details.TargetAccount, details.Amount, details.ReferenceId);
    }
    catch (Exception ex)
    {
        throw new ApplicationFailureException("Deposit failed", ex);
    }
    */
}
Save your changes. You've now created a deliberate failure point in your deposit Activity. This simulates a real-world scenario where external service calls might fail intermittently.
Find the deposit method and uncomment the failing line that causes a divide-by-zero error:
activities.rb
def deposit(details)
  # Uncomment this line to introduce the bug:
  result = 100 / 0 # This will cause a divide-by-zero error
  # Your existing deposit logic here...
end
Save your changes. You've now created a deliberate failure point in your deposit Activity. This simulates a real-world scenario where external service calls might fail intermittently.
Step 3: Start Worker & Observe Retry Behavior
Now let's see how Temporal handles this failure. When you start your Worker, it will execute the withdraw Activity successfully, but hit the failing deposit Activity. Instead of the entire Workflow failing permanently, Temporal will retry the failed Activity according to your retry policy.
python run_worker.py
Here's what you'll see:
- The withdraw() Activity completes successfully
- The deposit() Activity fails and retries automatically
go run worker/main.go
Here's what you'll see:
- The Withdraw() Activity completes successfully
- The Deposit() Activity fails and retries automatically
Make sure your Workflow is still running in the Web UI, then start your Worker:
mvn clean install -Dorg.slf4j.simpleLogger.defaultLogLevel=info 2>/dev/null
mvn compile exec:java -Dexec.mainClass="moneytransferapp.MoneyTransferWorker" -Dorg.slf4j.simpleLogger.defaultLogLevel=warn
Here's what you'll see:
- The withdraw() Activity completes successfully
- The deposit() Activity fails and retries automatically
npm run worker
Here's what you'll see:
- The withdraw() Activity completes successfully
- The deposit() Activity fails and retries automatically
dotnet run --project MoneyTransferWorker
Here's what you'll see:
- The WithdrawAsync() Activity completes successfully
- The DepositAsync() Activity fails and retries automatically
bundle exec ruby worker.rb
In another terminal, start a new Workflow:
bundle exec ruby starter.rb
Here's what you'll see:
- The withdraw Activity completes successfully
- The deposit Activity fails and retries automatically
Check the Web UI - click on your Workflow to see the failure details and retry attempts.
Key observation: Your Workflow isn't stuck or terminated. Temporal automatically retries the failed Activity according to your configured retry policy, while maintaining the overall Workflow state. The successful withdraw Activity doesn't get re-executed - only the failed deposit Activity is retried.
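As a rough illustration of what "according to your configured retry policy" can mean, the sketch below computes the wait before each retry under exponential backoff. The field names are modeled loosely on the options a Temporal retry policy exposes, but `ToyRetryPolicy`, `retry_delays`, and every value here are assumptions for illustration, not the settings used by this tutorial's Workflow.

```python
from dataclasses import dataclass


@dataclass
class ToyRetryPolicy:
    """Hypothetical stand-in for a retry policy; all values are made up."""
    initial_interval: float = 1.0     # seconds to wait before the first retry
    backoff_coefficient: float = 2.0  # multiplier applied after each failed attempt
    maximum_interval: float = 10.0    # cap on the wait between attempts
    maximum_attempts: int = 5         # total attempts, including the first one


def retry_delays(policy):
    """Return the wait in seconds before each retry (one fewer than the attempts)."""
    delays = []
    interval = policy.initial_interval
    for _ in range(policy.maximum_attempts - 1):
        delays.append(min(interval, policy.maximum_interval))
        interval *= policy.backoff_coefficient
    return delays


delays = retry_delays(ToyRetryPolicy())
print(delays)  # [1.0, 2.0, 4.0, 8.0]
```

The growing gaps are why the retry attempts you see in the Web UI become less frequent over time. This leaves you room to fix the code before the attempts run out, if the policy limits them at all.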
Step 4: Fix the Bug
Here's where Temporal really shines - you can fix bugs in production code while Workflows are still executing. The Workflow state is preserved in Temporal's durable storage, so you can deploy fixes and let the retry mechanism pick up your corrected code.
Go back to activities.py and reverse the comments - comment out the failing line and uncomment the working line:
activities.py
@activity.defn
async def deposit(self, data: PaymentDetails) -> str:
    reference_id = f"{data.reference_id}-deposit"
    try:
        # Uncomment this working line:
        confirmation = await asyncio.to_thread(
            self.bank.deposit, data.target_account, data.amount, reference_id
        )
        # Comment out this failing line:
        # confirmation = await asyncio.to_thread(
        #     self.bank.deposit_that_fails,
        #     data.target_account,
        #     data.amount,
        #     reference_id,
        # )
        return confirmation
    except InvalidAccountError:
        raise
    except Exception:
        activity.logger.exception("Deposit failed")
        raise
Go back to activity.go and reverse the comments - comment out the failing line and uncomment the working line:
activity.go
func Deposit(ctx context.Context, data PaymentDetails) (string, error) {
	log.Printf("Depositing $%d into account %s.\n\n",
		data.Amount,
		data.TargetAccount,
	)
	referenceID := fmt.Sprintf("%s-deposit", data.ReferenceID)
	bank := BankingService{"bank-api.example.com"}
	// Comment out this failing line:
	// confirmation, err := bank.DepositThatFails(data.TargetAccount, data.Amount, referenceID)
	// Uncomment this working line:
	confirmation, err := bank.Deposit(data.TargetAccount, data.Amount, referenceID)
	return confirmation, err
}
Go back to AccountActivityImpl.java and change activityShouldSucceed back to true:
AccountActivityImpl.java
public String deposit(PaymentDetails details) {
    // Change this back to true to fix the bug:
    boolean activityShouldSucceed = true;
    // ... rest of your method
}
Go back to activities.ts and reverse the comments - comment out the failing line and uncomment the working line:
activities.ts
export async function deposit(details: PaymentDetails): Promise<string> {
  // Uncomment this working line:
  return await bank.deposit(details.targetAccount, details.amount, details.referenceId);
  // Comment out this failing line:
  // return await bank.depositThatFails(details.targetAccount, details.amount, details.referenceId);
}
Go back to Activities.cs and reverse the comments - comment out the failing line and uncomment the working block:
MoneyTransferWorker/Activities.cs
[Activity]
public static async Task<string> DepositAsync(PaymentDetails details)
{
    var bankService = new BankingService("bank2.example.com");
    Console.WriteLine($"Depositing ${details.Amount} into account {details.TargetAccount}.");
    // Comment out this failing line:
    // return await bankService.DepositThatFailsAsync(details.TargetAccount, details.Amount, details.ReferenceId);
    // Uncomment this working block:
    try
    {
        return await bankService.DepositAsync(details.TargetAccount, details.Amount, details.ReferenceId);
    }
    catch (Exception ex)
    {
        throw new ApplicationFailureException("Deposit failed", ex);
    }
}
Go back to activities.rb and comment out the failing line:
activities.rb
def deposit(details)
  # Comment out this problematic line:
  # result = 100 / 0 # This will cause a divide-by-zero error
  # Your existing deposit logic here...
end
Save your changes. You've now restored the working implementation. The key insight here is that you can deploy fixes to Activities while Workflows are still executing - Temporal will pick up your changes on the next retry attempt.
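The sketch below shows, in miniature, why a fix can take effect on a later retry. It assumes a hypothetical `registered` table that the retry loop consults on every attempt, and it swaps the implementation partway through to stand in for restarting the Worker with corrected code. None of these names come from the Temporal SDK.

```python
def deposit_that_fails(amount):
    """The deliberately broken implementation."""
    raise RuntimeError("simulated bank outage")


def deposit(amount):
    """The working implementation."""
    return f"deposited {amount}"


# The toy "Worker" looks the Activity up by name on every attempt (hypothetical registry).
registered = {"deposit": deposit_that_fails}

attempts = []
result = None
for attempt in range(1, 6):
    if attempt == 3:
        # Stands in for restarting the Worker after the code has been fixed.
        registered["deposit"] = deposit
    try:
        result = registered["deposit"](250)
        attempts.append((attempt, "ok"))
        break
    except RuntimeError:
        attempts.append((attempt, "failed"))

print(attempts)  # [(1, 'failed'), (2, 'failed'), (3, 'ok')]
print(result)    # deposited 250
```

The failed attempts are not lost work. They are recorded, and the first attempt that runs against the fixed code completes the step.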
Step 5: Restart Worker
To apply your fix, you need to restart the Worker process so it picks up the code changes. Since the Workflow execution state is stored in Temporal's servers (not in your Worker process), restarting the Worker won't affect the running Workflow.
# Stop the current Worker
Ctrl+C
# Start it again with the fix
python run_worker.py
On the next retry attempt, your fixed deposit() Activity will succeed, and you'll see the completed transaction in Terminal 3:
Transfer complete.
Withdraw: {'amount': 250, 'receiver': '43-812', 'reference_id': '1f35f7c6-4376-4fb8-881a-569dfd64d472', 'sender': '85-150'}
Deposit: {'amount': 250, 'receiver': '43-812', 'reference_id': '1f35f7c6-4376-4fb8-881a-569dfd64d472', 'sender': '85-150'}
# Stop the current Worker
Ctrl+C
# Start it again with the fix
go run worker/main.go
On the next retry attempt, your fixed Deposit() Activity will succeed, and you'll see the completed transaction in your starter terminal:
Transfer complete (transaction IDs: W1779185060, D1779185060)
# Stop the current Worker
Ctrl+C
# Start it again with the fix
mvn clean install -Dorg.slf4j.simpleLogger.defaultLogLevel=info 2>/dev/null
mvn compile exec:java -Dexec.mainClass="moneytransferapp.MoneyTransferWorker" -Dorg.slf4j.simpleLogger.defaultLogLevel=warn
On the next retry attempt, your fixed deposit() Activity will succeed:
Depositing $32 into account 872878204.
[ReferenceId: d3d9bcf0-a897-4326]
[d3d9bcf0-a897-4326] Transaction succeeded.
# Stop the current Worker
Ctrl+C
# Start it again with the fix
npm run worker
On the next retry attempt, your fixed deposit() Activity will succeed, and you'll see the completed transaction in your client terminal:
Transfer complete (transaction IDs: W3436600150, D9270097234)
# Stop the current Worker
Ctrl+C
# Start it again with the fix
dotnet run --project MoneyTransferWorker
On the next retry attempt, your fixed DepositAsync() Activity will succeed, and you'll see the completed transaction in your client terminal:
Workflow result: Transfer complete (transaction IDs: W-caa90e06-3a48-406d-86ff-e3e958a280f8, D-1910468b-5951-4f1d-ab51-75da5bba230b)
# Stop the current Worker
Ctrl+C
# Start it again with the fix
bundle exec ruby worker.rb
On the next retry attempt, your fixed deposit Activity will succeed, and you'll see the Workflow complete successfully.
Check the Web UI - your Workflow shows as completed. You've just demonstrated Temporal's key differentiator: the ability to fix production bugs in running applications without losing transaction state or progress. This is possible because Temporal stores execution state separately from your application code.
Mission Accomplished. You have just fixed a bug in a running application without losing the state of the Workflow or restarting the transaction.
Real-World Scenario: Try this advanced experiment:
- Change the retry policy in workflows.py to only retry 1 time
- Introduce a bug that triggers the refund logic
- Watch the Web UI as Temporal automatically executes the compensating transaction
Question to consider: How would you handle this scenario in a traditional microservices architecture?
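For orientation, here is a minimal sketch of the compensating-transaction pattern this experiment exercises: withdraw, attempt the deposit a limited number of times, then refund the source account if the deposit never succeeds. `ToyBank` and `transfer` are hypothetical, the account numbers are borrowed from the sample output earlier, and the tutorial's actual Workflow code may be structured differently.

```python
class ToyBank:
    """In-memory stand-in for the tutorial's banking service (hypothetical)."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def withdraw(self, account, amount):
        self.balances[account] -= amount

    def deposit(self, account, amount):
        self.balances[account] += amount

    def deposit_that_fails(self, account, amount):
        raise RuntimeError("simulated deposit failure")


def transfer(bank, source, target, amount, deposit, max_attempts=2):
    """Withdraw, try the deposit up to max_attempts times, refund if it never succeeds."""
    bank.withdraw(source, amount)
    for _ in range(max_attempts):
        try:
            deposit(target, amount)
            return "transfer complete"
        except RuntimeError:
            continue  # a real retry policy would wait before the next attempt
    # Attempts are exhausted: compensate by returning the money to the source account.
    bank.deposit(source, amount)
    return "refunded"


bank = ToyBank({"85-150": 1000, "43-812": 0})
outcome = transfer(bank, "85-150", "43-812", 250, bank.deposit_that_fails)
print(outcome)        # refunded
print(bank.balances)  # {'85-150': 1000, '43-812': 0}
```

In this toy version the refund is an ordinary function call. In a durable Workflow the refund is itself a recorded step, so a crash between the failed deposit and the refund does not strand the money.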
Summary: What You Accomplished
Congratulations! You've experienced firsthand why Temporal is a game-changer for reliable applications. Here's what you demonstrated:
What You Learned
Crash-Proof Execution
You killed a Worker mid-transaction and watched Temporal recover seamlessly. Traditional applications would lose this work unless you built your own checkpointing and recovery logic.
Live Production Debugging
You fixed a bug in running code without losing any state. Most systems require you to restart everything, losing all progress and context.
Automatic Retry Management
Temporal handled retries according to your configured policy, without cluttering your business logic with error-handling code.
Complete Observability
The Web UI gave you full visibility into every step, retry attempt, and state transition. No more debugging mysterious failures.
Advanced Challenges
Try these advanced scenarios:
- Modify the retry policy in workflows.py to only retry 1 time
- Force the deposit to fail permanently
- Watch the automatic refund execute
Mission objective: Prove that Temporal can handle complex business logic flows even when things go wrong.
- Start a long-running Workflow
- Disconnect your network (or pause the Temporal Server container)
- Reconnect after 30 seconds
Mission objective: Demonstrate Temporal's resilience to network failures.
Knowledge Check
Test your understanding of what you just experienced:
Q: Why do we use a shared constant for the Task Queue name?
Answer: Because the Task Queue name connects your Workflow starter to your Worker. If they don't match exactly, your Worker will never see the Workflow tasks, and execution will stall indefinitely.
Real-world impact: This is like having the wrong radio frequency - your messages never get delivered.
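A toy model shows why the shared constant matters. It assumes an in-memory `queues` table in place of the Temporal Service, and the constant name and value are hypothetical placeholders rather than the ones in this tutorial's code.

```python
from collections import defaultdict, deque

# Hypothetical shared constant: the starter and the Worker both import this one name.
MONEY_TRANSFER_TASK_QUEUE_NAME = "money-transfer"

queues = defaultdict(deque)  # stand-in for the service's task queues


def start_workflow(task_queue, workflow_id):
    """Starter side: place a task on the named queue."""
    queues[task_queue].append(workflow_id)


def poll(task_queue):
    """Worker side: take the next task from the named queue, or None if it is empty."""
    queue = queues[task_queue]
    return queue.popleft() if queue else None


start_workflow(MONEY_TRANSFER_TASK_QUEUE_NAME, "transfer-1")

missed = poll("money_transfer")                   # typo: underscore instead of hyphen
delivered = poll(MONEY_TRANSFER_TASK_QUEUE_NAME)  # same constant on both sides

print(missed)     # None
print(delivered)  # transfer-1
```

A mismatched name raises no error. The Worker polls a different, empty queue while the task sits unclaimed on the original one, which is why the mistake is easy to make and hard to notice.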
Q: What happens when you modify Activity code for a running Workflow?
Answer: You must restart the Worker to load the new code. The Workflow will continue from where it left off, but with your updated Activity logic.
Real-world impact: This enables hot-fixes in production without losing transaction state.