Job not retrying after an unexpected termination
Last updated: October 9, 2025
Problem
When a job fails due to an unexpected termination, you may notice that the retry policy is not applied as expected. This happens because of the default behavior when both op-level and run-level retries are enabled.
You may see the following in your logs:
Label: dagster/failure_reason: UNEXPECTED_TERMINATION
Error: dagster._core.errors.DagsterExecutionInterruptedError
Cause
By default, when both op-level and run-level retries are enabled, the system prioritizes op-level retries. This means that if a run fails due to an op failure, the run-level retry policy is bypassed.
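Concretely, this situation arises when an op declares its own retry policy while run-level retries are also turned on for the instance. In open source Dagster, the run-level side is configured through the run_retries block in dagster.yaml; the values below are illustrative, not recommendations:

```yaml
# dagster.yaml — enables run-level retries for the instance
run_retries:
  enabled: true
  max_retries: 3
```

With this in place and a RetryPolicy on an op, a failure inside that op is handled by the op-level policy, and the run-level policy is bypassed for that failure.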
Solutions
To ensure that your job retries even in cases of unexpected termination, you have two options: override the behavior for all jobs in the deployment, or override it only for specific jobs.
Override the default behavior for all jobs in the deployment
You can change this behavior for all jobs in your deployment:
Go to your Deployment's settings.
Set retry_on_asset_or_op_failure to false.
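In open source Dagster, the equivalent change goes in the run_retries block of dagster.yaml. A sketch of the relevant fragment, assuming run retries are already enabled (the max_retries value is illustrative):

```yaml
# dagster.yaml — run-level retries that also cover unexpected terminations
run_retries:
  enabled: true
  max_retries: 3
  retry_on_asset_or_op_failure: false
```

With retry_on_asset_or_op_failure set to false, run-level retries are no longer skipped in favor of op-level retries, so failures such as unexpected terminations trigger a run retry.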
Override the default behavior for specific jobs
If you want to control this behavior on a per-job basis, you can use a job tag:
from dagster import job

@job(
    tags={"dagster/retry_on_asset_or_op_failure": "false"}
)
def my_job():
    ...