Job not retrying after an unexpected termination

Last updated: October 9, 2025

Problem

When a job fails due to an unexpected termination, you may notice that your retry policy is not applied and the run is not retried. This happens because of the default behavior when both op-level and run-level retries are enabled.

You may see the following in your logs:

  • Label: dagster/failure_reason:UNEXPECTED_TERMINATION

  • Error: dagster._core.errors.DagsterExecutionInterruptedError

Cause

By default, when both op-level and run-level retries are enabled, the system prioritizes op-level retries. This means that if a run fails due to an op failure, the run-level retry policy is bypassed.

Solutions

To ensure that your job retries even in cases of unexpected termination, you have two options: override the behavior for all jobs in the deployment, or override it only for specific jobs.

Override the default behavior for all jobs in the deployment

You can change this behavior for all jobs in your deployment:

  1. Go to your Deployment's settings.

  2. Set retry_on_asset_or_op_failure to false.
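The steps above correspond to a deployment settings entry along these lines. This is a sketch that assumes the setting lives under the run_retries section of your deployment settings; the max_retries value is illustrative:

```yaml
run_retries:
  enabled: true                         # run-level retries must be enabled
  max_retries: 3                        # illustrative retry count
  retry_on_asset_or_op_failure: false   # the setting from step 2
```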

Override the default behavior for specific jobs

If you want to control this behavior on a per-job basis, you can use a job tag:

from dagster import job

@job(
    tags={"dagster/retry_on_asset_or_op_failure": "false"}
)
def my_job():
    ...