Preventing Kubernetes node disruption for Dagster jobs
Last updated: May 6, 2025
When running Dagster on Kubernetes with Karpenter as the cluster autoscaler, you may encounter a DagsterExecutionInterruptedError if Karpenter terminates a node while a Dagster job is running on it. Here's how to prevent this disruption.
Using Karpenter's do-not-disrupt annotation
Since Dagster runs are executed as Kubernetes Jobs (which don't support PodDisruptionBudgets), you can use Karpenter's pod-level annotation to prevent node disruption. Add the annotation karpenter.sh/do-not-disrupt: "true" to your run configuration.
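Once applied, the annotation ends up in the metadata of the run pod itself, which is what Karpenter inspects before disrupting a node. A sketch of what the resulting pod metadata looks like (the pod name here is hypothetical; actual Dagster run pod names are generated):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dagster-run-abc123   # hypothetical; Dagster generates run pod names
  annotations:
    karpenter.sh/do-not-disrupt: "true"
```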
Configuration Options
You can add this configuration at different levels depending on your needs:
Deployment-wide configuration
To apply the annotation to all runs in your deployment, configure workspace.runK8sConfig in your Helm values:
```yaml
workspace:
  runK8sConfig:
    podTemplateSpecMetadata:
      annotations:
        karpenter.sh/do-not-disrupt: "true"
```

Code location-specific configuration
To apply only to specific code locations, add the configuration to your location's container context in your deployment configuration.
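For Helm-based deployments, one place this can go is the per-deployment runK8sConfig under dagster-user-deployments. A hedged sketch, assuming your chart version supports per-deployment runK8sConfig (the deployment name below is hypothetical):

```yaml
dagster-user-deployments:
  deployments:
    - name: my-code-location   # hypothetical code location name
      runK8sConfig:
        podTemplateSpecMetadata:
          annotations:
            karpenter.sh/do-not-disrupt: "true"
```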
Job-specific configuration
For individual ops or jobs, use the dagster-k8s/config tag in your code:
```python
from dagster import job

@job(
    tags={
        "dagster-k8s/config": {
            "pod_template_spec_metadata": {
                "annotations": {
                    "karpenter.sh/do-not-disrupt": "true"
                }
            }
        }
    }
)
def my_job():
    ...
```

Choose the configuration level that best matches your needs. Deployment-wide configuration is recommended if you want to protect all Dagster jobs from node disruption.