Preventing Kubernetes node disruption for Dagster jobs
Last updated: May 6, 2025
When running Dagster on Kubernetes with Karpenter as the cluster autoscaler, you may encounter a DagsterExecutionInterruptedError if Karpenter terminates a node while a Dagster job is running on it. Here's how to prevent this disruption.
Using Karpenter's do-not-disrupt annotation
Since Dagster runs are executed as Kubernetes Jobs (which don't support PodDisruptionBudgets), you can use Karpenter's pod-level annotation to prevent node disruption. Add the annotation karpenter.sh/do-not-disrupt: "true" to your run configuration.
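Once applied, the annotation ends up in the metadata of the run pod itself, which is what Karpenter inspects before disrupting a node. A sketch of what the resulting pod metadata looks like (the pod name here is hypothetical; actual Dagster run pod names are generated):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dagster-run-abc123   # hypothetical; Dagster generates run pod names
  annotations:
    karpenter.sh/do-not-disrupt: "true"
```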
Configuration Options
You can add this configuration at different levels depending on your needs:
Deployment-wide configuration
To apply the annotation to all runs in your deployment, configure workspace.runK8sConfig in your Helm values:
```yaml
workspace:
  runK8sConfig:
    podTemplateSpecMetadata:
      annotations:
        karpenter.sh/do-not-disrupt: "true"
```

Code location-specific configuration
To apply only to specific code locations, add the configuration to your location's container context in your deployment configuration.
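For Helm-based deployments, one place this can go is the per-deployment runK8sConfig under dagster-user-deployments. A hedged sketch, assuming your chart version supports per-deployment runK8sConfig (the deployment name below is hypothetical):

```yaml
dagster-user-deployments:
  deployments:
    - name: my-code-location   # hypothetical code location name
      runK8sConfig:
        podTemplateSpecMetadata:
          annotations:
            karpenter.sh/do-not-disrupt: "true"
```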
Job-specific configuration
For individual ops or jobs, use the dagster-k8s/config tag in your code:
```python
from dagster import job

@job(
    tags={
        "dagster-k8s/config": {
            "pod_template_spec_metadata": {
                "annotations": {
                    "karpenter.sh/do-not-disrupt": "true"
                }
            }
        }
    }
)
def my_job():
    ...
```

Choose the configuration level that best matches your needs. Deployment-wide configuration is recommended if you want to protect all Dagster jobs from node disruption.