Why are files not persisting between asset runs in cloud/serverless environments?

Last updated: October 27, 2025

In serverless/cloud environments, each job or asset materialization runs in a fresh execution container. Anything written to the local filesystem, including /tmp, is ephemeral: it is wiped between runs and whenever the container restarts.
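
As an illustration, state written the way this hypothetical asset does will be gone by the time the next run starts:

  from dagster import asset
  import json

  @asset
  def pending_file_types_local():
      # This lands on the run container's local disk. The container is
      # recycled after the run, so a later run (or a sensor) cannot read
      # this file back.
      with open("/tmp/egnyte_pending_file_types.json", "w") as f:
          json.dump({"materialized_ts": None, "file_types": []}, f)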

To persist data between runs, write the state to external storage (for example S3, a database, or Redis) exposed as a Dagster resource, rather than to local files. Here's an example using S3:

  from dagster import ConfigurableResource, Definitions
  import boto3
  import json

  class FileTypeStorage(ConfigurableResource):
      """Persists the pending-file-types state in S3 so it survives
      across ephemeral run containers."""

      bucket_name: str
      key: str = "egnyte_pending_file_types.json"

      def read_file_types(self) -> dict:
          s3 = boto3.client("s3")
          try:
              response = s3.get_object(Bucket=self.bucket_name, Key=self.key)
              return json.loads(response["Body"].read())
          except s3.exceptions.NoSuchKey:
              # First run: nothing has been written yet.
              return {"materialized_ts": None, "file_types": []}

      def write_file_types(self, data: dict) -> None:
          s3 = boto3.client("s3")
          s3.put_object(
              Bucket=self.bucket_name,
              Key=self.key,
              Body=json.dumps(data, indent=2),
          )

  # In your definitions
  defs = Definitions(
      assets=[egnyte_receiver_op, egnyte_file_type_processor],
      sensors=[egnyte_file_type_sensor],
      resources={
          "file_type_storage": FileTypeStorage(bucket_name="your-bucket-name")
      }
  )
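
Assets can then request the resource as a typed parameter. A minimal sketch of the processor side, reusing the hypothetical asset name from above:

  import time

  from dagster import asset

  @asset
  def egnyte_file_type_processor(file_type_storage: FileTypeStorage) -> None:
      # Read the shared state from S3 rather than the container's local disk.
      data = file_type_storage.read_file_types()
      # ... process data["file_types"] here ...
      # Record that this materialization consumed the pending entries.
      file_type_storage.write_file_types(
          {"materialized_ts": time.time(), "file_types": []}
      )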

Note that while execution containers are ephemeral, the code location container (where sensor evaluations and your Definitions live) is long-running: it is not restarted per asset, sensor tick, job, or op.
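
Even so, the sensor should read state through the same shared resource so it sees what the ephemeral run containers wrote. A sketch, assuming a hypothetical egnyte_file_type_job as the sensor's target:

  from dagster import RunRequest, SkipReason, sensor

  @sensor(job=egnyte_file_type_job)
  def egnyte_file_type_sensor(context, file_type_storage: FileTypeStorage):
      # The sensor runs in the long-lived code location container, but it
      # reads the same S3-backed state the run containers write to.
      data = file_type_storage.read_file_types()
      if not data["file_types"]:
          return SkipReason("No pending file types in S3")
      return RunRequest(run_key=str(data["materialized_ts"]))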

For more information about Dagster resources and persistence, visit our Resources documentation.