Why are files not persisting between asset runs in cloud/serverless environments?
Last updated: October 27, 2025
In serverless/cloud environments, each job or asset materialization runs in a fresh execution container. The /tmp directory is ephemeral and gets wiped between runs or when containers restart.
To persist data between runs, use a Dagster resource backed by durable external storage, such as S3, a database, or Redis, instead of local file storage. Here's an example using S3:
```python
from dagster import ConfigurableResource, Definitions
import boto3
import json


class FileTypeStorage(ConfigurableResource):
    bucket_name: str

    def read_file_types(self) -> dict:
        s3 = boto3.client("s3")
        try:
            response = s3.get_object(
                Bucket=self.bucket_name,
                Key="egnyte_pending_file_types.json",
            )
            return json.loads(response["Body"].read())
        except s3.exceptions.NoSuchKey:
            return {"materialized_ts": None, "file_types": []}

    def write_file_types(self, data: dict) -> None:
        s3 = boto3.client("s3")
        s3.put_object(
            Bucket=self.bucket_name,
            Key="egnyte_pending_file_types.json",
            Body=json.dumps(data, indent=2),
        )


# In your definitions
defs = Definitions(
    assets=[egnyte_receiver_op, egnyte_file_type_processor],
    sensors=[egnyte_file_type_sensor],
    resources={
        "file_type_storage": FileTypeStorage(bucket_name="your-bucket-name")
    },
)
```

Note that while execution containers are ephemeral, code location containers (where sensors and definitions live) are persistent and are not restarted per asset, sensor, job, or op.
For more information about Dagster resources and persistence, visit our Resources documentation.