Configuration
An ftm-datalake archive can be configured via environment variables or a YAML configuration file. Individual datasets within the archive can have their own configuration, which makes it possible to build an archive that uses a different storage configuration per dataset.
Using environment vars
Simply point to a local base folder containing the archive:
LEAKRFC_URI=./data/
Or point to a local or remote YAML configuration (see below):
LEAKRFC_URI=https://data.example.org/archive.yml
For more granular configuration, set additional environment variables. ftm-datalake uses pydantic-settings to parse the configuration, and nested configuration keys are addressed with the __ delimiter.
LEAKRFC_ARCHIVE__URI=s3://ftm-datalake
LEAKRFC_ARCHIVE__PUBLIC_URL=https://cdn.example.org/{dataset}/{key}
LEAKRFC_ARCHIVE__STORAGE__READONLY=true
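To illustrate how the __ delimiter maps flat variable names onto nested keys, here is a minimal standard-library sketch. It is not how ftm-datalake or pydantic-settings is implemented; the helper name nest_env is hypothetical, and the LEAKRFC_ prefix is assumed from the examples above. Values stay plain strings here, whereas pydantic-settings would additionally validate and coerce them against the settings model.

```python
def nest_env(environ, prefix="LEAKRFC_", delimiter="__"):
    """Turn prefixed, delimiter-separated variables into a nested dict.

    Hypothetical helper for illustration only; not part of ftm-datalake.
    """
    config = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # Strip the prefix, lowercase the rest, and split on the delimiter.
        parts = name[len(prefix):].lower().split(delimiter)
        node = config
        # Walk (and create) intermediate levels, then set the leaf value.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return config


example_env = {
    "LEAKRFC_ARCHIVE__URI": "s3://ftm-datalake",
    "LEAKRFC_ARCHIVE__PUBLIC_URL": "https://cdn.example.org/{dataset}/{key}",
    "LEAKRFC_ARCHIVE__STORAGE__READONLY": "true",
}
nested = nest_env(example_env)
```

With the three variables above, nested["archive"]["storage"]["readonly"] holds the string "true", one level below nested["archive"]["uri"].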
YAML config
Create a base config and enable it via LEAKRFC_URI=ftm-datalake.yml:
Within a local archive, an individual dataset can actually live in the cloud:
./archive/remote_dataset/.ftm-datalake/config.yml:
This means that the local folder ./archive/remote_dataset/ contains only this YAML configuration, while the contents of the dataset are read from the remote storage.
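As a rough sketch of this layout, the following snippet builds the per-dataset configuration path and checks whether a dataset ships its own config. The function names are hypothetical and this is not the library's actual lookup logic; only the .ftm-datalake/config.yml location is taken from the example above.

```python
from pathlib import Path


def dataset_config_path(archive_root, dataset):
    """Return the expected location of a dataset-level config file.

    Hypothetical helper; the path layout follows the example above.
    """
    return Path(archive_root) / dataset / ".ftm-datalake" / "config.yml"


def has_own_config(archive_root, dataset):
    """True if the dataset folder contains its own config.yml."""
    return dataset_config_path(archive_root, dataset).is_file()


path = dataset_config_path("./archive", "remote_dataset")
```

A dataset for which has_own_config returns False would, under this assumption, simply fall back to the archive-wide configuration.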