Configuration
The quickest way to get started is to generate starter templates with the CLI:
deep-code generate-config # writes to current directory
deep-code generate-config -o ./configs # custom output folder
This creates dataset_config.yaml and workflow_config.yaml with all supported fields and placeholder values. Fill them in, then run deep-code publish.
The sections below document every field in those templates.
Dataset config (YAML)
# Required
dataset_id: your-dataset.zarr
collection_id: your-collection # no spaces — use hyphens
license_type: CC-BY-4.0
stac_catalog_s3_root: s3://bucket/stac/your-collection/
# Optional
osc_themes: [cryosphere] # must match slugs at opensciencedata.esa.int/themes/catalog — auto-lowercased
osc_region: global
dataset_status: completed # ongoing | completed | planned (default: ongoing)
documentation_link: https://example.com/docs
visualisation_link: https://example.com/viewer # URL to a visualisation of the dataset
osc_project: deep-earth-system-data-lab # defaults to deep-earth-system-data-lab
access_link: s3://bucket/your-dataset.zarr # defaults to s3://deep-esdl-public/{dataset_id}
# CF parameter overrides (list of {name, units, ...} dicts)
cf_parameter:
  - name: sea_surface_temperature
    units: kelvin
Field reference
| Field | Required | Description |
|---|---|---|
| dataset_id | Yes | Zarr store identifier (used to open the dataset). |
| collection_id | Yes | Unique ID for the STAC collection in the OSC catalog. Must not contain spaces; use hyphens as word separators (e.g. My-Dataset-2024). |
| license_type | Yes | SPDX license identifier (e.g. CC-BY-4.0). Publishing fails if this field is absent. |
| osc_themes | No | List of OSC theme slugs (e.g. [cryosphere, oceans]). Values are automatically lowercased so Land and land are equivalent. |
| osc_region | No | Geographical region label (default: Global). |
| dataset_status | No | One of ongoing, completed, or planned (default: ongoing). |
| access_link | No | Public S3 URL of the Zarr store. Defaults to s3://deep-esdl-public/{dataset_id}. |
| description | No | Human-readable description of the dataset. Overrides the description attribute in the Zarr store; falls back to "No description available." if neither is set. |
| documentation_link | No | URL to dataset documentation. |
| visualisation_link | No | URL to a visualisation of the dataset (e.g. xcube Viewer, WMS). Added as a visualisation link with title "Dataset visualisation". |
| osc_project | No | OSC project ID this dataset belongs to (e.g. deep-earth-system-data-lab). Defaults to deep-earth-system-data-lab. |
| cf_parameter | No | List of CF metadata dicts to override variable attributes (e.g. name, units). |
| stac_catalog_s3_root | Yes | S3 root where the STAC Catalog and Item are published. Publishing fails if this field is absent. See STAC Catalog on S3. |
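The rules in the field reference can be read as a small validation pass over the parsed config. The following is a minimal Python sketch of those rules, not deep-code's actual implementation: the function name normalize_dataset_config, the constants, and the error messages are hypothetical, and the function takes an already-parsed dict so it does not depend on any YAML library.

```python
# Hypothetical helper illustrating the documented rules; not part of deep-code.
REQUIRED_FIELDS = ("dataset_id", "collection_id", "license_type", "stac_catalog_s3_root")
ALLOWED_STATUS = {"ongoing", "completed", "planned"}


def normalize_dataset_config(cfg: dict) -> dict:
    """Return a copy of cfg with the documented defaults applied.

    Raises ValueError when a required field is missing or a value is invalid.
    """
    missing = [name for name in REQUIRED_FIELDS if not cfg.get(name)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")

    # collection_id must not contain spaces; hyphens are the word separator.
    if any(ch.isspace() for ch in cfg["collection_id"]):
        raise ValueError("collection_id must not contain spaces; use hyphens")

    out = dict(cfg)
    # Theme slugs are lowercased, so "Land" and "land" are equivalent.
    out["osc_themes"] = [theme.lower() for theme in cfg.get("osc_themes", [])]

    # Defaults as listed in the field reference.
    out.setdefault("osc_region", "Global")
    out.setdefault("dataset_status", "ongoing")
    out.setdefault("osc_project", "deep-earth-system-data-lab")
    out.setdefault("access_link", f"s3://deep-esdl-public/{cfg['dataset_id']}")

    if out["dataset_status"] not in ALLOWED_STATUS:
        raise ValueError(f"dataset_status must be one of {sorted(ALLOWED_STATUS)}")
    return out
```

Running a check like this before deep-code publish would surface a missing license_type or stac_catalog_s3_root locally rather than at publish time.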
STAC Catalog on S3
stac_catalog_s3_root is required. deep-code writes a two-file STAC hierarchy to S3 alongside the data:
s3://bucket/stac/your-collection/
├── catalog.json # STAC Catalog (root)
└── your-collection/
    └── item.json # STAC Item covering the full Zarr store
The item has two assets:
- zarr-data: points to the Zarr store (application/vnd+zarr).
- zarr-consolidated-metadata: points to .zmetadata (application/json).
The OSC collection gains a via link to catalog.json so STAC-aware clients
can discover the data path. rel="child" is intentionally avoided because the
OSC validator requires every child link to resolve inside the metadata repository.
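To make the layout concrete, here is a minimal Python sketch that derives the two file locations, the two item assets, and the via link from the config values. It is an illustration under stated assumptions, not deep-code's code: the function name stac_layout and the shape of the returned dict are hypothetical, the item subfolder is assumed to be named after collection_id (as in the tree above), and .zmetadata is assumed to sit at the root of the Zarr store.

```python
# Hypothetical helper; names and return shape are illustrative only.
def stac_layout(stac_catalog_s3_root: str, collection_id: str, access_link: str) -> dict:
    """Derive the STAC file locations and item assets described above."""
    root = stac_catalog_s3_root.rstrip("/")
    store = access_link.rstrip("/")
    catalog_href = f"{root}/catalog.json"
    return {
        # Root STAC Catalog.
        "catalog": catalog_href,
        # Single STAC Item covering the full Zarr store
        # (assumes the subfolder is named after collection_id).
        "item": f"{root}/{collection_id}/item.json",
        "assets": {
            "zarr-data": {"href": store, "type": "application/vnd+zarr"},
            "zarr-consolidated-metadata": {
                "href": f"{store}/.zmetadata",
                "type": "application/json",
            },
        },
        # Link added to the OSC collection; "via" rather than "child".
        "collection_link": {"rel": "via", "href": catalog_href},
    }


layout = stac_layout(
    "s3://bucket/stac/your-collection/",
    "your-collection",
    "s3://bucket/your-dataset.zarr",
)
print(layout["item"])
```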
S3 credentials for writing the STAC catalog are resolved in this order:
STAC_S3_KEY / STAC_S3_SECRET env vars (STAC-specific, can target any bucket),
then AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY,
then the boto3 default chain (IAM role, ~/.aws/credentials).
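The resolution order can be sketched as a simple fallthrough. This is a hedged Python illustration, not deep-code's implementation: resolve_stac_credentials is a hypothetical name, and the sketch assumes a pair only counts when both the key and the secret are set. Returning None stands for "leave it to the boto3 default chain".

```python
import os
from typing import Mapping, Optional, Tuple

# Environment variable pairs in documented priority order.
_CREDENTIAL_PAIRS = (
    ("STAC_S3_KEY", "STAC_S3_SECRET"),  # STAC-specific, can target any bucket
    ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
)


def resolve_stac_credentials(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, str, str]]:
    """Return (key, secret, source_variable), or None to defer to boto3.

    Assumption: a pair is used only when both of its variables are non-empty.
    """
    for key_var, secret_var in _CREDENTIAL_PAIRS:
        key, secret = env.get(key_var), env.get(secret_var)
        if key and secret:
            return key, secret, key_var
    # Nothing explicit found: boto3's default chain (IAM role,
    # ~/.aws/credentials) would take over from here.
    return None
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.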
Workflow config (YAML)
# Required
workflow_id: your-workflow
properties:
  title: "My workflow"
  description: "What this workflow does"
  keywords: ["Earth Science"]
  themes: ["cryosphere"]
  license: proprietary
  # jupyter_kernel_info is optional; only published when jupyter_notebook_url is set
  jupyter_kernel_info:
    name: deepesdl-xcube-1.8.3
    python_version: 3.11
    env_file: https://example.com/environment.yml
# Optional
jupyter_notebook_url: https://github.com/org/repo/path/to/notebook.ipynb
contact:
  - name: Jane Doe
    organization: Example Org
    links:
      - rel: about
        type: text/html
        href: https://example.org
links:
  - rel: related
    type: text/html
    href: https://example.com/related-resource
    title: Related resource
Field reference
| Field | Required | Description |
|---|---|---|
| workflow_id | Yes | Unique identifier for the workflow (spaces converted to hyphens, lowercased). |
| properties.title | Yes | Human-readable title. |
| properties.description | No | Short summary of what the workflow does. |
| properties.keywords | No | List of keyword strings. |
| properties.themes | No | List of OSC theme slugs. |
| properties.license | Yes | SPDX license identifier (e.g. CC-BY-4.0, proprietary). Publishing fails if this field is absent. |
| jupyter_notebook_url | No | Link to the source notebook on GitHub. When omitted, kernel and application links are skipped. |
| properties.jupyter_kernel_info | No | Kernel name, Python version, and environment file URL. Only published when jupyter_notebook_url is set. |
| contact | No | List of contact objects with name, organization, and links. |
| links | No | Additional OGC API record links (e.g. related, describedby). |
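Two of the workflow rules are easy to misread in prose: how workflow_id is normalized, and when jupyter_kernel_info is dropped. The Python sketch below illustrates both under stated assumptions. It is not deep-code's implementation; normalize_workflow_id and prepare_workflow_config are hypothetical names, and the sketch works on an already-parsed dict.

```python
# Hypothetical helpers illustrating the documented workflow rules.
def normalize_workflow_id(workflow_id: str) -> str:
    """Convert spaces to hyphens and lowercase the identifier."""
    return workflow_id.replace(" ", "-").lower()


def prepare_workflow_config(cfg: dict) -> dict:
    """Return a copy of cfg as it would be published, or raise ValueError."""
    if not cfg.get("workflow_id"):
        raise ValueError("workflow_id is required")

    props = dict(cfg.get("properties") or {})
    for name in ("title", "license"):
        if not props.get(name):
            raise ValueError(f"properties.{name} is required")

    # Kernel info is only published alongside a notebook URL.
    if not cfg.get("jupyter_notebook_url"):
        props.pop("jupyter_kernel_info", None)

    out = dict(cfg)
    out["workflow_id"] = normalize_workflow_id(cfg["workflow_id"])
    out["properties"] = props
    return out
```

For example, a workflow_id of "My Workflow" would become "my-workflow", and a config with jupyter_kernel_info but no jupyter_notebook_url would be published without the kernel block.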
More templates and examples live in dataset_config.yaml, workflow_config.yaml, and example-config/.