# Configuration
The quickest way to get started is to generate starter templates with the CLI:

```shell
deep-code generate-config              # writes to current directory
deep-code generate-config -o ./configs # custom output folder
```

This creates `dataset_config.yaml` and `workflow_config.yaml` with all supported fields and placeholder values. Fill them in, then run `deep-code publish`.
The sections below document every field in those templates.
## Dataset config (YAML)
```yaml
# Required
dataset_id: your-dataset.zarr
collection_id: your-collection
license_type: CC-BY-4.0

# Optional
osc_themes: [cryosphere]   # must match slugs at opensciencedata.esa.int/themes/catalog
osc_region: global
dataset_status: completed  # ongoing | completed | planned (default: ongoing)
documentation_link: https://example.com/docs
access_link: s3://bucket/your-dataset.zarr  # defaults to s3://deep-esdl-public/{dataset_id}

# CF parameter overrides (list of {name, units, ...} dicts)
cf_parameter:
  - name: sea_surface_temperature
    units: kelvin

# Optional: publish a STAC Catalog + Item next to the data on S3.
# When set, a lightweight STAC hierarchy (catalog.json → item.json) is written
# directly to S3 and a "via" link is added to the OSC collection pointing to it.
# S3 write credentials are resolved in order:
#   1. STAC_S3_KEY / STAC_S3_SECRET (STAC-specific, any bucket)
#   2. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
#   3. boto3 default chain (IAM role, ~/.aws/credentials)
stac_catalog_s3_root: s3://bucket/stac/your-collection/
```
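Before publishing, it can help to confirm the three required fields are filled in. A minimal sketch, assuming the config has been loaded into a dict with any YAML parser (e.g. PyYAML's `safe_load`); the helper and its name are illustrative, not part of deep-code:

```python
# Field names come from the dataset-config template above.
REQUIRED_FIELDS = ("dataset_id", "collection_id", "license_type")

def missing_required(config: dict) -> list:
    """Return the required fields that are absent or left empty."""
    return [f for f in REQUIRED_FIELDS if not config.get(f)]

# A config still missing its license_type:
config = {
    "dataset_id": "your-dataset.zarr",
    "collection_id": "your-collection",
}
print(missing_required(config))  # ['license_type']
```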
### Field reference

| Field | Required | Description |
|---|---|---|
| `dataset_id` | Yes | Zarr store identifier (used to open the dataset). |
| `collection_id` | Yes | Unique ID for the STAC collection in the OSC catalog. |
| `license_type` | Yes | SPDX license identifier (e.g. `CC-BY-4.0`). |
| `osc_themes` | No | List of OSC theme slugs (e.g. `[cryosphere, oceans]`). |
| `osc_region` | No | Geographical region label (default: `Global`). |
| `dataset_status` | No | One of `ongoing`, `completed`, or `planned` (default: `ongoing`). |
| `access_link` | No | Public S3 URL of the Zarr store. Defaults to `s3://deep-esdl-public/{dataset_id}`. |
| `documentation_link` | No | URL to dataset documentation. |
| `cf_parameter` | No | List of CF metadata dicts to override variable attributes (e.g. `name`, `units`). |
| `stac_catalog_s3_root` | No | S3 root for the dataset-level STAC Catalog/Item. See the STAC Catalog on S3 section. |
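The `cf_parameter` merge can be pictured as updating each named variable's attribute dict with every key from the override except `name`. A sketch under that assumption; deep-code's actual merge logic may differ, and the function name is ours:

```python
def apply_cf_overrides(var_attrs: dict, cf_parameter: list) -> dict:
    """Merge each override dict into the attrs of the named variable."""
    for entry in cf_parameter:
        attrs = var_attrs.setdefault(entry["name"], {})
        # Every key except "name" becomes/replaces a variable attribute.
        attrs.update({k: v for k, v in entry.items() if k != "name"})
    return var_attrs

attrs = apply_cf_overrides(
    {"sea_surface_temperature": {"units": "degC"}},
    [{"name": "sea_surface_temperature", "units": "kelvin"}],
)
print(attrs)  # {'sea_surface_temperature': {'units': 'kelvin'}}
```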
## STAC Catalog on S3
Setting `stac_catalog_s3_root` generates a two-file STAC hierarchy on S3 alongside the data:

```
s3://bucket/stac/your-collection/
├── catalog.json          # STAC Catalog (root)
└── your-collection/
    └── item.json         # STAC Item covering the full Zarr store
```
The item has two assets:

- `zarr-data`: points to the Zarr store (`application/vnd+zarr`).
- `zarr-consolidated-metadata`: points to `.zmetadata` (`application/json`).
The OSC collection gains a `via` link to `catalog.json` so STAC-aware clients can discover the data path. `rel="child"` is intentionally avoided because the OSC validator requires every child link to resolve inside the metadata repository.
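For orientation, the written item looks roughly like the JSON below. This is a hand-built sketch mirroring the asset layout described above; the exact fields deep-code emits (geometry, datetime, extra links) are not shown here:

```python
import json

root = "s3://bucket/stac/your-collection"
zarr = "s3://deep-esdl-public/your-dataset.zarr"  # illustrative store URL

# Shape of <root>/your-collection/item.json with its two assets.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "your-collection",
    "assets": {
        "zarr-data": {"href": zarr, "type": "application/vnd+zarr"},
        "zarr-consolidated-metadata": {
            "href": f"{zarr}/.zmetadata",
            "type": "application/json",
        },
    },
}

print(json.dumps(item["assets"], indent=2))
```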
S3 credentials for writing the STAC catalog are resolved in this order:

1. `STAC_S3_KEY` / `STAC_S3_SECRET` env vars (STAC-specific, can target any bucket),
2. then `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`,
3. then the boto3 default chain (IAM role, `~/.aws/credentials`).
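The resolution order can be sketched as a small helper. Returning `(None, None)` stands in for deferring to the boto3 default chain; the function name is illustrative, not deep-code's API:

```python
import os

def resolve_stac_s3_credentials(env=os.environ):
    """Return (key, secret) using the documented precedence order."""
    for key_var, secret_var in (
        ("STAC_S3_KEY", "STAC_S3_SECRET"),              # 1. STAC-specific
        ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"), # 2. generic AWS
    ):
        if env.get(key_var) and env.get(secret_var):
            return env[key_var], env[secret_var]
    return None, None  # 3. fall back to the boto3 default chain

key, secret = resolve_stac_s3_credentials(
    {"STAC_S3_KEY": "k", "STAC_S3_SECRET": "s"}
)
```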
## Workflow config (YAML)
```yaml
# Required
workflow_id: your-workflow
properties:
  title: "My workflow"
  description: "What this workflow does"
  keywords: ["Earth Science"]
  themes: ["cryosphere"]
  license: proprietary
  # jupyter_kernel_info is optional; only published when jupyter_notebook_url is set
  jupyter_kernel_info:
    name: deepesdl-xcube-1.8.3
    python_version: 3.11
    env_file: https://example.com/environment.yml

# Optional
jupyter_notebook_url: https://github.com/org/repo/path/to/notebook.ipynb
contact:
  - name: Jane Doe
    organization: Example Org
    links:
      - rel: about
        type: text/html
        href: https://example.org
links:
  - rel: related
    type: text/html
    href: https://example.com/related-resource
    title: Related resource
```
### Field reference

| Field | Required | Description |
|---|---|---|
| `workflow_id` | Yes | Unique identifier for the workflow (spaces converted to hyphens, lowercased). |
| `properties.title` | Yes | Human-readable title. |
| `properties.description` | No | Short summary of what the workflow does. |
| `properties.keywords` | No | List of keyword strings. |
| `properties.themes` | No | List of OSC theme slugs. |
| `properties.license` | No | License identifier (e.g. `proprietary`, `CC-BY-4.0`). |
| `jupyter_notebook_url` | No | Link to the source notebook on GitHub. When omitted, kernel and application links are skipped. |
| `properties.jupyter_kernel_info` | No | Kernel name, Python version, and environment file URL. Only published when `jupyter_notebook_url` is set. |
| `contact` | No | List of contact objects with `name`, `organization`, and `links`. |
| `links` | No | Additional OGC API record links (e.g. `related`, `describedby`). |
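The `workflow_id` normalization noted in the table (spaces to hyphens, lowercased) amounts to a one-line transform. A sketch under that reading; the function name is ours, not deep-code's:

```python
def normalize_workflow_id(raw: str) -> str:
    """Lowercase the id and convert spaces to hyphens."""
    return raw.strip().lower().replace(" ", "-")

print(normalize_workflow_id("My Workflow"))  # my-workflow
```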
More templates and examples live in `dataset_config.yaml`, `workflow_config.yaml`, and `example-config/`.