Configuration

The quickest way to get started is to generate starter templates with the CLI:

deep-code generate-config              # writes to current directory
deep-code generate-config -o ./configs # custom output folder

This creates dataset_config.yaml and workflow_config.yaml with all supported fields and placeholder values. Fill them in, then run deep-code publish.

The sections below document every field in those templates.


Dataset config (YAML)

# Required
dataset_id: your-dataset.zarr
collection_id: your-collection       # no spaces — use hyphens
license_type: CC-BY-4.0
stac_catalog_s3_root: s3://bucket/stac/your-collection/

# Optional
osc_themes: [cryosphere]        # must match slugs at opensciencedata.esa.int/themes/catalog — auto-lowercased
osc_region: global
dataset_status: completed       # ongoing | completed | planned (default: ongoing)
description: "Human-readable summary"            # overrides the Zarr store's description attribute
documentation_link: https://example.com/docs
visualisation_link: https://example.com/viewer   # URL to a visualisation of the dataset
osc_project: deep-earth-system-data-lab          # defaults to deep-earth-system-data-lab
access_link: s3://bucket/your-dataset.zarr   # defaults to s3://deep-esdl-public/{dataset_id}

# CF parameter overrides (list of {name, units, ...} dicts)
cf_parameter:
  - name: sea_surface_temperature
    units: kelvin
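The merge semantics can be pictured as follows. This is a hedged sketch of how cf_parameter overrides could be applied to variable attributes; apply_cf_overrides is a hypothetical helper for illustration, not part of deep-code's API:

```python
def apply_cf_overrides(var_attrs: dict, cf_parameter: list) -> dict:
    """Merge each override dict into the matching variable's attributes.

    `var_attrs` maps variable names to attribute dicts; each entry in
    `cf_parameter` carries a `name` key identifying the target variable.
    """
    merged = {name: dict(attrs) for name, attrs in var_attrs.items()}
    for override in cf_parameter:
        name = override["name"]
        if name in merged:
            # Every key except `name` replaces the stored attribute value.
            merged[name].update(
                {k: v for k, v in override.items() if k != "name"}
            )
    return merged

attrs = {"sea_surface_temperature": {"units": "degC"}}
overrides = [{"name": "sea_surface_temperature", "units": "kelvin"}]
print(apply_cf_overrides(attrs, overrides))
# -> {'sea_surface_temperature': {'units': 'kelvin'}}
```

Attributes not named in the override (e.g. standard_name) are left untouched.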

Field reference

| Field | Required | Description |
| --- | --- | --- |
| dataset_id | Yes | Zarr store identifier (used to open the dataset). |
| collection_id | Yes | Unique ID for the STAC collection in the OSC catalog. Must not contain spaces; use hyphens as word separators (e.g. My-Dataset-2024). |
| license_type | Yes | SPDX license identifier (e.g. CC-BY-4.0). Publishing fails if this field is absent. |
| stac_catalog_s3_root | Yes | S3 root where the STAC Catalog and Item are published. Publishing fails if this field is absent. See STAC Catalog on S3. |
| osc_themes | No | List of OSC theme slugs (e.g. [cryosphere, oceans]). Values are automatically lowercased, so Land and land are equivalent. |
| osc_region | No | Geographical region label (default: Global). |
| dataset_status | No | One of ongoing, completed, or planned (default: ongoing). |
| access_link | No | Public S3 URL of the Zarr store. Defaults to s3://deep-esdl-public/{dataset_id}. |
| description | No | Human-readable description of the dataset. Overrides the description attribute in the Zarr store; falls back to "No description available." if neither is set. |
| documentation_link | No | URL to dataset documentation. |
| visualisation_link | No | URL to a visualisation of the dataset (e.g. xcube Viewer, WMS). Added as a visualisation link with title "Dataset visualisation". |
| osc_project | No | OSC project ID this dataset belongs to. Defaults to deep-earth-system-data-lab. |
| cf_parameter | No | List of CF metadata dicts to override variable attributes (e.g. name, units). |
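The required-field check described above can be sketched in a few lines. validate_dataset_config is an illustrative name, not deep-code's actual API; it assumes the config has already been loaded into a plain dict:

```python
# The four fields the table marks as required.
REQUIRED_FIELDS = ("dataset_id", "collection_id", "license_type", "stac_catalog_s3_root")

def validate_dataset_config(config: dict) -> list:
    """Return the names of required fields that are missing or empty."""
    return [f for f in REQUIRED_FIELDS if not config.get(f)]

config = {"dataset_id": "your-dataset.zarr", "collection_id": "your-collection"}
print(validate_dataset_config(config))
# -> ['license_type', 'stac_catalog_s3_root']
```

An empty result means all required fields are present; anything else would make publishing fail.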

STAC Catalog on S3

stac_catalog_s3_root is required. deep-code writes a two-file STAC hierarchy to S3 alongside the data:

s3://bucket/stac/your-collection/
├── catalog.json        # STAC Catalog (root)
└── your-collection/
    └── item.json       # STAC Item covering the full Zarr store

The item has two assets:

  • zarr-data — points to the Zarr store (application/vnd+zarr).
  • zarr-consolidated-metadata — points to .zmetadata (application/json).
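The shape of those two assets can be sketched as a plain dict. The href values and the roles field are assumptions for illustration; only the asset keys and media types come from the description above:

```python
# Illustrative asset mapping for the STAC Item; not copied from
# deep-code's actual output.
access_link = "s3://deep-esdl-public/your-dataset.zarr"

assets = {
    "zarr-data": {
        "href": access_link,                     # the Zarr store itself
        "type": "application/vnd+zarr",
        "roles": ["data"],                       # assumed role label
    },
    "zarr-consolidated-metadata": {
        "href": f"{access_link}/.zmetadata",     # consolidated metadata file
        "type": "application/json",
        "roles": ["metadata"],                   # assumed role label
    },
}
```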

The OSC collection gains a via link to catalog.json so STAC-aware clients can discover the data path. rel="child" is intentionally avoided because the OSC validator requires every child link to resolve inside the metadata repository.

S3 credentials for writing the STAC catalog are resolved in this order:

  1. STAC_S3_KEY / STAC_S3_SECRET env vars (STAC-specific; can target any bucket)
  2. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
  3. The boto3 default chain (IAM role, ~/.aws/credentials)
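That precedence can be sketched as a small lookup. This mirrors the documented order but is not deep-code's source code; resolve_stac_credentials is an illustrative name:

```python
def resolve_stac_credentials(env: dict):
    """Return (key, secret) from env vars, or None to defer to boto3.

    STAC-specific variables win over the generic AWS pair; if neither
    pair is complete, boto3's default chain takes over.
    """
    for key_var, secret_var in (
        ("STAC_S3_KEY", "STAC_S3_SECRET"),
        ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
    ):
        if env.get(key_var) and env.get(secret_var):
            return env[key_var], env[secret_var]
    return None  # fall back to boto3 (IAM role, ~/.aws/credentials)

print(resolve_stac_credentials({"AWS_ACCESS_KEY_ID": "id", "AWS_SECRET_ACCESS_KEY": "sec"}))
# -> ('id', 'sec')
```

Note that an incomplete pair (key without secret) is skipped rather than used.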

Workflow config (YAML)

# Required
workflow_id: your-workflow
properties:
  title: "My workflow"
  description: "What this workflow does"
  keywords: ["Earth Science"]
  themes: ["cryosphere"]
  license: proprietary
  # jupyter_kernel_info is optional — only published when jupyter_notebook_url is set
  jupyter_kernel_info:
    name: deepesdl-xcube-1.8.3
    python_version: 3.11
    env_file: https://example.com/environment.yml

# Optional
jupyter_notebook_url: https://github.com/org/repo/path/to/notebook.ipynb
contact:
  - name: Jane Doe
    organization: Example Org
    links:
      - rel: about
        type: text/html
        href: https://example.org
links:
  - rel: related
    type: text/html
    href: https://example.com/related-resource
    title: Related resource

Field reference

| Field | Required | Description |
| --- | --- | --- |
| workflow_id | Yes | Unique identifier for the workflow (spaces converted to hyphens, lowercased). |
| properties.title | Yes | Human-readable title. |
| properties.license | Yes | SPDX license identifier (e.g. CC-BY-4.0, proprietary). Publishing fails if this field is absent. |
| properties.description | No | Short summary of what the workflow does. |
| properties.keywords | No | List of keyword strings. |
| properties.themes | No | List of OSC theme slugs. |
| jupyter_notebook_url | No | Link to the source notebook on GitHub. When omitted, kernel and application links are skipped. |
| properties.jupyter_kernel_info | No | Kernel name, Python version, and environment file URL. Only published when jupyter_notebook_url is set. |
| contact | No | List of contact objects with name, organization, and links. |
| links | No | Additional OGC API record links (e.g. related, describedby). |
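The workflow_id normalisation the table describes (spaces to hyphens, lowercased) amounts to a one-liner. normalize_workflow_id is an illustrative name, not deep-code's API:

```python
def normalize_workflow_id(workflow_id: str) -> str:
    """Normalise a workflow ID: trim, replace spaces with hyphens, lowercase."""
    return workflow_id.strip().replace(" ", "-").lower()

print(normalize_workflow_id("My Workflow 2024"))
# -> my-workflow-2024
```

Choosing an already-normalised workflow_id in the YAML avoids any surprise between the configured ID and the published one.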

More templates and examples live in dataset_config.yaml, workflow_config.yaml, and example-config/.