
Configuration

The quickest way to get started is to generate starter templates with the CLI:

deep-code generate-config              # writes to current directory
deep-code generate-config -o ./configs # custom output folder

This creates dataset_config.yaml and workflow_config.yaml with all supported fields and placeholder values. Fill them in, then run deep-code publish.

The sections below document every field in those templates.


Dataset config (YAML)

# Required
dataset_id: your-dataset.zarr
collection_id: your-collection
license_type: CC-BY-4.0

# Optional
osc_themes: [cryosphere]        # must match slugs at opensciencedata.esa.int/themes/catalog
osc_region: global
dataset_status: completed       # ongoing | completed | planned (default: ongoing)
documentation_link: https://example.com/docs
access_link: s3://bucket/your-dataset.zarr   # defaults to s3://deep-esdl-public/{dataset_id}

# CF parameter overrides (list of {name, units, ...} dicts)
cf_parameter:
  - name: sea_surface_temperature
    units: kelvin

# Optional: publish a STAC Catalog + Item next to the data on S3.
# When set, a lightweight STAC hierarchy (catalog.json → item.json) is written
# directly to S3 and a "via" link is added to the OSC collection pointing to it.
# S3 write credentials are resolved in order:
#   1. STAC_S3_KEY / STAC_S3_SECRET  (STAC-specific, any bucket)
#   2. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
#   3. boto3 default chain (IAM role, ~/.aws/credentials)
stac_catalog_s3_root: s3://bucket/stac/your-collection/

Field reference

| Field | Required | Description |
|---|---|---|
| dataset_id | Yes | Zarr store identifier (used to open the dataset). |
| collection_id | Yes | Unique ID for the STAC collection in the OSC catalog. |
| license_type | Yes | SPDX license identifier (e.g. CC-BY-4.0). |
| osc_themes | No | List of OSC theme slugs (e.g. [cryosphere, oceans]). |
| osc_region | No | Geographical region label (default: Global). |
| dataset_status | No | One of ongoing, completed, or planned (default: ongoing). |
| access_link | No | Public S3 URL of the Zarr store. Defaults to s3://deep-esdl-public/{dataset_id}. |
| documentation_link | No | URL to dataset documentation. |
| cf_parameter | No | List of CF metadata dicts to override variable attributes (e.g. name, units). |
| stac_catalog_s3_root | No | S3 root for the dataset-level STAC Catalog/Item. See STAC Catalog on S3. |
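
The required fields and defaults listed in the field reference can be sketched in a few lines of Python. This is an illustration only, not deep-code's own validation: the function name resolve_dataset_config and the constants are hypothetical, and the input is assumed to be a plain dict such as one obtained by parsing dataset_config.yaml.

```python
from typing import Any, Dict

# Hypothetical constants mirroring the field reference.
REQUIRED_FIELDS = ("dataset_id", "collection_id", "license_type")
ALLOWED_STATUSES = ("ongoing", "completed", "planned")


def resolve_dataset_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Check the required fields and fill in the documented defaults."""
    missing = [name for name in REQUIRED_FIELDS if not raw.get(name)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")

    config = dict(raw)  # leave the caller's mapping untouched
    config.setdefault("dataset_status", "ongoing")
    config.setdefault("osc_region", "Global")
    config.setdefault(
        "access_link", f"s3://deep-esdl-public/{config['dataset_id']}"
    )

    if config["dataset_status"] not in ALLOWED_STATUSES:
        raise ValueError(f"dataset_status must be one of {ALLOWED_STATUSES}")
    return config


config = resolve_dataset_config(
    {
        "dataset_id": "your-dataset.zarr",
        "collection_id": "your-collection",
        "license_type": "CC-BY-4.0",
    }
)
print(config["access_link"])  # s3://deep-esdl-public/your-dataset.zarr
```

Running a check like this before deep-code publish is one way to catch a missing required field or a misspelled dataset_status early.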

STAC Catalog on S3

Setting stac_catalog_s3_root generates a two-file STAC hierarchy on S3 alongside the data:

s3://bucket/stac/your-collection/
├── catalog.json        # STAC Catalog (root)
└── your-collection/
    └── item.json       # STAC Item covering the full Zarr store

The item has two assets:

  • zarr-data — points to the Zarr store (application/vnd+zarr).
  • zarr-consolidated-metadata — points to .zmetadata (application/json).
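
To make the layout concrete, the sketch below derives the two file locations and the two item assets from the config values. It is not deep-code's implementation: stac_layout is a hypothetical helper, and it assumes that the item folder is named after collection_id (as the tree above suggests) and that .zmetadata sits at the root of the Zarr store.

```python
from typing import Any, Dict


def stac_layout(
    stac_catalog_s3_root: str, collection_id: str, access_link: str
) -> Dict[str, Any]:
    """Return the catalog/item paths and the item's two assets."""
    root = stac_catalog_s3_root.rstrip("/")
    store = access_link.rstrip("/")
    return {
        "catalog": f"{root}/catalog.json",
        # Assumption: the item lives in a folder named after the collection.
        "item": f"{root}/{collection_id}/item.json",
        "assets": {
            "zarr-data": {"href": store, "type": "application/vnd+zarr"},
            "zarr-consolidated-metadata": {
                # Assumption: consolidated metadata is at the store root.
                "href": f"{store}/.zmetadata",
                "type": "application/json",
            },
        },
    }


layout = stac_layout(
    "s3://bucket/stac/your-collection/",
    "your-collection",
    "s3://bucket/your-dataset.zarr",
)
print(layout["item"])
# s3://bucket/stac/your-collection/your-collection/item.json
```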

The OSC collection gains a via link to catalog.json so STAC-aware clients can discover the data path. rel="child" is intentionally avoided because the OSC validator requires every child link to resolve inside the metadata repository.
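
As a rough illustration of the via link, the snippet below appends one to a collection represented as a plain dict. The helper add_via_link and the exact link fields are assumptions made for this example; the link that deep-code actually writes may carry additional fields such as a title.

```python
from typing import Any, Dict


def add_via_link(
    collection: Dict[str, Any], stac_catalog_s3_root: str
) -> Dict[str, Any]:
    """Append a rel="via" link to catalog.json unless it is already present."""
    href = stac_catalog_s3_root.rstrip("/") + "/catalog.json"
    links = collection.setdefault("links", [])
    already_linked = any(
        link.get("rel") == "via" and link.get("href") == href for link in links
    )
    if not already_linked:
        # rel="via" rather than rel="child": the target lives outside the
        # metadata repository, so it is not presented as a child.
        links.append({"rel": "via", "href": href, "type": "application/json"})
    return collection


collection = add_via_link(
    {"id": "your-collection", "links": []},
    "s3://bucket/stac/your-collection/",
)
print(collection["links"][0]["href"])
# s3://bucket/stac/your-collection/catalog.json
```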

S3 credentials for writing the STAC catalog are resolved in this order:

  1. STAC_S3_KEY / STAC_S3_SECRET environment variables (STAC-specific, can target any bucket).
  2. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables.
  3. The boto3 default chain (IAM role, ~/.aws/credentials).
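
The resolution order can be pictured as a simple fallback over environment variables. The following is a sketch, not the code deep-code runs: resolve_stac_credentials is a hypothetical name, and it assumes a pair only counts when both of its variables are set. Returning None stands for "no explicit keys, let boto3 use its default chain".

```python
import os
from typing import Mapping, Optional, Tuple

# Variable pairs in the documented order of precedence.
CREDENTIAL_PAIRS = (
    ("STAC_S3_KEY", "STAC_S3_SECRET"),
    ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
)


def resolve_stac_credentials(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, str]]:
    """Return (key, secret) from the first complete pair, else None."""
    for key_var, secret_var in CREDENTIAL_PAIRS:
        key, secret = env.get(key_var), env.get(secret_var)
        if key and secret:
            return key, secret
    return None  # fall through to the boto3 default chain


example_env = {
    "AWS_ACCESS_KEY_ID": "aws-key",
    "AWS_SECRET_ACCESS_KEY": "aws-secret",
}
print(resolve_stac_credentials(example_env))  # ('aws-key', 'aws-secret')
```

Because the STAC-specific pair is checked first, it can point at a bucket other than the one the general AWS credentials cover.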

Workflow config (YAML)

# Required: workflow_id and properties.title (the other properties are optional)
workflow_id: your-workflow
properties:
  title: "My workflow"
  description: "What this workflow does"
  keywords: ["Earth Science"]
  themes: ["cryosphere"]
  license: proprietary
  # jupyter_kernel_info is optional — only published when jupyter_notebook_url is set
  jupyter_kernel_info:
    name: deepesdl-xcube-1.8.3
    python_version: 3.11
    env_file: https://example.com/environment.yml

# Optional
jupyter_notebook_url: https://github.com/org/repo/path/to/notebook.ipynb
contact:
  - name: Jane Doe
    organization: Example Org
    links:
      - rel: about
        type: text/html
        href: https://example.org
links:
  - rel: related
    type: text/html
    href: https://example.com/related-resource
    title: Related resource

Field reference

| Field | Required | Description |
|---|---|---|
| workflow_id | Yes | Unique identifier for the workflow (spaces converted to hyphens, lowercased). |
| properties.title | Yes | Human-readable title. |
| properties.description | No | Short summary of what the workflow does. |
| properties.keywords | No | List of keyword strings. |
| properties.themes | No | List of OSC theme slugs. |
| properties.license | No | License identifier (e.g. proprietary, CC-BY-4.0). |
| jupyter_notebook_url | No | Link to the source notebook on GitHub. When omitted, kernel and application links are skipped. |
| properties.jupyter_kernel_info | No | Kernel name, Python version, and environment file URL. Only published when jupyter_notebook_url is set. |
| contact | No | List of contact objects with name, organization, and links. |
| links | No | Additional OGC API record links (e.g. related, describedby). |
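
Two rules from the field reference are easy to misread, so here is a small sketch of each: the workflow_id normalization and the dependency of jupyter_kernel_info on jupyter_notebook_url. Both functions are hypothetical illustrations written from the descriptions above, not deep-code's own code. In particular, a plain character-for-character replacement of spaces is an assumption.

```python
from typing import Any, Dict, Optional


def normalize_workflow_id(workflow_id: str) -> str:
    """Convert spaces to hyphens and lowercase the result."""
    return workflow_id.replace(" ", "-").lower()


def published_kernel_info(config: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return jupyter_kernel_info only when jupyter_notebook_url is set."""
    if not config.get("jupyter_notebook_url"):
        return None  # no notebook: kernel info is not published
    return config.get("properties", {}).get("jupyter_kernel_info")


workflow = {
    "workflow_id": "My Workflow",
    "properties": {
        "title": "My workflow",
        "jupyter_kernel_info": {"name": "deepesdl-xcube-1.8.3"},
    },
}
print(normalize_workflow_id(workflow["workflow_id"]))  # my-workflow
print(published_kernel_info(workflow))  # None
```

The second call prints None because the example config defines kernel info but no jupyter_notebook_url.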

More templates and examples live in dataset_config.yaml, workflow_config.yaml, and example-config/.