Get chunk sizes

get_chunk_sizes#

def get_chunk_sizes(ds: xr.Dataset) -> List[Tuple[str, int]]

Description#

Determines the maximum chunk size along each dimension across all data variables in a given xarray.Dataset. This helps in understanding the dataset's chunking scheme and in setting up consistent chunk sizes for processing.

Parameters#

  • ds (xarray.Dataset): The dataset whose maximum chunk sizes are to be determined. Its dimensions should be chunkable.

Returns#

  • List[Tuple[str, int]]: A list of tuples where each tuple contains a dimension name (str) and its corresponding maximum chunk size (int) over all variables.

Example#

import numpy as np
import xarray as xr
from ml4xcube.utils import get_chunk_sizes

# Example dataset with chunking
data = np.random.rand(100, 200, 300)
ds = xr.Dataset({
    'temperature': (('time', 'lat', 'lon'), data),
    'precipitation': (('time', 'lat', 'lon'), data)
}).chunk({'time': 10, 'lat': 20, 'lon': 30})

# Get maximum chunk sizes
chunk_sizes = get_chunk_sizes(ds)
print(chunk_sizes)

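In this example, get_chunk_sizes returns one (dimension name, chunk size) tuple per dimension of the dataset. Because both variables are chunked uniformly, the maximum chunk sizes are 10 for time, 20 for lat, and 30 for lon.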
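One way the returned list might be used for consistent processing is to rechunk a second dataset with it. The sketch below is illustrative only: the `chunk_sizes` list is hard-coded in the documented return shape (so the snippet runs without ml4xcube installed), and the `other` dataset and its `humidity` variable are hypothetical.

```python
import numpy as np
import xarray as xr

# Stand-in for the value returned by get_chunk_sizes(ds); hard-coded here
# (an assumption) so the sketch does not depend on ml4xcube.
chunk_sizes = [("time", 10), ("lat", 20), ("lon", 30)]

# Hypothetical second dataset on the same dimensions, initially unchunked.
other = xr.Dataset(
    {"humidity": (("time", "lat", "lon"), np.zeros((20, 40, 60)))}
)

# dict() converts the list of (dimension, size) tuples into the mapping
# that Dataset.chunk expects. Chunking requires dask to be installed.
rechunked = other.chunk(dict(chunk_sizes))

print(rechunked["humidity"].chunks)
# ((10, 10), (20, 20), (30, 30))
```

Converting the list to a dict keeps the call short; dimensions missing from the mapping would simply keep their existing chunking.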

Notes#

  • The function iterates over all data variables in the dataset and retrieves their chunk sizes.
  • If a variable has chunk sizes, the function calculates the maximum chunk size for each of its dimensions.
  • The returned list contains tuples of dimension names and their maximum chunk sizes, which can be used, for example, to set up consistent chunking in later processing steps.
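
The steps in these notes can be sketched roughly as follows. This is a minimal re-implementation written for illustration, not the ml4xcube source: the name `max_chunk_sizes_sketch` is hypothetical, and the details (skipping unchunked variables, returning dimensions in first-seen order) are assumptions based on the description above. It relies only on the `chunks` and `dims` attributes of xarray data variables.

```python
from typing import Dict, List, Tuple

import numpy as np
import xarray as xr


def max_chunk_sizes_sketch(ds: xr.Dataset) -> List[Tuple[str, int]]:
    """Hypothetical sketch: largest chunk per dimension over all variables."""
    max_sizes: Dict[str, int] = {}
    for var in ds.data_vars.values():
        # `chunks` is None when the variable is not backed by a chunked array.
        if var.chunks is None:
            continue
        # `chunks` holds one tuple of block lengths per dimension,
        # in the same order as `dims`.
        for dim, block_lengths in zip(var.dims, var.chunks):
            largest = max(block_lengths)
            max_sizes[dim] = max(max_sizes.get(dim, 0), largest)
    return list(max_sizes.items())


# Two variables with different chunking, to show that the maximum is taken
# across variables (requires dask for chunking).
temperature = xr.DataArray(
    np.zeros((100, 40)), dims=("time", "lat")
).chunk({"time": 30, "lat": 40})
precipitation = xr.DataArray(
    np.zeros((100, 40)), dims=("time", "lat")
).chunk({"time": 50, "lat": 10})
ds = xr.Dataset({"temperature": temperature, "precipitation": precipitation})

print(max_chunk_sizes_sketch(ds))
# [('time', 50), ('lat', 40)]
```

Here `time` takes its maximum from `precipitation` (50) and `lat` from `temperature` (40), so the result does not necessarily match the chunking of any single variable.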