Get chunk by index
get_chunk_by_index#
def get_chunk_by_index(ds: xr.Dataset, index: int, block_size: List[Tuple[str, int]] = None) -> Dict[str, np.ndarray]
Description#
get_chunk_by_index retrieves a specific data chunk from an xarray.Dataset based on a given linear index. This makes it possible to extract subsets of a cube and to iterate over its chunks one at a time.
Parameters#
- ds (xarray.Dataset): The xarray.Dataset from which to retrieve a chunk.
- index (int): The index of the chunk to retrieve. This index will be converted to a multi-dimensional index based on the chunk sizes.
- block_size (List[Tuple[str, int]]): An optional list of tuples specifying the block size for each dimension. Each tuple should contain a dimension name and a block size for that dimension. A chunk with the specified block sizes will be returned. If not provided, the function will use the dataset's default chunk sizes.
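The conversion from a linear index to a multi-dimensional block position can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper `chunk_slices` is hypothetical, and it assumes row-major ordering (the last dimension in `block_size` varies fastest, which is also what `np.unravel_index` does by default) and that a trailing block may be shorter than the requested block size.

```python
import math
from typing import Dict, List, Tuple


def chunk_slices(
    dim_sizes: Dict[str, int],
    block_size: List[Tuple[str, int]],
    index: int,
) -> Dict[str, slice]:
    """Hypothetical helper: map a linear chunk index to one slice per dimension."""
    # Number of blocks along each dimension (the last block may be shorter).
    counts = [math.ceil(dim_sizes[dim] / size) for dim, size in block_size]
    total = math.prod(counts)
    if not 0 <= index < total:
        raise IndexError(f"chunk index {index} out of range for {total} chunks")

    # Unravel in row-major order: the last dimension varies fastest.
    positions = []
    remainder = index
    for count in reversed(counts):
        remainder, pos = divmod(remainder, count)
        positions.append(pos)
    positions.reverse()

    # Turn each block position into a slice, clipped to the dimension length.
    return {
        dim: slice(pos * size, min((pos + 1) * size, dim_sizes[dim]))
        for (dim, size), pos in zip(block_size, positions)
    }


dim_sizes = {"time": 100, "lat": 200, "lon": 300}
block_size = [("time", 10), ("lat", 20), ("lon", 30)]
slices = chunk_slices(dim_sizes, block_size, 5)
print(slices)
```

Under these assumptions, index 5 selects the first block along time and lat and the sixth block along lon. Whether ml4xcube orders chunks the same way is not stated here, so treat the ordering as an assumption.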
Returns#
- Dict[str, np.ndarray]: A dictionary where keys are variable names from the dataset and values are NumPy arrays containing the data of the specified chunk.
Example#
import numpy as np
import xarray as xr
from ml4xcube.utils import get_chunk_by_index
# Example dataset
data = np.random.rand(100, 200, 300)
ds = xr.Dataset({
    'temperature': (('time', 'lat', 'lon'), data),
    'precipitation': (('time', 'lat', 'lon'), data)
})
# Define the block size (chunk size)
block_size = [('time', 10), ('lat', 20), ('lon', 30)]
# Get the chunk at linear index 5 (the sixth chunk, since indices start at 0)
chunk_data = get_chunk_by_index(ds, index=5, block_size=block_size)
# Output the chunk data
for var_name, chunk in chunk_data.items():
    print(f"{var_name} chunk shape: {chunk.shape}")
Here, get_chunk_by_index retrieves the chunk at linear index 5 (the sixth chunk, because indexing starts at 0) from the dataset, using the specified block sizes for each dimension.
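Because chunks are addressed by a single linear index, iterating over a whole cube amounts to looping over `range(total_chunks)`. The sketch below illustrates that pattern on a plain dictionary of NumPy arrays so that it runs without ml4xcube. The generator `iter_chunks`, the stand-in `cube`, and the row-major ordering are assumptions made for illustration, not part of the library.

```python
import math

import numpy as np

# Small stand-in cube: variable name -> array with dims (time, lat, lon).
dims = ("time", "lat", "lon")
cube = {"temperature": np.arange(4 * 6 * 6, dtype=float).reshape(4, 6, 6)}
block_size = [("time", 2), ("lat", 3), ("lon", 4)]


def iter_chunks(cube, dims, block_size):
    """Hypothetical generator: yield (index, chunk dict) for every block."""
    sizes = dict(block_size)
    shape = next(iter(cube.values())).shape
    # Blocks per dimension; a trailing block may be smaller than requested.
    counts = [math.ceil(n / sizes[d]) for d, n in zip(dims, shape)]
    for index in range(math.prod(counts)):
        # Row-major block position for this linear index.
        pos = np.unravel_index(index, counts)
        sel = tuple(
            slice(p * sizes[d], (p + 1) * sizes[d]) for d, p in zip(dims, pos)
        )
        yield index, {name: arr[sel] for name, arr in cube.items()}


chunks = list(iter_chunks(cube, dims, block_size))
for index, chunk in chunks:
    print(index, chunk["temperature"].shape)
```

With ml4xcube installed, the loop body would presumably be replaced by a call such as `get_chunk_by_index(ds, index, block_size)`. How the library handles partial edge blocks or out-of-range indices is not described here, so check that behavior before relying on it.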