Assign block split
assign_block_split#
def assign_block_split(ds: xr.Dataset, block_size: List[Tuple[str, int]] = None, split: float = 0.8) -> xr.Dataset
Description#
Assigns blocks of data to training or testing sets based on a specified split ratio. This method uses a deterministic random seed generated from the indices of each block, ensuring that the same blocks are consistently assigned to the same subset across different runs, given the same initial conditions.
Parameters#
- ds (
xarray.Dataset
): The input dataset. - block_size (
List[Tuple[str, int]]
): List of tuples specifying the dimensions and their respective sizes for block division. IfNone
, chunk sizes are inferred from the dataset. - split (
float
): The fraction of data to assign to the training set. The remainder is assigned to the testing set. Default is0.8
.
Returns#
xarray.Dataset
: The dataset with an added 'split' variable that indicates whether each block belongs to the training set (1.
) or the testing set (0.
).
Example#
import numpy as np
import xarray as xr
from ml4xcube.data_split import assign_block_split
# Example dataset
data = xr.Dataset({'temperature': (('time', 'lat', 'lon'), np.random.rand(10, 2, 3))})
block_size = [('time', 5), ('lat', 2), ('lon', 3)]
split_dataset = assign_block_split(data, block_size)
print(split_dataset)
Block Assignment of Train/Test Split