Fill NaN values

fill_nan_values

def fill_nan_values(ds: Union[Dict[str, np.ndarray], xr.Dataset], vars: List[str], method: str = 'mean', const: Union[float, str, bool] = None) -> Union[Dict[str, np.ndarray], xr.Dataset]

Description

The fill_nan_values function fills NaN values in the dataset using a specified method. The available methods are 'mean', 'sample_mean', 'noise', and 'constant'; passing None leaves NaNs unfilled. Depending on the method, NaN values are replaced with the mean of the non-NaN values, the sample mean, random noise within the range of the non-NaN values, or a specified constant value.

In some cases, certain areas are intentionally left without values (e.g., where a mask is False). To include samples that contain such boundaries (like coastlines in the Earth System Data Cube, ESDC), the fill_nan_values function can be used to prepare the data for masked machine learning. This approach is demonstrated in this Jupyter notebook.
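The masked-learning idea can be illustrated with plain NumPy. The sketch below is not ml4xcube code: it assumes the validity mask is derived from the NaN positions before filling, fills those positions with a constant so the array can be passed to a model, and then excludes the filled positions from a mean squared error. The helpers `prepare_masked` and `masked_mse` are hypothetical.

```python
import numpy as np


def prepare_masked(data: np.ndarray, const: float = 0.0):
    """Return (filled, mask); mask is True where the original data was valid.

    Hypothetical helper for illustration, not part of ml4xcube.
    """
    mask = ~np.isnan(data)                # True for real observations
    filled = np.where(mask, data, const)  # NaNs replaced by a constant
    return filled, mask


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error computed only over valid (mask == True) positions."""
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2))


# Toy 2-D field with a "coastline": NaNs mark cells where no values are intended
field = np.array([[1.0, 2.0, np.nan],
                  [3.0, np.nan, np.nan]])
filled, mask = prepare_masked(field, const=0.0)
pred = np.ones_like(filled)

print(filled)
print(masked_mse(pred, filled, mask))
```

Because the loss ignores masked cells, the constant chosen for filling does not affect training on the valid cells; it only has to be a finite number the model can ingest.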

Parameters

  • ds (Union[Dict[str, numpy.ndarray], xarray.Dataset]): The dataset to fill. It should be a dictionary or xarray.Dataset whose keys are variable names and whose values contain the data to fill.
  • vars (List[str]): The list of variables for which to fill NaN values. These variables should be present in the dataset.
  • method (str): The method to use for filling NaN values. Options are:
      • None: NaNs are not filled.
      • mean: NaNs are filled with the mean of the non-NaN values.
      • sample_mean: NaNs are filled with the sample mean value.
      • noise: NaNs are filled with random noise within the range of the non-NaN values.
      • constant: NaNs are filled with the specified constant value (const).
  • const (Union[float, str, bool]): The constant value used to fill NaN values when method is 'constant'. This parameter is required in that case.
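The options above can be summarized in a small NumPy dispatcher. This is a minimal sketch under stated assumptions, not the ml4xcube implementation: the helpers `fill_array` and `fill_vars` are hypothetical, 'mean' uses `np.nanmean` over the whole array, 'noise' draws uniform values between the non-NaN minimum and maximum, only numeric constants are handled, and 'sample_mean' is omitted because its exact definition is not given here.

```python
from typing import Dict, List, Optional

import numpy as np


def fill_array(data: np.ndarray, method: Optional[str] = 'mean',
               const: Optional[float] = None,
               rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Return a filled copy of `data` (hypothetical helper, not ml4xcube code)."""
    if method is None:
        return data.copy()                         # None: NaNs are not filled
    out = data.astype(float, copy=True)
    nan_mask = np.isnan(out)
    if not nan_mask.any():
        return out                                 # nothing to fill
    if method == 'mean':
        out[nan_mask] = np.nanmean(out)            # mean of the non-NaN values
    elif method == 'noise':
        rng = rng if rng is not None else np.random.default_rng()
        lo, hi = np.nanmin(out), np.nanmax(out)    # range of the non-NaN values
        out[nan_mask] = rng.uniform(lo, hi, size=int(nan_mask.sum()))
    elif method == 'constant':
        if const is None:
            raise ValueError("const must be provided when method='constant'")
        out[nan_mask] = const
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return out


def fill_vars(ds: Dict[str, np.ndarray], vars: List[str], **kwargs) -> Dict[str, np.ndarray]:
    """Fill the listed variables; variables not in `vars` are returned unchanged."""
    missing = [v for v in vars if v not in ds]
    if missing:
        raise KeyError(f"variables not found in dataset: {missing}")
    return {k: fill_array(v, **kwargs) if k in vars else v for k, v in ds.items()}
```

For example, `fill_vars(ds, vars=['temperature'], method='constant', const=0.0)` returns a new dictionary and leaves `ds` untouched; whether ml4xcube's own function copies or modifies its input is not stated on this page.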

Returns

  • Union[Dict[str, numpy.ndarray], xarray.Dataset]: The dataset with NaN values filled, where keys are variable names and values contain the filled data (NumPy arrays for a dictionary input).

Example

import numpy as np
from ml4xcube.preprocessing import fill_nan_values

# Example dataset
ds = {
    'temperature': np.random.rand(10, 20, 30),
    'precipitation': np.random.rand(10, 20, 30)
}

# Introduce some NaN values
ds['temperature'][0, 0, 0] = np.nan
ds['precipitation'][1, 1, 1] = np.nan

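# Pass a fresh copy to each call so every method sees the original NaNs
# (a precaution, in case fill_nan_values modifies its input in place)
def copy_ds(d):
    return {k: v.copy() for k, v in d.items()}

# Fill NaN values using the mean method
filled_ds_mean = fill_nan_values(copy_ds(ds), vars=['temperature', 'precipitation'], method='mean')

# Fill NaN values using the noise method
filled_ds_noise = fill_nan_values(copy_ds(ds), vars=['temperature', 'precipitation'], method='noise')

# Fill NaN values using a constant value
filled_ds_constant = fill_nan_values(copy_ds(ds), vars=['temperature', 'precipitation'], method='constant', const=0.0)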

In this example, the fill_nan_values function fills the NaN values in the dataset using different methods: 'mean', 'noise', and 'constant'.

Notes

  • The vars parameter specifies the list of variables for which to fill NaN values. Ensure these variables exist in the dataset.
  • When using the 'constant' method, the const parameter must be provided to specify the constant value for filling NaNs.
  • The function handles both single-dimensional and multi-dimensional arrays for filling NaN values.
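Filling works the same way regardless of the number of dimensions, because a boolean NaN mask has the same shape as the array it indexes. The short NumPy illustration below is independent of ml4xcube; the helper `fill_with_mean` is hypothetical and applies mean filling to a 1-D series and a 3-D (time, lat, lon) cube.

```python
import numpy as np


def fill_with_mean(arr: np.ndarray) -> np.ndarray:
    """Replace NaNs with the mean of the non-NaN values, for any array shape."""
    out = arr.copy()
    nan_mask = np.isnan(out)          # same shape as `out`
    out[nan_mask] = np.nanmean(out)   # mean taken over all dimensions
    return out


series = np.array([1.0, np.nan, 5.0])
cube = np.arange(24, dtype=float).reshape(2, 3, 4)
cube[0, 1, 2] = np.nan

print(fill_with_mean(series))        # 1-D input
print(fill_with_mean(cube).shape)    # 3-D input, shape preserved
```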