dynabench.dataset

Module for loading the data.

Functions

download_equation(equation, structure, ...)

Download a dataset and unpack it to the right place.

Classes

DynabenchIterator([split, equation, ...])

Iterator for the Dynabench dataset.

DynabenchSimulationIterator([split, ...])

EquationMovingWindowIterator(data_path, ...)

Iterator for arbitrary equations generated using the dynabench solver.

class dynabench.dataset.DynabenchIterator(split: str = 'train', equation: str = 'wave', structure: str = 'cloud', resolution: str = 'low', base_path: str = 'data', lookback: int = 1, rollout: int = 1, download: bool = False, *args, **kwargs)[source]

Bases: object

Iterator for the Dynabench dataset. The iterator walks through each simulation in the dataset by moving a window over the simulation data. The window size is set by the lookback and rollout parameters, which give the number of timesteps used as input and as output (target), respectively.

Parameters:
  • split (str) – The split of the dataset to use. Can be “train”, “val” or “test”.

  • equation (str) – The equation to use. Can be “advection”, “burgers”, “gasdynamics”, “kuramotosivashinsky”, “reactiondiffusion” or “wave”.

  • structure (str) – The structure of the dataset. Can be “cloud” or “grid”.

  • resolution (str) – The resolution of the dataset. Can be “low”, “medium”, “high” or “full”. Low resolution corresponds to 225 points in total (arranged in a 15x15 grid for the grid structure). Medium resolution corresponds to 484 points in total (arranged in a 22x22 grid for the grid structure). High resolution corresponds to 900 points in total (arranged in a 30x30 grid for the grid structure). Full resolution uses the full 64x64 simulation grid on which the simulations were solved numerically.

  • base_path (str) – Location where the data is stored. Defaults to “data”.

  • lookback (int) – Number of timesteps to use for the input data. Defaults to 1.

  • rollout (int) – Number of timesteps to use for the target data. Defaults to 1.

  • download (bool) – Whether to download the data. Defaults to False.
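
The moving window described above can be sketched with plain index arithmetic. This is a minimal illustration, not the library's implementation: it assumes the window advances one timestep at a time and that the target timesteps directly follow the input timesteps. The helper name window_slices is hypothetical and is not part of dynabench.

```python
def window_slices(num_timesteps, lookback=1, rollout=1):
    """Return (input_range, target_range) pairs for a stride-1 moving window.

    Hypothetical helper for illustration only; stride 1 is an assumption.
    """
    window = lookback + rollout
    pairs = []
    # The last valid start index leaves room for a full window.
    for start in range(num_timesteps - window + 1):
        inputs = range(start, start + lookback)
        targets = range(start + lookback, start + window)
        pairs.append((inputs, targets))
    return pairs


# A simulation with 6 timesteps, 2 input steps and 1 target step per sample.
pairs = window_slices(num_timesteps=6, lookback=2, rollout=1)
for inputs, targets in pairs:
    print(list(inputs), "->", list(targets))
```

Under these assumptions a simulation with T timesteps yields T - lookback - rollout + 1 samples, and none if the simulation is shorter than one window.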

class dynabench.dataset.DynabenchSimulationIterator(split: str = 'train', equation: str = 'wave', structure: str = 'cloud', resolution: str = 'low', base_path: str = 'data', download: bool = False, *args, **kwargs)[source]

Bases: object

class dynabench.dataset.EquationMovingWindowIterator(data_path: str, lookback: int, rollout: int)[source]

Bases: object

Iterator for arbitrary equations generated using the dynabench solver. Each sample returned by the __getitem__ method is a tuple of (data_input, data_target, points), where data_input is the input data of shape (L, F, H, W), data_target is the target data of shape (R, F, H, W), and points are the points in the grid of shape (H, W, 2). In this context L corresponds to the lookback parameter and R corresponds to the rollout parameter. H and W are the height and width of the grid, respectively. F is the number of variables in the equation system.

Parameters:
  • data_path (str) – Path to the data file in h5 format.

  • lookback (int) – Number of time steps to look back. This corresponds to the L parameter.

  • rollout (int) – Number of time steps to predict. This corresponds to the R parameter.

get_full_simulation_data()[source]

This method returns the full simulation data from the data file, along with the points in the grid.

Returns:

The data and the points. The data has shape (T, F, H, W) and the points have shape (H, W, 2), where T is the number of time steps, F is the number of variables, H and W are the height and width of the grid, respectively.

Return type:

np.ndarray, np.ndarray
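
The relation between the full data of shape (T, F, H, W) and a single sample can be sketched with nested lists standing in for arrays. This is an illustration under assumptions: the sizes are made up, the point coordinates are placeholders, and slicing along the time axis is assumed to be how a sample is cut from the full data. The helper shape is hypothetical.

```python
def shape(nested):
    """Shape of a regular nested list, e.g. [[0, 0], [0, 0]] -> (2, 2)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(dims)


# Made-up sizes: T timesteps, F variables, H x W grid, L lookback, R rollout.
T, F, H, W = 5, 2, 3, 4
L, R = 2, 1

# Stand-in for the data of shape (T, F, H, W).
data = [[[[0.0] * W for _ in range(H)] for _ in range(F)] for _ in range(T)]
# Stand-in for the points of shape (H, W, 2); coordinates are placeholders.
points = [[[x / W, y / H] for x in range(W)] for y in range(H)]

# One window starting at timestep 1 (assumed slicing along the time axis).
start = 1
data_input = data[start:start + L]
data_target = data[start + L:start + L + R]

print(shape(data), shape(points))
print(shape(data_input), shape(data_target))
```

With NumPy arrays, as returned by get_full_simulation_data, the same slices along the first axis would give arrays of shape (L, F, H, W) and (R, F, H, W).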

dynabench.dataset.download_equation(equation: str, structure: str, resolution: str, data_dir: str = 'data', tmp_dir: str = 'tmp')[source]

Download a dataset and unpack it to the right place.

Parameters:
  • equation (str) – Name of the equation to download.

  • structure (str) – Description of how the observation points are structured. Can be “cloud” or “grid”.

  • resolution (str) – Resolution of the dataset. Can be “low”, “medium”, or “high”.

  • data_dir (str) – Directory where the dataset should be saved. Defaults to “data”.

  • tmp_dir (str) – Directory where the temporary files should be saved. Defaults to “tmp”. This directory will be deleted after the dataset is unpacked.

Return type:

None
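
Since a download can take a while, it may help to check the arguments against the values listed above before calling download_equation. The following is a sketch only: check_download_args is a hypothetical helper, not part of dynabench, and the allowed values are copied from the parameter descriptions rather than from the library code.

```python
# Allowed values as listed in the parameter descriptions (assumed complete).
ALLOWED_STRUCTURES = ("cloud", "grid")
ALLOWED_RESOLUTIONS = ("low", "medium", "high")


def check_download_args(equation, structure, resolution):
    """Hypothetical pre-check; raises ValueError for values the docs do not list."""
    if not equation:
        raise ValueError("equation must be a non-empty string")
    if structure not in ALLOWED_STRUCTURES:
        raise ValueError(f"structure must be one of {ALLOWED_STRUCTURES}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    return equation, structure, resolution


args = check_download_args("wave", "cloud", "low")
print(args)

# With dynabench installed, the documented call would then be:
# from dynabench.dataset import download_equation
# download_equation(*args, data_dir="data", tmp_dir="tmp")
```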