dynabench.dataset

Module for loading the data.

Functions

download_equation(equation, structure, ...)

Download a dataset and unpack it to the right place.

Classes

`BaseListMovingWindowIterator`(data_paths, ...)	Iterator for arbitrary equations generated using the dynabench solver.
`BaseListSimulationIterator`(data_paths, ...)	Iterates over full simulations.
`DynabenchIterator`(split, equation, ...)	Iterator for the Dynabench dataset.
`DynabenchSimulationIterator`(split, equation, ...)	Iterator for the Dynabench dataset.
`EquationMovingWindowIterator`(eq_dir, ...)	Iterator for arbitrary equations generated using the dynabench solver.
`EquationSimulationIterator`(eq_dir, ...)	Iterator for full simulations generated using the dynabench solver.

class dynabench.dataset.BaseListMovingWindowIterator(data_paths: List[str], lookback: int, rollout: int, squeeze_lookback_dim: bool = True, is_batched: bool = False, transforms: BaseTransform | None = None, dtype: numpy.dtype = numpy.float32)

Bases: object

Iterator for arbitrary equations generated using the dynabench solver. Each sample returned by the __getitem__ method is a tuple of (data_input, data_target, points), where data_input is the input data of shape (L, F, H, W), data_target is the target data of shape (R, F, H, W), and points are the points in the grid of shape (H, W, 2). In this context L corresponds to the lookback parameter and R corresponds to the rollout parameter. H and W are the height and width of the grid, respectively. F is the number of variables in the equation system.

Parameters:

data_paths (List[str]) – List of paths to the files containing the simulation data.
lookback (int) – Number of time steps to look back. This corresponds to the L parameter.
rollout (int) – Number of time steps to predict. This corresponds to the R parameter.
squeeze_lookback_dim (bool) – Whether to squeeze the lookback dimension. Defaults to True. Has no effect if lookback > 1.
is_batched (bool) – Whether the data is batched. Defaults to False. If True, the data is expected to be of shape (B, L, F, H, W), where B is the batch size.
dtype (np.dtype) – Data type of the input data. Defaults to np.float32.
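
The windowing described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper `moving_windows` is hypothetical, and it assumes a stride of one time step and a target window that starts immediately after the input window. Neither assumption is stated in this reference.

```python
from typing import List, Tuple


def moving_windows(num_steps: int, lookback: int, rollout: int) -> List[Tuple[range, range]]:
    """Return (input_steps, target_steps) index ranges for every window.

    Hypothetical helper: assumes stride 1 and contiguous input/target windows.
    """
    # Each window needs lookback + rollout consecutive time steps.
    num_windows = max(num_steps - lookback - rollout + 1, 0)
    windows = []
    for start in range(num_windows):
        input_steps = range(start, start + lookback)  # L steps
        target_steps = range(start + lookback, start + lookback + rollout)  # R steps
        windows.append((input_steps, target_steps))
    return windows


windows = moving_windows(num_steps=10, lookback=3, rollout=2)
print(len(windows))                  # 6
print(list(windows[0][0]))           # [0, 1, 2]
print(list(windows[0][1]))           # [3, 4]
```

Under these assumptions a simulation with T time steps yields T - L - R + 1 samples, so a longer lookback or rollout reduces the number of samples per simulation.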

class dynabench.dataset.BaseListSimulationIterator(data_paths: List[str], is_batched: bool = False, transforms: BaseTransform | None = None, dtype: numpy.dtype = numpy.float32)

Bases: object

Iterates over full simulations. Each sample returned by the __getitem__ method is a tuple of (data, points), where data is the simulation data of shape (T, F, H, W) and points are the points in the grid of shape (H, W, 2). In this context T corresponds to the number of time steps, H and W are the height and width of the grid, respectively. F is the number of variables in the equation system.

Parameters:

data_paths (List[str]) – List of paths to the files containing the simulation data.
is_batched (bool) – Whether the data is batched. Defaults to False.
dtype (np.dtype) – Data type of the input data. Defaults to np.float32.

class dynabench.dataset.DynabenchIterator(split: str = 'train', equation: str = 'wave', structure: str = 'cloud', resolution: str = 'low', base_path: str = 'data', lookback: int = 1, squeeze_lookback_dim: bool = False, rollout: int = 1, transforms: BaseTransform = DefaultTransform{}, dtype: numpy.dtype = numpy.float32, download: bool = False, *args, **kwargs)

Bases: BaseListMovingWindowIterator

Iterator for the Dynabench dataset. It iterates over each simulation in the dataset by moving a window over the simulation data. The window size is set by the lookback and rollout parameters, which define the number of timesteps used as input and as target, respectively.

Parameters:

split (str) – The split of the dataset to use. Can be “train”, “val” or “test”.
equation (str) – The equation to use. Can be “advection”, “burgers”, “gasdynamics”, “kuramotosivashinsky”, “reactiondiffusion” or “wave”.
structure (str) – The structure of the dataset. Can be “cloud” or “grid”.
resolution (str) – The resolution of the dataset. Can be “low”, “medium”, “high” or “full”. Low resolution corresponds to 225 points in total (arranged in a 15x15 grid for the grid structure). Medium resolution corresponds to 484 points in total (arranged in a 22x22 grid for the grid structure). High resolution corresponds to 900 points in total (arranged in a 30x30 grid for the grid structure). Full resolution uses the full 64x64 simulation grid on which the simulations were solved numerically.
base_path (str) – Location where the data is stored. Defaults to “data”.
lookback (int) – Number of timesteps to use for the input data. Defaults to 1.
squeeze_lookback_dim (bool) – Whether to squeeze the lookback dimension. Defaults to False. Has no effect if lookback > 1.
rollout (int) – Number of timesteps to use for the target data. Defaults to 1.
download (bool) – Whether to download the data. Defaults to False.
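
A minimal usage sketch built from the signature above. The helper names `build_wave_iterator` and `first_sample` are hypothetical, and the argument values are arbitrary. The three-element sample layout is the one documented for BaseListMovingWindowIterator, which this class inherits from; whether it holds unchanged for the “cloud” structure is not stated here, so the sketch uses “grid”.

```python
def build_wave_iterator():
    """Construct a DynabenchIterator; requires the dynabench package."""
    # Imported lazily so that defining this function has no side effects.
    from dynabench.dataset import DynabenchIterator

    return DynabenchIterator(
        split="train",
        equation="wave",
        structure="grid",
        resolution="low",
        base_path="data",
        lookback=4,      # L: timesteps used as input
        rollout=2,       # R: timesteps used as target
        download=True,   # documented as "whether to download the data"
    )


def first_sample(iterator):
    """Unpack one sample as (data_input, data_target, points)."""
    data_input, data_target, points = iterator[0]
    return data_input, data_target, points
```

Calling `first_sample(build_wave_iterator())` would then be expected to return an input of shape (4, F, 15, 15) and a target of shape (2, F, 15, 15), assuming the shapes documented for the base class apply.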

class dynabench.dataset.DynabenchSimulationIterator(split: str = 'train', equation: str = 'wave', structure: str = 'cloud', resolution: str = 'low', transforms: BaseTransform = DefaultTransform{}, base_path: str = 'data', download: bool = False, dtype: numpy.dtype = numpy.float32, *args, **kwargs)

Bases: object

Iterator for the Dynabench dataset. It iterates over all the simulations in the dataset, returning each full simulation as a single sample.

Parameters:

split (str) – The split of the dataset to use. Can be “train”, “val” or “test”.
equation (str) – The equation to use. Can be “advection”, “burgers”, “gasdynamics”, “kuramotosivashinsky”, “reactiondiffusion” or “wave”.
structure (str) – The structure of the dataset. Can be “cloud” or “grid”.
resolution (str) – The resolution of the dataset. Can be “low”, “medium”, “high” or “full”. Low resolution corresponds to 225 points in total (arranged in a 15x15 grid for the grid structure). Medium resolution corresponds to 484 points in total (arranged in a 22x22 grid for the grid structure). High resolution corresponds to 900 points in total (arranged in a 30x30 grid for the grid structure). Full resolution uses the full 64x64 simulation grid on which the simulations were solved numerically.
base_path (str) – Location where the data is stored. Defaults to “data”.
download (bool) – Whether to download the data. Defaults to False.
dtype (np.dtype) – Data type of the input data. Defaults to np.float32.
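
The point counts quoted for each resolution follow directly from the grid side lengths. The short check below reproduces them; `GRID_SIDE` and `num_points` are illustrative names, not part of the library.

```python
# Grid side length per resolution, taken from the parameter description above.
GRID_SIDE = {"low": 15, "medium": 22, "high": 30, "full": 64}


def num_points(resolution: str) -> int:
    """Total number of observation points for a square grid."""
    side = GRID_SIDE[resolution]
    return side * side


for name in GRID_SIDE:
    print(name, num_points(name))
# low 225
# medium 484
# high 900
# full 4096
```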

class dynabench.dataset.EquationMovingWindowIterator(eq_dir: str, lookback: int, rollout: int, selected_simulations: List[str] | None = None, squeeze_lookback_dim: bool = True, is_batched: bool = True, transforms: BaseTransform | None = None, dtype: numpy.dtype = numpy.float32)

Bases: BaseListMovingWindowIterator

Iterator for arbitrary equations generated using the dynabench solver. Each sample returned by the __getitem__ method is a tuple of (data_input, data_target, points), where data_input is the input data of shape (L, F, H, W), data_target is the target data of shape (R, F, H, W), and points are the points in the grid of shape (H, W, 2). In this context L corresponds to the lookback parameter and R corresponds to the rollout parameter. H and W are the height and width of the grid, respectively. F is the number of variables in the equation system.

Parameters:

eq_dir (str) – Path to the directory where the generated simulations are stored.
lookback (int) – Number of time steps to look back. This corresponds to the L parameter.
rollout (int) – Number of time steps to predict. This corresponds to the R parameter.
squeeze_lookback_dim (bool) – Whether to squeeze the lookback dimension. Defaults to True. Has no effect if lookback > 1.
is_batched (bool) – Whether the data is batched. Defaults to True. If True, the data is expected to be of shape (B, L, F, H, W), where B is the batch size.
dtype (np.dtype) – Data type of the input data. Defaults to np.float32.
selected_simulations (List[str]) – List of selected simulation names to load. If None, all simulations in the directory are loaded.
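
The effect of squeeze_lookback_dim on the input shape can be sketched as follows. This is an illustration of the documented behaviour only: `input_shape` is a hypothetical helper, and it assumes that squeezing simply drops the leading lookback axis when lookback equals 1.

```python
from typing import Tuple


def input_shape(
    lookback: int,
    num_fields: int,
    height: int,
    width: int,
    squeeze_lookback_dim: bool,
) -> Tuple[int, ...]:
    """Shape of data_input for one sample (hypothetical helper)."""
    shape = (lookback, num_fields, height, width)  # (L, F, H, W)
    # Squeezing only applies to a length-1 lookback axis.
    if squeeze_lookback_dim and lookback == 1:
        return shape[1:]  # (F, H, W)
    return shape


print(input_shape(1, 2, 15, 15, squeeze_lookback_dim=True))   # (2, 15, 15)
print(input_shape(1, 2, 15, 15, squeeze_lookback_dim=False))  # (1, 2, 15, 15)
print(input_shape(4, 2, 15, 15, squeeze_lookback_dim=True))   # (4, 2, 15, 15)
```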

class dynabench.dataset.EquationSimulationIterator(eq_dir: str, selected_simulations: List[str] | None = None, is_batched: bool = True, transforms: BaseTransform | None = None, dtype: numpy.dtype = numpy.float32)

Bases: BaseListSimulationIterator

Iterator for full simulations generated using the dynabench solver. Each sample returned by the __getitem__ method is a tuple of (data, points), where data is the simulation data of shape (T, F, H, W) and points are the points in the grid of shape (H, W, 2). In this context T corresponds to the number of time steps, H and W are the height and width of the grid, respectively. F is the number of variables in the equation system.

Parameters:

eq_dir (str) – Path to the directory where the generated simulations are stored.
is_batched (bool) – Whether the data is batched. Defaults to True. If True, the data is expected to be of shape (B, T, F, H, W), where B is the batch size.
dtype (np.dtype) – Data type of the input data. Defaults to np.float32.
selected_simulations (List[str]) – List of selected simulation names to load. If None, all simulations in the directory are loaded.

dynabench.dataset.download_equation(equation: str, structure: str, resolution: str, data_dir: str = 'data', tmp_dir: str = 'tmp')

Download a dataset and unpack it to the right place.

Parameters:

equation (str) – Name of the equation to download.
structure (str) – Description of how the observation points are structured. Can be “cloud” or “grid”.
resolution (str) – Resolution of the dataset. Can be “low”, “medium”, or “high”.
data_dir (str) – Directory where the dataset should be saved. Defaults to “data”.
tmp_dir (str) – Directory where the temporary files should be saved. Defaults to “tmp”. This directory will be deleted after the dataset is unpacked.

Return type:

None
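
A minimal sketch of calling this function, based only on the signature above. The wrapper `fetch_dataset` and its default argument values are hypothetical; the import is done lazily so the snippet can be defined without the dynabench package installed.

```python
def fetch_dataset(
    equation: str = "wave",
    structure: str = "grid",
    resolution: str = "low",
    data_dir: str = "data",
) -> None:
    """Download one dataset variant into data_dir (hypothetical wrapper)."""
    # Requires the dynabench package and network access when called.
    from dynabench.dataset import download_equation

    # tmp_dir is left at its default; per the description above it is
    # deleted once the dataset has been unpacked.
    download_equation(equation, structure, resolution, data_dir=data_dir)
```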

Modules

transforms