dymad.io.trajectory_manager
Classes
TrajectoryManager | A class to manage trajectory data loading, preprocessing, and dataloader creation.
TrajectoryManagerGraph | A class to manage trajectory data loading, preprocessing, and dataloader creation (graph version).
- class dymad.io.trajectory_manager.TrajectoryManager(metadata, data_key=None, device=device(type='cpu'))
Bases: object
A class to manage trajectory data loading, preprocessing, and dataloader creation.
The workflow includes:
Loading raw data from a binary file.
Preprocessing (trimming trajectories, subsetting, etc.).
Creating a dataset.
Normalizing and transforming the data using specified transformations.
Creating a dataloader.
The class is configured via a YAML configuration file.
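The last three workflow steps listed above (dataset creation, normalization, and batching) can be illustrated with a minimal NumPy sketch. It does not use the dymad API: `fit_standardizer` and `iter_batches` are hypothetical helpers, the data is synthetic, and standardization is only one possible transformation. The transformations dymad actually applies are determined by the configuration.

```python
import numpy as np


def fit_standardizer(trajectories):
    """Fit a per-feature mean and std over all samples of all trajectories."""
    stacked = np.concatenate(trajectories, axis=0)  # (total_samples, n_features)
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)  # leave constant features unscaled
    return mean, std


def iter_batches(items, batch_size):
    """Yield consecutive groups of at most batch_size items (dataloader stand-in)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Synthetic stand-in for loaded and truncated data: 3 trajectories, 2 state features.
rng = np.random.default_rng(0)
raw = [rng.normal(loc=3.0, scale=2.0, size=(n, 2)) for n in (50, 40, 60)]

mean, std = fit_standardizer(raw)
dataset = [(x - mean) / std for x in raw]            # normalized trajectories
batches = list(iter_batches(dataset, batch_size=2))  # groups of trajectories
```

Fitting the statistics once over all trajectories, rather than per trajectory, keeps the same scaling for every sample, which is the usual choice when the trajectories come from the same system.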
- Parameters:
metadata (dict) – Configuration dictionary.
data_key (str, optional) – Dataset to read, one of ‘train’, ‘valid’, ‘test’.
device (torch.device) – Torch device to use.
- apply_data_transformations()
Apply data transformations to the loaded trajectories and control inputs. This creates the dataset.
This method applies the transformations defined in the configuration for x, y, u, and p.
- Return type:
None
- create_dataloaders(*, typed=False)
Create dataloaders for the dataset.
- Return type:
None
- create_regular_series_dataset(indices=None)
Expose the first typed data seam for regular trajectory preprocessing.
- Return type:
list[RegularSeries]
- data_truncation()
Truncate the loaded data according to the configuration.
This includes:
Subsetting the number of trajectories and horizon (n_steps).
Populating basic metadata (dt, tf, shapes, etc.).
- Return type:
None
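The truncation step described above can be sketched in plain NumPy. This is an illustration only: the `truncate` helper, its `n_traj` argument, and the metadata keys are hypothetical and are not taken from the dymad source. The sketch also assumes uniformly sampled time arrays.

```python
import numpy as np


def truncate(xs, ts, n_traj=None, n_steps=None):
    """Keep the first n_traj trajectories and the first n_steps samples of each."""
    xs = [x[:n_steps] for x in xs[:n_traj]]  # slicing with None keeps everything
    ts = [t[:n_steps] for t in ts[:n_traj]]
    meta = {
        "n_traj": len(xs),
        "n_state_features": xs[0].shape[1],
        "n_steps": [len(x) for x in xs],
        "dt": float(ts[0][1] - ts[0][0]),  # assumes uniform sampling
        "tf": float(ts[0][-1]),            # final time of the first trajectory
    }
    return xs, ts, meta


# Three synthetic trajectories of different lengths, sampled every 0.1 s.
lengths = (100, 80, 120)
xs = [np.zeros((n, 2)) for n in lengths]
ts = [0.1 * np.arange(n) for n in lengths]

xs_cut, ts_cut, meta = truncate(xs, ts, n_traj=2, n_steps=50)
```

Because the metadata is derived from the truncated arrays rather than the raw ones, values such as `tf` reflect the horizon that later steps actually see.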
- load_data()
Load raw data from a binary file.
The file is assumed to store (in order):
x: array-like or list of array-like, shape (n_samples, n_state_features). State data. If the data contains multiple trajectories, x should be a list containing the data for each trajectory. Individual trajectories may contain different numbers of samples.
t: float, numpy array of shape (n_samples,), or list of numpy arrays. If t is a float, it specifies the time step between samples. If array-like, it specifies the time (in seconds of physical time) at which each sample was collected; in this case the values in t must be strictly increasing. For multi-trajectory data, t may also be a list of arrays containing the collection times for each individual trajectory.
u: array-like or list of array-like, shape (n_samples, n_control_features), optional (default None). Control variables/inputs. If the data contains multiple trajectories (i.e., if x is a list of array-like), then u should be a list containing the control data for each trajectory. Individual trajectories may contain different numbers of samples.
- Return type:
dict
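The accepted forms of t can be made concrete with a small sketch. The `as_time_arrays` helper below is hypothetical and is not part of dymad. It expands a float time step into explicit time arrays and checks the strictly-increasing requirement stated above. It assumes that a single trajectory's times are given as a NumPy array, and that a Python list always means one array per trajectory.

```python
import numpy as np


def as_time_arrays(x, t):
    """Return one time array per trajectory, matching the layout of x."""
    xs = x if isinstance(x, list) else [x]
    if np.isscalar(t):
        # A float t is the time step between consecutive samples.
        ts = [float(t) * np.arange(len(xi)) for xi in xs]
    elif isinstance(t, list):
        # A list holds one array of sample times per trajectory.
        ts = [np.asarray(ti, dtype=float) for ti in t]
    else:
        ts = [np.asarray(t, dtype=float)]
    if len(ts) != len(xs):
        raise ValueError("x and t must describe the same number of trajectories")
    for xi, ti in zip(xs, ts):
        if len(ti) != len(xi):
            raise ValueError("t and x must have the same number of samples")
        if np.any(np.diff(ti) <= 0):
            raise ValueError("time values must be strictly increasing")
    return ts


# Two trajectories with different lengths and a shared time step of 0.1 s.
x = [np.zeros((5, 2)), np.zeros((3, 2))]
ts = as_time_arrays(x, 0.1)
```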
- prepare_data()
Convenience method that loads and truncates the data in one call.
- Return type:
None
- process_all(*, typed=False)
- Returns:
A tuple containing dataloader, dataset, metadata.
- process_data(*, typed=False)
Run the latter half of process_all.
- Return type:
tuple[Union[DataLoader[RegularTrainerBatch], DataLoader[GraphTrainerBatch]], list[RegularSeries] | list[GraphSeries], dict]
- set_data_index(index=None)
Set the data index for this TrajectoryManager.
- Return type:
None
- set_transforms(metadata=None, trajmgr=None)
- Return type:
None
- update_config(config)
Update the configuration metadata. After this step, data transformations need to be refitted.
- Return type:
None
- class dymad.io.trajectory_manager.TrajectoryManagerGraph(metadata, data_key='train', device=device(type='cpu'), adj=None)
Bases: TrajectoryManager
A class to manage trajectory data loading, preprocessing, and dataloader creation (graph version).
The graph data is assumed to be homogeneous, i.e., each node has the same number of features. Hence the normalization, if done, is applied globally to all nodes.
However, the number of edges can vary over time, and so can other quantities defined on edges.
In the raw data, the nodal state features are expected to be concatenated sequentially. For example, for N nodes with M features each, the raw data for states at a time step is
\[x = [x_1, x_2, \ldots, x_N], \quad \text{where } x_i \in \mathbb{R}^M.\]
The same applies to other data members, if present.
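The node-major layout described above can be illustrated with a short NumPy sketch. The helper names `pack_nodes` and `unpack_node` are hypothetical and are not part of dymad. They only show how the features of node i occupy the contiguous slice from i*M to (i+1)*M of the raw state vector (with zero-based i).

```python
import numpy as np


def pack_nodes(node_states):
    """Concatenate per-node feature vectors x_1, ..., x_N into one raw state vector."""
    return np.concatenate(node_states)


def unpack_node(x, i, n_features):
    """Return the features of node i (zero-based) from the raw state vector."""
    return x[i * n_features:(i + 1) * n_features]


# N = 3 nodes with M = 2 features each.
node_states = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
x = pack_nodes(node_states)   # raw state at one time step, length N * M
per_node = x.reshape(3, 2)    # row i holds the features of node i
second = unpack_node(x, 1, n_features=2)
```

Reshaping to (N, M) and slicing give the same result, since NumPy reshapes in row-major order by default.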
- Parameters:
metadata (dict) – Configuration dictionary.
device (torch.device) – Torch device to use.
adj (torch.Tensor or np.ndarray, optional) – Adjacency matrix for GNN models. If not provided, will try to get from config.
- apply_data_transformations()
Apply data transformations to the loaded trajectories and control inputs. This creates the dataset.
The raw data is expected to have shape [T, n_nodes * n_features], but the transformation assumes [T * n_nodes, n_features], so extra reshaping is needed.
- Return type:
None
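The reshaping described for this method can be sketched as follows. This is not the dymad implementation: `normalize_graph_trajectory` is a hypothetical helper, and standardization stands in for whatever transformation the configuration specifies. The sketch shows a trajectory of shape [T, n_nodes * n_features] being flattened to [T * n_nodes, n_features], so that one set of statistics is shared by all nodes, and then restored to its original shape.

```python
import numpy as np


def normalize_graph_trajectory(X, n_nodes):
    """Standardize node features globally; X has shape (T, n_nodes * n_features)."""
    T, width = X.shape
    if width % n_nodes != 0:
        raise ValueError("row width must be a multiple of n_nodes")
    n_features = width // n_nodes
    # Row t * n_nodes + i of `flat` holds the features of node i at time step t.
    flat = X.reshape(T * n_nodes, n_features)
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)  # leave constant features unscaled
    flat_norm = (flat - mean) / std
    # Restore the original [T, n_nodes * n_features] layout.
    return flat_norm.reshape(T, width), mean, std


# Synthetic trajectory: T = 4 time steps, 3 nodes, 2 features per node.
rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=3.0, size=(4, 6))
X_norm, mean, std = normalize_graph_trajectory(X, n_nodes=3)
```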
- create_dataloaders(*, typed=False)
For graph data, we aggregate the trajectories into batches of graphs.
- Return type:
None
- create_graph_series_dataset(indices=None)
Expose the typed graph-series seam for graph trajectory preprocessing.
- Return type:
list[GraphSeries]
- data_truncation()
Truncate the loaded data according to the configuration.
This includes:
Subsetting the number of trajectories and horizon (n_steps).
Populating basic metadata (dt, tf, shapes, etc.).
- Return type:
None
- load_data()
Load raw data from a binary file.
The file is assumed to store (in order):
x: array-like or list of array-like, shape (n_samples, n_state_features). State data. If the data contains multiple trajectories, x should be a list containing the data for each trajectory. Individual trajectories may contain different numbers of samples.
t: float, numpy array of shape (n_samples,), or list of numpy arrays. If t is a float, it specifies the time step between samples. If array-like, it specifies the time (in seconds of physical time) at which each sample was collected; in this case the values in t must be strictly increasing. For multi-trajectory data, t may also be a list of arrays containing the collection times for each individual trajectory.
u: array-like or list of array-like, shape (n_samples, n_control_features), optional (default None). Control variables/inputs. If the data contains multiple trajectories (i.e., if x is a list of array-like), then u should be a list containing the control data for each trajectory. Individual trajectories may contain different numbers of samples.
- Return type:
dict
- set_transforms(metadata=None, trajmgr=None)
- Return type:
None