Model Architecture¶

Author: Dr. Daning Huang
Date: 01/03/2026

Overview¶

DyMAD defines models in a modular, compositional way. Every model follows an encoder-dynamics-decoder architecture, whose parts are described below.

Autoencoder¶

The encoder-decoder pair, or autoencoder, maps physical states or observations (\(x\)) and/or inputs (\(u\)) to and from the latent states \(z\). There are three types, listed in the table below. Each type has GNN variants that process graph data and possibly RNN variants that process sequential data.

Component	Full	State-only	None
Encoder	\(z = f_E(x, u)\)	\(z = f_E(x)\)	\(z = x\)
Decoder	\(x = f_D(z)\)	\(x = f_D(z)\)	\(x = z\)

Dynamics¶

The dynamics in latent space is defined as

\[ z' = f_C(z, u; f_F, f_P) \]

which consists of three components:

Component	Expression
Feature	\(s = f_F(z, u)\)
Processor	\(r = f_P(s, u)\)
Composer	\(z' = f_C(z, r)\)

The increased granularity allows more flexibility in the model definition.
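The composition of the three components can be sketched in a few lines of plain Python. This is only an illustration of the expressions in the table, not DyMAD's API: the function names (`make_dynamics`, `identity_feature`, `toy_processor`, and the two composers) and the toy processor formula are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Component = Callable[[Vector, Vector], List[float]]


def make_dynamics(feature: Component, processor: Component, composer: Component):
    """Return z' = f_C(z, f_P(f_F(z, u), u)) built from three callables."""

    def dynamics(z: Vector, u: Vector) -> List[float]:
        s = feature(z, u)      # s = f_F(z, u)
        r = processor(s, u)    # r = f_P(s, u)
        return composer(z, r)  # z' = f_C(z, r)

    return dynamics


# Hypothetical components, chosen only for illustration.
def identity_feature(z: Vector, u: Vector) -> List[float]:
    return list(z)  # s = z


def toy_processor(s: Vector, u: Vector) -> List[float]:
    return [-0.5 * s_i + u[0] for s_i in s]


def direct_composer(z: Vector, r: Vector) -> List[float]:
    return list(r)  # z' = r


def skip_composer(z: Vector, r: Vector) -> List[float]:
    return [z_i + r_i for z_i, r_i in zip(z, r)]  # z' = z + r


f_direct = make_dynamics(identity_feature, toy_processor, direct_composer)
f_skip = make_dynamics(identity_feature, toy_processor, skip_composer)

z_next_direct = f_direct([1.0, 2.0], [0.1])
z_next_skip = f_skip([1.0, 2.0], [0.1])
```

Swapping only the composer changes the model from a direct map to a skip connection while the feature and processor stay untouched, which is the kind of flexibility the finer granularity is meant to provide.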

Prediction method¶

In the model definition, we do not specify the meaning of \(z'\). It can mean either \(\dot{z}\) (for a continuous-time, or CT, model) or \(z_{k+1}\) (for a discrete-time, or DT, model). Ultimately, the time integration method applied to the model determines whether it is CT or DT.

For example, if RK4 is used, \(z'\) would be treated as \(\dot{z}\) during training, and the resulting model is CT. If simple roll-out is used, the dynamics simply maps \(z_k\) to \(z_{k+1}\), and then the model will be DT.
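The following minimal sketch shows how the same function can be read either way. It assumes a scalar latent state and a hypothetical `toy_dynamics`; the step functions are generic textbook versions, not DyMAD's integrators.

```python
import math
from typing import Callable, List


def toy_dynamics(z: float) -> float:
    """Hypothetical z' = f(z); its meaning depends on the integrator used."""
    return -0.5 * z


def rk4_step(f: Callable[[float], float], z: float, dt: float) -> float:
    """One classical RK4 step, treating f(z) as dz/dt (CT reading)."""
    k1 = f(z)
    k2 = f(z + 0.5 * dt * k1)
    k3 = f(z + 0.5 * dt * k2)
    k4 = f(z + dt * k3)
    return z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def rollout_step(f: Callable[[float], float], z: float) -> float:
    """One roll-out step, treating f(z) as z_{k+1} (DT reading)."""
    return f(z)


def predict(step: Callable[[float], float], z0: float, n_steps: int) -> List[float]:
    """Apply a one-step map repeatedly and collect the trajectory."""
    traj = [z0]
    for _ in range(n_steps):
        traj.append(step(traj[-1]))
    return traj


# CT reading: integrates dz/dt = -0.5 z, so z(1) is close to exp(-0.5).
ct_traj = predict(lambda z: rk4_step(toy_dynamics, z, dt=0.1), 1.0, 10)
# DT reading: iterates z_{k+1} = -0.5 z_k, so z_10 = (-0.5)**10.
dt_traj = predict(lambda z: rollout_step(toy_dynamics, z), 1.0, 10)
```

The two trajectories differ because the integrator, not the function itself, fixes the interpretation of \(z'\).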

There are three types of predictions:

Free in latent space: After the initial condition is mapped to the latent space (by encoder), the dynamics is predicted purely within the \(z\) space. After the prediction is done, the \(z\) trajectory is mapped back to \(x\) space (by decoder).
Projected dynamics: At every evaluation of the dynamics, the current \(z\) is decoded to \(x\) and encoded back to \(z\). One interpretation of this more involved procedure is that it constrains the dynamics to a manifold. In addition, if a full autoencoder is used for dynamics with inputs, this prediction type must be used, so that new inputs can be factored into the \(z\) dynamics.
Exponential propagator: For autonomous linear systems in the latent space, the time integration is simply the evaluation of matrix exponentials, which is implemented as a special case to maximize the computational efficiency.
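The difference between the first two prediction types can be sketched for a DT roll-out as follows. This is an illustration under stated assumptions, not DyMAD code: the function names and the toy encoder, decoder, and step are hypothetical, and the exponential propagator is not covered here.

```python
from typing import Callable, List, Sequence

Latent = List[float]


def free_rollout(encoder, decoder, step, x0: float, inputs: Sequence[float]) -> List[float]:
    """Encode once, evolve purely in z, decode the whole trajectory at the end."""
    z = encoder(x0, inputs[0])
    zs = [z]
    for u in inputs:
        z = step(z, u)
        zs.append(z)
    return [decoder(z_k) for z_k in zs]


def projected_rollout(encoder, decoder, step, x0: float, inputs: Sequence[float]) -> List[float]:
    """Decode and re-encode z (with the current input) before every evaluation."""
    z = encoder(x0, inputs[0])
    zs = [z]
    for u in inputs:
        z = encoder(decoder(z), u)  # projection: z -> x -> z, picking up the new u
        z = step(z, u)
        zs.append(z)
    return [decoder(z_k) for z_k in zs]


# Hypothetical full autoencoder: the input is stored in the second latent slot.
def toy_encoder(x: float, u: float) -> Latent:
    return [x, u]


def toy_decoder(z: Latent) -> float:
    return z[0]


def toy_step(z: Latent, u: float) -> Latent:
    # The input only enters through the encoded slot z[1].
    return [0.9 * z[0] + 0.1 * z[1], z[1]]


u_seq = [1.0, 0.0, 0.0]
x_free = free_rollout(toy_encoder, toy_decoder, toy_step, 1.0, u_seq)
x_proj = projected_rollout(toy_encoder, toy_decoder, toy_step, 1.0, u_seq)
```

In this toy setup the free roll-out keeps using the input encoded at the initial step, whereas the projected roll-out refreshes it at every step, which is why a full autoencoder with inputs calls for the projected variant.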

Diagram¶

The computational graph of the full model is illustrated below.

%run ./figures.ipynb   # Load the figure

model_arch

Figure: computational graph of the full model (../_images/f728663b714b2258979c97a5673d3a64f1a1a11c8b153fc98193f029fdcb6ba3.svg)

Model Families¶

In DyMAD, several typical options are implemented for each of the encoder, decoder, feature, processor, and composer; users can also implement their own. Furthermore, each option can use different neural network architectures (MLP, GNN, RNN, etc.). As long as the data dimensions match, the options and architectures can be combined arbitrarily, which results in combinatorially many possible models.

For ease of use, DyMAD provides a number of predefined models, described below.

Latent/Sequential dynamics model (LDM/SDM)¶

The LDM maps the observations \(x\) and inputs \(u\) to a latent space \(z\), learns the dynamics in \(z\), and extracts \(x\) from predicted \(z\). The SDM is similar to LDM, but the observation can be a time-delayed sequence of length \(T\), which for clarity we denote \(x_{1:T}\); similarly for inputs \(u_{1:T}\).

Both LDM and SDM use a full autoencoder (i.e., one that encodes \(x\) and \(u\) simultaneously), with the following components:

Component	Expression	LDM	SDM
Encoder	\(z = f_E(x, u)\)	\(z = f_E(x, u)\)	\(z_{1:T} = f_E(x_{1:T}, u_{1:T})\)
Feature	\(s = f_F(z, u)\)	\(s = z\)	\(s = z_{1:T}\)
Processor	\(r = f_P(s, u)\)	\(r = f_P(s, u)\)	\(r = z_{T+1} = f_P(s, u_{1:T})\)
Composer	\(z' = f_C(z, r)\)	\(z' = r\)	\(z_{2:T+1} = [z_{2:T}, r]\)
Decoder	\(x = f_D(z)\)	\(x = f_D(z)\)	\(x_{1:T} = f_D(z_{1:T})\)

In LDM, the feature is a trivial identity map, so the model representation lies entirely in the processor. The autoencoder or the processor can be a GNN, which makes the model graph-compatible. LDM can be either CT or DT.

In SDM, the encoder and decoder accept and produce sequences of length \(T\); the processor takes a sequence of length \(T\) and predicts only the next step. The models that process the sequences can be RNNs, or simply an MLP/GNN applied to each step of \(x/u/z\). SDM is strictly DT.
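The sliding-window update in the SDM column, \(z_{2:T+1} = [z_{2:T}, r]\), can be sketched as below. The sketch assumes scalar latent states stored in a Python list; `sdm_step`, `sdm_rollout`, and `toy_processor` are hypothetical names, and the toy processor stands in for a trained network.

```python
from typing import Callable, List, Sequence

Processor = Callable[[Sequence[float], Sequence[float]], float]


def sdm_step(z_window: Sequence[float], u_window: Sequence[float], processor: Processor) -> List[float]:
    """Advance the latent window by one step."""
    r = processor(z_window, u_window)  # r = z_{T+1} = f_P(s, u_{1:T}), with s = z_{1:T}
    return list(z_window[1:]) + [r]    # z_{2:T+1} = [z_{2:T}, r]


def sdm_rollout(z_window: Sequence[float], u_sequence: Sequence[float],
                processor: Processor, n_steps: int) -> List[float]:
    """Roll the window forward; step k uses the inputs u_sequence[k : k + T]."""
    T = len(z_window)
    window = list(z_window)
    for k in range(n_steps):
        window = sdm_step(window, u_sequence[k:k + T], processor)
    return window


def toy_processor(z_window: Sequence[float], u_window: Sequence[float]) -> float:
    """Hypothetical processor: linear extrapolation plus the latest input."""
    return 2.0 * z_window[-1] - z_window[-2] + u_window[-1]


# T = 3 and three roll-out steps need T + 3 - 1 = 5 input samples.
final_window = sdm_rollout([1.0, 2.0, 3.0], [0.0] * 5, toy_processor, n_steps=3)
one_step = sdm_step([1.0, 2.0, 3.0], [0.0, 0.0, 0.5], toy_processor)
```

Because each step shifts the window by exactly one sample, the update is inherently a discrete-time map, consistent with SDM being strictly DT.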

More details on SDM are provided at the end of this section.

Linear feature model (LFM)¶

The LFM uses a state-only autoencoder (MLP or GNN), with the following components:

Component	Expression	Auto	LTI	KBF-1	KBF-2
Feature	\(s = f_F(z, u)\)	\(s = z\)	\(s = [z, u]\)	\(s = [z, z u]\)	\(s = [z, z u, u]\)
Processor	\(r = f_P(s, u)\)	\(r = Ps\)	``	``	``
Composer	\(z' = f_C(z, r)\)	\(z' = r\)	``	``	``

`` means the same as the entry to its left.

The features provide a series of \(z\) and \(u\) combinations: Auto - autonomous, LTI - linear time invariant, KBF - Koopman bilinear form.

The processor is a linear mapping, so depending on the chosen feature, the dynamics is simply linear or bilinear in the latent space.

DyMAD provides several linear solvers for the fast estimation of the linear mapping.
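As a rough illustration of the four feature maps and of how a linear processor could be estimated, consider the NumPy sketch below. Several things here are assumptions rather than DyMAD behavior: the bilinear term \(z u\) is interpreted as the Kronecker product of \(z\) and \(u\), the function name `lfm_feature` is hypothetical, and ordinary least squares (`np.linalg.lstsq`) stands in for DyMAD's own linear solvers.

```python
import numpy as np


def lfm_feature(z: np.ndarray, u: np.ndarray, kind: str) -> np.ndarray:
    """Build the feature vector s for one sample (1-D z and u)."""
    z = np.asarray(z, dtype=float)
    u = np.asarray(u, dtype=float)
    zu = np.kron(z, u)  # assumption: bilinear term as a Kronecker product
    if kind == "auto":
        return z                           # s = z
    if kind == "lti":
        return np.concatenate([z, u])      # s = [z, u]
    if kind == "kbf1":
        return np.concatenate([z, zu])     # s = [z, z u]
    if kind == "kbf2":
        return np.concatenate([z, zu, u])  # s = [z, z u, u]
    raise ValueError(f"unknown feature kind: {kind}")


# Synthetic snapshots generated by a known linear map P_true on KBF-1 features.
rng = np.random.default_rng(0)
n_samples, n_z, n_u = 50, 2, 1
Z = rng.normal(size=(n_samples, n_z))
U = rng.normal(size=(n_samples, n_u))

S = np.stack([lfm_feature(z, u, "kbf1") for z, u in zip(Z, U)])  # (50, 4)
P_true = rng.normal(size=(n_z, S.shape[1]))
Z_next = S @ P_true.T                                            # rows are z' = P s

# Least-squares estimate of P (a stand-in for a dedicated linear solver).
P_fit = np.linalg.lstsq(S, Z_next, rcond=None)[0].T
```

Since the data are noise-free and the features are linearly independent, the fitted matrix matches the generating one up to round-off; with real data the fit would only be approximate.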

Kernel machine (KM)¶

The KM uses a state-only autoencoder (MLP or GNN), with the following components:

Component	Expression	Direct	Skip
Feature	\(s = f_F(z, u)\)	as in LFM	``
Processor	\(r = f_P(s, u)\)	\(r = k(s,S)\alpha\)	``
Composer	\(z' = f_C(z, r)\)	\(z' = r\)	\(z' = z + r\)

The KM is nearly the same as the LFM, except that the processor uses a kernel model, which can capture more nonlinearity than the LFM's linear map.

Multiple kernels are implemented in DyMAD, ranging from standard Gaussian radial basis functions to data-driven, manifold-aware kernels.
A specialized kernel ridge regression solver is implemented to learn the kernel coefficients.
Using appropriate graph-compatible autoencoders, KM can extend to graph data seamlessly.
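To make the processor expression \(r = k(s,S)\alpha\) concrete, here is a minimal NumPy sketch using a Gaussian RBF kernel and textbook kernel ridge regression. It is not DyMAD's specialized solver: the function names, the regularization value, the length scale, and the toy target \(z' = \sin(z)\) are all assumptions made for illustration, and the feature is taken as \(s = z\).

```python
import numpy as np


def gaussian_kernel(A: np.ndarray, B: np.ndarray, length_scale: float = 0.5) -> np.ndarray:
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * length_scale**2))


def fit_kernel_ridge(S: np.ndarray, R: np.ndarray, reg: float = 1e-6,
                     length_scale: float = 0.5) -> np.ndarray:
    """Solve (K + reg * I) alpha = R for the kernel coefficients alpha."""
    K = gaussian_kernel(S, S, length_scale)
    return np.linalg.solve(K + reg * np.eye(len(S)), R)


def kernel_processor(s: np.ndarray, S: np.ndarray, alpha: np.ndarray,
                     length_scale: float = 0.5) -> np.ndarray:
    """Evaluate r = k(s, S) alpha for one or more query features s."""
    return gaussian_kernel(np.atleast_2d(s), S, length_scale) @ alpha


# Toy training data: features s = z on a grid, targets z' = sin(z).
S_train = np.linspace(-2.0, 2.0, 9)[:, None]
Z_next = np.sin(S_train)

# Direct composer: z' = r, so the processor is fitted to z' itself.
alpha_direct = fit_kernel_ridge(S_train, Z_next)
z_next_direct = kernel_processor(S_train, S_train, alpha_direct)

# Skip composer: z' = z + r, so the processor is fitted to the increment z' - z.
alpha_skip = fit_kernel_ridge(S_train, Z_next - S_train)
z_next_skip = S_train + kernel_processor(S_train, S_train, alpha_skip)
```

The only difference between the Direct and Skip variants in this sketch is the regression target, which mirrors the composer column of the table.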

Physics-infused model¶

The physics-infused model uses a standard autoencoder, with the following components:

Component	Expression	Algebraic	Differential
Feature	\(s = f_F(z, u)\)	\(s = z\)	``
Processor	\(r = f_P(s, u)\)	\(r = f_P(s,u)\)	``
Composer	\(z' = f_C(z, r)\)	\(z' = f_U(z,u) + r\)	\(z' = \begin{bmatrix}f_U(z,u) + r \\ f_H(z,u)\end{bmatrix}\)

Here the main variation is in the composer:

In the algebraic case, the user needs to supply \(f_U\), which is usually a physics-based model. The processor output \(r\) is effectively a residual force that corrects \(f_U\).
The differential case follows the same idea for \(f_U\) and \(r\), but it additionally involves hidden dynamics \(f_H\), learned from data, which can also affect the residual force.
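The two composers can be sketched in plain Python as follows. Everything concrete here is an assumption for illustration: the function names, the toy physics model `toy_f_U`, the toy hidden dynamics `toy_f_H`, and the hand-written residuals that stand in for a trained processor. In the differential sketch, the latent state is assumed to be the physical part followed by the hidden part.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector, Vector], List[float]]


def algebraic_composer(f_U: Model):
    """z' = f_U(z, u) + r."""

    def compose(z: Vector, u: Vector, r: Vector) -> List[float]:
        return [fu_i + r_i for fu_i, r_i in zip(f_U(z, u), r)]

    return compose


def differential_composer(f_U: Model, f_H: Model):
    """z' = [f_U(z, u) + r ; f_H(z, u)], stacking physical and hidden parts."""

    def compose(z: Vector, u: Vector, r: Vector) -> List[float]:
        physical = [fu_i + r_i for fu_i, r_i in zip(f_U(z, u), r)]
        return physical + list(f_H(z, u))

    return compose


# Hypothetical user-supplied physics model acting on the first latent state.
def toy_f_U(z: Vector, u: Vector) -> List[float]:
    return [-z[0] + u[0]]


# Hypothetical hidden dynamics (would be learned from data in practice).
def toy_f_H(z: Vector, u: Vector) -> List[float]:
    return [-0.2 * z[1] + z[0]]


# Algebraic case: the residual corrects f_U with an unmodeled cubic term.
z_alg, u = [2.0], [0.5]
r_alg = [-0.1 * z_alg[0] ** 3]
z_next_alg = algebraic_composer(toy_f_U)(z_alg, u, r_alg)

# Differential case: the residual also depends on the hidden state z[1].
z_dif = [2.0, 1.0]
r_dif = [-0.1 * z_dif[0] ** 3 + z_dif[1]]
z_next_dif = differential_composer(toy_f_U, toy_f_H)(z_dif, u, r_dif)
```

Note that the composers in this sketch take \(u\) as an extra argument because \(f_U\) and \(f_H\) depend on it.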

Additional note on SDM¶

Consider a generic case, where we are given an input sequence \(p_{1:T}\) and hope to obtain an output sequence \(q_{1:T}\) (in autoencoding) or output \(q_T\) (in processing). There are several options:

Brute force: Construct one NN that takes in the entire \(p_{1:T}\) at once and maps it to the desired output.
- This is often not recommended, as the size of the MLP grows with \(T\), and \(T\) can be large.
- However, it can be the first thing to try, especially for processing, where only one output is needed.
Stepwise mapping: Construct one NN that takes in only one step \(p_i\) and maps it to \(q_i\), then apply the NN to each of the \(T\) steps of \(p_{1:T}\) in turn to obtain \(q_{1:T}\).
- This ignores the sequential relation across the steps.
- However, it is useful for encoding/decoding, where the goal is to map the data space into the latent space; for processing, one only needs to apply the NN to step \(T\).
Recurrent model: Use a recurrent NN structure to map \(p_{1:T}\) to \(q_{1:T}\). For example, a vanilla RNN is

\[\begin{split} \begin{aligned} \text{Hidden:}&\ h_i = \sigma(W_h h_{i-1} + W_p p_i),\ i=1,\cdots,T \\ \text{Readout:}&\ q_i = W_z h_i \end{aligned} \end{split}\]
- The potential issue is the cost of evaluation, which scales with \(T\) (so time is traded for space when compared to the brute-force case).
- The advantage is that the sequential relation across the steps is accounted for.
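The vanilla RNN above can be written out in NumPy as a short sketch. The activation \(\sigma = \tanh\), the zero initial hidden state \(h_0 = 0\), the function name `vanilla_rnn`, and the random weights are assumptions made here; the equations themselves do not fix them.

```python
import numpy as np


def vanilla_rnn(p_seq: np.ndarray, W_h: np.ndarray, W_p: np.ndarray,
                W_z: np.ndarray) -> np.ndarray:
    """Map p_{1:T} (shape T x n_p) to q_{1:T} (shape T x n_q)."""
    h = np.zeros(W_h.shape[0])                # assumption: h_0 = 0
    q_seq = []
    for p_i in p_seq:
        h = np.tanh(W_h @ h + W_p @ p_i)      # hidden update, with sigma = tanh
        q_seq.append(W_z @ h)                 # readout q_i = W_z h_i
    return np.stack(q_seq)


# Random weights and inputs, only to exercise the shapes.
rng = np.random.default_rng(0)
T, n_p, n_h, n_q = 5, 3, 4, 2
W_h = 0.5 * rng.normal(size=(n_h, n_h))
W_p = rng.normal(size=(n_h, n_p))
W_z = rng.normal(size=(n_q, n_h))
p_seq = rng.normal(size=(T, n_p))

q_seq = vanilla_rnn(p_seq, W_h, W_p, W_z)     # autoencoding: use all of q_{1:T}
q_last = q_seq[-1]                            # processing: use only q_T
```

The loop makes the cost visible: one hidden update per step, so evaluation time grows linearly with \(T\) while the number of weights stays fixed.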

In practice, these options can be combined freely for the encoder, processor, and decoder to produce different SDM architectures. The inputs and outputs of each component are summarized below.

Component	Input \(p_i\)	Output \(q_i\)
Encoder	\([x_i,u_i]\) or \(x_i\)	\(z_i\)
Processor	\(z_i\)	\(q_T=r\)
Decoder	\(z_i\)	\(x_i\)