Repository Architecture
Purpose
This document is the current source of truth for:
- repo package boundaries
- MCP user/developer surfaces
- registry/compiler/execution layering
- checkpoint compatibility flow
- where new features should plug in
For algorithm-specific integration expectations, read feature-placement.md
after this document.
Maintenance rule:
If a change alters MCP surfaces, boundary ownership, registry/compiler responsibilities, checkpoint materialization flow, or persisted handle types, update this document in the same change.
Package Map
| Area | Owns |
|---|---|
| | MCP-facing tool adapters and server assembly |
| | transport-neutral app services shared by user-facing adapters |
| | user-facing capability metadata, profiles, schemas, supported analyses/evaluations |
| | typed request validation and compilation into persisted requests |
| | workflow orchestration and compatibility execution |
| | stable typed boundary over persisted objects |
| | in-memory/filesystem-backed artifact records and handle persistence |
| | model families, collections, typed model specs, rollout helpers |
| | training runtime, phases, trainers, execution helpers |
| | checkpoint loading, trajectory/data managers, legacy public runtime entrypoints |
| | typed runtime/series/transform building blocks |
| | math and linear-algebra utilities |
| | spectral analysis runtime and adapters |
Layer Stack
Current user-facing stacks:
MCP server
-> user_tools / developer_tools
-> app services where transport-neutral workflow assembly is shared
-> registry + compiler
-> CompatibilityExecutor
-> FacadeOperations
-> ObjectStore / FilesystemArtifactStore
-> legacy runtime/training/checkpoint/analysis code
dymad CLI
-> cli.py argument adapter
-> agent/app path-first workflow service
-> registry + compiler
-> CompatibilityExecutor
-> FacadeOperations
-> ObjectStore / FilesystemArtifactStore
-> legacy runtime/training/checkpoint/analysis code
Important distinction:
- `server.py` only registers tools and mode splits.
- `user_tools.py` is the high-level surface.
- `demo_tools.py` plus `developer_tools.py` expose the raw/developer surface.
- `cli.py` is the package-level path-first user interface; it should stay thin and delegate workflow assembly to `agent/app`.
- `CompatibilityExecutor` still owns orchestration, but some compatibility flows intentionally materialize through legacy `io/*` code instead of fully executor-native implementations.
User Transports
DyMAD now has two user-facing transports over the same registry/compiler/executor/facade/store boundary:
- MCP user mode is structured and handle-first. It assumes dataset handles already exist and keeps `{"ok": ..., "data": ...}` envelopes.
- The `dymad` CLI is path-first and reproducibility-focused. It loads YAML configs, registers dataset paths through the facade, compiles through the user-mode training compiler, launches the same async worker, and writes `dymad-run.json` under the run directory so later CLI commands can recover handles and store location. `dymad train --config ...` can derive the run directory from the config file’s directory plus `run.name`; `--out` remains available to choose and validate an explicit run directory.
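The run-directory rule and the `dymad-run.json` hand-off can be sketched as follows. This is a minimal illustration, not the CLI's actual code: the function names and the manifest fields (`handles`, `store_root`) are hypothetical, and any validation that `--out` performs is not modeled.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def resolve_run_dir(config_path: Path, run_name: str, out: Optional[Path] = None) -> Path:
    """Explicit --out wins; otherwise use <config file's directory>/<run.name>."""
    if out is not None:
        return out
    return config_path.parent / run_name


def write_manifest(run_dir: Path, handles: Dict[str, str], store_root: Path) -> Path:
    """Write dymad-run.json under the run directory (field names are assumed)."""
    run_dir.mkdir(parents=True, exist_ok=True)
    manifest = run_dir / "dymad-run.json"
    payload = {"handles": handles, "store_root": str(store_root)}
    manifest.write_text(json.dumps(payload, indent=2))
    return manifest


def read_manifest(run_dir: Path) -> dict:
    """Recover handles and store location for a later CLI command."""
    return json.loads((run_dir / "dymad-run.json").read_text())
```

A later command that only knows the run directory could call `read_manifest(run_dir)` to find the handles it needs instead of asking the user to pass them again.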
MCP Surfaces
build_server(mode=...) supports three registrations:
- `mode="user"`: high-level workflows
- `mode="developer"`: raw/debug/compatibility tools
- `mode="both"`: both surfaces on one server
User Mode
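As a rough illustration of the mode split, the sketch below selects tool names per mode. It is not the real `build_server`: `registered_tool_names` is a hypothetical helper, and the two tuples hold only a small subset of the tool names listed in this document.

```python
from typing import List

# Small illustrative subsets; describe_training_run appears on both surfaces.
USER_TOOLS = ("compile_training_request", "start_training_run", "describe_training_run")
DEVELOPER_TOOLS = ("register_dataset_file", "describe_training_run", "describe_object")


def registered_tool_names(mode: str) -> List[str]:
    """Return the tool names a server in the given mode would expose."""
    if mode not in ("user", "developer", "both"):
        raise ValueError(f"unknown mode: {mode!r}")
    names: List[str] = []
    if mode in ("user", "both"):
        names.extend(USER_TOOLS)
    if mode in ("developer", "both"):
        names.extend(DEVELOPER_TOOLS)
    # Deduplicate while preserving order, since some tools exist in both modes.
    return list(dict.fromkeys(names))
```

Keeping the selection in one place like this is consistent with the rule that `server.py` only registers tools and mode splits, with no policy or validation of its own.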
User mode is registry/compiler-backed. It currently exposes:
- `list_training_capabilities`
- `list_analysis_capabilities`
- `list_evaluation_capabilities`
- `describe_training_capability`
- `compile_training_request`
- `start_training_run`
- `describe_training_run`
- `read_training_run_log`
- `evaluate_checkpoint`
- `compile_analysis_request`
- `run_analysis_request`
Notes:
- user mode does not require raw `model_ref`
- user mode compiles `model_key` plus validated overrides into persisted compiled requests
- `describe_training_capability` is the authoritative contract for allowed overrides, phase-entry schemas, CV sweep support metadata, natural-language-to-override translation guidance, and surfaced training constraints
- user mode currently assumes dataset handles already exist
Developer Mode
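The idea of compiling a `model_key` plus validated overrides, wrapped in an `{"ok": ..., "data": ...}` envelope, can be sketched as below. Everything concrete here is an assumption for illustration: the capability table, the `toy_model` key, the override names, and the shape of the failure payload are not taken from the real registry or compiler.

```python
from typing import Any, Dict

# Hypothetical capability table standing in for the registry.
CAPABILITIES: Dict[str, Dict[str, Any]] = {
    "toy_model": {
        "model_ref": "toy.module:ToyModel",  # hidden from user-mode callers
        "allowed_overrides": {"lr", "epochs"},
        "defaults": {"lr": 0.001, "epochs": 10},
    }
}


def compile_request(model_key: str, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve model_key to a model_ref and merge only the allowed overrides."""
    capability = CAPABILITIES.get(model_key)
    if capability is None:
        return {"ok": False, "data": {"error": f"unknown model_key: {model_key}"}}
    rejected = sorted(set(overrides) - capability["allowed_overrides"])
    if rejected:
        return {"ok": False, "data": {"error": f"overrides not allowed: {rejected}"}}
    effective = {**capability["defaults"], **overrides}
    return {
        "ok": True,
        "data": {"model_ref": capability["model_ref"], "effective_config": effective},
    }
```

The caller never supplies `model_ref`; it only appears in the compiled result, which mirrors the note that user mode does not require a raw `model_ref`.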
Developer mode keeps the raw and compatibility-oriented path available:
- `register_dataset_file`
- `inspect_dataset`
- `register_checkpoint`
- `prepare_prediction_request`
- `plan_checkpoint_prediction`
- `start_model_training`
- `describe_training_run`
- `read_training_run_log`
- `evaluate_model`
- `list_evaluation_capabilities`
- `list_model_capabilities`
- `resolve_model_capability`
- `list_profile_capabilities`
- `describe_training_capability`
- `describe_object`
- `list_objects`
Use developer mode when debugging boundary behavior, raw config/profile selection, or compatibility flows.
Current Workflow Paths
Training and Evaluation
High-level path:
register_dataset_file
-> describe_training_capability / list_training_capabilities
-> compile_training_request
-> start_training_run
-> describe_training_run / read_training_run_log
-> evaluate_checkpoint
CLI training enters the same path after resolving files from a YAML config:
dymad train --config config.yaml [--out runs/foo]
-> agent/app CLI workflow service
-> register_dataset_file for train/valid/test paths
-> compile_training_request
-> start_training_run
-> describe_training_run / read_training_run_log
-> evaluate_checkpoint via dymad eval
Compilation resolves:
- `model_key` -> model capability -> default `model_ref`
- dataset kind compatibility
- default or explicit profile
- allowed user overrides
- optional single-split CV sweep settings under `overrides.cv`, including:
  - `param_grid` candidate definitions for grid or legacy candidate-based adaptive search
  - optional `search` policy whose `mode` selects the CV optimizer (`grid` or `nelder_mead_like`) plus optimizer-specific config such as simplex-style coefficients; in the current runtime, `nelder_mead_like` can either run a bounded continuous search over `search.bounds` lower/upper pairs or, when bounds are omitted, take the legacy adaptive path over numeric single-split `param_grid` candidates
  - optional `selection` policy (`goal` plus ordered tie-breakers) for deterministic best-model choice
- phase overrides normalized against matching profile defaults so trainer-specific phase config is preserved unless explicitly overridden
- translation guidance and surfaced constraint notes for clients that map natural-language requests into structured overrides, including CV sweep requests
- effective config
- trainer kind
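To make the `param_grid` and `selection` ideas concrete, here is a small sketch of grid expansion followed by a deterministic pick. It is an illustration under assumptions, not the compiler's or runtime's implementation: the `goal` values (`"min"`/`"max"`), the result record shape, and the meaning of tie-breakers (smaller parameter value wins) are all hypothetical.

```python
import itertools
from typing import Any, Dict, List, Sequence


def expand_param_grid(param_grid: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Cartesian product of the grid, with keys sorted so the order is stable."""
    keys = sorted(param_grid)
    value_lists = [param_grid[k] for k in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


def select_best(
    results: List[Dict[str, Any]],
    goal: str = "min",
    tie_breakers: Sequence[str] = (),
) -> Dict[str, Any]:
    """Pick the best result by score, then by ordered tie-breaker parameters."""
    if goal not in ("min", "max"):
        raise ValueError(f"unknown goal: {goal!r}")
    sign = 1 if goal == "min" else -1

    def sort_key(result: Dict[str, Any]):
        # Assumed tie-break rule: prefer the smaller value of each named parameter.
        return (sign * result["score"],) + tuple(result["params"][n] for n in tie_breakers)

    return min(results, key=sort_key)
```

With ordered tie-breakers, two candidates with equal scores always resolve the same way, which is what makes the best-model choice reproducible across runs.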
Execution is now submit-and-poll:
- `compile_training_request` still persists the validated compiled request
- `start_training_run` / `start_model_training` persist a `training_run` record immediately and spawn `dymad.agent.exec.training_worker`
- the worker reloads the persisted context, marks the run `RUNNING`, executes the private synchronous `_execute_training_run(...)` helper, then persists `SUCCEEDED` or `FAILED`
- `describe_training_run` is the polling surface and reconciles stale `RUNNING` jobs whose worker pid has disappeared without a terminal write
- `read_training_run_log` returns incremental log chunks from the persisted worker log
Analysis
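The stale-run reconciliation described above can be sketched as follows. This is a simplified model under stated assumptions: `RunRecord`, its fields, and the choice to mark a stale run `FAILED` are hypothetical, and `pid_alive` is a POSIX-only probe (on Windows `os.kill` behaves differently and should not be used this way).

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunRecord:
    """Hypothetical stand-in for a persisted training_run record."""
    run_id: str
    status: str
    worker_pid: Optional[int] = None
    error: Optional[str] = None


def pid_alive(pid: int) -> bool:
    """POSIX-only: signal 0 checks that the process exists without signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def reconcile(record: RunRecord, is_alive: Callable[[int], bool] = pid_alive) -> RunRecord:
    """Turn a RUNNING record whose worker vanished into a terminal state."""
    if (
        record.status == "RUNNING"
        and record.worker_pid is not None
        and not is_alive(record.worker_pid)
    ):
        record.status = "FAILED"  # assumed terminal state for a lost worker
        record.error = "worker exited without a terminal write"
    return record
```

Passing the liveness check in as `is_alive` keeps the polling logic testable without spawning real processes.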
Current analysis path:
compile_analysis_request
-> persisted compiled analysis request
-> run_analysis_request
-> analysis-specific execution in CompatibilityExecutor
Currently supported workflow keys:
- `spectral_koopman`
- `vortex_transform_modes`
Checkpoint Compatibility
Current checkpoint load path:
dymad.io.load_model(...)
-> CompatibilityExecutor.plan_checkpoint_prediction(...)
-> FacadeOperations.register_checkpoint(...)
-> FacadeOperations.prepare_prediction_request(...)
-> legacy checkpoint materialization in dymad.io.checkpoint
This is an important current-state detail:
- `CompatibilityExecutor.plan_checkpoint_prediction(...)` is active.
- `CompatibilityExecutor.materialize_checkpoint_prediction(...)` is not the active materialization path today; it is a placeholder that raises `NotImplementedError`.
- the persisted checkpoint and prediction-request handles still record the boundary state used by `load_model(...)`.
So the boundary plan is real, but final checkpoint materialization still goes through `dymad.io.checkpoint`.
Persisted Artifacts and Handles
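The split between planning and materialization can be pictured with a toy model like the one below. None of this is the real code: `ToyCompatibilityExecutor`, `load_model_sketch`, the plan dictionary, and the handle values are hypothetical stand-ins that only mirror the behavior described above.

```python
from typing import Any, Callable, Dict, Tuple


class ToyCompatibilityExecutor:
    """Toy stand-in: planning works, executor-native materialization does not."""

    def plan_checkpoint_prediction(self, checkpoint_path: str) -> Dict[str, str]:
        # A real plan would persist records through the facade and store;
        # here we only return illustrative handle strings.
        return {
            "checkpoint_handle": "chk_example",
            "prediction_request_handle": "pred_example",
            "checkpoint_path": checkpoint_path,
        }

    def materialize_checkpoint_prediction(self, plan: Dict[str, str]) -> Any:
        raise NotImplementedError("placeholder: not the active materialization path")


def load_model_sketch(
    executor: ToyCompatibilityExecutor,
    checkpoint_path: str,
    legacy_loader: Callable[[str], Any],
) -> Tuple[Any, Dict[str, str]]:
    """Record the boundary plan, then materialize through the legacy loader."""
    plan = executor.plan_checkpoint_prediction(checkpoint_path)
    model = legacy_loader(checkpoint_path)  # stands in for dymad.io.checkpoint
    return model, plan
```

In this picture the handles are still created even though the model object itself comes from the legacy path.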
The object store persists the main boundary objects used by MCP and compatibility workflows:
- datasets: `ds_*`
- checkpoints: `chk_*`
- training runs: `run_*`
- compiled training requests: `trainreq_*`
- compiled analysis requests: `analysisreq_*`
- evaluations: `eval_*`
- prediction requests: `pred_*`
- spectral snapshots: `specsnap_*`
If a new workflow needs durable planning or inspection across calls, it usually needs a new record
type in agent/store plus matching facade helpers.
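Because each record type has its own handle prefix, a handle string alone is enough to tell what kind of object it names. The helper below is a hypothetical sketch of that lookup; only the prefixes come from the list above, and `handle_kind` is not an existing function.

```python
# Prefixes as listed above; the kind labels are informal descriptions.
HANDLE_PREFIXES = {
    "ds_": "dataset",
    "chk_": "checkpoint",
    "run_": "training run",
    "trainreq_": "compiled training request",
    "analysisreq_": "compiled analysis request",
    "eval_": "evaluation",
    "pred_": "prediction request",
    "specsnap_": "spectral snapshot",
}


def handle_kind(handle: str) -> str:
    """Return the record kind for a handle, based only on its prefix."""
    for prefix, kind in HANDLE_PREFIXES.items():
        if handle.startswith(prefix):
            return kind
    raise ValueError(f"unrecognized handle prefix: {handle!r}")
```

A new record type would add one more prefix to a table like this, alongside the new store record and facade helpers.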
Design Rules
- Keep policy and validation out of `server.py`.
- Prefer stable user-facing keys in `registry/*` over raw import strings in user-mode flows.
- Put request-shape validation in `compiler/*`, not in MCP adapters.
- Put orchestration in `exec/*`, not in registry or MCP modules.
- Put persistence logic in `store/*` and `facade/*`, not in executor methods.
- Keep model/math/runtime behavior in the implementation packages unless the public boundary changes.
Tests That Define the Boundary
Use these as the fastest ground truth for the current architecture:
- `tests/test_mcp_server_modes.py`: user/developer mode split
- `tests/test_mcp_user_tools.py`: user-mode compile/train/evaluate path
- `tests/test_training_compiler.py`: typed training compiler behavior
- `tests/test_analysis_workflows.py`: compiled analysis workflows
- `tests/test_checkpoint_e2e_layering.py`: checkpoint planning through exec/facade/store
- `tests/test_public_load_model_boundary.py`: `load_model(...)` still materializes through `dymad.io.checkpoint`
When Adding Features
If you are deciding where a change belongs, use feature-placement.md.
If your change moves the answer, update that file too.