Validation Overview¶
pyGOTM validates compiled-runtime output against the official GOTM 6.0.7 Fortran NetCDF reference files with a Frechet-distance comparison pipeline. The validation suite compares every numeric reference variable, records structural mismatches as failed variables, and writes an HTML index plus one HTML report page per case.
The current parity gate is the Frechet suite in
src/pygotm/validation/run_validation.py and
src/pygotm/validation/compare.py. Status is driven by the Frechet
indicators described below.
Reference Data Distribution¶
Top-level validation/ is mostly excluded from normal Git history.
validation/report/ is the exception: it contains the curated HTML/JSON
report snapshot used by documentation builds in a clean checkout. Generated
NetCDF outputs under validation/runs are reproducible, and the official
Fortran reference NetCDF files are large enough that they should be kept outside
the source repository.
To run the full 22-case validation suite, provide a local GOTM reference-data
tree under validation/reference/<case>/. Each case directory must contain
the GOTM YAML input files and the Fortran reference NetCDF outputs used by the
validation runner. Maintainers may distribute a separate reference-data archive
through a release artifact or scientific data repository.
The pyGOTM source repository does vendor a minimal seven-case reference set
at tests/fixtures/cases/ (couette, channel, asics_med,
rouse, seagrass, wave_breaking, entrainment) covering the
distinct physics regimes pyGOTM exercises. The full python -m pytest gate
resolves every case-aware test through these in-tree fixtures via
tests.fixtures.bundled_case, so a clean checkout passes the test suite
without any external download.
Validation Pipeline¶
Each validation run follows the same implementation path:
Case metadata is resolved from
validation/reference.The case YAML is selected explicitly, or by trying
gotm.yaml,gotm.yml, then a single non-fabm.yaml/output.yamlYAML file.The reference NetCDF is selected as the largest
*.ncfile in the case directory, excludingrestart.nc.GotmDriver(case.yaml_path).run(output_path=...)produces the pyGOTM NetCDF output undervalidation/runs/<run_name>/<run_name>.nc.The output dataset must report
runtime == "compiled". Validation does not silently fall back to legacy Python timestep logic.compare_nc(py_path, ref_path, case_name)opens both NetCDF files and compares all numeric reference variables.Each case report is written directly from the reference and pyGOTM NetCDF files. The final index is written to
validation/report/report.html. The lightweight case summary is serialized tovalidation/report/report.json, and per-variable machine-readable metrics are serialized tovalidation/report/results.jsonwithout embedded Plotly payloads.
The comparison is reference-driven. A numeric variable present in the
reference but missing from pyGOTM is BROKEN. Extra numeric variables in the
pyGOTM output are also reported as BROKEN so the report captures output
schema drift in either direction.
Alignment Before Comparison¶
The Frechet calculation is applied after structural and time alignment:
Numeric variables with no time dimension must have identical shapes.
If one side has a time dimension and the other does not, the variable is structurally
BROKEN.Recognized time dimension names are
timeandt.Non-time dimensions must match exactly.
Time coordinates come from the dataset coordinate when present, otherwise from
np.arange(size).datetime64andtimedelta64coordinates are converted to nanosecond integer coordinates before interpolation.Both arrays are interpolated onto the union of finite reference and pyGOTM time coordinates with
np.interp(..., left=np.nan, right=np.nan).Paired non-finite samples are dropped before Frechet distances are computed.
After alignment, the time axis is moved first and the aligned values are
flattened. The Frechet curve points are value-only points of shape
(-1, 1); time itself is not a coordinate in the Frechet point. For large
variables, the paired aligned samples are evenly downsampled to at most
frechet_k points using np.linspace(0, n - 1, frechet_k). The default
frechet_k is 200. This value intentionally balances shape sensitivity
against localized peak-timing sensitivity in chaotic turbulence regimes.
Frechet Theory¶
For aligned reference values \(p_0, \ldots, p_{n-1}\) and calculated values \(q_0, \ldots, q_{m-1}\), pyGOTM computes the discrete Frechet distance. The dynamic-programming table \(C\) is initialized with Euclidean distance between value-only points:
The first row and column accumulate the largest distance needed to traverse one curve while staying on the start of the other:
Interior cells use the standard discrete Frechet recurrence:
The raw Frechet distance is:
This distance answers the “leash length” question for two ordered value sequences: how far apart the two trajectories must be, at worst, when both are walked forward without reordering samples. That makes it stricter than a mean error while remaining less sensitive to a single isolated point than a pure maximum pointwise error.
The implementation uses the Numba-accelerated iterative version when available and falls back to the pure NumPy/Python iterative implementation if needed.
Normalization Theory¶
d_raw is reported in the native units of each variable. To classify many
variables on a common scale, the suite also computes a normalized Frechet
distance:
where \(F\) is the discrete Frechet operator and \(N\) is a dynamic range normalization based on absolute magnitudes from both arrays.
Normalization is section-aware. Core PyGOTM variables use the full finite
minimum and maximum magnitude range by default. This preserves active
turbulence tails for fields such as tke, eps, kb, G and
Pb; clipping those variables with ordinary 1st/99th robust percentiles can
make floor-dominated profiles look worse by collapsing the useful dynamic
range.
Non-PyGOTM variables, primarily PyFABM outputs, use a wide robust range by default:
where \(x\) is the combined finite set of reference and calculated values.
The wide robust range suppresses isolated biogeochemical outliers while keeping
nearly all physically meaningful active tails. If the selected range is
invalid, the implementation falls back to the finite minimum and maximum. If
the high estimate is non-positive, both arrays normalize to zero and the
normalization mode is degenerate.
For variables spanning fewer than switch_oom orders of magnitude, linear
normalization is used:
For variables spanning at least switch_oom orders of magnitude, log
normalization is used:
The default switch threshold is 2.0 orders of magnitude, so log
normalization is used for fields whose meaningful magnitudes span at least two
decades.
Frechet Indicators¶
Each variable result reports these current indicators:
d_rawDiscrete Frechet distance on aligned original values. If
d_rawis belowfrechet_abs_tol(default1e-12), bothd_rawandd_normare reported as zero.d_normDiscrete Frechet distance after dynamic linear/log normalization. If
d_rawis belowfrechet_rel_tol * signal_scale(default relative tolerance1e-6),d_normis reported as zero.d_relRelative raw Frechet score used only for below-floor signals:
d_raw / signal_scale. Heresignal_scaleis the maximum finite absolute magnitude across the aligned reference and calculated arrays, with aneps_floorlower bound.scoreThe status-driving value stored in report rows. For normal signals this is
d_norm. For variables whosesignal_scaleis positive but below the variable magnitude floor, this isd_rel. Themetric_modefield is"d_norm"or"d_rel"accordingly.peak_d_normNon-classifying diagnostic normalized Frechet distance computed with the stricter peak-sensitive configuration: full-range normalization and
frechet_k = 400. It preserves a debugging signal for localized peak misalignment without letting that signal decide case status.
The report displays full-precision reference and calculated values at the aligned sample with the largest absolute finite difference. Structural failures use infinite distances internally and are displayed without plots.
Status Bands¶
Variable status is assigned from score:
Status |
Color |
Score range |
Meaning |
|---|---|---|---|
PASS |
green |
\(score < 0.01\) |
The variable is within the current Frechet parity threshold. |
MARGINAL |
yellow |
\(0.01 \leq score < 0.05\) |
The variable is close to parity but outside the pass threshold. |
DISCREPANT |
orange |
\(0.05 \leq score < 0.20\) |
The variable has a deterministic implementation difference. |
BROKEN |
red |
\(score \geq 0.20\) or structural failure |
The variable is severely mismatched, missing, extra, non-finite, or structurally incompatible. |
A case has status PASS only when all compared variables pass. It has
status FAIL when one or more variables are MARGINAL, DISCREPANT, or
BROKEN. Runtime or validation exceptions are reported as ERROR.
Tolerance Configuration¶
Frechet thresholds and normalization settings are defined by FrechetConfig
in src/pygotm/validation/tolerances.py.
Setting |
Default |
Purpose |
|---|---|---|
|
|
Upper score bound for |
|
|
Upper score bound for |
|
|
Upper score bound for |
|
|
Absolute raw-distance cutoff reported as exact zero. |
|
|
Relative raw-distance cutoff for zero normalized distance. |
|
|
Maximum paired samples used for the status-driving Frechet calculation. |
|
|
Use full finite range estimates for PyGOTM variables. |
|
|
Robust percentiles used when robust PyGOTM normalization is explicitly enabled. |
|
|
Use percentile-based range estimates for non-PyGOTM variables. |
|
|
Robust normalization percentiles for non-PyGOTM variables. |
|
|
Sample count for the non-classifying peak-sensitive diagnostic score. |
|
|
Order-of-magnitude span that switches from linear to log normalization. |
|
|
Lower bound used in signal and log normalization. |
|
|
Default tiny-signal floor for |
Variable-specific magnitude floors are used only to choose d_rel instead
of d_norm for tiny signals:
Floor |
Variables |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Report Structure¶
For each case the report contains separate sections:
PyGOTM variables - core GOTM physics outputs, including temperature, salinity, velocity, turbulence quantities, and related diagnostics.
PyFABM variables - biogeochemical outputs and any other numeric variables not listed in
PYGOTM_VARIABLES.
This split shows whether a non-conformance originates in the physics solver or in biogeochemical coupling.
Per-variable tables use these columns:
Status
Variable
Reference (full precision)
Calculated (full precision)
Raw Frechet
Score (Normalized Frechet / d_rel)
Peak-sensitive d_norm
Parameter plot
Comparison plots are embedded only for MARGINAL and DISCREPANT
variables. PASS variables do not receive plots. BROKEN variables are
listed without plots because structural mismatches should be debugged before
time-series visualization.
Interpreting Remaining Differences¶
The current 22-case snapshot has seven non-PASS reference cases and no
BROKEN variables. The generated case HTML reports show the following
non-PASS rows.
gotlandThe report contains 13 PyGOTM
MARGINALrows, 16 PyGOTMDISCREPANTrows, and one PyFABMMARGINALrow. The PyFABM row isqh. The non-PASS PyGOTM rows aremld_surf,u,v,num,nuh,nus,tke,eps,SS,P,uu,vv,ww,NN,NNT,NNS,kb,G,avh,Rig,L,cmue1,cmue2,an,as,at,taux,tauy, andEturb. Invalidation/runs/gotland/turbulence_debug.json,tkehasmax_abs_errno larger than2.5894818603364984e-6through time index 20; indices 21-30 include values up to8.138172948438899e-5; the case summary reports the largesttkemax_abs_erras6.560663751105973e-4.ows_papaThe report contains two non-PASS rows: PyGOTM
tauxisMARGINALand PyGOTMtauyisDISCREPANT.resoluteThe report contains 17 PyGOTM
MARGINALrows and 6 PyGOTMDISCREPANTrows, with no PyFABM non-PASS rows. The non-PASS rows areint_total,mld_surf,v,tke,eps,SS,P,uu,vv,ww,NN,NNT,NNS,kb,epsb,G,Rig,L,an,as,at,taux, andEturb.langmuirThe report contains 111
PASSrows and two non-PASS rows. The non-PASS rows are PyGOTMNNSatMARGINALand PyFABMdsatDISCREPANT.medsea_eastThe report contains 16 non-PASS rows, all PyGOTM
MARGINALrows. The rows arenum,nuh,nus,SS,P,ww,G,avh,L,cmue1,cmue2,an,as,at,taux, andtauy.medsea_westThe report contains 11 PyGOTM
MARGINALrows and one PyFABMDISCREPANTrow. The PyGOTM rows arenum,nuh,nus,tke,ww,NN,NNT,G,Rig,L, andan. The PyFABM row isjrc_med_ergom_OFL.nns_annualThe report contains three PyGOTM
MARGINALrows, four PyFABMMARGINALrows, and six PyFABMDISCREPANTrows. The PyGOTM rows aremld_surf,v, andtke. The PyFABM rows arenpzd_nut,npzd_phy,npzd_zoo,npzd_det,u_taub,npzd_PPR,npzd_NPR,npzd_PAR,attenuation_coefficient_of_photosynthetic_radiative_flux, andtotal_nitrogen.
The generated reports identify variable status, Frechet metrics, full-precision
reference and calculated values at the largest aligned absolute difference, and
comparison plots for MARGINAL and DISCREPANT rows. They do not identify
root causes for the remaining differences.
Running the Validation Suite¶
All project Python commands must run inside the pygotm conda environment
with conda run -n pygotm.
# Run the default Frechet cases: couette, channel, entrainment
conda run -n pygotm python -m pygotm.validation.run_validation
# Run specific cases
conda run -n pygotm python -m pygotm.validation.run_validation \
--cases couette,channel,entrainment
# Run all reference cases
conda run -n pygotm python -m pygotm.validation.run_validation --all
# Run a named group
conda run -n pygotm python -m pygotm.validation.run_validation \
--group non-stim
# Re-render report pages directly from existing NetCDF outputs
conda run -n pygotm python -m pygotm.validation.render_report \
--all --output-dir validation
Useful suite options include --exclude, --device, --workers,
--dashboard-port, --output-dir, --no-run, --no-warmup, and
--debug-turbulence.
The default case set (couette,channel,entrainment) is expected to pass.
The current full 22-case suite completes all cases but returns a nonzero exit
status because the generated snapshot is PARTIAL PARITY: 15 cases pass and
7 cases fail.
Output files:
validation/report/report.html- human-readable HTML index with one frame per case.validation/report/report.json- lightweight structured report JSON with case statuses, counts, paths, wall times, hardware metadata, and verdict.validation/report/results.json- per-variable structured report JSON that round-trips throughpygotm.validation.report.load_json(); Plotly HTML payloads are omitted so the artifact remains suitable for release triage.validation/report/<run_name>.html- per-case report generated directly from the reference and pyGOTM NetCDF files, with embedded Plotly plots for marginal and discrepant variables.validation/runs/<run_name>/<run_name>.nc- generated pyGOTM NetCDF files.
Generated Reports¶
The documentation build copies the HTML reports currently present in
validation/report/ (produced by conda run -n pygotm python -m
pygotm.validation.run_validation) into the built docs at validation/.
The index page links to each per-case report:
Because validation/report/ is tracked, the link above should work in a
fresh checkout and in GitHub CI. Regenerate the report snapshot after validation
changes so the committed docs report stays in sync with the current parity
state.
Developer Benchmarking¶
Benchmarking is separate from Frechet parity validation. The developer-only benchmark command measures execution timing and does not classify scientific parity:
conda run -n pygotm python -m pygotm.execution.benchmark \
--cases couette,channel
Run pygotm validate or python -m pygotm.validation.run_validation as a
separate acceptance gate after any benchmarked code change. See
Command and Developer Surfaces for every user and developer command
option.
Validation API¶
from pathlib import Path
from pygotm.validation.runner import validate_case
result = validate_case("couette", Path("validation/runs"), skip_run=True)
See Validation for the full API reference.
Test Cases¶
See Test Cases for the current generated case status summary and the case-level aggregation rules.