Biogeochemistry and FABM Coupling =================================== pyGOTM integrates biogeochemical modelling through the Framework for Aquatic Biogeochemical Models (FABM) via its Python front-end, **pyfabm**. The coupling enables any FABM-compatible biogeochemical or ecological model to run within a pyGOTM single-column simulation without modifying the pyGOTM physics kernel. What is FABM? -------------- FABM (Framework for Aquatic Biogeochemical Models) is an open-source, model-agnostic coupling framework that allows biogeochemical models to be compiled once and coupled to different physical ocean and lake models without code changes. Each FABM model describes: * **State variables** — concentrations of biogeochemical quantities (e.g. phytoplankton, dissolved oxygen, nutrients). * **Source/sink rates** — the net production or consumption of each state variable due to biological reactions, chemistry, and air-sea exchange. * **Dependencies** — physical environment fields that the biogeochemical model needs from the host (temperature, salinity, light, density, etc.). * **Diagnostics** — derived quantities the model exposes back to the host (e.g. light attenuation, chlorophyll *a*). **pyfabm** is the Python binding to the FABM shared library. It exposes the FABM model described in a ``fabm.yaml`` configuration file as a ``pyfabm.Model`` object whose state arrays, rates, diagnostics, and dependencies can be read and written from Python. A wide range of models are available — from simple nutrient-phytoplankton-zooplankton (NPZ) models to complex biogeochemical models such as ERSEM and HBM-NEMO-PISCES. Enabling FABM in pyGOTM ------------------------ FABM is activated by adding a ``fabm`` block to the GOTM YAML configuration: .. code-block:: yaml fabm: use: true config_file: fabm.yaml # path to the FABM model YAML, relative to gotm.yaml freshwater_impact: true # dilute/concentrate tracers with precipitation repair_state: false # clamp negative concentrations to zero feedbacks: shade: false # allow FABM to modify light attenuation albedo: false # allow FABM to modify surface albedo surface_drag: false # allow FABM to modify surface drag The ``config_file`` key (also accepted as ``config``, ``yaml``, or ``file``) points to the FABM model YAML that specifies which biological models are active. If not supplied, the default ``fabm.yaml`` in the same directory as ``gotm.yaml`` is used. The coupling is completely optional. If ``fabm.use: false`` (the default), the physics-only compiled Numba loop runs unchanged and pyfabm is never imported. FABM output variables ~~~~~~~~~~~~~~~~~~~~~ When FABM is active, pyGOTM discovers output variables directly from the instantiated ``pyfabm.Model``. Every FABM state variable is written to the NetCDF result, and every pyfabm diagnostic whose ``output`` flag is enabled is written as well. Names use FABM's ``output_name`` metadata, so slashes in model paths are normalized to underscores. For example, ``gotm/npzd`` variables are reported as ``npzd_phy`` and ``npzd_PAR``, and ``bb/passive`` configured as ``sed/c`` is reported as ``sed_c``. Interior state variables and interior diagnostics are written as ``(time, z, lat, lon)`` profiles. Surface state, bottom state, and horizontal diagnostics are written as ``(time, lat, lon)`` scalar fields. pyGOTM attaches the FABM ``units`` and ``long_name`` metadata to each emitted NetCDF variable, which makes the output schema and downstream integrations model-agnostic. The host feedback placeholders ``surface_albedo``, ``surface_drag_coefficient_in_air``, and ``attenuation_coefficient_of_photosynthetic_radiative_flux`` remain present for FABM-active runs for compatibility with existing validation reports. They are separate from the model-discovered FABM namespace. Use ``pygotm schema output --config path/to/gotm.yaml --json`` to inspect the exact dynamic output surface before running a FABM-active configuration. Installation Boundary ~~~~~~~~~~~~~~~~~~~~~ FABM coupling requires ``pyfabm`` from conda-forge. The project conda environment includes it through ``pygotm-conda-env.yml``. The Python package does not expose a ``pygotm[fabm]`` pip extra because current FABM-capable ``pyfabm`` builds are not available as matching PyPI dependencies. If FABM is enabled without importable ``pyfabm``, pyGOTM raises a clear runtime error during FABM engine setup. .. _fabm-coupling-design: Coupling Architecture: Chunked Interleaved Loop ------------------------------------------------ A fundamental constraint drives the design of the coupling: **Numba-compiled kernels cannot call back into Python**, and pyfabm is a Python-level library. They cannot be interleaved at the sub-timestep level. The solution adopted in pyGOTM is a **chunk-based sequential coupling** that interleaves physics and biogeochemistry at the granularity of *chunks* — contiguous windows of timesteps large enough to amortise the Python overhead while short enough to maintain tight coupling between the physical environment and the biological state. Chunk size ~~~~~~~~~~ The default chunk size is approximately **one day of simulation time**: .. math:: N_\mathrm{chunk} = \max\!\left( \left\lfloor \frac{86400\,\mathrm{s}}{\Delta t} \right\rceil, 1 \right), where :math:`\Delta t` is the physics timestep in seconds. This is then rounded up to the nearest multiple of the output interval so that output boundaries are always aligned between physics and biogeochemistry. The chunk size can be overridden by passing ``chunk_size`` to :func:`~pygotm.gotm.gotm.integrate_gotm_compiled`. There is a natural trade-off: * **Smaller chunks** — lower memory for the hydrodynamic state buffers, but more Python overhead. * **Larger chunks** — less Python overhead, but larger scratch arrays (shape :math:`(N_\mathrm{chunk}+1) \times (N_\mathrm{lev}+1)`). Memory grows as approximately :math:`7 N_\mathrm{chunk} N_\mathrm{lev}` doubles (``float64`` arrays for T, S, ρ, h, nuh, radiation, and bottom stress). * **One-day default** — balances memory and overhead across typical timesteps of 60–600 s. The chunk size has *no impact on numerical accuracy* for the biological equations — the biogeochemistry is advanced at the same timestep :math:`\Delta t` as the physics within each chunk; chunking only determines how often the stored hydrodynamic snapshots are refreshed. Per-chunk execution sequence ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Each chunk k covers timesteps :math:`k \cdot C \ldots k \cdot C + C - 1` and executes two sequential passes. .. mermaid:: flowchart TD start(["Chunk k — timesteps k·C … k·C + C − 1"]) P["1 · Physics pass — Numba JIT\nrun_compiled_time_loop(nt=C, hydro_store=ON)"] buf["Advances T, S, U, V, k, ε by C steps\nStores T, S, ρ, h, nuh, rad, τ_b at every step\n(hydro buffers · shape C+1 × nlev+1)"] loop["2 · FABM pass — Python (for step = 0 … C)"] fa["a. set_environment(T[step], S[step], ρ[step], …)"] fb["b. sinking / rising advection (adv_center)"] fc["c. turbulent diffusion (diff_center)"] fd["d. biogeochemical rates (engine.get_rates)"] fe["e. apply source/sink rates: cc += dt · rates"] done(["cc — updated biological state passed to chunk k+1"]) start --> P --> buf buf -->|"hydro snapshots"| loop loop --> fa --> fb --> fc --> fd --> fe --> done style start fill:#e8f4e8,stroke:#4a8f4a,color:#1a4a1a style P fill:#dce8f7,stroke:#3a78b5,color:#1a4a7a style buf fill:#dce8f7,stroke:#3a78b5,color:#1a4a7a style loop fill:#fef3cd,stroke:#c8960c,color:#7a5c00 style fa fill:#fef3cd,stroke:#c8960c,color:#7a5c00 style fb fill:#fef3cd,stroke:#c8960c,color:#7a5c00 style fc fill:#fef3cd,stroke:#c8960c,color:#7a5c00 style fd fill:#fef3cd,stroke:#c8960c,color:#7a5c00 style fe fill:#fef3cd,stroke:#c8960c,color:#7a5c00 The biological state ``cc`` (shape ``n_vars × nlev``) is passed from one chunk to the next as a direct argument, providing continuity. .. figure:: ../figures/fabm_coupling.png :align: center :width: 90% :alt: pyGOTM–pyfabm chunked coupling diagram **Figure** — Chunked interleaved coupling between the pyGOTM Numba physics kernel and the pyfabm biogeochemical engine. Each chunk runs the full physics for :math:`N_\mathrm{chunk}` timesteps, stores snapshots of the hydrodynamic state, then drives pyfabm through those same timesteps. Variables exchanged at each step are shown as labelled arrows. Variables Exchanged ------------------- pyGOTM → pyfabm (dependencies) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ At every biogeochemical timestep, pyGOTM sets the following FABM dependencies from the stored hydrodynamic snapshots: .. list-table:: :header-rows: 1 :widths: 40 15 45 * - FABM dependency name - Shape - Source in pyGOTM * - ``temperature`` - ``(nlev,)`` - Potential temperature :math:`\theta` [°C] — stored snapshot * - ``practical_salinity`` - ``(nlev,)`` - Practical salinity :math:`S` [psu] — stored snapshot * - ``density`` - ``(nlev,)`` - In-situ density :math:`\rho` [kg m⁻³] — stored snapshot * - ``cell_thickness`` - ``(nlev,)`` - Layer thicknesses :math:`h_i` [m] — stored snapshot * - ``downwelling_photosynthetic_radiative_flux`` - ``(nlev,)`` - PAR profile [W m⁻²] — computed from radiation snapshot, updated after light-feedback * - ``surface_downwelling_photosynthetic_radiative_flux`` - scalar - Surface PAR [W m⁻²] — top-of-column value * - ``wind_speed`` - scalar - Wind speed :math:`|\mathbf{u}_{10}|` [m s⁻¹] from (u10, v10) forcing * - ``number_of_days_since_start_of_the_year`` - scalar - Fractional day of year (yearday + secondsofday / 86400) * - ``bottom_stress`` - scalar - Bottom friction velocity :math:`\tau_b` [Pa] — stored snapshot Dependencies that are not exposed by the particular FABM model are silently skipped. A call to ``engine.unresolved_dependencies()`` reports any required dependencies that have not been set after initialization. pyfabm → pyGOTM (feedbacks and diagnostics) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. list-table:: :header-rows: 1 :widths: 40 15 45 * - FABM diagnostic / output - Shape - Effect in pyGOTM * - ``attenuation_coefficient_of_photosynthetic_radiative_flux`` (or ``kc``) - ``(nlev,)`` - Bio-shading: modifies the PAR profile fed back to FABM at the next rates call within the same timestep (light-feedback loop) * - Source/sink rates (all state variables) - ``(n_vars, nlev)`` - Applied to biological concentrations: ``cc += dt * rates`` * - Surface exchange rates - ``(n_vars, nlev)`` - Added only to the top layer (index ``−1``) as air-sea exchange * - Bottom exchange rates - ``(n_vars, nlev)`` - Added only to the bottom layer (index ``0``) as sediment flux * - Vertical movement velocities - ``(n_interior, nlev)`` - Sinking or rising advection before the diffusion/reaction step Within-Chunk FABM Timestep --------------------------- Each biological timestep within a chunk applies the following operations in order, which exactly mirrors the sequence used by ``gotm_fabm.F90`` in the reference Fortran GOTM implementation: 1. **Set environment** — copy the stored hydrodynamic snapshots into pyfabm dependencies (temperature, salinity, density, cell thickness, PAR, wind, year-day, bottom stress). 2. **Vertical advection** (sinking/rising) — interior state variables with non-zero vertical movement velocities are advected using the conservative P2-PDM scheme (``adv_center``) with zero-flux boundary conditions at top and bottom. Face velocities are computed by thickness-weighted interpolation of cell-centre velocities to interfaces, replicating lines 1161–1199 of ``gotm_fabm.F90``. 3. **Turbulent diffusion** — each interior state variable is diffused vertically using the implicit Crank–Nicolson solver (``diff_center``) with the turbulent diffusivity :math:`\nu_h` from the stored hydrodynamic snapshot. Precipitation dilution is applied as a Neumann surface boundary condition: :math:`F_\mathrm{surf} = -c_{n,\mathrm{sfc}} \cdot \text{precip}`. 4. **Biogeochemical rates** — pyfabm ``getRates()`` is called *after* the transport step so that reaction rates act on the post-transport concentrations. The rates are retrieved three times with different surface/bottom flags to separate bulk, surface, and bottom contributions: .. code-block:: text bulk_rates = engine.get_rates(surface=False, bottom=False) surf_rates = engine.get_rates(surface=True, bottom=False) bot_rates = engine.get_rates(surface=False, bottom=True) 5. **Light feedback** — if ``attenuation_coefficient_of_photosynthetic_radiative_flux`` is exposed by the model, the PAR profile is recomputed from the biological attenuation and fed back into FABM before a second rates call within the same timestep. This two-pass approach captures the immediate effect of phytoplankton self-shading on the light available for growth. 6. **Apply rates** — concentrations are updated: .. math:: c_i^{n+1} = c_i^n + \Delta t \cdot R_i^\mathrm{bulk} + \Delta t \cdot \delta_{i,\mathrm{top}} \left(R_i^\mathrm{surf} - R_i^\mathrm{bulk}\right)\big|_\mathrm{top} + \Delta t \cdot \delta_{i,\mathrm{bot}} \left(R_i^\mathrm{bot} - R_i^\mathrm{bulk}\right)\big|_\mathrm{bot}, where surface and bottom exchange contributions are concentrated in the top and bottom cells respectively. Surface and bottom state variables (e.g. sediment fractions exposed by some FABM models) are updated separately from their single-layer rate arrays. PAR Profile and Light Feedback ------------------------------- The downwelling photosynthetically active radiation (PAR) at cell-centre depth :math:`z_k` is computed from the surface radiation snapshot using a two-component exponential attenuation: .. math:: I_k = I_0 (1 - A) \exp\!\left(-\frac{z_k}{g_2} - K_\mathrm{bio}(z_k)\right), where :math:`I_0` is the surface shortwave, :math:`A` is the fraction absorbed in the first attenuation component (``light_A``), :math:`g_2` is the background attenuation depth scale (``light_g2``), and :math:`K_\mathrm{bio}(z_k) = \sum_{k'=k}^{N} k_{c,k'} \Delta h_{k'}` is the cumulative biological attenuation integrated downward from the surface using the ``kc`` (or ``attenuation_coefficient_of_photosynthetic_radiative_flux``) diagnostic exposed by the FABM model. When ``light_g2 ≤ 0``, light feedback is disabled and the PAR profile is the stored GOTM radiation profile unchanged. Memory Layout -------------- The hydrodynamic state buffers allocated once per integration are .. list-table:: :header-rows: 1 :widths: 15 25 60 * - Array - Shape - Content * - ``hydro_T`` - ``(C+1, nlev+1)`` - Temperature snapshots (bottom→top, index 0 = initial condition) * - ``hydro_S`` - ``(C+1, nlev+1)`` - Salinity snapshots * - ``hydro_rho`` - ``(C+1, nlev+1)`` - Density snapshots * - ``hydro_h`` - ``(C+1, nlev+1)`` - Cell-thickness snapshots * - ``hydro_nuh`` - ``(C+1, nlev+1)`` - Turbulent diffusivity snapshots * - ``hydro_rad`` - ``(C+1, nlev+1)`` - Shortwave radiation snapshots (PAR source) * - ``hydro_taub`` - ``(C+1,)`` - Bottom stress snapshots (scalar per step) Row 0 of each buffer holds the state at the *start* of the chunk (or the initial condition on the first chunk). Rows 1…C hold the state *after* each physics timestep. The last column (index ``nlev``) is the surface interface value. If the final chunk is smaller than the default chunk size, the buffers are reallocated to the smaller size so no stale data is used. FABM NetCDF output buffers are allocated after ``pyfabm.Model`` has been instantiated, because only the live model knows its state and diagnostic output names. Memory for those buffers scales as :math:`O(N_\mathrm{vars} N_\mathrm{out} N_\mathrm{lev})` for interior variables and :math:`O(N_\mathrm{vars} N_\mathrm{out})` for horizontal variables. Physics- only runs allocate no FABM output buffers. See Also -------- * :doc:`/api/fabm` — full FABM module API reference (FABMConfig, FABMEngine, coupling, fabm_loop) * :doc:`overview` — global solution sequence and where biogeochemistry fits * :doc:`airsea` — surface forcing that provides wind and radiation to the coupling