
Calculation of annual fAPAR and LAI values from a fitted PModel and precipitation data #348

Open
5 tasks
Tracked by #347
davidorme opened this issue Nov 6, 2024 · 11 comments · May be fixed by #403
Assignees
Labels
enhancement New feature or request

Comments

@davidorme
Collaborator

davidorme commented Nov 6, 2024

This is the first step in implementing #347 and the description below is taken from the draft implementation document in the meta issue.

Theory

The predicted fAPARmax / LAImax has two main roles:

  1. to calculate a parameter m, which is the ratio of steady-state LAI to steady-state GPP;
  2. to cap the predicted LAI timeseries, so that predicted LAI ≤ LAImax.

The equations to calculate fAPARmax (Eqn 1) and LAImax (Eqn 2) are:

(Eqn. 1) $f_{APAR_{max}} = \min\left[ 1 - z/(kA_0),\; \left[c_a (1-\chi)/1.6 D\right] \left[f_0 P/A_0\right]\right]$
(Eqn. 2) $LAI_{max} = -(1/k) \ln\left(1 - f_{APAR_{max}}\right)$

Of those terms, the following are new constants across the calculations. There aren't many, so I think these should be parameters to the calculation, but we might want to bundle them into a new constants class (MaxFAPARConst?).

  • $z$ accounts for the costs of building and maintaining leaves and the total below-ground allocation required to support the nutrient demand of those leaves; the globally fitted value for $z$ is 12.227 mol m-2 year-1.
  • $k$ is the light extinction coefficient (set at 0.5).
  • $f_0$ is the ratio of annual total transpiration to annual total precipitation, which is an empirical function (Eqn 3) of the climatic Aridity Index (AI).

(Eqn 3) $f_0=0.65 e^{-b ln^2 \left(\frac{AI}{1.9}\right) }$

where 0.65 is the maximum value, 1.9 is the AI at which this maximum occurs, and b = 0.604169. For AI, see below.
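Eqn 3 is straightforward to sketch in code; the function name and keyword defaults below are illustrative, not part of any existing pyrealm API:

```python
import numpy as np


def transpiration_ratio(aridity_index, f0_max=0.65, ai_at_max=1.9, b=0.604169):
    """Eqn 3: ratio of annual total transpiration to annual total precipitation."""
    return f0_max * np.exp(-b * np.log(aridity_index / ai_at_max) ** 2)
```

By construction, $f_0$ peaks at 0.65 when AI = 1.9 and falls away for wetter or drier climates.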

The following need to be extracted from a P Model:

  • $A_0$ is the annual sum of potential GPP (mol m-2 year-1), where potential GPP is the GPP that would be achieved if fAPAR = 1.
  • $c_a$ is the ambient CO2 partial pressure (Pa).
  • $\chi$ is the annual mean ratio of leaf-internal CO2 partial pressure (Pa) to ca during the growing season (>0℃).
  • $D$ is the annual mean vapour pressure deficit (VPD, Pa) during the growing season (>0℃).

And lastly, precipitation and aridity data are needed:

  • $P$ is the annual total precipitation (mol m-2 year-1).
  • $AI$ is a climatological estimate of the local aridity index, calculated as total PET over total precipitation for an appropriate period, typically at least 20 years.

An example to predict fAPARmax:

z is 12.227 mol m-2 year-1;
k is 0.5;
A0 is 200 mol m-2 year-1;
ca is 37 Pa;
χ is 0.8;
D is 3500 Pa;
P is 51888 mol m-2 year-1;
f0 is 0.62.

fAPAR_max = min{1 - 12.227/(0.5 × 200), [37 × (1 - 0.8)/(1.6 × 3500)] × [0.62 × 51888/200]}
          = min{0.88, 0.21}
          = 0.21
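As a cross-check of the worked example, Eqns 1 and 2 can be coded up directly (the function name and signature below are hypothetical, not an existing pyrealm API):

```python
import numpy as np


def fapar_and_lai_max(A0, ca, chi, D, P, f0, z=12.227, k=0.5):
    """Eqn 1 and Eqn 2: maximum fAPAR and the corresponding maximum LAI."""
    energy_limited = 1 - z / (k * A0)
    water_limited = (ca * (1 - chi) / (1.6 * D)) * (f0 * P / A0)
    fapar_max = np.minimum(energy_limited, water_limited)
    lai_max = -(1 / k) * np.log(1 - fapar_max)
    return fapar_max, lai_max


# With the example values above, fapar_and_lai_max(200, 37, 0.8, 3500, 51888, 0.62)
# gives fAPAR_max ≈ 0.21.
```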

Implementation

This is a draft workflow for implementing the calculation

  • A user fits a P Model - this could be the standard or subdaily model. The model must cover at least one year and should be fitted with $fAPAR$ fixed at 1 to give predictions of potential GPP ($A_0$ above). This could be fitted to a single site or to arrays for multiple sites.
  • The user needs to provide a sequence of datetimes for each observation, which have to map onto one of the axes of the input data. For now, we follow the model used in the SubdailyPModel and SPLASH of insisting this is the first axis of the arrays.
  • Get the growing seasons from the datetimes and temperatures (see Implement a routine for calculating growing season from datetime and temperature series. #352).
  • Using the PModel, the datetimes and the growing season, we can then extract annual mean values of $c_a$, $\chi$ and $D$.
  • We then need to aggregate user-provided precipitation data to annual totals.
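The annual-aggregation step in the workflow above could look something like this sketch, assuming (as stated) that time is on the first axis; the helper name is hypothetical:

```python
import numpy as np


def annual_growing_season_mean(values, datetimes, growing_season):
    """Mean of `values` within the growing season for each calendar year.

    Assumes the time axis is the first axis, following SubdailyPModel and SPLASH.
    """
    years = datetimes.astype("datetime64[Y]")
    annual = {}
    for year in np.unique(years):
        in_year = (years == year) & growing_season
        annual[int(str(year))] = values[in_year].mean(axis=0)
    return annual
```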

So, I think the signature will look like this:

class MaximumFAPAR:

    def __init__(
        self,
        pmodel: PModel,
        datetimes: NDArray[np.datetime64],
        growing_season: NDArray[np.bool_],
        precipitation: NDArray[np.float64],
        aridity_index: float,
        additional_constants: ...
    ):
        ...

The requirement for precipitation and aridity index ties this quite closely to the SPLASH module. The demonstration implementation should probably use SPLASH to set up the scenario.

@zhouboya
Collaborator

zhouboya commented Nov 8, 2024

For the parameter z, instead of directly using a single globally fitted value (12.227 mol m-2 year-1), Shirley suggested it would be better to allow users to fit this parameter from their own satellite/climate data.

z accounts for the costs of building and maintaining leaves and the total below-ground allocation required to support the nutrient demand of those leaves. A single globally fitted value is simple, but the fitted z value can vary with the input satellite data, so it is better to fit z to the satellite data the user has.

Methods to fit z values are:

In Equation 1, fAPARmax is the minimum of the energy-limited fAPAR (fAPARc, Eqn 5) and the water-limited fAPAR (fAPARw, Eqn 4):

(Eqn. 4) $f_{APAR_w} = [c_a (1-\chi)/1.6 D][f_0 P/A_0]$
(Eqn. 5) $f_{APAR_c} = 1 - z/(kA_0)$

The minimum of the two can be smoothly approximated using the log-sum-exp function:

(Eqn. 6) $\min\{f_{APAR_c}, f_{APAR_w}\} \approx -\frac{1}{K}\log\left[\exp(-K \cdot f_{APAR_c}) + \exp(-K \cdot f_{APAR_w})\right]$

where K (≥ 1) is a constant: the larger the value of K, the closer the approximation is to the true minimum (here we adopted K = 10). With fAPARw and fAPARc defined by Equations 4 and 5 respectively, z can then be fitted using non-linear regression (the nls function in R), with the user's own satellite data as inputs.
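A minimal Python sketch of the soft minimum in Eqn 6 (in pyrealm the fitting itself would presumably use something like scipy.optimize.curve_fit rather than R's nls; the function name here is illustrative):

```python
import numpy as np


def smooth_min(fapar_c, fapar_w, K=10.0):
    """Eqn 6: log-sum-exp soft minimum; larger K approaches the hard minimum."""
    return -(1.0 / K) * np.log(np.exp(-K * fapar_c) + np.exp(-K * fapar_w))
```

Note that the soft minimum always sits slightly below the true minimum, converging to it as K grows.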

@zhouboya
Collaborator

zhouboya commented Nov 8, 2024

On the input data (P) for Eqn 1: P is the annual total precipitation, in mol m-2 year-1.
One thing to note is that most precipitation datasets give their values in mm, so users may easily overlook this point or make mistakes.
It would therefore be better to include a unit converter in the package to convert mm to mol m-2, for users' convenience.
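Such a converter is a one-liner, since 1 mm of precipitation over 1 m² is 1 litre = 1 kg of water (the constant and function names below are illustrative):

```python
WATER_MOLAR_MASS = 18.015  # g mol-1


def precip_mm_to_mol(precip_mm):
    """Convert a precipitation depth in mm to mol m-2.

    1 mm over 1 m2 = 1 litre = 1 kg of water = 1000 g.
    """
    return precip_mm * 1000.0 / WATER_MOLAR_MASS
```

So 1 mm ≈ 55.5 mol m-2, and the example value of P = 51888 mol m-2 year-1 above corresponds to roughly 935 mm year-1.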

@MarionBWeinzierl MarionBWeinzierl self-assigned this Dec 11, 2024
@MarionBWeinzierl MarionBWeinzierl moved this from Todo to In Progress in ICCS development board Dec 11, 2024
@MarionBWeinzierl MarionBWeinzierl added the enhancement New feature or request label Dec 11, 2024
@MarionBWeinzierl
Collaborator

@davidorme , working on the implementation I am wondering: Why does fAPAR_max need to be a class, rather than just a function (same for LAI_max)? Is there a lot of stuff we need to handle later on which is useful to keep in a class?

@davidorme
Collaborator Author

@MarionBWeinzierl My initial thought for the workflow was that this calculation would be more of a building block for other calculations. I'm less sure now, but I think this should also tell users which of the two terms (water or energy) limits $f_{APAR}$. That's probably easiest as a boolean array (e.g. water_limited). That's still only two return values though, so a function returning a tuple or a small dataclass would both work.

@davidorme
Collaborator Author

@MarionBWeinzierl

I've been thinking about the API for this some more. Big picture, this algorithm predicts whether a site is water or energy limited in a given year, and the outputs should give annual predictions of:

  • FAPAR_max,
  • LAI_max and,
  • whether or not the site is water or energy limited for each year in the inputs.

With three outputs, I think it might make sense to package the results in a simple dataclass. Users might want average behaviour over many years of inputs, but we should make annual predictions and then users can average over those if they want to (or we provide a climatic_averages method on the dataclass).
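Such a dataclass might look like this sketch (all names here are placeholders, not a settled API):

```python
from dataclasses import dataclass

import numpy as np
from numpy.typing import NDArray


@dataclass
class AnnualFaparPredictions:
    """Annual predictions, with years along the first axis."""

    fapar_max: NDArray[np.float64]
    lai_max: NDArray[np.float64]
    water_limited: NDArray[np.bool_]

    def climatic_averages(self):
        """Average behaviour across the predicted years."""
        return (
            self.fapar_max.mean(axis=0),
            self.lai_max.mean(axis=0),
            self.water_limited.mean(axis=0),  # proportion of water-limited years
        )
```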

I think there are two ways people will want to use this.

  1. A simple function that just takes arrays of the raw values for all of the inputs in the equation. This interface allows us to test things without needing to fit P Models but also allows users to explore the behaviour of the algorithm directly.

  2. A wrapper function that takes a P Model and other data and calculates the annual predictions across observations from that P Model.

    One feature of this particular step in this meta issue is that I think users might completely reasonably want to do this using data at a wide range of temporal scales: people could do it from monthly or even seasonal data all the way down to a subdaily model. That differs from the LAI steps (Calculate tunable parameter $m$ defined as the ratio of steady state annual LAI to GPP #349, #350, Calculate the realised LAI given a lagged process #351), which have to be daily at a minimum because they are making daily predictions.

    The wrapper function (or class) has a signature like the MaximumFAPAR above and takes:

    • A PModel or SubdailyPModel and a set of timestamps for the observations. That is actually already built into the SubdailyPModel.datetimes attribute but we'd need it for PModel inputs. The observations in either case need to cover complete years because the outputs are annual predictions - I think it is reasonable to expect users to meet that condition rather than internally trimming down to complete years.

    • We do need to allow users to provide GPP separately, so that they can apply posthoc soil moisture penalties, so we need gpp = None in the signatures to override the default use of (Subdaily)PModel.gpp.

    • We need annual total precipitation data provided separately, as it isn’t in the PModel data. I think it’s actually easiest to ask users to provide these quantities with a mapping to the included years (a dictionary? Honestly, a labelled xarray.DataArray would be good here, but that opens up the wider can of worms about supporting xarray inputs directly; see Using xarray with pyrealm? #43).

    • A boolean array showing which observations are growing season. Now this gets awkward, because this only really makes sense down to daily temporal resolution - seasons, months, weeks and days can all be in or out of the growing season, but bits of a day not so much. So we need something that makes it easy to map subdaily model observations onto the growing season. The obvious answer is for users to just provide a threshold temperature and then apply the calculation internally from PModel.env.tc, but I dislike that because it's a very fixed definition of growing season. I think this is something the growing_season module needs to handle. More thoughts to add here.

    • I think everything else can then be taken from the PModel data.

@Shirleycwj - can you comment on the thoughts above? Does that seem like an API you want for this, and does the application across different data scales make sense? I basically think people might want this from monthly CRU models all the way down to subdaily WFDE models, and I don't see a fundamental reason to restrict it.

@Shirleycwj
Collaborator

I agree with David's idea of providing the option of annual predictions of water- and energy-limited regions, as one can observe the shift of energy-limited areas to water-limited areas over recent decades, and that's something people would be interested in. Also, I used multi-year averages of inputs to estimate spatially and temporally constant values for z and f0, and consequently for annual maximum fAPAR, but once we are able to relate z and f0 to environmental variables, it certainly makes more sense to predict fAPARmax and LAImax year by year in order to investigate temporal trends and prepare for future projections - I think this is what we aim for.

For combining fAPARmax with the various versions of the P Model, I think it's reasonable to add the output 'potential GPP (A0)' as the product of LUE and PPFD. On an annual basis, the total potential GPP would be the sum of A0 over a year defined by the datetimes, whether the model runs on monthly or subdaily timesteps.

If users provide their own GPP and the benchmarking fAPAR they used to evaluate predicted fAPAR, then they could simply calculate A0 as GPP/fAPAR (both in the same timestep).

With SPLASH included in pyrealm, I actually thought we already have precipitation input included? Of course, SPLASH requires daily precipitation inputs, while for estimating fAPARmax we only need annual precipitation. So there are two ways to do this: (1) the user provides precipitation data with a time axis that can be summed to annual totals (climate datasets rarely provide annual precipitation directly); (2) if the user adopted SPLASH to calculate soil moisture stress, then we could directly use the daily precipitation to calculate annual totals and apply it to the equation. But I guess that would depend on how SPLASH is incorporated into pyrealm.

It's a little bit more complicated when it comes to the growing season. I think we should open up the option of different methods to define the growing season, but use the temperature threshold as the simplest, default method. Using the temperature threshold only (above 0 degrees), timesteps up to monthly are actually acceptable, due to temperature oscillation at the beginning and end of the growing season. Currently there is no absolute consensus on how the growing season is defined (thermal, radiative or productive growing season; see Körner et al 2023 Ecology Letters and Fu et al 2022 Global Change Biology) or on methods to define it... so we could ask the user to provide the DOY of the start of the growing season (SOS) and the end of the growing season (EOS) to calculate growing season values for input variables. If users work on phenology then they would have such information; otherwise they could use our simplified way to define the productive growing season.

To conclude, I think it's reasonable to provide a universal method (or function) that applies across the different versions of the P Model with different timesteps. Thanks David for putting things together!

@davidorme
Collaborator Author

davidorme commented Dec 19, 2024

Thanks @Shirleycwj! There's a lot in there! Some thoughts:

I think it's reasonable to add output 'potential GPP (A0)' as the product of LUE and ppfd.

Interesting - my assumption so far has been that users explicitly provide a model that is set up to calculate potential GPP, but we could have a PModel.potential_gpp attribute that calculates this on demand as pmod.lue * pmod._ppfd.

@Shirleycwj - how does "potential GPP" work with kphio and soil moisture penalties? All of those reduce productivity below the maximum potential. A "true" potential GPP could be the productivity achieved by a well-watered plant operating in these conditions if it intercepts all the light (fAPAR = 1). However, most uses I've seen only pin fAPAR = 1 and then also apply a soil moisture penalty or modify kphio to reflect some kind of water limitation.

I'm inclined to keep it that users provide a model set up to reflect what they think represents potential GPP, rather than trying to assert that it is one particular product? We should document this, of course!
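If the on-demand attribute route were taken, it is a one-liner; this toy class just illustrates the idea, using the lue and _ppfd names mentioned above (the real PModel internals may differ):

```python
class ToyPModel:
    """Stand-in for PModel, sketching a possible potential_gpp property."""

    def __init__(self, lue, ppfd):
        self.lue = lue
        self._ppfd = ppfd

    @property
    def potential_gpp(self):
        """GPP that would be achieved with fAPAR fixed at 1."""
        return self.lue * self._ppfd
```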

If the user provide their own GPP and benchmarking fAPAR they used to evaluate predicted fAPAR, then they could simply calculate A0 as GPP/fAPAR (both should be in the same timestep).

This option is intended purely to allow people to apply the Stocker or Mengoli corrections to GPP (or any similar posthoc corrections down the line). So we wouldn't need fAPAR here - the original model would be set up to calculate A0 (see previous paragraph) and then the penalised GPP is summed. We could instead use gpp_penalty_factor = None as a default and then apply the penalty internally if it is provided - that is probably a clearer indication of how it works.

With SPLASH included in Pyrealm, I actually thought we already have precipitation input included?

Yes - absolutely. If someone has run SPLASH already (to get AET and PET and hence AI) then that does contain precipitation data and we could use that SPLASH model as an input. However we do also need to support models where the user hasn't run SPLASH and is just using e.g. monthly precipitation from CRU and an AI from the literature.

It's a little bit more complicated when it comes to the growing season.

It really, really, really is. I think the only sensible way forward here is to get users to provide a boolean array showing which observations along the time axis are "growing season" or not. We can provide tools to help with that - the tricky one is going from subdaily models. If we're going with a temperature threshold, is the reference the mean daily temperature? That's most consistent with what you'd do at coarser temporal scales, but you could also use the mean daily temperature during daylight hours, or even during the acclimation window. Probably start with the easy one though 😄
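Mapping a daily mean temperature threshold back onto subdaily observations could be sketched like this (the function name and details are hypothetical, and it assumes complete days along the first time axis):

```python
import numpy as np


def growing_season_from_subdaily(datetimes, tc, t_threshold=0.0):
    """Boolean growing-season mask at subdaily resolution.

    A day is in the growing season if its mean temperature exceeds the
    threshold; the daily flag is broadcast back to each subdaily observation.
    """
    days = datetimes.astype("datetime64[D]")
    mask = np.empty(len(days), dtype=bool)
    for day in np.unique(days):
        in_day = days == day
        mask[in_day] = tc[in_day].mean() > t_threshold
    return mask
```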

I've added that bit of the comment to the relevant issue here: #352 (comment)

@davidorme
Collaborator Author

davidorme commented Dec 20, 2024

I'm wondering if what we are after here is a class (or dataclass) that broadly looks like:

class FaparLimitation():

    def __init__(
        self, 
        annual_total_potential_gpp,
        annual_mean_ca,
        annual_mean_chi,
        annual_mean_vpd,
        annual_total_precip,
        aridity_index,
        z=12.227,
        k=0.5
    ):
        ...
   
    @classmethod
    def from_pmodel(
        cls,
        pmodel,
        growing_season,
        datetimes,
        precip, 
        aridity_index,
        z=12.227,
        k=0.5
    ):
        ...
        # do stuff to calculate inputs
        return cls('calculated inputs here')
        
    @classmethod
    def from_pmodel_and_splash(
        cls, 
        pmodel,
        growing_season,
        splash_model, 
        aridity_index,
        z=12.227,
        k=0.5
    ):
        ...
        # do stuff to calculate inputs
        return cls('calculated inputs here')

We could definitely merge those two class methods but it seems easier to keep them as separate interfaces and not have to do a bunch of logic on which combinations of arguments are not None. They'll probably have 90% the same code in shared private methods.

@Shirleycwj
Collaborator

Thanks @davidorme for the clarification. For the definition of potential GPP, I think we generally associate it with only one constraint, so that this constraint can be explicitly separated and quantified. For example, some literature describes potential GPP without N limitation so as to focus on the nutrient constraint. Here, because seasonal leaf area dynamics are what we are looking for, we explicitly define potential GPP as carbon uptake not constrained by leaf area (fAPAR or LAI), while vegetation is still realistically affected by temperature and water availability. This is the foundation of how we apply the energy- or water-limited scheme and how the energy- and water-limited regions can be defined; if everything were under well-watered conditions, then we wouldn't have water-limited regions.

I think applying a boolean array to define the growing season is a nice and clear approach. For the subdaily model, my personal experience is that daytime temperature is more accurate and realistic, although the improvement compared to daily temperature is overall marginal (it might be important at site level, so we could think about a function for calculating daytime temperature if possible).

On the wider audience, I agree that the two class methods should be separated, as people could experiment with different ways to calculate water limitation. But then they should add their own penalty factor for water limitation between the from_pmodel step and FaparLimitation, right?

@MarionBWeinzierl
Collaborator

I started a "dirty" implementation (skipping the pre-commit hooks to share it with you) on a branch here. This is very much work in progress and lots is TBD at the moment, but let me know if I am getting something completely wrong (e.g., if I guessed the data type of a parameter wrong).

I am now looking through @davidorme 's LAI_in_pyrealm draft code to see what I can use from there, and to understand how I get the missing parameters from Splash and P models.

Would it make sense to separate out the "data wrangling" code into its own file/class? And/or will each data set look completely different and need separate handling outside of pyrealm?

@davidorme
Collaborator Author

@MarionBWeinzierl I think that sketch looks great.

I think the data wrangling code has to be something users come up with. We'll provide a demo using the example data to show some example boilerplate. I think most people will set it up using the PModel interface, and we already have a fair bit of code showing how to do that.

The main complexity in this is juggling the time resolutions of the data and setting the growing season days. That's kinda hard to pin down, but I think we can get this code up and running and then refine the API for how people select what the "growing season conditions" actually mean.

@MarionBWeinzierl MarionBWeinzierl moved this from In Progress to Review in ICCS development board Feb 6, 2025