diff --git a/README.md b/README.md index a562a4a..7ffc9b1 100644 --- a/README.md +++ b/README.md @@ -1,55 +1,53 @@ ## pySODM *Simulating and Optimising Dynamical Models in Python 3* +Resources: [documentation](https://twallema.github.io/pySODM), [peer-reviewed paper](https://www.sciencedirect.com/science/article/pii/S1877750323002089), [pyPI](https://pypi.org/project/pySODM/) + ![build](https://github.com/twallema/pySODM/actions/workflows/tests.yml/badge.svg) ![docs](https://github.com/twallema/pySODM/actions/workflows/deploy.yml/badge.svg) [![HitCount](https://hits.dwyl.com/twallema/pySODM.svg)](https://hits.dwyl.com/twallema/pySODM) ### Quick installation ``` pip install pySODM ``` -### Resources - -Documentation: https://twallema.github.io/pySODM - -Manuscript: https://www.sciencedirect.com/science/article/pii/S1877750323002089 - -pyPI: https://pypi.org/project/pySODM/ ### Aim & Scope -A modeling and simulation workflow will typically constitute the following steps (see [Villaverde et. al](https://doi.org/10.1093/bib/bbab387)), -1. Build a system dynamical model to describe some real-world phenomenon. Assess structural identifiability. -2. Calibrate the model to a set of experimental data. -3. Extract knowledge from the calibrated parameter values (assess practical identifiability). -4. Use the model to make projections outside the calibrated range. +All the simulation projects I've undertaken over the past six years required me to do most of the following, +1. Integrate a system dynamical model +2. Its states may be represented by n-dimensional numpy arrays, labeled using coordinates +3. Its parameters may have time-dependencies +4. Its parameters may have to be sampled from distributions +5. It may have to be calibrate to a dataset(s) by defining and optimising a cost function -The aim of pySODM is to reduce the time it takes to step through this workflow. pySODM provides a *template* to construct, simulate and calibrate dynamical systems governed by differential equations. Models can have n-dimensional labeled states of different sizes and can be simulated deterministically and stochastically. Model parameters can be time-dependent by means of complex functions -with arbitrary inputs. The labeled n-dimensional model states can be aligned with n-dimensional -data to compute the posterior probability function, which can subsequently be optimised. +all these features required me to wrap code around an ODE solver, typically `scipy.solve_ivp`, and I got tired of recycling the same code over and over again, so I packaged it into pySODM. + +Does other simulation software exist in Python? Sure, but most of them hold your hand by having you define symbolic transitions, which places a limit on the attainable complexity of a model, making it unfit for academic research. I wanted a piece a software that nicely does all the nasty bookkeeping like keeping track of state sizes, time dependencies on parameters, aligning simulations with datasets etc. and does so in a **generically applicable** way so that I'd never hit a software wall mid-project. ### Overview of features | Workflow | Features | |------------------------------|---------------------------------------------------------------------------------------------------------------------------------| -| Construct a dynamical model | Implement coupled systems of differential equations | -| | States can be n-dimensional and of different sizes, allowing users to build models with subprocesses | -| | Allows n-dimensional model states to be labelled with coordinates and dimensions by using `xarray.Dataset}` to store simulation output | -| | Easy indexing, manipulating, saving, and piping to third-party software of model output by formatting simulation output as `xarray.Dataset` | -| Simulating the model | Continuous and discrete deterministic simulation or discrete stochastic simulation (Jump processes, Gillespie's Stochastic Simulation Algorithm or Tau-Leaping) | -| | Vary model parameters during the simulation generically using a complex function | -| | Use *draw functions* to perform repeated simulations for sensitivity analysis. With multiprocessing support | +| Construct a dynamical model | Implement systems of coupled differential equations | +| | Labeled n-dimensional model states, states can have different sizes | +| | Leverages `xarray.Dataset` to store labeled n-dimensional simulation output | +| Simulating the model | Deterministic (ODE) or stochastic simulation (Jump process) | +| | *Time-dependent model parameter functions* to vary parameters during the course of a simulation | +| | *Draw functions* to vary model parameters during consecutive simulations. | | Calibrate the model | Construct and maximize a posterior probability function | -| | Automatic alignment of data and model forecast over timesteps and coordinates | -| | Nelder-Mead Simplex and Particle Swarm Optimization for point estimation of model parameters | -| | Pipeline to and backend for `emcee.EnsembleSampler` to perform Bayesian inference of model parameters | +| | Automatically aligns data and model forecast | +| | Nelder-Mead Simplex and Particle Swarm Optimization | +| | Bayesian inference with `emcee.EnsembleSampler` | + ### Getting started -The [quistart tutorial](quickstart.md) will teach you the basics of building and simulating models with n-dimensional labeled states in pySODM. It will demonstrate how to vary model parameters over the course of a simulation and how to perform repeated simulations with sampling of model parameters. +- Detailed [installation instructions](installation.md). + +- The [quistart tutorial](quickstart.md) teaches the basics of building and simulating models with n-dimensional labeled states in pySODM. It demonstrates the use of *time-dependent parameter functions* (TDPFs) to vary model parameters over the course of a simulation and *draw functions* to vary model parameters during consecutive simulations. -The [workflow](worfklow.md) tutorial provides a step-by-step introduction to a modeling and simulation workflow by inferring the distributions of the model parameters of a simple compartmental disease model using a synthetic dataset. +- The [workflow](worfklow.md) tutorial provides a step-by-step introduction to building a mathematical model and calibrating its parameters to a dataset. An SIR disease model is built and the basic reproduction number during an outbreak is determined by calibrating the model to the outbreak data. -The [enzyme kinetics](enzyme_kinetics.md) and [influenza 17-18](influenza_1718.md) case studies apply the modeling and simulation workflow to more advanced, real-world problems. In the enzyme kinetics case study, a 1D packed-bed reactor model is implemented in pySODM by reducing the two PDEs to a set of coupled ODEs by using the method-of-lines. In the Influenza 17-18 case study, a stochastic, age-structured model for influenza is developped and calibrated to the Influenza incidence data reported by the Belgian Federal Institute of Public Health. These case studies mainly serve to demonstrate pySODM's capabilities across scientific disciplines and highlight the arbitrarily complex nature of the models that can be built with pySODM. For an academic description of pySODM and on the Enzyme Kinetics and Influenza 17-18 case studies, checkout our [manuscript](https://arxiv.org/abs/2301.10664). +- The [enzyme kinetics](enzyme_kinetics.md) and [influenza 17-18](influenza_1718.md) case studies apply the [workflow](workflow.md) to more advanced, real-world problems. In the enzyme kinetics case study, a 1D packed-bed reactor model is implemented in pySODM by reducing the PDEs to a set of coupled ODEs by using the method-of-lines. In the Influenza 17-18 case study, a stochastic, age-structured model for influenza is developped and calibrated to the Influenza incidence data reported by the Belgian Federal Institute of Public Health. These case studies mainly serve to demonstrate pySODM's capabilities across scientific disciplines and highlight the arbitrarily complex nature of the models that can be built with pySODM. For an academic exposee of pySODM, the Enzyme Kinetics and Influenza 17-18 case studies, checkout our [peer-reviewed paper](https://www.sciencedirect.com/science/article/pii/S1877750323002089). ### Versions diff --git a/docs/_static/figs/flowchart.png b/docs/_static/figs/flowchart.png new file mode 100644 index 0000000..4dd7da6 Binary files /dev/null and b/docs/_static/figs/flowchart.png differ diff --git a/docs/_static/figs/flowchart.svg b/docs/_static/figs/flowchart.svg new file mode 100644 index 0000000..3f5cff9 --- /dev/null +++ b/docs/_static/figs/flowchart.svg @@ -0,0 +1,945 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + CALIBRATION + CONSTRUCTION + pySODM + + SIMULATION + + + + + + + xarray.DataArray/xarray.Dataset + pd.Series/pd.DataFrame + datasets + + + scipy.integrate.solve_ivp() + + + pySODM.optimization.pso + optimization methods + + Gillespie's SSA/Tau-Leaping + + posterior probability + + pySODM.optimization +.objective_functions + + pySODM.models.base.ODEModel + pySODM.models.base.JumpProcess + pySODM.optimization.nelder_mead + + + prior probability + pySODM.optimization. +objective_functions + + + likelihood + pySODM.optimization. +objective_functions + + + Control Systems library + etc. + + + + + + + + + simulation output + xarray.Dataset + + dynamical system + solution methods + + emcee.EnsembleSampler + draw functions + time-dependent model parameters + third party software and applications + SAlib: sensitivity analysis + + parameter distributions + + + + + + + + + + + diff --git a/docs/introduction.md b/docs/introduction.md index 69bda48..01b92ee 100644 --- a/docs/introduction.md +++ b/docs/introduction.md @@ -2,53 +2,58 @@ *Simulating and Optimising Dynamical Models in Python 3* +Resources: [documentation](https://twallema.github.io/pySODM), [peer-reviewed paper](https://www.sciencedirect.com/science/article/pii/S1877750323002089), [pyPI](https://pypi.org/project/pySODM/) + ![build](https://github.com/twallema/pySODM/actions/workflows/tests.yml/badge.svg) ![docs](https://github.com/twallema/pySODM/actions/workflows/deploy.yml/badge.svg) -### Quick installation +### Quick installation + ``` pip install pySODM ``` -### Resources +### Aim & scope -1. [Documentation website](https://twallema.github.io/pySODM) +All the simulation projects I've undertaken over the past six years required me to do most of the following, +1. Integrate a system dynamical model +2. Its states may be represented by n-dimensional numpy arrays, labeled using coordinates +3. Its parameters may have time-dependencies +4. Its parameters may have to be sampled from distributions +5. It may have to be calibrate to a dataset(s) by defining and optimising a cost function -2. [Manuscript](https://www.sciencedirect.com/science/article/pii/S1877750323002089) +all these features required me to wrap code around an ODE solver, typically `scipy.solve_ivp`, and I got tired of recycling the same code over and over again, so I packaged it into pySODM. -3. [pyPI](https://pypi.org/project/pySODM/) +Does other simulation software exist in Python? Sure, but most of them hold your hand by having you define symbolic transitions, which places a limit on the attainable complexity of a model, making it unfit for academic research. I wanted a piece a software that nicely does all the nasty bookkeeping like keeping track of state sizes, time dependencies on parameters, aligning simulations with datasets etc. and does so in a **generically applicable** way. -### Aim & Scope +### Software structure -A modeling and simulation workflow will typically constitute the following steps (see [Villaverde et. al](https://doi.org/10.1093/bib/bbab387)), -1. Build a system dynamical model to describe some real-world phenomenon. Assess structural identifiability. -2. Calibrate the model to a set of experimental data. -3. Extract knowledge from the calibrated parameter values. Perform additional sensitivity analyses to finetune the model structure. -4. Use the model to make projections outside the calibrated range. +To achieve its goal, pySODM bundles a set of low-level interfaces to integrate sets of ODEs (scipy.integrate), simulate stochastic jump processes (Gillespie methods), store n-dimensional temporal data (`xarray.Dataset`), perform frequentist optimizations of model parameters using Particle Swarm Optimization or the Nelder–Mead Simplex algorithm, and, perform Bayesian inference of model parameters (emcee.EnsembleSampler), all of which were already available in Python 3. pySODM adds a generically applicable template to implement time-dependencies on model parameters and performing consecutive simulations with parameter sampling. -The aim of pySODM is to reduce the time it takes to step through this workflow. pySODM provides a *template* to construct, simulate and calibrate dynamical systems governed by differential equations. Models can have n-dimensional labeled states of different sizes and can be simulated deterministically and stochastically. Model parameters can be time-dependent by means of complex functions -with arbitrary inputs. The labeled n-dimensional model states can be aligned with n-dimensional -data to compute the posterior probability function, which can subsequently be optimised. +Flowchart of pySODM -### Overview of features +**Figure** Solid boxes depict third-party implementations incorporated in pySODM, while the dashed boxes depict implementations provided by pySODM. + +### Features | Workflow | Features | |------------------------------|---------------------------------------------------------------------------------------------------------------------------------| -| Construct a dynamical model | Implement coupled systems of differential equations | -| | States can be n-dimensional and of different sizes, allowing users to build models with subprocesses | -| | Allows n-dimensional model states to be labelled with coordinates and dimensions by using `xarray.Dataset}` to store simulation output | -| | Easy indexing, manipulating, saving, and piping to third-party software of model output by formatting simulation output as `xarray.Dataset` | -| Simulating the model | Continuous and discrete deterministic simulation or discrete stochastic simulation (Jump processes, Gillespie's Stochastic Simulation Algorithm or Tau-Leaping) | -| | Vary model parameters during the simulation generically using a complex function | -| | Use *draw functions* to perform repeated simulations for sensitivity analysis. With multiprocessing support | +| Construct a dynamical model | Implement systems of coupled differential equations | +| | Labeled n-dimensional model states, states can have different sizes | +| | Leverages `xarray.Dataset` to store labeled n-dimensional simulation output | +| Simulating the model | Deterministic (ODE) or stochastic simulation (Jump process) | +| | *Time-dependent model parameter functions* to vary parameters during the course of a simulation | +| | *Draw functions* to vary model parameters during consecutive simulations. | | Calibrate the model | Construct and maximize a posterior probability function | -| | Automatic alignment of data and model forecast over timesteps and coordinates | -| | Nelder-Mead Simplex and Particle Swarm Optimization for point estimation of model parameters | -| | Pipeline to and backend for `emcee.EnsembleSampler` to perform Bayesian inference of model parameters | +| | Automatically aligns data and model forecast | +| | Nelder-Mead Simplex and Particle Swarm Optimization | +| | Bayesian inference with `emcee.EnsembleSampler` | ### Getting started -The [quistart tutorial](quickstart.md) will teach you the basics of building and simulating models with n-dimensional labeled states in pySODM. It will demonstrate how to vary model parameters over the course of a simulation and how to perform repeated simulations with sampling of model parameters. +- Detailed [installation instructions](installation.md). + +- The [quistart tutorial](quickstart.md) teaches the basics of building and simulating models with n-dimensional labeled states in pySODM. It demonstrates the use of *time-dependent parameter functions* (TDPFs) to vary model parameters over the course of a simulation and *draw functions* to vary model parameters during consecutive simulations. -The [workflow](worfklow.md) tutorial provides a step-by-step introduction to a modeling and simulation workflow by inferring the distributions of the model parameters of a simple compartmental disease model using a synthetic dataset. +- The [workflow](worfklow.md) tutorial provides a step-by-step introduction to building a mathematical model and calibrating its parameters to a dataset. An SIR disease model is built and the basic reproduction number during an outbreak is determined by calibrating the model to the outbreak data. -The [enzyme kinetics](enzyme_kinetics.md) and [influenza 17-18](influenza_1718.md) case studies apply the modeling and simulation workflow to more advanced, real-world problems. In the enzyme kinetics case study, a 1D packed-bed reactor model is implemented in pySODM by reducing the PDEs to a set of coupled ODEs by using the method-of-lines. In the Influenza 17-18 case study, a stochastic, age-structured model for influenza is developped and calibrated to the Influenza incidence data reported by the Belgian Federal Institute of Public Health. These case studies mainly serve to demonstrate pySODM's capabilities across scientific disciplines and highlight the arbitrarily complex nature of the models that can be built with pySODM. For an academic description of pySODM and on the Enzyme Kinetics and Influenza 17-18 case studies, checkout our [manuscript](https://arxiv.org/abs/2301.10664). +- The [enzyme kinetics](enzyme_kinetics.md) and [influenza 17-18](influenza_1718.md) case studies apply the [workflow](workflow.md) to more advanced, real-world problems. In the enzyme kinetics case study, a 1D packed-bed reactor model is implemented in pySODM by reducing the PDEs to a set of coupled ODEs by using the method-of-lines. In the Influenza 17-18 case study, a stochastic, age-structured model for influenza is developped and calibrated to the Influenza incidence data reported by the Belgian Federal Institute of Public Health. These case studies mainly serve to demonstrate pySODM's capabilities across scientific disciplines and highlight the arbitrarily complex nature of the models that can be built with pySODM. For an academic exposee of pySODM, the Enzyme Kinetics and Influenza 17-18 case studies, checkout our [peer-reviewed paper](https://www.sciencedirect.com/science/article/pii/S1877750323002089). diff --git a/docs/quickstart.md b/docs/quickstart.md index c19edff..e82e5a5 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -2,7 +2,7 @@ ## Set up a dimensionless ODE model -Load pySODM's [`ODE` class](models.md) and define your model. As an example, we'll set up a simple Susceptible-Infectious-Removed (SIR) disease model, schematically represented as follows, +To set up a simple Susceptible-Infectious-Removed (SIR) disease model, schematically represented as follows, @@ -16,6 +16,8 @@ N &=& S + I + R, \\ \end{eqnarray} ``` +Load pySODM's [`ODE`](models.md) class, and define a new class inheriting the `ODE` object to define your model. Minimally, you must provide: 1) A list containing the names of your model's states, 2) a list containing the names of your model's parameters, 3) an `integrate` function integrating your model's states. To learn more about the `ODE` class' formatting, check out the [`ODE`](models.md) class description. + ```python # Import the ODE class from pySODM.models.base import ODE @@ -45,7 +47,7 @@ To initialize the model, provide a dictionary containing the initial values of t model = SIR(states={'S': 1000, 'I': 1}, parameters={'beta': 0.35, 'gamma': 5}) ``` -Simulate the model using the `sim()` method. The solver method and tolerance of `scipy.solve_ivp()`, as well as the timesteps included in the output can be tailored. pySODM supports the use of `datetime` types as timesteps. +Simulate the model using its `sim()` method. pySODM supports the use of `datetime` types as timesteps. ```python # Timesteps @@ -63,12 +65,9 @@ In some situations, the use of a discrete timestep with a fixed length may be pr ```python # Use a discrete timestepper with step size 1 out = model.sim(121, tau=1) - -# Is equivalent to (`tau` overwrites `method`!): -out = model.sim(121, method='RK45', rtol='1e-4', tau=1) ``` -Results are sent to an `xarray.Dataset`, for more information on indexing and selecting data using `xarray`, [see](https://docs.xarray.dev/en/stable/user-guide/indexing.html). +Simulations are sent to an `xarray.Dataset`, for more information on indexing and selecting data using `xarray`, [see](https://docs.xarray.dev/en/stable/user-guide/indexing.html). ```bash Dimensions: (time: 122) @@ -81,11 +80,12 @@ Data variables: ``` The simulation above results in the following trajectories for the number of susceptibles, infected and recovered individuals. + ![SIR_time](/_static/figs/quickstart/quickstart_SIR.png) ## Set up an ODE model with a labeled dimension -To transform all our SIR model's states in 1D vectors, referred to as a *dimension* throughout the documentation, add the `dimensions` keyword to the model declaration. Here, we add a dimension representing the age groups individuals belong to. +To transform all our SIR model's states in 1D vectors, referred to as a *dimension* throughout the documentation, add the `dimensions` keyword to the model declaration. In the example below, we add a dimension representing the age groups individuals belong to. This turns all model states in the `integrate` function into 1D vectors. ```python # Import the ODE class @@ -95,7 +95,7 @@ from pySODM.models.base import ODE class stratified_SIR(ODE): states = ['S','I','R'] - parameters = ['beta', 'gamma] + parameters = ['beta', 'gamma'] dimensions = ['age_groups'] @staticmethod @@ -111,7 +111,7 @@ class stratified_SIR(ODE): return dS, dI, dR ``` -When initializing the model, additionally provide a dictionary containing coordinates for every dimension declared previously. In our example, we'll declare four age groups: 0-5, 5-15, 15-65 and 65-120 year olds. **All** model states are now 1D vectors of shape `(4,)`. +When initializing the model, provide a dictionary containing coordinates for every dimension declared previously. In the example below, we'll declare four age groups: 0-5, 5-15, 15-65 and 65-120 year olds. **All** model states are now 1D vectors of shape `(4,)`. ```python model = stratified_SIR(states={'S': 1000*np.ones(4), 'I': np.ones(4)}, @@ -139,6 +139,8 @@ pySODM allows model states to have different coordinates and thus different size +In the example below, the states S, I and R will be 1D vectors in the `integrate` function, while the states S_v and I_v will be scalars. + ```python class ODE_SIR_SI(ODE): """ @@ -182,7 +184,7 @@ out = model.sim(120) print(out) ``` -results in the following `xarray.Dataset`. Here it can be seen that the S, I and R state have `age_group` and `time` as dimensions and their shape is thus `(4,)`, while S_v and I_v only have `time` as dimension and their shape is thus `(1,)`. +results in the following `xarray.Dataset`, ```bash @@ -197,10 +199,11 @@ Data variables: S_v (time) float64 1e+06 1e+06 1e+06 ... 8.635e+05 8.562e+05 I_v (time) float64 2.0 1.798 1.725 ... 1.292e+05 1.365e+05 1.438e+05 ``` +Here, S, I and R state have `age_group` and `time` as dimensions and their shape is thus `(4,)`, while S_v and I_v only have `time` as dimension and their shape is thus `(1,)`. ## Set up a dimensionless stochastic jump process Model -To stochastically simulate the simple SIR model, the `JumpProcess` class is loaded and two functions `compute_rates` and `apply_transitionings` must be defined. For a detailed mathematical description of implenting models using Gillespie's tau-leaping method (example of a jump proces), we refer to the [manuscript](https://arxiv.org/abs/2301.10664). The rates dictionary defined in `compute_rates` contains the rates of the possible transitionings in the system. These are contained in a list because a state may have multiple possible transitionings. Further, the transitioning for state `I` is a float and this should be a `numpy.float`. I'm still figuring out if I want to introduce the overhead to check and correct this behavior (likely not). +To stochastically simulate the simple SIR model, pySODM's `JumpProcess` class is loaded and two functions, `compute_rates` and `apply_transitionings`, must be defined instead of one. For a detailed mathematical description of implenting models using Gillespie's tau-leaping method (example of a jump proces), we refer to the [peer-reviewed paper](https://www.sciencedirect.com/science/article/pii/S1877750323002089). The rates dictionary defined in `compute_rates` contains the rates of the possible transitionings in the system. These are contained within a list so that a state may have multiple transitionings. To learn more about the `JumpProcess` class' formatting, check out the [`JumpProcess`](models.md) class description. ```python # Import the ODE class @@ -238,7 +241,7 @@ class SIR(JumpProcess): ### Draw functions -The simulation functions of the `ODE` and `JumpProcess` classes (`sim()`) can be used to perform {math}`N` repeated simulations by using the optional argument `N`. A *draw function* can be used to instruct the model on how to alter the value of a model parameters during consecutive model runs, thereby offering a powerful tool for sensitivity analysis and uncertainty propagation. +The simulation functions of the `ODE` and `JumpProcess` classes can be used to perform {math}`N` repeated simulations by using the optional argument `N`. A *draw function* can be used to instruct the model on how to alter the value of a model parameters during consecutive model runs, thereby offering a powerful tool for sensitivity analysis and uncertainty propagation. Draw functions **always** take the dictionary of model parameters, `parameters` as their first argument, input checks are used to ensure you provide the correct name and type. This can be followed an arbitrary number of user-defined parameters, which are supplied to the `sim()` function by using its `draw_function_kwargs` argument. The output of a draw function is **always** the dictionary of model parameters, without alterations to the dictionaries keys (no new parameters introduced or parameters deleted). In the example below, we attempt to draw a model parameter `gamma` randomly from 0 to 5, @@ -293,7 +296,7 @@ Parameters can also be varied through the course of one simulation using a *time ```python def vary_my_parameter(t, states, param, an_additional_parameter): - """A function to vary a model parameter during a simulation""" + """A function to implement a time-dependency on a parameter""" # Do any computation param = ... @@ -305,6 +308,6 @@ When initialising the model, all we need to do is use the `time_dependent_parame ```python model = SIR(states={'S': 1000, 'I': 1}, - parameters={'beta': 0.35, 'gamma': 5, 'an_additional_parameter': anything_you_want}, + parameters={'beta': 0.35, 'gamma': 5, 'an_additional_parameter': any_datatype_you_want}, time_dependent_parameters={'beta': vary_my_parameter}) ``` \ No newline at end of file diff --git a/src/pySODM/models/base.py b/src/pySODM/models/base.py index 9fc8f07..48bdade 100644 --- a/src/pySODM/models/base.py +++ b/src/pySODM/models/base.py @@ -19,7 +19,7 @@ class JumpProcess: """ - Initialise a stochastic differential equations model + Initialise a jump process solved using Gillespie's SSA or Tau-Leaping Parameters ----------