From 0b91cc20da240b06ea7ff24a11d9fecc116b124a Mon Sep 17 00:00:00 2001 From: jamesaazam Date: Thu, 21 Mar 2024 19:17:57 +0000 Subject: [PATCH 01/20] Add introduction and placeholder for figure --- vignettes/epichains.Rmd | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 8c1a716b..58d6f88f 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -29,6 +29,22 @@ knitr::opts_chunk$set( ) ``` +Epidemics spread through when infected individuals transmit the +infection to others. If we assume that each case infects a random number of +other individuals (the offspring) through time units called generations, +and governed by some reproduction rate, then the epidemic can be modelled +as a branching process. + +Branching processes are a stochastic process where each individual in a +generation gives rise to a random number of individuals in the next generation. +The distribution of the number of offspring is called the offspring +distribution. The size of the transmission chain is the total number of +individuals infected by a single case, and the length of the transmission +chain is the number of generations from the first case to the last case before +the epidemic died out. + +
+ _epichains_ provides methods to analyse and simulate the size and length of branching processes with an arbitrary offspring distribution. These can be used, for example, to analyse the distribution of chain sizes @@ -39,6 +55,28 @@ or length of infectious disease outbreaks, as discussed in @farrington2003 and library("epichains") ``` +:: {.alert .alert-primary} +## Use case {-} +Suppose we observe a series of small outbreak clusters, each arising from a +separate external introduction (e.g. spillover event or importation into the +region). What are the likely transmission properties of the infection that +generated these clusters? +::: + +::: {.alert .alert-secondary} +### What we have {-} + +1. Data on the sizes or lengths of the observed transmission chains, +2. An offspring distribution with known parameters, + +### What we assume {-} + +1. The epidemic was perfectly observed. +1. The exact time of infection is known for all the cases. Hence, there is +no reporting delay. +1. Population is homogeneous and well-mixed. +::: + ## Chain likelihoods ### [`likelihood()`](https://epiverse-trace.github.io/epichains/reference/likelihood.html) From 3e45f94af571dbfe68fa181d25201a60b8787e69 Mon Sep 17 00:00:00 2001 From: jamesaazam Date: Fri, 22 Mar 2024 13:00:37 +0000 Subject: [PATCH 02/20] Revise Getting Started vignette --- vignettes/epichains.Rmd | 221 ++++++++++++------- vignettes/img/transmission_chain_example.png | Bin 0 -> 16192 bytes 2 files changed, 143 insertions(+), 78 deletions(-) create mode 100644 vignettes/img/transmission_chain_example.png diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 58d6f88f..415d37f3 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -29,21 +29,37 @@ knitr::opts_chunk$set( ) ``` +This vignette demonstrates how get started with using the _epichains_ package +for simulating transmission chains or estimating the likelihood of observing +a transmission chain. + +::: {.alert .alert-info} +* To understand the theoretical background of the branching processes methods +used here, refer to the [theory vignette](https://epiverse-trace.github.io/epichains/articles/theoretical_background.html) +* To understand the software development design decisions and implementation details of the package, +refer to the [design vignette](https://epiverse-trace.github.io/epichains/articles/design-principles.html) +::: + Epidemics spread through when infected individuals transmit the infection to others. If we assume that each case infects a random number of other individuals (the offspring) through time units called generations, -and governed by some reproduction rate, then the epidemic can be modelled -as a branching process. +and that this transmission process is governed by a reproduction number and +a probability distribution describing how the offspring are produced through +the time unit, then the epidemic can be modelled as a branching process. Branching processes are a stochastic process where each individual in a generation gives rise to a random number of individuals in the next generation. The distribution of the number of offspring is called the offspring -distribution. The size of the transmission chain is the total number of +distribution. + +The size of the transmission chain is the total number of individuals infected by a single case, and the length of the transmission -chain is the number of generations from the first case to the last case before -the epidemic died out. +chain is the number of generations from the first case to the last case they +produced before the chain ended (See figure below). -
+```{r transmission_chain_fig, out.width = "75%", echo = FALSE, fig.cap = "An example of a transmission chain starting is a single case C1. Cases are represented by blue circles and arrows indicate who infected whom. The chain grows through generations G1, G2, and G3, producing cases C2, C3, C4, C5, C6, and C7. The chain ends at generation G3 with case C7. The size of C1's chain is 7 (sum of all blue circles) and the length is 3 (maximum number of generations reached by C1's chain).", fig.alt = "The figure shows an example of a transmission chain starting with a single case C1. The figure is depicted as blue circles linked by orange directed arrows showing the cases they produced. Each generation is marked by a brown rectangle and labelled Gen 1, Gen 2, and Gen 2. The chain grows through generations G1, G2, and G3. Case C1 is Gen 1 and produces cases C2, and C3 in generation 2. In Gen 3, C2 produces C4 and C5, and C3 produces C7. The chain ends in Gen 3. The chain size of C1 is 7 (that is the sum of all the blue circles, representing cases) and the length is 3 (maximum number of generations reached by C1's chain)."} +knitr::include_graphics(file.path("img", "transmission_chain_example.png")) +``` _epichains_ provides methods to analyse and simulate the size and length of branching processes with an arbitrary offspring distribution. These @@ -55,58 +71,70 @@ or length of infectious disease outbreaks, as discussed in @farrington2003 and library("epichains") ``` +## Transmission chains likelihoods + :: {.alert .alert-primary} -## Use case {-} -Suppose we observe a series of small outbreak clusters, each arising from a -separate external introduction (e.g. spillover event or importation into the -region). What are the likely transmission properties of the infection that +### Use case {-} +Suppose we have observed a number of outbreak clusters, each arising from a +separate external case. What are the likely transmission properties +(reproduction number and/or superspreading coefficient) of the infection that generated these clusters? + +The first step in answering this question is to calculate the likelihood of +observing the observed chain summaries given the transmission properties. This +is where the `likelihood()` function comes in. The returned estimate can then +be used to infer the transmission properties using estimation frameworks such +as maximum likelihood or Bayesian inference. _epichains_ does not provide +these estimation frameworks. ::: ::: {.alert .alert-secondary} -### What we have {-} +#### What we have {-} -1. Data on the sizes or lengths of the observed transmission chains, -2. An offspring distribution with known parameters, +1. Data on observed transmission chains summaries (sizes or lengths). -### What we assume {-} +#### What we assume {-} -1. The epidemic was perfectly observed. -1. The exact time of infection is known for all the cases. Hence, there is -no reporting delay. -1. Population is homogeneous and well-mixed. +1. An offspring distribution that governs the number of secondary cases +produced by each case. +2. A reproduction number that governs the average number of secondary cases +produced by each case. +3. An observation process/probability that generates the observed chain +summaries. +4. A threshold of chain summaries that are considered too large for the given +transmission properties. For example, we do not expect each case to produce +more than $N$ cases or last more than $T$ generations. ::: -## Chain likelihoods - ### [`likelihood()`](https://epiverse-trace.github.io/epichains/reference/likelihood.html) -This function calculates the likelihood/loglikelihood of observing a vector of outbreak summaries obtained from transmission chains. "Outbreak summaries" here refer to transmission chain sizes or lengths/durations. +This function calculates the likelihood/loglikelihood of observing a vector of +outbreak summaries obtained from transmission chains. "Outbreak summaries" here +refer to transmission chain sizes or lengths/durations. `likelihood()` requires a vector of chain summaries (sizes or lengths), `chains`, the corresponding statistic to calculate, `statistic`, the offspring -distribution, `offspring_dist` and its associated parameters. `offspring_dist` -is specified as the function that is used to generate random numbers, i.e. -`rpois` for Poisson, `rnbinom` for negative binomial, or a custom function. If no -the closed-form solution is available then simulated replicates are used to -estimate the likelihoods. In that case `likelihood()` also requires -`nsim_offspring`, which is the number of simulations to run . This argument will -be explained further in the next section. +distribution, `offspring_dist` and its associated parameters. + +`offspring_dist` is specified as the function that is used to generate random +numbers, i.e. `rpois` for Poisson, `rnbinom` for negative binomial, or a custom +function. By default, the result is a log-likelihood but if `log = FALSE`, then -likelihoods are returned. +likelihoods are returned (See `?likelihood` for more details). + +::: {.alert .alert-info} +To understand how `likelihood()` works see the section on [How `likelihood()` works](#how-likelihood-works). +::: Let's look at the following example where we estimate the log-likelihood of observing `chain_sizes`. -```{r chain_sizes_setup} +```{r estimate_likelihoods} set.seed(121) # example of observed chain sizes # randomly generate 20 chains of size between 1 to 10 chain_sizes <- sample(1:10, 20, replace = TRUE) chain_sizes -``` - -```{r estimate_likelihoods} # estimate loglikelihood of the observed chain sizes likelihood_eg <- likelihood( chains = chain_sizes, @@ -124,15 +152,13 @@ likelihood_eg `likelihood()`, by default, returns the joint log-likelihood. If instead, the individual log-likelihoods are required, then the `individual` argument must be set to `TRUE`. To return likelihoods instead, set `log = FALSE`. -```{r chains_setup2} +```{r estimate_likelihoods2} set.seed(121) # example of observed chain sizes # randomly generate 20 chains of size between 1 to 10 chain_sizes <- sample(1:10, 20, replace = TRUE) chain_sizes -``` -```{r estimate_likelihoods2} # estimate loglikelihood of the observed chain sizes likelihood_ind_eg <- likelihood( chains = chain_sizes, @@ -146,42 +172,6 @@ likelihood_ind_eg <- likelihood( likelihood_ind_eg ``` -### How `likelihood()` works - -*epichains* ships with functions for the analytical solutions of some -transmission chain "size" and "length" distributions. For the size -distributions, we provide the poisson, negative binomial, and gamma-borel -mixture. For the length distribution, we provide the poisson and geometric -distributions. These can be used with `likelihood()` based on what is specified -for `offspring_dist` and `statistic`. - -If an analytical solution does not exist, we provide `offspring_ll()`, which -uses simulations to approximate the probability distributions -([using a linear approximation to the cumulative -distribution](https://en.wikipedia.org/wiki/Empirical_distribution_function) -for unobserved sizes/lengths). In this case, an extra argument `nsim_offspring` -must be passed to `likelihood()` to specify the number of simulations to be -used for this approximation. - -For example, let's look at an example where `chain_sizes` is observed and we -want to calculate the likelihood of this being drawn from a binomial -distribution with probability `prob = 0.9`. -```{r likelihoods_by_simulation} -set.seed(121) -# example of observed chain sizes; randomly generate 20 chains of size 1 to 10 -chain_sizes <- sample(1:10, 20, replace = TRUE) -# get their likelihood -liks <- likelihood( - chains = chain_sizes, - offspring_dist = rbinom, - statistic = "size", - size = 1, - prob = 0.9, - nsim_offspring = 250 -) -liks -``` - ### Observation probability `likelihood()` uses the argument `obs_prob` to model the observation @@ -218,12 +208,80 @@ liks This returns `10` likelihood values (because `nsim_obs = 10`), which can be averaged to come up with an overall likelihood estimate. -To find out about the usage of the `likelihood()` function, you can run -`?likelihood` to access its `R` help file. +### How `likelihood()` works {#how-likelihood-works} + +`likelihood()` first checks if an analytical solution of the likelihood exists +for the given offspring distribution and chain statistic specified. If there's +none, simulations are used to estimate the likelihoods. + +The _epichains_ package includes closed-form (analytical) solutions for +calculating the likelihoods associated with certain summaries of transmission +chains ("size" or "length") for specific offspring distributions. + +For the size distributions, the package provides the poisson, negative binomial, +and gamma-borel mixture. For the length distribution, there's the poisson and +geometric distributions. These can be used with `likelihood()` based on what is +specified for `offspring_dist` and `statistic`. + +If an analytical solution does not exist, an internal simulation function, +`.offspring_ll()` is employed. It uses simulations to approximate the probability +distributions ([using a linear approximation to the cumulative +distribution](https://en.wikipedia.org/wiki/Empirical_distribution_function) +for unobserved sizes/lengths). If simulation is to be used, an extra argument +`nsim_offspring` must be passed to `likelihood()` to specify the number of +simulations to be used for this approximation. + +For example, let's look at an example where `chain_sizes` is observed and we +want to calculate the likelihood of this being drawn from a binomial +distribution with probability `prob = 0.9`. +```{r likelihoods_by_simulation} +set.seed(121) +# example of observed chain sizes; randomly generate 20 chains of size 1 to 10 +chain_sizes <- sample(1:10, 20, replace = TRUE) +# get their likelihood +liks <- likelihood( + chains = chain_sizes, + offspring_dist = rbinom, + statistic = "size", + size = 1, + prob = 0.9, + nsim_offspring = 250 +) +liks +``` ## Transmission chains simulation -There are two simulation functions, herein referred to collectively as the `simulate_*()` functions. +:: {.alert .alert-primary} +### Use case {-} +We want to simulate a scenario where a number of outbreak clusters are each +produced by a separate external case. We want to simulate the chains of +transmission that result from these cases, given a specific offspring +distribution and a reproduction number. +::: + +::: {.alert .alert-secondary} +#### What we have {-} + +1. An initial number of cases that seed the outbreak. + +#### What we assume {-} + +1. An offspring distribution that governs the number of secondary cases +produced by each case. +2. A reproduction number that governs the average number of secondary cases +produced by each case. +3. An observation process/probability that generates the observed chain +summaries. +4. A threshold of chain summaries that are considered too large for the given +transmission properties. For example, we do not expect each case to produce +more than $N$ cases or last more than $T$ generations. +5. A generation time distribution that governs the time between successive +infections. +::: + +There are two simulation functions, herein referred to collectively as the +`simulate_*()` functions that can help us achieve this. ### [`simulate_chains()`](https://epiverse-trace.github.io/epichains/reference/simulate_chains.html) @@ -258,10 +316,12 @@ sim_chains <- simulate_chains( head(sim_chains) ``` +::: {.alert .alert-info} Beyond the defaults, `simulate_chains()` can also simulate outbreaks based on a specified population size and pre-existing immunity until the susceptible pool runs out. - +::: + Here is a quick example where we simulate an outbreak in a population of size $1000$ with $10$ index cases and $10/%$ pre-existing immunity. We assume individuals have a poisson offspring distribution with mean, @@ -290,10 +350,12 @@ head(sim_chains_with_pop) ### [`simulate_summary()`](https://epiverse-trace.github.io/epichains/reference/simulate_summary.html) +::: {.alert .alert-primary} `simulate_summary()` is a performant version of `simulate_chains()` that does not track information on each infector and infectee. It returns the eventual size or length/duration of each transmission chain. This function is especially useful for calculating numerical likelihoods in `likelihood()`. +::: Here is an example to simulate the previous examples without intervention, returning the size of each of the $10$ chains. It assumes a poisson offspring @@ -313,7 +375,7 @@ simulate_summary_eg <- simulate_summary( simulate_summary_eg ``` -## Other functionalities +## S3 Methods ### Summarising @@ -353,13 +415,16 @@ simulate_summary_eg <- simulate_summary( summary(simulate_summary_eg) ``` +::: {.alert .alert-danger} This summary is the same as the output of `simulate_summary()` if the same inputs are used. `simulate_summary()` is a more performant version of `simulate_chains()`, hence, you can use it instead, if you are only interested in the summary of the simulated chains without details about the infection tree. +::: -We can confirm if the two outputs are the same using `base::setequal()` +We can confirm if the two outputs are the same using `base::setequal()`, which +checks if two objects are equal and returns `TRUE/FALSE`. ```{r compare_sim_funcs, include=TRUE,echo=TRUE} setequal( diff --git a/vignettes/img/transmission_chain_example.png b/vignettes/img/transmission_chain_example.png new file mode 100644 index 0000000000000000000000000000000000000000..728a25268cf37d071b930fdc9124419462eca34c GIT binary patch literal 16192 zcmeHuc{tTw*FO<5DGjG@f)SRsVWGp7uh zj(Pm;qwenee&6SL-uJn#-}~QF*O9Zo>$~>eYp=c5`mD8jqNSliModqPgM&k+dPC_B z4h}8~e4Y^zfDuk#1rzX3gQt$Z^Bt%gtAmrh1=0r2>g?eFXN9{XEpTw$hi$dw$?+}{ zRqsEQk*Cw~4jUP}d?=q{O08}+qH2D9kjdYnCOO<;@qPsB8_NO-l>nVLqiUAU@2IYR z3dTFF-IGk=nGz8SCFQep&5TNs<<7LY_T##V_?E(X#8*CcSA~`AMHGEM5`XKIdH@>M zV)WiS%)!VpXzz@4@UrdkXo&5K+7@{%A7#3-Ts7_nJ8maFcd?H-8H3oX7r0-uOP~9B z-6OIWlQ&Q=-)4IVb{2qCRMb*cRQzKzU<-*pFQjiY$uYK=YQJVXPZbg@Zl0~-dYh{4 zULcze(R+nl^2;&4@{x}(wY1!J;1S;`(syxi{Mz4eml9uAE|Dpb+W1xTF-IVc05jwE z%vep^n53MgPOODX3$A)ur?cQi3skPz_w(|k7oXwESwst5QN6koR_KMl<4HSXaP;kn z(Z_1sOiu1nm{8NZCt|?}QQH8FjN_vy^QfIf_r_NwXJws?*LFftJ#c@-^$s=9WN`&< z8cF^6`?`W6)5z)UocWi0=F++n_Ksh#iF~cR;s+HkEaXNdp*bV;o6lYf*3weEaDFhj zCnwVEqWbLUwgckwCZf z)zu&{ds|+pxxE>j*WK0uC>;)tl&rf06lM)~W;KH&kap5+D^+!DtVnZdHa!t_esu>$ zxFzz2rxRS;Q$q*lX$_MwXOop7mU4#x2DWf#D66}zjU5W&F3onr7Xp6A4)d|Go{%_O zOS9>#Yq2WYJHc6nd4+lTd6eCeu7Yeb#H>S+whrp*zuDWNen|nygU=o6z$d`V&u44P_t!U2&dRQU$*%?d!#7Ym;EM9y zfurnQoM3QeSGb)s`(Ih8sH$uI=78N20%_}TvMC_EzfEQi`%UNI;$(9|V-Dkk+rVuB zQxsUAz~AVdkrsan>u+nrJ~`RWUl#(V`_2Dv)PL;#gcwk%t3#CRVJ_H7Rh6XKu3Hm1Yy<4HR;iz7eglaD^Lv~O{k&4zaL1qDetxw=n0R?r1|2%NO z>*phwCDaZ92j=6KzWyBevSe21*{?8Z*%xdXkfwrmw$d8#{cCMtgQc;h5<+fketJQaB<=j>A)bNv#Po>;deY5QW1_>nF3WD99A4vC3zk9 z;U8l@MlW4zkJo609;8R4vj$gQV2!p=5k+ws+6#VhRi4-JD$R419xJDri=TJ>mWiy$ zN?;*W921qaDYdF|F?RgXn&7J3p>6n;?(%ES*CDBQ%%;hRgc4UL~MaB!p1zcYYjYOr-t zT(qkia%FJaczQR7zu$jB&ZYaZ3*~5ZT}SYhXUtYl zXm5QyZ~wsbK={x$N1C~A?CW0FaSMz~#=JE}Hg$5+O{YTI6FvBroLN7jJvU+fj4OVT z>wX>G{=~{#+X*pGN*i6mbEf-*?XujHUzn+?%QAu;FRa{gnth~;=GnZrOuNJ<`MJyQ zrQBEPfpgUOk=M7rEz2GRq&Kw<`mA!sY<*W4udAI~t2>AcC#HyScBC%3Lx{3fAhkbO z;T_+ZFn&c^8C&POe*Sz`PKCXa-w~!Q#&@N3{J>I-L5A;(P;0Bpj ztXRVp)0k5Z-$b6^pT1y9LUP8UB|t2>w&TUQEV|8u3C_Z7Go53T1L?Zp#p6Z$CHJjw zOZ!`N6_Ozz#|%2)MO5Cmjau<;IjO2IC=Q_RaUNq**m3Dn)_50Xj$@J~MDqgZt@Jm$ z>A<;#vvUR;6Nce*3kWMwJPch(^$CuAs<)DrNie==!{;~nlA^x5R=}#{UEjEUI4@{VUmrX5$3-;xP&7Oahr^8aP+R+W~jX;eKTsLsn)(7SAQqG%K&^dkC$rg z!>P3QQ%w^n$>eNe!fg3|Epz`{~y=4~WvIXlkH8$=n7CqED!yLdhcQ!-YC7 zjp{EbG}!7Idlgz~1RXycz#BHpXGr3~vrY1NJ7li1nb|0kvMF`qXmgsxYiT^NDUaew{ShuPcogqyq4;>fWCR@>L-TRR@dhbJ!7r9gWr1*H}h)P4liiRnA z!F{H1$yA1SHi%Yb4K3@2**;(!PfW~PC<`vl3&=X!qrzDmlf>+YUN3xSsz#d`R;xhq zNPQvItxUuFP+dakc#q0XlgyC#CeF@wgJb5HXr^g_D>(^#ECY67Fqdn;Jxs{2$u%rd zyd}LV2$zx?AJ5a3V{v@($vk7LkAVbMy*>_5u)?)cdL>n@WD}3-*t7jZ@9cN7O1s(U z5e9`N&*px~z?og29^z4Bl>5*E@J1faiCA z*!G-`wzhUJjo7>8(uA)44mQF%$%dj&R@3RckVDqK?e%QytSx79lA2To(Qg$M5aU>2 z+9)Sow0^9uZi>x;Cmh$LI{I8F8E>4%>S9l*5jr9}WJ9e;yx;%bT_$`<#!JpCjFcma zUZ<6nxaEMI#acP4z>45t&JN`hJNvhRR|TRzgqELCsP`?(F045Ly4@CS7F1L(zGjI2 zvdn*2$`c+~Q~i5j%f=SJ$eGD9 zUfD*zm)mR;f_>hbsUFWq18AMQq^?{cl*^~-5f0I>B^T~wD@{ex6nSFlm6W(ps9xa^ zcVxVBo(W;R1(jhppsK-Av6sZT$bn}A|g&5ZTFLF6$K8JAgo>N&fSXUMU02U zy<`->QB`~Or3!fymCNa{Rg*~@!sVM%bIR^FG2iv$Z}l&WYy>~CXk|yLRWG^%g?>X~ z+{<06|3ouUDrve{n~{M(YO+>LY!{JBqT7R1ffN}le!RD%AXm^Q0*(@w4|bbIkDR&V z=!4F+JE(;7vp5Yt_L*!VF+Nmpj3nIo`RXFhJB-Ky?IDS`C@L7jd{IrQd+_!Qi`EeA zgNL!+X1#DDTM$3$E7MC~)FVW4Vh%s8z`4!#{3SQRYHXEjm^hVP}J$ld1Pe=33t?za=w0`^9?4eDUjp?`hVA z*@H8$%QFqQ5#9Y_Z)F9eKH(zsT^Sligywz36S`!)yEaA??di`NYITj?i{>`ga2$H0 zOXkGy`JDrvrC;h@p^K)C80L1W>=uc)yc)wn24CR_7e!ieG@m)lfPTA}ooy$dN}D%5 zU1%~(GC-5x!S^yTF|qd?PkK~^en9(N`fPo}^?-vfmOJkB-usL;$)|#n1Ao-{uWlhs zm*Zu$4oDee@y@)Ms7nU2)^?g?<&C)o<#kGXV`|65rC0dcc;Y1iUvG&}we0CVmmtNq zXLlv)Ta2eP-I6Pe%vJP#Se&g0p%>roGqB|DKcdz(FBI8y?I|))x|C`&S{=Cl!Hb@> zuxxHlE)*x?VDG`}nkFJbqEKD*_fzorJtm?VNdW zk#k3oMWydNS;427$_pv7)$1JtDaVq|UZ)5SKJOeS)ICTxp42o9Jzn8NvT|k=+~5A} z9a!-pFf+Nr&`7%TYI^`RA@>|$fO7zxaf|)EUtS74>q+R z)4Cn$DeBPl*|mJ^uDcX7BRQ(T-^GTe=abQ?$4yi9^Q5?Lue;>z^%AU%8|&x!b*J5o zPm|aM7N1)k^`fLUU)rc%i5HR;+bbrp8#I2>d%Qt2^{Nd8I#oToaTDyD2Q=dvDf@o2p(r zUd7wY54_8i08DM0MNxNdo&-j}{D?AYAXCh}B4oJftvja#@yHg#kLYM(55H@XkyS!& zUuTF>FG%iv_mn&iyjxBpn&&TFb>JE$OBUa1adu>OSu*<4+qI`b# ztW!=x>?X?hVdirlIx-^SgMNAJb27I{iQ=|Gw@#Ef#YKi?R#eaQ193Pb)+;IWa$hD{ zLu-^?%A)%b=XtO-Z>9+_);mj0+rX|!ppyMwL{Ma&i+J9+mJJn8FW@InSp3sM6G_H~ zR5_6On5BEzSlp9S%il>q2MPHQNaAR(MY}AhyZnv71Mf#vYq=3Gxbz$d42|nbM^@fD zy0R+bETKn0AE(dl3t8=G*CsPg_r;4j#S9ovJ-nJj_ne)x+ZuwoF_Z}tsI6Ijl( z@RQ`dHZ>ZZiKJO8?iUVG7>#zpMj)pRH`>onmlj|-ayGY|dWWZD(-5IeO6=J0-)Ll0 zW>xfsIzYk5MT_EuKpARh2Tr>1L9&@va4z{0Gn&l=cj(Gp@H;h{0FdZTo6!#AUMr=f z{Da4<7YbsS6CRV;*yeU{NaN@z2yIL<<4<^iV2(!!g;onJw~Xh|0wyKdh?f=^a`+6I zhuS+%(*P-8=XPELK}2=C=+efb=K6=US7A(wckVsMxe$~zS60FG_>&WuJiB<4VwWiZ~-S%qAC?etNjsENSa$g6`2NkSU0ZSyZm}Y z{elkp4ePZ{eEY8150zKVxs5s$BQIZTK99y#>E%9K&KLLXb)<|RF7ULe#cj_ql?4m_ z$VbZLP;{Oo=OLP?OG@JIMLT&Q6oILyQa(qga@z8nhB(Ig2IiLp9szeAn<5dP{ed^M zu>1TpZri6`ZmW_T_5Z?s?|FSEb_5BH#5#GsiTa}teVBz8VtWxTUYBbq5SXE$m8;D> zNM?G&blsY(plF$RQK?s?rO8v)c-_H_6K5le7&+q0V zjdkSX2cMg2h#hZr7a5#-8P;ib5(G-ECliEh{ZPIB-0vPfyNPJ6D2!UlC4*|AX(p6) z;*jiM^|brd^UGBPkBy9uH<`leTaRr#LcLgyqo`(y9R;rbz=?X$Q-ff<(rQ;f1uLjV zAa#o zCXW8e-dHcalZI0GAsB^onM>>*s$7>_b90zSk(myF=wwW_JC}s|Wn&cIUU_O&L?lL1 z+a*VV;=%b-U5%1*OaOL>+73H=rYoOHXq##HQLuht7N-Wnz@k~vb+qT=$iH{T(sU#x z&YQ_ww25ean`CNX1NiD8a$wuS+cF5{)zpDP(&dr|;bjTok4DMh)fn4~xShev!W(5S zU%aO8kXytqY)5`jmoPP+oA!x}qVORHi^3BB0l#Jv-%=jd4*M&o_>72T*q{E%O z&XzLahv^l9*7}Jqd43{VkEf^GzGlP>!Ds}+|&NnG9oU` z5`9RA3II<}Xh?v^XVF9^cdgVVQCQ1}$*fQk3rw{#?!L3x$)ihV@D}rDq8DL3YAKoC~bGz+DM?=A(0v++q82fVP;)>?n z?rmB}%@*7llC>w416?~(9(l|!aWarJROcKjjGi5Qv~^z|A9Lk1j=8ithpD@0vHx|C zFV%q}ufBI=7cUa4JgTay*ihfTA;%(#zTh=b`JOA6<73aM_?TLD6$lYd4!^xV#Y>F& z3T)_N+}aKLy0`=lWrjx{?!J)lPPKsLDa6An0b)1p5+JqLBitp3E`q!o!jYE;H6^Qw zWjz#?Xo4-}S9he}ZiXQ=OfEaZhir1aq%f6rhEtz21HEeMLK=;!(Vgj{?WajGWYz~L zkihr3_21wx%+3n?n>$2y5%0WbaBT% z!^3VX66>C$>#p4{PuS|O1b1xMGs>hZr@CN5CgJlz$7vl7K1@6DnDIIlHeC48Kp3~* zr*nLGMC^vaPyQ@S4Gh7VNjdvV9~KL`v!opOjISdSO(~@LB4LYJ&nqk)+jVD&-2y8b zt6fa*_`5zJFiI5@_)r5Zcpbn*5hs}Fy#Xww|8y=nl&qj;`doL&e-36+2 zLByN0>={e%7MEk7R|2r7c}6wSp$TPLt8?3ntbYJS8=OA>u_mKaoXv;Sv$Y-ZBJK3{ zbhpdJtUi93>+tGv5ByqSPdxXc)-=t9yYscxHoa%FlF-Zx;mhMLUiC-g#3*DH(b2I>WO1OxZw9y(-2|46Y@SI;X7;;zx;MJK_r@w3zN!PjEN87YDH~ zkL!EAjxyMdWklHDvKJ4tH7GA@DS}sTPWlA8CJ#t@nLy?-w3MP#c-OixhXnn2y+k35 zjU5JVT6>*T$sSHt^->!R!A9AX_5oW@V{EU)chE1dJ!##v(}#`5sP^8g{*5&E=9B6_ zGb?5@mxb-dWjuMhbe5mX`Mtp_Tg7U-gx8@DjnoFCCANJCI}LF91p5U%LC91U(>owe?iU9(w>s7pve4!JVx$MnD0U%**7|EH`Tjkyy%Xltp(ji?Usv_vC9Ba)n-|v zete@8PE$?VqUEJ-zFYL4#&0p(e%XNrpW9Pu`mCsRd+eaf2*jcxb)GNyj11KdyFsKp ze7U*HGriO&WmL46JHInUQitInW6JAG%fIw-4vMn+2bdB+bc)xM?2)_1jjftE5UtKQ zRj7?g4uF8G`iBDJ0g8zSAOh8=yYMVvo4d}Ul8mC z`6DFNGxwpL;b*s3`$7@A3wqlEx(j&6xocI9QdV>l38q4unFHinAVO3y0^neZY&$yf z?JL(Wuf4j&HJ*)4Vo0^6C7Wn9=~7H%a6MXFbO$w7`|8zW8j7oci_jpX_9f{phQj0z z#ePT>2a(%zpe+5;8jYx@Y2Sf0_Zj*7G9yxmcjxmTNgpIRcISC^RGwkJfZHS`F8LKTeC=~Swy_eXVu!7 zbDD*%{=F^>-aD!edV8cBuJ7M`<>=6NsgiaWxchce7pC(4$F2LH95Pejm_j(%DM3eA;S!k61Tz{OAoJCwJheu zJnP?s3#>Nih1LlPFWu+OhzN@J6ta$2zG#O{JCVnXK z*h;bjxO4Ese+6X5&+i^_jE1s-9m*ki7Rt-7PXLr< ze9JI%yzq;~$E4FSky+Xihlnt;*x|gQ7626=gfpWf1c4=iH0@U8mccgSb@S%<1EtwcOV(0PN2i^z(@*a# z2CxHNW62njW;{XtiQR`KBn^h-y<0WV&3-xlLij6zwStlJ)1Y2Z*Uh!|1PI#5HAh)R zb@3g0(qKWFIgfXP{u{1l4F>=N=TX&n^s+!5DIzL;q|=C?jTKfT9?NeK-ZQ< z*2zXG0PZwpdm-Nr3=P-B2b5Jfn>R1f^vluEB;(*3a?8E1QMH%ufv* z@7fwE7U(@Ue3og@yx>Bq#fABazI3h;zIs1w3((=K?I|0X{Z;9^wZWF~WqXir27ps1 zmq0=eq;zk7(kQaob-Eh_$|HxiAPo)D$nRBS$Q&E%S`_waYB?4LAH3=CkkRgkUIb7v zcR*h5odJj-*!`RslB?W6#_LwbRH$T{>9&yRh!V9TQp=2g#fUV>JwmO5F%!#?b>^b8#9<}#! zYPD&+^_z&u>XBY;;fi}7zmur_?LhrU1G7qj=KA$$|}!IM?^kF%Hti_jNtS zdJU41lN86ecU{)S1wQTUX!wG+vF@T_%HjKk`?#Kj*`SJZYI>IXpZN)3nA>VRGdri9 z>0;LU9^_`f(8Het1)-LlJ!Mc8^84>N$qL^CE~3`{xNG5EO_0ecdHL4wW-zJO9?e&_ z#KIhTaWzkBPej4U=&WxYu&F&o9g~Fd$vIvt;A_)nIc-M#w!*s5*sN$*cxABHHLLBH ztydI`WM?q5MV`8#6Qa5zeve$MuGZWjqCTeU_;KtH!^OG>jmAo9msp3nf8wId%KLW{ zMonBAu7N=OiG*X{cHq{M;AgdWP_m9#{K9G-uVIPz@D5Ot0d+8YtA4S)*bg)4fdyFaHkusR-67*Jmk8()=|S{ z7g&|toDShno0O|80Ri(Qxcz5B7^h=Ue#z|sz+4&7yl8DL?X%vCD^&`gsznUMwcF}v zIM~B}UYq(Pyp1`%;#i4Xm(RHL2W34{V*9Gh<$h+ELOCusQ27(d;l-Ep9qYBqSV^{2d5wA*k<^m)T;AE&xdO8lGf1 zyI#%K%PX9{BhD;B3JMCXP*l>7dgCWq)kFXZLkUriv;`MMFUPkrtIsT-DNC4%(JEh5 z<<7yY3i23>+Izb6e%5BZaXD$%+v{yGeK-Tc0h>7k`hnNq)LI$dOxQ6^XD4FGnEY#nD`WAp{87LjYLYqvkjWWFdY0Ik>tOvvb z*KbS|c0;1kciwxENt>l(0r9z&N~YUP_)bg!^^4gKYCdBqHckp? zE63XmQi)rv*kqdsHLDJxRSD-PpHg|<`j@zoaLq?pP;=I6FYc??mDNN8O{+j=RxYQ0 zf3n6=uku?p2F*%SwNpGZgTgL_S0qkPpH1*S&l}S&e!c4v#l00e>05l$YWA?<{52kUJK%}>hGL-!mQ9z+OqM?rmNl`?=H(x zx5yXw#2S$=kJ=1fQllzjJ?6%{U4)CRs<@GyUVI+!RAGpV8sAySBq`*TT`((7;A-<8 zEhyJI*!IkDTe4&rUmLK_!b$Q#aK+8q(K5?abU2(_I zI(Tp=N}aZLl^Zr)sN-#mz!$A8QK&)rt@T@2U(eppCC4$TJ$-aYOhopWqW19_`@~(7 z)P8@EB1sUnHCq|4u_}oz;^`ecSL+btIOf9bJ{Ruo`i{20rfuh-Cpl?q21&_89uy|u zLlwM4WZXCzR5RJB32mF9WH~dzC{`<$eZ|{_PKIyyXfu%c_~}*@$20m}m^Au`vM0av z^tSC`xYxO1zN4IgB`?tQGHH4G9fg2ZBNe_x6QaY)uF1oS#n6?_Me;K8 zhXqXcQ5P?L7BTS*Si4l%1*#RZHo6HdD}~kc-u-%B5zZ`ERJK)^jK=W_G#}y3QOXiV z&^G3>)T?;mm0{EUwjG_JhPq36yM3`NO3V|yXKVi> z`yi@2UW0F(*j;ewHt#``v}F+<&UItyVX|=hFRVtF7&T*>j-QVY}Z-(wK1hniO-m zF4_#kx-HV6#bTc_Hewihs~J~3OX~FU%@oa9*9JuLOO2qOI~CZRC)dlQ2Uq<9We*s~ zJA+KQ17&RkU-+a|7ykr(M6FcRuiDlg)>YnWcp*iDXpO++1o(N4@A!Vay|Zdvz7-+F zwM(#Af8k*hbbPb$b(2w>g}{^6hoh1K%qfGDRdVbnmC@M}y*+dmzi;Z=rEyt?FkJ;T z@xH$La=nWeKj}w>vJkZ|KW4CM8}HQ>{aF^Z69`>-7in?iZ{K=jQ9>Gn$q!(^5?Xps ztz4k;dw#%GKigO2>1OIRsys~SaaEeY-P0SNZiN<9CR_uFtI{a+uJ_r0fFaK-9?8R$ z^7UxxBed9R%S!yOID*V-3|Zfsor7Usk6fhuGHm(4dl6^S@9507iR~kq`lq&U zm)ox|uol&UJj}tXO_{3<1}y)r&SE@5Bz_uLn136ooGvWaS75<>zLWjbp z<%<5#yld}T#~};)2$Ec}Fv%Sc(o&J{S5ME73cXOgw?Zle(wy&Iaz#sJ0~nu0l`o{; z%?9O;D_5^Rc@`LQ29!*b*}V=t$G}f){WLCwCRcA+a~!Ye4f(xLP)Ej{R|$v|zdeth z5o6A*-0v4}^xA`S(5x?azLMTyt&4*g0D)3!xuiyF-;nd#jpEMyBgeWT{6~&efMQhk zHiNDUr30uL#yt4uRIxKRJOA=!(lh`Do%DLSy!y%@?JQmSDB}J1)9-b42yqRumA3Yu zMbrNhPJk6$AcQ;VO<(0}1f}%B2BXnH>)_BQ=pvQga<9f8!w+%-9u(juB`R=NC>Xw| ziB9Y^9IO#aDgI(M+J9F(;G~&Grbss9F~wZ;8tmQKGDD_v&}5TuKnM{H(bu?h=gW3- zqP)DkPEt;H>un~jxl#+J?`n!>Bs9#!5f*RPCH&vx=_oay+Q__!?Fy@~9ey{zp%gNh z;D#t4Oo+mq^uK`jvoknBPu6<_CD26nvn2*#AAgX+2H$OIPF@8KQ`Nv+{;G!ppbFZj ztV)7|PXc#PO#Sr)e8;xm693~aE08m$D#8KykLM=?AUW_IJMXV@DJV;WFrJ$U4gUXT zvbnAe4MyhTz~gV97uG!=gYb@t Date: Fri, 22 Mar 2024 13:21:15 +0000 Subject: [PATCH 03/20] Remove chunk name --- vignettes/epichains.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 415d37f3..2a69a42c 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -57,7 +57,7 @@ individuals infected by a single case, and the length of the transmission chain is the number of generations from the first case to the last case they produced before the chain ended (See figure below). -```{r transmission_chain_fig, out.width = "75%", echo = FALSE, fig.cap = "An example of a transmission chain starting is a single case C1. Cases are represented by blue circles and arrows indicate who infected whom. The chain grows through generations G1, G2, and G3, producing cases C2, C3, C4, C5, C6, and C7. The chain ends at generation G3 with case C7. The size of C1's chain is 7 (sum of all blue circles) and the length is 3 (maximum number of generations reached by C1's chain).", fig.alt = "The figure shows an example of a transmission chain starting with a single case C1. The figure is depicted as blue circles linked by orange directed arrows showing the cases they produced. Each generation is marked by a brown rectangle and labelled Gen 1, Gen 2, and Gen 2. The chain grows through generations G1, G2, and G3. Case C1 is Gen 1 and produces cases C2, and C3 in generation 2. In Gen 3, C2 produces C4 and C5, and C3 produces C7. The chain ends in Gen 3. The chain size of C1 is 7 (that is the sum of all the blue circles, representing cases) and the length is 3 (maximum number of generations reached by C1's chain)."} +```{r out.width = "75%", echo = FALSE, fig.cap = "An example of a transmission chain starting with a single case C1. Cases are represented by blue circles and arrows indicate who infected whom. The chain grows through generations G1, G2, and G3, producing cases C2, C3, C4, C5, C6, and C7. The chain ends at generation G3 with case C7. The size of C1's chain is 7 (sum of all blue circles) and the length is 3 (maximum number of generations reached by C1's chain).", fig.alt = "The figure shows an example of a transmission chain starting with a single case C1. The figure is depicted as blue circles linked by orange directed arrows showing the cases they produced. Each generation is marked by a brown rectangle and labelled Gen 1, Gen 2, and Gen 2. The chain grows through generations G1, G2, and G3. Case C1 is Gen 1 and produces cases C2, and C3 in generation 2. In Gen 3, C2 produces C4 and C5, and C3 produces C7. The chain ends in Gen 3. The chain size of C1 is 7 (that is the sum of all the blue circles, representing cases) and the length is 3 (maximum number of generations reached by C1's chain)."} knitr::include_graphics(file.path("img", "transmission_chain_example.png")) ``` From ded78edf660fd0aba71fb78bc90b5a0570dc783c Mon Sep 17 00:00:00 2001 From: jamesaazam Date: Fri, 22 Mar 2024 13:21:28 +0000 Subject: [PATCH 04/20] Fix alert blocks --- vignettes/epichains.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 2a69a42c..186d7b13 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -73,7 +73,7 @@ library("epichains") ## Transmission chains likelihoods -:: {.alert .alert-primary} +::: {.alert .alert-primary} ### Use case {-} Suppose we have observed a number of outbreak clusters, each arising from a separate external case. What are the likely transmission properties @@ -252,7 +252,7 @@ liks ## Transmission chains simulation -:: {.alert .alert-primary} +::: {.alert .alert-primary} ### Use case {-} We want to simulate a scenario where a number of outbreak clusters are each produced by a separate external case. We want to simulate the chains of From 49ad0a8b86a89ef3dd2ce1778bd6a942091f6ea5 Mon Sep 17 00:00:00 2001 From: jamesaazam Date: Fri, 22 Mar 2024 15:31:29 +0000 Subject: [PATCH 05/20] Add note clarifying why we export the borel distribution --- vignettes/epichains.Rmd | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 186d7b13..d1542754 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -219,7 +219,15 @@ calculating the likelihoods associated with certain summaries of transmission chains ("size" or "length") for specific offspring distributions. For the size distributions, the package provides the poisson, negative binomial, -and gamma-borel mixture. For the length distribution, there's the poisson and +and gamma-borel mixture. + +::: {.alert .alert-info} +To provide the gamma-borel size likelihood, the [borel distribution](https://en.wikipedia.org/wiki/Borel_distribution) is needed. +However, base R does not provide this distribution natively like poisson, +gamma, etc., so _epichains_ provides its [density and random generator](https://epiverse-trace.github.io/epichains/reference/index.html#special-distributions). +::: + +For the length distribution, there's the poisson and geometric distributions. These can be used with `likelihood()` based on what is specified for `offspring_dist` and `statistic`. From af6774cf51deb4e5a4c4e6ff69dad0d5162f3aa9 Mon Sep 17 00:00:00 2001 From: jamesaazam Date: Fri, 22 Mar 2024 15:36:49 +0000 Subject: [PATCH 06/20] Replace use of etc to avoid spelling error --- vignettes/epichains.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index d1542754..d29c0f62 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -223,8 +223,8 @@ and gamma-borel mixture. ::: {.alert .alert-info} To provide the gamma-borel size likelihood, the [borel distribution](https://en.wikipedia.org/wiki/Borel_distribution) is needed. -However, base R does not provide this distribution natively like poisson, -gamma, etc., so _epichains_ provides its [density and random generator](https://epiverse-trace.github.io/epichains/reference/index.html#special-distributions). +However, base R does not provide this distribution natively like poisson and +gamma, so _epichains_ provides its [density and random generator](https://epiverse-trace.github.io/epichains/reference/index.html#special-distributions). ::: For the length distribution, there's the poisson and From 4e7312107d7f96a867c8b4f8d37ca65b4c7bff51 Mon Sep 17 00:00:00 2001 From: James Azam Date: Tue, 26 Mar 2024 18:50:05 +0000 Subject: [PATCH 07/20] Apply suggestions from code review Co-authored-by: Sebastian Funk --- vignettes/epichains.Rmd | 42 ++++++++++++++++++----------------------- 1 file changed, 18 insertions(+), 24 deletions(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index d29c0f62..3370db7d 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -35,20 +35,16 @@ a transmission chain. ::: {.alert .alert-info} * To understand the theoretical background of the branching processes methods -used here, refer to the [theory vignette](https://epiverse-trace.github.io/epichains/articles/theoretical_background.html) +used here, refer to the theory vignette `vignette("theoretical_background")`. * To understand the software development design decisions and implementation details of the package, -refer to the [design vignette](https://epiverse-trace.github.io/epichains/articles/design-principles.html) +refer to the design vignette `vignette("design-principles")`. ::: -Epidemics spread through when infected individuals transmit the -infection to others. If we assume that each case infects a random number of -other individuals (the offspring) through time units called generations, -and that this transmission process is governed by a reproduction number and -a probability distribution describing how the offspring are produced through -the time unit, then the epidemic can be modelled as a branching process. - -Branching processes are a stochastic process where each individual in a -generation gives rise to a random number of individuals in the next generation. +Epidemics spread through populations when infected individuals transmit the +infection to others. +[Branching processes](https://en.wikipedia.org/wiki/Branching_process) can be used to model this process. +A branching process is a stochastic process where each infectious individual in a +generation gives rise to a random number of individuals in the next generation, starting with the index case in generation 1. The distribution of the number of offspring is called the offspring distribution. @@ -76,9 +72,9 @@ library("epichains") ::: {.alert .alert-primary} ### Use case {-} Suppose we have observed a number of outbreak clusters, each arising from a -separate external case. What are the likely transmission properties -(reproduction number and/or superspreading coefficient) of the infection that -generated these clusters? +separate index case. What are the likely transmission properties +(reproduction number and/or superspreading coefficient) that +generated these clusters (assuming these parameters are the same across all the clusters)? The first step in answering this question is to calculate the likelihood of observing the observed chain summaries given the transmission properties. This @@ -97,13 +93,10 @@ these estimation frameworks. 1. An offspring distribution that governs the number of secondary cases produced by each case. -2. A reproduction number that governs the average number of secondary cases -produced by each case. -3. An observation process/probability that generates the observed chain +2. Parameters of the offspring distribution (for example, mean and overdispersion if using a negative binomial; which can then be interpreted as reproduction number and superspreading coefficient, respectively) +3. (optional) An observation process/probability that generates the observed chain summaries. -4. A threshold of chain summaries that are considered too large for the given -transmission properties. For example, we do not expect each case to produce -more than $N$ cases or last more than $T$ generations. +4. A threshold of chain summaries that are considered too large to be consistent with a reproduction number of $R<1$ (e.g., $N$ or $T$ generations in total). ::: ### [`likelihood()`](https://epiverse-trace.github.io/epichains/reference/likelihood.html) @@ -218,16 +211,16 @@ The _epichains_ package includes closed-form (analytical) solutions for calculating the likelihoods associated with certain summaries of transmission chains ("size" or "length") for specific offspring distributions. -For the size distributions, the package provides the poisson, negative binomial, +For the size distributions, the package provides the Poisson, negative binomial, and gamma-borel mixture. ::: {.alert .alert-info} -To provide the gamma-borel size likelihood, the [borel distribution](https://en.wikipedia.org/wiki/Borel_distribution) is needed. +To provide the gamma-Borel size likelihood, the [Borel distribution](https://en.wikipedia.org/wiki/Borel_distribution) is needed. However, base R does not provide this distribution natively like poisson and gamma, so _epichains_ provides its [density and random generator](https://epiverse-trace.github.io/epichains/reference/index.html#special-distributions). ::: -For the length distribution, there's the poisson and +For the length distribution, there's the Poisson and geometric distributions. These can be used with `likelihood()` based on what is specified for `offspring_dist` and `statistic`. @@ -238,6 +231,7 @@ distribution](https://en.wikipedia.org/wiki/Empirical_distribution_function) for unobserved sizes/lengths). If simulation is to be used, an extra argument `nsim_offspring` must be passed to `likelihood()` to specify the number of simulations to be used for this approximation. +Approximate values of the likelihood will vary with every call to the simulation (because the simulations used for estimation vary), and it may be worth calling `likelihood()` multiple times in this case to see the error this may introduce. For example, let's look at an example where `chain_sizes` is observed and we want to calculate the likelihood of this being drawn from a binomial @@ -263,7 +257,7 @@ liks ::: {.alert .alert-primary} ### Use case {-} We want to simulate a scenario where a number of outbreak clusters are each -produced by a separate external case. We want to simulate the chains of +produced by a separate index case. We want to simulate the chains of transmission that result from these cases, given a specific offspring distribution and a reproduction number. ::: From c0dba677ec2cacfce186843093bb22a30e4e8e90 Mon Sep 17 00:00:00 2001 From: James Azam Date: Tue, 26 Mar 2024 18:50:32 +0000 Subject: [PATCH 08/20] Apply suggestions from code review Co-authored-by: Sebastian Funk --- vignettes/epichains.Rmd | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 3370db7d..3a79a564 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -106,7 +106,7 @@ outbreak summaries obtained from transmission chains. "Outbreak summaries" here refer to transmission chain sizes or lengths/durations. `likelihood()` requires a vector of chain summaries (sizes or lengths), -`chains`, the corresponding statistic to calculate, `statistic`, the offspring +`chains`, the corresponding statistic to use, `statistic`, the offspring distribution, `offspring_dist` and its associated parameters. `offspring_dist` is specified as the function that is used to generate random @@ -142,8 +142,8 @@ likelihood_eg ### Joint and individual log-likelihoods -`likelihood()`, by default, returns the joint log-likelihood. If instead, -the individual log-likelihoods are required, then the `individual` argument +`likelihood()`, by default, returns the joint log-likelihood, given by the sum of log-likelihoods of each observed chain. If instead +the individual log-likelihoods are desired (for example for calculating [WAIC](https://en.wikipedia.org/wiki/Watanabe%E2%80%93Akaike_information_criterion) values, then the `individual` argument must be set to `TRUE`. To return likelihoods instead, set `log = FALSE`. ```{r estimate_likelihoods2} set.seed(121) @@ -252,7 +252,7 @@ liks <- likelihood( liks ``` -## Transmission chains simulation +## Transmission chain simulation ::: {.alert .alert-primary} ### Use case {-} @@ -360,7 +360,7 @@ especially useful for calculating numerical likelihoods in `likelihood()`. ::: Here is an example to simulate the previous examples without intervention, -returning the size of each of the $10$ chains. It assumes a poisson offspring +returning the size of each of the $10$ chains. It assumes a Poisson offspring distribution distribution with mean of $0.9$. ```{r sim_summary_pois} set.seed(123) From 9329becdce3a2ad2b13927223e30dc7ad85d967f Mon Sep 17 00:00:00 2001 From: James Azam Date: Tue, 26 Mar 2024 18:58:26 +0000 Subject: [PATCH 09/20] Align text with figure labels and fix wording --- vignettes/epichains.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 3a79a564..c0c9f7df 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -53,7 +53,7 @@ individuals infected by a single case, and the length of the transmission chain is the number of generations from the first case to the last case they produced before the chain ended (See figure below). -```{r out.width = "75%", echo = FALSE, fig.cap = "An example of a transmission chain starting with a single case C1. Cases are represented by blue circles and arrows indicate who infected whom. The chain grows through generations G1, G2, and G3, producing cases C2, C3, C4, C5, C6, and C7. The chain ends at generation G3 with case C7. The size of C1's chain is 7 (sum of all blue circles) and the length is 3 (maximum number of generations reached by C1's chain).", fig.alt = "The figure shows an example of a transmission chain starting with a single case C1. The figure is depicted as blue circles linked by orange directed arrows showing the cases they produced. Each generation is marked by a brown rectangle and labelled Gen 1, Gen 2, and Gen 2. The chain grows through generations G1, G2, and G3. Case C1 is Gen 1 and produces cases C2, and C3 in generation 2. In Gen 3, C2 produces C4 and C5, and C3 produces C7. The chain ends in Gen 3. The chain size of C1 is 7 (that is the sum of all the blue circles, representing cases) and the length is 3 (maximum number of generations reached by C1's chain)."} +```{r out.width = "75%", echo = FALSE, fig.cap = "An example of a transmission chain starting with a single case C1. Cases are represented by blue circles and arrows indicate who infected whom. The chain grows through generations Gen 1, Gen 2, and Gen 3, producing cases C2, C3, C4, C5, and C6. The chain ends at generation Gen 3 with cases C4, C5, and C6. The size of C1's chain is 6 (sum of all blue circles) and the length is 3 (maximum number of generations reached by C1's chain).", fig.alt = "The figure shows an example of a transmission chain starting with a single case C1. The complete chain is depicted as blue circles linked by orange directed arrows showing the cases produced in each generation. Each generation is marked by a brown rectangle and labelled Gen 1, Gen 2, and Gen 3. The chain grows through generations Gen 1, Gen 2, and Gen 3. Case C1 starts in Gen 1 and produces cases C2, and C3 in Gen 2. In Gen 3, C2 produces C4 and C5, and C3 produces C6. The chain ends in Gen 3. The chain size of C1 is 6 (that is the sum of all the blue circles, representing cases) and the length is 3 (maximum number of generations reached by C1's chain)."} knitr::include_graphics(file.path("img", "transmission_chain_example.png")) ``` From 92b46b252c1a8fcdfd85e83546de8193fe6da66f Mon Sep 17 00:00:00 2001 From: James Azam Date: Tue, 26 Mar 2024 19:08:54 +0000 Subject: [PATCH 10/20] Change "clusters" to "chains" for consistency --- vignettes/epichains.Rmd | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index c0c9f7df..7d12580c 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -71,10 +71,9 @@ library("epichains") ::: {.alert .alert-primary} ### Use case {-} -Suppose we have observed a number of outbreak clusters, each arising from a -separate index case. What are the likely transmission properties +Suppose we have observed a number of transmission chains, each arising from a separate index case. What are the likely transmission properties (reproduction number and/or superspreading coefficient) that -generated these clusters (assuming these parameters are the same across all the clusters)? +generated these chains (assuming these parameters are the same across all the chains)? The first step in answering this question is to calculate the likelihood of observing the observed chain summaries given the transmission properties. This @@ -256,7 +255,7 @@ liks ::: {.alert .alert-primary} ### Use case {-} -We want to simulate a scenario where a number of outbreak clusters are each +We want to simulate a scenario where a number of chains are each produced by a separate index case. We want to simulate the chains of transmission that result from these cases, given a specific offspring distribution and a reproduction number. From 2452fa0b8f1102f8364e57b0992d79072cb5c84f Mon Sep 17 00:00:00 2001 From: James Azam Date: Tue, 26 Mar 2024 19:12:27 +0000 Subject: [PATCH 11/20] Indicate that stat_max is optional --- vignettes/epichains.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 7d12580c..2577e0b7 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -95,7 +95,7 @@ produced by each case. 2. Parameters of the offspring distribution (for example, mean and overdispersion if using a negative binomial; which can then be interpreted as reproduction number and superspreading coefficient, respectively) 3. (optional) An observation process/probability that generates the observed chain summaries. -4. A threshold of chain summaries that are considered too large to be consistent with a reproduction number of $R<1$ (e.g., $N$ or $T$ generations in total). +4. (optional) A threshold of chain summaries that are considered too large to be consistent with a reproduction number of $R<1$ (e.g., $N$ or $T$ generations in total). ::: ### [`likelihood()`](https://epiverse-trace.github.io/epichains/reference/likelihood.html) From fc82bcea1c51d54fea24242059e2fc708929ba41 Mon Sep 17 00:00:00 2001 From: James Azam Date: Tue, 26 Mar 2024 19:12:50 +0000 Subject: [PATCH 12/20] Reuse assumptions from likelihood section --- vignettes/epichains.Rmd | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 2577e0b7..6eecc4ca 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -268,17 +268,10 @@ distribution and a reproduction number. #### What we assume {-} -1. An offspring distribution that governs the number of secondary cases -produced by each case. -2. A reproduction number that governs the average number of secondary cases -produced by each case. -3. An observation process/probability that generates the observed chain -summaries. -4. A threshold of chain summaries that are considered too large for the given -transmission properties. For example, we do not expect each case to produce -more than $N$ cases or last more than $T$ generations. -5. A generation time distribution that governs the time between successive -infections. +1. An offspring distribution that governs the number of secondary cases produced by each case. +2. Parameters of the offspring distribution (for example, mean and overdispersion if using a negative binomial; which can then be interpreted as reproduction number and superspreading coefficient, respectively) +3. (optional) A threshold of chain summaries that are considered too large to be consistent with a reproduction number of $R<1$ (e.g., $N$ or $T$ generations in total). +4. (optional) A generation time distribution that governs the time between successive cases. ::: There are two simulation functions, herein referred to collectively as the From 743a5c3e03bb07b12c095d45108620c504ff82ae Mon Sep 17 00:00:00 2001 From: James Azam Date: Tue, 26 Mar 2024 19:16:01 +0000 Subject: [PATCH 13/20] Calrify that N is cases --- vignettes/epichains.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 6eecc4ca..314c5fcb 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -95,7 +95,7 @@ produced by each case. 2. Parameters of the offspring distribution (for example, mean and overdispersion if using a negative binomial; which can then be interpreted as reproduction number and superspreading coefficient, respectively) 3. (optional) An observation process/probability that generates the observed chain summaries. -4. (optional) A threshold of chain summaries that are considered too large to be consistent with a reproduction number of $R<1$ (e.g., $N$ or $T$ generations in total). +4. (optional) A threshold of chain summaries that are considered too large to be consistent with a reproduction number of $R<1$ (e.g., $N$ cases or $T$ generations in total). ::: ### [`likelihood()`](https://epiverse-trace.github.io/epichains/reference/likelihood.html) @@ -270,7 +270,7 @@ distribution and a reproduction number. 1. An offspring distribution that governs the number of secondary cases produced by each case. 2. Parameters of the offspring distribution (for example, mean and overdispersion if using a negative binomial; which can then be interpreted as reproduction number and superspreading coefficient, respectively) -3. (optional) A threshold of chain summaries that are considered too large to be consistent with a reproduction number of $R<1$ (e.g., $N$ or $T$ generations in total). +3. (optional) A threshold of chain summaries that are considered too large to be consistent with a reproduction number of $R<1$ (e.g., $N$ cases or $T$ generations in total). 4. (optional) A generation time distribution that governs the time between successive cases. ::: From fc80efc9977fd890b645b7b751ff60df99cda90b Mon Sep 17 00:00:00 2001 From: James Azam Date: Tue, 26 Mar 2024 22:22:52 +0000 Subject: [PATCH 14/20] Apply suggestions from code review Co-authored-by: Adam Kucharski --- vignettes/epichains.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 314c5fcb..a47f5e7b 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -40,7 +40,7 @@ used here, refer to the theory vignette `vignette("theoretical_background")`. refer to the design vignette `vignette("design-principles")`. ::: -Epidemics spread through populations when infected individuals transmit the +Infectious disease epidemics can spread through populations when a chain of infected individuals transmit the infection to others. [Branching processes](https://en.wikipedia.org/wiki/Branching_process) can be used to model this process. A branching process is a stochastic process where each infectious individual in a @@ -108,7 +108,7 @@ refer to transmission chain sizes or lengths/durations. `chains`, the corresponding statistic to use, `statistic`, the offspring distribution, `offspring_dist` and its associated parameters. -`offspring_dist` is specified as the function that is used to generate random +`offspring_dist` is specified by the R function that is used to generate random numbers, i.e. `rpois` for Poisson, `rnbinom` for negative binomial, or a custom function. @@ -127,7 +127,7 @@ set.seed(121) # randomly generate 20 chains of size between 1 to 10 chain_sizes <- sample(1:10, 20, replace = TRUE) chain_sizes -# estimate loglikelihood of the observed chain sizes +# estimate loglikelihood of the observed chain sizes for given lambda likelihood_eg <- likelihood( chains = chain_sizes, statistic = "size", From 150850de69424041456b9fc5aa7aebafb46fbfa5 Mon Sep 17 00:00:00 2001 From: James Azam Date: Tue, 26 Mar 2024 22:30:24 +0000 Subject: [PATCH 15/20] Update WORDLIST --- inst/WORDLIST | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/inst/WORDLIST b/inst/WORDLIST index e586e0d5..05b21a58 100644 --- a/inst/WORDLIST +++ b/inst/WORDLIST @@ -14,6 +14,7 @@ R's README's TransPhylo Vukosi +WAIC Zhian borel bpmodels @@ -28,7 +29,6 @@ epicontacts epiparameter fn frac -gborel geosocial geq infectee @@ -43,13 +43,11 @@ ln mathbf mathrm nCoV -nbinom nosoi outbreaker outbreakr packagename parameterised -pois poisson rborel ringbp @@ -59,4 +57,3 @@ superspreading th truncdist unnormalised -var From 8127f57b0e82f4fe4ae2990968f22a1783295be3 Mon Sep 17 00:00:00 2001 From: James Azam Date: Wed, 27 Mar 2024 10:37:55 +0000 Subject: [PATCH 16/20] Clarify that size and length include first case --- vignettes/epichains.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index a47f5e7b..e5129410 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -51,9 +51,9 @@ distribution. The size of the transmission chain is the total number of individuals infected by a single case, and the length of the transmission chain is the number of generations from the first case to the last case they -produced before the chain ended (See figure below). +produced before the chain ended. The size calculation includes the first case and the length calculation includes the first generation when the first case starts the chain (See figure below). -```{r out.width = "75%", echo = FALSE, fig.cap = "An example of a transmission chain starting with a single case C1. Cases are represented by blue circles and arrows indicate who infected whom. The chain grows through generations Gen 1, Gen 2, and Gen 3, producing cases C2, C3, C4, C5, and C6. The chain ends at generation Gen 3 with cases C4, C5, and C6. The size of C1's chain is 6 (sum of all blue circles) and the length is 3 (maximum number of generations reached by C1's chain).", fig.alt = "The figure shows an example of a transmission chain starting with a single case C1. The complete chain is depicted as blue circles linked by orange directed arrows showing the cases produced in each generation. Each generation is marked by a brown rectangle and labelled Gen 1, Gen 2, and Gen 3. The chain grows through generations Gen 1, Gen 2, and Gen 3. Case C1 starts in Gen 1 and produces cases C2, and C3 in Gen 2. In Gen 3, C2 produces C4 and C5, and C3 produces C6. The chain ends in Gen 3. The chain size of C1 is 6 (that is the sum of all the blue circles, representing cases) and the length is 3 (maximum number of generations reached by C1's chain)."} +```{r out.width = "75%", echo = FALSE, fig.cap = "An example of a transmission chain starting with a single case C1. Cases are represented by blue circles and arrows indicate who infected whom. The chain grows through generations Gen 1, Gen 2, and Gen 3, producing cases C2, C3, C4, C5, and C6. The chain ends at generation Gen 3 with cases C4, C5, and C6. The size of C1's chain is 6, including C1 (that is, the sum of all blue circles) and the length is 3, which includes Gen 1 (maximum number of generations reached by C1's chain).", fig.alt = "The figure shows an example of a transmission chain starting with a single case C1. The complete chain is depicted as blue circles linked by orange directed arrows showing the cases produced in each generation. Each generation is marked by a brown rectangle and labelled Gen 1, Gen 2, and Gen 3. The chain grows through generations Gen 1, Gen 2, and Gen 3. Case C1 starts in Gen 1 and produces cases C2, and C3 in Gen 2. In Gen 3, C2 produces C4 and C5, and C3 produces C6. The chain ends in Gen 3. The chain size of C1 is 6, which includes C1 (that is the sum of all the blue circles, representing cases) and the length is 3, which includes Gen 1 (maximum number of generations reached by C1's chain)."} knitr::include_graphics(file.path("img", "transmission_chain_example.png")) ``` From 41cc13bf5aa6966a3b691dcadd193fe6255d4065 Mon Sep 17 00:00:00 2001 From: James Azam Date: Wed, 27 Mar 2024 10:38:10 +0000 Subject: [PATCH 17/20] Improve wording --- vignettes/epichains.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index e5129410..df0c8dd8 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -40,7 +40,7 @@ used here, refer to the theory vignette `vignette("theoretical_background")`. refer to the design vignette `vignette("design-principles")`. ::: -Infectious disease epidemics can spread through populations when a chain of infected individuals transmit the +Infectious disease epidemics spread through populations when a chain of infected individuals transmit the infection to others. [Branching processes](https://en.wikipedia.org/wiki/Branching_process) can be used to model this process. A branching process is a stochastic process where each infectious individual in a From dca2df270e95726af6ff813831b32f8b5ea48fba Mon Sep 17 00:00:00 2001 From: James Azam Date: Wed, 27 Mar 2024 10:38:34 +0000 Subject: [PATCH 18/20] Break important sentence to new paragraph --- vignettes/epichains.Rmd | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index df0c8dd8..bb7a6ce9 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -79,8 +79,9 @@ The first step in answering this question is to calculate the likelihood of observing the observed chain summaries given the transmission properties. This is where the `likelihood()` function comes in. The returned estimate can then be used to infer the transmission properties using estimation frameworks such -as maximum likelihood or Bayesian inference. _epichains_ does not provide -these estimation frameworks. +as maximum likelihood or Bayesian inference. + +_epichains_ does not provide these estimation frameworks. ::: ::: {.alert .alert-secondary} From ff7254a5a93fd4e04a00684791d66479593d363b Mon Sep 17 00:00:00 2001 From: James Azam Date: Wed, 27 Mar 2024 10:38:47 +0000 Subject: [PATCH 19/20] Write WAIC in full --- vignettes/epichains.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index bb7a6ce9..06d5c45c 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -143,7 +143,7 @@ likelihood_eg ### Joint and individual log-likelihoods `likelihood()`, by default, returns the joint log-likelihood, given by the sum of log-likelihoods of each observed chain. If instead -the individual log-likelihoods are desired (for example for calculating [WAIC](https://en.wikipedia.org/wiki/Watanabe%E2%80%93Akaike_information_criterion) values, then the `individual` argument +the individual log-likelihoods are desired (for example for calculating [Watanabe–Akaike information criterion](https://en.wikipedia.org/wiki/Watanabe%E2%80%93Akaike_information_criterion) values, then the `individual` argument must be set to `TRUE`. To return likelihoods instead, set `log = FALSE`. ```{r estimate_likelihoods2} set.seed(121) From 82582574618f1e9b69d11c6405be724e5d55a0b0 Mon Sep 17 00:00:00 2001 From: James Azam Date: Wed, 27 Mar 2024 10:39:13 +0000 Subject: [PATCH 20/20] Improve explanation of susceptible depletion feature --- vignettes/epichains.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 06d5c45c..7cda2aa1 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -312,9 +312,9 @@ head(sim_chains) ``` ::: {.alert .alert-info} -Beyond the defaults, `simulate_chains()` can also simulate outbreaks based on -a specified population size and pre-existing immunity until the susceptible -pool runs out. +By default, `simulate_chains()` assumes an infinite population but can account for susceptible depletion when a finite `pop` is specified. + +Pre-existing immunity levels can also be specified, which will be applied to `pop` before the simulation is initialised. ::: Here is a quick example where we simulate an outbreak in a population of