Merge pull request #329 from Merck/289-vignette-of-the-information-un…

…der-the-null-and-alternative-hypothesis 289 vignette of the information under the null and alternative hypothesis
Merck · Feb 5, 2024 · 91e4e4d · 91e4e4d
2 parents eb1d509 + a81ead5
commit 91e4e4d
Show file tree

Hide file tree

Showing 2 changed files with 161 additions and 0 deletions.
diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -157,6 +157,7 @@ articles:
   - articles/story-compute-expected-events
   - articles/story-summarize-designs
   - articles/story-canonical-h0-h1
+  - articles/story-info-formula
 
 - title: "Design for binary endpoints"
   desc: >

diff --git a/vignettes/articles/story-info-formula.Rmd b/vignettes/articles/story-info-formula.Rmd
@@ -0,0 +1,160 @@
+---
+title: "Statistical information under null and alternative hypothesis"
+output:
+  rmarkdown::html_document:
+    toc: true
+    toc_float: true
+    toc_depth: 2
+    number_sections: true
+    highlight: "textmate"
+    css: "custom.css"
+bibliography: gsDesign2.bib
+vignette: |
+  %\VignetteIndexEntry{Statistical information under null and alternative hypothesis}
+  %\VignetteEncoding{UTF-8}
+  %\VignetteEngine{knitr::rmarkdown}
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+In a group sequential design of the $k$-th analysis, the Z-score is
+$$
+  Z_k = \delta_k / \sqrt{\text{Var}(\delta_k | H_0)}
+$$
+The statistical information $\mathcal I_k$ is defined as the inverse of the variance of $\delta_k$, i.e.,
+$$
+  \mathcal I_k = 1 / \text{Var}(\delta_k).
+$$
+
+# Continuous outcomes
+
+Imagine a trial with a continuous outcome. Let $X_{0, i} \sim N(\mu_0, \sigma^2)$ for subjects $i = 1, \ldots, n_0$ in the control arm and $X_{1,i} \sim N(\mu_1, \sigma^2)$ for patient $i$ ($i = 1, \ldots, n_1$) in the experimental arm.
+
+For a superiority design, the tested hypothesis is
+$$
+  H_0: \; \mu_0 = \mu_1 \;\;\; \text{vs.} \;\;\; H_1:\; \mu_1 > \mu_0.
+$$
+Suppose at the $k$-th analysis, there are $n_{0k}$ subjects in the control arm, and there are $n_{1k}$ subjects in the experimental arm. We have $\delta_k$ as the difference of group means, i.e.,
+$$
+  \delta_k
+  =
+  \frac{\sum_{i=1}^{n_{1k}} X_{i,1}}{n_{1k}}
+  -
+  \frac{\sum_{i=1}^{n_{0k}} X_{i,0}}{n_{0k}}.
+$$
+It can be estimated as $\widehat\delta_k = \frac{\sum_{i=1}^{n_{1k}} x_{i,1}}{n_{1k}} - \frac{\sum_{i=1}^{n_{0k}} x_{i,0}}{n_{0k}}$, where $x_{i,j}$ are the observation of $X_{i,j}$ of subject $i$ in arm $j$.
+
+The statistical information $\mathcal I_k$ is
+$$
+  \mathcal I_k^{-1}
+  =
+  \text{Var}(\delta_k | H_0)
+  =
+  \sigma^2 (1 / n_{1k} + 1 / n_{0k}),
+$$
+under both $H_0$ and $H_1$, which can be estimated as $\mathcal I_k = \widehat\sigma^2 (1 / n_{1k} + 1 / n_{0k})$.
+
+# Binary outcomes
+
+Imagine a trial with a binary outcome. Let $X_{0, i} \sim B(p_0)$ for patient $i = 1, \ldots, n_0$ and $X_{1,i} \sim B(p_1)$ for patient $i$ ($i = 1, \ldots, n_1$), where $p_0$ and $p_1$ are failure rate probability. Suppose at the $k$-th analysis, there are $n_{0k}$ subjects in the control arm, and there are $n_{1k}$ subjects in the experimental arm. For a superiority design, the null and alternative hypothesis is
+$$
+  H_0: \; p_0 = p_1 = p \;\;\; \text{vs.} \;\;\; H_1:\; p_0 > p_1.
+$$
+The nature-scale treatment effect is
+$$
+  \delta_k
+  =
+  \frac{\sum_{i=1}^{n_{1k}} X_{i,1}}{n_{1k}}
+  -
+  \frac{\sum_{i=1}^{n_{0k}} X_{i,0}}{n_{0k}},
+$$
+It can be estimated as $\widehat\delta_k = \frac{\sum_{i=1}^{n_{1k}} x_{i,1}}{n_{1k}} - \frac{\sum_{i=1}^{n_{0k}} x_{i,0}}{n_{0k}}$, where $x_{i,j}$ are the observation of $X_{i,j}$ of subject $i$ in arm $j$.
+
+The statistical information is
+$$
+  \mathcal I_k^{-1}
+  =
+  \text{Var}(\delta_k)
+  =
+  \left\{
+    \begin{array}{ll}
+      p(1-p)/n_{1k} + p(1-p)/n_{0k}           & \text{under } H_0\\
+      p_1(1-p_1)/n_{1k} + p_0(1-p_0)/n_{0k}   & \text{under } H_1\\
+    \end{array}
+  \right..
+$$
+Its estimation is
+$$
+  \widehat{\mathcal I}_k^{-1}
+  =
+  \left\{
+    \begin{array}{ll}
+      \bar p(1 - \bar p) / n_{1k} + \bar p(1 - \bar p) / n_{0k}           & \text{under } H_0\\
+      \widehat p_1(1-p_1) / n_{1k} + \widehat p_0(1 - \widehat p_0)/n_{0k}   & \text{under } H_1\\
+    \end{array}
+  \right.,
+$$
+where $\bar p = \frac{\sum_{i=1}^{n_{1k}}x_{i1} + \sum_{i=1}^{n_{0k}}x_{i0}}{n_{1k} + n_{0k}}$, $\widehat p_j = \frac{\sum_{i=1}^{n_{jk}}x_{ij}}{n_{jk}}$ for $j = 0, 1$.
+
+
+# Survival outcome
+
+In many clinical trials, the outcome is the time to some event. For simplicity, assume the event is death so that each person can only have one event; the same ideas apply for events that can recur, but in those cases we restrict attention to the first event for each patients. We use the logrank statistics to compare the treatment and control arms.
+If we assume there are $N_k$ total number of deaths at analysis $k$. The numerator of the logrank statistics at analysis $k$ is [@proschan2006statistical]
+$$
+  \sum_{i=1}^{N_k} D_i,
+$$
+where $D_i = O_i - E_i$ with $O_i$ is the indicator that the $i$th death occurred in a treatment patient, and $E_i = m_{1i} / (m_{0i} + m_{1i})$ as the null expectation of $O_i$ given the respective numbers, $m_{0i}$ and $m_{1i}$, of control and treatment patients at risk just prior to the $i$th death.
+
+
+Conditioned on $m_{0i}$ and $m_{1i}$, the $O_i$ has a Bernoulli distribution with parameter $E_i$. The null conditional mean and variance of $D_i$ are 0 and $V_i = E_i(1 − E_i)$, respectively.
+
+Unconditionally, the $D_i$ are uncorrelated, mean 0 random variables with variance $E(V_i)$ under the null hypothesis.
+
+Thus, conditioned on $N_k$, we have
+$$
+\begin{array}{ccl}
+  \mathcal I_k^{-1}
+  & = &
+  \text{Var}(\delta_k)
+  =
+  \sum_{i=1}^{N_k} \text{Var}(D_i)
+  =
+  \sum_{i=1}^{N_k} E(V_i)
+  =
+  E \left( \sum_{i=1}^{N_k} V_i \right)
+  =
+  E \left( \sum_{i=1}^{N_k} E_i(1 − E_i) \right) \\
+  & = &
+  \left\{
+  \begin{array}{ll}
+  E\left(\sum_{i=1}^{N_k} \frac{r}{1+r} \frac{1}{1+r} \right)
+  & \text{under } H_0\\
+  E\left(\sum_{i=1}^{N_k} \frac{m_{1i}}{(m_{0i} + m_{1i})} \frac{m_{0i}}{(m_{0i} + m_{1i})}\right)
+  & \text{under } H_1
+  \end{array}
+  \right.,
+\end{array}
+$$
+where $r$ is the randomization ratio. Its estimation is
+$$
+\begin{array}{ccl}
+  \widehat{\mathcal I}_k^{-1}
+  & = &
+  \left\{
+  \begin{array}{ll}
+  \sum_{i=1}^{N_k} \frac{r}{1+r} \frac{1}{1+r}
+  & \text{under } H_0\\
+  \sum_{i=1}^{N_k} \frac{m_{1i}}{(m_{0i} + m_{1i})} \frac{m_{0i}}{(m_{0i} + m_{1i})}
+  & \text{under } H_1
+  \end{array}
+  \right..
+\end{array}
+$$
+
+# References