First, we import the gsDesign library for sequential monitoring analyses.
library(gsDesign)
Calculate boundaries for k = 5 equally spaced analyses with a two-sided test and α=5%. Note that gsDesign uses a one-sided α, so we specify α=0.025 for a 5% two-sided test.
The gsDesign function offers a variety of analyses; however, this discussion concentrates on boundary calculations. We will conduct five analyses (four interim and one final). By default, gsDesign assumes equally spaced analyses. It is crucial to specify that we want a two-sided test (i.e., both lower and upper boundaries) and to define the type-I error rate. Notably, the alpha argument of the function is one-sided; therefore, to achieve a two-sided 5% alpha level, we must specify alpha = 0.025.
The following are the bounds proposed by Pocock for k=5 equally spaced analyses (four interim and one final):
pocock_design <- gsDesign(k = 5, test.type = 2, alpha = 0.025, sfu = "Pocock")
gsBoundSummary(pocock_design)
## Analysis Value Efficacy Futility
## IA 1: 20% Z 2.4132 -2.4132
## N/Fixed design N: 0.24 p (1-sided) 0.0079 0.0079
## ~delta at bound 1.5155 -1.5155
## P(Cross) if delta=0 0.0079 0.0079
## P(Cross) if delta=1 0.2059 0.0000
## IA 2: 40% Z 2.4132 -2.4132
## N/Fixed design N: 0.48 p (1-sided) 0.0079 0.0079
## ~delta at bound 1.0716 -1.0716
## P(Cross) if delta=0 0.0138 0.0138
## P(Cross) if delta=1 0.4661 0.0000
## IA 3: 60% Z 2.4132 -2.4132
## N/Fixed design N: 0.72 p (1-sided) 0.0079 0.0079
## ~delta at bound 0.8749 -0.8749
## P(Cross) if delta=0 0.0183 0.0183
## P(Cross) if delta=1 0.6747 0.0000
## IA 4: 80% Z 2.4132 -2.4132
## N/Fixed design N: 0.97 p (1-sided) 0.0079 0.0079
## ~delta at bound 0.7577 -0.7577
## P(Cross) if delta=0 0.0219 0.0219
## P(Cross) if delta=1 0.8149 0.0000
## Final Z 2.4132 -2.4132
## N/Fixed design N: 1.21 p (1-sided) 0.0079 0.0079
## ~delta at bound 0.6777 -0.6777
## P(Cross) if delta=0 0.0250 0.0250
## P(Cross) if delta=1 0.9000 0.0000
The boundaries remain constant across analyses: the critical value for the two-sided test is Z=±2.41, and the nominal two-sided p-value at each analysis is 2×0.0079=0.0158. You can ignore the remaining output at this stage. We can also use the plot function to produce a graph.
library(ggplot2)
par(mar = c(4, 4, 1, 1))
ss1 <- plot(pocock_design, main = "")
ss2 <- ss1 +
  theme(legend.position = "none") +
  theme(axis.text = element_text(size = 10)) +
  theme(axis.title = element_text(size = 15, face = "bold"))
ss2
Figure 1. Pocock boundaries assuming k=5 looks and \alpha=0.05.
The following are the Z statistics and the associated p-value boundaries:
# Extract Z statistics and two-sided p-values from the boundary summary
pocock_summary <- gsBoundSummary(pocock_design)
pocock_bounds <- cbind(
  Z = pocock_summary$Efficacy[pocock_summary$Value == "Z"],
  P = 2 * pocock_summary$Efficacy[pocock_summary$Value == "p (1-sided)"]
)
pocock_bounds
## Z P
## [1,] 2.4132 0.0158
## [2,] 2.4132 0.0158
## [3,] 2.4132 0.0158
## [4,] 2.4132 0.0158
## [5,] 2.4132 0.0158
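As a quick check, the nominal p-values can be recovered from the Z boundary with base R alone. This is a minimal sketch: the Z value is copied from the output above, and the variable names are illustrative rather than part of gsDesign.

```r
# Pocock critical value copied from the gsBoundSummary output above
z_pocock <- 2.4132

# One-sided tail area beyond the boundary, then doubled for the two-sided test
p_one_sided <- 1 - pnorm(z_pocock)
p_two_sided <- 2 * p_one_sided

round(c(one_sided = p_one_sided, two_sided = p_two_sided), 4)
```

To four decimals, the two values should agree with the 0.0079 and 0.0158 reported above.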
Notice how far the p-value at the last (fifth) analysis, 0.0158, is from the nominal 5% alpha level! Another point to keep in mind is that introducing the Pocock interim reviews affects the sample size. Recall the following line from the output above:
## N/Fixed design N: 1.21 p (1-sided) 0.0079 0.0079
In fact, the maximum sample size (the one needed if the study continues to the final analysis) is inflated by about 21% compared to the fixed design (more on that later).
To specify the O’Brien-Fleming method, just use sfu = "OF". According to this method, it is much more difficult to reject the null hypothesis at the beginning of the study (i.e., you require a higher level of evidence to do so). The advantage is that the p-value is close to the nominal level at the final analysis.
# O'Brien-Fleming boundaries
of_design <- gsDesign(k = 5, test.type = 2, alpha = 0.025, sfu = "OF")
gsBoundSummary(of_design)
## Analysis Value Efficacy Futility
## IA 1: 20% Z 4.5617 -4.5617
## N/Fixed design N: 0.21 p (1-sided) 0.0000 0.0000
## ~delta at bound 3.1059 -3.1059
## P(Cross) if delta=0 0.0000 0.0000
## P(Cross) if delta=1 0.0010 0.0000
## IA 2: 40% Z 3.2256 -3.2256
## N/Fixed design N: 0.41 p (1-sided) 0.0006 0.0006
## ~delta at bound 1.5530 -1.5530
## P(Cross) if delta=0 0.0006 0.0006
## P(Cross) if delta=1 0.1254 0.0000
## IA 3: 60% Z 2.6337 -2.6337
## N/Fixed design N: 0.62 p (1-sided) 0.0042 0.0042
## ~delta at bound 1.0353 -1.0353
## P(Cross) if delta=0 0.0045 0.0045
## P(Cross) if delta=1 0.4675 0.0000
## IA 4: 80% Z 2.2809 -2.2809
## N/Fixed design N: 0.82 p (1-sided) 0.0113 0.0113
## ~delta at bound 0.7765 -0.7765
## P(Cross) if delta=0 0.0128 0.0128
## P(Cross) if delta=1 0.7516 0.0000
## Final Z 2.0401 -2.0401
## N/Fixed design N: 1.03 p (1-sided) 0.0207 0.0207
## ~delta at bound 0.6212 -0.6212
## P(Cross) if delta=0 0.0250 0.0250
## P(Cross) if delta=1 0.9000 0.0000
The following are the Z statistics and the associated p-value boundaries:
# Z statistics and two-sided p-values from the boundary summary
of_summary <- gsBoundSummary(of_design)
cbind(
  Z = of_summary$Efficacy[of_summary$Value == "Z"],
  P = 2 * of_summary$Efficacy[of_summary$Value == "p (1-sided)"]
)
## Z P
## [1,] 4.5617 0.0000
## [2,] 3.2256 0.0012
## [3,] 2.6337 0.0084
## [4,] 2.2809 0.0226
## [5,] 2.0401 0.0414
Note that these p-values are twice the one-sided values listed in the previous output. Also note that the p-value for the final analysis (0.0414) is not too far from the nominal 5% alpha level despite four interim analyses. We will see why in a minute.
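One way to see the structure of these boundaries: with K equally spaced analyses, the classical O'Brien-Fleming critical values take the form c_k = c_K \sqrt{K/k}, so the early thresholds are large multiples of the final one. The sketch below checks this relationship against the output above using base R. The Z values are copied from that output, and the variable names are illustrative.

```r
K <- 5
k <- 1:K

# Z boundaries copied from the O'Brien-Fleming output above
z_reported <- c(4.5617, 3.2256, 2.6337, 2.2809, 2.0401)

# Boundaries implied by scaling the final critical value by sqrt(K / k)
z_final <- z_reported[K]
z_implied <- z_final * sqrt(K / k)

round(cbind(k, z_reported, z_implied, difference = z_reported - z_implied), 4)
```

The two columns should agree up to the rounding of the printed output.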
Another issue has to do with the sample inflation caused by the O’Brien-Fleming procedure. Recall the output from above:
## N/Fixed design N: 1.03 p (1-sided) 0.0207 0.0207
We see that the maximum sample size is inflated by less than 3% with four interim analyses, another advantage of the O’Brien-Fleming procedure.
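To make the two inflation factors concrete, the sketch below applies them to a hypothetical fixed-design sample size of 400. That number is an assumption chosen for illustration only; the factors are the rounded "N/Fixed design N" values from the two outputs above.

```r
# Hypothetical fixed-design sample size (assumed for illustration)
n_fixed <- 400

# Rounded inflation factors read from the final rows of the two summaries
inflation <- c(Pocock = 1.21, OBrienFleming = 1.03)

# Maximum sample size of each group sequential design, rounded up to whole
# subjects (round() guards against floating-point noise before ceiling())
n_max <- ceiling(round(n_fixed * inflation, 6))

rbind(n_max = n_max, extra_subjects = n_max - n_fixed)
```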
par(mar = c(4, 4, 1, 1))
ss1 <- plot(of_design, main = "")
ss2 <- ss1 +
  theme(legend.position = "none") +
  theme(axis.text = element_text(size = 10)) +
  theme(axis.title = element_text(size = 15, face = "bold"))
ss2
Figure 2. O’Brien-Fleming boundaries assuming k=5 looks and \alpha=0.05.
Because we want to be able to carry out an analysis at any point, including ad hoc analyses added during the execution of the study, we adopt the spending function approach. This approach governs how \alpha (the Type-I error) is spent during the study.
Consider a situation where we spend the two-sided \alpha=0.05 equally over k=5 analyses, such that \alpha_1=0.01, \cdots, \alpha_5=0.01 (that is, 0.005 on each side at each analysis).
This situation is handled by the following R code:
t <- c(0.2, 0.4, 0.6, 0.8, 1.0)
x <- gsDesign(k = 5, test.type = 2, beta = 0.2,
sfu = sfLinear, sfupar = c(t, t))
x
## Symmetric two-sided group sequential design with
## 80 % power and 2.5 % Type I Error.
## Spending computations assume trial stops
## if a bound is crossed.
##
## Sample
## Size
## Analysis Ratio* Z Nominal p Spend
## 1 0.23 2.58 0.0050 0.005
## 2 0.46 2.49 0.0064 0.005
## 3 0.69 2.41 0.0080 0.005
## 4 0.92 2.34 0.0097 0.005
## 5 1.15 2.28 0.0114 0.005
## Total 0.0250
##
## ++ alpha spending:
## Piecewise linear spending function with line points = 0.2, line points = 0.4, line points = 0.6, line points = 0.8, line points = 1, line points = 0.2, line points = 0.4, line points = 0.6, line points = 0.8, line points = 1.
## * Sample size ratio compared to fixed design with no interim
##
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
##
## Upper boundary (power or Type I Error)
## Analysis
## Theta 1 2 3 4 5 Total E{N}
## 0.0000 0.0050 0.0050 0.0050 0.0050 0.0050 0.025 1.1268
## 2.8016 0.1089 0.1908 0.2036 0.1714 0.1253 0.800 0.7849
##
## Lower boundary (futility or Type II Error)
## Analysis
## Theta 1 2 3 4 5 Total
## 0.0000 0.005 0.005 0.005 0.005 0.005 0.025
## 2.8016 0.000 0.000 0.000 0.000 0.000 0.000
This is shown pictorially as follows:
par(mar = c(4, 4, 1, 1))
ss1 <- plot(x, main = "")
ss2 <- ss1 +
  theme(legend.position = "none") +
  theme(axis.text = element_text(size = 10)) +
  theme(axis.title = element_text(size = 15, face = "bold"))
ss2
Figure 3. Bounds for equally spending \alpha=0.05 over k=5 analyses.
Note that despite spending an equal amount of alpha at each analysis, the bounds are not constant. This is because, for the first analysis,
P(|Z_1|>c_1)=\alpha_1=0.01
so that c_1=z_{1-\alpha_1/2}, whereas for the second analysis the statistic Z_2 is correlated with Z_1, and c_2 must satisfy the joint probability of continuing past the first analysis and crossing at the second:
P(|Z_1|\le c_1, |Z_2|>c_2)=\alpha_2=0.01
Thus, the boundary points c_1\neq c_2, and so on for the later analyses.
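The first two boundary points can be reproduced numerically with base R. This is a sketch under two stated assumptions: the alpha spent at the second analysis is read as the probability of not crossing at the first analysis and crossing at the second (consistent with the Spend column of the output), and Z_1 and Z_2 are taken to be standard bivariate normal with correlation \sqrt{t_1/t_2}, the usual model for equally informative increments. The helper name first_cross_at_2 is illustrative.

```r
alpha_k <- 0.01                   # two-sided alpha spent at each analysis
t1 <- 0.2; t2 <- 0.4              # trial fractions of the first two analyses
corr_12 <- sqrt(t1 / t2)          # assumed correlation between Z_1 and Z_2

# First boundary: ordinary two-sided normal quantile
c1 <- qnorm(1 - alpha_k / 2)

# P(|Z_1| <= c1, |Z_2| > c2), integrating over z1 and using
# Z_2 | Z_1 = z1 ~ N(corr_12 * z1, 1 - corr_12^2)
first_cross_at_2 <- function(c2) {
  s <- sqrt(1 - corr_12^2)
  integrand <- function(z1) {
    tail_prob <- pnorm((-c2 - corr_12 * z1) / s) +
      (1 - pnorm((c2 - corr_12 * z1) / s))
    dnorm(z1) * tail_prob
  }
  integrate(integrand, lower = -c1, upper = c1)$value
}

# Second boundary: solve for the c2 that spends exactly alpha_k
c2 <- uniroot(function(c2) first_cross_at_2(c2) - alpha_k, interval = c(1, 5))$root

round(c(c1 = c1, c2 = c2), 2)
```

Under these assumptions the two values should be close to the Z boundaries of 2.58 and 2.49 reported for the first two analyses.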
In a paper from 1983, the concept of a continuous (alpha) spending function was introduced, defined across all points t \in [0,1] (i.e., at all trial fractions from the beginning to the end of the study). Several families of such functions exist:
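For Pocock-like and O’Brien-Fleming-like boundaries, these are \alpha(t)=\alpha\ln\left\{1+(e-1)t\right\} and \alpha(t)=2-2\Phi(z_{1-\alpha/2}/\sqrt{t}), respectively, where z_{1-\alpha/2} is the standard normal quantile, so that both functions satisfy \alpha(1)=\alpha.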
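The two functions can be evaluated directly in base R. The sketch below assumes the O'Brien-Fleming-like function is written with the upper quantile z_{1-\alpha/2}, so that all of \alpha is spent at t=1. The function names sf_pocock and sf_of are illustrative and are not gsDesign functions.

```r
alpha <- 0.05
t <- c(0.2, 0.4, 0.6, 0.8, 1.0)

# Pocock-like spending function: alpha * log(1 + (e - 1) * t)
sf_pocock <- function(t, alpha) alpha * log(1 + (exp(1) - 1) * t)

# O'Brien-Fleming-like spending function, using the quantile z_{1 - alpha/2}
sf_of <- function(t, alpha) 2 - 2 * pnorm(qnorm(1 - alpha / 2) / sqrt(t))

# Cumulative alpha spent at each trial fraction
spent <- cbind(t, pocock = sf_pocock(t, alpha), of = sf_of(t, alpha))
round(spent, 5)
```

Both columns reach \alpha at t=1, but the O'Brien-Fleming-like function spends far less at the early looks.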
The power family of spending functions has the general form \alpha(t)=\min\left(\alpha t^\rho, \alpha\right),
where \rho>0. Spending functions with \rho=0.8 approximate the Pocock spending function (note that this is close to the linear function, \rho=1, employed in the equal-spending example above), while \rho=3 approximates the O’Brien-Fleming function.
Power spending functions for several choices of \rho are shown in Figure 4.
# Plot of power family error spending functions
t <- seq(0, 1, .05)
p08 <- 0.05*t^0.8 # Pocock
p1 <- 0.05*t # Linear
p3 <- 0.05*t^3.0 # O-F
plot(t, p08, lty = 2, lwd = 2, type = "l", ylab = "Alpha spent", xlab = "Trial fraction", main = "")
lines(t, p1, lty = 1, type = "l", lwd = 2)
lines(t, p3, lty = 3, type = "l", lwd = 2)
legend(0, .05, legend = c(expression(paste(rho, " = ", 0.8, " (Pocock)")),
expression(paste(rho, " = ", 1, " (linear)")),
expression(paste(rho, " = ", 3, " (O-F)"))),
lty = c(2, 1, 3), lwd = c(2, 2, 2), title = "Power spending functions")
Figure 4. Power family of spending functions for various choices of \rho
Hwang et al. (1990) proposed the following family of spending functions:
\alpha(t)=\begin{cases} \dfrac{\alpha(1-e^{-\gamma t})}{1-e^{-\gamma}} & \text{if } \gamma\neq 0 \\ \alpha t & \text{if } \gamma=0 \end{cases}
Alpha is spent more quickly as \gamma increases (Figure 5).
# Plot of HSD error spending functions
t <- seq(0, 1, .05)
g0 <- 0.05*t # Pocock
g4 <- 0.05*(1-exp(4*t))/(1-exp(4)) # O-F
g2 <- 0.05*(1-exp(-2*t))/(1-exp(-2))
plot(t, g2, lty = 2, lwd = 2, type = "l",
ylab = "Alpha spent", xlab = "Trial fraction", main = "")
lines(t, g0, lty = 1, lwd = 2)
lines(t, g4, lty = 3, lwd = 2)
legend(0, .05, legend = c(expression(paste(gamma, " = ", 2)),
expression(paste(gamma, " = ", 0, " (Pocock)")),
expression(paste(gamma, " = ", -4, " (O-F)"))),
lty = c(2, 1, 3), lwd = c(2, 2, 2), title = "HSD spending functions")
Figure 5. HSD family of spending functions for various choices of \gamma
Spending functions were a “game changer” in monitoring clinical trials. The following simple example illustrates this fact.
Suppose you planned a study with k=3 total analyses (two interim and one final), carried out at trial fractions \tau_1=20\%, \tau_2=50\% and \tau_3=100\%. The analyses are conducted at \alpha=0.025 (one-sided) with power of 90% (i.e., \beta=0.1), using a power spending function with \rho=1 (i.e., linear spending of the Type-I error).
The study design at start is as follows:
# Inserting an additional interim analysis
t <- c(0.2, 0.5, 1.0)
gsDesign(k = 3, test.type = 1, sfu = sfLinear, sfupar = c(t, t), timing = t)
## One-sided group sequential design with
## 90 % power and 2.5 % Type I Error.
## Sample
## Size
## Analysis Ratio* Z Nominal p Spend
## 1 0.218 2.58 0.0050 0.0050
## 2 0.544 2.38 0.0087 0.0075
## 3 1.088 2.14 0.0161 0.0125
## Total 0.0250
##
## ++ alpha spending:
## Piecewise linear spending function with line points = 0.2, line points = 0.5, line points = 1, line points = 0.2, line points = 0.5, line points = 1.
## * Sample size ratio compared to fixed design with no interim
##
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
##
## Upper boundary (power or Type I Error)
## Analysis
## Theta 1 2 3 Total E{N}
## 0.0000 0.0050 0.0075 0.0125 0.025 1.0791
## 3.2415 0.1436 0.3772 0.3791 0.900 0.7574
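The alpha level will be spent over the three analyses as shown in the Spend column: 0.0050 at the first analysis, 0.0075 at the second, and 0.0125 at the final analysis.
The final analysis will be carried out at a nominal p-value of p=0.0161.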
Now suppose that after the second interim analysis at trial fraction \tau_2=0.5, we (or, more likely, the DSMB reviewing this study) decide to add another interim analysis at \tau^*_3=0.75, so that the final analysis now takes place at \tau^*_4=1.00. Spending functions allow us to do this almost effortlessly! Consider the following output:
# adding an interim look at t=0.75
t <- c(0.2, 0.5, 0.75, 1.0)
gsDesign(k = 4, test.type = 1, sfu = sfLinear, sfupar = c(t, t), timing = t)
## One-sided group sequential design with
## 90 % power and 2.5 % Type I Error.
## Sample
## Size
## Analysis Ratio* Z Nominal p Spend
## 1 0.225 2.58 0.0050 0.0050
## 2 0.562 2.38 0.0087 0.0075
## 3 0.843 2.32 0.0102 0.0063
## 4 1.124 2.24 0.0124 0.0062
## Total 0.0250
##
## ++ alpha spending:
## Piecewise linear spending function with line points = 0.2, line points = 0.5, line points = 0.75, line points = 1, line points = 0.2, line points = 0.5, line points = 0.75, line points = 1.
## * Sample size ratio compared to fixed design with no interim
##
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
##
## Upper boundary (power or Type I Error)
## Analysis
## Theta 1 2 3 4 Total E{N}
## 0.0000 0.0050 0.0075 0.0063 0.0063 0.025 1.1134
## 3.2415 0.1494 0.3870 0.2326 0.1310 0.900 0.7067
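Several aspects of this output are worth noting: the boundaries at the first two analyses are unchanged (Z=2.58 and Z=2.38); the alpha remaining after the second analysis (0.0125) is now split between the new interim analysis and the final analysis; and the nominal p-value at the final analysis drops from 0.0161 to 0.0124.
Thus, spending alpha at an additional interim analysis has a downside: it requires a higher evidence threshold at the final analysis if the study is not stopped at any interim analysis.
Introducing several interim analyses on top of a fixed design invariably increases the required sample size to preserve power.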
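The Spend columns of the three-look and four-look designs above follow directly from the linear spending function: the alpha spent at each analysis is the increase in \alpha t since the previous look. A minimal base R sketch, where the helper name spend_increments is illustrative:

```r
alpha <- 0.025  # one-sided Type-I error, as in the designs above

# Alpha spent at each look under linear spending: increments of alpha * t
spend_increments <- function(timing, alpha) diff(c(0, alpha * timing))

original_plan <- spend_increments(c(0.2, 0.5, 1.0), alpha)
revised_plan  <- spend_increments(c(0.2, 0.5, 0.75, 1.0), alpha)

original_plan
revised_plan
```

The first two increments are identical in both plans; only the 0.0125 left after the second analysis is redistributed, in two equal parts of 0.00625.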
Consider the three following cases shown in Figure 6:
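# Sample size relative to the fixed design for gamma = 0, -1, ..., -10
# (first element of en, the expected sample sizes stored in the design)
x <- sapply(seq(0, -10, -1), function(g) {
  gsDesign(k = 5, test.type = 2, alpha = 0.025, sfu = sfHSD, sfupar = g)$en[1]
})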
par(mar = c(4, 4, 1, 1))
plot(seq(0, -10, -1), x, type = "b", ylab = "Sample size inflation",
xlab = expression(paste(gamma, " parameter in the HSD spending function")))
Figure 6. Sample size inflation (compared to the fixed design) according to \gamma in the HSD family of spending functions
To understand why there is significant inflation in sample size we can consider the rates of spending (recall Figure 5 above).
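The rates of spending can also be tabulated. The sketch below evaluates the HSD spending function at five equally spaced looks for a few values of \gamma and reports the fraction of \alpha spent by each look. The helper name hsd_spend and the particular \gamma values are choices made for illustration.

```r
# HSD spending function, following the definition given earlier
hsd_spend <- function(t, alpha, gamma) {
  if (gamma == 0) {
    alpha * t
  } else {
    alpha * (1 - exp(-gamma * t)) / (1 - exp(-gamma))
  }
}

alpha <- 0.025
looks <- seq(0.2, 1, by = 0.2)
gammas <- c(0, -2, -4, -10)

# Rows: trial fractions; columns: gamma; entries: cumulative fraction of alpha spent
fraction_spent <- sapply(gammas, function(g) hsd_spend(looks, alpha, g) / alpha)
dimnames(fraction_spent) <- list(paste0("t=", looks), paste0("gamma=", gammas))

round(fraction_spent, 4)
```

The more negative \gamma is, the smaller the fraction of \alpha spent before the final analysis.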