Introduction

First, we import the gsDesign library for sequential monitoring analyses.

library(gsDesign)

Pocock Boundaries (Equally-Spaced Analyses)

Calculate boundaries for $k=5$ equally spaced analyses with a two-sided test at $\alpha=5\%$. Note that gsDesign uses a one-sided $\alpha$, so we specify $\alpha=0.025$ for a 5% two-sided test.

The R package `gsDesign`

The gsDesign function offers a variety of analyses; however, this discussion will concentrate on boundary calculations. We will conduct five analyses, including both interim and final evaluations. By default, gsDesign assumes equally spaced analyses. It is crucial to specify that we desire a two-sided test (i.e., both lower and upper boundaries) and to define the type-I error rate. Notably, the function defaults to a one-sided type-I error; therefore, to achieve a 5% alpha level, we must specify alpha = 0.025.

Pocock boundaries

The following outlines the bounds proposed by Pocock for $k=5$ equally spaced interim analyses:

pocock_design <- gsDesign(k = 5, test.type = 2, alpha = 0.025, sfu = "Pocock")
gsBoundSummary(pocock_design)

##                Analysis               Value Efficacy Futility
##               IA 1: 20%                   Z   2.4132  -2.4132
##  N/Fixed design N: 0.24         p (1-sided)   0.0079   0.0079
##                             ~delta at bound   1.5155  -1.5155
##                         P(Cross) if delta=0   0.0079   0.0079
##                         P(Cross) if delta=1   0.2059   0.0000
##               IA 2: 40%                   Z   2.4132  -2.4132
##  N/Fixed design N: 0.48         p (1-sided)   0.0079   0.0079
##                             ~delta at bound   1.0716  -1.0716
##                         P(Cross) if delta=0   0.0138   0.0138
##                         P(Cross) if delta=1   0.4661   0.0000
##               IA 3: 60%                   Z   2.4132  -2.4132
##  N/Fixed design N: 0.72         p (1-sided)   0.0079   0.0079
##                             ~delta at bound   0.8749  -0.8749
##                         P(Cross) if delta=0   0.0183   0.0183
##                         P(Cross) if delta=1   0.6747   0.0000
##               IA 4: 80%                   Z   2.4132  -2.4132
##  N/Fixed design N: 0.97         p (1-sided)   0.0079   0.0079
##                             ~delta at bound   0.7577  -0.7577
##                         P(Cross) if delta=0   0.0219   0.0219
##                         P(Cross) if delta=1   0.8149   0.0000
##                   Final                   Z   2.4132  -2.4132
##  N/Fixed design N: 1.21         p (1-sided)   0.0079   0.0079
##                             ~delta at bound   0.6777  -0.6777
##                         P(Cross) if delta=0   0.0250   0.0250
##                         P(Cross) if delta=1   0.9000   0.0000

The boundaries are constant across analyses: the critical value is $Z=\pm2.41$ at every look, and the corresponding two-sided nominal p-value is $2\times 0.0079=0.0158$. You can ignore the remaining output at this stage. We can also use the `plot` function to produce a graph.

library(ggplot2)
par(mar = c(4, 4, 1, 1))
ss1 <- plot(pocock_design, main = "")
ss2 <- ss1 + 
  theme(legend.position = "none") +
  theme(axis.text = element_text(size = 10)) +
  theme(axis.title = element_text(size = 15, face = "bold"))
ss2

Figure 1. Pocock boundaries assuming $k=5$ looks and $\alpha=0.05$.

The following are the $Z$ statistics and the associated $p$ -value boundaries:

# Extract Z statistics and p-values
pocock_bounds <- cbind(
  Z = gsBoundSummary(pocock_design)$Eff[gsBoundSummary(pocock_design)$Val == "Z"],
  P = 2 * gsBoundSummary(pocock_design)$Eff[gsBoundSummary(pocock_design)$Val == "p (1-sided)"]
)
pocock_bounds

##           Z      P
## [1,] 2.4132 0.0158
## [2,] 2.4132 0.0158
## [3,] 2.4132 0.0158
## [4,] 2.4132 0.0158
## [5,] 2.4132 0.0158

Notice how far the two-sided p-value at the last (fifth) analysis, 0.0158, is from the nominal 5% alpha level! Note also that introducing the Pocock interim reviews affects the sample size. Recall the following line from the output above:

##  N/Fixed design N: 1.21         p (1-sided)   0.0079   0.0079

In fact, the maximum sample size (the one needed if the study runs to the final analysis) is inflated by about 21% compared to the fixed design (more on that later).
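To see why a constant as large as 2.4132 is needed at every look, the following sketch simulates the five interim Z statistics under the null hypothesis and compares two rules: rejecting whenever $|Z|>1.96$ at any look, and rejecting whenever $|Z|>2.4132$ at any look. It is a minimal base R illustration under assumed conditions (normally distributed data with known variance and five equal increments of information); the names `n_sim`, `naive_rate` and `pocock_rate` are illustrative and are not part of gsDesign.

```r
set.seed(2025)
n_sim <- 20000   # number of simulated trials (assumed, for illustration)
k <- 5           # number of equally spaced analyses

# Each row is one trial; each column is an independent N(0, 1) increment
# of information accrued between consecutive analyses.
increments <- matrix(rnorm(n_sim * k), nrow = n_sim)

# Cumulative sums along each row, then standardise: Z_j = S_j / sqrt(j)
cum_sums <- t(apply(increments, 1, cumsum))
z <- sweep(cum_sums, 2, sqrt(1:k), "/")

# Largest |Z| observed over the five looks in each simulated trial
max_abs_z <- apply(abs(z), 1, max)

# Overall two-sided type I error when testing at every look
naive_rate  <- mean(max_abs_z > qnorm(0.975))  # unadjusted 1.96 bound
pocock_rate <- mean(max_abs_z > 2.4132)        # Pocock bound from the output

round(c(naive = naive_rate, pocock = pocock_rate), 3)
```

The unadjusted rate should come out well above 5%, while the rate under the Pocock bound should be close to 5%, up to simulation error.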

O’Brien-Fleming boundaries (Equally-Spaced Analyses)

To specify the O’Brien-Fleming method, just use sfu = "OF". According to this method, it is much more difficult to reject the null hypothesis at the beginning of the study (i.e., you require a higher level of evidence to do so). The advantage is that the p-value is close to the nominal level at the final analysis.

# O'Brien-Fleming boundaries
of_design <- gsDesign(k = 5, test.type = 2, alpha = 0.025, sfu = "OF")
gsBoundSummary(of_design)

##                Analysis               Value Efficacy Futility
##               IA 1: 20%                   Z   4.5617  -4.5617
##  N/Fixed design N: 0.21         p (1-sided)   0.0000   0.0000
##                             ~delta at bound   3.1059  -3.1059
##                         P(Cross) if delta=0   0.0000   0.0000
##                         P(Cross) if delta=1   0.0010   0.0000
##               IA 2: 40%                   Z   3.2256  -3.2256
##  N/Fixed design N: 0.41         p (1-sided)   0.0006   0.0006
##                             ~delta at bound   1.5530  -1.5530
##                         P(Cross) if delta=0   0.0006   0.0006
##                         P(Cross) if delta=1   0.1254   0.0000
##               IA 3: 60%                   Z   2.6337  -2.6337
##  N/Fixed design N: 0.62         p (1-sided)   0.0042   0.0042
##                             ~delta at bound   1.0353  -1.0353
##                         P(Cross) if delta=0   0.0045   0.0045
##                         P(Cross) if delta=1   0.4675   0.0000
##               IA 4: 80%                   Z   2.2809  -2.2809
##  N/Fixed design N: 0.82         p (1-sided)   0.0113   0.0113
##                             ~delta at bound   0.7765  -0.7765
##                         P(Cross) if delta=0   0.0128   0.0128
##                         P(Cross) if delta=1   0.7516   0.0000
##                   Final                   Z   2.0401  -2.0401
##  N/Fixed design N: 1.03         p (1-sided)   0.0207   0.0207
##                             ~delta at bound   0.6212  -0.6212
##                         P(Cross) if delta=0   0.0250   0.0250
##                         P(Cross) if delta=1   0.9000   0.0000

The following are the $Z$ statistics and the associated $p$ -value boundaries:

# Z's and p-values
cbind(
  Z = gsBoundSummary(of_design)$Eff[gsBoundSummary(of_design)$Val == "Z"],
  P = 2 * gsBoundSummary(of_design)$Eff[gsBoundSummary(of_design)$Val == "p (1-sided)"]
)

##           Z      P
## [1,] 4.5617 0.0000
## [2,] 3.2256 0.0012
## [3,] 2.6337 0.0084
## [4,] 2.2809 0.0226
## [5,] 2.0401 0.0414

Note that these p-values are twice the one-sided values listed in the previous output. Also note that the p-value for the final analysis is not too far from the nominal 5% alpha level despite four interim analyses. We will see why in a minute.
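The pattern in these bounds can be checked by hand. With equally spaced looks, the classical O’Brien-Fleming critical values are a constant divided by the square root of the trial fraction, so the bound at look $j$ equals the final bound multiplied by $\sqrt{k/j}$. The base R sketch below rebuilds the table from the final value 2.0401 alone. The scaling rule is a standard property of the design and is used here as an assumption; it is not something gsDesign prints. The object names are illustrative.

```r
k <- 5
z_final <- 2.0401   # final critical value taken from the gsDesign output
looks <- 1:k

# O'Brien-Fleming shape: bound at look j is the final bound times sqrt(k / j)
of_z <- z_final * sqrt(k / looks)

# Two-sided nominal p-values implied by those critical values
of_p_two_sided <- 2 * pnorm(-of_z)

round(cbind(Z = of_z, P = of_p_two_sided), 4)
```

Up to rounding of the final bound, the result should agree with the Z and P columns shown above.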

Another issue has to do with the sample inflation caused by the O’Brien-Fleming procedure. Recall the output from above:

##  N/Fixed design N: 1.03         p (1-sided)   0.0207   0.0207

We see that the sample size is inflated by less than 3% after four interim analyses, which is another advantage of the O’Brien-Fleming procedure.

par(mar = c(4, 4, 1, 1))
ss1 <- plot(of_design, main = "")
ss2 <- ss1 + 
  theme(legend.position = "none") +
  theme(axis.text = element_text(size = 10)) +
  theme(axis.title = element_text(size = 15, face = "bold"))
ss2

Figure 2. O’Brien-Fleming boundaries assuming $k=5$ looks and $\alpha=0.05$.

Alpha spending functions

Because we want to be able to carry out an analysis at any point—including adding ad hoc analyses during the execution of the study—we adopt the spending function approach. This approach governs how $\alpha$ (Type-I error) will be spent during the study.

Example: Spending Equal Amount of Error at Each Analysis

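Consider a situation where we spend the two-sided $\alpha=0.05$ evenly over $k=5$ analyses, so that $\alpha_1=\cdots=\alpha_5=0.01$ (that is, 0.005 per side at each analysis, which is what the one-sided gsDesign output reports).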

This situation is handled by the following R code:

t <- c(0.2, 0.4, 0.6, 0.8, 1.0)
x <- gsDesign(k = 5, test.type = 2, beta = 0.2,
              sfu = sfLinear, sfupar = c(t, t))
x

## Symmetric two-sided group sequential design with
## 80 % power and 2.5 % Type I Error.
## Spending computations assume trial stops
## if a bound is crossed.
## 
##            Sample
##             Size 
##   Analysis Ratio*  Z   Nominal p Spend
##          1   0.23 2.58    0.0050 0.005
##          2   0.46 2.49    0.0064 0.005
##          3   0.69 2.41    0.0080 0.005
##          4   0.92 2.34    0.0097 0.005
##          5   1.15 2.28    0.0114 0.005
##      Total                       0.0250 
## 
## ++ alpha spending:
##  Piecewise linear spending function with line points = 0.2, line points = 0.4, line points = 0.6, line points = 0.8, line points = 1, line points = 0.2, line points = 0.4, line points = 0.6, line points = 0.8, line points = 1.
## * Sample size ratio compared to fixed design with no interim
## 
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
## 
## Upper boundary (power or Type I Error)
##           Analysis
##    Theta      1      2      3      4      5 Total   E{N}
##   0.0000 0.0050 0.0050 0.0050 0.0050 0.0050 0.025 1.1268
##   2.8016 0.1089 0.1908 0.2036 0.1714 0.1253 0.800 0.7849
## 
## Lower boundary (futility or Type II Error)
##           Analysis
##    Theta     1     2     3     4     5 Total
##   0.0000 0.005 0.005 0.005 0.005 0.005 0.025
##   2.8016 0.000 0.000 0.000 0.000 0.000 0.000

This is shown pictorially as follows:

par(mar = c(4, 4, 1, 1))
ss1 <- plot(x, main = "")
ss2 <- ss1 + 
  theme(legend.position = "none") +
  theme(axis.text = element_text(size = 10)) +
  theme(axis.title = element_text(size = 15, face = "bold"))
ss2

Figure 3. Bounds for equally spending $\alpha=0.05$ over $k=5$ analyses.

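Note here that despite spending an equal amount of alpha at each analysis, the bounds are not constant. This is because, for the first analysis,

$P(|Z_1|>c_1)=\alpha_1=0.01$

so that $c_1=z_{1-\alpha_1/2}$, whereas for the second analysis $c_2$ must account for the trial having continued past the first analysis. The alpha spent at the second look is the probability of not crossing at the first look and then crossing at the second:

$P(|Z_1|\le c_1,\ |Z_2|>c_2)=\alpha_2=0.01$

Because this probability involves both $Z_1$ and $Z_2$, which are correlated, the boundary points satisfy $c_1\neq c_2$; and so on.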
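The calculation of the first two bounds can be sketched in base R. The sketch assumes equally spaced looks, so that $Z_2=(Z_1+W)/\sqrt{2}$ with $W$ an independent standard normal increment, and it takes the alpha spent at the second look to be the probability of staying inside the first bound and then crossing the second. The function name `spend_at_2` and the root-finding interval are illustrative choices, not part of gsDesign.

```r
alpha_1 <- 0.01   # two-sided alpha spent at the first analysis
alpha_2 <- 0.01   # two-sided alpha spent at the second analysis

# First bound: an ordinary two-sided normal quantile
c1 <- qnorm(1 - alpha_1 / 2)

# Probability of not crossing at look 1 and crossing at look 2, as a function
# of the candidate bound c2. Given Z_1 = z, Z_2 = (z + W) / sqrt(2) exceeds c2
# in absolute value when W > c2 * sqrt(2) - z or W < -c2 * sqrt(2) - z.
spend_at_2 <- function(c2) {
  integrand <- function(z) {
    cross_given_z <- pnorm(-c2 * sqrt(2) - z) +
      pnorm(c2 * sqrt(2) - z, lower.tail = FALSE)
    cross_given_z * dnorm(z)
  }
  integrate(integrand, lower = -c1, upper = c1)$value
}

# Second bound: solve spend_at_2(c2) = alpha_2 numerically
c2 <- uniroot(function(c2) spend_at_2(c2) - alpha_2,
              interval = c(1.5, 4), tol = 1e-8)$root

round(c(c1 = c1, c2 = c2), 2)
```

The two values should agree with the first two entries of the Z column in the output above (2.58 and 2.49), which shows that equal spending does not imply equal bounds.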

Continuous spending functions

In their 1983 paper, Lan and DeMets introduced the concept of a continuous (alpha) spending function defined at all points $t \in [0,1]$ (i.e., at every trial fraction from the beginning to the end of the study). Several families of such functions exist:

Approximate Pocock and O’Brien-Fleming Spending Functions

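These are $\alpha(t)=\alpha\ln\left\{1+(e-1)t\right\}$ and $\alpha(t)=2-2\Phi(z_{\alpha/2}/\sqrt{t})$ for the Pocock-like and O’Brien-Fleming-like functions respectively, where $z_{\alpha/2}$ denotes the upper $\alpha/2$ quantile of the standard normal distribution, so that both functions spend the full $\alpha$ at $t=1$.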

The Power Spending Function Family

These functions have a general form given by $\alpha(t)=\min\left (\alpha t^\rho, \alpha\right )$

where $\rho>0$. Spending functions with $\rho=0.8$ approximate Pocock spending functions (note that this is close to the linear function employed in the example "Spending Equal Amount of Error at Each Analysis" above), while $\rho=3$ approximates the O’Brien-Fleming function.

Power spending functions for several choices of $\rho$ are shown in the following Figure.

# Plot of power family error spending functions
t <- seq(0, 1, .05)

p08 <- 0.05*t^0.8  # Pocock
p1 <- 0.05*t       # Linear
p3 <- 0.05*t^3.0   # O-F

plot(t, p08, lty = 2, lwd = 2, type = "l", ylab = "Alpha spent", xlab = "Trial fraction", main = "")
lines(t, p1, lty = 1, type = "l", lwd = 2)
lines(t, p3, lty = 3, type = "l", lwd = 2)

legend(0, .05, legend = c(expression(paste(rho, " = ", 0.8, " (Pocock)")),
                         expression(paste(rho, " = ", 1, " (linear)")),
                         expression(paste(rho, " = ", 3, " (O-F)"))), 
       lty = c(2, 1, 3), lwd = c(2, 2, 2), title = "Power spending functions")

Figure 4. Power family of spending functions for various choices of $\rho$
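The curves in Figure 4 show cumulative spending. The amount of alpha available at each individual look is the increment of the curve between consecutive looks. The following base R sketch computes those increments for five equally spaced looks, using the power family formula given above with $\alpha=0.05$; the helper names `power_spend` and `increments` are illustrative.

```r
alpha <- 0.05
t_looks <- c(0.2, 0.4, 0.6, 0.8, 1.0)   # trial fractions at the five looks

# Power family: cumulative alpha spent by trial fraction t
power_spend <- function(t, rho, alpha) pmin(alpha * t^rho, alpha)

# Alpha spent at each look = difference between consecutive cumulative values
increments <- function(rho) diff(c(0, power_spend(t_looks, rho, alpha)))

spend_linear  <- increments(1)   # rho = 1: linear spending
spend_of_like <- increments(3)   # rho = 3: O'Brien-Fleming-like spending

round(rbind("rho = 1" = spend_linear, "rho = 3" = spend_of_like), 4)
```

With $\rho=1$ every look receives the same amount (0.01), which is the equal-spending example from earlier. With $\rho=3$ very little is spent early and most of the alpha is kept for the final analysis.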

The Hwang-Shi-DeCani Family of Spending Functions

Hwang et al. (1990) proposed the following family of spending functions: $\begin{aligned} \alpha(t)=\left \{ \begin{array}{cc} \frac{\alpha(1-e^{-\gamma t})}{1-e^{-\gamma}} & {\rm if} \ \gamma\neq 0 \\ \alpha t & {\rm if} \ \gamma=0 \end{array} \right . \end{aligned}$ Alpha is spent more quickly as $\gamma$ increases (Figure 5).

# Plot of HSD error spending functions
t <- seq(0, 1, .05)

g0 <- 0.05*t                          # Pocock
g4 <- 0.05*(1-exp(4*t))/(1-exp(4))    # O-F
g2 <- 0.05*(1-exp(-2*t))/(1-exp(-2))

plot(t, g2, lty = 2, lwd = 2, type = "l", 
     ylab = "Alpha spent", xlab = "Trial fraction", main = "")
lines(t, g0, lty = 1, lwd = 2)
lines(t, g4, lty = 3, lwd = 2)

legend(0, .05, legend = c(expression(paste(gamma, " =   ", 2)),
                         expression(paste(gamma, " =   ", 0, " (Pocock)")),
                         expression(paste(gamma, " = ", -4, " (O-F)"))), 
       lty = c(2, 1, 3), lwd = c(2, 2, 2), title = "HSD spending functions")

Figure 5. HSD family of spending functions for various choices of $\gamma$
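As a quick numerical check of the statement that alpha is spent more quickly as $\gamma$ increases, the sketch below wraps the HSD formula above in a small base R function and evaluates it halfway through the trial for the three values of $\gamma$ plotted in Figure 5. The function name `hsd_spend` is illustrative; gsDesign provides its own `sfHSD` spending function, which is not used here.

```r
# HSD family: cumulative alpha spent by trial fraction t
hsd_spend <- function(t, gamma, alpha = 0.05) {
  if (gamma == 0) {
    return(alpha * t)   # limiting case: linear spending
  }
  alpha * (1 - exp(-gamma * t)) / (1 - exp(-gamma))
}

# Cumulative alpha spent at t = 0.5 for increasing gamma
gammas <- c(-4, 0, 2)
spent_at_half <- sapply(gammas, function(g) hsd_spend(0.5, g))
names(spent_at_half) <- paste("gamma =", gammas)

round(spent_at_half, 4)
```

The three values increase with $\gamma$, and every member of the family spends the full $\alpha$ at $t=1$.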

Applications of spending functions

Spending functions were a “game changer” in monitoring clinical trials. The following simple example illustrates this fact.

Example: Inserting an interim analysis after the start of a study

Suppose you planned a study with $k=3$ total analyses (two interim and one final), carried out at trial fraction $\tau_1=20\%$ , $\tau_2=50\%$ and $\tau_3=100\%$ . The analyses are conducted at $\alpha=0.025$ (one-sided) with power of 90% (i.e., $\beta=0.1$ ), using power spending functions with $\rho=1$ (i.e., a linear spending of the Type-1 error).

The study design at start is as follows:

# Inserting an additional interim analysis
t <- c(0.2, 0.5, 1.0)
gsDesign(k = 3, test.type = 1, sfu = sfLinear, sfupar = c(t, t), timing = t)

## One-sided group sequential design with
## 90 % power and 2.5 % Type I Error.
##            Sample
##             Size 
##   Analysis Ratio*  Z   Nominal p  Spend
##          1  0.218 2.58    0.0050 0.0050
##          2  0.544 2.38    0.0087 0.0075
##          3  1.088 2.14    0.0161 0.0125
##      Total                       0.0250 
## 
## ++ alpha spending:
##  Piecewise linear spending function with line points = 0.2, line points = 0.5, line points = 1, line points = 0.2, line points = 0.5, line points = 1.
## * Sample size ratio compared to fixed design with no interim
## 
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
## 
## Upper boundary (power or Type I Error)
##           Analysis
##    Theta      1      2      3 Total   E{N}
##   0.0000 0.0050 0.0075 0.0125 0.025 1.0791
##   3.2415 0.1436 0.3772 0.3791 0.900 0.7574

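The alpha level will be spent over the three analyses as follows (see the Spend column above): 0.0050 at the first analysis, 0.0075 at the second, and 0.0125 at the final analysis.

The final analysis will be carried out with a nominal p-value of $p=0.0161$.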

Now suppose that after the second interim analysis at trial fraction $\tau_2=0.5$, we (or, more likely, the DSMB reviewing this study) decide to add another interim analysis at $\tau^*_3=0.75$, so that the final analysis becomes the fourth, at $\tau^*_4=1.00$. Spending functions allow us to do this almost effortlessly! Consider the following output:

# adding an interim look at t=0.75
t <- c(0.2, 0.5, 0.75, 1.0)
gsDesign(k = 4, test.type = 1, sfu = sfLinear, sfupar = c(t, t), timing = t)

## One-sided group sequential design with
## 90 % power and 2.5 % Type I Error.
##            Sample
##             Size 
##   Analysis Ratio*  Z   Nominal p  Spend
##          1  0.225 2.58    0.0050 0.0050
##          2  0.562 2.38    0.0087 0.0075
##          3  0.843 2.32    0.0102 0.0063
##          4  1.124 2.24    0.0124 0.0062
##      Total                       0.0250 
## 
## ++ alpha spending:
##  Piecewise linear spending function with line points = 0.2, line points = 0.5, line points = 0.75, line points = 1, line points = 0.2, line points = 0.5, line points = 0.75, line points = 1.
## * Sample size ratio compared to fixed design with no interim
## 
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
## 
## Upper boundary (power or Type I Error)
##           Analysis
##    Theta      1      2      3      4 Total   E{N}
##   0.0000 0.0050 0.0075 0.0063 0.0063 0.025 1.1134
##   3.2415 0.1494 0.3870 0.2326 0.1310 0.900 0.7067

Two aspects of this output are worth noting:

The addition of a third interim analysis depended only on the cumulative alpha spent up to the second analysis.
Adding the third interim analysis means that the final analysis spends only 0.0062 of alpha (nominal p-value 0.0124), rather than 0.0125 (nominal p-value 0.0161) as in the original design, where the final analysis was the third.

Thus spending alpha at additional interim looks has a downside: it requires a higher evidence threshold at the final analysis if the study is not stopped at any interim analysis.
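The Spend columns in the two outputs can be reproduced directly from the linear spending function, without any boundary computation. The base R sketch below does this for the planned and the revised schedules, with one-sided $\alpha=0.025$ as in the example; the object names are illustrative. It reproduces only the alpha spent at each look. The nominal p-values additionally depend on the correlation between looks and are left to gsDesign.

```r
alpha <- 0.025                        # one-sided type I error
planned <- c(0.2, 0.5, 1.0)           # original analysis times
revised <- c(0.2, 0.5, 0.75, 1.0)     # with the extra look at t = 0.75

# Linear spending: cumulative alpha spent by trial fraction t
linear_spend <- function(t) alpha * t

# Alpha spent at each look = increment in cumulative spending
spend_planned <- diff(c(0, linear_spend(planned)))
spend_revised <- diff(c(0, linear_spend(revised)))

spend_planned
spend_revised
```

The first two entries are the same in both schedules, which is why the extra look could be added after the second analysis had already taken place. The 0.0125 originally reserved for the final analysis is split evenly between the new look at $t=0.75$ and the final one.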

Effect of Group Sequential Monitoring on Sample Size

Introducing several interim analyses on top of a fixed design invariably increases required sample size to preserve power.

Consider the designs shown in Figure 6, which vary $\gamma$ from 0 to $-10$ in the HSD spending function:

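# Sample size relative to the fixed design for gamma = 0, -1, ..., -10
gammas <- seq(0, -10, -1)
x <- numeric(length(gammas))
for (i in seq_along(gammas)) {
    # en[1] is the expected sample size under theta = 0 (fixed design = 1)
    x[i] <- gsDesign(k = 5, test.type = 2, alpha = 0.025, sfu = sfHSD, sfupar = gammas[i])$en[1]
}
par(mar = c(4, 4, 1, 1))
plot(gammas, x, type = "b", ylab = "Sample size inflation",
     xlab = expression(paste(gamma, " parameter in the HSD spending function")))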

Figure 6. Sample size inflation (compared to the fixed design) according to $\gamma$ in the HSD family of spending functions.

To understand why the sample size is inflated, consider the rates of spending (recall Figure 5 above).

The earlier alpha is spent, the higher the sample size inflation relative to the fixed design.
This explains why the Pocock procedure resulted in a larger sample size inflation than the O’Brien-Fleming method.

Lab 5: Sequential monitoring of clinical trials

Solutions

January 15, 2025
