Introduction

First, we import the library gsDesign into R.

################################################
### Sequential monitoring of clinical trials ###
################################################
library(gsDesign)

The R package gsDesign

The function gsDesign can carry out lots of analyses but here we just focus on the calculation of boundaries. We tell \(\verb|R|\) that we want 5 analyses \(k=5\) (including interim and final). By default, the function gsDesign assumes equally spaced analyses. Also, we specify that we want two sided test (i.e. both lower and upper boundaries) and type-I error \(\alpha=5\)%. Be careful here as the function always used one-sided type-I error. Thus, if you want a 5% alpha, you need to specify alpha=0.025!

Pocock boundaries (equally-spaced analyses)

The following are the bounds proposed by Pocock for \(k=5\) equally spaced interim analyses.

# Pocock boundaries
x1 = gsDesign(k = 5,test.type = 2, alpha = 0.025, sfu = "Pocock")
gsBoundSummary(x1)
##                Analysis               Value Efficacy Futility
##               IA 1: 20%                   Z   2.4132  -2.4132
##  N/Fixed design N: 0.24         p (1-sided)   0.0079   0.0079
##                              delta at bound   1.5155  -1.5155
##                         P(Cross) if delta=0   0.0079   0.0079
##                         P(Cross) if delta=1   0.2059   0.0000
##               IA 2: 40%                   Z   2.4132  -2.4132
##  N/Fixed design N: 0.48         p (1-sided)   0.0079   0.0079
##                              delta at bound   1.0716  -1.0716
##                         P(Cross) if delta=0   0.0138   0.0138
##                         P(Cross) if delta=1   0.4661   0.0000
##               IA 3: 60%                   Z   2.4132  -2.4132
##  N/Fixed design N: 0.72         p (1-sided)   0.0079   0.0079
##                              delta at bound   0.8749  -0.8749
##                         P(Cross) if delta=0   0.0183   0.0183
##                         P(Cross) if delta=1   0.6747   0.0000
##               IA 4: 80%                   Z   2.4132  -2.4132
##  N/Fixed design N: 0.97         p (1-sided)   0.0079   0.0079
##                              delta at bound   0.7577  -0.7577
##                         P(Cross) if delta=0   0.0219   0.0219
##                         P(Cross) if delta=1   0.8149   0.0000
##                   Final                   Z   2.4132  -2.4132
##  N/Fixed design N: 1.21         p (1-sided)   0.0079   0.0079
##                              delta at bound   0.6777  -0.6777
##                         P(Cross) if delta=0   0.0250   0.0250
##                         P(Cross) if delta=1   0.9000   0.0000

The boundaries are constant across the analyses \(Z=\pm2.41\) and the p-value we have to use in a two-sided test is \(2\times 0.0079=0.0158\). You can ignore the remaining output at this stage. We can also use the \(\verb|plot|\) to produce a graph.

par(mar=c(4,4,1,1))
ss1<-plot(x1, main = "")
ss2<-ss1+theme(legend.position = "none")+
  theme(axis.text=element_text(size=10))+
  theme(axis.title=element_text(size=15,face="bold"))
ss2
<b>Figure 1. </b>Pocock boundaries assuming $k=5$ looks and &alpha=0.05$.

Figure 1. Pocock boundaries assuming \(k=5\) looks and &alpha=0.05$.

The following are the \(Z\) statistics and the associated \(p\) value boundaries

# Z's and p-values
cbind(Z = gsBoundSummary(x1)$Eff[gsBoundSummary(x1)$Val=="Z"],
      P = 2*gsBoundSummary(x1)$Eff[gsBoundSummary(x1)$Val=="p (1-sided)"])
##           Z      P
## [1,] 2.4132 0.0158
## [2,] 2.4132 0.0158
## [3,] 2.4132 0.0158
## [4,] 2.4132 0.0158
## [5,] 2.4132 0.0158

Notice how far the p-value at the last (fifth) analysis is from the nominal 5% alpha level!

Another issue that you need to realize is that the introduction of the Pocock interim review impacted the sample size.

Recall the following line from the output above:

##  N/Fixed design N: 1.21         p (1-sided)   0.0079   0.0079

In fact, the expected sample size is inflated by about 21% compared to the fixed design (more on that later).

O’Brien-Fleming boundaries (equally-spaced analyses)

To specify the O’Brien-Fleming method, just use sfu = "OF". Note that according to the O’Brien-Fleming method it’s much more difficult to reject the null hypothesis in the beginning of the study (i.e. you require a higher level of evidence to do so). The advantage is that the p-value is close to the nominal level at the final analysis.

# O'Brien-Fleming boundaries
x2 = gsDesign(k = 5,test.type = 2, alpha = 0.025, sfu = "OF")
gsBoundSummary(x2)
##                Analysis               Value Efficacy Futility
##               IA 1: 20%                   Z   4.5617  -4.5617
##  N/Fixed design N: 0.21         p (1-sided)   0.0000   0.0000
##                              delta at bound   3.1059  -3.1059
##                         P(Cross) if delta=0   0.0000   0.0000
##                         P(Cross) if delta=1   0.0010   0.0000
##               IA 2: 40%                   Z   3.2256  -3.2256
##  N/Fixed design N: 0.41         p (1-sided)   0.0006   0.0006
##                              delta at bound   1.5530  -1.5530
##                         P(Cross) if delta=0   0.0006   0.0006
##                         P(Cross) if delta=1   0.1254   0.0000
##               IA 3: 60%                   Z   2.6337  -2.6337
##  N/Fixed design N: 0.62         p (1-sided)   0.0042   0.0042
##                              delta at bound   1.0353  -1.0353
##                         P(Cross) if delta=0   0.0045   0.0045
##                         P(Cross) if delta=1   0.4675   0.0000
##               IA 4: 80%                   Z   2.2809  -2.2809
##  N/Fixed design N: 0.82         p (1-sided)   0.0113   0.0113
##                              delta at bound   0.7765  -0.7765
##                         P(Cross) if delta=0   0.0128   0.0128
##                         P(Cross) if delta=1   0.7516   0.0000
##                   Final                   Z   2.0401  -2.0401
##  N/Fixed design N: 1.03         p (1-sided)   0.0207   0.0207
##                              delta at bound   0.6212  -0.6212
##                         P(Cross) if delta=0   0.0250   0.0250
##                         P(Cross) if delta=1   0.9000   0.0000

The following are the \(Z\) statistics and the associated \(p\) value boundaries

# Z's and p-values
cbind(Z = gsBoundSummary(x2)$Eff[gsBoundSummary(x2)$Val=="Z"],
      P = 2*gsBoundSummary(x2)$Eff[gsBoundSummary(x2)$Val=="p (1-sided)"])
##           Z      P
## [1,] 4.5617 0.0000
## [2,] 3.2256 0.0012
## [3,] 2.6337 0.0084
## [4,] 2.2809 0.0226
## [5,] 2.0401 0.0414

Note that these are twice those listed in the previous output. Also note that the p value for the final analysis is not too far from the nominal 5% alpha level despite four interim analyses. We will see why in a minute.

Another issue has to do with the samle inflation caused by the O’Brien-Fleming procedure. Recall the output from above:

##  N/Fixed design N: 1.03         p (1-sided)   0.0207   0.0207

We see that the sample size is inflated less than 3% after four interim analysis (!) another advantage of the O’Brien-Fleming procedure.

par(mar=c(4,4,1,1))
ss1<-plot(x2,main = "")
ss2<-ss1+theme(legend.position = "none")+
  theme(axis.text=element_text(size=10))+
  theme(axis.title=element_text(size=15,face="bold"))
ss2
<b>Figure 2.</b> O'Brien-Fleming boundaries assuming $k=5$ looks and &alpha=0.05$.

Figure 2. O’Brien-Fleming boundaries assuming \(k=5\) looks and &alpha=0.05$.

Alpha spending functions

Because we want to be able to carry out an analysis at any point, including adding ad hoc analyses at any point during the execution of the study, we adopt the approach. This approach literally governs how the alpha (Type-I error) will be spent during the study.

Example: Spending equal amount of error at each analysis

Consider the following situation where we expend the alpha level over \(k=5\) analyses, so that \(\alpha_1=0.01, \cdots, \alpha_5=0.01\).

The situation is handled by the following code:

t<-c(0.2,0.4,0.6,0.8,1.0)
x <- gsDesign(k=5, test.type=2, beta=0.2, 
              sfu=sfLinear, sfupar=c(t,t))
x
## Symmetric two-sided group sequential design with
## 80 % power and 2.5 % Type I Error.
## Spending computations assume trial stops
## if a bound is crossed.
## 
##            Sample
##             Size 
##   Analysis Ratio*  Z   Nominal p Spend
##          1   0.23 2.58    0.0050 0.005
##          2   0.46 2.49    0.0064 0.005
##          3   0.69 2.41    0.0080 0.005
##          4   0.92 2.34    0.0097 0.005
##          5   1.15 2.28    0.0114 0.005
##      Total                       0.0250 
## 
## ++ alpha spending:
##  Piecewise linear spending function with line points = 0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1.
## * Sample size ratio compared to fixed design with no interim
## 
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
## 
## Upper boundary (power or Type I Error)
##           Analysis
##    Theta      1      2      3      4      5 Total   E{N}
##   0.0000 0.0050 0.0050 0.0050 0.0050 0.0050 0.025 1.1268
##   2.8016 0.1089 0.1908 0.2036 0.1714 0.1253 0.800 0.7849
## 
## Lower boundary (futility or Type II Error)
##           Analysis
##    Theta     1     2     3     4     5 Total
##   0.0000 0.005 0.005 0.005 0.005 0.005 0.025
##   2.8016 0.000 0.000 0.000 0.000 0.000 0.000

This is shown pictorially as follows:

par(mar=c(4,4,1,1))
ss1<- plot(x, main="")
ss2<-ss1+theme(legend.position = "none")+
  theme(axis.text=element_text(size=10))+
  theme(axis.title=element_text(size=15,face="bold"))
ss2
<b>Figure 3.</b> Bounds for equally spending &alpha=0.05$ over $k=5$ analyses.

Figure 3. Bounds for equally spending &alpha=0.05$ over \(k=5\) analyses.

Note here that, despite the fact that an equal amount of alpha is spent at each analysis, the bounds are not constant. This is because, while, for the first analysis

\[ P(|Z_1|>c_1)=\alpha_1=0.01 \]

so that \(c_1=z_{1-a_1/2}\), for the second interim analysis, \(c_2\) must satisfy the #conditional probability#

\[ P(|Z_2|>c_2|-c_1<Z_1<c_1)=\alpha_2=0.01 \]

Thus, the boundary points \(c_1\neq c_2\); and so on.

Continuous spending functions

In their 1983 paper,~ introduced the concept of a (alpha) spending function. This is a function that is defined in all points \(t \in [0,1]\) (i.e., in all trial fractions from the beginning to the end of the study.) There are several families of such functions:

Approximate Pocock and O’Brien-Fleming spending functions

These are \[ \alpha(t)=\alpha\ln\left\{1+(e-1)t\right\} \] and \[ \alpha(t)=2-2\Phi(z_{\alpha/4}/\sqrt{t}) \] for the case of the Pocock and O’Brien-Fleming-like functions respectively.

The Power spending function family

These spending functions have the general form \[ \alpha(t)=\min\left (\alpha t^\rho, \alpha\right ) \]

where \(\rho>0\). Spending functions with \(\rho=0.8\) approximate the Pocock spending function. (Note that this is close to the linear function employed in the example of Section~ above), while \(\rho=3\) approximates the O’Brien-Fleming function . Power spending functions for several choices of \(\rho\) are shown in the following Figure.

# Plot of power family error spending functions
t<-seq(0,1,.05)

p08=0.05*t^0.8 # Pocock
p1=0.05*t # Linear
p3=0.05*t^3.0 # O-F

plot(t,p08, lty=2, type="l", ylab="Alpha spent", xlab="Trial fraction", main="")
lines(t, p1, lty=1, type="l", lwd=2)
lines(t, p3, lty=3, type="l", lwd=2)

legend(0,.05, legend = c(expression(paste(rho, " = ", 0.8, " (Pocock)")),
                         expression(paste(rho, " = ", 1, " (linear)")),
                         expression(paste(rho, " = ", 3, " (O-F)"))), 
       lty=c(2,1,3), lwd=c(2,2,2), title="Power spending functions")
<b> Figure 4. </b>Power family of spending functions for various choices of $
ho$.

Figure 4. Power family of spending functions for various choices of $ ho$.

The Hwang, Shi & DeCani family of spending functions

Hwang, Shih & DeCani (1990) proposed the following family of spending functions: \[\begin{eqnarray*} \alpha(t)=\left \{ \begin{array}{cc} \alpha(1-e^{-\gamma t})/(1-e^\gamma) & {\rm if} \gamma\neq 0 \\ \alpha t & {\rm if} \gamma=0 \end{array} \right . \end{eqnarray*}\]

Alpha is spent more quickly the larger the value of \(\gamma\). This is shown in Figure~5.

# Plot of HSD error spending functions
t<-seq(0,1,.05)

g0=0.05*t # Pocock
g4=0.05*(1-exp(4*t))/(1-exp(4)) # O-F
g2=0.05*(1-exp(-2*t))/(1-exp(-2))

plot(t,g2, lty=2, lwd=2, type="l", 
     ylab="Alpha spent", xlab="Trial fraction", main="")
lines(t,g0, lty=1, lwd=2)
lines(t,g4, lty=3, lwd=2)

legend(0,.05, legend = c(expression(paste(gamma, " =   ", 2)),
                         expression(paste(gamma, " =   ", 0, " (Pocock)")),
                         expression(paste(gamma, " = ", -4, " (O-F)"))), 
       lty=c(2,1,3), lwd=c(2,2,2), title="HSD spending functions")
<b>Figure 5.</b> HSD family of spending functions for various choices of &gamma.

Figure 5. HSD family of spending functions for various choices of &gamma.

Applications of spending functions

Spending functions were a ``game changer" in the effort to monitor a clinical trial. The following simple example underlines this fact.

Example: Inserting an interim analysis after the start of a study

Suppose that you have planned a study with \(k=3\) total analyses (two interim and one final), to be carried out at the \(\tau_1=20\%\), \(\tau_2=50\%\) and \(\tau_3=100\%\) trial fractions. The analyses are going to be carried out at the \(\alpha=0.025\) (one-sided) with power 90% (i.e., \(\beta=0.1\)) and we will use a power spending functions with \(\rho=1\) (i.e., a linear spending of the Type-1 error). The study design, as it stands at the start is as follows:

# Inserting an additional interim analysis
t<-c(0.2,0.5,1.0)
gsDesign(k=3, test.type=1, sfu=sfLinear, sfupar=c(t,t), timing=t)
## One-sided group sequential design with
## 90 % power and 2.5 % Type I Error.
##            Sample
##             Size 
##   Analysis Ratio*  Z   Nominal p  Spend
##          1  0.218 2.58    0.0050 0.0050
##          2  0.544 2.38    0.0087 0.0075
##          3  1.088 2.14    0.0161 0.0125
##      Total                       0.0250 
## 
## ++ alpha spending:
##  Piecewise linear spending function with line points = 0.2 0.5 1 0.2 0.5 1.
## * Sample size ratio compared to fixed design with no interim
## 
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
## 
## Upper boundary (power or Type I Error)
##           Analysis
##    Theta      1      2      3 Total   E{N}
##   0.0000 0.0050 0.0075 0.0125 0.025 1.0791
##   3.2415 0.1436 0.3772 0.3791 0.900 0.7574

We can see that the alpha level will be spent in the three analyses as

and the final analysis will be carried out at a p value \(p=0.0161\).

Now suppose that, after the second interim analysis at the \(\tau_2=0.5\) trial fraction, we (or, more likely, the DSMB reviewing this study) decide to add one more interim analysis at the point \(\tau^*_3=0.75\) (making now the final analysis at the time point \(\tau^*_4=1.00\)). Spending functions allow us to do this almost effortlessly! Consider the following output:

# adding an interim look at t=0.75
t<-c(0.2,0.5, 0.75, 1.0)
gsDesign(k=4, test.type=1, sfu=sfLinear, sfupar=c(t,t), timing=t)
## One-sided group sequential design with
## 90 % power and 2.5 % Type I Error.
##            Sample
##             Size 
##   Analysis Ratio*  Z   Nominal p  Spend
##          1  0.225 2.58    0.0050 0.0050
##          2  0.562 2.38    0.0087 0.0075
##          3  0.843 2.32    0.0102 0.0063
##          4  1.124 2.24    0.0124 0.0062
##      Total                       0.0250 
## 
## ++ alpha spending:
##  Piecewise linear spending function with line points = 0.2 0.5 0.75 1 0.2 0.5 0.75 1.
## * Sample size ratio compared to fixed design with no interim
## 
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
## 
## Upper boundary (power or Type I Error)
##           Analysis
##    Theta      1      2      3      4 Total   E{N}
##   0.0000 0.0050 0.0075 0.0063 0.0063 0.025 1.1134
##   3.2415 0.1494 0.3870 0.2326 0.1310 0.900 0.7067

There are a couple of remarkable things in the previous output that must be remarked on. First of all, the addition of the third analysis only depended on the alpha level spent cumulatively up to the second analysis (had a revision been necessary, this would be have been an unworkable situation). On the other hand, adding the third interim analysis, and the additional alpha spent prior to the final analysis, required the final analysis to be carried out at an alpha level of 0.062 (or a p value of 0.0124) rather than 0.0125 like in the previous situation (where the final analysis was the third analysis). Thus, expending more alpha level has a downside: It requires a higher threshold of evidence at the final analysis if you do not interrupt the study during any of the interim analyses.

Effect of group sequential monitoring on sample size

The effect of introducing a number of interim analysis on top of a fixed design invariably results in increasing the required sample size to preserve the power.

Consider the three following cases shown in Figure 6

x<-array(dim=11)
for(i in 1:11) {
x[i]<-gsDesign(k=5, test.type=2, alpha=0.025, sfu=sfHSD, sfupar=-i+1)$en[1]
}

par(mar=c(4,4,1,1))
plot(seq(0,-10,-1), x, type="b", ylab="Sample size inflation", 
     xlab=expression(paste(gamma, " parameter in the HSD spending function")))
<b>Figure 6.</b> Sample size inflation (compared to the fixed design) according to &gamma in the HSD family of spending functions

Figure 6. Sample size inflation (compared to the fixed design) according to &gamma in the HSD family of spending functions

To figure out why there is this significant inflation of the sample size, we can consider the rates of spending (recall Figure 5 above). Long story short, the earlier the alpha is spent, the higher the sample size inflation compared to the fixed design. This is the reason why the Pocock prodedure resulted in such a large sample size inflation compared to the O’Brien-Fleming