---
title: "R markdown version of Lab 4"
author: "Constantin T Yiannoutsos and Christos Thomadakis"
date: '`r Sys.Date()`'
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.align = "center")
```

# Interim monitoring of survival studies

First, we import the library ```gsDesign``` into R.

```{r, enrollment, message=FALSE}
################################################
### Sequential monitoring of clinical trials ###
################################################
library(gsDesign)
```

When designing a survival study there are complications even in the fixed design, and they come from two sources: the power is a function of the number of events, not the sample size, and the number of events we observe (even from the same number of patients) depends on the duration of the study. The duration of the study, in turn, depends on the duration of patient accrual and the subsequent follow-up of all accrued patients, and even the duration of patient accrual is a function of the accrual rate. Needless to say, adding a schedule of interim analyses on top of this already complex design only increases the complexity and the level of the challenge for the analyst (that is, you!).

In the sequel, we will follow two steps of the design process: we first calculate the number of events required without sequential monitoring (the fixed-sample design^[We call designs without interim analyses "fixed-sample designs" because there is no scenario where the sample size will be curtailed by early stopping, as is the case with designs which include interim monitoring.]) and then we add the sequential monitoring to the design.

## Fixed-sample design

Consider the situation where you are designing a study comparing the mortality (or, conversely, survival) of patients receiving an experimental treatment and a group receiving the standard treatment. Suppose further that you would like to explore the possibility that there is a 25\% reduction in the hazard of mortality in the experimental treatment group (i.e., $\theta=\lambda_1/\lambda_0=0.75$) and that you will ultimately carry out the comparison at the two-sided $\alpha=0.05$ level. You would like to determine the number of patients needed to ensure that, under the above considerations, you will have 80\% power (i.e., $\beta=0.2$).

As we have mentioned before, we first come up with the number of _events_ necessary. We will also need the hazard rates $\lambda_0$ and $\lambda_1$ for the control and experimental groups respectively (these are used in the next step, when we translate events into patients). Recall from the class notes that the hazard ratio is
$$
HR=\frac{\lambda_1}{\lambda_0}=\frac{m_0}{m_1}
$$
where $m_0$ and $m_1$ are the median survival times in the two arms. This is because $\lambda_0=\frac{-\log 0.5}{m_0}$ and similarly for $\lambda_1$.

```{r, Survivalfixedevents, message=FALSE}
# Fixed-sample design: required number of events
HR = 0.75
# Sum of the standard normal quantiles z_{1-alpha/2} and z_{1-beta}
z.sum = qnorm(0.975) + qnorm(0.80)
# Number of events without monitoring
D = 4*z.sum^2/log(HR)^2
D
```

In other words, $D=380$ events will be needed to produce the requisite power.
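As a cross-check, the same calculation can be done with the ```nEvents``` function from ```gsDesign```. The chunk below is a minimal sketch (this function is not used elsewhere in this lab); note that a one-sided $\alpha=0.025$ corresponds to the two-sided $\alpha=0.05$ comparison above.

```{r, Survivalfixedevents2, message=FALSE}
# Cross-check of the fixed-sample number of events using gsDesign::nEvents()
# One-sided alpha = 0.025 corresponds to the two-sided alpha = 0.05 test above
nEvents(hr = 0.75, alpha = 0.025, beta = 0.20, sided = 1, ratio = 1)
```

This should agree, up to rounding, with the value of $D$ computed above.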
### From events to patients

The duration of the study will depend on many considerations. Let's suppose here that we want to cap the total duration of the study at $T=36$ months (this includes both the accrual and the follow-up period). We use the function ```nSurv``` as follows:

```{r, Survivalfixedsample, error=TRUE, echo=TRUE}
# Hazard for control
m0<-9
lambda0<--log(0.5)/m0
# Hazard for exp. treatment
m1<-12
lambda1<--log(0.5)/m1
# Sample size in the fixed-sample design
nSurv(alpha = 0.05, sided = 2, beta=0.20, lambdaC = lambda0,
      hr = lambda1/lambda0, hr0 = 1, gamma = 30, T=36, ratio = 1)
```

Oops! This means that the design as described cannot produce the requisite power given the alpha level of the ultimate comparison. In other words, we do not have time to enroll enough patients and follow them for a sufficient amount of time in order to observe $D=380$ events (even at the rate of 30 patients a month). After some trial and error, the following is a viable study design using 12 months of accrual:

```{r, Survivalfixedsample2, message=FALSE}
# Sample size in a viable fixed-sample design
nSurv(alpha = 0.05, sided = 2, beta=0.20, lambdaC = lambda0,
      hr = lambda1/lambda0, hr0 = 1, gamma = 40, R = 12, ratio = 1)
```

In this situation, we end up with a study that has total duration $T=29.8$ months (about 2.5 years), of which 12 months are devoted to patient accrual (at a rate of $\gamma=40$ patients/month, for a total of $N=480$ patients), followed by about 17.75 months ($\approx 1.5$ years) of follow-up after the last patient has been accrued.

## Adding the interim analyses

We now want to add an interim analysis schedule involving $k=3$ total analyses at the $\tau_1=0.30$, $\tau_2=0.60$ and $\tau_3=1.00$ trial fractions. Now, instead of specifying the sample size without monitoring, we specify the number of events. We use the argument ```n.fix``` in ```gsDesign``` to do so.

```{r, OFSurvival, message=FALSE}
# Sample size is the number of events here!
x = gsDesign(k = 3, test.type = 2, alpha = 0.025, beta = 0.20,
             n.fix = D, timing = c(0.3,0.6,1), sfu = "OF")
x
```

Thus, the number of events now becomes $D'$=`r round(x$n.I[3])`, slightly larger than the number of events required without sequential monitoring.
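The design object returned by ```gsDesign``` can also be queried directly. The chunk below is a minimal sketch showing the cumulative number of events and the efficacy z-score boundary at each look, assuming (as in the printed output) that ```x$n.I``` and ```x$upper$bound``` hold these quantities.

```{r, OFSurvivalbounds, message=FALSE}
# Cumulative events required and efficacy z-score boundaries at each look
# (with n.fix = D supplied above, n.I is on the event scale)
cbind(analysis = 1:3,
      events   = ceiling(x$n.I),
      z.bound  = round(x$upper$bound, 3))
```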
### From events to patients

In the previous paragraphs we used the function ```nSurv```, which calculates the required numbers of subjects and events in the fixed-sample design (i.e., when a single look at the end of the study is to take place). The function ```gsSurv``` is an extension of ```nSurv``` that allows for interim analyses during the study (please take a look at the previous labs for ```nSurv```). Assuming the exponential distribution for the event times, it is easy to obtain the event rates in each group (also see the second lab for such calculations). At this stage, the study design does not involve futility boundaries. We tell R that we want $k=3$ looks at the $\tau_1=0.3$, $\tau_2=0.6$ and $\tau_3=1.0$ trial fractions and that we want O'Brien-Fleming efficacy boundaries; this is done by selecting the option ```sfu = "OF"```. Other important parameters include ```test.type = 2``` which, as before, means that we want symmetric upper and lower bounds for a two-sided alpha level (note the option ```sided = 2```). As before, we include the rate of patient accrual per unit time ```gamma = 40``` and the accrual duration ```R = 12```. The R code follows:

```{r, OFsurvival, message=FALSE, warning=FALSE}
# (v) Sequential monitoring of survival study using gsSurv
x3 = gsSurv(k = 3, test.type = 2, alpha = 0.05, sided = 2,
            lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1, beta = 0.20,
            sfu = "OF", gamma = 40, R = 12, ratio = 1,
            timing = c(0.3, 0.6, 1.0))
x3
```

From the above output we see that the required number of events is 384 and that the maximum study duration is just over 30 months (30.55 months to be exact), or about 2.5 years, of which the first 12 months are devoted to patient accrual (at an average of $\gamma=40$ patients per month, for a total of $N=480$ patients). The above output

```{r, gsSurvsnippet1, echo=FALSE, message=FALSE, warning=FALSE}
# Capture the printed output of the design above
tmp<-capture.output(x3)
# Output the snippet
for(i in 21:25){
  cat(tmp[i],"\n")
}
```

suggests that the expected number of events is 381.6 under the null hypothesis and 327.5 under the alternative. The fact that these are lower than the $D=384$ total events has to do with the fact that, even under the null hypothesis, there is a nonzero probability of stopping early at one of the interim analyses.

Finally, consider the code chunk

```{r, gsSurvsnippet2, echo=FALSE, message=FALSE}
# Output the other snippet
for(i in 32:35){
  cat(tmp[i],"\n")
}
```

We see that the first interim analysis will occur at $\tau_1=10.3$ months, when about 115 events have been observed (and about 413 patients have been enrolled in the study). The second interim analysis, assuming the study was not stopped at the first interim analysis, will occur during month 16, when about 230 events have been observed. Also notice in this same output chunk that the hazard ratios corresponding to the upper and lower bounds are given. For example, it would take an observed hazard ratio of $\theta_1=0.507$ at the first interim analysis to stop the study in favor of the experimental treatment, and almost a doubling of the hazard ($\theta^*_1=1.971$) to stop the study in favor of the standard therapy.

As a final remark, note that the second interim analysis will occur after all the patients have been accrued. Given the high threshold required by the O'Brien-Fleming method, this means that it is very unlikely that we will spare any of the prospective subjects from exposure to the experimental drug, even if the experimental therapy results in a significant elevation of the mortality risk compared to the standard therapy.
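To see roughly where the hazard-ratio thresholds quoted above come from, the z-score bounds can be translated to the hazard-ratio scale using the usual Schoenfeld (log-rank) approximation for 1:1 allocation, $\log HR \approx -2Z/\sqrt{D}$. The chunk below is a minimal sketch, assuming (as in the printed output) that ```x3$n.I``` holds the cumulative number of events and ```x3$upper$bound```/```x3$lower$bound``` the z-score boundaries.

```{r, hrbounds, message=FALSE}
# Approximate hazard ratios needed to cross the bounds at the first look
# (Schoenfeld approximation, 1:1 allocation)
D1 <- x3$n.I[1]                         # events at the first interim analysis
exp(-2*x3$upper$bound[1]/sqrt(D1))      # HR to stop in favor of the experimental arm
exp(-2*x3$lower$bound[1]/sqrt(D1))      # HR to stop in favor of the standard therapy
```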
# Recalibration of survival studies

Many times during the implementation of a survival study we may need to adjust the study design and the study monitoring based on various factors. These include, but are not limited to, the observed rate of occurrence of the events, as well as logistical issues having to do with the timing of the reviews themselves. Alpha spending functions enable us to address all of these issues.

## Example: Recalibration of alpha spending

Suppose we have a survival study based on the log-rank test, we are using the Pocock spending function $\alpha_P(t) = 0.05\log\{1 + (e-1)t\}$, and we estimate that there will be $D=100$ deaths by the end of the six-year trial. Now suppose that the first analysis happened at the end of the first year, after $d_1=10$ deaths were observed. Given the Pocock procedure, at this look we have spent $\alpha_P(10/100) = \alpha_P(0.1) = 0.008$ (using z-score boundaries $Z(0.1)=\pm 2.655$ from the Pocock spending function).

At the second analysis, at the end of the second year, you figure that you will have at most $D'=50$ events by the end of year 6. Question: What do you do?

You realize that you are underspending your alpha. This is because the information fraction at the first look should have been $\tau_1 = 10/50 = 0.2$, so you should have spent $\alpha_P(0.20) = 0.015$, but you spent only $\alpha_P(0.10) = 0.008$. The alpha spent at the first look is gone; you need to accelerate the rate of alpha spending. One way to fix this is to interpolate the spending function for $t\geq 0.20$ by the following formula:
$$
\alpha'_P(t) = 0.008 +\left (\frac{0.05-0.008}{0.05-0.015}\right ) \left \{\alpha_P(t)- 0.015\right \}
$$

```{r, recalibration1, message=FALSE}
t<-seq(0,1,.1)
# Original spending function
apt<-0.05*log(1+(exp(1)-1)*t)
cbind(t, apt)
```

Figure 1 shows this pictorially.

```{r, recalibration, fig.width=4.5, fig.height=4.5, message=FALSE, fig.align='center', fig.cap='Figure 1. Pocock spending function (blue) and recalibrated spending function (red) to catch up for slow initial spending.'}
par(mar=c(4,4,1,1))
plot(t, apt, xlab="Trial fraction", ylab="Alpha spent", type="l", col="blue")
abline(v=0.2, lty=2)
# Recalibrated spending function
apt1<-apt[2]+((0.05-apt[2])/(0.05-apt[3]))*(apt-apt[3])
lines(t[t>=0.2], apt1[t>=0.2], col="red")
```

## Study monitoring using calendar time

Suppose you had designed the previous study based on the Pocock procedure. The basic design would look like this:

```{r, basicdesign, message=FALSE}
gsDesign(k=6, test.type=2, alpha=0.025, beta=0.1, sfu=sfLDPocock)
```

This design anticipates six analyses, including the final analysis, all performed at increments of 1/6 of the trial fraction. Note also that the inflation of the sample size

```{r, basicdesign1, echo=FALSE, message=FALSE}
temp<-capture.output(gsDesign(k=6, test.type=2, alpha=0.025, beta=0.1, n.fix = 100, sfu=sfLDPocock))
for(i in 6:15){
  cat(temp[i], "\n")
}
```

is over 20\%.

Now let's change the monitoring to use calendar rather than event (information) time. Since the maximum duration of the study was six years, at the end of the first year we are at $s = 1/6 = 0.17$ of the total calendar time. Spending functions allow us to change the monitoring schedule as follows: we let $t = s$ and use a spending function $\alpha^*(s)$ which spends alpha based on $s$ instead of $t$. In the example above, at the end of the first year, the alpha to be spent would have been $\alpha^*(1/6) = 0.013$ (using the Pocock alpha spending function). Then we can proceed exactly as before, adjusting for this fraction.

This is an attractive way to do the monitoring, since the total number of events at the end of the study does not need to be known. However, there is a big drawback: events are likely to accrue slowly at the beginning, so the calendar fraction will run ahead of the information fraction, and it may actually be easier to stop the trial early using calendar rather than information time. Consequently, you would be overspending your alpha level and thus reducing your power. We saw previously that designs which spend the alpha level earlier tend to be a bit underpowered (or, conversely, require slightly larger sample sizes to maintain the same power).
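As a quick check of the figure quoted above, the calendar-time alpha at $s=1/6$ under the Pocock spending function, together with the corresponding two-sided z-score boundary, can be computed directly. This is a minimal sketch using base R only.

```{r, calendaralpha, message=FALSE}
# Alpha to be spent at s = 1/6 of calendar time under the Pocock spending function
s <- 1/6
a.s <- 0.05*log(1 + (exp(1) - 1)*s)
a.s                 # approximately 0.013
qnorm(1 - a.s/2)    # corresponding two-sided z-score boundary
```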
# Interim monitoring with futility bounds

When monitoring a study we may also want to stop the trial if the probability of ever showing a statistically significant difference favoring the experimental arm is very small. This leads to the construction of "futility" bounds. To see how this is done, we must first understand the concept of beta spending, which is similar to alpha spending. Beta spending is relevant in the construction of futility bounds because any errors there will impact the Type II error of the study (i.e., not rejecting the null hypothesis when it is not true).

## One-sided study boundaries for futility

Suppose that we are designing a study with $k = 5$ analyses, to be carried out at the $\alpha = 0.05$ level and with 90\% power. Suppose also that we want to stop the study both if there is sufficient evidence to reject the null hypothesis and if there is evidence in favor of the null hypothesis (futility). Using the spending functions $\alpha(t) = \alpha t^\rho$ and $\beta(t) = \beta t^\rho$ with $\rho = 3$ (note that this is approximately the O'Brien-Fleming procedure), we obtain the upper and lower boundaries shown in Figure 2.

```{r, futilitybounds, fig.width=5, fig.height=5, message=FALSE, fig.align='center', fig.cap="Figure 2. One-sided efficacy and futility bounds for the approximate O'Brien-Fleming procedure with k=5 equally spaced analyses."}
# One-sided design with futility bounds
ss1<-plot(gsDesign(k=5, test.type=3, alpha=0.05, beta=0.1,
                   sfu=sfPower, sfupar=3, sfl=sfPower, sflpar=3), main="")
ss1+theme(legend.position = "none")+theme(axis.text=element_text(size=10))+
  theme(axis.title=element_text(size=15,face="bold"))
```

## Two-sided futility bounds

We will carry out this analysis with SAS, since ```gsDesign``` does not handle a two-sided futility bound (inner wedge). The following is the code, and the resulting boundaries, for an O'Brien-Fleming design with four equally spaced analyses at $\alpha=0.05$ and $\beta=0.1$. We use the SAS procedure ```seqdesign```, which produces group sequential designs.

```{r, sascode, message=FALSE, engine="sas", eval=FALSE, echo=TRUE}
ods graphics on;
proc seqdesign boundaryscale=stdz plots=boundary(hscale=samplesize);
   TwoSidedOBrienFleming: design method=OBF alt=twosided stop=both
                          nstages=4 alpha=0.05 beta=0.10;
   ods output Boundary=bound_ldl;
run;
ods graphics off;
```

```{r, echo=FALSE, fig.align='center', fig.cap="Figure 3. O'Brien-Fleming design with $k=4$ analyses and inner wedge."}
library(knitr)
knitr::include_graphics('C:/Users/xristos/Desktop/Clinical Trials/R Labs/Lab 4_ Study monitoring II (monitoring of survival studies)/BoundaryPlot2.png')
```

We used the option ```boundaryscale=stdz``` so that the results are presented on the standardized z scale. The above code produces the results in Figure 3. The study will continue as long as the standardized $Z$ statistic falls within the white region; it will stop for efficacy (either in favor of the experimental treatment or of the standard treatment) in the blue areas (above and below, respectively), or for futility (i.e., inability to ever show a difference) if the statistic lands in the inner wedge on the right.
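Returning to the one-sided design of Figure 2, the boundary values and crossing probabilities behind the plot can also be tabulated in R. The chunk below is a minimal sketch using ```gsBoundSummary``` from ```gsDesign``` (a function not used elsewhere in this lab).

```{r, futilitysummary, message=FALSE}
# Recreate the design from Figure 2 and tabulate its efficacy and futility
# boundaries along with the probabilities of crossing them
x5 <- gsDesign(k=5, test.type=3, alpha=0.05, beta=0.1,
               sfu=sfPower, sfupar=3, sfl=sfPower, sflpar=3)
gsBoundSummary(x5)
```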