---
title: "RMarkdown version of lab8"
author: "Constantin T Yiannoutsos"
date: "October 16, 2017"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r, packages, echo=FALSE, results="hide", message=FALSE}
options(width=72)
library(gsDesign)  # gsDesign() and the sfLDOF spending function
library(ldbounds)  # drift(): exit probabilities, drift and confidence intervals
```

# Introduction

In today's lab, we will perform inference at the end of a study that involved interim analyses. Inference involves both hypothesis testing and estimation: the former produces p-values, the latter point estimates and confidence intervals. Both must account for the fact that the data were examined during multiple (interim) analyses rather than once at the end of the study.

## Definition of the p-value in the context of monitoring

In a fixed-sample design, the p-value is the probability, *under the null hypothesis*, of observing a test statistic as extreme as or more extreme than the one actually observed. This notion is less clear-cut in the group sequential setting. For example, if $Z_1(\tau_i)>Z_2(\tau_i)$, where these are the test statistics of two identical studies at the same interim analysis $\tau_i$, then $Z_1(\tau_i)$ clearly provides more extreme evidence than $Z_2(\tau_i)$, written formally as
$$
(\tau_i, Z_1)\succ (\tau_i, Z_2)
$$
However, the following comparison is not as clear: is $Z(\tau_i)=3.50$ at stage $\tau_i$, which (say) did not result in the interruption of the study, more or less extreme than $Z(\tau_j)=3.50$ at a later stage $\tau_j>\tau_i$, which, in this hypothetical experiment, did result in the interruption of the study?
The major assumption made in the following discussion is that the evidence leading to the stopping of the study is *at least as extreme* in stage $j$ as it was in stage $i<j$. If the study stops at stage $j$ with observed test statistic $z_j$, the stage-wise p-value, computed under $H_0$, is
$$
p=\underbrace{\mbox{Pr}\left ( \cup_{i=1}^{j-1}Z(t_i)> c_i\right )}_{\mbox{Trial stops at } i<j}+\underbrace{\mbox{Pr}\left ( \cap_{i=1}^{j-1}Z(t_i)\leq c_i, Z(t_j)> z_j\right )}_{\mbox{Trial stops at } j \mbox{ with } Z(t_j)>z_j}
$$
where $c_1,\ldots,c_{j-1}$ are the boundaries at the earlier analyses.

The lower bound $\delta_L$ of the confidence interval is calculated by considering
$$
p_{\delta_L}=\mbox{Pr}(Z>z_{\mbox{obs}}|\delta=\delta_L)\leq \alpha/2
$$
Similarly, the upper bound $\delta_U$ is calculated by considering
$$
p_{\delta_U}=\mbox{Pr}(Z< z_{\mbox{obs}}|\delta=\delta_U)\leq \alpha/2
$$

### Confidence intervals resulting from stage-wise ordering

Confidence intervals calculated after a study that includes interim monitoring should have the following properties:

1. The confidence interval should be a (contiguous) interval.
2. It should agree with the original test. In other words, if the test rejected $H_0$, then the null value of $\delta$ should not be contained within the interval.
3. The confidence interval should contain the MLE $\hat\delta=Z(t)/\sqrt{\nu_T}$.
4. A narrower confidence interval is to be preferred to a wider one.

All of these properties hold under the stage-wise ordering adopted in these notes.

### Example: Diet trial (Proschan, Lan & Wittes, 2006)

In a clinical trial with 200 participants per arm, the primary endpoint was weight change over 3 months. An O'Brien-Fleming spending function was used, and four analyses of the data were planned. Initially, we would have planned this trial assuming four equally spaced analyses:

```{r, originalOFdesign}
# Two-sided symmetric design (test.type=2), one-sided alpha = 0.025, 90% power,
# Lan-DeMets O'Brien-Fleming spending function, 4 equally spaced analyses
originalOFdesign<-gsDesign(k=4, test.type=2, sfu=sfLDOF, alpha=0.025, beta=0.1)
cbind(1:4,originalOFdesign$upper$bound)
```

This is not exactly how things turned out. In reality, the first two analyses occurred at times $\tau_1=0.22$ and $\tau_2=0.55$.
The third analysis occurred after $n_T=152$ and $n_C=144$ subjects had been accrued in the treatment and control arms respectively, that is, at (information) time $\tau_3=0.74$.^[Since the total information is $I_{\mbox{max}}=200/2\sigma^2$ and the information at the third interim analysis is $I_3=\left [\sigma^2\left (1/152+1/144\right )\right ]^{-1}$, the information fraction is $\tau_3=I_3/I_{\mbox{max}}=\frac{2\sigma^2/200}{\sigma^2\left (1/152+1/144\right )}\approx 0.74$.] With these analysis times, the O'Brien-Fleming bounds would be

```{r, actualOFdesign}
# Same design, using the information fractions at which the analyses actually occurred
actualOFdesign<-gsDesign(k=4, test.type=2, sfu=sfLDOF, alpha=0.025, beta=0.1, timing=c(0.22, 0.55, 0.74, 1))
cbind(1:4, actualOFdesign$upper$bound)
```

In other words,

```{r echo = FALSE, results = 'asis'}
library(knitr)
# Reuse actualOFdesign from the previous chunk
boundstable<-as.data.frame(cbind(1:4, actualOFdesign$upper$bound))
colnames(boundstable)<-c("Analysis", "Bound")
kable(boundstable, digits = 3, caption = "Table 1. Bounds of the design as actually run.")
```

The z-score at the third interim analysis was
$$
Z(0.74)=\frac{\bar{X}_1-\bar{X}_2}{\sqrt{(4.8)^2(1/152+1/144)}}=3.76
$$
reflecting a sample standard deviation $s=4.8$ and an observed difference $\hat{\delta}(\tau_3)=\bar{X}_1-\bar{X}_2=2.099$. Since $Z(0.74)=3.76$ exceeds the third-analysis bound of 2.39, the study stops. The cumulative exit probability is calculated as follows:

```{r, cumexitpvalue}
# Exit probabilities under H0 (drft=0), with the third bound replaced by the observed Z = 3.76
dietstudyp<-drift(zb=c(actualOFdesign$upper$bound[1:2],3.76), t=c(0.22, 0.55, 0.74), drft=0)
cbind(1:3,dietstudyp$cum.exit)
```

Thus, the cumulative two-sided ("exit") probability is $p=0.005$.
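The information fraction in the footnote and the observed difference behind $Z(0.74)=3.76$ are plain arithmetic, so they can be checked directly. The base-R sketch below assumes equal variances in the two arms and plugs in the sample standard deviation $s=4.8$ for $\sigma$. The variable names are illustrative only.

```r
# Sketch: reproduce tau_3 and the observed difference at the third analysis
# of the diet trial (assumes equal variances, s = 4.8 used in place of sigma).
n_max <- 200   # planned participants per arm
n_T   <- 152   # accrued in the treatment arm at analysis 3
n_C   <- 144   # accrued in the control arm at analysis 3
s     <- 4.8   # sample standard deviation

# Statistical information = 1 / variance of the difference in means
info_max <- 1 / (s^2 * (1 / n_max + 1 / n_max))
info_3   <- 1 / (s^2 * (1 / n_T + 1 / n_C))

# Information fraction (sigma^2 cancels in the ratio)
tau_3 <- info_3 / info_max
round(tau_3, 2)        # 0.74

# Standard error of the difference and the difference implied by Z = 3.76
se_3      <- sqrt(s^2 * (1 / n_T + 1 / n_C))
z_3       <- 3.76
delta_hat <- z_3 * se_3
round(delta_hat, 3)    # 2.099

# Compare with the third-analysis bound quoted in the text
z_3 > 2.39             # TRUE
```

Because $\sigma^2$ cancels in $I_3/I_{\mbox{max}}$, the value of $s$ affects only the observed difference, not the information fraction.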
### Confidence interval of the effect size

Recall that, in a fixed-sample design comparing two means with $N$ subjects per group,
$$
N=\frac{2\sigma^2(z_{1-\alpha}+z_{1-\beta})^2}{\delta^2}
$$
so that
$$
\theta=(z_{1-\alpha}+z_{1-\beta})=\frac{\delta}{\sqrt{{\rm var}(\delta)}}=\frac{\delta}{\sqrt{2\sigma^2/N}}=f
$$
where $f=\delta/\sqrt{2\sigma^2/N}$ is the standardized effect size (the drift) and $N$ is the sample size in each group at the completion of the study. In addition, $\delta$, the unstandardized effect (e.g., the difference in average weight loss between the two interventions in the diet example), is equal to
$$
\delta=\theta\sqrt{2\sigma^2/N}
$$
When there are interim analyses, the software can produce an updated drift $\theta^*$ that takes the interim looks at the data into account. In our example, the fixed-sample drift is
$$
\theta=(z_{1-\alpha/2}+z_{1-\beta})=(1.96+1.282)=3.2415
$$
The updated drift, taking the interim analyses into consideration, is

```{r, dietstudydrift}
# Drift that gives 90% power with the bounds and analysis times actually used
dietstudydrift<-drift(zb=actualOFdesign$upper$bound, t=c(0.22, 0.55, 0.74, 1), pow=0.9)
dietstudydrift$drift
```

so $\theta^*=3.2708$ is the drift (standardized effect size) that this design, with its interim analyses, detects with 90% power. Expressed as a difference in weight loss between the two treatment groups, this is
$$
\delta^*=\theta^*\sqrt{2\sigma^2/N}=3.2708\sqrt{2(4.8)^2/200}=1.57
$$
That is, accounting for the interim analyses, the study has 90% power to reject the null hypothesis when the true difference in weight loss is $\delta^*=1.57$ lbs.

A similar situation arises when we want to derive a two-sided confidence interval for the effect size. The software will produce a confidence interval on the drift scale, $(\theta^*_L, \theta^*_U)$.
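Moving between the drift scale $\theta$ and the effect-size scale $\delta$ is a single multiplication by $\sqrt{2\sigma^2/N}$. The base-R sketch below reproduces the numbers of the diet example, assuming, as the text does, that $\sigma$ is replaced by $s=4.8$ and that $N=200$ per arm; the value $\theta^*=3.2708$ is copied from the `drift()` output quoted above rather than recomputed.

```r
# Sketch: convert between the drift (theta) and effect-size (delta) scales
# for the diet trial, using delta = theta * sqrt(2 * sigma^2 / N).
# Assumes sigma is replaced by s = 4.8 and N = 200 per arm.
alpha <- 0.05   # two-sided significance level
power <- 0.90   # 1 - beta
sigma <- 4.8
N     <- 200

# Fixed-sample drift: z_{1-alpha/2} + z_{1-beta}
theta_fixed <- qnorm(1 - alpha / 2) + qnorm(power)
round(theta_fixed, 4)   # 3.2415

# sqrt(var(delta)) for a comparison of two means
se_delta <- sqrt(2 * sigma^2 / N)
se_delta                # 0.48

# Drift accounting for the interim analyses (value reported by drift())
theta_star <- 3.2708
delta_star <- theta_star * se_delta
round(delta_star, 2)    # 1.57

# Ratio of the two drifts: the inflation due to interim monitoring
theta_star / theta_fixed
```

The same scale factor converts the confidence limits $(\theta^*_L, \theta^*_U)$; for proportions or survival outcomes only `se_delta` changes.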
The two-sided confidence interval for the effect size is then
$$
(\delta^*_L, \delta_U^*)=\left (\theta^*_L\sqrt{2\sigma^2/N}, \theta^*_U\sqrt{2\sigma^2/N}\right )
$$
The R code performing this analysis is

```{r, dietstudyCI}
# 95% confidence interval for the drift, given the observed Z = 3.76 at the third analysis
dietstudyCI<-drift(zb=c(actualOFdesign$upper$bound[1:3]), t=c(0.22, 0.55, 0.74), conf=0.95, zval=3.76)
# Rescale the limits by the square root of the current information fraction
cat("(",dietstudyCI$conf.interval$lower.limit/sqrt(0.74),",", dietstudyCI$conf.interval$upper.limit/sqrt(0.74), ")")
```

**Note that, in the above code, we divided the lower and upper limits of the drift by $\sqrt{0.74}$, the square root of the current information fraction, to obtain the correct lower and upper limits of the confidence interval.**

Since $\delta^*=\theta^*\sqrt{2\sigma^2/N}$, the corresponding 95% confidence interval on the scale of $\delta^*$ is
\begin{eqnarray*}
(\delta^*_L, \delta^*_U) & = & (\theta^*_L\sqrt{2\sigma^2/N}, \theta^*_U\sqrt{2\sigma^2/N})\\
& = & (1.134\sqrt{2(4.8)^2/200}, 6.211\sqrt{2(4.8)^2/200}) = (0.544, 2.981)
\end{eqnarray*}
This means that, with 95% confidence, the experimental treatment reduces weight by between about half a kilogram and three kilograms.

Carrying out analyses in terms of the effect size is a powerful approach to interim testing, since the same software applies whether the study compares means, proportions, or time-to-event outcomes. All you need is an estimate of the revised drift and the appropriate definition of the effect $\delta$ and its associated variance ${\rm var}(\delta)$:

1. *Comparison of means:* $\delta=\mu_E-\mu_C$ and ${\rm var}(\delta)=2\sigma^2/N$.
2. *Comparison of proportions:* $\delta=p_E-p_C$ and ${\rm var}(\delta)=2p(1-p)/N$, where $p=\frac{p_E+p_C}{2}$.
3. *Survival studies:* $\delta=\log\left (\frac{\lambda_E}{\lambda_C}\right )$ and ${\rm var}(\delta)\approx 4/D$, where $D$ is the total number of *events*.

# Bibliography

Proschan, M.A., Lan, K.K.G. and Wittes, J.T. (2006). _Statistical Monitoring of Clinical Trials: A Unified Approach_. Springer, New York, NY.