Introduction to clinical trials

Lab 1: Study designs

Dose finding studies

We will go over an example with artificial data from a Phase I study. We will do this with R, but these data can also be analyzed with any other statistical software. We proceed with the following steps: we first import the data (`trialPh1`) into R. By having a look at the data, we can see that each patient (`id`) has been assigned exactly one dose of the drug.
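For completeness, a minimal sketch of the import step (the file name trialPh1.csv is hypothetical; any import that yields the `trialPh1` data frame will do):

# Hypothetical import of the artificial Phase I data into a data frame
trialPh1 = read.csv("trialPh1.csv")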

# (1): Artificial data from an old Phase I study of a cytotoxic drug
# 10 doses have been investigated; have a look at the data
trialPh1[1:10,]
##    id toxicity      dose
## 1   1        0 0.2500000
## 2   2        0 0.2750000
## 3   3        0 0.3025000
## 4   4        0 0.3327500
## 5   5        0 0.3660250
## 6   6        1 0.4026275
## 7   7        1 0.4428902
## 8   8        1 0.4871793
## 9   9        1 0.5358972
## 10 10        1 0.5894869

Notice in the above how we indexed the first 10 rows of the data set `trialPh1`: R keeps the data set as a two-dimensional grid, where (say) \(n\) is the number of rows and (say) \(m\) is the number of columns. Notice that the rows are indexed first and the columns second. Our index (included between the brackets) specified rows 1 through 10 (i.e., `1:10`; note the use of the colon to imply a range). Since we want all the columns, we left the index blank after the comma. This told R to include all columns in the output.
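A few more bracket-indexing variations on the same idea (a quick sketch using the `trialPh1` data frame from above):

# Rows 1 through 10 of the dose column only
trialPh1[1:10, "dose"]
# All rows in which a toxicity occurred
trialPh1[trialPh1$toxicity == 1, ]
# The dimensions of the grid: number of rows and columns
dim(trialPh1)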

The outcome variable (toxicity) indicates whether the subject developed toxicity or not. Thus, it is a binary (0/1) variable and a logistic regression model may be appropriate. We assume that the logit transformation of the probability of toxicity \(p\) is

\[ \operatorname{logit}(p)=\log(p/(1-p)) \] and it depends linearly on the logarithm of the dose: \[ \operatorname{logit}(p) = \beta_{0} + \beta_{1}\log(Dose). \] To fit such a model, one can use the `glm` function in R. To do so, we need to define the appropriate formula indicating the outcome variable (on the left) and the covariate(s) (on the right). We also need to tell R to use the binomial distribution via the argument `family = "binomial"`, which assumes a logit link function by default.

# Fitting a logistic model to estimate the dose associated with toxicity 
# in 30% of the patients
fit = glm(toxicity ~ log(dose), family = "binomial", data = trialPh1)
summary(fit)
## 
## Call:
## glm(formula = toxicity ~ log(dose), family = "binomial", data = trialPh1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3099  -0.6137  -0.3945  -0.1978   2.8085  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.9125     0.2341   12.44   <2e-16 ***
## log(dose)     4.9316     0.2890   17.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2050.5  on 1999  degrees of freedom
## Residual deviance: 1633.7  on 1998  degrees of freedom
## AIC: 1637.7
## 
## Number of Fisher Scoring iterations: 5

To get the coefficients we proceed as follows:

# Get coefficients
betaHat = coef(fit)
exp(betaHat)[2]
## log(dose) 
##  138.6054

Note the syntax of the logistic regression model. The model formula is the first argument of the `glm` function, with the dependent variable (`toxicity`) and the independent variable(s) (`log(dose)`) separated by a tilde. This is the general syntax of modeling functions in R. Also note the option `family = "binomial"`, which implies that a logistic regression model is to be run (the `glm` function fits a whole array of generalized linear models). The above output means that \(\exp(\hat\beta_1)=138.6054\): the odds of toxicity are multiplied by about 138.6 for each one-unit increase in \(\log(dose)\). Equivalently, \(\hat\beta_1=4.93\) is the increase in the log-odds of toxicity per unit increase in the log-dose.

Suppose now that we want to estimate the optimal dose, defined as the dose that results in no more than a 30% probability of toxicity. To estimate the optimal dose we simply need to solve the equation \(\operatorname{logit}(p) = \hat\beta_{0} + \hat\beta_{1}\log(Dose)\) with respect to \(Dose\), giving \[\begin{align} \widehat{D} = \exp\left\{(\operatorname{logit}(p) - \hat\beta_{0})/\hat\beta_{1}\right\} \nonumber \end{align}\]

where \(\hat D\) is the estimate of the (optimal) dose at \(p=0.30\), and \(\hat\beta_0\) and \(\hat\beta_1\) are the logistic regression coefficients from the model above. Consequently, the estimated optimal dose is

# Estimate dose30
invlogit = function(x){exp(x)/(1+exp(x))}
logit = function(x){log(x/(1-x))}
Dose30 = exp((logit(0.30) - betaHat[1])/betaHat[2])

Thus, the optimal dose, which is associated with a 30% chance of toxicity, is 0.466. We can also produce a graph showing the predictions of the model:

# Estimated probability of toxicity as a function of log-dose
curve(invlogit(betaHat[1]+betaHat[2]*x),from = min(log(trialPh1$dose)),
      to = max(log(trialPh1$dose)),ylim = c(0,0.60),
      xlab = "Logarithm of dose", ylab = "Probability of toxicity")
lines(c(-1.4,log(Dose30)),rep(0.3,2),col = "red",lty = 2)
lines(rep(log(Dose30),2),c(0,0.3),col = "red",lty = 2)

Figure 1. Probability of toxicity based on a logistic regression model

Notice that the x-axis is on the log scale, so the log-dose is \(\log(0.466553)=-0.762\). Note the use of the function `curve`; you can see what this function does by typing `?curve` at the command line. Although this approach can accurately estimate the optimal dose (the one corresponding to a certain probability of the event), it cannot be used in actual clinical trials for ethical reasons: in this artificial example, patients were exposed to doses that were virtually certain to lead to serious toxicities, violating central tenets of research ethics.

Operating characteristics of the standard Phase I design

Let us consider the standard 3+3 design. The conditional probability of escalating past the \(i\)th dose, given that the \(i\)th dose has been reached, is \[ P_{i} = b(0;3,p_{i}) + b(1;3,p_{i})\times b(0;3,p_{i}), \] where \(b(k;n, p_i)={\rm Pr}(X=k; n,p_i)\) is the binomial probability of obtaining \(k\) successes out of \(n\) trials when the probability of success is \(p_i\). There are two scenarios that can lead to escalating past the \(i\)th dose:

  1. No toxicities are observed
  2. One toxicity is seen but there are no additional ones in the expanded cohort.

Given the hypothetical probability of toxicity at the \(i\)th dose, \(p_{i}\), the probability of seeing no toxicities (scenario 1) is \(b(0;3,p_{i})\). The second scenario requires two events: the occurrence of one toxicity in the original cohort (which occurs with probability \(b(1;3,p_{i})\)) and the occurrence of no toxicity in the additional subjects (which occurs with probability \(b(0;3,p_{i})\) conditional on the toxicity in the original cohort). By the multiplication law of probability, the probability of scenario 2 is the product of the two corresponding probabilities \(b(1;3,p_{i})\times b(0;3,p_{i})\).
Overall, the conditional probability of passing the \(i\)th dose is the sum of the two probabilities

\[ b(0;3,p_{i}) + b(1;3,p_{i})\times b(0;3,p_{i}). \] Then the unconditional probability of passing the \(i\)th dose is \[ Q_{i}=\prod_{k=1}^{i}P_{k}=\prod_{k=1}^{i}\left[b(0;3,p_{k}) + b(1;3,p_{k})\times b(0;3,p_{k})\right]. \] By the multiplication law of probability, to obtain the unconditional probability of passing the \(i\)th dose we need to multiply the conditional probabilities of passing all earlier doses. You may find these calculations similar to those involved in the relationship between the survival and hazard functions.

Calculating the operating characteristics for both scenarios in R

You can do this by hand using the binomial probability mass function (`dbinom`) in R, as follows:

  1. Find \(b(0;3, 0.2)\), the probability of no toxicities at the first dose (where \(p_1=0.20\)). This is

dbinom(x = 0, size = 3, prob = 0.20)
## [1] 0.512
  2. Find the probability of exactly one toxicity at the first dose
dbinom(x = 1, size = 3, prob = 0.20)
## [1] 0.384
  3. Find the same probabilities for the second dose (where \(p_2=0.30\))
dbinom(x = 0, size = 3, prob = 0.30)
## [1] 0.343

and the probability of one toxicity in the second dose

dbinom(x = 1, size = 3, prob = 0.30)
## [1] 0.441

The probability of getting past the second dose is then \[ Q_2=((0.512)+(0.512)\times (0.384))\times((0.343)+(0.343)\times (0.441))\approx 0.35 \] You can also write a small function that would handle any \(i\) as follows:

# (2)-b: Example of a three-up/three-down design
OC = function(p, dose = paste("dose",1:length(p),sep = ""))
 {
   # Conditional probabilities of escalating past the ith dose
   # given that the ith dose has been reached
   Pi = dbinom(x = 0,size = 3,prob = p)
   Pi = Pi + dbinom(x = 0,size = 3,prob = p)*dbinom(x = 1,size = 3,prob = p)

   # Unconditional probability of passing the ith dose
   Qi = cumprod(Pi)

   # Operating characteristic
   OC = 1 - Qi

   return(data.frame(dose = dose, prob = p,Pi = Pi, Qi = Qi, OC = OC))
}

Functions are an elegant way to avoid repeating code. Function `OC` takes one main input argument: the vector of toxicity probabilities `p`. A useful feature of R is that the argument can be a vector of (multiple) values. Thus we can, for example, input the probabilities of treatment-related toxicity at the first and second dose levels, \(p_1=0.20\) and \(p_2=0.30\) respectively. The function then produces a data set, a ``data frame'' as it is known in R, with one row per dose (as many as the proportions entered in the input, i.e., the length of the vector `p`), and columns holding the toxicity probability \(p\), the conditional probability \(P_i\) of escalating past the current dose, and the unconditional probability \(Q_i\) of passing all doses up to and including the current one. The final column is the operating characteristic of the design, i.e., the probability of stopping at or before the current dose. The data frame is produced by the `return()` command, which returns the only object that remains after the execution of the function.

Running function `OC` on the previous example we obtain

OC(p = c(0.2,0.3))
##    dose prob       Pi        Qi        OC
## 1 dose1  0.2 0.708608 0.7086080 0.2913920
## 2 dose2  0.3 0.494263 0.3502387 0.6497613

Here we have defined a vector in R from two individual elements: 0.2 and 0.3. This is done by putting a list of elements, separated by commas, within parentheses following the character `c`. Here, the vector is

c(0.2, 0.3)
## [1] 0.2 0.3
This is a vector with two elements. Thus, there is a 35% probability of continuing past the second dose when the toxicity probability is 20% at the first dose and 30% at the second. Note that the command `paste("dose", 1:length(p), sep = "")` concatenates a number from 1 to the length of `p` (the vector of toxicity proportions) to the word "dose", creating a unique name for each dose \(D_i\), where \(i\) runs from 1 to `length(p)`.
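As a quick illustration of this labeling outside the function:

paste("dose", 1:2, sep = "")
## [1] "dose1" "dose2"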

We will show the operating characteristics under two different scenarios (Piantadosi, 2005, p. 235). In the first (set \(A\)), significant toxicities arise only at the later doses, while in the second (set \(B\)) the toxicity probabilities increase linearly.

# (2)-c Piantadosi, 2005, pp. 235
ptrueB = seq(0.10,0.55,by = 0.05)
ptrueA = c(0.001,0.01,0.02,0.05,0.11,0.19,0.28,0.39,0.53,0.68)

# Add probabilities of set A
plot(1:10, ptrueA, xlab = "Dose level", ylab = "Probability of toxicity",
     xaxt = "n", yaxt = "n", ylim = c(0,0.8), type = "b", pch = 21, lty = 3)
axis(1, 1:10)
axis(2, seq(0, 0.8, by = 0.1), las = 1)

# Add grid lines
abline(v = 1:10, col = "lightgray", lty = "dotted")
abline(h = seq(0,0.8,by = 0.1), col = "lightgray", lty = "dotted")

# Adding probabilities of set B
lines(1:10,ptrueB,pch = 16,type = "b")
text(7,0.2,labels = "A")
text(4.5,0.35,labels = "B")

Figure 2: Hypothetical toxicity probabilities

Assuming that the optimal dose is the one that gives 30% chance of toxicity, for set \(A\), the trial should terminate between the 7th and 8th dose with a high probability. For the second set (curve B), the trial should terminate near the 5th dose with a high probability. The OCs are shown in Figure 3.

# Calculate OCs
setAoc = OC(ptrueA)
setBoc = OC(ptrueB)

# Add OC of set A
plot(1:10,setAoc$OC,xlab="Dose level",
     ylab="Cumulative probability of stopping",
     xaxt = "n",yaxt = "n",ylim = c(0,1),type = "b",pch = 21,lty = 3,)
axis(1,1:10)
axis(2,seq(0,1,by = 0.1),las = 1)

# Add grid lines
abline(v = 1:10, col = "lightgray", lty = "dotted")
abline(h = seq(0,1,by = 0.1), col = "lightgray", lty = "dotted")

# Adding OC of set B
lines(1:10,setBoc$OC,pch = 16,type = "b")
text(6.5,0.4,labels = "A")
text(3.15,0.65,labels = "B")

Figure 3: Operating characteristics for designs \(A\) and \(B\)

For set \(A\), the expected stopping point is between the 6th and 7th dose (see stopping probabilities close to 50%), and the design has a nearly 90% chance of stopping before or at the 8th dose. For set \(B\), the expected stopping point is the third dose, and the design has an 85% chance of stopping before or at the 5th dose (the optimal dose). Thus, in these examples, the three-up/three-down design might stop before the optimal dose is visited.
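If desired, the crossing points can be read off programmatically; a minimal sketch using the `setAoc` and `setBoc` data frames computed above:

# First dose at which the cumulative stopping probability reaches 50%
as.character(setAoc$dose)[which(setAoc$OC >= 0.5)[1]]
as.character(setBoc$dose)[which(setBoc$OC >= 0.5)[1]]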

Phase II designs

Single-stage designs

Let \(p\) denote the response rate of the new therapy. The response rate of the standard treatment is around 15%, whereas the new therapy is expected to be successful in about 40% of the patients. The null hypothesis to be tested is \[ H_0\colon p \leq 0.15 \] versus the alternative \[ H_1\colon p = 0.40 \] at the \(\alpha = 0.10\) significance level (one-sided) with power \(1-\beta=0.8\). Suppose that the sample size is \(n=16\) (we will see later how we arrived at this sample size). The number of responders \(X\) among the \(n=16\) patients follows the binomial distribution, \(X|p\sim \operatorname{Binomial}(n=16,p)\). Common sense suggests that we should reject the null hypothesis in favor of the alternative when the number of responders exceeds some threshold \(k\) (\(k=0,1,\ldots,16\)). We would like this \(k\) to satisfy both the type-I error requirement, i.e., \(\Pr(X>k|p=p_{0})\leq\alpha\), and the type-II error requirement, i.e., \(\Pr(X\leq k|p=p_{1})\leq\beta\). We can easily get the binomial probabilities in R through the function `dbinom`:

# (3): Phase II non-comparative study
# Construct a test to investigate if the true resp. rate is at least 15% a = 0.10 
# (one-sided), assume sample size=16 and the resp. rate is 40% under the alternative
n = 16
k = 0:16

# The number of responders (X) out of n patients follows the binomial dist.
# The new therapy is worth further examination if X >= k; we need to determine k
round(dbinom(k, size = n, prob = 0.15),5)
##  [1] 0.07425 0.20965 0.27748 0.22851 0.13106 0.05551 0.01796 0.00453
##  [9] 0.00090 0.00014 0.00002 0.00000 0.00000 0.00000 0.00000 0.00000
## [17] 0.00000

Note that the function `dbinom` takes the vector \(0,\ldots,16\) as input and produces a vector of 17 (i.e., \(n+1\)) elements.

bintest = round(data.frame(k = k,dbinom(k,size = n,prob = 0.15),
                           pbinom(k,size = n,prob = 0.15,lower.tail = F),
                           dbinom(k,size = n,prob = 0.40),
                           pbinom(k,size = n,prob = 0.40,lower.tail = F)),5)
names(bintest) = c("k","Pr(X=k|p=0.15)","Pr(X>k|p=0.15)",
                   "Pr(X=k|p=0.40)","Pr(X>k|p=0.40)")

bintest
##     k Pr(X=k|p=0.15) Pr(X>k|p=0.15) Pr(X=k|p=0.40) Pr(X>k|p=0.40)
## 1   0        0.07425        0.92575        0.00028        0.99972
## 2   1        0.20965        0.71610        0.00301        0.99671
## 3   2        0.27748        0.43862        0.01505        0.98166
## 4   3        0.22851        0.21011        0.04681        0.93485
## 5   4        0.13106        0.07905        0.10142        0.83343
## 6   5        0.05551        0.02354        0.16227        0.67116
## 7   6        0.01796        0.00559        0.19833        0.47283
## 8   7        0.00453        0.00106        0.18889        0.28394
## 9   8        0.00090        0.00016        0.14167        0.14227
## 10  9        0.00014        0.00002        0.08395        0.05832
## 11 10        0.00002        0.00000        0.03918        0.01914
## 12 11        0.00000        0.00000        0.01425        0.00490
## 13 12        0.00000        0.00000        0.00396        0.00094
## 14 13        0.00000        0.00000        0.00081        0.00013
## 15 14        0.00000        0.00000        0.00012        0.00001
## 16 15        0.00000        0.00000        0.00001        0.00000
## 17 16        0.00000        0.00000        0.00000        0.00000

It is easy to see from the output above that the smallest \(k\) such that \(Pr(X>k|p=p_{0})\leq\alpha\) is \(k = 4\). Thus, we reject the null hypothesis when \(X\geq 5\) responses are observed.

In the above code, note that, again, we passed a vector of length 17 (\(=n+1\)) to the function `dbinom`. The function evaluates the binomial probability \(b(k;n,p)\) for \(n=16\) and each \(k=0,\ldots,16\). Consequently, no complicated code or loops are needed; R performs the calculation over all elements of the vector `k`.
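To see the vectorization at work, the following explicit loop gives identical results (shown only for comparison; it is unnecessary in R):

# Explicit loop over k via sapply -- equivalent to the vectorized call
probs = sapply(0:16, function(kk) dbinom(kk, size = 16, prob = 0.15))
all.equal(probs, dbinom(0:16, size = 16, prob = 0.15))
## [1] TRUE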

Since we use the binomial distribution, we can get the exact significance level of the test by calculating \(Pr(X>4|p=0.15)\). Note that this does not exactly equal the nominal significance level \(\alpha=0.10\). The power of the test, in turn, is the probability of rejecting the null hypothesis under the alternative, i.e., \(Pr(X>4|p=0.40)\).

# The null is rejected if X>4 or equivalently X>=5!
# Exact significance level and power of the test
k1 = which(bintest[,3] < 0.10)[1]
bintest[k1,c(1,3,5)]
##   k Pr(X>k|p=0.15) Pr(X>k|p=0.40)
## 5 4        0.07905        0.83343

Beautiful! To select the correct line of the data set, we used the command `which` in R to find the rows where the entry in the third column of the data is less than \(\alpha=0.10\), and took the first of them. We stored this row number in the variable `k1`. Then we printed the `k1`th row and the first, third, and fifth columns of the data frame (entered as the vector `c(1,3,5)`).

Thus, the exact significance level and power of the test are 0.07905 and 0.83343, respectively. Keep in mind that, although you can set the type-I error at any level you want, you cannot minimize both the type-I and type-II errors for a given sample size. The only way to increase the power (i.e., decrease the type-II error) without inflating the type-I error is to increase the sample size.
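To illustrate, here is a small sketch that repeats the cutoff construction above for a few sample sizes (the helper `power_at_n` is ours, not part of the lab); for these inputs the attained power grows from about 0.83 at \(n=16\) to roughly 0.87 at \(n=20\) and 0.93 at \(n=25\):

# For each n, find the smallest cutoff k with Pr(X > k | p0) <= alpha,
# then report the attained power Pr(X > k | p1) at that cutoff
power_at_n = function(n, p0 = 0.15, p1 = 0.40, alpha = 0.10) {
  k = which(1 - pbinom(0:n, size = n, prob = p0) <= alpha)[1] - 1
  1 - pbinom(k, size = n, prob = p1)
}
sapply(c(16, 20, 25), power_at_n)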

Sample size calculations

To find the sample size we need to find an \(N\) that provides a combination of binomial probabilities satisfying the type-I and type-II error requirements. We can write a small program in R to do this:

rsppow<-function(alpha, beta, p0, p1){
  if (p1<=p0) {
    cat("You idiot! p1 must be greater than p0. Try again.")
    } 
  else {
#F is the sample size based on the normal approximation to the Binomial (Fleming, 1982)
    F<-(qnorm(1-beta)*sqrt(p1*(1-p1))+qnorm(1-alpha)*sqrt(p0*(1-p0)))^2/(p1-p0)^2
# We will search (just like PASS is doing) between 0.8F and 4F
    n<-round(0.8*F)
    found<-FALSE
#Probability of false rejection (pp0) and false non-rejection (pp1)
   pp0<-1.0
   pp1<-1.0
while(n<round(4*F) & found==FALSE) {
  j<-0
  while((pp0>alpha | pp1>beta) & j<=n) {
    pp0<-1-pbinom(q = j, size = n, prob = p0)
    pp1<-pbinom(q = j, size = n, prob = p1)
    if(pp0<alpha | pp1<beta){
      j<-j+1
    }
  }
  if(j>n){
    j<-0
    n<-n+1
  }
  else {
    found<-TRUE
  }
}
if(found){
  cat("SAMPLE SIZE (N) = ", n, "\n")
  cat("P0 EXCLUDED IF X .ge. ",j-1,"\n")
  cat("COVERAGE= ", 1-pp0, "\n")
  cat("POWER   = ", 1-pp1, "\n")
} else cat("No suitable sample size found", "\n")
    }
}

The program searches from \(0.8F\) to \(4F\), where \[ F=\left (\frac{z_{1-\beta}\sqrt{p_1(1-p_1)}+z_{1-\alpha}\sqrt{p_0(1-p_0)}}{p_1-p_0}\right )^2 \] is the sample size based on the normal approximation to the binomial distribution. With \(p_0=0.15\), \(p_1=0.40\), \(\alpha=0.1\) and \(\beta=0.2\), \(F\approx 12.1\), so the search starts at \(n=\operatorname{round}(0.8F)=10\).
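As a quick numerical check of the starting point (using the same normal-approximation formula):

# Normal-approximation sample size for p0 = 0.15, p1 = 0.40, alpha = 0.1, beta = 0.2
p0 = 0.15; p1 = 0.40
F = (qnorm(0.8)*sqrt(p1*(1-p1)) + qnorm(0.9)*sqrt(p0*(1-p0)))^2/(p1 - p0)^2
F            # approximately 12.1
round(0.8*F) # the search starts at n = 10
round(4*F)   # and ends at n = 48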
Performing this analysis produces the following results:

rsppow(alpha=0.1, beta=0.2, p0=0.15, p1=0.4)
## SAMPLE SIZE (N) =  16 
## P0 EXCLUDED IF X .ge.  5
## COVERAGE=  0.9209487 
## POWER   =  0.8334326

In the output above, ``coverage'' is equal to \(1-\alpha\). Note that `p0` must be strictly lower than `p1`, or you will get a (rather rude) error message! In any case, the above output shows that for \(N=16\) we obtain coverage \(1-\alpha>0.92\) (i.e., attained \(\alpha<0.08\)), which is below the desired 10% level, and attained power \(1-\beta\approx 0.83\), which is greater than our desired power of 80%.

Two-stage designs

Two-stage designs usually involve cumbersome calculations, so we can use the function `ph2simon` from the package `clinfun` to do the job for us. Note that the available options are the minimax and the optimal (Simon's) two-stage designs. Two-stage designs enroll some patients in the first stage, where a first assessment of efficacy is performed. If the therapy seems unacceptable, the trial stops. If not, more patients are recruited in the second stage and the efficacy of the drug is assessed again. Type-I and type-II errors are calculated over both stages and involve the cumulative probabilities of the scenarios that result in the study stopping at the first stage and those that result in the study continuing to the second stage. The idea is identical to the operating characteristics of the three-up/three-down Phase I design.

# 4. Simon's two-stage design
# There are two alternatives in library clinfun:
# Optimal and Minimax 2-stage Phase II designs
# Assume p0 = 0.15, p1 = 0.40,
# type-I error alpha = 0.10 (one-sided), type-II error beta = 0.20
library(clinfun)
ph2simon(pu = 0.15, pa = 0.40, ep1 = 0.10, ep2 = 0.20, nmax = 100)
## 
##  Simon 2-stage Phase II design 
## 
## Unacceptable response rate:  0.15 
## Desirable response rate:  0.4 
## Error rates: alpha =  0.1 ; beta =  0.2 
## 
##         r1 n1 r  n EN(p0) PET(p0)
## Optimal  1  7 4 18  10.12  0.7166
## Minimax  1  9 4 16  11.80  0.5995

In the above output, note the invocation of the package `clinfun` via `library(clinfun)`, which must be loaded or else R won't recognize the command `ph2simon`.
For both designs, the hypotheses to be tested are \(H_0\colon p\leq 0.15\) vs. \(H_1\colon p\geq 0.40\) (one-sided). For the optimal (Simon's) design we need to enroll \(n=18\) patients, of whom \(n_{1} = 7\) are involved in the first stage. If \(r_{1} \leq 1\) of them respond, then the study will stop. If more than \(r_1 = 1\) individuals respond, then \(n_2 = 11\) additional subjects will be enrolled in the second stage. If the total number of responders over both stages is at most \(r = 4\), the null hypothesis will not be rejected and the drug will not be considered effective. Otherwise, the null hypothesis will be rejected and the conclusion will be that the response rate \(p > p_0 = 0.15\). Simon's design is ``optimal'' in the sense that it exposes, on average, the minimum number of patients if the true probability of response is \(p_0\), i.e., the average sample number \[ \mathrm{ASN}(p_0) = \mathrm{EN}(p_0) = 10.12 \] is minimized if \(H_0\) is true (i.e., if \(p \leq p_0\)).

Regarding the minimax design, the above output implies that we need to enroll \(n=16\) patients, of whom \(n_{1} = 9\) are involved in the first stage. If \(r_{1} \leq 1\) of them respond, then the study will stop. If more than \(r_1 = 1\) individuals respond, then \(n_2 = 7\) additional subjects will be enrolled in the second stage. If the total number of responders over both stages is at most \(r = 4\), the null hypothesis will not be rejected and the drug will not be considered effective; otherwise the null hypothesis will be rejected and we will conclude that \(p > p_0 = 0.15\). The minimax design minimizes the total sample size \(n\) but may result in a larger average sample size under the null hypothesis compared with Simon's optimal design: here \(\mathrm{ASN}(p_0) = 11.80\) individuals if \(H_0\) is true, compared to 10.12 for the optimal two-stage design.
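As a sanity check, the reported \(PET(p_0)\) and \(EN(p_0)\) can be reproduced by hand: the trial stops at stage 1 if at most \(r_1=1\) of the \(n_1\) first-stage patients respond, so \(PET(p_0)\) is a binomial tail probability and \(EN(p_0)=n_1+(1-PET(p_0))(n-n_1)\). A short sketch:

# Optimal design: n1 = 7, n = 18; minimax design: n1 = 9, n = 16
PETopt = pbinom(1, size = 7, prob = 0.15)  # Pr(stop at stage 1 | p0)
ENopt  = 7 + (1 - PETopt)*(18 - 7)         # expected sample size under p0
PETmin = pbinom(1, size = 9, prob = 0.15)
ENmin  = 9 + (1 - PETmin)*(16 - 9)
round(c(PETopt, ENopt, PETmin, ENmin), 4)
## [1]  0.7166 10.1176  0.5995 11.8036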

Factorial designs

Factorial designs are efficient when combinations of various treatments are assessed.

Evaluation of combination therapy for hypertension

We consider the following example. Various combinations of two anti-hypertensive drugs, four doses of an ACE inhibitor (drug \(A\)) and three doses of a diuretic (drug \(B\)), were considered. The results and sample sizes are given in the following table:

We consider simulated data matching the means in the above table, with a common standard deviation of \(s=7.07\) mmHg. We focus here on the \(2\times 2\) factorial formed by \(A0\), \(A1\), \(B0\) and \(B1\) (the top left-hand corner of the table).

The data are in the file hungab.csv. They are read into R and managed by the following code:

# input the data
hungab<-read.csv("H:/teaching/Indiana/PBHL B-582/labs/lab 1 (study designs)/data/hungab.csv")

The command `read.csv` reads in a comma-separated-values (CSV) file. There are options for the separator character if it is not a comma, but here the default will do just fine. A box plot of the four drug combinations is given in Figure 4.
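The box plot can be produced along the following lines (a sketch, assuming the treatment-combination factor `AB` and the response `y` that are used in the models below):

# Box plot of the blood pressure response by treatment combination
boxplot(y ~ AB, data = hungab,
        xlab = "Treatment combination", ylab = "SidBP drop (mmHg)")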

Figure 4: Box plot of Hung data

The power of the \(2\times 2\) factorial design lies in the fact that it combines the two groups with the \(B1\) effect (i.e., \(A1B1\), \(A0B1\)) and contrasts them with the two groups without it (i.e., \(A1B0\), \(A0B0\)). This works when there is no interaction effect.

We can assess the presence of interaction graphically (see Figure 5).

# Interaction plot
with(hungab, interaction.plot(A, B, y, type="b", fun=mean, col=c(2:3),  
                 leg.bty="o", leg.bg="beige", 
                 lwd=2, pch=c(18,24),
                 xlab="ACE inhibitor dose (A)",
                 ylab="Mean SidBP drop (mmHg)",
                 #main="Interaction Plot"
                 ))

Figure 5: Interaction plot of \(A\) versus \(B\)

From Figure 5, it does not appear that a significant interaction effect is present. By the way, note in the R code that we can enter entire vectors as arguments for things like color (`col`) or plotting symbol (`pch`); R does the rest.

We can run the \(2\times 2\) factorial design without interaction as follows:

hungnoint<-lm(formula = y ~ A + B, data = hungab)
summary(hungnoint)
## 
## Call:
## lm(formula = y ~ A + B, data = hungab)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.5523  -4.4318   0.2277   4.6367  19.9064 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   1.3003     0.7063   1.841   0.0666 .
## AA1          -1.2007     0.8164  -1.471   0.1425  
## BB1           1.5993     0.8164   1.959   0.0511 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.059 on 296 degrees of freedom
## Multiple R-squared:  0.01993,    Adjusted R-squared:  0.01331 
## F-statistic:  3.01 on 2 and 296 DF,  p-value: 0.05082

We can run the \(2\times 2\) factorial design with interaction as follows:

hungint<-lm(formula = y ~ A * B, data = hungab)
summary(hungint)
## 
## Call:
## lm(formula = y ~ A * B, data = hungab)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.453  -4.433   0.128   4.658  20.006 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   1.4000     0.8164   1.715   0.0874 .
## AA1          -1.4000     1.1545  -1.213   0.2262  
## BB1           1.4000     1.1545   1.213   0.2262  
## AA1:BB1       0.4000     1.6355   0.245   0.8070  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.07 on 295 degrees of freedom
## Multiple R-squared:  0.02013,    Adjusted R-squared:  0.01016 
## F-statistic:  2.02 on 3 and 295 DF,  p-value: 0.1112

We can also run this as a four-way comparison, where each treatment combination is considered separately.

hung4arm<-lm(formula = y ~ AB, data = hungab)
summary(hung4arm)
## 
## Call:
## lm(formula = y ~ AB, data = hungab)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.453  -4.433   0.128   4.658  20.006 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   1.4000     0.8164   1.715   0.0874 .
## ABA0B1        1.4000     1.1545   1.213   0.2262  
## ABA1B0       -1.4000     1.1545  -1.213   0.2262  
## ABA1B1        0.4000     1.1584   0.345   0.7301  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.07 on 295 degrees of freedom
## Multiple R-squared:  0.02013,    Adjusted R-squared:  0.01016 
## F-statistic:  2.02 on 3 and 295 DF,  p-value: 0.1112

The following remarks emanate directly from the previous output:

  1. The effect of \(B\) (the diuretic) is almost statistically significant at the 5% level and statistically significant at the 10% level in the no-interaction model.
  2. The interaction model is not statistically significant. Neither is the straight four-way treatment model.

The interaction model is equivalent to the four-way comparison model (note the identical overall \(F\) statistics in the two models). The p-values associated with the various effects are not the same, because the two models are parametrized differently. The interaction model is \[ E(y_{i})=\beta_0+\beta_1X_{A}+\beta_2X_B+\gamma X_AX_B=\left \{ \begin{array}{l} E(y_{A_0B_0}) = \beta_0\\ E(y_{A_1B_0}) = \beta_0+\beta_1\\ E(y_{A_0B_1}) = \beta_0+\beta_2\\ E(y_{A_1B_1}) = \beta_0+\beta_1+\beta_2+\gamma\\ \end{array} \right . \] where \(X_A=1\) corresponds to \(A1\), \(X_B=1\) to \(B1\) and, respectively, zero values denote \(A0\) and \(B0\). The four-way model has one indicator per non-reference treatment combination, \[ E(y_{ij})=\zeta+\eta Z_{A_1B_0}+\theta Z_{A_0B_1}+\kappa Z_{A_1B_1}=\left \{ \begin{array}{l} E(y_{A_0B_0}) = \zeta\\ E(y_{A_1B_0}) = \zeta+\eta\\ E(y_{A_0B_1}) = \zeta+\theta\\ E(y_{A_1B_1}) = \zeta+\kappa\\ \end{array} \right . \] where the \(Z\)'s are indicators of the corresponding treatment combinations, so that (see the previous output) \(\beta_0=\zeta\), \(\beta_0+\beta_1=\zeta+\eta\), \(\beta_0+\beta_2=\zeta+\theta\), and \(\beta_0+\beta_1+\beta_2+\gamma=\zeta+\kappa\).
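Because both models are saturated on the four treatment cells, their fitted cell means coincide; a quick one-line check:

# The two parametrizations produce identical fitted values
all.equal(fitted(hungint), fitted(hung4arm))
## [1] TRUE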

Cross-over designs

In cross-over trials, each patient is assigned both treatments under comparison. Consider the following example. The drug pronethalol was tested in angina pectoris. A cross-over design was used in which patients were randomized to receive placebo or pronethalol first. The number of angina episodes while receiving each treatment was counted. The data can be stored in long format (i.e., one row per patient per treatment) or in wide format (i.e., one row per patient, where separate columns store the responses to pronethalol and placebo). Here, the data (stored in the text file angina.txt) are in the wide format.

# 5. Example of a Cross-over trial for angina pectoris
angina <- read.table("../data/angina.txt", header=TRUE, quote="\"")
head(angina)
##   Patient Placebo Pronethalol Difference
## 1       1      71          29         42
## 2       2     323         348         25
## 3       3       8           1          7
## 4       4      14           7          7
## 5       5      23          16          7
## 6       6      34          25          9

We can check the normality assumption for the differences of angina attacks or for the actual measurements on the two groups.

# (a) Check normality assumption for the difference in attacks of angina
library(lattice)
histogram( ~ Difference,data = angina)

Figure 6: Histogram of the differences in angina pectoris episodes between the pronethalol and placebo groups.

We first turn the data into the long format.

# Create independent sample data
angina2 = data.frame(attacks = NA,
                    trt = rep(c("Pronethalol","Placebo"),each = nrow(angina)))
angina2[angina2$trt == "Pronethalol","attacks"] = angina$Pronethalol
angina2[angina2$trt == "Placebo","attacks"] = angina$Placebo

Now we can also produce a histogram of the raw number of angina events by treatment group.

histogram( ~ attacks|trt,data = angina2)

Figure 7: Histograms of the number of angina pectoris episodes in the pronethalol and placebo groups.

Note the formula syntax in the histogram: `~ attacks | trt` results in a histogram of the number of attacks for each treatment arm. This conditioning syntax is a common feature of lattice commands. We can also check the normality of the distribution of the events.

# Q-Q plot, normal quantiles
qqmath(~ attacks|trt, data = angina2, distribution = qnorm,
      prepanel = prepanel.qqmathline,
      panel = function(x, ...) {
      panel.qqmathline(x, ...)
      panel.qqmath(x, ...)
      })

Figure 8: Q-Q plots of the number of angina pectoris episodes in the pronethalol and placebo groups

The same can be done for the differences. The Q-Q plot of the differences in the number of angina episodes is given in Figure 9.

# Q-Q plot of the differences
qqmath(~ Difference, data = angina, distribution = qnorm,
       prepanel = prepanel.qqmathline,
       panel = function(x, ...) {
         panel.qqmathline(x, ...)
         panel.qqmath(x, ...)
       })

Figure 9: Q-Q plot of the differences in the number of angina pectoris episodes between the pronethalol and placebo groups.

The distribution is far from normal, so we proceed with nonparametric tests.

Data analysis

First, we ignore the study design and assume that the data come from two independent populations. To do so, we use the data frame `angina2`, which includes two rows per patient. Then we perform a two-sample Wilcoxon test:

# (b) naive analysis
wilcox.test(attacks ~ trt, data = angina2)
## Warning in wilcox.test.default(x = c(71L, 323L, 8L, 14L, 23L, 34L, 79L, :
## cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  attacks by trt
## W = 88, p-value = 0.3705
## alternative hypothesis: true location shift is not equal to 0

This test suggests that there is no significant difference between the two groups with respect to the number of attacks of angina. Let’s see what happens if we take into account the matched study design.

We now use the paired version of the Wilcoxon test (i.e., the nonparametric analogue of the paired Student's t-test):

# (c) Dependent 2-group Wilcoxon Signed Rank Test
wilcox.test(angina$Placebo,angina$Pronethalol,paired = T)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  angina$Placebo and angina$Pronethalol
## V = 67, p-value = 0.03066
## alternative hypothesis: true location shift is not equal to 0

Now the treatment effect becomes significant! This shows that when there is a matched study design, we must take the matching into account in the analysis. Note, however, that cross-over trials require additional assumptions to be valid. For example, in this analysis we have assumed that there is no treatment\(\times\)period interaction; in other words, that the effects of pronethalol are the same whether it follows placebo or is given first. Other, more general prerequisites for the application of cross-over studies are given in the notes.