In this lab we will go over sample size calculation, which is required to design a clinical trial. In our notation, we want \[ \Pr(|\bar{X}-\mu|<d)\geq 1-\alpha. \] To solve for \(n\), we standardize by dividing by the standard deviation \(\sigma/\sqrt{n}\), resulting in \[ \Pr\left (|Z|=\left |\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right |<\frac{d}{\sigma/\sqrt{n}}=z_{1-\alpha/2}\right )\geq 1-\alpha \] which in turn means that \[ n\geq\left(\dfrac{\sigma z_{1-\alpha/2}}{d}\right)^2 \] where \(z_{1-\alpha/2}\) is the \((1-\alpha/2)\) quantile of the standard normal distribution (i.e., \(z_{1-\alpha/2}=\Phi^{-1}(1-\alpha/2)\)).
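As a quick numerical cross-check of this formula, here is a Python sketch using only the standard library (the helper name is our own):

```python
from math import ceil
from statistics import NormalDist

def n_for_precision(sigma, d, alpha=0.05):
    """Smallest n with Pr(|Xbar - mu| < d) >= 1 - alpha,
    i.e. n >= (sigma * z_{1-alpha/2} / d)^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((sigma * z / d) ** 2)

# e.g. estimating the mean to within a quarter of a standard deviation
print(n_for_precision(sigma=1, d=0.25))  # 62
```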

For example, if we want to estimate a mean to within half a standard deviation, i.e., \(d=0.5\sigma\), we get a formula that is applicable even without knowledge of \(\sigma\), since \[ n\geq (2z_{1-\alpha/2})^2. \] In \(\verb|R|\), you can simply code this as follows:

# 4. Find the smallest sample size required to ensure that the error
# of estimation of the mean will be within
# d = 0.5*sigma of the true mean with 95% confidence
(2*qnorm(0.975))^2
## [1] 15.36584

Since we cannot obtain partial units (human subjects, in our context), we round up to the next integer:

# Or to round n up to the next highest integer
ceiling((2*qnorm(0.975))^2)
## [1] 16

Testing for a single (normal) mean

Among others, the library \(\verb|pwr|\) provides some useful functions for obtaining the required sample size. As an example, consider testing the (one-sided) null hypothesis \(H_0: \mu\leq \mu_0=3\) versus the alternative hypothesis \(H_1: \mu=\mu_1=4\), assuming that the standard deviation in both cases is \(\sigma=2\). We will use the function \(\verb|pwr.t.test|\) from the \(\verb|pwr|\) package to do this. Please take a look at the help file of the function.

Note that, to properly invoke the function, we have to specify the effect size, which in this case is just the difference between the means under the alternative and the null hypothesis divided by the common standard deviation, \[ d=\frac{\mu_1-\mu_0}{\sigma} \] as \(\verb|pwr.t.test|\) takes the effect size as its argument.
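For intuition, the normal-quantile analogue of this calculation can be sketched in Python (standard library only; \(\verb|pwr.t.test|\) itself solves with the noncentral t distribution, which pushes the answer a little higher):

```python
from statistics import NormalDist

def n_one_sample_normal(d, alpha=0.05, power=0.90):
    """Normal-approximation n for a one-sided one-sample test
    with standardized effect size d = (mu1 - mu0)/sigma."""
    z = NormalDist().inv_cdf
    return ((z(1 - alpha) + z(power)) / d) ** 2

print(round(n_one_sample_normal(d=0.5), 2))  # 34.26 (t-based answer: ~35.65)
```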

# 5. Sample size calculation for a Normal mean
# H0: mu <= 3 vs H1: mu = 4, a = 0.05, beta = 0.1 and sigma = 2
library(pwr)
pwr.t.test(d = (4-3)/2, sig.level = 0.05, power = 0.90,
           alternative = "greater",type = "one.sample")
## 
##      One-sample t test power calculation 
## 
##               n = 35.65269
##               d = 0.5
##       sig.level = 0.05
##           power = 0.9
##     alternative = greater

If we would like to calculate the power of a test, we can specify the sample size and leave the power unspecified.

# To get the actual power of the test
pwr.t.test(d = (4-3)/2, sig.level = 0.05, n = 36,
          alternative = "greater",type = "one.sample")
## 
##      One-sample t test power calculation 
## 
##               n = 36
##               d = 0.5
##       sig.level = 0.05
##           power = 0.9025746
##     alternative = greater
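As a sanity check, the normal approximation to this power calculation is easy to code by hand (a sketch; the t-based value reported above is slightly lower because of the extra variability of the t statistic):

```python
from math import sqrt
from statistics import NormalDist

def power_one_sample_normal(d, n, alpha=0.05):
    """Approximate power of a one-sided one-sample test:
    Phi(d * sqrt(n) - z_{1-alpha})."""
    nd = NormalDist()
    return nd.cdf(d * sqrt(n) - nd.inv_cdf(1 - alpha))

print(round(power_one_sample_normal(d=0.5, n=36), 3))  # 0.912
```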

Note that the power of the test is not exactly 0.90 because the sample size was rounded up. You could do this more elegantly by saving the sample size from the previous command as follows:

pwr.output<-pwr.t.test(d = (4-3)/2, sig.level = 0.05, power = 0.90,
                       alternative = "greater",type = "one.sample")

# See what's in pwr.output object
summary(pwr.output)
##             Length Class  Mode     
## n           1      -none- numeric  
## d           1      -none- numeric  
## sig.level   1      -none- numeric  
## power       1      -none- numeric  
## alternative 1      -none- character
## note        0      -none- NULL     
## method      1      -none- character
# Save the sample size into n
n.ss=ceiling(pwr.output$n)

# Now invoke pwr.t.test again to get the power without worrying about the n
pwr.t.test(d = (4-3)/2, sig.level = 0.05, n = n.ss,
           alternative = "greater",type = "one.sample")
## 
##      One-sample t test power calculation 
## 
##               n = 36
##               d = 0.5
##       sig.level = 0.05
##           power = 0.9025746
##     alternative = greater

Revisiting the Phase-II design

We want to test the null hypothesis \(H_0:p\leq0.15\) vs the alternative \(H_1:p=0.40\), setting \(\alpha=0.10\) and \(\beta=0.20\). The library \(\verb|clinfun|\) includes the function \(\verb|ph2single|\), which gives us the sample size based on the binomial distribution of the number of responders. Because a number of designs may satisfy the \(\alpha\) level and power requirement, we can inspect, for example, 5 of them by using the argument \(\verb|nsoln|\).

# 6. Revisiting the Phase II situation with H0:p = 0.15
# versus the alternative HA:p = 0.40,
# alpha = 0.1 and beta = 0.20

# (i) Using exact binomial test
library(clinfun)
ph2single(pu = 0.15, pa = 0.40, ep1 = 0.10, ep2 = 0.20, nsoln = 5)
##    n r Type I error Type II error
## 1 16 4   0.07905130    0.16656738
## 2 17 4   0.09871000    0.12599913
## 3 19 5   0.05369611    0.16292248
## 4 20 5   0.06730797    0.12559897
## 5 21 5   0.08273475    0.09574016

All the designs above meet our criteria, but we choose the one with the smallest sample size (i.e. \(n=16\)). Recall that we found this design earlier in the discussion.
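The error rates in the \(\verb|ph2single|\) table can be reproduced directly from the binomial distribution. A sketch (our own helper, not part of \(\verb|clinfun|\)), where the design rejects \(H_0\) when more than \(r\) responders are observed:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, r = 16, 4                        # the chosen design
type1 = 1 - binom_cdf(r, n, 0.15)   # reject although p = 0.15
type2 = binom_cdf(r, n, 0.40)       # fail to reject although p = 0.40
print(round(type1, 4), round(type2, 4))  # 0.0791 0.1666
```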

Using the function \(\verb|pwr.p.test|\) we can find the sample size based on the arcsine transformation. This transformation is mainly used to normalise the distribution of proportions, so that the normal approximation is appropriate even for small \(p\); the untransformed normal approximation to the sampling distribution of an estimated proportion is poor when \(\hat{p}\rightarrow 0\) or \(\hat{p}\rightarrow 1\).

# (ii) Using the arcsine transformation of the proportion
# to normalise the distribution of the test statistic
pwr.p.test(h = ES.h(0.40, 0.15), sig.level = 0.10,
           power = 0.80, alternative = "greater")
## 
##      proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.5740396
##               n = 13.68003
##       sig.level = 0.1
##           power = 0.8
##     alternative = greater

Note that we have used the utility function \(\verb|ES.h|\) to calculate the effect size for proportions. The resulting sample size estimate is \(n=14\). The two methods would agree more closely for larger sample sizes; in any case, the first method is exact whereas the second relies on the normal approximation.
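The arcsine calculation is easy to replicate by hand: \(\verb|ES.h|\) computes \(h = 2\arcsin\sqrt{p_1} - 2\arcsin\sqrt{p_2}\), and the sample size follows from the usual normal quantiles. A Python sketch (the helper name is our own):

```python
from math import asin, sqrt
from statistics import NormalDist

def es_h(p1, p2):
    """Cohen's effect size h for proportions (as in pwr::ES.h)."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

z = NormalDist().inv_cdf
h = es_h(0.40, 0.15)
n = ((z(1 - 0.10) + z(0.80)) / h) ** 2  # one-sided alpha = 0.10, power = 0.80
print(round(h, 4), round(n, 2))  # 0.574 13.68
```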

Testing for two independent means

As we discussed in the lecture, testing for two means involves the sample size formula \[ n=\frac{\left (z_{1-\alpha/2}+z_{1-\beta} \right )^2 2\sigma^2}{(\mu_2-\mu_1)^2} \] where \(n\) is the sample size per group (here we are assuming equal sample sizes in the two groups). The same can be written in terms of the effect size \(f=\frac{\mu_2-\mu_1}{\sigma}\) as \[ n= \frac{2\left (z_{1-\alpha/2}+z_{1-\beta} \right )^2}{f^2} \]
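A quick numerical check of this calculation with normal quantiles (effect size \(f=(\mu_2-\mu_1)/\sigma\); the t-based numbers in the table below come out slightly larger):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(f, alpha=0.05, power=0.90):
    """Normal-quantile n per group, two-sided two-sample test,
    effect size f = (mu2 - mu1)/sigma."""
    z = NormalDist().inv_cdf
    return 2 * ((z(1 - alpha / 2) + z(power)) / f) ** 2

# total sample size for f = 0.25, to compare with the t-based 675
print(ceil(2 * n_per_group(0.25)))  # 673
```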

The following calculations cycle through a sequence of effect sizes and produce a table of sample sizes.

# 7. Verify the calculation in Table 11.11 in the textbook
# (Piantadosi,2005, pp. 280)
table1 = data.frame(ES = seq(0.25,1.50,by = 0.25),
                    beta = 0.10,"Alpha = 0.05" = NA,"Alpha = 0.10" = NA)
names(table1)[3:4] = c("Alpha = 0.05", "Alpha = 0.10")

for (i in 1:length(table1[,1]))
    {
     table1[i,3] = ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.05,
                                        power = 0.90, alternative = "two.sided",
                                        type = "two.sample")$n)
     table1[i,4] = ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.10,
                                        power = 0.90, alternative = "two.sided",
                                        type = "two.sample")$n)
    }
table1
##     ES beta Alpha = 0.05 Alpha = 0.10
## 1 0.25  0.1          675          550
## 2 0.50  0.1          171          139
## 3 0.75  0.1           77           63
## 4 1.00  0.1           45           36
## 5 1.25  0.1           29           24
## 6 1.50  0.1           21           17

Now we produce the same results with \(\beta=0.20\):

table2 = data.frame(ES = seq(0.25,1.50,by = 0.25),
                    beta = 0.20,"Alpha = 0.05" = NA,"Alpha = 0.10" = NA)
names(table2)[3:4] = c("Alpha = 0.05", "Alpha = 0.10")

for (i in 1:length(table2[,1]))
    {
     table2[i,3] = ceiling(2*pwr.t.test(d = table2[i,1], sig.level = 0.05,
                                        power = 0.80, alternative = "two.sided",
                                        type = "two.sample")$n)
     table2[i,4] = ceiling(2*pwr.t.test(d = table2[i,1], sig.level = 0.10,
                                        power = 0.80, alternative = "two.sided",
                                        type = "two.sample")$n)
     }
table2
##     ES beta Alpha = 0.05 Alpha = 0.10
## 1 0.25  0.2          505          398
## 2 0.50  0.2          128          101
## 3 0.75  0.2           58           46
## 4 1.00  0.2           34           27
## 5 1.25  0.2           23           18
## 6 1.50  0.2           17           13

Note that we have used the Student’s t-distribution instead of the normal distribution in computing the sample sizes. Now let’s unpack the previous code. First, we created a \(\verb|data.frame|\) containing, as its first column, a sequence of effect sizes ranging from \(f=0.25\) to \(f=1.50\) in steps of 0.25. The remaining columns were filled with missing values (\(\verb|NA|\)).

table1 = data.frame(ES = seq(0.25,1.50,by = 0.25),
                    beta = 0.10,"Alpha = 0.05" = NA,"Alpha = 0.10" = NA)

The command

names(table1)[3:4] = c("Alpha = 0.05", "Alpha = 0.10")

assigns names to the third and fourth columns of the table. Then we loop through all the effect sizes in the first column to produce, alternatively, the sample size under alpha levels (two-sided) of 5% and 10%.

for (i in 1:length(table1[,1]))
    {
     table1[i,3] = ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.05,
                                        power = 0.90, alternative = "two.sided",
                                        type = "two.sample")$n)
     table1[i,4] = ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.10,
                                        power = 0.90, alternative = "two.sided",
                                        type = "two.sample")$n)
     }

Note the use of \(\verb|1:length(table1[,1])|\) as the loop index, which does not require us to know in advance how many rows are in the table.

Differential allocation

A slight modification of the sample size calculation occurs when the sample sizes in the two groups are unequal by some factor \(r=1,2,\ldots\). We will use the function \(\verb|nNormal|\) found in the library \(\verb|gsDesign|\). Please have a look at the help file. Assuming that the ratio of sample sizes between the two groups ranges from 1 (equal allocation) to 10 and that the effect size is 0.5 (mean difference divided by the standard deviation), we get

# 8. Unequal group allocation (using normal quantiles)
library(gsDesign)
ar = data.frame(ratio = 1/1:10, n = NA)
for (i in 1:10)
    {
     ar$n[i] = nNormal(delta1 = 0.5, delta0 = 0, sd = 1, sd2 = 1,
                       alpha = 0.05, beta = 0.10,
                       sided = 2, ratio = ar$ratio[i])
}
#

The sample table is as follows:

library(knitr)
library(kableExtra)
kable_styling(kable(ar, "html"),bootstrap_options = c("striped", "hover", "condensed", "responsive"))
     ratio         n
 1.0000000  168.1188
 0.5000000  189.1336
 0.3333333  224.1584
 0.2500000  262.6856
 0.2000000  302.6138
 0.1666667  343.2425
 0.1428571  384.2715
 0.1250000  425.5506
 0.1111111  466.9966
 0.1000000  508.5593

Note the options striped, condensed and responsive, which control the table's appearance. The last one makes the table scrollable on small devices, like cell phones, which may not be able to show it in its entirety. We can also create a graph.

plot(ceiling(n)~I(1/ratio),data = ar,xlim = c(1,10),ylim = c(0,550),
     xlab = "Ratio of sample sizes", 
     ylab = "Total sample size",type = "l",xaxt = "n")
axis(1,at = 1:10)

Figure 1. Effect of ratio of sample sizes on the total sample size.

We can easily see that the farther the allocation is from a balanced one (\(r = 1\)) the larger the total sample size must be. Thus, unless a compelling counter-argument exists (e.g., making a randomized clinical trial more palatable to subjects) we suggest using equal group allocation.
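This behaviour also follows from the textbook normal-quantile formula for unequal allocation: with allocation ratio \(r = n_2/n_1\), the total sample size is \(N = \frac{(1+r)^2}{r}\,\frac{(z_{1-\alpha/2}+z_{1-\beta})^2}{f^2}\), which is minimized at \(r=1\). A Python sketch (our own helper, not \(\verb|nNormal|\) itself, though it reproduces its numbers):

```python
from statistics import NormalDist

def total_n(f, ratio, alpha=0.05, power=0.90):
    """Total N for a two-sided two-group comparison, allocation
    ratio r = n2/n1, effect size f = (mu2 - mu1)/sigma."""
    z = NormalDist().inv_cdf
    return (1 + ratio) ** 2 / ratio * ((z(1 - alpha / 2) + z(power)) / f) ** 2

print(round(total_n(0.5, 1.0), 4))  # 168.1188, as in the table
print(round(total_n(0.5, 0.5), 4))  # 189.1336
```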

In the above code, consider the invocation of the plot command

plot(ceiling(n)~I(1/ratio),data = ar,xlim = c(1,10),ylim = c(0,550),
     xlab = "Ratio of sample sizes", 
     ylab = "Total sample size",type = "l",xaxt = "n")
axis(1,at = 1:10)

The same output would be possible with the following (simpler) version:

with(ar, plot((1/ratio), ceiling(n), xlim = c(1,10),ylim = c(0,550),
     xlab = "Ratio of sample sizes", 
     ylab = "Total sample size",type = "l",xaxt = "n"))
axis(1,at = 1:10)

Note how the order of the x and y variables is reversed, and that the data frame can no longer be specified inside the plot command itself (hence the use of \(\verb|with|\)).

Survival analysis

The first thing we should do is get an idea of the hazard ratio of the new treatment relative to the standard treatment. To translate a statement about median survival, or about the proportion of patients surviving at some landmark time point (which makes sense to everyone), into a statement about the hazard of failure (which makes sense to almost no one), we must make some simplifying assumptions about the survival distribution.

  1. Translating medians to hazards and hazard ratios

We begin by assuming that the time to death follows the exponential distribution in both groups. Denote by \(m_0\) and \(m_{1}\) the median survival times for the standard treatment and the new treatment, respectively. Using some (very simple) calculus, we can solve the following equation with respect to the hazard \(\lambda_{0}\):

\[ \int_{0}^{m_{0}}\lambda_0e^{-\lambda_0 t}dt = 0.5 \] where \(\lambda_0\) and \(m_0\) are, respectively, the hazard and the median survival time (i.e., the time point by which 50% of the subjects have experienced the event of interest) under the null hypothesis. Running through the steps we get \(\lambda_0 = - \dfrac{\log(0.5)}{m_0}\) and, in a similar manner, \(\lambda_1 = - \dfrac{\log(0.5)}{m_1}\), where \(\lambda_1\) and \(m_1\) are the hazard of failure and the median survival time under the alternative hypothesis. The hazard ratio is then easily obtained. We point out another nice property of the exponential distribution: the ratio of hazards equals the inverse ratio of medians, i.e., \[ HR=\frac{\lambda_1}{\lambda_0}=\frac{m_0}{m_1} \] Note that we have not specified a unit of time in any of the above calculations. You should be very careful to maintain the same unit of time (months, years, etc.) throughout your calculations. For example, the hazard ratio when the median survival times are \(m_0=15\) and \(m_1=25\) months (say) is

# Median survival of standard therapy
m0 = 15
lambda0 = - log(0.5)/m0

# Median survival of experimental therapy
m1 = 25
lambda1 = - log(0.5)/m1

# Based on the properties of the exponential distribution
c(m0/m1,lambda1/lambda0)
## [1] 0.6 0.6

  2. Sample size when accrual time and total follow-up duration are known

You always have to keep in mind that when dealing with survival analysis, you are concerned about events, not patients. Thus, to obtain the desired level of power in a survival study, a certain number of events must occur, irrespective of the number of patients involved. A complicating factor is that different combinations of number of patients and length of follow-up will result in the desired number of events. We can use the function \(\verb|nSurv|\) to help us investigate such combinations. First we give some definitions:

  1. Accrual time is the time it takes to enroll all the patients in the study

  2. Follow-up time is the duration of the follow-up period after the last patient has been enrolled

  3. Study duration is the sum of (1)+(2).

The \(\verb|nSurv|\) function has plenty of options. You are strongly advised to examine the options in depth.
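Before invoking \(\verb|nSurv|\), a useful back-of-the-envelope companion is Schoenfeld's formula for the required number of events, \(d = 4(z_{1-\alpha/2}+z_{1-\beta})^2/(\log HR)^2\) under equal allocation. It is an approximation (not the Lachin and Foulkes method that \(\verb|nSurv|\) implements), so it lands close to, but not exactly on, the expected event counts reported below:

```python
from math import log, ceil
from statistics import NormalDist

def schoenfeld_events(hr, alpha=0.05, power=0.80):
    """Approximate number of events required (Schoenfeld,
    equal allocation, two-sided test)."""
    z = NormalDist().inv_cdf
    return 4 * ((z(1 - alpha / 2) + z(power)) / log(hr)) ** 2

print(ceil(schoenfeld_events(0.6)))  # 121 (nSurv reports ~119 expected events)
```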

# Assume 55 months of accrual; 10 months of follow up afterwards
# The function solves
nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
       R = 55, minfup = 10,T = 65,
       ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
## Fixed design, two-arm trial with time-to-event
## outcome (Lachin and Foulkes, 1986).
## Solving for:  Accrual rate 
## Hazard ratio                  H1/H0=0.6/1
## Study duration:                   T=65
## Accrual duration:                   55
## Min. end-of-study follow-up: minfup=10
## Expected events (total, H1):        119.2209
## Expected sample size (total):       172.4299
## Accrual rates:
##      Stratum 1
## 0-55    3.1351
## Control event rates (H1):
##       Stratum 1
## 0-Inf    0.0462
## Censoring rates:
##       Stratum 1
## 0-Inf         0
## Power:                 100*(1-beta)=80%
## Type I error (1-sided):   100*alpha=5%
## Equal randomization:          ratio=1

In this example, the sample size needed to obtain the desired power is \(n=173\), whereas the accrual rate should be 3.13 patients a month. Note that \(3.1351\times 55= 172.4305\) so that the sample size will be accrued in the specified accrual period of 55 months.

  3. Sample size when the total follow-up time is known

Since we allow for the possibility that some patients may be lost to follow-up, we implicitly inflate the variance of our estimates, so a larger sample size is required. Indeed, we see that we now need 197 patients if the rate of loss to follow-up is \(\eta=0.009\) per month (matching the time unit of the hazards; see below):

# Assuming dropout rates
nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
       R = 55, minfup = 10,T = 65,eta = 0.009,etaE = 0.009,
       ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
## Fixed design, two-arm trial with time-to-event
## outcome (Lachin and Foulkes, 1986).
## Solving for:  Accrual rate 
## Hazard ratio                  H1/H0=0.6/1
## Study duration:                   T=65
## Accrual duration:                   55
## Min. end-of-study follow-up: minfup=10
## Expected events (total, H1):        119.3854
## Expected sample size (total):       196.3055
## Accrual rates:
##      Stratum 1
## 0-55    3.5692
## Control event rates (H1):
##       Stratum 1
## 0-Inf    0.0462
## Censoring rates:
##       Stratum 1
## 0-Inf     0.009
## Power:                 100*(1-beta)=80%
## Type I error (1-sided):   100*alpha=5%
## Equal randomization:          ratio=1

Now we increase the follow-up period to 100 months. Patients are thus under study for a longer period of time, so an event is observed in a progressively higher proportion of them. Since the key quantity determining power in survival analysis is the number of events, we do not have to enroll as many patients: almost everybody will develop the event as follow-up is extended.
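Under the exponential model this intuition can be quantified. For a patient enrolled uniformly over an accrual period \(R\) in a study of total duration \(T\), the probability that an event is observed is \(1 - e^{-\lambda T}(e^{\lambda R}-1)/(\lambda R)\). A sketch for the design below (\(R=55\), \(T=155\), medians 15 and 25 months):

```python
from math import exp, log

def p_event(lam, accrual, total):
    """P(event observed by study end) for exponential hazard lam,
    uniform enrollment over [0, accrual], study duration total."""
    a = lam * accrual
    return 1 - exp(-lam * total) * (exp(a) - 1) / a

lam0, lam1 = log(2) / 15, log(2) / 25          # hazards from the medians
p = (p_event(lam0, 55, 155) + p_event(lam1, 55, 155)) / 2
print(round(p, 3))  # 0.982: nearly every enrolled patient contributes an event
```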

# Assuming a larger follow-up time
nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
       R = 55, minfup = 100,T = 155,
       ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
## Fixed design, two-arm trial with time-to-event
## outcome (Lachin and Foulkes, 1986).
## Solving for:  Accrual rate 
## Hazard ratio                  H1/H0=0.6/1
## Study duration:                   T=155
## Accrual duration:                   55
## Min. end-of-study follow-up: minfup=100
## Expected events (total, H1):        119.7092
## Expected sample size (total):       121.8807
## Accrual rates:
##      Stratum 1
## 0-55     2.216
## Control event rates (H1):
##       Stratum 1
## 0-Inf    0.0462
## Censoring rates:
##       Stratum 1
## 0-Inf         0
## Power:                 100*(1-beta)=80%
## Type I error (1-sided):   100*alpha=5%
## Equal randomization:          ratio=1

The expected sample size is now much smaller (\(n=122\)). It is in fact very close to the expected number of events, which means that almost all patients are going to experience the event within that period of time.

  4. Sample size when accrual rate and accrual duration are known

We specify the enrollment rate through \(\verb|gamma = 5|\) and the accrual duration through \(\verb|R = 35|\). The following code produces a design under these criteria:

# Survival study design with known accrual rate and accrual duration
nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
       gamma = 5, R = 35,
       ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
## Fixed design, two-arm trial with time-to-event
## outcome (Lachin and Foulkes, 1986).
## Solving for:  Follow-up duration 
## Hazard ratio                  H1/H0=0.6/1
## Study duration:                   T=51.4813
## Accrual duration:                   35
## Min. end-of-study follow-up: minfup=16.4813
## Expected events (total, H1):        119.2913
## Expected sample size (total):       175
## Accrual rates:
##      Stratum 1
## 0-35         5
## Control event rates (H1):
##       Stratum 1
## 0-Inf    0.0462
## Censoring rates:
##       Stratum 1
## 0-Inf         0
## Power:                 100*(1-beta)=80%
## Type I error (1-sided):   100*alpha=5%
## Equal randomization:          ratio=1

Note that the expected number of events is virtually the same as in the previous cases. We need \(n=175\) subjects, and the additional follow-up time required is approximately 16.5 months.

In the following graph we see that, for a given recruitment duration of 5 months, the sample size decreases as the follow-up period increases, eventually converging to the expected number of events. The results are presented in Figure 2.

# Increase follow up from 5 up to 100 months
fp = seq(5,100,by=0.5)
n = rep(NA,length(fp))
ar = rep(NA,length(fp))

for (i in 1:length(fp))
    {
     x = nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
               R = 5, minfup = fp[i], T = 5 + fp[i],
               ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
     n[i] = x$n
     ar[i] = x$gamma
    }
plot(fp,n,ylim = c(80,510),yaxt = "n",ylab = "Sample size",
     xlab = "Follow up (months)",type = "l")
axis(2,at = c(100,120,200,300,400,500),las = 1)
abline(h = 119,col = "red")
text(18,130,"Expected events")

Figure 2. Sample size as the length of follow-up is extended.