In this lab, we will review sample size calculation, which is essential for designing a clinical trial.
Using standard notation, we aim to satisfy the condition:
\Pr\left(\left|\bar{X}-\mu\right|<d\right)\geq 1-\alpha
To analyze this, we standardize the distribution by dividing by the standard error \sigma/\sqrt{n}, resulting in
\Pr\left (|Z|=\left |\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right |<\frac{d}{\sigma/\sqrt{n}}=z_{1-\alpha/2}\right )\geq 1-\alpha
This implies that:
n\geq\left(\dfrac{\sigma z_{1-\alpha/2}}{d}\right)^2
where z_{1-\alpha/2} is the (1-\alpha/2) quantile of the standard normal distribution (i.e., z_{1-\alpha/2}=\Phi^{-1}(1-\alpha/2)).
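As a quick numerical sketch of this formula (the values \sigma=10, d=2 and \alpha=0.05 are arbitrary illustrative choices):

```r
# Smallest n so that the estimate of the mean falls within d = 2 units
# of the true mean with 95% confidence, assuming sigma = 10
sigma <- 10
d <- 2
alpha <- 0.05
ceiling((sigma * qnorm(1 - alpha/2) / d)^2)
## [1] 97
```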
For example, if we want to estimate a mean to within half a standard deviation, i.e., d=0.5\sigma, we derive a formula that is applicable even without knowledge of \sigma:
n\geq \left(2z_{1-\alpha/2}\right)^2
In R, this can be coded as follows:
# Find the smallest sample size required to ensure that the error of estimation of the mean will be within d of the true mean
(2 * qnorm(0.975))^2
## [1] 15.36584
Since partial units (human subjects in our context) cannot be obtained, we round up to the next integer:
# To round n up to the next highest integer
ceiling((2*qnorm(0.975))^2)
## [1] 16
Among other functionalities, the library pwr provides useful functions for determining the required sample size.
As an example, consider testing the one-sided null hypothesis H_0: \mu\leq \mu_0=3 against the alternative hypothesis H_1: \mu=\mu_1=4, assuming that the standard deviation in both cases is \sigma=2. We will use the function pwr.t.test from the pwr package for this purpose. Please refer to the help file of the function for more details.
To properly invoke this function, we must specify the effect size, defined as the difference between the means under the null and alternative hypotheses divided by the common standard deviation:
f=\frac{\mu_1-\mu_0}{\sigma}
The pwr.t.test function accepts this effect size as its d argument:
# Sample size calculation for a Normal mean
# H0: mu <= 3 vs H1: mu = 4, a = 0.05, beta = 0.1 and sigma = 2
library(pwr)
pwr.t.test(d = (4-3)/2, sig.level = 0.05, power = 0.90,
alternative = "greater", type = "one.sample")
##
## One-sample t test power calculation
##
## n = 35.65269
## d = 0.5
## sig.level = 0.05
## power = 0.9
## alternative = greater
If we wish to calculate the power of a test instead, we can specify the sample size and leave power unspecified:
# To get the actual power of the test
pwr.t.test(d = (4-3)/2, sig.level = 0.05, n = 36,
alternative = "greater", type = "one.sample")
##
## One-sample t test power calculation
##
## n = 36
## d = 0.5
## sig.level = 0.05
## power = 0.9025746
## alternative = greater
Note that the power of the test may not exactly equal 0.90 due to rounding related to sample size.
A more efficient approach involves saving the sample size from the previous command as follows:
pwr.output <- pwr.t.test(d = (4-3)/2, sig.level = 0.05, power = 0.90,
alternative = "greater", type = "one.sample")
# Examine contents of pwr.output object
summary(pwr.output)
## Length Class Mode
## n 1 -none- numeric
## d 1 -none- numeric
## sig.level 1 -none- numeric
## power 1 -none- numeric
## alternative 1 -none- character
## note 0 -none- NULL
## method 1 -none- character
# Save sample size into n
n.ss = ceiling(pwr.output$n)
# Invoke pwr.t.test again to obtain power without concern for n
pwr.t.test(d = (4-3)/2, sig.level = 0.05, n = n.ss,
alternative = "greater", type = "one.sample")
##
## One-sample t test power calculation
##
## n = 36
## d = 0.5
## sig.level = 0.05
## power = 0.9025746
## alternative = greater
We aim to test the null hypothesis H_0:p\leq0.15 against the alternative
H_1:p=0.40, setting \alpha=0.10 and \beta=0.20. The library clinfun includes the function ph2single, which provides sample sizes based on the exact binomial distribution for the number of responders. Because several designs may satisfy the \alpha level and power requirements, we can inspect, for example, 5 of them by using the nsoln argument.
# Revisiting the Phase II situation with H0:p = 0.15
# versus the alternative HA:p = 0.40,
# alpha = 0.1 and beta = 0.20
# Using exact binomial test
library(clinfun)
ph2single(pu = 0.15, pa = 0.40, ep1 = 0.10, ep2 = 0.20, nsoln = 5)
## n r Type I error Type II error
## 1 16 4 0.07905130 0.16656738
## 2 17 4 0.09871000 0.12599913
## 3 19 5 0.05369611 0.16292248
## 4 20 5 0.06730797 0.12559897
## 5 21 5 0.08273475 0.09574016
All designs meet our criteria; however, we select the one with the smallest sample size (i.e., n=16). Recall that we identified this design earlier in our discussion.
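As a sanity check, we can reproduce the operating characteristics of the n=16 design with base R alone; here r=4 means we reject H_0 when more than 4 responses are observed:

```r
# Verify the error rates of the n = 16, r = 4 design
# Reject H0: p <= 0.15 when the number of responders exceeds r = 4
n <- 16
r <- 4
type1 <- 1 - pbinom(r, n, 0.15)  # P(reject | p = 0.15)
type2 <- pbinom(r, n, 0.40)      # P(fail to reject | p = 0.40)
c(type1, type2)
## [1] 0.07905130 0.16656738
```

These match the first row of the ph2single output above.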
Using the function pwr.p.test, we can find the sample size based on the arcsine transformation, which normalizes the distribution of proportions so that the normal approximation remains appropriate even for small p; the normal approximation for the sampling distribution of an estimated proportion is poor when \hat{p}\rightarrow 0 or \hat{p}\rightarrow 1.
# Using arcsine transformation of proportion to normalize distribution of test statistic
pwr.p.test(h = ES.h(0.40, 0.15), sig.level = 0.10,
power = 0.80, alternative = "greater")
##
## proportion power calculation for binomial distribution (arcsine transformation)
##
## h = 0.5740396
## n = 13.68003
## sig.level = 0.1
## power = 0.8
## alternative = greater
Note that we utilized the ES.h utility function to calculate the effect size for proportions; this results in an estimated sample size of n=14. The two methods yield closer results as the sample size grows; however, ph2single is exact while pwr.p.test relies on the normal approximation.
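ES.h computes Cohen's effect size h for proportions, i.e., the difference of the arcsine-transformed proportions; a quick sketch verifying the h value reported in the output above:

```r
# Cohen's h: difference of arcsine-transformed proportions,
# phi = 2 * arcsin(sqrt(p))
h <- 2 * asin(sqrt(0.40)) - 2 * asin(sqrt(0.15))
h
## [1] 0.5740396
```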
As discussed in lectures, testing for two means involves using the sample size formula:
n=\frac{2\left (z_{1-\alpha}-z_\beta \right )^2 \sigma^2}{(\mu_2-\mu_1)^2}, where n is the sample size for each group (assuming equal sample sizes in both groups). This can also be expressed in terms of the effect size:
f=\frac{\mu_2-\mu_1}{\sigma}
as follows:
n= \frac{2\left (z_{1-\alpha}-z_\beta \right )^2}{f^2}. The following calculations cycle through various effect sizes and produce a table of total sample sizes (both groups combined):
# Verify calculation in Table 11.11 from textbook (Piantadosi,2005, pp.280)
table1 <- data.frame(ES = seq(0.25,1.50,by = 0.25),
beta = 0.10, "Alpha = 0.05" = NA, "Alpha = 0.10" = NA)
names(table1)[3:4] = c("Alpha = 0.05", "Alpha = 0.10")
for (i in 1:length(table1[,1])) {
table1[i,3] <- ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.05,
power = 0.90, alternative = "two.sided",
type = "two.sample")$n)
table1[i,4] <- ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.10,
power = 0.90, alternative = "two.sided",
type = "two.sample")$n)
}
table1
## ES beta Alpha = 0.05 Alpha = 0.10
## 1 0.25 0.1 675 550
## 2 0.50 0.1 171 139
## 3 0.75 0.1 77 63
## 4 1.00 0.1 45 36
## 5 1.25 0.1 29 24
## 6 1.50 0.1 21 17
Now we produce similar results with \beta=0.20:
table2 <- data.frame(ES = seq(0.25,1.50,by = 0.25),
beta = 0.20, "Alpha = 0.05" = NA, "Alpha = 0.10" = NA)
names(table2)[3:4] = c("Alpha = 0.05", "Alpha = 0.10")
for (i in 1:length(table2[,1])) {
table2[i,3] = ceiling(2*pwr.t.test(d = table2[i,1], sig.level = 0.05,
power = 0.80, alternative = "two.sided",
type = "two.sample")$n)
table2[i,4] = ceiling(2*pwr.t.test(d = table2[i,1], sig.level = 0.10,
power = 0.80, alternative = "two.sided",
type = "two.sample")$n)
}
table2
## ES beta Alpha = 0.05 Alpha = 0.10
## 1 0.25 0.2 505 398
## 2 0.50 0.2 128 101
## 3 0.75 0.2 58 46
## 4 1.00 0.2 34 27
## 5 1.25 0.2 23 18
## 6 1.50 0.2 17 13
Note that we utilized Student’s t-distribution instead of the normal distribution in computing the sample sizes.
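As a quick cross-check, the formula above can be evaluated directly under the normal approximation (replacing z_{1-\alpha} with z_{1-\alpha/2} for the two-sided test); for f=0.5, \alpha=0.05 and \beta=0.10 it gives a total slightly below the t-based 171 in the first table, as expected:

```r
# Total sample size (both groups) from the normal-approximation formula
# f = 0.5, two-sided alpha = 0.05, beta = 0.10
f <- 0.5
n_group <- 2 * (qnorm(1 - 0.05/2) + qnorm(1 - 0.10))^2 / f^2
ceiling(2 * n_group)
## [1] 169
```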
Next, let’s unpack the previous code snippets step-by-step:
* First, we created a data.frame containing, as its first column, a sequence of effect sizes ranging from f=0.25 to f=1.50 in increments of 0.25. The remaining columns were initialized with missing values (NA).
table1 <- data.frame(ES = seq(0.25,1.50,by = 0.25),
beta = 0.10, "Alpha = 0.05" = NA, "Alpha = 0.10" = NA)
* The command below assigns names to columns three and four of our table.
names(table1)[3:4] <- c("Alpha = 0.05", "Alpha = 0.10")
* We then loop through all the effect sizes in column one, producing sample sizes under two-sided alpha levels of 5% and 10%.
for (i in 1:length(table1[,1])) {
table1[i,3] <- ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.05,
power = 0.90, alternative = "two.sided",
type = "two.sample")$n)
table1[i,4] <- ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.10,
power = 0.90, alternative = "two.sided",
type = "two.sample")$n)
}
A slight modification occurs in the sample size calculation when the two groups have unequal sample sizes, differing by some factor r, r=1,2,\ldots. We utilize the function nNormal found in the library gsDesign. Please consult its help file for more information.
Assuming that the ratio of sample sizes between the two groups ranges from 1 (equal allocation) to 10, and that the effect size is f=0.5 (mean difference divided by the standard deviation), we obtain:
# Unequal group allocation (using normal quantiles)
library(gsDesign)
ar = data.frame(ratio = 1/1:10, n = NA)
for (i in 1:10) {
ar$n[i] = nNormal(delta1 = 0.5, delta0 = 0, sd = 1, sd2 = 1,
alpha = 0.05, beta = 0.10,
sided = 2, ratio = ar$ratio[i])
}
The resulting sample table is as follows:
library(knitr)
library(kableExtra)
kable_styling(kable(ar, "html"), bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| ratio | n |
|---|---|
| 1.0000000 | 168.1188 |
| 0.5000000 | 189.1336 |
| 0.3333333 | 224.1584 |
| 0.2500000 | 262.6856 |
| 0.2000000 | 302.6138 |
| 0.1666667 | 343.2425 |
| 0.1428571 | 384.2715 |
| 0.1250000 | 425.5506 |
| 0.1111111 | 466.9966 |
| 0.1000000 | 508.5593 |
We can also create a graph illustrating these findings:
plot(ceiling(n)~I(1/ratio), data = ar, xlim = c(1,10), ylim = c(0,550),
xlab = "Ratio of sample sizes",
ylab = "Total sample size", type = "l", xaxt = "n")
axis(1, at = 1:10)
Effect of ratio of sample sizes on the total sample size.
It is evident that as the allocation deviates from balance (r=1), the total required sample size increases significantly; thus, unless compelling reasons exist (e.g., enhancing participant comfort during randomized trials), equal group allocation is recommended.
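The pattern in the table can be sketched from the standard inflation factor for unequal allocation: relative to equal allocation, the total sample size scales by (1+r)^2/(4r). A sketch, taking the baseline 168.1188 from the first row of the table above:

```r
# Inflation of the total sample size under allocation ratio r,
# relative to equal allocation (r = 1)
r <- 1/1:10
inflation <- (1 + r)^2 / (4 * r)
round(168.1188 * inflation, 1)
## [1] 168.1 189.1 224.2 262.7 302.6 343.2 384.3 425.6 467.0 508.6
```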
To assess the hazard ratio between a new treatment and the standard treatment, we first need to translate median survival information (or proportions of patients surviving) into statements about hazards, an inherently abstract concept that requires simplifying assumptions about the survival distributions.
Assuming that the time until death follows an exponential distribution in both groups, and denoting the median survival times by m_0 and m_1, we can derive the hazards by integrating the density up to the point where half of the subjects have experienced the event:
\int_{0}^{m_{0}}\lambda_0e^{-\lambda_0 t}dt = 0.5
where \lambda_0 and m_0 are the hazard and the median survival time (i.e., the time point where 50% of the subjects have experienced the event of interest) under the null hypothesis, respectively.
Running through the steps we get \lambda_0 = - \dfrac{\log(0.5)}{m_0} and, similarly, \lambda_1 = - \dfrac{\log(0.5)}{m_1}, where \lambda_1 and m_1 are the hazard of failure and the median survival time under the alternative hypothesis, respectively.
Consequently, the hazard ratio follows easily from the property that, for exponential distributions, the ratio of hazards equals the inverse ratio of the medians: HR=\frac{\lambda_1}{\lambda_0}=\frac{m_0}{m_1}
Note that we have not specified a unit of time in any of the above calculations. You should be very careful to maintain the same unit of time (months, years, etc.) throughout your calculations.
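To see why the unit matters, note that exponential hazards scale with the time unit; a sketch for a median survival of 15 months, expressed in months and in years:

```r
# For the exponential distribution, hazards scale with the time unit:
# the monthly hazard times 12 equals the yearly hazard
lambda_month <- log(2) / 15        # median survival: 15 months
lambda_year  <- log(2) / (15/12)   # same median expressed in years
all.equal(lambda_month * 12, lambda_year)
## [1] TRUE
```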
For example, the hazard ratio when the median survival times are m_0=15 and m_1=25 months (say) is:
# Median survival of standard therapy
m0 = 15
lambda0 = - log(0.5)/m0
# Median survival of experimental therapy
m1 = 25
lambda1 = - log(0.5)/m1
# Based on the properties of the exponential distribution
c(m0/m1, lambda1/lambda0)
## [1] 0.6 0.6
When conducting survival analysis, the focus should be on events rather than patients; obtaining the desired power requires a specific number of events, irrespective of the number of patients involved. This is a challenge, because various combinations of patient numbers and follow-up durations can achieve the target event count.
Utilizing the function nSurv, we explore such combinations while defining some key terms:
1. Accrual time: the duration required to enroll all study patients.
2. Follow-up time: the period from the enrollment of the last patient until the conclusion of the study.
3. Study duration: the sum of the two: (1)+(2).
The nSurv function offers numerous options; a thorough examination of its help file is strongly recommended.
# Assume 55 months of accrual; 10 months of follow up afterwards
# The function solves
nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
R = 55, minfup = 10, T = 65,
ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
## Fixed design, two-arm trial with time-to-event
## outcome (Lachin and Foulkes, 1986).
## Solving for: Accrual rate
## Hazard ratio H1/H0=0.6/1
## Study duration: T=65
## Accrual duration: 55
## Min. end-of-study follow-up: minfup=10
## Expected events (total, H1): 119.2209
## Expected sample size (total): 172.4299
## Accrual rates:
## Stratum 1
## 0-55 3.1351
## Control event rates (H1):
## Stratum 1
## 0-Inf 0.0462
## Censoring rates:
## Stratum 1
## 0-Inf 0
## Power: 100*(1-beta)=80%
## Type I error (1-sided): 100*alpha=5%
## Equal randomization: ratio=1
In this example:
* lambdaC is the event rate in the control group
* hr is the hazard ratio under the alternative hypothesis
* hr0 is the hazard ratio under the null hypothesis
* R is the accrual (recruitment) duration
* minfup is the follow-up of the last patient enrolled
* T is the maximum study duration
The expected total sample size needed to achieve the desired power is n=173 (rounding up 172.43), with an accrual rate of approximately 3.13 patients per month; note that 3.1351\times 55= 172.43, confirming that accrual is completed within the specified period.
If we anticipate losing patients during follow-up, the variance is inflated and a larger sample is needed; approximately 197 patients are now required, assuming a drop-out hazard of \eta=0.009 (expressed in the same time unit as the rest of the calculation, i.e., per month):
# Assuming dropout rates
nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
R = 55, minfup = 10, T = 65, eta = 0.009, etaE = 0.009,
ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
## Fixed design, two-arm trial with time-to-event
## outcome (Lachin and Foulkes, 1986).
## Solving for: Accrual rate
## Hazard ratio H1/H0=0.6/1
## Study duration: T=65
## Accrual duration: 55
## Min. end-of-study follow-up: minfup=10
## Expected events (total, H1): 119.3854
## Expected sample size (total): 196.3055
## Accrual rates:
## Stratum 1
## 0-55 3.5692
## Control event rates (H1):
## Stratum 1
## 0-Inf 0.0462
## Censoring rates:
## Stratum 1
## 0-Inf 0.009
## Power: 100*(1-beta)=80%
## Type I error (1-sided): 100*alpha=5%
## Equal randomization: ratio=1
Extending the minimum follow-up period to 100 months substantially increases each patient's likelihood of experiencing the event during the study, which reduces the required sample size:
# Assuming a larger follow-up time
nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
R = 55, minfup = 100, T = 155,
ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
## Fixed design, two-arm trial with time-to-event
## outcome (Lachin and Foulkes, 1986).
## Solving for: Accrual rate
## Hazard ratio H1/H0=0.6/1
## Study duration: T=155
## Accrual duration: 55
## Min. end-of-study follow-up: minfup=100
## Expected events (total, H1): 119.7092
## Expected sample size (total): 121.8807
## Accrual rates:
## Stratum 1
## 0-55 2.216
## Control event rates (H1):
## Stratum 1
## 0-Inf 0.0462
## Censoring rates:
## Stratum 1
## 0-Inf 0
## Power: 100*(1-beta)=80%
## Type I error (1-sided): 100*alpha=5%
## Equal randomization: ratio=1
The expected sample size is now much smaller (n=122). It is in fact very close to the expected number of events, which means that almost all patients are going to experience the event within that period of time.
Setting the enrollment rate through the parameter gamma to 5 patients per month, along with an accrual duration of R = 35 months, produces the following design under these criteria:
# Survival study design with known accrual rate and accrual duration
nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
gamma = 5, R = 35,
ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
## Fixed design, two-arm trial with time-to-event
## outcome (Lachin and Foulkes, 1986).
## Solving for: Accrual rate
## Hazard ratio H1/H0=0.6/1
## Study duration: T=18
## Accrual duration: 12
## Min. end-of-study follow-up: minfup=6
## Expected events (total, H1): 120.8216
## Expected sample size (total): 346.2062
## Accrual rates:
## Stratum 1
## 0-12 28.8505
## Control event rates (H1):
## Stratum 1
## 0-Inf 0.0462
## Censoring rates:
## Stratum 1
## 0-Inf 0
## Power: 100*(1-beta)=80%
## Type I error (1-sided): 100*alpha=5%
## Equal randomization: ratio=1
Note that the expected number of events is virtually the same as in the previous cases. We need n=175 subjects and the additional follow-up time required is approximately 16.5 months.
On the following graph, we see that for a given recruitment duration of 5 months, the sample size decreases as the follow-up period increases, eventually converging to the expected number of events:
# Increase follow up from 5 up to 100 months
fp = seq(5, 100, by = 0.5)
n = rep(NA, length(fp))
ar = rep(NA, length(fp))
for (i in 1:length(fp)) {
x = nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
R = 5, minfup = fp[i], T = 5 + fp[i],
ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
n[i] = x$n
ar[i] = x$gamma
}
plot(fp, n, ylim = c(80,510), yaxt = "n", ylab = "Sample size",
xlab = "Follow up (months)", type = "l")
axis(2, at = c(100,120,200,300,400,500), las = 1)
abline(h = 119, col = "red")
text(18, 130, "Expected events")
Sample size as the length of follow-up is extended.