In this lab, we will review sample size calculation, which is essential for designing a clinical trial.
Using standard notation, we aim to satisfy the condition:
\Pr\left(\left|\bar{X}-\mu\right|<d\right)\geq 1-\alpha
To analyze this, we standardize the distribution by dividing by the standard error \sigma/\sqrt{n}, resulting in
\Pr\left (|Z|=\left |\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right |<\frac{d}{\sigma/\sqrt{n}}=z_{1-\alpha/2}\right )\geq 1-\alpha
This implies that:
n\geq\left(\dfrac{\sigma z_{1-\alpha/2}}{d}\right)^2
where z_{1-\alpha/2} is the (1-\alpha/2) quantile of the standard normal distribution (i.e., z_{1-\alpha/2}=\Phi^{-1}(1-\alpha/2)).
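As a quick numerical sketch of this formula (the values \sigma=10, d=2 and \alpha=0.05 are arbitrary illustrative choices):

```r
# Smallest n so that the estimate of the mean falls within d = 2 units
# of the true mean with 95% confidence, assuming sigma = 10
sigma <- 10
d <- 2
alpha <- 0.05
ceiling((sigma * qnorm(1 - alpha/2) / d)^2)
## [1] 97
```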
For example, if we want to estimate a mean to within half a standard deviation, i.e., d=0.5\sigma, we derive a formula that is applicable even without knowledge of \sigma:
n\geq \left(2z_{1-\alpha/2}\right)^2
In R, this can be coded as follows:
# Find the smallest sample size required to ensure that the error of estimation of the mean will be within d of the true mean
(2 * qnorm(0.975))^2
## [1] 15.36584
Since partial units (human subjects in our context) cannot be obtained, we round up to the next integer:
# To round n up to the next highest integer
ceiling((2*qnorm(0.975))^2)
## [1] 16
Among other functionalities, the library pwr provides useful functions for determining the required sample size.
As an example, consider testing the one-sided null hypothesis H_0: \mu\leq \mu_0=3 against the alternative hypothesis H_1: \mu=\mu_1=4, assuming that the standard deviation in both cases is \sigma=2. We will use the function pwr.t.test from the pwr package for this purpose. Please refer to the help file of the function for more details.
To properly invoke this function, we must specify the effect size, defined as the difference between the means under the null and alternative hypotheses divided by the common standard deviation:
f=\frac{\mu_1-\mu_0}{\sigma}
The pwr.t.test function accepts this effect size as its d argument:
# Sample size calculation for a Normal mean
# H0: mu <= 3 vs H1: mu = 4, a = 0.05, beta = 0.1 and sigma = 2
library(pwr)
pwr.t.test(d = (4-3)/2, sig.level = 0.05, power = 0.90,
alternative = "greater", type = "one.sample")
##
## One-sample t test power calculation
##
## n = 35.65269
## d = 0.5
## sig.level = 0.05
## power = 0.9
## alternative = greater
If we wish to calculate the power of a test instead, we can specify the sample size and leave power unspecified:
# To get the actual power of the test
pwr.t.test(d = (4-3)/2, sig.level = 0.05, n = 36,
alternative = "greater", type = "one.sample")
##
## One-sample t test power calculation
##
## n = 36
## d = 0.5
## sig.level = 0.05
## power = 0.9025746
## alternative = greater
Note that the power of the test may not exactly equal 0.90 due to rounding related to sample size.
A more efficient approach involves saving the sample size from the previous command as follows:
pwr.output <- pwr.t.test(d = (4-3)/2, sig.level = 0.05, power = 0.90,
alternative = "greater", type = "one.sample")
# Examine contents of pwr.output object
summary(pwr.output)
## Length Class Mode
## n 1 -none- numeric
## d 1 -none- numeric
## sig.level 1 -none- numeric
## power 1 -none- numeric
## alternative 1 -none- character
## note 0 -none- NULL
## method 1 -none- character
# Save sample size into n
n.ss = ceiling(pwr.output$n)
# Invoke pwr.t.test again to obtain power without concern for n
pwr.t.test(d = (4-3)/2, sig.level = 0.05, n = n.ss,
alternative = "greater", type = "one.sample")
##
## One-sample t test power calculation
##
## n = 36
## d = 0.5
## sig.level = 0.05
## power = 0.9025746
## alternative = greater
We aim to test the null hypothesis H_0:p\leq0.15 against the alternative
H_1:p=0.40, setting \alpha=0.10 and \beta=0.20. The library clinfun includes the function ph2single, which provides sample sizes based on the exact binomial distribution for the number of responders. Because several designs may satisfy the \alpha level and power requirements, we can inspect, for example, 5 of them by using the nsoln argument.
# Revisiting the Phase II situation with H0:p = 0.15
# versus the alternative HA:p = 0.40,
# alpha = 0.1 and beta = 0.20
# Using exact binomial test
library(clinfun)
ph2single(pu = 0.15, pa = 0.40, ep1 = 0.10, ep2 = 0.20, nsoln = 5)
## n r Type I error Type II error
## 1 16 4 0.07905130 0.16656738
## 2 17 4 0.09871000 0.12599913
## 3 19 5 0.05369611 0.16292248
## 4 20 5 0.06730797 0.12559897
## 5 21 5 0.08273475 0.09574016
All designs meet our criteria; however, we select the one with the smallest sample size (i.e., n=16). Recall that we identified this design earlier in our discussion.
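As a sanity check, we can reproduce the operating characteristics of the n=16 design with base R alone; here r=4 means we reject H_0 when more than 4 responses are observed:

```r
# Verify the error rates of the n = 16, r = 4 design
# Reject H0: p <= 0.15 when the number of responders exceeds r = 4
n <- 16
r <- 4
type1 <- 1 - pbinom(r, n, 0.15)  # P(reject | p = 0.15)
type2 <- pbinom(r, n, 0.40)      # P(fail to reject | p = 0.40)
c(type1, type2)
## [1] 0.07905130 0.16656738
```

These match the first row of the ph2single output above.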
Using the function pwr.p.test, we can find the sample size based on the arcsine transformation, which normalizes the distribution of proportions so that the normal approximation remains appropriate even for small p; the normal approximation for the sampling distribution of an estimated proportion is poor when \hat{p}\rightarrow 0 or \hat{p}\rightarrow 1.
# Using arcsine transformation of proportion to normalize distribution of test statistic
pwr.p.test(h = ES.h(0.40, 0.15), sig.level = 0.10,
power = 0.80, alternative = "greater")
##
## proportion power calculation for binomial distribution (arcsine transformation)
##
## h = 0.5740396
## n = 13.68003
## sig.level = 0.1
## power = 0.8
## alternative = greater
Note that we utilized the ES.h utility function to calculate the effect size for proportions; this results in an estimated sample size of n=14. The two methods yield closer results as the sample size grows; however, ph2single is exact while pwr.p.test relies on the normal approximation.
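ES.h computes Cohen's effect size h for proportions, i.e., the difference of the arcsine-transformed proportions; a quick sketch verifying the h value reported in the output above:

```r
# Cohen's h: difference of arcsine-transformed proportions,
# phi = 2 * arcsin(sqrt(p))
h <- 2 * asin(sqrt(0.40)) - 2 * asin(sqrt(0.15))
h
## [1] 0.5740396
```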
As discussed in lectures, testing for two means involves using the sample size formula:
n=\frac{2\left (z_{1-\alpha}-z_\beta \right )^2 \sigma^2}{(\mu_2-\mu_1)^2}, where n is the sample size for each group (assuming equal sample sizes in both groups). This can also be expressed in terms of the effect size:
f=\frac{\mu_2-\mu_1}{\sigma}
as follows:
n= \frac{2\left (z_{1-\alpha}-z_\beta \right )^2}{f^2}. The following calculations cycle through various effect sizes and produce a table of total sample sizes (both groups combined):
# Verify calculation in Table 11.11 from textbook (Piantadosi,2005, pp.280)
table1 <- data.frame(ES = seq(0.25,1.50,by = 0.25),
beta = 0.10, "Alpha = 0.05" = NA, "Alpha = 0.10" = NA)
names(table1)[3:4] = c("Alpha = 0.05", "Alpha = 0.10")
for (i in 1:length(table1[,1])) {
table1[i,3] <- ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.05,
power = 0.90, alternative = "two.sided",
type = "two.sample")$n)
table1[i,4] <- ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.10,
power = 0.90, alternative = "two.sided",
type = "two.sample")$n)
}
table1
## ES beta Alpha = 0.05 Alpha = 0.10
## 1 0.25 0.1 675 550
## 2 0.50 0.1 171 139
## 3 0.75 0.1 77 63
## 4 1.00 0.1 45 36
## 5 1.25 0.1 29 24
## 6 1.50 0.1 21 17
Now we produce similar results with \beta=0.20:
table2 <- data.frame(ES = seq(0.25,1.50,by = 0.25),
beta = 0.20, "Alpha = 0.05" = NA, "Alpha = 0.10" = NA)
names(table2)[3:4] = c("Alpha = 0.05", "Alpha = 0.10")
for (i in 1:length(table2[,1])) {
table2[i,3] = ceiling(2*pwr.t.test(d = table2[i,1], sig.level = 0.05,
power = 0.80, alternative = "two.sided",
type = "two.sample")$n)
table2[i,4] = ceiling(2*pwr.t.test(d = table2[i,1], sig.level = 0.10,
power = 0.80, alternative = "two.sided",
type = "two.sample")$n)
}
table2
## ES beta Alpha = 0.05 Alpha = 0.10
## 1 0.25 0.2 505 398
## 2 0.50 0.2 128 101
## 3 0.75 0.2 58 46
## 4 1.00 0.2 34 27
## 5 1.25 0.2 23 18
## 6 1.50 0.2 17 13
Note that we utilized Student’s t-distribution instead of the normal distribution in computing the sample sizes.
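As a quick cross-check, the formula above can be evaluated directly under the normal approximation (replacing z_{1-\alpha} with z_{1-\alpha/2} for the two-sided test); for f=0.5, \alpha=0.05 and \beta=0.10 it gives a total slightly below the t-based 171 in the first table, as expected:

```r
# Total sample size (both groups) from the normal-approximation formula
# f = 0.5, two-sided alpha = 0.05, beta = 0.10
f <- 0.5
n_group <- 2 * (qnorm(1 - 0.05/2) + qnorm(1 - 0.10))^2 / f^2
ceiling(2 * n_group)
## [1] 169
```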
Next, let’s unpack the previous code snippets step-by-step:
* First, we created a data.frame containing, as its first column, a sequence of effect sizes ranging from f=0.25 to f=1.50 in increments of 0.25. The remaining columns were initialized with missing values (NA).
table1 <- data.frame(ES = seq(0.25,1.50,by = 0.25),
beta = 0.10, "Alpha = 0.05" = NA, "Alpha = 0.10" = NA)
* The command below assigns names to columns three and four of our table.
names(table1)[3:4] <- c("Alpha = 0.05", "Alpha = 0.10")
* We then loop through all the effect sizes in column one, producing sample sizes under two-sided alpha levels of 5% and 10%.
for (i in 1:length(table1[,1])) {
table1[i,3] <- ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.05,
power = 0.90, alternative = "two.sided",
type = "two.sample")$n)
table1[i,4] <- ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.10,
power = 0.90, alternative = "two.sided",
type = "two.sample")$n)
}
A slight modification occurs in the sample size calculation when the two groups have unequal sample sizes, differing by some factor r, r=1,2,\ldots. We utilize the function nNormal found in the library gsDesign. Please consult its help file for more information.
Assuming that the ratio of sample sizes between the two groups ranges from 1 (equal allocation) to 10, and that the effect size is f=0.5 (mean difference divided by the standard deviation), we obtain:
# Unequal group allocation (using normal quantiles)
library(gsDesign)
ar = data.frame(ratio = 1/1:10, n = NA)
for (i in 1:10) {
ar$n[i] = nNormal(delta1 = 0.5, delta0 = 0, sd = 1, sd2 = 1,
alpha = 0.05, beta = 0.10,
sided = 2, ratio = ar$ratio[i])
}
The resulting sample table is as follows:
library(knitr)
library(kableExtra)
kable_styling(kable(ar, "html"), bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| ratio | n |
|---|---|
| 1.0000000 | 168.1188 |
| 0.5000000 | 189.1336 |
| 0.3333333 | 224.1584 |
| 0.2500000 | 262.6856 |
| 0.2000000 | 302.6138 |
| 0.1666667 | 343.2425 |
| 0.1428571 | 384.2715 |
| 0.1250000 | 425.5506 |
| 0.1111111 | 466.9966 |
| 0.1000000 | 508.5593 |
We can also create a graph illustrating these findings:
plot(ceiling(n)~I(1/ratio), data = ar, xlim = c(1,10), ylim = c(0,550),
xlab = "Ratio of sample sizes",
ylab = "Total sample size", type = "l", xaxt = "n")
axis(1, at = 1:10)
Effect of ratio of sample sizes on the total sample size.
It is evident that as the allocation deviates from balance (r=1), the total required sample size increases significantly; thus, unless compelling reasons exist (e.g., enhancing participant comfort during randomized trials), equal group allocation is recommended.
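The pattern in the table can be sketched from the standard inflation factor for unequal allocation: relative to equal allocation, the total sample size scales by (1+r)^2/(4r). A sketch, taking the baseline 168.1188 from the first row of the table above:

```r
# Inflation of the total sample size under allocation ratio r,
# relative to equal allocation (r = 1)
r <- 1/1:10
inflation <- (1 + r)^2 / (4 * r)
round(168.1188 * inflation, 1)
## [1] 168.1 189.1 224.2 262.7 302.6 343.2 384.3 425.6 467.0 508.6
```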
To assess the hazard ratio between a new treatment and the standard treatment, we first need to translate median survival information (or proportions of patients surviving) into statements about hazards, an inherently abstract concept that requires simplifying assumptions about the survival distributions.
Assuming that the time until death follows an exponential distribution in both groups, and denoting the median survival times by m_0 and m_1, we can derive the hazards by integrating the density up to the point where half of the subjects have experienced the event:
\int_{0}^{m_{0}}\lambda_0e^{-\lambda_0 t}dt = 0.5
where \lambda_0 and m_0 are the hazard and the median survival time (i.e., the time point where 50% of the subjects have experienced the event of interest) under the null hypothesis, respectively.
Running through the steps we get \lambda_0 = - \dfrac{\log(0.5)}{m_0} and, similarly, \lambda_1 = - \dfrac{\log(0.5)}{m_1}, where \lambda_1 and m_1 are the hazard of failure and the median survival time under the alternative hypothesis, respectively.
Consequently, the hazard ratio follows easily from the property that, for exponential distributions, the ratio of hazards equals the inverse ratio of the medians: HR=\frac{\lambda_1}{\lambda_0}=\frac{m_0}{m_1}
Note that we have not specified a unit of time in any of the above calculations. You should be very careful to maintain the same unit of time (months, years, etc.) throughout your calculations.
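To see why the unit matters, note that exponential hazards scale with the time unit; a sketch for a median survival of 15 months, expressed in months and in years:

```r
# For the exponential distribution, hazards scale with the time unit:
# the monthly hazard times 12 equals the yearly hazard
lambda_month <- log(2) / 15        # median survival: 15 months
lambda_year  <- log(2) / (15/12)   # same median expressed in years
all.equal(lambda_month * 12, lambda_year)
## [1] TRUE
```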
For example, the hazard ratio when the median survival times are m_0=15 and m_1=25 months (say) is:
# Median survival of standard therapy
m0 = 15
lambda0 = - log(0.5)/m0
# Median survival of experimental therapy
m1 = 25
lambda1 = - log(0.5)/m1
# Based on the properties of the exponential distribution
c(m0/m1, lambda1/lambda0)
## [1] 0.6 0.6
When conducting survival analysis, the focus should be on events rather than patients; obtaining the desired power requires a specific number of events, irrespective of the number of patients involved. This is a challenge, because various combinations of patient numbers and follow-up durations can achieve the target event count.
Utilizing the function nSurv, we explore such combinations while defining some key terms:
1. Accrual time: the duration required to enroll all study patients.
2. Follow-up time: the period from the enrollment of the last patient until the conclusion of the study.
3. Study duration: the sum of the two: (1)+(2).
The nSurv function offers numerous options; a thorough examination of its help file is strongly recommended.
# Assume 55 months of accrual; 10 months of follow up afterwards
# The function solves
nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
R = 55, minfup = 10, T = 65,
ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
## Fixed design, two-arm trial with time-to-event
## outcome (Lachin and Foulkes, 1986).
## Solving for: Accrual rate
## Hazard ratio H1/H0=0.6/1
## Study duration: T=65
## Accrual duration: 55
## Min. end-of-study follow-up: minfup=10
## Expected events (total, H1): 119.2209
## Expected sample size (total): 172.4299
## Accrual rates:
## Stratum 1
## 0-55 3.1351
## Control event rates (H1):
## Stratum 1
## 0-Inf 0.0462
## Censoring rates:
## Stratum 1
## 0-Inf 0
## Power: 100*(1-beta)=80%
## Type I error (1-sided): 100*alpha=5%
## Equal randomization: ratio=1
In this example:
* lambdaC is the event rate in the control group
* hr is the hazard ratio under the alternative hypothesis
* hr0 is the hazard ratio under the null hypothesis
* R is the accrual (recruitment) duration
* minfup is the follow-up of the last patient enrolled
* T is the maximum study duration
The expected total sample size needed to achieve the desired power is n=173 (rounding up 172.43), with an accrual rate of approximately 3.13 patients per month; note that 3.1351\times 55= 172.43, confirming that accrual is completed within the specified period.
If we anticipate losing patients during follow-up, the variance is inflated and a larger sample is needed; approximately 197 patients are now required, assuming a drop-out hazard of \eta=0.009 (expressed in the same time unit as the rest of the calculation, i.e., per month):
# Assuming dropout rates
nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
R = 55, minfup = 10, T = 65, eta = 0.009, etaE = 0.009,
ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
## Fixed design, two-arm trial with time-to-event
## outcome (Lachin and Foulkes, 1986).
## Solving for: Accrual rate
## Hazard ratio H1/H0=0.6/1
## Study duration: T=65
## Accrual duration: 55
## Min. end-of-study follow-up: minfup=10
## Expected events (total, H1): 119.3854
## Expected sample size (total): 196.3055
## Accrual rates:
## Stratum 1
## 0-55 3.5692
## Control event rates (H1):
## Stratum 1
## 0-Inf 0.0462
## Censoring rates:
## Stratum 1
## 0-Inf 0.009
## Power: 100*(1-beta)=80%
## Type I error (1-sided): 100*alpha=5%
## Equal randomization: ratio=1
Extending the minimum follow-up period to 100 months substantially increases each patient's likelihood of experiencing the event during the study, which reduces the required sample size:
# Assuming a larger follow-up time
nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
R = 55, minfup = 100, T = 155,
ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
## Fixed design, two-arm trial with time-to-event
## outcome (Lachin and Foulkes, 1986).
## Solving for: Accrual rate
## Hazard ratio H1/H0=0.6/1
## Study duration: T=155
## Accrual duration: 55
## Min. end-of-study follow-up: minfup=100
## Expected events (total, H1): 119.7092
## Expected sample size (total): 121.8807
## Accrual rates:
## Stratum 1
## 0-55 2.216
## Control event rates (H1):
## Stratum 1
## 0-Inf 0.0462
## Censoring rates:
## Stratum 1
## 0-Inf 0
## Power: 100*(1-beta)=80%
## Type I error (1-sided): 100*alpha=5%
## Equal randomization: ratio=1
The expected sample size is now much smaller (n=122). It is in fact very close to the expected number of events, which means that almost all patients are going to experience the event within that period of time.
Setting the enrollment rate through the parameter gamma to 5 patients per month, along with an accrual duration of R = 35 months, produces the following design under these criteria:
# Survival study design with known accrual rate and accrual duration
nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
gamma = 5, R = 35,
ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
## Fixed design, two-arm trial with time-to-event
## outcome (Lachin and Foulkes, 1986).
## Solving for: Accrual rate
## Hazard ratio H1/H0=0.6/1
## Study duration: T=18
## Accrual duration: 12
## Min. end-of-study follow-up: minfup=6
## Expected events (total, H1): 120.8216
## Expected sample size (total): 346.2062
## Accrual rates:
## Stratum 1
## 0-12 28.8505
## Control event rates (H1):
## Stratum 1
## 0-Inf 0.0462
## Censoring rates:
## Stratum 1
## 0-Inf 0
## Power: 100*(1-beta)=80%
## Type I error (1-sided): 100*alpha=5%
## Equal randomization: ratio=1
Note that the expected number of events is virtually the same as in the previous cases. We need n=175 subjects and the additional follow-up time required is approximately 16.5 months.
On the following graph, we see that for a given recruitment duration of 5 months, the sample size decreases as the follow-up period increases, eventually converging to the expected number of events:
# Increase follow up from 5 up to 100 months
fp = seq(5, 100, by = 0.5)
n = rep(NA, length(fp))
ar = rep(NA, length(fp))
for (i in 1:length(fp)) {
x = nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,
R = 5, minfup = fp[i], T = 5 + fp[i],
ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)
n[i] = x$n
ar[i] = x$gamma
}
plot(fp, n, ylim = c(80,510), yaxt = "n", ylab = "Sample size",
xlab = "Follow up (months)", type = "l")
axis(2, at = c(100,120,200,300,400,500), las = 1)
abline(h = 119, col = "red")
text(18, 130, "Expected events")
Sample size as the length of follow-up is extended.