First, we will set the working directory to the location of our data file and then import the data (trialPh1.csv) into R.
# Set working directory to the location of data files
setwd("C:/Users/user/Desktop/Clinical_trials/Labs/Lab_2_Study_designs")
# Import the Phase 1 trial data
trialPh1 <- read.csv("data/trialPh1.csv")
By examining the data, we can see that each patient (id) has been assigned to exactly one dose of the drug. Ten doses have been investigated; let’s take a look at the data structure:
# Display the first few rows of the dataset
head(trialPh1)
## id toxicity dose
## 1 1 0 0.2500000
## 2 2 0 0.2750000
## 3 3 0 0.3025000
## 4 4 0 0.3327500
## 5 5 0 0.3660250
## 6 6 1 0.4026275
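If we want, we can also confirm the number of distinct dose levels directly (an optional check; output not shown here):
# Optional check: number of distinct dose levels and number of patients
length(unique(trialPh1$dose)) # should return 10 dose levels
nrow(trialPh1)                # total number of patients in the Phase 1 dataset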
The outcome variable (toxicity) is binary, making logistic regression appropriate. We assume that the logit transformation of the probability, \text{logit}(p)=\log\left(\frac{p}{1-p}\right), depends linearly on the logarithm of the dose:
\text{logit}(p)=\beta_0+\beta_1\log(\text{Dose})
To fit such a model, we can use glm in R. We need to define the appropriate formula, indicating the outcome variable (on the left) and the covariates (on the right). We also need to specify that R should use the binomial distribution by setting family = "binomial", which assumes a logit link function by default.
# Fit logistic regression model
fit <- glm(toxicity ~ log(dose), family = "binomial", data = trialPh1)
# Display the summary of the fitted model
summary(fit)
##
## Call:
## glm(formula = toxicity ~ log(dose), family = "binomial", data = trialPh1)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.9125 0.2341 12.44 <2e-16 ***
## log(dose) 4.9316 0.2890 17.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2050.5 on 1999 degrees of freedom
## Residual deviance: 1633.7 on 1998 degrees of freedom
## AIC: 1637.7
##
## Number of Fisher Scoring iterations: 5
The optimal dose is defined as the one that corresponds to a 30% probability of toxicity. Thus, we need to solve for the dose:
\text{Dose}=\exp\left(\frac{\text{logit}(p)-\beta_0}{\beta_1}\right)
To estimate this optimal dose, we calculate:
# Extract coefficients from the fitted model
beta_hat <- coef(fit)
# Define the inverse logit function
invlogit <- function(x) {
exp(x) / (1 + exp(x))
}
# Define the logit function
logit <- function(x) {
log(x / (1 - x))
}
# Calculate optimal dose (Dose30)
Dose30 <- exp((logit(0.30) - beta_hat[1]) / beta_hat[2])
Dose30
## (Intercept)
## 0.466553
Thus, the optimal dose, which is associated with a 30% chance of toxicity, is approximately 0.466.
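As a quick, optional sanity check, we can plug this estimate back into the fitted model; by construction, the predicted probability of toxicity at Dose30 should be 0.30:
# Sanity check: the fitted probability of toxicity at Dose30 should equal 0.30
unname(invlogit(beta_hat[1] + beta_hat[2] * log(Dose30)))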
We can produce a graph showing the predictions of the model:
# Create probability plot
curve(invlogit(beta_hat[1] + beta_hat[2] * x),
from = min(log(trialPh1$dose)),
to = max(log(trialPh1$dose)),
ylim = c(0,0.60),
xlab = "Logarithm of dose",
ylab = "Probability of toxicity",
main = "Probability of toxicity \nbased on a logistic regression model")
# Adding dashed red lines for Dose30 on the plot
lines(c(-1.4, log(Dose30)), rep(0.3, 2), col = "red", lty = 2) # Horizontal line at 30%
lines(rep(log(Dose30), 2), c(0, 0.3), col = "red", lty = 2) # Vertical line at Dose30
Figure 1. Probability of toxicity based on a logistic regression model
Note: The x-axis is on a logarithmic scale, so the log-dose is log(0.466553)≈−0.762.
Although this approach can accurately estimate the optimal dose, it cannot be used in modern clinical trials, primarily for ethical reasons. In this artificial example, patients have been exposed to doses that are likely to lead to serious toxicities; in practice, the goal is to expose as few patients as possible to a new experimental drug.
The conditional probability of escalating beyond the i-th dose is expressed as:
P_i = b(0;3,p_i)+b(1;3,p_i)\times b(0;3,p_i),
where b(k;n,p_i)=\Pr(X=k;n,p_i) denotes the binomial probability of achieving k successes out of n trials, with a success probability of p_i.
There are two scenarios that can result in escalation past the i-th dose. Given the hypothetical probability of toxicity at the i-th dose, denoted as p_i:
- The first scenario is that none of the original cohort of three subjects experiences a toxicity, which occurs with probability b(0;3,p_i).
- The second scenario involves two events: one toxicity in the original cohort (which occurs with probability b(1;3,p_i)) and no toxicity among the three additional subjects (which occurs with probability b(0;3,p_i), conditional on one toxicity in the original cohort).
Overall, the conditional probability of passing the i-th dose is given by:
P_i = b(0;3,p_i)+b(1;3,p_i)\times b(0;3,p_i).
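As a quick numerical illustration, b(k;3,p_i) corresponds to dbinom(k, size = 3, prob = p_i) in R, so for a dose with a true toxicity probability of 0.2 the conditional probability of escalation is:
# Conditional probability of escalating past a dose with true toxicity probability 0.2
dbinom(0, size = 3, prob = 0.2) + dbinom(1, size = 3, prob = 0.2) * dbinom(0, size = 3, prob = 0.2)
## [1] 0.708608
This is the same value that appears as Pi for the first dose in the example further below.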
Then, the unconditional probability of passing the i-th dose is:
Q_i=\prod_{k=1}^{i}\left[b(0;3,p_k)+b(1;3,p_k)\times b(0;3,p_k)\right].
This follows from the multiplication law of probability1:
Here, event A, in which exactly one of the first three subjects experiences serious treatment-related toxicity or death (with probability P(A)=b(1;3,p_i)), does not provide any information about event B, in which none of the three additional subjects experiences a toxicity or death (with probability P(B)=b(0;3,p_i)). Thus, we can apply this principle as follows:
P(A\cap B)=P(\text{1 toxicity among the first 3 subjects and 0 toxicities among the 3 additional subjects})=P(A)P(B)=b(1;3,p_i)\,b(0;3,p_i)
To obtain the unconditional probability of passing the i-th dose, we multiply these conditional probabilities over all earlier doses. You may find these calculations similar to those involved in understanding the relationship between survival and hazard functions.
# Function to implement the three-up/three-down design
OC <- function(p, dose = paste("dose", 1:length(p), sep = "")) {
  # Conditional probability of escalating past the i-th dose
  Pi <- dbinom(x = 0, size = 3, prob = p)  # no toxicities in the first cohort of 3
  Pi <- Pi + dbinom(x = 0, size = 3, prob = p) * dbinom(x = 1, size = 3, prob = p)  # plus: 1 toxicity in the first cohort and none among the 3 additional subjects
  # Unconditional probability of passing the i-th dose
  Qi <- cumprod(Pi)
  # Operating characteristic (OC): cumulative probability of stopping at or before the i-th dose
  OC <- 1 - Qi
  return(data.frame(dose = dose, prob = p, Pi = Pi, Qi = Qi, OC = OC))
}
# Example usage with hypothetical probabilities
OC_example <- OC(p = c(0.2, 0.3))
print(OC_example)
## dose prob Pi Qi OC
## 1 dose1 0.2 0.708608 0.7086080 0.2913920
## 2 dose2 0.3 0.494263 0.3502387 0.6497613
Now consider two assumed dose-toxicity scenarios, A and B, defined by the probability vectors below. First, let's plot the toxicity probabilities for both scenarios:
# Define probability scenarios
ptrueB <- seq(0.10, 0.55, by = 0.05) # Probabilities for scenario B
ptrueA <- c(0.001, 0.01, 0.02, 0.05, 0.11, 0.19, 0.28, 0.39, 0.53, 0.68) # Probabilities for scenario A
# Plotting probabilities of set A
plot(1:10, ptrueA,
xlab = "Dose level",
ylab = "Probability of toxicity",
xaxt = "n", yaxt = "n",
ylim = c(0, 0.8),
type = "b", pch = 21,
lty = 3,
main = "Toxicity Probabilities by Dose Level")
# Customizing axes
axis(1, 1:10)
axis(2, seq(0, 0.8, by = 0.1), las = 1)
# Adding grid lines to the plot
abline(v = 1:10, col = "lightgray", lty = "dotted")
abline(h = seq(0, 0.8, by = 0.1), col = "lightgray", lty = "dotted")
# Adding probabilities of set B to the same plot
lines(1:10, ptrueB, pch = 16, type = "b")
text(7, 0.2, label = "A") # Label for set A
text(4.5, 0.35,label = "B") # Label for set B
Now, let’s visualize the Operating Characteristics:
# Calculate operating characteristics (OCs) for both sets A and B
setAoc <- OC(ptrueA)
setBoc <- OC(ptrueB)
# Plotting OCs for set A
plot(1:10, setAoc$OC,
xlab = "Dose level",
ylab = "Cumulative probability of stopping",
xaxt = "n", yaxt = "n",
ylim = c(0, 1),
type = "b", pch = 21,
lty = 3,
main = "Operating Characteristics by Dose Level")
# Customizing axes for OC plot
axis(1, 1:10)
axis(2, seq(0, 1, 0.1), las = 1)
# Adding grid lines to the OC plot
abline(v = 1:10, col = "lightgray", lty = "dotted")
abline(h = seq(0, 0.9, 0.1), col = "lightgray", lty = "dotted")
# Adding OCs for set B to the same plot
lines(1:10, setBoc$OC, pch = 16, type = "b")
text(6.5, 0.4, label = "A") # Label for set A in OC plot
text(3.15, 0.65, label = "B") # Label for set B in OC plot
For scenario A:
- The expected stopping point is between doses 6 and 7 (stopping probabilities close to 50%).
- There is a ~90% probability of stopping at or before dose 8.
For scenario B:
- The expected stopping point is at dose 3.
- There is an 85% probability of stopping at or before dose 5 (the optimal dose).
- The design may stop before reaching the optimal dose.
This analysis suggests that while the three-up/three-down design is conservative from a safety perspective, it might be overly cautious and stop before reaching the optimal dose level.
Let p denote the response rate of the new therapy. The standard treatment has a response rate of approximately 15%, while the new therapy is expected to achieve success in about 40% of patients. We test the following hypotheses:
H_0: p\leq 0.15 \quad \text{vs.} \quad H_1: p=0.40
with significance level \alpha=0.10, power 1-\beta=0.8, and sample size n=16. The number of responders X follows a binomial distribution: X \mid p \sim \text{Binomial}(n=16,p).
Common sense suggests that we should reject the null hypothesis in favor of the alternative when the number of responders exceeds some threshold k (k=0,1,\ldots,16). We would also like this k to satisfy both the type-I error requirement, i.e., \Pr(X>k|p=p_{0})\leq\alpha, and the type-II error requirement, i.e., \Pr(X\leq k|p=p_{1})\leq\beta. We can easily get these binomial probabilities in R through the function dbinom:
# Calculate binomial probabilities
n <- 16 # Sample size
k <- 0:16 # Possible number of responders
# Calculate probabilities under null hypothesis
round(dbinom(k, size = n, prob = 0.15),5)
## [1] 0.07425 0.20965 0.27748 0.22851 0.13106 0.05551 0.01796 0.00453 0.00090
## [10] 0.00014 0.00002 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
Note that the function dbinom takes k = 0, 1, \ldots, 16 as a vector and produces a vector of 17 (i.e., n+1) elements.
Let’s create a comprehensive summary of the test:
# Create a data frame summarizing binomial test results
bintest <- round(data.frame(
k = k,
dbinom(k, size = n, prob = 0.15),
pbinom(k, size = n, prob = 0.15, lower.tail = FALSE),
dbinom(k, size = n, prob = 0.40),
pbinom(k, size = n, prob = 0.40, lower.tail = FALSE)),
digits = 5)
# Add column names
names(bintest) <- c("k","Pr(X=k|p=0.15)","Pr(X>k|p=0.15)",
"Pr(X=k|p=0.40)","Pr(X>k|p=0.40)")
print(bintest)
## k Pr(X=k|p=0.15) Pr(X>k|p=0.15) Pr(X=k|p=0.40) Pr(X>k|p=0.40)
## 1 0 0.07425 0.92575 0.00028 0.99972
## 2 1 0.20965 0.71610 0.00301 0.99671
## 3 2 0.27748 0.43862 0.01505 0.98166
## 4 3 0.22851 0.21011 0.04681 0.93485
## 5 4 0.13106 0.07905 0.10142 0.83343
## 6 5 0.05551 0.02354 0.16227 0.67116
## 7 6 0.01796 0.00559 0.19833 0.47283
## 8 7 0.00453 0.00106 0.18889 0.28394
## 9 8 0.00090 0.00016 0.14167 0.14227
## 10 9 0.00014 0.00002 0.08395 0.05832
## 11 10 0.00002 0.00000 0.03918 0.01914
## 12 11 0.00000 0.00000 0.01425 0.00490
## 13 12 0.00000 0.00000 0.00396 0.00094
## 14 13 0.00000 0.00000 0.00081 0.00013
## 15 14 0.00000 0.00000 0.00012 0.00001
## 16 15 0.00000 0.00000 0.00001 0.00000
## 17 16 0.00000 0.00000 0.00000 0.00000
It is easy to see from the output above that the smallest k satisfying \text{Pr}(X>k|p=p_0) \leq \alpha is k = 4. Therefore, we reject H_0 when we observe X \geq 5 responses.
# Find threshold k value for rejection
k1 <- which(bintest[,3] < 0.10)[1] # Find first k where Pr(X>k|p=0.15) < alpha level
result <- bintest[k1, c(1,3,5)] # Extract relevant results for decision making
print(result) # Display results for decision making
## k Pr(X>k|p=0.15) Pr(X>k|p=0.40)
## 5 4 0.07905 0.83343
Thus, the exact significance level and power of the test are 0.07905 and 0.83343, respectively. You have to keep in mind that although you can set the type-I error at any level you want, you cannot minimize both the type-I and type-II errors for a given sample size. The only thing you can do in order to increase the power (or decrease the Type-II error) is to increase the sample size.
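To illustrate this, the short sketch below picks, for each candidate sample size, the smallest cutoff that keeps the exact type-I error below 0.10 and then reports the corresponding exact power (power_for_n is a small helper written here for illustration; it is not part of the lab materials):
# Sketch: exact power of the single-stage design as the sample size grows.
# For each n, choose the smallest cutoff k with Pr(X > k | p0) <= alpha,
# then compute the exact power Pr(X > k | p1).
power_for_n <- function(n, p0 = 0.15, p1 = 0.40, alpha = 0.10) {
  k <- 0:n
  alphas <- pbinom(k, size = n, prob = p0, lower.tail = FALSE)  # Pr(X > k | p0)
  k_star <- k[which(alphas <= alpha)[1]]                        # smallest cutoff meeting the alpha requirement
  c(n = n, k = k_star,
    alpha = alphas[k_star + 1],                                 # exact significance level
    power = pbinom(k_star, size = n, prob = p1, lower.tail = FALSE))  # exact power
}
# Power increases with the sample size (the cutoff is re-chosen for each n)
round(t(sapply(c(16, 20, 25, 30), power_for_n)), 4)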
To verify our results, we can implement a sample size calculation function that determines the required sample size while controlling both type-I and type-II errors:
# Sample size calculation function
rsppow <- function(alpha, beta, p0, p1) {
  # Validate input
  if (p1 <= p0) {
    cat("Error: p1 must be greater than p0. Please try again.\n")
    return(NULL)
  }
  # Initial sample size based on the normal approximation (Fleming, 1982)
  F <- (qnorm(1 - beta) * sqrt(p1 * (1 - p1)) +
        qnorm(1 - alpha) * sqrt(p0 * (1 - p0)))^2 / (p1 - p0)^2
  # Search for the sample size between 0.8F and 4F
  n <- round(0.8 * F)
  found <- FALSE
  # Loop until a suitable sample size is found or the upper limit is reached
  while (n < round(4 * F) && !found) {
    # pp0: probability of false rejection, Pr(X > j | p0)
    # pp1: probability of false non-rejection, Pr(X <= j | p1)
    j <- 0
    pp0 <- 1.0
    pp1 <- 1.0
    # Increase the cutoff j until both error requirements are met or j exceeds n
    while ((pp0 > alpha || pp1 > beta) && j <= n) {
      pp0 <- 1 - pbinom(q = j, size = n, prob = p0)  # probability of false rejection
      pp1 <- pbinom(q = j, size = n, prob = p1)      # probability of false non-rejection
      j <- j + 1
    }
    # If no suitable cutoff was found for this n, increase n and continue searching
    if (j > n) {
      n <- n + 1
    } else {
      found <- TRUE
    }
  }
  # Output results if a suitable sample size was found
  # (the final decision rule is: reject H0, i.e. exclude p0, if X > j - 1)
  if (found) {
    cat("SAMPLE SIZE (N) =", n, "\n")
    cat("P0 EXCLUDED IF X >", j - 1, "\n")
    cat("COVERAGE =", 1 - pp0, "\n")
    cat("POWER =", 1 - pp1, "\n")
  } else {
    cat("No suitable sample size found.\n")
  }
}
The program searches from 0.8F to 4F, where F=\left (\frac{z_{1-\beta}\sqrt{p_1(1-p_1)}+z_{1-\alpha}\sqrt{p_0(1-p_0)}}{p_1-p_0}\right )^2 is the sample size based on the normal approximation to the binomial distribution2. With p_0=0.15, p_1=0.40, \alpha=0.1 and \beta=0.2, F\approx 12, so the search starts at n=\text{round}(0.8F)=10.
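As a quick check of this starting value:
# Normal-approximation sample size for p0 = 0.15, p1 = 0.40, alpha = 0.1, beta = 0.2
F <- (qnorm(0.8) * sqrt(0.40 * 0.60) + qnorm(0.9) * sqrt(0.15 * 0.85))^2 / (0.40 - 0.15)^2
F              # approximately 12.1
round(0.8 * F) # the search therefore starts at n = 10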
Let’s verify our calculations:
rsppow(alpha = 0.1, beta = 0.2, p0 = 0.15, p1 = 0.4)
## SAMPLE SIZE (N) = 16
## P0 EXCLUDED IF X > 4
## COVERAGE = 0.9209487
## POWER = 0.8334326
The results confirm our earlier findings:
- With N=16, we achieve coverage 1-\alpha > 0.92 (or \alpha < 0.08), meeting our desired significance level of 0.10.
- The power 1-\beta > 0.83 exceeds our desired power of 0.80.
- These results verify our exact binomial calculations using the rsppow program.
Two-stage designs typically involve complex calculations. We'll use the ph2simon function to implement both the minimax and the optimal (Simon's) two-stage designs. These designs assess efficacy in two stages:
1. First stage: initial efficacy assessment with potential early stopping.
2. Second stage: additional recruitment if results are promising.
Two-stage designs enroll some patients in the first stage, where an initial assessment of efficacy is performed. If the therapy seems unacceptable, the trial stops. If not, more patients are recruited in the second stage and the efficacy of the drug is investigated again. Type-I and type-II errors are calculated over both stages and involve the cumulative probabilities of the scenarios that result in the study stopping at the first stage and of those that result in the study continuing to the second stage. The idea is analogous to the operating characteristics of the three-up/three-down phase-1 design.
# Load required library
library(clinfun)
# Define parameters for the Simon's two-stage design
p0 <- 0.15 # Response rate under the null hypothesis
p1 <- 0.40 # Response rate under the alternative hypothesis
alpha <- 0.10 # Type I error rate (one-sided)
beta <- 0.20 # Type II error rate
# Calculate optimal and minimax designs
simon_results <- ph2simon(pu = p0, pa = p1, ep1 = alpha, ep2 = beta, nmax = 100)
# Print results of Simon's two-stage design
print(simon_results)
##
## Simon 2-stage Phase II design
##
## Unacceptable response rate: 0.15
## Desirable response rate: 0.4
## Error rates: alpha = 0.1 ; beta = 0.2
##
## r1 n1 r n EN(p0) PET(p0) qLo qHi
## Minimax 1 9 4 16 11.80 0.5995 0.457 1.000
## Optimal 1 7 4 18 10.12 0.7166 0.000 0.457
For both designs, the hypotheses to be tested are H_0:p\leq0.15 vs. H_1:p\geq0.40 (one-sided).
For the optimal design, we need to enroll n=18 patients, of whom n_1 = 7 are involved in the first stage. The decision rule (see the sketch below) is:
- If r_1 \leq 1 of them respond, then the study will stop.
- If r_1 > 1 individuals respond, then n_2 = 11 additional subjects will be enrolled in the second stage.
- If, over both stages, at most r = 4 of the n = 18 patients respond, the null hypothesis will not be rejected and the drug will not be considered effective.
- Otherwise, the null hypothesis will be rejected, and the conclusion will be that the response rate p > p_0 = 0.15.
Simon's optimal design is "optimal" in the sense that it exposes, on average, the minimum number of patients to the experimental drug when the true probability of response is p_0; that is, ASN(p_0) = EN(p_0) \approx 10.1 is minimized if H_0 is true (i.e., if p \leq p_0).
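A minimal sketch of the optimal design's stage-wise decision rule is given below; simon_decision is a hypothetical helper written here for illustration (it is not part of clinfun), with the cutoffs r_1 = 1 and r = 4 taken from the output above:
# Sketch of the stage-wise decision rule for the optimal design (r1 = 1, n1 = 7, r = 4, n = 18)
simon_decision <- function(x1, x_total, r1 = 1, r = 4) {
  if (x1 <= r1) {
    return("Stop after stage 1: drug not considered effective")
  }
  if (x_total <= r) {
    return("Complete stage 2, but do not reject H0")
  }
  "Reject H0: the response rate appears greater than p0 = 0.15"
}
simon_decision(x1 = 1, x_total = NA) # at most r1 responses in stage 1: stop early
simon_decision(x1 = 3, x_total = 6)  # more than r responses in total: reject H0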
For the minimax design, we need to enroll n=16 patients, of whom n_1 = 9 are involved in the first stage. The decision rule is analogous:
- If r_1 \leq 1 of them respond, then the study will stop.
- If r_1 > 1 individuals respond, then n_2 = 7 additional subjects will be enrolled in the second stage.
- If, over both stages, at most r = 4 of the n = 16 patients respond, the null hypothesis will not be rejected and the drug will not be considered effective.
- Otherwise, the null hypothesis will be rejected, and the conclusion will be that the response rate p > p_0 = 0.15.
The minimax design minimizes the total sample size n but may result in a larger average sample size under the null hypothesis compared with Simon's optimal design: here, ASN(p_0) = 11.8 individuals if H_0 is true, compared with about 10.1 for the optimal two-stage design.
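As a rough check (a sketch, not part of the lab output; check_design is a hypothetical helper), the probability of early termination, the expected sample size under H_0, and the overall error rates of both designs can be reproduced directly from the reported design parameters:
# PET(p0), EN(p0) and overall error rates of a two-stage design (x1 = stage-1 responses)
check_design <- function(r1, n1, r, n, p0 = 0.15, p1 = 0.40) {
  n2 <- n - n1
  PET <- pbinom(r1, size = n1, prob = p0)  # probability of early termination under p0
  EN <- n1 + (1 - PET) * n2                # expected sample size under p0
  x1 <- (r1 + 1):n1                        # stage-1 outcomes that lead to stage 2
  # H0 is rejected only if the trial continues AND the total number of responses exceeds r
  alpha <- sum(dbinom(x1, n1, p0) * pbinom(r - x1, n2, p0, lower.tail = FALSE))
  power <- sum(dbinom(x1, n1, p1) * pbinom(r - x1, n2, p1, lower.tail = FALSE))
  c(PET = PET, EN = EN, alpha = alpha, power = power)
}
# Values should agree (up to rounding) with PET(p0) and EN(p0) in the ph2simon output
# and respect the design error rates alpha = 0.10 and beta = 0.20
round(rbind(Minimax = check_design(r1 = 1, n1 = 9, r = 4, n = 16),
            Optimal = check_design(r1 = 1, n1 = 7, r = 4, n = 18)), 4)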
First, let’s import and examine the data:
# Import data
library(readr)
angina <- read_table("data/angina.txt") # Load angina data
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Patient = col_double(),
## Placebo = col_double(),
## Pronethalol = col_double(),
## Difference = col_double()
## )
head(angina) # Display first few rows of the dataset
## # A tibble: 6 × 4
## Patient Placebo Pronethalol Difference
## <dbl> <dbl> <dbl> <dbl>
## 1 1 71 29 42
## 2 2 323 348 25
## 3 3 8 1 7
## 4 4 14 7 7
## 5 5 23 16 7
## 6 6 34 25 9
Let's assess normality through visualizations. We can check the normality assumption either for the differences in the number of angina attacks or for the raw counts in the two treatment groups.
# Histogram of differences
library(lattice)
histogram(~ Difference, data = angina,
type = "count",  # show counts so the y-axis matches the "Frequency" label
main = "Histogram of Differences in Angina Attacks",
xlab = "Difference in Attacks",
ylab = "Frequency")
Figure 1: Distribution of differences in angina attacks
We transform the data into a long format suitable for analysis.
# Transform data into long format suitable for analysis
angina_long <- data.frame(
attacks = NA,
trt = rep(c("Pronethalol", "Placebo"), each = nrow(angina))
)
angina_long[angina_long$trt == "Pronethalol", "attacks"] <- angina$Pronethalol
angina_long[angina_long$trt == "Placebo", "attacks"] <- angina$Placebo
Now we can produce a histogram of the raw number of angina events by treatment group.
# Histogram of Angina Events by Treatment Group
histogram(~ attacks | trt, data = angina_long,
type = "count",  # show counts so the y-axis matches the "Frequency" label
main = "Histogram of Angina Episodes by Treatment Group",
xlab = "Number of Angina Episodes",
ylab = "Frequency")
Figure 2: Distribution of angina episodes by treatment
Note that in the histogram the syntax ~ attacks | trt produces a separate histogram of the number of attacks for each treatment arm. This conditioning syntax is a common feature of lattice plotting functions.
We can also check the normality of the distribution of events using Q-Q plots.
# Q-Q plot for number of angina episodes between treatment groups
qqmath(~ attacks | trt, data = angina_long,
distribution = qnorm,
prepanel = prepanel.qqmathline,
panel = function(x, ...) {
panel.qqmathline(x, ...)
panel.qqmath(x, ...)
})
Figure 7: Q-Q plot of the number of angina pectoris episodes in the pronethalol and placebo groups
The same can be done for the differences. The Q-Q plot of the differences in the number of angina episodes is given in Figure 8.
# Q-Q plot of differences in angina episodes between treatment groups
qqmath(~ Difference, data = angina,
distribution = qnorm,
prepanel = prepanel.qqmathline,
panel = function(x, ...) {
panel.qqmathline(x, ...)
panel.qqmath(x, ...)
})
Figure 8: Q-Q plot of the differences in the number of angina pectoris episodes between the pronethalol and placebo groups.
The distributions are far from normal, so we proceed with nonparametric tests.
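If a formal test is preferred in addition to the graphical checks, the Shapiro-Wilk test can be applied to the paired differences (output not shown here):
# Shapiro-Wilk test of normality for the paired differences
shapiro.test(angina$Difference)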
Initially, we ignore the study design and assume that the data come from two independent populations. We will use the angina_long data frame, which includes two rows per patient (one per treatment), and perform a two-sample Wilcoxon (rank-sum) test:
# (b) Naive analysis using Wilcoxon test for independent samples
wilcox.test(attacks ~ trt, data = angina_long)
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: attacks by trt
## W = 88, p-value = 0.3705
## alternative hypothesis: true location shift is not equal to 0
This test suggests that there is no significant difference between the two groups in the number of angina attacks. Let's see what happens when we take the matched study design into account by using the paired version of the Wilcoxon test (the signed-rank test, a nonparametric analogue of the paired t-test):
# Paired Wilcoxon test
wilcox.test(angina$Placebo, angina$Pronethalol, paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: angina$Placebo and angina$Pronethalol
## V = 67, p-value = 0.03066
## alternative hypothesis: true location shift is not equal to 0
Now the treatment effect becomes significant! This shows that when the data arise from a matched design, the analysis must account for the matching. However, cross-over trials require additional assumptions to be valid. For example, we assume that there is no treatment \times period interaction; in other words, the effect of pronethalol should not depend on whether it is given before or after placebo (a sketch of how this could be checked is given below). Additional prerequisites for applying crossover studies are provided in the supplementary notes.
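As a sketch only: the angina data as imported do not record the order in which the two treatments were given, so the commented code below uses a hypothetical sequence column to indicate how such an interaction could be examined in a 2x2 cross-over trial (by comparing within-patient totals between the two sequence groups):
# Hypothetical sketch: requires a 'sequence' column giving the order of administration,
# which is not available in the imported angina data
# angina$total <- angina$Placebo + angina$Pronethalol   # within-patient totals
# wilcox.test(total ~ sequence, data = angina)          # compare totals between sequence groups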