Bayesian Clinical Trial Design: From Prior to Approval
A comprehensive end-to-end tutorial using Zetyra's Bayesian Toolkit with FDA-approved case studies. Walk through all six toolkit steps—from eliciting a prior to monitoring an ongoing trial—using real data from REBYOTA, oncology dose-finding, and adaptive platform trials.
| Section | Case Study | Endpoint | Key Method |
|---|---|---|---|
| Prior Elicitation & Borrowing | REBYOTA (PUNCH CD3) | Binary | Power prior, historical borrowing |
| Dose-Finding | Oncology Phase I | Toxicity | BOIN design |
| Survival Endpoints | DLBCL Lymphoma | Time-to-event | Commensurate prior, Weibull model |
| Adaptive Randomization | GBM AGILE | Survival | Response-adaptive allocation |
| Sequential Monitoring | REBYOTA (extended) | Binary | Posterior probability boundaries |
| Interim Monitoring | All | Various | PPoS, predictive probability |
1. The Regulatory Landscape
FDA's Bayesian Framework
The FDA's January 2026 guidance on Bayesian methodology marks a watershed moment for clinical trial design. The FDA now explicitly endorses Bayesian methods for:
Governing adaptation rules for interim analyses
Informing design elements (e.g., dose selection) for subsequent trials
Supporting primary inference in registration trials
Augmenting concurrent controls with external or historical data
Key Regulatory Programs
| Program | Purpose | Contact Point |
|---|---|---|
| Complex Innovative Trial Design (CID) | Adaptive/Bayesian design meetings | CDER/CBER |
| Center for Clinical Trial Innovation (C3TI) | Non-adaptive Bayesian demonstrations | C3TI portal |
| Rare Disease Program | Small-sample Bayesian approaches | Office of Orphan Products |
Zetyra Toolkit Overview
| Step | Calculator | Use Case | Key Outputs |
|---|---|---|---|
| 1 | Prior Elicitation | Historical data → informative prior | Beta parameters, ESS, prior predictive |
| 2 | Bayesian Borrowing | Multi-study synthesis, MAP priors | Discount comparison, conflict diagnostics |
| 3 | Sample Size (Single-Arm) | Power/assurance calculations | N, operating characteristics, power curves |
| 4 | Two-Arm Design | RCT with Bayesian borrowing | Frequentist comparison, efficiency gain |
| 5 | Sequential Monitoring | Interim stopping rules | Stopping boundaries, ASN curves |
| 6 | Predictive Power (PPoS) | Interim monitoring | Go/no-go thresholds, sensitivity |
2. REBYOTA: Binary Endpoints with Historical Borrowing
Background
REBYOTA (fecal microbiota, live-jslm) was approved November 2022 for preventing C. difficile recurrence—the first FDA-approved microbiota-based live biotherapeutic. The pivotal PUNCH CD3 trial used Bayesian hierarchical borrowing from the Phase 2b PUNCH CD2 trial.
Why Bayesian? Widespread availability of FMT under enforcement discretion made placebo-controlled enrollment increasingly difficult. FDA recommended formal Bayesian borrowing to enable a feasible trial.
| Trial | Phase | Design | Key Results |
|---|---|---|---|
| PUNCH CD2 | 2b | RCT (2 arms + placebo) | 56.8% success (1-dose) vs 43.2% (placebo) |
| PUNCH CD3 | 3 | RCT with Bayesian borrowing | 70.6% vs 57.5%, P(superiority) = 99.1% |
Step 1: Prior Elicitation
Goal: Build a prior for treatment success rate using PUNCH CD2 data.
Historical Data (PUNCH CD2, single-dose ITT): Treatment: 25/45 successes (55.6%), Placebo: 19/44 successes (43.2%).
Calculator Inputs
| Parameter | Value | Rationale |
|---|---|---|
| Method | Historical Data | Phase 2b results available |
| Events (k) | 25 | Successes in CD2 treatment arm |
| Total (n) | 45 | Patients in CD2 treatment arm |
| Discount factor (δ) | 0.5 | Skeptical borrowing—Phase 2b to 3 uncertainty |
Why δ = 0.5?
1. Phase 2b often shows inflated effects (smaller N, selected sites)
2. PUNCH CD3 had broader eligibility (≥1 vs ≥2 recurrences)
3. Regulatory conservatism—better to be pleasantly surprised
Calculator Output
Prior Distribution: Beta(13.5, 11.5)
Mean: 54.0% Median: 54.0%
95% CI: [36.6%, 70.7%]
ESS: 25 (roughly half the information in the original 45 patients)
Interpretation: Expects ~54% success with substantial uncertainty, reflecting discounted Phase 2b evidence.
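As a quick check, the discounted prior and its prior predictive can be reproduced in a few lines of R. This is a minimal sketch of the power-prior construction; the calculator's exact parameterization of the base prior may differ slightly, so the summaries can deviate in the last digit.
# Power prior sketch: Beta(1 + delta*k, 1 + delta*(n - k)) built from PUNCH CD2
k <- 25; n <- 45; delta <- 0.5
alpha <- 1 + delta * k                                 # ~13.5
beta  <- 1 + delta * (n - k)
c(mean = alpha / (alpha + beta), qbeta(c(0.025, 0.975), alpha, beta))
# Prior predictive: plausible success counts in a future 45-patient cohort
theta <- rbeta(10000, alpha, beta)
quantile(rbinom(10000, 45, theta), c(0.025, 0.5, 0.975))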
Step 2: Sample Size with Bayesian Borrowing
Clinical context: Null rate (p₀) = 45%, Alternative rate (p₁) = 65%, Decision threshold: P(θ > p₀ | Data) ≥ 0.975.
| Parameter | Value |
|---|---|
| Prior | Beta(13.5, 11.5) |
| Null rate | 0.45 |
| Alternative rate | 0.65 |
| Decision threshold | 0.975 |
| Target power | 0.80 |
Recommended Sample Size: N = 45
Type I Error: 0.032 (≤ 0.05 ✔)
Power: 0.81 (≥ 0.80 ✔)
Decision Rule: Declare success if P(θ > 0.45 | Data) ≥ 0.975. At N=45: Need ≥27 successes (60.0%)
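The cutoff follows directly from the Beta posterior. A minimal sketch, assuming the Beta(13.5, 11.5) prior above (the calculator's simulation-based operating characteristics may differ slightly in the last digit):
# Smallest success count x out of 45 with P(theta > 0.45 | x successes) >= 0.975
prior_a <- 13.5; prior_b <- 11.5; n <- 45; p0 <- 0.45
post_prob <- pbeta(p0, prior_a + 0:n, prior_b + n - 0:n, lower.tail = FALSE)
cutoff <- min(which(post_prob >= 0.975)) - 1          # counts start at x = 0; expect 27
c(cutoff = cutoff,
  type1  = 1 - pbinom(cutoff - 1, n, p0),             # ~0.03 under the null
  power  = 1 - pbinom(cutoff - 1, n, 0.65))           # ~0.81 under the alternative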
Power Curve
| True Success Rate | Power |
|---|---|
| 45% (null) | 3.2% |
| 50% | 12% |
| 55% | 30% |
| 60% | 56% |
| 65% (alternative) | 81% |
| 70% | 95% |
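Because the design reduces to declaring success with at least 27 of 45 responders, each power-curve entry is a binomial tail probability. The sketch below should closely reproduce the simulated values:
# Exact power of the ">= 27 of 45 successes" rule at several true rates
true_rates <- c(0.45, 0.50, 0.55, 0.60, 0.65, 0.70)
round(setNames(1 - pbinom(26, 45, true_rates), true_rates), 3)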
Step 3: Two-Arm Comparison (Borrowing vs. Traditional)
Actual PUNCH CD3 Design: 2:1 randomization (treatment:placebo), Treatment arm N = 180, Placebo arm N = 87, Total N = 267.
| Design | Treatment N | Control N | Total | Reduction |
|---|---|---|---|---|
| Frequentist RCT (no borrowing) | 220 | 110 | 330 | — |
| Bayesian + Borrowing (actual) | 180 | 87 | 267 | 19% |
Key Insight: Bayesian borrowing saved 63 patients—critical for a difficult-to-enroll population.
3. Oncology Dose-Finding with BOIN
REBYOTA used a fixed dose. For oncology Phase I trials seeking the Maximum Tolerated Dose (MTD), the Bayesian Optimal Interval (BOIN) design is a widely used model-assisted approach that has received FDA Fit-for-Purpose designation. BOIN uses pre-calculated decision boundaries—no real-time Bayesian computation—while achieving optimal statistical properties.
BOIN Boundaries (Target DLT = 30%)
| Patients Treated | Escalate if DLTs ≤ | De-escalate if DLTs ≥ |
|---|---|---|
| 3 | 0 | 2 |
| 6 | 1 | 3 |
| 9 | 2 | 4 |
| 12 | 2 | 5 |
| 15 | 3 | 6 |
R Implementation
library(BOIN)
# Design parameters
target_dlt <- 0.30 # Target DLT rate (MTD definition)
ncohort <- 10 # Maximum cohorts
cohortsize <- 3 # Patients per cohort
n_doses <- 5 # Number of dose levels
# Step 1: Get decision boundaries
boundaries <- get.boundary(
target = target_dlt,
ncohort = ncohort,
cohortsize = cohortsize
)
# Step 2: Simulate operating characteristics
true_dlt <- c(0.05, 0.10, 0.25, 0.35, 0.50)
oc <- get.oc(
target = target_dlt,
p.true = true_dlt,
ncohort = ncohort,
cohortsize = cohortsize,
ntrial = 10000
)
# Dose 3 (25% DLT) selected ~55% of time
# <10% patients at doses above MTD
4. Survival Endpoints with Commensurate Priors (DLBCL)
REBYOTA's endpoint was binary (recurrence yes/no at 8 weeks). For oncology trials with Overall Survival (OS) or Progression-Free Survival (PFS) endpoints, we need time-to-event models. Diffuse Large B-Cell Lymphoma (DLBCL) trials use Bayesian commensurate priors with Weibull models to incorporate external control data.
The Commensurate Prior Framework
For the concurrent control parameter μ_c and the external control parameter μ_e, the commensurate prior links the two through μ_c | μ_e, τ ~ N(μ_e, 1/τ).
The commensurability parameter τ adapts the amount of borrowing automatically: a large estimated τ means full borrowing (external ≈ concurrent), while τ near zero means little or no borrowing (data conflict detected).
R Implementation with psborrow2
library(psborrow2)
library(cmdstanr)
analysis <- create_analysis_obj(
data_matrix = combined_data,
# exponential baseline shown for brevity; psborrow2 also offers a Weibull PH outcome
outcome = outcome_surv_exponential(
time_var = "os_time",
cens_var = "os_event",
baseline_prior = prior_normal(0, 1000)
),
borrowing = borrowing_hierarchical_commensurate(
ext_flag_col = "is_external",
tau_prior = prior_half_cauchy(0, 0.5)
),
treatment = treatment_details(
trt_flag_col = "treatment_arm",
trt_prior = prior_normal(0, 2.5)
)
)
result <- mcmc_sample(analysis,
iter_warmup = 2000, iter_sampling = 4000, chains = 4
)
# Key outputs: HR posterior, P(HR < 1), ESS borrowed
5. Adaptive Randomization (GBM AGILE)
GBM AGILE (Glioblastoma Adaptive Global Innovative Learning Environment) is a phase 2/3 Bayesian adaptive platform trial—distinct from REBYOTA's fixed randomization. Multiple experimental arms are tested against a common control with Bayesian response-adaptive randomization within disease subtypes.
GBM AGILE Results: Regorafenib Arm
| Subtype | Patients | Mean HR | P(HR < 1.0) | Decision |
|---|---|---|---|---|
| Recurrent | 85 | 1.07 | 0.43 | No benefit |
| Newly Diagnosed | 91 | 1.12 | 0.24 | No benefit |
| Overall | 176 | 1.10 | 0.24 | Discontinued |
The Bayesian framework provides direct probability statements—none approached the 0.98 efficacy threshold, making discontinuation straightforward.
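GBM AGILE's allocation algorithm lives in its master protocol, but the core idea of Bayesian response-adaptive randomization is easy to sketch: arms with a higher posterior probability of being best receive a larger share of the next allocation block. The toy example below uses binary responses and Beta posteriors purely for illustration; the trial itself models survival, and the data shown are hypothetical.
# Toy Bayesian response-adaptive randomization (illustration only; hypothetical data)
set.seed(1)
responders <- c(control = 12, armA = 18, armB = 9)    # responders observed so far
enrolled   <- c(control = 40, armA = 40, armB = 40)   # patients enrolled so far
draws <- sapply(seq_along(responders), function(i)
  rbeta(10000, 1 + responders[i], 1 + enrolled[i] - responders[i]))
p_best <- tabulate(max.col(draws), nbins = 3) / nrow(draws)  # P(arm has the highest rate)
alloc  <- sqrt(p_best) / sum(sqrt(p_best))                   # tempered allocation weights
out <- round(rbind(p_best, alloc), 2); colnames(out) <- names(responders); out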
6. Sequential Monitoring
Why Sequential Monitoring? PPoS (Step 6) answers “how likely is this trial to succeed?” but doesn't formally control Type I error. Sequential Monitoring (Step 5) provides pre-specified stopping boundaries that can serve as the primary monitoring framework in the SAP.
PPoS vs. Sequential Monitoring
| Aspect | PPoS (Step 6) | Sequential Monitoring (Step 5) |
|---|---|---|
| When to use | Ad-hoc interim decisions | Pre-planned stopping rules |
| Error control | Not formally controlled | Formal Type I error control |
| Output | Go/no-go probability | Z-score boundaries per look |
| Regulatory role | Supplementary | Can be primary monitoring |
| Prior dependency | Strong — drives PPoS calculation | Moderate — affects boundary shape |
Three Bayesian Sequential Approaches
1. Posterior Probability (PP) — Implemented
Stop for efficacy at look k when P(θ > 0 | data at look k) ≥ γ. Analytical z-score boundaries exist for Normal-Normal conjugate models via the Zhou & Ji (2024) formula: declare efficacy when the standardized statistic Z_k = √n_k · x̄_k / σ exceeds
b_k = z_γ · √(1 + σ² / (n_k · σ₀²)) - (μ₀ · σ) / (σ₀² · √n_k)
where μ₀ and σ₀² are the prior mean and variance, σ² is the data variance, n_k is the cumulative sample size at look k, and z_γ = Φ⁻¹(γ).
2. Posterior Predictive Probability (PPP)
Asks “Given current data, will the final analysis succeed?” More conservative early, permissive late. Resembles stochastically curtailed testing.
3. Decision-Theoretic (DT)
Explicit loss functions for Type I/Type II errors. Optimal boundaries via backward induction. Most flexible but requires loss specification.
REBYOTA Sequential Design Example
Extending the REBYOTA design with Bayesian sequential monitoring. We add 3 planned analyses at 50%, 75%, and 100% of information to allow early stopping for either efficacy or futility.
Calculator Inputs
| Parameter | Value | Rationale |
|---|---|---|
| Endpoint type | Continuous (Normal-Normal) | Difference in success rates |
| N per look | [45, 68, 90] | 50%, 75%, 100% of N=90 per arm |
| Prior mean | 0.0 | Non-informative starting point |
| Prior variance | 1.0 | Moderate prior uncertainty |
| Data variance | 1.0 | Standardized scale |
| Efficacy threshold (γ) | 0.975 | Stop if P(θ > 0 | data) ≥ 97.5% |
| Futility threshold | 0.10 | Stop if P(θ > 0 | data) ≤ 10% |
Expected Outputs
The calculator produces z-score boundaries at each look. Key structural properties:
Efficacy boundaries monotonically decrease with accumulating data
Futility boundaries are always below efficacy boundaries at each look
With vague priors, boundaries converge to the frequentist z-critical (1.96)
Informative priors shift boundaries — a positive prior mean lowers the evidence bar
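A minimal sketch of these properties, assuming the Normal-Normal boundary formula reconstructed above and the inputs from the table (an illustration of the structure, not the calculator's exact output):
# Efficacy z-score boundary at each look for the posterior-probability rule
pp_boundary <- function(n_k, gamma = 0.975, mu0 = 0, tau0_sq = 1, sigma_sq = 1) {
  qnorm(gamma) * sqrt(1 + sigma_sq / (n_k * tau0_sq)) -
    mu0 * sqrt(sigma_sq) / (tau0_sq * sqrt(n_k))
}
looks <- c(45, 68, 90)
round(pp_boundary(looks), 3)             # ~1.98, 1.97, 1.97: shrinking toward 1.96
round(pp_boundary(looks, mu0 = 0.2), 3)  # a positive prior mean lowers each boundary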
7. Predictive Power & Interim Monitoring
PPoS Framework
Predictive Probability of Success (PPoS) answers: “Given current data, what's the probability the trial will succeed if continued?”
REBYOTA Interim Example (Hypothetical)
Setup: Interim at 50% enrollment (N=135). Treatment: 50/90 (55.6%), Placebo: 22/45 (48.9%).
PPoS: 87%
Decision: CONTINUE (20% ≤ PPoS < 90%)
Sensitivity by Prior:
Skeptical (ESS=4): 79%
Moderate (ESS=25): 87%
Enthusiastic (ESS=45): 94%
Conclusion: Robust across priors — continue enrollment.
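The sensitivity analysis can be reproduced with the calculate_ppos helper defined in the Implementation section below, swapping in priors with different effective sample sizes. The prior centers used here are illustrative assumptions, so the percentages will not match the calculator exactly:
# Prior sensitivity for the interim PPoS (treatment arm 50/90, planned N = 180)
priors <- list(
  skeptical    = c(ess = 4,  mean = 0.45),    # assumed centre: the null rate
  moderate     = c(ess = 25, mean = 0.54),    # discounted PUNCH CD2 prior
  enthusiastic = c(ess = 45, mean = 0.556)    # assumed centre: undiscounted PUNCH CD2
)
sapply(priors, function(p)
  calculate_ppos(current_successes = 50, current_n = 90, planned_n = 180,
                 prior_alpha = p[["ess"]] * p[["mean"]],
                 prior_beta  = p[["ess"]] * (1 - p[["mean"]]),
                 null_rate   = 0.45))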
Decision Thresholds by Endpoint Type
| Endpoint | Futility (Stop) | Continue | Efficacy (Stop) |
|---|---|---|---|
| Binary (REBYOTA) | PPoS < 20% | 20–90% | PPoS ≥ 90% + P(δ>0) ≥ 99% |
| Survival (Oncology) | PPoS < 10% | 10–95% | PPoS ≥ 95% + HR CrI excludes 1 |
| Dose-Finding (BOIN) | P(current > MTD) > 95% | — | MTD identified with ≥6 patients |
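For the binary-endpoint row, the combined rule can be written as a small helper. This is a sketch of the table's logic, not a validated stopping rule, and the 0.97 posterior probability in the usage line is a hypothetical value:
# Go/no-go logic for the binary-endpoint row above
interim_decision <- function(ppos, post_prob_benefit) {
  if (ppos < 0.20) "stop for futility"
  else if (ppos >= 0.90 && post_prob_benefit >= 0.99) "stop for efficacy"
  else "continue"
}
interim_decision(ppos = 0.87, post_prob_benefit = 0.97)   # "continue", as in the REBYOTA interim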
8. Implementation: R and Python
Complete R Workflow
# ============================================
# BAYESIAN CLINICAL TRIAL ANALYSIS TOOLKIT
# ============================================
library(BOIN) # Dose-finding
library(psborrow2) # Historical borrowing
library(gsDesign) # Sample size
library(rstan) # General Bayesian
# --- PRIOR ELICITATION ---
elicit_prior <- function(k, n, delta = 0.5) {
alpha <- 1 + delta * k
beta <- 1 + delta * (n - k)
list(
alpha = alpha, beta = beta,
ess = alpha + beta - 2,
mean = alpha / (alpha + beta),
ci = qbeta(c(0.025, 0.975), alpha, beta)
)
}
# --- BAYESIAN SAMPLE SIZE ---
bayesian_sample_size <- function(
prior_alpha, prior_beta,
null_rate, alt_rate,
threshold = 0.975,
target_power = 0.80,
n_sim = 10000
) {
for (n in seq(20, 500, by = 5)) {
null_successes <- rbinom(n_sim, n, null_rate)
null_posts <- pbeta(null_rate,
prior_alpha + null_successes,
prior_beta + n - null_successes,
lower.tail = FALSE)
type1 <- mean(null_posts >= threshold)
alt_successes <- rbinom(n_sim, n, alt_rate)
alt_posts <- pbeta(null_rate,
prior_alpha + alt_successes,
prior_beta + n - alt_successes,
lower.tail = FALSE)
power <- mean(alt_posts >= threshold)
if (type1 <= 0.05 && power >= target_power) {
return(list(n = n, type1 = type1, power = power))
}
}
}
# --- PPoS ---
calculate_ppos <- function(
current_successes, current_n, planned_n,
prior_alpha, prior_beta,
null_rate, threshold = 0.975,
n_sim = 10000
) {
post_alpha <- prior_alpha + current_successes
post_beta <- prior_beta + current_n - current_successes
remaining <- planned_n - current_n
future_p <- rbeta(n_sim, post_alpha, post_beta)
future_successes <- rbinom(n_sim, remaining, future_p)
final_successes <- current_successes + future_successes
final_alpha <- prior_alpha + final_successes
final_beta <- prior_beta + planned_n - final_successes
final_posts <- pbeta(null_rate, final_alpha,
final_beta, lower.tail = FALSE)
mean(final_posts >= threshold)
}
# --- USAGE (REBYOTA) ---
prior <- elicit_prior(k = 25, n = 45, delta = 0.5)
cat("Prior: Beta(", prior$alpha, ",", prior$beta, ")\n")
ss <- bayesian_sample_size(
prior$alpha, prior$beta,
null_rate = 0.45, alt_rate = 0.65
)
cat("Required N:", ss$n, "\n")
ppos <- calculate_ppos(
current_successes = 50, current_n = 90,
planned_n = 180, prior$alpha, prior$beta,
null_rate = 0.45
)
cat("PPoS:", round(ppos * 100, 1), "%\n")
Python Implementation
import numpy as np
from scipy import stats
def elicit_prior(k: int, n: int, delta: float = 0.5):
"""Construct power prior from historical data."""
alpha = 1 + delta * k
beta = 1 + delta * (n - k)
return {
'alpha': alpha, 'beta': beta,
'ess': alpha + beta - 2,
'mean': alpha / (alpha + beta),
'ci': stats.beta.ppf([0.025, 0.975], alpha, beta)
}
def bayesian_sample_size(
prior_alpha, prior_beta,
null_rate, alt_rate,
threshold=0.975, target_power=0.80, n_sim=10000
):
for n in range(20, 501, 5):
null_successes = np.random.binomial(n, null_rate, n_sim)
null_posts = 1 - stats.beta.cdf(
null_rate,
prior_alpha + null_successes,
prior_beta + n - null_successes
)
type1 = np.mean(null_posts >= threshold)
alt_successes = np.random.binomial(n, alt_rate, n_sim)
alt_posts = 1 - stats.beta.cdf(
null_rate,
prior_alpha + alt_successes,
prior_beta + n - alt_successes
)
power = np.mean(alt_posts >= threshold)
if type1 <= 0.05 and power >= target_power:
return {'n': n, 'type1': type1, 'power': power}
def calculate_ppos(
current_successes, current_n, planned_n,
prior_alpha, prior_beta,
null_rate, threshold=0.975, n_sim=10000
):
post_alpha = prior_alpha + current_successes
post_beta = prior_beta + current_n - current_successes
remaining = planned_n - current_n
future_p = np.random.beta(post_alpha, post_beta, n_sim)
future_successes = np.random.binomial(remaining, future_p)
final_successes = current_successes + future_successes
final_alpha = prior_alpha + final_successes
final_beta = prior_beta + planned_n - final_successes
final_posts = 1 - stats.beta.cdf(
null_rate, final_alpha, final_beta
)
return np.mean(final_posts >= threshold)
# --- USAGE ---
prior = elicit_prior(k=25, n=45, delta=0.5)
print(f"Prior: Beta({prior['alpha']}, {prior['beta']})")
ss = bayesian_sample_size(
prior['alpha'], prior['beta'], 0.45, 0.65
)
print(f"Required N: {ss['n']}, Power: {ss['power']:.2%}")
ppos = calculate_ppos(50, 90, 180,
prior['alpha'], prior['beta'], 0.45)
print(f"PPoS: {ppos:.1%}")
9. Regulatory Documentation Checklist
Prior Specification (FDA Guidance Section V.D)
| Requirement | Documentation |
|---|---|
| Source of information | Study ID, publication, data cut date |
| Prior parameters | Distribution family, parameters, ESS |
| Discounting rationale | Why chosen discount factor is appropriate |
| Sensitivity analysis | Results under skeptical/moderate/enthusiastic priors |
Operating Characteristics (Section IV.A)
| Metric | Report | Target |
|---|---|---|
| Type I error | Simulated under null | ≤ α (one-sided) |
| Power | Simulated under alternative | ≥ 80% |
| Sample size | N (or events) | — |
| Decision rule | Explicit threshold | P(benefit) ≥ γ |
SAP-Ready Template
Bayesian Primary Analysis: The primary endpoint will be analyzed using a Bayesian model with a [distribution] prior for [parameter], derived from [source] with [discount]% discounting (ESS = [X]).
Decision Rule: The trial will declare success if P([parameter] > [threshold] | Data) ≥ [γ].
Operating Characteristics: Under the null hypothesis ([H₀ specification]), Type I error is [X]%. Under the alternative ([H₁ specification]), power is [Y]%.
Sensitivity Analysis: Primary results will be accompanied by analyses under skeptical (ESS=[a]), moderate (ESS=[b]), and enthusiastic (ESS=[c]) priors.
Interim Analysis: At [information fractions], PPoS will be computed. Futility stopping is recommended if PPoS < 20%. Early efficacy stopping is recommended if PPoS ≥ 90% AND posterior probability ≥ 99%.
10. Quick Reference Cards
Prior Selection Guide
| Scenario | Prior Type | ESS |
|---|---|---|
| No historical data | Weakly informative | 2–4 |
| Single historical study | Power prior (δ=0.3–0.7) | 10–30 |
| Multiple studies | MAP prior | Pool |
| Strong historical match | Commensurate | Adaptive |
| Regulatory skepticism | Skeptical (at null) | <10 |
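For the multiple-studies row, a MAP prior is typically built with RBesT. A minimal sketch, assuming two hypothetical historical control studies (see the RBesT vignettes for the full workflow, including prior-data conflict checks):
library(RBesT)
# Two hypothetical historical control studies (responders r out of n)
hist_dat <- data.frame(study = c("hist1", "hist2"), r = c(18, 19), n = c(40, 44))
map_mcmc <- gMAP(cbind(r, n - r) ~ 1 | study, data = hist_dat,
                 family = binomial,
                 tau.dist = "HalfNormal", tau.prior = 0.5,   # between-study heterogeneity
                 beta.prior = 2)                             # weakly informative intercept prior
map_prior <- automixfit(map_mcmc)       # parametric Beta-mixture approximation of the MAP prior
ess(map_prior)                          # effective sample size of the MAP prior
robustify(map_prior, weight = 0.2, mean = 0.5)   # add a vague component for robustness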
Package Cheat Sheet
| Task | R | Python |
|---|---|---|
| Prior elicitation | RBesT | scipy.stats |
| Dose-finding | BOIN | pyboin |
| Historical borrowing | psborrow2 | pymc |
| Sample size | gsDesign | statsmodels |
| General Bayesian | rstan, brms | pymc, cmdstanpy |
11. References
Regulatory Guidance
- FDA (2026). Draft Guidance: Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products.
- FDA (2019). Adaptive Design Clinical Trials for Drugs and Biologics: Guidance for Industry.
Case Study Publications
- Khanna S, et al. (2022). Efficacy and Safety of RBX2660 in PUNCH CD3. Drugs, 82(15):1527-1538.
- Yuan Y, et al. (2016). Bayesian Optimal Interval Design. Clinical Cancer Research, 22:4291-4301.
- Wen PY, et al. (2020). GBM AGILE: A Global Adaptive Platform Trial. JCO, 40(16_suppl):TPS2078.
- Zhou T, Ji Y. (2024). On Bayesian Sequential Clinical Trial Designs. NEJSDS, 2(1):136-151.
Software
- Yan F, et al. (2020). BOIN: An R Package for Dose-Finding Trials. Journal of Statistical Software, 94(13):1-32.
- Genentech. psborrow2 Package. genentech.github.io/psborrow2/
Ready to design your trial?
Start with Prior Elicitation and work through the six-step workflow.