
Bayesian Clinical Trial Design: From Prior to Approval

A comprehensive end-to-end tutorial for Zetyra's Bayesian Toolkit, built around case studies of FDA-approved products and FDA-reviewed trial designs. Walk through all six toolkit steps, from eliciting a prior to monitoring an ongoing trial, using real data from REBYOTA, oncology dose-finding, and adaptive platform trials.

| Section | Case Study | Endpoint | Key Method |
|---|---|---|---|
| Prior Elicitation & Borrowing | REBYOTA (PUNCH CD3) | Binary | Power prior, historical borrowing |
| Dose-Finding | Oncology Phase I | Toxicity | BOIN design |
| Survival Endpoints | DLBCL Lymphoma | Time-to-event | Commensurate prior, Weibull model |
| Adaptive Randomization | GBM AGILE | Survival | Response-adaptive allocation |
| Sequential Monitoring | REBYOTA (extended) | Binary | Posterior probability boundaries |
| Interim Monitoring | All | Various | PPoS, predictive probability |

1. The Regulatory Landscape

FDA's Bayesian Framework

The FDA's January 2026 draft guidance on Bayesian methodology marks a watershed moment for clinical trial design. The FDA now explicitly endorses Bayesian methods for:

Governing adaptation rules for interim analyses

Informing design elements (e.g., dose selection) for subsequent trials

Supporting primary inference in registration trials

Augmenting concurrent controls with external or historical data

Key Regulatory Programs

| Program | Purpose | Contact Point |
|---|---|---|
| Complex Innovative Trial Design (CID) | Adaptive/Bayesian design meetings | CDER/CBER |
| Center for Clinical Trial Innovation (C3TI) | Non-adaptive Bayesian demonstrations | C3TI portal |
| Rare Disease Program | Small-sample Bayesian approaches | Office of Orphan Products |

Zetyra Toolkit Overview

| Step | Calculator | Use Case | Key Outputs |
|---|---|---|---|
| 1 | Prior Elicitation | Historical data → informative prior | Beta parameters, ESS, prior predictive |
| 2 | Bayesian Borrowing | Multi-study synthesis, MAP priors | Discount comparison, conflict diagnostics |
| 3 | Sample Size (Single-Arm) | Power/assurance calculations | N, operating characteristics, power curves |
| 4 | Two-Arm Design | RCT with Bayesian borrowing | Frequentist comparison, efficiency gain |
| 5 | Sequential Monitoring | Interim stopping rules | Stopping boundaries, ASN curves |
| 6 | Predictive Power (PPoS) | Interim monitoring | Go/no-go thresholds, sensitivity |

2. REBYOTA: Binary Endpoints with Historical Borrowing

Background

REBYOTA (fecal microbiota, live-jslm) was approved November 2022 for preventing C. difficile recurrence—the first FDA-approved microbiota-based live biotherapeutic. The pivotal PUNCH CD3 trial used Bayesian hierarchical borrowing from the Phase 2b PUNCH CD2 trial.

Why Bayesian? Widespread availability of fecal microbiota transplantation (FMT) under FDA enforcement discretion made placebo-controlled enrollment increasingly difficult. The FDA recommended formal Bayesian borrowing to make a placebo-controlled trial feasible.

| Trial | Phase | Design | Key Results |
|---|---|---|---|
| PUNCH CD2 | 2b | RCT (2 active dose arms + placebo) | 56.8% success (1-dose) vs 43.2% (placebo) |
| PUNCH CD3 | 3 | RCT with Bayesian borrowing | 70.6% vs 57.5%, P(superiority) = 99.1% |

Step 1: Prior Elicitation

Goal: Build a prior for treatment success rate using PUNCH CD2 data.

Historical Data (PUNCH CD2, single-dose ITT): Treatment: 25/45 successes (55.6%), Placebo: 19/44 successes (43.2%).

Calculator Inputs

| Parameter | Value | Rationale |
|---|---|---|
| Method | Historical Data | Phase 2b results available |
| Events (k) | 25 | Successes in CD2 treatment arm |
| Total (n) | 45 | Patients in CD2 treatment arm |
| Discount factor (δ) | 0.5 | Skeptical borrowing: Phase 2b to 3 uncertainty |

Why δ = 0.5?

1. Phase 2b often shows inflated effects (smaller N, selected sites)

2. PUNCH CD3 had broader eligibility (≥1 vs ≥2 recurrences)

3. Regulatory conservatism—better to be pleasantly surprised

Calculator Output

Prior Distribution: Beta(13.5, 11.5)

Mean: 54.0%    Median: 54.0%

95% CrI: [36.6%, 70.7%]

ESS: 25 (roughly half of the undiscounted historical information)

Interpretation: Expects ~54% success with substantial uncertainty, reflecting discounted Phase 2b evidence.
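
To reproduce this step outside the calculator, the sketch below shows the basic power-prior arithmetic, assuming a uniform Beta(1, 1) initial prior; the calculator's exact output can differ slightly depending on the initial prior it applies.

R
# Power-prior sketch (assumes a uniform Beta(1, 1) initial prior):
# discount the CD2 treatment arm (25/45) by delta = 0.5.
delta <- 0.5
k <- 25; n <- 45
alpha <- 1 + delta * k          # 13.5
beta  <- 1 + delta * (n - k)    # 11
c(mean  = alpha / (alpha + beta),
  lower = qbeta(0.025, alpha, beta),
  upper = qbeta(0.975, alpha, beta))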

Step 2: Sample Size with Bayesian Borrowing

Clinical context: Null rate (θ₀) = 45%, Alternative rate (θ₁) = 65%, Decision threshold: P(θ > θ₀ | Data) ≥ 97.5%.

| Parameter | Value |
|---|---|
| Prior | Beta(13.5, 11.5) |
| Null rate | 0.45 |
| Alternative rate | 0.65 |
| Decision threshold | 0.975 |
| Target power | 0.80 |

Recommended Sample Size: N = 45

Type I Error: 0.032 (≤ 0.05 ✔)

Power: 0.81 (≥ 0.80 ✔)

Decision Rule: Declare success if P(θ > 0.45 | Data) ≥ 0.975. At N = 45, this requires ≥ 27 successes (60.0%).
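
A quick conjugate check of this decision rule, sketched with the Beta(13.5, 11.5) prior and the standard Beta-Binomial update:

R
# Smallest success count x at N = 45 with P(theta > 0.45 | x) >= 0.975,
# using the conjugate posterior Beta(13.5 + x, 11.5 + 45 - x).
n <- 45
post_prob <- sapply(0:n, function(x)
  pbeta(0.45, 13.5 + x, 11.5 + n - x, lower.tail = FALSE))
min(which(post_prob >= 0.975)) - 1   # returns 27, matching the rule above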

Power Curve

| True Success Rate | Power |
|---|---|
| 45% (null) | 3.2% |
| 50% | 12% |
| 55% | 30% |
| 60% | 56% |
| 65% (alternative) | 81% |
| 70% | 95% |
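
Because the decision rule reduces to "at least 27 successes out of 45", each entry in this power curve is just a binomial tail probability; a minimal sketch:

R
# Power at each true rate = P(at least 27 successes in 45 trials).
true_rates <- c(0.45, 0.50, 0.55, 0.60, 0.65, 0.70)
round(1 - pbinom(26, 45, true_rates), 2)
# roughly 0.03 0.12 0.30 0.56 0.81 0.95, consistent with the table above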

Step 3: Two-Arm Comparison (Borrowing vs. Traditional)

Actual PUNCH CD3 Design: 2:1 randomization (treatment:placebo), Treatment arm N = 180, Placebo arm N = 87, Total N = 267.

| Design | Treatment N | Control N | Total | Reduction |
|---|---|---|---|---|
| Frequentist RCT (no borrowing) | 220 | 110 | 330 | – |
| Bayesian + Borrowing (actual) | 180 | 87 | 267 | 19% |

Key Insight: Bayesian borrowing saved 63 patients—critical for a difficult-to-enroll population.
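
As a rough illustration (not the hierarchical model actually used in the submission), the two-arm posterior comparison can be sketched with conjugate Beta updates, applying the δ = 0.5 discount to both CD2 arms and plugging in approximate PUNCH CD3 counts reconstructed from the reported percentages:

R
# Illustrative two-arm comparison with discounted CD2 priors on both arms.
# Counts are rounded reconstructions (127/180 ~ 70.6%, 50/87 ~ 57.5%), not source data.
set.seed(1)
trt <- rbeta(1e5, 13.5 + 127, 11.0 + (180 - 127))   # CD2 prior: 25/45, delta = 0.5
pbo <- rbeta(1e5, 10.5 +  50, 13.5 + (87  -  50))   # CD2 prior: 19/44, delta = 0.5
mean(trt > pbo)   # P(treatment > placebo | data); lands near the reported 99.1%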

3. Oncology Dose-Finding with BOIN

REBYOTA used a fixed dose. For oncology Phase I trials seeking the Maximum Tolerated Dose (MTD), the Bayesian Optimal Interval (BOIN) design is a widely used model-assisted approach that has received the FDA's fit-for-purpose designation. BOIN relies on pre-calculated decision boundaries, so no real-time Bayesian computation is needed during the trial, while maintaining operating characteristics comparable to fully model-based designs.

BOIN Boundaries (Target DLT = 30%)

| Patients Treated | Escalate if DLTs ≤ | De-escalate if DLTs ≥ |
|---|---|---|
| 3 | 0 | 2 |
| 6 | 1 | 3 |
| 9 | 2 | 4 |
| 12 | 2 | 5 |
| 15 | 3 | 6 |

R Implementation

R
library(BOIN)

# Design parameters
target_dlt <- 0.30    # Target DLT rate (MTD definition)
ncohort <- 10         # Maximum cohorts
cohortsize <- 3       # Patients per cohort
n_doses <- 5          # Number of dose levels (get.oc() infers this from length(p.true))

# Step 1: Get decision boundaries
boundaries <- get.boundary(
  target = target_dlt,
  ncohort = ncohort,
  cohortsize = cohortsize
)

# Step 2: Simulate operating characteristics
true_dlt <- c(0.05, 0.10, 0.25, 0.35, 0.50)
oc <- get.oc(
  target = target_dlt,
  p.true = true_dlt,
  ncohort = ncohort,
  cohortsize = cohortsize,
  ntrial = 10000
)
# Dose 3 (25% DLT) selected ~55% of time
# <10% patients at doses above MTD
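
Continuing from the block above, BOIN selects the MTD at the end of the trial by isotonic regression over the observed per-dose DLT rates. A sketch with hypothetical final counts (select.mtd is part of the BOIN package):

R
# Hypothetical end-of-trial data: patients and DLTs per dose level.
npts <- c(3, 3, 9, 12, 3)
ntox <- c(0, 0, 2, 5, 2)
mtd  <- select.mtd(target = target_dlt, npts = npts, ntox = ntox)
summary(mtd)   # reports the selected MTD and isotonic DLT estimates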

4. Survival Endpoints with Commensurate Priors (DLBCL)

REBYOTA's endpoint was binary (recurrence yes/no at 8 weeks). For oncology trials with Overall Survival (OS) or Progression-Free Survival (PFS) endpoints, we need time-to-event models. Diffuse Large B-Cell Lymphoma (DLBCL) trials use Bayesian commensurate priors with Weibull models to incorporate external control data.

The Commensurate Prior Framework

For the concurrent control parameter θ_c and external control parameter θ_e:

$$\theta_c \mid \theta_e, \tau \sim N(\theta_e, \tau^{-1})$$

The commensurability parameter τ automatically adapts the degree of borrowing: τ → ∞ means full borrowing (external ≈ concurrent), while τ → 0 means no borrowing (data conflict detected).

R Implementation with psborrow2

R
library(psborrow2)
library(cmdstanr)

analysis <- create_analysis_obj(
  # combined_data: one row per patient (concurrent + external), containing the
  # time, event, external-flag, and treatment-flag columns referenced below
  data_matrix = combined_data,
  # exponential baseline shown here; psborrow2 also provides a Weibull PH
  # outcome (outcome_surv_weibull_ph) for the Weibull model discussed above
  outcome = outcome_surv_exponential(
    time_var = "os_time",
    cens_var = "os_event",
    baseline_prior = prior_normal(0, 1000)
  ),
  borrowing = borrowing_hierarchical_commensurate(
    ext_flag_col = "is_external",
    tau_prior = prior_half_cauchy(0, 0.5)
  ),
  treatment = treatment_details(
    trt_flag_col = "treatment_arm",
    trt_prior = prior_normal(0, 2.5)
  )
)

result <- mcmc_sample(analysis,
  iter_warmup = 2000, iter_sampling = 4000, chains = 4
)
# Key outputs: HR posterior, P(HR < 1), ESS borrowed

5. Adaptive Randomization (GBM AGILE)

GBM AGILE (Glioblastoma Adaptive Global Innovative Learning Environment) is a phase 2/3 Bayesian adaptive platform trial—distinct from REBYOTA's fixed randomization. Multiple experimental arms are tested against a common control with Bayesian response-adaptive randomization within disease subtypes.

GBM AGILE Results: Regorafenib Arm

| Subtype | Patients | Mean HR | P(HR < 1.0) | Decision |
|---|---|---|---|---|
| Recurrent | 85 | 1.07 | 0.43 | No benefit |
| Newly Diagnosed | 91 | 1.12 | 0.24 | No benefit |
| Overall | 176 | 1.10 | 0.24 | Discontinued |

The Bayesian framework provides direct probability statements—none approached the 0.98 efficacy threshold, making discontinuation straightforward.
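
To make the response-adaptive mechanism concrete, here is a deliberately simplified sketch (binary responses, hypothetical interim counts, fixed control allocation); GBM AGILE's actual algorithm operates on longitudinal survival models within subtypes and differs in detail:

R
# Simplified Bayesian response-adaptive randomization sketch (illustration only).
set.seed(42)
resp <- c(ctrl = 12, armA = 18, armB = 9)    # hypothetical responders
n    <- c(ctrl = 40, armA = 40, armB = 40)   # hypothetical patients enrolled
draws <- sapply(seq_along(n), function(i) rbeta(1e5, 1 + resp[i], 1 + n[i] - resp[i]))
p_beats_ctrl <- colMeans(draws[, -1] > draws[, 1])          # P(arm > control)
alloc <- c(0.25, 0.75 * p_beats_ctrl / sum(p_beats_ctrl))   # keep 25% on control
round(setNames(alloc, names(n)), 2)                         # next allocation weights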

6. Sequential Monitoring

Why Sequential Monitoring? PPoS (Step 6) answers “how likely is this trial to succeed?” but doesn't formally control Type I error. Sequential Monitoring (Step 5) provides pre-specified stopping boundaries that can serve as the primary monitoring framework in the SAP.

PPoS vs. Sequential Monitoring

| Aspect | PPoS (Step 6) | Sequential Monitoring (Step 5) |
|---|---|---|
| When to use | Ad-hoc interim decisions | Pre-planned stopping rules |
| Error control | Not formally controlled | Formal Type I error control |
| Output | Go/no-go probability | Z-score boundaries per look |
| Regulatory role | Supplementary | Can be primary monitoring |
| Prior dependency | Strong (drives the PPoS calculation) | Moderate (affects boundary shape) |

Three Bayesian Sequential Approaches

1. Posterior Probability (PP), implemented in the toolkit's Sequential Monitoring calculator

Stop when P(θ > 0 | data) ≥ γ. Analytical z-score boundaries exist for Normal-Normal conjugate models via the Zhou & Ji (2024) formula:

$$c_k = \Phi^{-1}(\gamma)\sqrt{1 + \frac{\sigma^2}{n_k \nu^2}} - \frac{\mu\,\sigma}{\sqrt{n_k}\,\nu^2}$$

where μ and ν² are the prior mean and variance, σ² is the data variance, and n_k is the cumulative sample size at look k.

2. Posterior Predictive Probability (PPP)

Asks “Given current data, will the final analysis succeed?” More conservative early, permissive late. Resembles stochastically curtailed testing.

3. Decision-Theoretic (DT)

Explicit loss functions for Type I/Type II errors. Optimal boundaries via backward induction. Most flexible but requires loss specification.

REBYOTA Sequential Design Example

We extend the REBYOTA design with Bayesian sequential monitoring, adding three planned analyses at 50%, 75%, and 100% of the information fraction to allow early stopping for either efficacy or futility.

Calculator Inputs

| Parameter | Value | Rationale |
|---|---|---|
| Endpoint type | Continuous (Normal-Normal) | Difference in success rates |
| N per look | [45, 68, 90] | 50%, 75%, 100% of N=90 per arm |
| Prior mean | 0.0 | Non-informative starting point |
| Prior variance | 1.0 | Moderate prior uncertainty |
| Data variance | 1.0 | Standardized scale |
| Efficacy threshold (γ) | 0.975 | Stop if P(θ > 0 \| data) ≥ 97.5% |
| Futility threshold | 0.10 | Stop if P(θ > 0 \| data) ≤ 10% |

Expected Outputs

The calculator produces z-score boundaries at each look. Key structural properties:

Efficacy boundaries monotonically decrease with accumulating data

Futility boundaries are always below efficacy boundaries at each look

With vague priors, boundaries converge to the frequentist z-critical (1.96)

Informative priors shift boundaries — a positive prior mean lowers the evidence bar
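
Plugging the calculator inputs above into the Zhou & Ji boundary formula gives the efficacy boundaries directly (a sketch; futility boundaries follow analogously with the 0.10 threshold):

R
# Efficacy z-score boundaries for mu = 0, nu^2 = 1, sigma^2 = 1, gamma = 0.975.
pp_boundary <- function(n_k, gamma = 0.975, mu = 0, nu2 = 1, sigma2 = 1) {
  qnorm(gamma) * sqrt(1 + sigma2 / (n_k * nu2)) - mu * sqrt(sigma2) / (sqrt(n_k) * nu2)
}
round(pp_boundary(c(45, 68, 90)), 3)
# 1.982 1.974 1.971 -- decreasing toward the frequentist 1.96 as data accumulate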

7. Predictive Power & Interim Monitoring

PPoS Framework

Predictive Probability of Success (PPoS) answers: “Given current data, what's the probability the trial will succeed if continued?”

REBYOTA Interim Example (Hypothetical)

Setup: Interim at 50% enrollment (N=135). Treatment: 50/90 (55.6%), Placebo: 22/45 (48.9%).

PPoS: 87%

Decision: CONTINUE (20% ≤ PPoS < 90%)

Sensitivity by Prior:

  Skeptical (ESS=4): 79%

  Moderate (ESS=25): 87%

  Enthusiastic (ESS=45): 94%

Conclusion: Robust across priors — continue enrollment.
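
One way to run this kind of prior-sensitivity check yourself is with the single-arm calculate_ppos() helper defined in Section 8 (treatment arm only, against the 45% null); the two-arm calculator values quoted above will differ numerically, but the qualitative pattern (more prior weight centered at 54% pushes PPoS up) is the same:

R
# Prior-sensitivity sketch using the calculate_ppos() helper from Section 8.
# All three priors are centered at 54% but carry different effective sample sizes.
priors <- list(skeptical    = c(a = 2.16, b = 1.84),    # ESS ~ 4
               moderate     = c(a = 13.5, b = 11.5),    # ESS ~ 25
               enthusiastic = c(a = 24.3, b = 20.7))    # ESS ~ 45
sapply(priors, function(p)
  calculate_ppos(current_successes = 50, current_n = 90, planned_n = 180,
                 prior_alpha = p["a"], prior_beta = p["b"], null_rate = 0.45))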

Decision Thresholds by Endpoint Type

| Endpoint | Futility (Stop) | Continue | Efficacy (Stop) |
|---|---|---|---|
| Binary (REBYOTA) | PPoS < 20% | 20–90% | PPoS ≥ 90% + P(δ>0) ≥ 99% |
| Survival (Oncology) | PPoS < 10% | 10–95% | PPoS ≥ 95% + HR CrI excludes 1 |
| Dose-Finding (BOIN) | P(current dose > MTD) > 95% | – | MTD identified with ≥6 patients |

8. Implementation: R and Python

Complete R Workflow

R
# ============================================
# BAYESIAN CLINICAL TRIAL ANALYSIS TOOLKIT
# ============================================
library(BOIN)        # Dose-finding
library(psborrow2)   # Historical borrowing
library(gsDesign)    # Sample size
library(rstan)       # General Bayesian

# --- PRIOR ELICITATION ---
elicit_prior <- function(k, n, delta = 0.5) {
  alpha <- 1 + delta * k
  beta  <- 1 + delta * (n - k)
  list(
    alpha = alpha, beta = beta,
    ess = alpha + beta - 2,
    mean = alpha / (alpha + beta),
    ci = qbeta(c(0.025, 0.975), alpha, beta)
  )
}

# --- BAYESIAN SAMPLE SIZE ---
bayesian_sample_size <- function(
    prior_alpha, prior_beta,
    null_rate, alt_rate,
    threshold = 0.975,
    target_power = 0.80,
    n_sim = 10000
) {
  for (n in seq(20, 500, by = 5)) {
    null_successes <- rbinom(n_sim, n, null_rate)
    null_posts <- pbeta(null_rate,
      prior_alpha + null_successes,
      prior_beta + n - null_successes,
      lower.tail = FALSE)
    type1 <- mean(null_posts >= threshold)

    alt_successes <- rbinom(n_sim, n, alt_rate)
    alt_posts <- pbeta(null_rate,
      prior_alpha + alt_successes,
      prior_beta + n - alt_successes,
      lower.tail = FALSE)
    power <- mean(alt_posts >= threshold)

    if (type1 <= 0.05 && power >= target_power) {
      return(list(n = n, type1 = type1, power = power))
    }
  }
  warning("No sample size in 20-500 met both the Type I error and power targets")
  invisible(NULL)
}

# --- PPoS ---
calculate_ppos <- function(
    current_successes, current_n, planned_n,
    prior_alpha, prior_beta,
    null_rate, threshold = 0.975,
    n_sim = 10000
) {
  post_alpha <- prior_alpha + current_successes
  post_beta  <- prior_beta + current_n - current_successes
  remaining  <- planned_n - current_n

  future_p <- rbeta(n_sim, post_alpha, post_beta)
  future_successes <- rbinom(n_sim, remaining, future_p)

  final_successes <- current_successes + future_successes
  final_alpha <- prior_alpha + final_successes
  final_beta  <- prior_beta + planned_n - final_successes

  final_posts <- pbeta(null_rate, final_alpha,
    final_beta, lower.tail = FALSE)
  mean(final_posts >= threshold)
}

# --- USAGE (REBYOTA) ---
prior <- elicit_prior(k = 25, n = 45, delta = 0.5)
cat("Prior: Beta(", prior$alpha, ",", prior$beta, ")\n")

ss <- bayesian_sample_size(
  prior$alpha, prior$beta,
  null_rate = 0.45, alt_rate = 0.65
)
cat("Required N:", ss$n, "\n")

ppos <- calculate_ppos(
  current_successes = 50, current_n = 90,
  planned_n = 180,
  prior_alpha = prior$alpha, prior_beta = prior$beta,
  null_rate = 0.45
)
cat("PPoS:", round(ppos * 100, 1), "%\n")

Python Implementation

Python
import numpy as np
from scipy import stats

def elicit_prior(k: int, n: int, delta: float = 0.5):
    """Construct power prior from historical data."""
    alpha = 1 + delta * k
    beta  = 1 + delta * (n - k)
    return {
        'alpha': alpha, 'beta': beta,
        'ess': alpha + beta - 2,
        'mean': alpha / (alpha + beta),
        'ci': stats.beta.ppf([0.025, 0.975], alpha, beta)
    }

def bayesian_sample_size(
    prior_alpha, prior_beta,
    null_rate, alt_rate,
    threshold=0.975, target_power=0.80, n_sim=10000
):
    for n in range(20, 501, 5):
        null_successes = np.random.binomial(n, null_rate, n_sim)
        null_posts = 1 - stats.beta.cdf(
            null_rate,
            prior_alpha + null_successes,
            prior_beta + n - null_successes
        )
        type1 = np.mean(null_posts >= threshold)

        alt_successes = np.random.binomial(n, alt_rate, n_sim)
        alt_posts = 1 - stats.beta.cdf(
            null_rate,
            prior_alpha + alt_successes,
            prior_beta + n - alt_successes
        )
        power = np.mean(alt_posts >= threshold)

        if type1 <= 0.05 and power >= target_power:
            return {'n': n, 'type1': type1, 'power': power}
    raise ValueError("No sample size in 20-500 met both the Type I error and power targets")

def calculate_ppos(
    current_successes, current_n, planned_n,
    prior_alpha, prior_beta,
    null_rate, threshold=0.975, n_sim=10000
):
    post_alpha = prior_alpha + current_successes
    post_beta  = prior_beta + current_n - current_successes
    remaining  = planned_n - current_n

    future_p = np.random.beta(post_alpha, post_beta, n_sim)
    future_successes = np.random.binomial(remaining, future_p)

    final_successes = current_successes + future_successes
    final_alpha = prior_alpha + final_successes
    final_beta  = prior_beta + planned_n - final_successes

    final_posts = 1 - stats.beta.cdf(
        null_rate, final_alpha, final_beta
    )
    return np.mean(final_posts >= threshold)

# --- USAGE ---
prior = elicit_prior(k=25, n=45, delta=0.5)
print(f"Prior: Beta({prior['alpha']}, {prior['beta']})")

ss = bayesian_sample_size(
    prior['alpha'], prior['beta'], 0.45, 0.65
)
print(f"Required N: {ss['n']}, Power: {ss['power']:.2%}")

ppos = calculate_ppos(50, 90, 180,
    prior['alpha'], prior['beta'], 0.45)
print(f"PPoS: {ppos:.1%}")

9. Regulatory Documentation Checklist

Prior Specification (FDA Guidance Section V.D)

| Requirement | Documentation |
|---|---|
| Source of information | Study ID, publication, data cut date |
| Prior parameters | Distribution family, parameters, ESS |
| Discounting rationale | Why the chosen discount factor is appropriate |
| Sensitivity analysis | Results under skeptical/moderate/enthusiastic priors |

Operating Characteristics (Section IV.A)

| Metric | Report | Target |
|---|---|---|
| Type I error | Simulated under null | ≤ α (one-sided) |
| Power | Simulated under alternative | ≥ 80% |
| Sample size | N (or events) | – |
| Decision rule | Explicit threshold | P(benefit) ≥ γ |

SAP-Ready Template

Bayesian Primary Analysis: The primary endpoint will be analyzed using a Bayesian model with a [distribution] prior for [parameter], derived from [source] with [discount]% discounting (ESS = [X]).

Decision Rule: The trial will declare success if P([parameter] > [threshold] | Data) ≥ [γ].

Operating Characteristics: Under the null hypothesis ([H₀ specification]), Type I error is [X]%. Under the alternative ([H₁ specification]), power is [Y]%.

Sensitivity Analysis: Primary results will be accompanied by analyses under skeptical (ESS=[a]), moderate (ESS=[b]), and enthusiastic (ESS=[c]) priors.

Interim Analysis: At [information fractions], PPoS will be computed. Futility stopping is recommended if PPoS < 20%. Early efficacy stopping is recommended if PPoS ≥ 90% AND posterior probability ≥ 99%.

10. Quick Reference Cards

Prior Selection Guide

| Scenario | Prior Type | ESS |
|---|---|---|
| No historical data | Weakly informative | 2–4 |
| Single historical study | Power prior (δ = 0.3–0.7) | 10–30 |
| Multiple studies | MAP prior | Pooled |
| Strong historical match | Commensurate | Adaptive |
| Regulatory skepticism | Skeptical (centered at null) | <10 |

Package Cheat Sheet

| Task | R | Python |
|---|---|---|
| Prior elicitation | RBesT | scipy.stats |
| Dose-finding | BOIN | pyboin |
| Historical borrowing | psborrow2 | pymc |
| Sample size | gsDesign | statsmodels |
| General Bayesian | rstan, brms | pymc, cmdstanpy |
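
As a pointer for the RBesT entry above, a meta-analytic-predictive (MAP) prior from several historical studies might be derived roughly as follows (a sketch; the dataset and response counts are hypothetical, and the half-normal scale for τ should be chosen for the study context):

R
library(RBesT)
# Hypothetical historical studies with responders r out of n patients.
hist_dat <- data.frame(study = c("Study1", "Study2", "Study3"),
                       r = c(18, 21, 25), n = c(35, 40, 45))
map_fit <- gMAP(cbind(r, n - r) ~ 1 | study, data = hist_dat,
                family = binomial, tau.dist = "HalfNormal", tau.prior = 1)
map_mix <- automixfit(map_fit)                           # parametric mixture approximation
map_rob <- robustify(map_mix, weight = 0.2, mean = 0.5)  # add a vague robust component
ess(map_rob)                                             # effective sample size of the prior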

11. References

Regulatory Guidance

  1. FDA (2026). Draft Guidance: Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products.
  2. FDA (2019). Adaptive Design Clinical Trials for Drugs and Biologics: Guidance for Industry.

Case Study Publications

  1. Khanna S, et al. (2022). Efficacy and Safety of RBX2660 in PUNCH CD3. Drugs, 82(15):1527-1538.
  2. Yuan Y, et al. (2016). Bayesian Optimal Interval Design. Clinical Cancer Research, 22:4291-4301.
  3. Wen PY, et al. (2020). GBM AGILE: A Global Adaptive Platform Trial. JCO, 40(16_suppl):TPS2078.
  4. Zhou T, Ji Y. (2024). On Bayesian Sequential Clinical Trial Designs. NEJSDS, 2(1):136-151.

Software

  1. Yan F, et al. (2020). BOIN: An R Package for Dose-Finding Trials. Journal of Statistical Software, 94(13):1-32.
  2. Genentech. psborrow2 Package. genentech.github.io/psborrow2/

Ready to design your trial?

Start with Prior Elicitation and work through the six-step workflow.