Technical White Paper
Zetyra Bayesian Toolkit: A Comprehensive Suite of Validated Bayesian Calculators for Clinical Trial Design
Complete mathematical formulations, validation methodologies, and practical guidance for six integrated Bayesian modules addressing the FDA's evolving regulatory guidance.
1 Executive Summary
FDA Published Bayesian Guidance (January 12, 2026)
The FDA's Draft Guidance on the Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products, coupled with the February 2010 guidance for medical device trials and the ICH E9(R1) estimands framework, has catalyzed widespread adoption of Bayesian methods across therapeutic development.
This toolkit addresses the implementation gap through six integrated calculators with transparent, auditable implementations of conjugate Bayesian models.
The Bayesian Toolkit
Six integrated calculators spanning the full Bayesian trial design workflow:
- Prior Elicitation: Three methods (quantile matching, ESS-based, historical summary) for specifying informative Beta priors
- Bayesian Borrowing: Power Prior, Commensurate Prior, and Meta-Analytic Predictive (MAP) approaches for leveraging historical control data
- Single-Arm Design: Sample size calculations for binary and continuous endpoints using Beta-Binomial and Normal-Normal conjugate models
- Two-Arm Design: Comparative designs supporting superiority and non-inferiority with flexible allocation ratios
- Sequential Monitoring: Predictive probability stopping rules for interim efficacy and futility assessments
- Predictive Power: Bayesian posterior predictive probability for trial completion forecasting, including survival endpoints
Table 1
Bayesian Validation Summary (248 tests)
| Module | Tests | Key Benchmark |
|---|---|---|
| Prior Elicitation | 22 | Quantile PPF accuracy |
| Bayesian Borrowing | 18 | Power prior, DerSimonian-Laird |
| Single-Arm Design | 26 | Operating characteristics |
| Two-Arm Design | 24 | Superiority/NI Type I error |
| Sequential Monitoring | 95 | Zhou & Ji (2024) boundaries |
| Predictive Power/Survival | 63 | MC simulation vs. analytical |
| Total | 248 | 100% pass rate |
2 Bayesian Inference Foundations
The foundation of all Bayesian inference is Bayes' Theorem: the posterior distribution p(θ | D) is proportional to the likelihood p(D | θ) times the prior p(θ). The posterior summarizes all available information about θ and serves as the basis for inference, prediction, and decision-making.
Table 2
Bayesian vs. Frequentist Paradigms
| Aspect | Bayesian | Frequentist |
|---|---|---|
| Parameters | Fixed but uncertain; have distributions | Fixed constants |
| Intervals | Direct probability on parameter | Long-run coverage probability |
| Prior information | Explicitly incorporated | Only through design |
| Sequential analysis | Flexible; same posterior logic | Requires complex corrections |
| Interpretation | P(θ ∈ [L,U] | data) | If repeated ∞ times, 95% of CIs contain θ |
2.1 Operating Characteristics
Despite philosophical differences, Bayesian trial designs are evaluated on frequentist operating characteristics to ensure regulatory acceptability. Type I error (α ≤ 0.05) and power (1 − β ≥ 0.80) are computed via Monte Carlo simulation under H₀ and H₁, as advocated by the FDA.
3 Conjugate Prior Families
A prior is conjugate to a likelihood if the posterior belongs to the same distribution family. Conjugate priors yield closed-form posteriors without numerical integration—enabling sub-second computation, numerical stability, and deterministic reproducibility.
Beta-Binomial (Binary Endpoints)
Prior: Beta(α, β) → Posterior: Beta(α + x, β + n − x). ESS = α + β. Used for response rates, event proportions.
Normal-Normal (Continuous Endpoints)
Prior: N(μ₀, σ₀²) → Posterior: N(μₚₒₛₜ, σₚₒₛₜ²). Precision-weighted combination of prior and data. Used for mean treatment differences.
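The precision-weighted combination can be sketched as follows, assuming the sampling variance of a single observation is known. That assumption, and the function and argument names, are illustrative and not taken from the toolkit.

```python
def normal_normal_update(mu0, sigma0_sq, ybar, sigma_sq, n):
    """Posterior mean and variance for a Normal mean with known sampling variance.

    mu0, sigma0_sq : prior mean and prior variance
    ybar, n        : observed sample mean and sample size
    sigma_sq       : known variance of a single observation (assumption)
    """
    prior_precision = 1.0 / sigma0_sq
    data_precision = n / sigma_sq
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * mu0 + data_precision * ybar)
    return post_mean, post_var


# Hypothetical example: N(0, 1) prior, one observation of 1.0 with unit variance.
post_mean, post_var = normal_normal_update(0.0, 1.0, ybar=1.0, sigma_sq=1.0, n=1)
```

With equal prior and data precision, the posterior mean falls halfway between the prior mean and the observed mean.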
Log-HR Survival (Time-to-Event)
Prior: N(μ₀, τ₀²) on log(HR). Information I = d/4 (Schoenfeld, 1981). Used for survival analysis with hazard ratios.
Gamma-Poisson (Count/Rate Endpoints)
Prior: Gamma(α, β) → Posterior: Gamma(α + x, β + T). Used for rare events and incidence rate data.
4 Prior Elicitation
Prior elicitation translates expert knowledge, historical data, or regulatory guidance into a formal probability distribution. The module implements three complementary methods:
Method 1: Quantile Matching
Ask experts to specify plausible ranges (e.g., “90% confident the response rate is between 15% and 45%”), then fit a Beta distribution whose quantiles match the elicited values via L-BFGS-B optimization.
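A rough sketch of quantile matching for the example above (5th and 95th percentiles at 0.15 and 0.45). To stay within the Python standard library, this sketch swaps in a simple compass (pattern) search in place of L-BFGS-B and a midpoint-rule integral in place of a library Beta CDF. It also restricts the fit to α, β ≥ 1. All names are hypothetical.

```python
import math
from statistics import NormalDist


def beta_cdf(x, a, b, steps=200):
    """Beta(a, b) CDF at x by midpoint-rule integration (intended for a, b >= 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))
    return total * h


def fit_beta_to_quantiles(q_lo, q_hi, p_lo=0.05, p_hi=0.95):
    """Find Beta(a, b) whose p_lo and p_hi quantiles are close to q_lo and q_hi."""

    def loss(mean, ess):
        a, b = mean * ess, (1.0 - mean) * ess
        return (beta_cdf(q_lo, a, b) - p_lo) ** 2 + (beta_cdf(q_hi, a, b) - p_hi) ** 2

    def feasible(mean, ess):
        return 0.0 < mean < 1.0 and mean * ess >= 1.0 and (1.0 - mean) * ess >= 1.0

    # Starting point from a Normal approximation to the elicited interval.
    z = NormalDist()
    mean = (q_lo + q_hi) / 2.0
    sd = (q_hi - q_lo) / (z.inv_cdf(p_hi) - z.inv_cdf(p_lo))
    ess = max(mean * (1.0 - mean) / sd ** 2 - 1.0, 1.0 / min(mean, 1.0 - mean))

    best, step = loss(mean, ess), 0.5
    for _ in range(500):  # hard cap on iterations
        if step < 1e-4:
            break
        moves = [(mean + 0.1 * step, ess), (mean - 0.1 * step, ess),
                 (mean, ess * math.exp(step)), (mean, ess * math.exp(-step))]
        improved = False
        for m, s in moves:
            if feasible(m, s):
                cand = loss(m, s)
                if cand < best:
                    mean, ess, best, improved = m, s, cand, True
        if not improved:
            step /= 2.0  # shrink the search pattern when no move helps
    return mean * ess, (1.0 - mean) * ess


a, b = fit_beta_to_quantiles(0.15, 0.45)
```

Searching over the prior mean and ESS rather than over α and β directly keeps the two directions roughly independent: the mean shifts both quantiles together, while the ESS moves them toward or away from each other.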
Method 2: ESS-Based Priors
Directly specify the prior mean and equivalent sample size (ESS), giving Beta(mean × ESS, (1 − mean) × ESS). With a prior mean of 0.5, ESS = 2 gives the uniform Beta(1,1) prior; ESS = 20–50 is moderately informative; at ESS = 100+ the prior typically dominates the data.
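Because ESS = α + β and the prior mean is α/(α + β), the two inputs determine the Beta parameters directly. A minimal sketch, with a hypothetical function name:

```python
def beta_from_mean_ess(mean, ess):
    """Convert a prior mean and equivalent sample size into Beta(alpha, beta)."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if ess <= 0.0:
        raise ValueError("ess must be positive")
    return mean * ess, (1.0 - mean) * ess


uniform_prior = beta_from_mean_ess(0.5, 2.0)    # the uniform Beta(1, 1)
moderate_prior = beta_from_mean_ess(0.3, 20.0)  # hypothetical: mean 30%, ESS 20
```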
Method 3: Historical Summary Priors
Use summary statistics from historical studies via the Power Prior framework with a discount factor δ ∈ [0, 1] controlling the degree of borrowing.
Regulatory compliance: Per FDA guidance, priors must be prospectively specified, transparent, sensitivity-analyzed, and justified by clinical/historical evidence. The toolkit generates audit trails for all elicitation steps.
5 Bayesian Borrowing from Historical Data
Bayesian borrowing enables efficient use of historical control data to reduce current trial sample size. The module implements three principled approaches with conflict diagnostics.
Power Prior
Discounts the historical likelihood by raising it to a power δ. Effective Prior = Beta(α₀ + δ·x₀, β₀ + δ·(n₀ − x₀)). Discount factor δ = 1.0 (full borrowing) to δ = 0.0 (no borrowing).
Commensurate Prior
Uses a commensurability parameter τ that adapts borrowing strength. The effective discount δ_eff = τ/(1 + τ), providing flexible shrinkage based on observed heterogeneity.
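The two discounting rules above can be sketched as follows. The function names and the historical counts are illustrative assumptions. Feeding the commensurate discount into the power-prior formula is a simplification made for this sketch only, not a description of how the toolkit combines them.

```python
def power_prior_beta(alpha0, beta0, x0, n0, delta):
    """Effective Beta prior after discounting x0/n0 historical responders by delta."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return alpha0 + delta * x0, beta0 + delta * (n0 - x0)


def commensurate_discount(tau):
    """Effective discount tau / (1 + tau) for a commensurability parameter tau >= 0."""
    if tau < 0.0:
        raise ValueError("tau must be non-negative")
    return tau / (1.0 + tau)


# Hypothetical historical control data: 30 responders out of 100 patients.
half_borrowing = power_prior_beta(1.0, 1.0, x0=30, n0=100, delta=0.5)
delta_eff = commensurate_discount(1.0)  # tau = 1 corresponds to a discount of 0.5
adaptive = power_prior_beta(1.0, 1.0, x0=30, n0=100, delta=delta_eff)
```

With δ = 0.5 the 100 historical patients contribute the equivalent of 50, so the effective prior ESS is 52 rather than 102.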
Meta-Analytic Predictive (MAP) Prior
Pools multiple historical studies using random-effects meta-analysis (DerSimonian-Laird), with between-study heterogeneity quantified by the I² statistic. An optional robust mixture component protects against outlying historical studies.
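A sketch of DerSimonian-Laird pooling, assuming each historical study supplies an estimate and its variance on a scale where a Normal approximation is reasonable (for example a logit or log rate). Using the pooled variance plus τ² as the predictive variance for a new study is a common Normal approximation adopted here as an assumption. The function name and return keys are hypothetical.

```python
def dersimonian_laird(estimates, variances):
    """Random-effects pooling with the DerSimonian-Laird moment estimator of tau^2."""
    k = len(estimates)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the moment estimator of between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau_sq = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    i_sq = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0

    # Random-effects weights include the between-study variance.
    w_star = [1.0 / (v + tau_sq) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    pooled_var = 1.0 / sum(w_star)
    return {
        "pooled": pooled,
        "pooled_var": pooled_var,
        "tau_sq": tau_sq,
        "i_sq": i_sq,
        "predictive_var": pooled_var + tau_sq,  # assumed Normal MAP approximation
    }


# Hypothetical inputs: two studies that disagree strongly.
result = dersimonian_laird([0.0, 1.0], [0.01, 0.01])
```

The predictive variance is never smaller than the pooled variance, so greater heterogeneity automatically yields a less informative prior for the new trial.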
Conflict Diagnostics
When interim data arrive, the toolkit assesses prior-data conflict via a tail probability p. Conflict is classified as none (p > 0.10), moderate (0.01 < p ≤ 0.10; reduce borrowing by 50%), or severe (p ≤ 0.01; minimal borrowing recommended).
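A sketch of the classification step for a binary endpoint. The text does not specify how the tail probability is computed, so the two-sided prior-predictive (Beta-Binomial) tail used here is an assumption, as are the function names. Only the thresholds 0.10 and 0.01 come from the description above.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(x, n, a, b):
    """Prior-predictive probability of x responders among n under a Beta(a, b) prior."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_choose + log_beta(a + x, b + n - x) - log_beta(a, b))


def conflict_tail_probability(x, n, a, b):
    """Two-sided prior-predictive tail probability (an assumed definition)."""
    lower = sum(beta_binomial_pmf(k, n, a, b) for k in range(0, x + 1))
    upper = sum(beta_binomial_pmf(k, n, a, b) for k in range(x, n + 1))
    return min(1.0, 2.0 * min(lower, upper))


def classify_conflict(p):
    """Map a tail probability to the three conflict categories."""
    if p > 0.10:
        return "none"
    if p > 0.01:
        return "moderate"
    return "severe"


# Hypothetical interim data checked against a Beta(15, 10) prior.
consistent = classify_conflict(conflict_tail_probability(24, 40, 15, 10))
surprising = classify_conflict(conflict_tail_probability(2, 40, 15, 10))
```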
6 Single-Arm Bayesian Sample Size Design
Single-arm designs are common in rare diseases, oncology (Phase II), and adaptive trials. The module determines n to achieve target operating characteristics (power, Type I error) for both binary and continuous endpoints.
Design Setup (Binary)
- Prior: Beta(α, β) from Prior Elicitation or Borrowing
- Decision rule: Declare success if P(θ > θ₀ | data) ≥ γ
- Operating characteristics: Type I error ≤ 0.05, Power ≥ 0.80 via 10,000+ Monte Carlo simulations
- Prior impact: Prior weight = ESS/(ESS + n), with interpretive guidelines (<10% minimal, 10–25% moderate, >50% requires strong justification)
Worked Example: Rare Disease
Rare genetic disorder, null = 0%, alternative = 20%. Prior: Beta(1, 5) (skeptical, ESS = 6). Decision threshold γ = 0.95.
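The design setup can be illustrated with the example's Beta(1, 5) prior, γ = 0.95, and 20% alternative. Two choices in this sketch are assumptions made purely for illustration: a null rate of 5% and a sample size of n = 30. The sketch also replaces Monte Carlo simulation with exact enumeration over all possible outcomes, which is feasible for a single-arm binary design, and it relies on an identity that holds only for integer Beta parameters. All function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(0, k + 1))


def beta_sf_int(theta0, a, b):
    """P(theta > theta0) for Beta(a, b) with positive integer a and b.

    Uses the identity 1 - I_x(a, b) = P(Binomial(a + b - 1, x) <= a - 1).
    """
    return binom_cdf(a - 1, a + b - 1, theta0)


def success_probability(n, true_rate, theta0, gamma, a=1, b=5):
    """Probability of declaring success when the true response rate is true_rate."""
    total = 0.0
    for x in range(n + 1):
        # Apply the decision rule P(theta > theta0 | data) >= gamma to each outcome.
        if beta_sf_int(theta0, a + x, b + n - x) >= gamma:
            total += comb(n, x) * true_rate ** x * (1 - true_rate) ** (n - x)
    return total


n = 30                                       # hypothetical sample size
type_one = success_probability(n, 0.05, theta0=0.05, gamma=0.95)
power = success_probability(n, 0.20, theta0=0.05, gamma=0.95)
prior_weight = 6 / (6 + n)                   # ESS / (ESS + n) for the Beta(1, 5) prior
```

Scanning n upward and recording the first value where both targets are met gives a sample size. For non-integer priors, or for designs with interim looks, simulation as described in the text is the more general route.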
7 Two-Arm Bayesian Comparative Design
Supports both superiority (P(θ_T > θ_C | data) ≥ γ) and non-inferiority (P(θ_T > θ_C − δ_NI | data) ≥ γ) trials with flexible allocation ratios, for both binary and continuous endpoints.
Non-Inferiority Example
New antibiotic vs. standard of care. Control prior: Beta(15, 10) (ESS=25 from historical data). NI margin: −5%. Allocation: 1:1.
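A sketch of how the superiority and non-inferiority posterior probabilities might be estimated by Monte Carlo for binary endpoints. The Beta(15, 10) control prior and the 5-point margin come from the example above. The Beta(1, 1) treatment prior, the observed counts, and the function name are hypothetical.

```python
import random


def prob_treatment_better(post_t, post_c, margin=0.0, draws=20000, seed=0):
    """Estimate P(theta_T > theta_C - margin | data) from independent Beta posteriors."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(draws):
        theta_t = rng.betavariate(*post_t)
        theta_c = rng.betavariate(*post_c)
        if theta_t > theta_c - margin:
            hits += 1
    return hits / draws


# Hypothetical data: control 60/100 on a Beta(15, 10) prior,
# treatment 62/100 on a Beta(1, 1) prior (assumed, not from the example).
post_control = (15 + 60, 10 + 40)
post_treatment = (1 + 62, 1 + 38)

p_superiority = prob_treatment_better(post_treatment, post_control)
p_noninferiority = prob_treatment_better(post_treatment, post_control, margin=0.05)
```

The margin is entered as a positive number here and subtracted from the control rate, matching the δ_NI form of the decision rule.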
8 Bayesian Sequential Monitoring
Sequential designs allow interim analyses with stopping rules for efficacy or futility, potentially reducing average sample size while maintaining error control. Implements predictive probability stopping rules aligned with Zhou & Ji (2024).
Stopping Rules
- Efficacy: Stop if P(θ_T > θ_C | data) ≥ γ_eff
- Futility: Stop if predictive power ≤ γ_fut
- Continue: Otherwise, proceed to next look
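The three rules above amount to a short decision function. The default thresholds below are placeholders chosen for illustration. In practice they would come from the boundary calibration described in this section.

```python
def interim_decision(post_prob_efficacy, predictive_power,
                     gamma_eff=0.99, gamma_fut=0.10):
    """Apply the efficacy, futility, and continue rules at one interim look.

    gamma_eff and gamma_fut are hypothetical defaults, not toolkit values.
    """
    if post_prob_efficacy >= gamma_eff:
        return "stop_efficacy"
    if predictive_power <= gamma_fut:
        return "stop_futility"
    return "continue"


# Hypothetical interim summaries at three different looks.
decisions = [
    interim_decision(0.995, 0.90),
    interim_decision(0.40, 0.05),
    interim_decision(0.85, 0.60),
]
```

Efficacy is checked first, so a look that somehow satisfied both conditions would stop for efficacy. That ordering is a design choice of this sketch.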
Continuous Endpoints (Analytical)
Z-score boundaries from Zhou & Ji (2024) Normal-Normal conjugate model. All boundary values validated against their Table 3.
Binary/Survival (Simulation-Based)
Boundaries determined via Monte Carlo: simulate trials, record posterior probabilities at each interim look, identify thresholds corresponding to target error rates.
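Advantage over frequentist GSD: Bayesian sequential monitoring applies the same posterior inference at each look, naturally accommodates changing sample sizes, and provides coherent probability statements that simplify regulatory communication. The posterior itself needs no adjustment for repeated looks, although the stopping thresholds are still calibrated by simulation to meet the target frequentist error rates (Section 2.1).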
9 Bayesian Predictive Power: Extended
Extends predictive power methodology to survival (time-to-event) endpoints using the log-HR Normal conjugate prior framework. At interim, computes P(declare efficacy at final | current data) by simulating remaining enrollment and events.
Survival Endpoints (Log-HR Conjugate Prior)
With d observed events, the log-HR estimate is approximately Normal with information I = d/4 (Schoenfeld, 1981). The posterior is computed by conjugacy, then predictive power is obtained via Monte Carlo simulation of remaining events to final database lock.
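A sketch of the predictive power calculation for a log-HR endpoint under the Normal approximation with information d/4. The N(0, 1) default prior, the success rule P(HR < 1 | final data) ≥ γ, the event counts, and all function names are illustrative assumptions. Future data are simulated directly as a log-HR estimate from the remaining events rather than as individual patients.

```python
import random
from statistics import NormalDist


def loghr_posterior(mu0, tau0_sq, loghr_hat, d):
    """Conjugate Normal posterior for log(HR) given an estimate based on d events."""
    prior_prec = 1.0 / tau0_sq
    data_prec = d / 4.0  # Schoenfeld information
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * mu0 + data_prec * loghr_hat)
    return post_mean, post_var


def predictive_power(loghr_hat, d_obs, d_total, mu0=0.0, tau0_sq=1.0,
                     gamma=0.95, sims=5000, seed=0):
    """Monte Carlo estimate of P(success at the final analysis | interim data)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    post_mean, post_var = loghr_posterior(mu0, tau0_sq, loghr_hat, d_obs)
    d_rem = d_total - d_obs  # must be positive
    successes = 0
    for _ in range(sims):
        # Draw a plausible true effect, then a future estimate from remaining events.
        true_loghr = rng.gauss(post_mean, post_var ** 0.5)
        future_hat = rng.gauss(true_loghr, (4.0 / d_rem) ** 0.5)
        # Information-weighted pooling of the interim and future estimates.
        pooled_hat = (d_obs * loghr_hat + d_rem * future_hat) / d_total
        final_mean, final_var = loghr_posterior(mu0, tau0_sq, pooled_hat, d_total)
        if std_normal.cdf(-final_mean / final_var ** 0.5) >= gamma:
            successes += 1
    return successes / sims


# Hypothetical interim looks at 100 of 200 planned events.
pp_favorable = predictive_power(-0.5, d_obs=100, d_total=200)
pp_unfavorable = predictive_power(0.2, d_obs=100, d_total=200)
```

Drawing the true effect from the interim posterior before simulating future events is what distinguishes predictive power from conditional power, which fixes the effect at a single assumed value.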
This module integrates with the Sequential Monitoring calculator for boundary computation at each interim look.
10 Integrated Bayesian Design Workflow
Six-Stage Workflow
1. Prior Specification: Engage experts, elicit prior via quantile matching/ESS/historical data, run prior predictive checks
2. Historical Borrowing (Optional): Apply power prior or MAP methods, pre-specify discount factor, assess prior-data conflict
3. Design: Specify decision rule, determine null/alternative, run sample size calculator
4. Operating Characteristics: Simulate 10,000+ trials under H₀ and H₁, conduct sensitivity analyses
5. Monitoring Plan: Specify interim look schedules, compute efficacy/futility boundaries, validate stopping rules
6. Statistical Analysis Plan: Finalize prior, decision rules, pre-specify sensitivity analyses, pre-register trial
Table 3
Design Recommendations by Therapeutic Area
| Area | Design Type | Prior | Borrowing |
|---|---|---|---|
| Rare Disease | Single-arm | Informative from external data | Yes (if controlled) |
| Oncology (Phase II) | Single-arm | Weak to moderate | Light borrowing |
| Cardiovascular | Two-arm | Weak (skeptical) | Yes if historical controls |
| Infectious Disease | Two-arm | Weak (fast accrual) | Rare (ethical) |
| Device | Single-arm/Two-arm | Informative from predicate | Yes (common) |
11 Regulatory Context
FDA Guidance: January 2026 Draft
The FDA's Draft Guidance on Bayesian Methods for Medical Products provides explicit recommendations:
- Prospective Specification: All priors, decision rules, and stopping boundaries pre-specified before enrollment
- Prior Justification: Justified by historical data, expert elicitation, or regulatory precedent
- Operating Characteristics: Frequentist Type I error and power documented via simulation
- Sensitivity Analysis: Results robust across reasonable alternative priors
- Historical Borrowing: Document exchangeability, quantify ESS, assess prior-data conflict
- Transparency: All calculations, simulation code, and results auditable
The toolkit also aligns with the FDA's February 2010 guidance on Bayesian methods in medical devices and the ICH E9(R1) estimands framework.
12 Validation Framework
The toolkit has been validated against 248 Bayesian-specific tests, supplemented by 251 additional tests covering non-Bayesian frequentist methods (total 499 tests in the validation repository).
Table 4
Bayesian Test Coverage Summary
| Module | Tests | Validation Rate | Key Benchmark |
|---|---|---|---|
| Prior Elicitation | 22 | 100% | Quantile PPF accuracy |
| Bayesian Borrowing | 18 | 100% | Power prior, DerSimonian-Laird |
| Single-Arm Design | 26 | 100% | Operating characteristics |
| Two-Arm Design | 24 | 100% | Superiority/NI Type I error |
| Sequential Monitoring | 95 | 100% | Zhou & Ji 2024 boundaries |
| Predictive Power | 63 | 100% | MC simulation vs. analytical |
| Total Bayesian | 248 | 100% | — |
Validation Repository
Complete validation results available at:
github.com/evidenceinthewild/zetyra-validation
13 Case Studies
Rare Disease Gene Therapy (Single-Arm)
Single-arm, open-label trial for a neurodevelopmental disorder. Prior elicited via quantile matching: Beta(5.6, 13.1), ESS = 18.7. Sample size: n = 24, providing 80% power to detect a 30% response rate against a 10% null.
Outcome: 8/24 respond (33%), P(θ > 0.10) = 0.983 > 0.95. Efficacy declared. FDA approved expanded access pending Phase III.
Cardiovascular with Historical Borrowing
New anticoagulant NI trial vs. warfarin. MAP prior from 12 historical trials (pooled rate 2.8/100 PY, I² = 35%). Required 3,200 patient-years with δ_NI = 1.0/100 PY.
Outcome: P(new drug rate < control + 1.0 | data) = 0.982 > 0.975. Non-inferiority declared. Approved with post-approval safety surveillance.
Oncology Adaptive with Sequential Monitoring (Survival)
Phase II adaptive for second-line lung cancer. Two-stage: n=50 (Stage 1) with futility boundary, then n=150 (Stage 2). Log-HR prior: N(0, 1).
Outcome: P(HR < 1.0) = 0.78 < 0.95 at final analysis. Primary endpoint not met, but favorable trend prompted Phase III with refined enrichment—demonstrating value of pre-planned adaptivity.
References
Core Methodology
1. Berry SM, Carlin BP, Lee JJ, Müller P. Bayesian Adaptive Methods for Clinical Trials. CRC Press, 2010.
2. Zhou T, Ji Y. On Bayesian Sequential Clinical Trial Designs. NEJSDS 2024; 2(1):136–151.
3. Ibrahim JG, Chen MH. Power prior distributions for regression models. Statistical Science 2000; 15(1):46–60.
4. Schmidli H, et al. Robust meta-analytic-predictive priors in clinical trials. Biometrics 2014; 70(4):1023–1032.
5. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials. John Wiley, 2004.
Statistical Foundations
6. Gelman A, et al. Bayesian Data Analysis, 3rd ed. CRC Press, 2013.
7. Schoenfeld D. Asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 1981; 68(1):316–319.
8. Cox DR. Regression models and life-tables. JRSS-B 1972; 34(2):187–220.
Regulatory Guidance
9. FDA. Bayesian Statistical Methods in Medical Product Development: Draft Guidance. January 12, 2026.
10. FDA. Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. February 2010.
11. FDA. Adaptive Design Clinical Trials for Drugs and Biologics. November 2019.
12. ICH E9(R1). Estimands and Sensitivity Analysis. November 2019.
Full 13-reference bibliography available in PDF version, including O'Hagan et al. (2006) on expert elicitation and Kass & Wasserman (1996) on formal prior selection.
Suggested Citation
Qian, Lu. (2026). Zetyra Bayesian Toolkit: A Comprehensive Suite of Validated Bayesian Calculators for Clinical Trial Design (Version 1.0). Evidence in the Wild. https://zetyra.com/bayesian-whitepaper
Companion to: Zetyra Technical White Paper v2.0 (DOI: 10.5281/zenodo.18879839)