Technical White Paper

Zetyra Bayesian Toolkit: A Comprehensive Suite of Validated Bayesian Calculators for Clinical Trial Design

Complete mathematical formulations, validation methodologies, and practical guidance for six integrated Bayesian modules addressing the FDA's evolving regulatory guidance.

Version 1.0 | March 2026 | Lu (Maggie) Qian, MS

Key Findings

  • 248 Bayesian tests: all passing against analytic solutions and Zhou & Ji (2024)
  • 6 modules: integrated suite from prior elicitation to predictive power
  • 5,568 lines: production code with conjugate closed-form posteriors
  • FDA 2026 aligned: January 12, 2026 Draft Guidance on Bayesian methods

1 Executive Summary

FDA Published Bayesian Guidance (January 12, 2026)

The FDA's Draft Guidance on the Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products, coupled with the February 2010 guidance for medical device trials and the ICH E9(R1) estimands framework, has catalyzed widespread adoption of Bayesian methods across therapeutic development.

This toolkit addresses the implementation gap through six integrated calculators with transparent, auditable implementations of conjugate Bayesian models.

The Bayesian Toolkit

Six integrated calculators spanning the full Bayesian trial design workflow:

  • Prior Elicitation: Three data-driven methods (quantile matching, ESS-based, historical summary) for specifying informative Beta priors
  • Bayesian Borrowing: Power Prior, Commensurate Prior, and Meta-Analytic Predictive (MAP) approaches for leveraging historical control data
  • Single-Arm Design: Sample size calculations for binary and continuous endpoints using Beta-Binomial and Normal-Normal conjugate models
  • Two-Arm Design: Comparative designs supporting superiority and non-inferiority with flexible allocation ratios
  • Sequential Monitoring: Predictive probability stopping rules for interim efficacy and futility assessments
  • Predictive Power: Bayesian posterior predictive probability for trial completion forecasting, including survival endpoints

Table 1

Bayesian Validation Summary (248 tests)

Module | Tests | Key Benchmark
Prior Elicitation | 22 | Quantile PPF accuracy
Bayesian Borrowing | 18 | Power prior, DerSimonian-Laird
Single-Arm Design | 26 | Operating characteristics
Two-Arm Design | 24 | Superiority/NI Type I error
Sequential Monitoring | 95 | Zhou & Ji (2024) boundaries
Predictive Power/Survival | 63 | MC simulation vs. analytical
Total | 248 | 100% pass rate

2 Bayesian Inference Foundations

The foundation of all Bayesian inference is Bayes' Theorem: the posterior distribution p(θ | D) is proportional to the likelihood p(D | θ) times the prior p(θ). The posterior summarizes all available information about θ and serves as the basis for inference, prediction, and decision-making.

Table 2

Bayesian vs. Frequentist Paradigms

Aspect | Bayesian | Frequentist
Parameters | Fixed but uncertain; have distributions | Fixed constants
Intervals | Direct probability on parameter | Long-run coverage probability
Prior information | Explicitly incorporated | Only through design
Sequential analysis | Flexible; same posterior logic | Requires complex corrections
Interpretation | P(θ ∈ [L,U] | data) | If repeated ∞ times, 95% of CIs contain θ

2.1 Operating Characteristics

Despite philosophical differences, Bayesian trial designs are evaluated on frequentist operating characteristics to ensure regulatory acceptability. Type I error (α ≤ 0.05) and power (1 − β ≥ 0.80) are computed via Monte Carlo simulation under H₀ and H₁, as advocated by the FDA.

3 Conjugate Prior Families

A prior is conjugate to a likelihood if the posterior belongs to the same distribution family. Conjugate priors yield closed-form posteriors without numerical integration, which enables sub-second computation, numerical stability, and deterministic reproducibility.

Beta-Binomial (Binary Endpoints)

Prior: Beta(α, β) → Posterior: Beta(α + x, β + n − x). ESS = α + β. Used for response rates, event proportions.
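The Beta-Binomial update above is simple arithmetic on the two shape parameters. A minimal sketch follows; the function names and the data (8 responders among 34 patients) are hypothetical and chosen only for illustration.

```python
def beta_binomial_update(alpha, beta, x, n):
    """Return the posterior Beta parameters after x responders in n patients."""
    if not 0 <= x <= n:
        raise ValueError("x must lie between 0 and n")
    return alpha + x, beta + n - x


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: skeptical Beta(1, 5) prior, 8 responders out of 34.
a_post, b_post = beta_binomial_update(1, 5, 8, 34)   # Beta(9, 31)
posterior_mean = beta_mean(a_post, b_post)           # 9 / 40 = 0.225
posterior_ess = a_post + b_post                      # prior ESS 6 + n 34 = 40
```

The posterior ESS is the prior ESS plus the sample size, which is why ESS serves as a direct measure of how much the prior contributes.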

Normal-Normal (Continuous Endpoints)

Prior: N(μ₀, σ₀²) → Posterior: N(μₚₒₛₜ, σₚₒₛₜ²). Precision-weighted combination of prior and data. Used for mean treatment differences.
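The precision-weighted combination can be written out explicitly. The sketch below assumes a known sampling standard deviation σ and a sample mean ȳ from n observations; the function name and numeric inputs are illustrative, not taken from the toolkit.

```python
def normal_normal_update(mu0, sigma0, ybar, sigma, n):
    """Posterior mean and SD for a Normal mean with known sampling SD sigma."""
    prior_precision = 1.0 / sigma0 ** 2       # information carried by the prior
    data_precision = n / sigma ** 2           # information carried by the data
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * mu0 + data_precision * ybar)
    return post_mean, post_var ** 0.5


# Hypothetical inputs: prior N(0, 2^2), 16 observations with mean 1.5, sigma = 4.
mu_post, sd_post = normal_normal_update(mu0=0.0, sigma0=2.0, ybar=1.5, sigma=4.0, n=16)
# Prior precision 0.25, data precision 1.0, so mu_post = 1.2 and sd_post**2 = 0.8.
```

The posterior mean always lies between the prior mean and the sample mean, pulled toward whichever source has the higher precision.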

Log-HR Survival (Time-to-Event)

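Prior: N(μ₀, τ₀²) on log(HR). The observed log-HR estimate has approximate information I = d/4, where d is the number of observed events (Schoenfeld, 1981), assuming 1:1 allocation. Used for survival analysis with hazard ratios.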

Gamma-Poisson (Count/Rate Endpoints)

Prior: Gamma(α, β) → Posterior: Gamma(α + x, β + T). Used for rare events and incidence rate data.
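For the Gamma-Poisson pair, a brief sketch of the update is shown below. It assumes the shape/rate parameterization, with x the observed event count and T the total exposure; all numbers are hypothetical.

```python
def gamma_poisson_update(alpha, beta, x, exposure):
    """Posterior Gamma(shape, rate) after x events over the given exposure."""
    if x < 0 or exposure < 0:
        raise ValueError("x and exposure must be non-negative")
    return alpha + x, beta + exposure


# Hypothetical inputs: prior Gamma(2, 100), then 3 events in 250 patient-years.
shape_post, rate_post = gamma_poisson_update(alpha=2.0, beta=100.0, x=3, exposure=250.0)
rate_mean = shape_post / rate_post            # posterior mean event rate, 5 / 350
rate_var = shape_post / rate_post ** 2        # posterior variance of the rate
```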

4 Prior Elicitation

Prior elicitation translates expert knowledge, historical data, or regulatory guidance into a formal probability distribution. The module implements three complementary methods:

Method 1: Quantile Matching

Ask experts to specify plausible ranges (e.g., “90% confident the response rate is between 15% and 45%”), then fit a Beta distribution whose quantiles match the elicited values via L-BFGS-B optimization.
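A minimal sketch of quantile matching is given below, assuming NumPy and SciPy are available. It interprets "90% confident between 15% and 45%" as the 5th and 95th percentiles (an assumption about how the statement is elicited) and minimizes squared quantile error with L-BFGS-B. The function name, starting values, and bounds are illustrative choices, not the toolkit's own.

```python
import numpy as np
from scipy import optimize, stats


def fit_beta_to_quantiles(probs, values, start=(2.0, 2.0)):
    """Find Beta(a, b) whose quantiles at `probs` are closest to `values`."""
    probs = np.asarray(probs, dtype=float)
    values = np.asarray(values, dtype=float)

    def loss(params):
        a, b = params
        return float(np.sum((stats.beta.ppf(probs, a, b) - values) ** 2))

    result = optimize.minimize(
        loss,
        x0=np.asarray(start, dtype=float),
        method="L-BFGS-B",
        bounds=[(0.05, 500.0), (0.05, 500.0)],  # keep both shapes positive
    )
    return float(result.x[0]), float(result.x[1])


# Elicited statement: 5th percentile = 0.15, 95th percentile = 0.45.
alpha, beta = fit_beta_to_quantiles([0.05, 0.95], [0.15, 0.45])
fitted = stats.beta.ppf([0.05, 0.95], alpha, beta)   # should be near 0.15 and 0.45
ess = alpha + beta                                   # implied prior ESS
```

Reporting the fitted quantiles back to the expert is a useful check, since two quantiles determine the two Beta parameters but not whether the shape matches the expert's belief.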

Method 2: ESS-Based Priors

Directly specify the prior mean and equivalent sample size (ESS). ESS = 2 with a prior mean of 0.5 gives the uniform, uninformative Beta(1,1); ESS = 20–50 is moderately informative; ESS = 100+ means the prior dominates.
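Under the Beta parameterization, a prior mean m and an ESS map to α = m·ESS and β = (1 − m)·ESS. A short sketch with a hypothetical function name:

```python
def beta_from_mean_ess(mean, ess):
    """Convert a prior mean and equivalent sample size into Beta(alpha, beta)."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if ess <= 0:
        raise ValueError("ess must be positive")
    return mean * ess, (1.0 - mean) * ess


flat_prior = beta_from_mean_ess(0.5, 2)        # Beta(1, 1), the uniform prior
moderate_prior = beta_from_mean_ess(0.3, 20)   # approximately Beta(6, 14)
```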

Method 3: Historical Summary Priors

Use summary statistics from historical studies via the Power Prior framework with a discount factor δ ∈ [0, 1] controlling the degree of borrowing.

Regulatory compliance: Per FDA guidance, priors must be prospectively specified, transparent, sensitivity-analyzed, and justified by clinical/historical evidence. The toolkit generates audit trails for all elicitation steps.

5 Bayesian Borrowing from Historical Data

Bayesian borrowing enables efficient use of historical control data to reduce current trial sample size. The module implements three principled approaches with conflict diagnostics.

Power Prior

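Discounts the historical likelihood by raising it to a power δ. With an initial prior Beta(α₀, β₀) and x₀ responders among n₀ historical patients, the effective prior is Beta(α₀ + δ·x₀, β₀ + δ·(n₀ − x₀)). The discount factor ranges from δ = 0.0 (no borrowing) to δ = 1.0 (full borrowing).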

Commensurate Prior

Uses a commensurability parameter τ that adapts borrowing strength. The effective discount δ_eff = τ/(1 + τ), providing flexible shrinkage based on observed heterogeneity.
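The two discounting rules above can be sketched together. Plugging the commensurate discount τ/(1 + τ) into the power-prior formula is an assumption made here for illustration; the function names and the historical data (30 responders among 100 patients) are hypothetical.

```python
def power_prior(alpha0, beta0, x0, n0, delta):
    """Effective Beta prior after discounting historical data by delta in [0, 1]."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return alpha0 + delta * x0, beta0 + delta * (n0 - x0)


def commensurate_discount(tau):
    """Effective discount tau / (1 + tau) for a commensurability parameter tau >= 0."""
    if tau < 0:
        raise ValueError("tau must be non-negative")
    return tau / (1.0 + tau)


# Hypothetical historical control data: 30 responders among 100 patients.
half_borrow = power_prior(1, 1, 30, 100, delta=0.5)      # Beta(16, 36)
no_borrow = power_prior(1, 1, 30, 100, delta=0.0)        # initial Beta(1, 1)
delta_eff = commensurate_discount(3.0)                   # 0.75
adaptive = power_prior(1, 1, 30, 100, delta=delta_eff)   # Beta(23.5, 53.5)
```

The ESS of the effective prior grows linearly in δ, from the initial prior's ESS at δ = 0 to that ESS plus n₀ at δ = 1.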

Meta-Analytic Predictive (MAP) Prior

Pools multiple historical studies using a random-effects meta-analysis (DerSimonian-Laird) and quantifies between-study heterogeneity with the I² statistic. An optional robust mixture component provides outlier protection.
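A compact sketch of the DerSimonian-Laird estimator is shown below. It takes per-study estimates on a suitable scale (for example log-odds) with their sampling variances. The `predictive_sd` field is a simple Normal approximation to the spread for a new study, added as an assumption for illustration, and the robust mixture component is not shown.

```python
from math import sqrt


def dersimonian_laird(estimates, variances):
    """Random-effects pooling with the DerSimonian-Laird moment estimator."""
    k = len(estimates)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0    # share of variation from heterogeneity
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se, "tau2": tau2, "i2": i2,
            "predictive_sd": sqrt(se ** 2 + tau2)}


# Two hypothetical studies with estimates 0.0 and 1.0, each with variance 0.25.
result = dersimonian_laird([0.0, 1.0], [0.25, 0.25])
# Q = 2, so tau2 = 0.25, I² = 0.5, pooled = 0.5, se = 0.5.
```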

Conflict Diagnostics

When interim data arrive, the toolkit assesses prior-data conflict via a tail probability p. Conflict is classified as none (p > 0.10), moderate (0.01 < p ≤ 0.10; reduce borrowing by 50%), or severe (p ≤ 0.01; minimal borrowing recommended).
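The classification thresholds can be sketched as below. The source does not specify how the tail probability is computed; this example assumes a two-sided tail under the Beta-Binomial prior predictive distribution, which is one common choice. All function names and data are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """Prior predictive probability of k responders in n under a Beta(a, b) prior."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def prior_predictive_tail(x, n, a, b):
    """Two-sided tail probability of observing x (assumed definition)."""
    pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
    lower = sum(pmf[: x + 1])
    upper = sum(pmf[x:])
    return min(1.0, 2.0 * min(lower, upper))


def classify_conflict(p):
    """Map a tail probability to the conflict categories stated in the text."""
    if p > 0.10:
        return "none"
    if p > 0.01:
        return "moderate"   # text suggests reducing borrowing by 50%
    return "severe"         # text recommends minimal borrowing


# Hypothetical prior Beta(15, 10) and an interim look with 20 patients.
p_consistent = prior_predictive_tail(12, 20, 15, 10)   # data near the prior mean
p_conflict = prior_predictive_tail(1, 20, 15, 10)      # data far below the prior mean
```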

6 Single-Arm Bayesian Sample Size Design

Single-arm designs are common in rare diseases, oncology (Phase II), and adaptive trials. The module determines n to achieve target operating characteristics (power, Type I error) for both binary and continuous endpoints.

Design Setup (Binary)

  • Prior: Beta(α, β) from Prior Elicitation or Borrowing
  • Decision rule: Declare success if P(θ > θ₀ | data) ≥ γ
  • Operating characteristics: Type I error ≤ 0.05, Power ≥ 0.80 via 10,000+ Monte Carlo simulations
  • Prior impact: Prior weight = ESS/(ESS + n), with interpretive guidelines (<10% minimal, 10–25% moderate, >50% requires strong justification)
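The design setup above can be sketched for a binary endpoint. For a fixed n, this example evaluates the decision rule at every possible responder count and sums binomial probabilities exactly, which targets the same quantities a Monte Carlo simulation would estimate. The Beta CDF uses a standard continued-fraction evaluation so the block needs only the standard library. The design inputs (n = 40, θ₀ = 0.10, θ₁ = 0.30, Beta(1, 1) prior) are hypothetical and do not reproduce any worked example in this paper.

```python
from math import comb, exp, lgamma, log


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function, i.e. P(theta <= x) for Beta(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def success_probability(n, theta_true, theta0, alpha, beta, gamma):
    """Probability of declaring success when the true response rate is theta_true."""
    total = 0.0
    for x in range(n + 1):
        post_prob = 1.0 - beta_cdf(theta0, alpha + x, beta + n - x)  # P(theta > theta0 | x)
        if post_prob >= gamma:
            total += comb(n, x) * theta_true ** x * (1.0 - theta_true) ** (n - x)
    return total


# Hypothetical design inputs.
n, theta0, theta1 = 40, 0.10, 0.30
type1 = success_probability(n, theta0, theta0, 1, 1, gamma=0.95)
power = success_probability(n, theta1, theta0, 1, 1, gamma=0.95)
prior_weight = 2 / (2 + n)   # ESS / (ESS + n) for a Beta(1, 1) prior
```

A sample size search would repeat this calculation over increasing n until both the Type I error and power targets are met.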

Worked Example: Rare Disease

Rare genetic disorder, null = 0%, alternative = 20%. Prior: Beta(1, 5) (skeptical, ESS = 6). Decision threshold γ = 0.95.

  • Recommended sample size: n = 34
  • Type I error (≤ 0.05): 0.048
  • Power (≥ 0.80): 0.802

7 Two-Arm Bayesian Comparative Design

Supports both superiority (P(θ_T > θ_C | data) ≥ γ) and non-inferiority (P(θ_T > θ_C − δ_NI | data) ≥ γ) trials with flexible allocation ratios, for both binary and continuous endpoints.
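For binary endpoints, both decision probabilities can be approximated by drawing from the two independent Beta posteriors. This is a sketch under assumed inputs: the function name, priors, response counts, and the 0.05 margin are hypothetical, and posterior sampling is used here as one convenient way to evaluate the probability.

```python
import random


def prob_treatment_better(x_t, n_t, x_c, n_c, prior_t=(1, 1), prior_c=(1, 1),
                          margin=0.0, draws=20000, seed=2026):
    """Estimate P(theta_T > theta_C - margin | data) from posterior draws.

    margin = 0 gives the superiority probability; margin > 0 gives the
    non-inferiority probability for that margin.
    """
    rng = random.Random(seed)
    a_t, b_t = prior_t[0] + x_t, prior_t[1] + n_t - x_t
    a_c, b_c = prior_c[0] + x_c, prior_c[1] + n_c - x_c
    hits = sum(
        rng.betavariate(a_t, b_t) > rng.betavariate(a_c, b_c) - margin
        for _ in range(draws)
    )
    return hits / draws


# Hypothetical data: 84/120 responders on treatment, 80/120 on control,
# with an informative Beta(15, 10) control prior.
p_sup = prob_treatment_better(84, 120, 80, 120, prior_c=(15, 10))
p_ni = prob_treatment_better(84, 120, 80, 120, prior_c=(15, 10), margin=0.05)
declare_ni = p_ni >= 0.95   # decision threshold gamma = 0.95 (assumed)
```

Because the same seed drives both calls, the non-inferiority probability is never smaller than the superiority probability for the same data.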

Non-Inferiority Example

New antibiotic vs. standard of care. Control prior: Beta(15, 10) (ESS=25 from historical data). NI margin: −5%. Allocation: 1:1.

  • Sample size: 120/arm (240 total subjects)
  • Type I error: 0.048
  • Power: 0.804

8 Bayesian Sequential Monitoring

Sequential designs allow interim analyses with stopping rules for efficacy or futility, potentially reducing average sample size while maintaining error control. The module implements predictive probability stopping rules aligned with Zhou & Ji (2024).

Stopping Rules

  • Efficacy: Stop if P(θ_T > θ_C | data) ≥ γ_eff
  • Futility: Stop if predictive power ≤ γ_fut
  • Continue: Otherwise, proceed to next look
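The three rules above amount to a small decision function applied at each look. In this sketch the threshold values 0.99 and 0.10 are placeholders rather than recommended boundaries, and the function name is hypothetical.

```python
def interim_decision(posterior_prob, predictive_power, gamma_eff=0.99, gamma_fut=0.10):
    """Apply the efficacy, futility, and continue rules at one interim look."""
    if posterior_prob >= gamma_eff:
        return "stop_for_efficacy"
    if predictive_power <= gamma_fut:
        return "stop_for_futility"
    return "continue"


# Hypothetical interim summaries at three different looks.
look_1 = interim_decision(posterior_prob=0.90, predictive_power=0.55)   # continue
look_2 = interim_decision(posterior_prob=0.60, predictive_power=0.04)   # futility
look_3 = interim_decision(posterior_prob=0.995, predictive_power=0.97)  # efficacy
```

Checking efficacy before futility is a design choice made here; in practice the two conditions rarely hold at the same look.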

Continuous Endpoints (Analytical)

Z-score boundaries follow the Normal-Normal conjugate model of Zhou & Ji (2024). All boundary values are validated against their Table 3.

Binary/Survival (Simulation-Based)

Boundaries determined via Monte Carlo: simulate trials, record posterior probabilities at each interim look, identify thresholds corresponding to target error rates.

Advantage over frequentist GSD: Bayesian sequential monitoring applies the same posterior inference at each look, with no repeated-testing adjustment built into the inference itself, although the boundaries are still calibrated by simulation to meet target error rates. It also accommodates changing sample sizes naturally and provides coherent probability statements that simplify regulatory communication.

9 Bayesian Predictive Power: Extended

This module extends predictive power methodology to survival (time-to-event) endpoints using the log-HR Normal conjugate prior framework. At an interim analysis, it computes P(declare efficacy at final | current data) by simulating remaining enrollment and events.

Survival Endpoints (Log-HR Conjugate Prior)

With d observed events, the log-HR estimate is approximately Normal with information I = d/4 (Schoenfeld, 1981). The posterior is computed by conjugacy, then predictive power is obtained via Monte Carlo simulation of remaining events to final database lock.
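A summary-level sketch of this calculation is shown below. Instead of simulating individual patients, it draws the log-HR estimate from the remaining events directly from its approximate Normal distribution with information d/4. That simplification, the function name, the success rule P(HR < 1 | final data) ≥ γ, and all numeric inputs are assumptions for illustration.

```python
import random
from math import log
from statistics import NormalDist


def predictive_power(mu0, tau0, log_hr_hat, d_now, d_final,
                     gamma=0.95, sims=20000, seed=2026):
    """Monte Carlo estimate of P(final posterior P(HR < 1) >= gamma | interim data)."""
    if d_final <= d_now:
        raise ValueError("d_final must exceed d_now")
    rng = random.Random(seed)
    prior_prec = 1.0 / tau0 ** 2
    info_now = d_now / 4.0                       # information from observed events
    info_rem = (d_final - d_now) / 4.0           # information from remaining events

    # Interim posterior for log(HR), by Normal-Normal conjugacy.
    post_prec = prior_prec + info_now
    post_mean = (prior_prec * mu0 + info_now * log_hr_hat) / post_prec
    post_sd = post_prec ** -0.5

    # The final posterior SD is known in advance; only its mean is random.
    final_prec = post_prec + info_rem
    final_sd = final_prec ** -0.5
    z_gamma = NormalDist().inv_cdf(gamma)

    wins = 0
    for _ in range(sims):
        theta = rng.gauss(post_mean, post_sd)            # plausible true log(HR)
        future = rng.gauss(theta, info_rem ** -0.5)      # estimate from remaining events
        final_mean = (post_prec * post_mean + info_rem * future) / final_prec
        # P(log HR < 0 | final) >= gamma  <=>  final_mean + z_gamma * final_sd <= 0
        if final_mean + z_gamma * final_sd <= 0.0:
            wins += 1
    return wins / sims


# Hypothetical interim looks at 100 of 200 planned events, prior N(0, 1) on log(HR).
pp_favorable = predictive_power(0.0, 1.0, log(0.6), d_now=100, d_final=200)
pp_unfavorable = predictive_power(0.0, 1.0, log(1.1), d_now=100, d_final=200)
```

The resulting value is what a futility rule of the form "predictive power ≤ γ_fut" would be compared against.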

This module integrates with the Sequential Monitoring calculator for boundary computation at each interim look.

10 Integrated Bayesian Design Workflow

Six-Stage Workflow

1. Prior Specification: Engage experts, elicit prior via quantile matching/ESS/historical data, run prior predictive checks

2. Historical Borrowing (Optional): Apply power prior or MAP methods, pre-specify discount factor, assess prior-data conflict

3. Design: Specify decision rule, determine null/alternative, run sample size calculator

4. Operating Characteristics: Simulate 10,000+ trials under H₀ and H₁, conduct sensitivity analyses

5. Monitoring Plan: Specify interim look schedules, compute efficacy/futility boundaries, validate stopping rules

6. Statistical Analysis Plan: Finalize prior, decision rules, pre-specify sensitivity analyses, pre-register trial

Table 3

Design Recommendations by Therapeutic Area

Area | Design Type | Prior | Borrowing
Rare Disease | Single-arm | Informative from external data | Yes (if controlled)
Oncology (Phase II) | Single-arm | Weak to moderate | Light borrowing
Cardiovascular | Two-arm | Weak (skeptical) | Yes if historical controls
Infectious Disease | Two-arm | Weak (fast accrual) | Rare (ethical)
Device | Single-arm/Two-arm | Informative from predicate | Yes (common)

11 Regulatory Context

FDA Guidance: January 2026 Draft

The FDA's Draft Guidance on Bayesian Methods for Medical Products provides explicit recommendations:

  • Prospective Specification: All priors, decision rules, and stopping boundaries pre-specified before enrollment
  • Prior Justification: Justified by historical data, expert elicitation, or regulatory precedent
  • Operating Characteristics: Frequentist Type I error and power documented via simulation
  • Sensitivity Analysis: Results robust across reasonable alternative priors
  • Historical Borrowing: Document exchangeability, quantify ESS, assess prior-data conflict
  • Transparency: All calculations, simulation code, and results auditable

The toolkit also aligns with the FDA's February 2010 guidance on Bayesian methods in medical device trials and the ICH E9(R1) estimands framework.

12 Validation Framework

The toolkit has been validated against 248 Bayesian-specific tests, supplemented by 251 additional tests covering non-Bayesian frequentist methods (total 499 tests in the validation repository).

Table 4

Bayesian Test Coverage Summary

Module | Tests | Validation Rate | Key Benchmark
Prior Elicitation | 22 | 100% | Quantile PPF accuracy
Bayesian Borrowing | 18 | 100% | Power prior, DerSimonian-Laird
Single-Arm Design | 26 | 100% | Operating characteristics
Two-Arm Design | 24 | 100% | Superiority/NI Type I error
Sequential Monitoring | 95 | 100% | Zhou & Ji (2024) boundaries
Predictive Power | 63 | 100% | MC simulation vs. analytical
Total Bayesian | 248 | 100% |

Validation Repository

Complete validation results available at:

github.com/evidenceinthewild/zetyra-validation

13 Case Studies

Case Study A: Rare Disease Gene Therapy (Single-Arm)

Single-arm, open-label trial for a neurodevelopmental disorder. Prior elicited via quantile matching: Beta(5.6, 13.1), ESS = 18.7. Sample size: n = 24, giving 80% power to detect a 30% response rate against a 10% null rate.

Outcome: 8/24 respond (33%), P(θ > 0.10) = 0.983 > 0.95. Efficacy declared. FDA approved expanded access pending Phase III.

Case Study B: Cardiovascular with Historical Borrowing

New anticoagulant NI trial vs. warfarin. MAP prior from 12 historical trials (pooled rate 2.8/100 PY, I² = 35%). The design required 3,200 patient-years with an NI margin of δ_NI = 1.0/100 PY.

Outcome: P(new drug rate < control + 1.0 | data) = 0.982 > 0.975. Non-inferiority declared. Approved with post-approval safety surveillance.

Case Study C: Oncology Adaptive with Sequential Monitoring (Survival)

Phase II adaptive for second-line lung cancer. Two-stage: n=50 (Stage 1) with futility boundary, then n=150 (Stage 2). Log-HR prior: N(0, 1).

Outcome: P(HR < 1.0) = 0.78 < 0.95 at final analysis. The primary endpoint was not met, but a favorable trend prompted a Phase III trial with refined enrichment, demonstrating the value of pre-planned adaptivity.

References

Core Methodology

1. Berry SM, Carlin BP, Lee JJ, Müller P. Bayesian Adaptive Methods for Clinical Trials. CRC Press, 2010.

2. Zhou T, Ji Y. On Bayesian Sequential Clinical Trial Designs. NEJSDS 2024; 2(1):136–151.

3. Ibrahim JG, Chen MH. Power prior distributions for regression models. JASA 2000; 95(449):285–299.

4. Schmidli H, et al. Robust meta-analytic-predictive priors in clinical trials. Biometrics 2014; 70(4):1023–1032.

5. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials. John Wiley, 2004.

Statistical Foundations

6. Gelman A, et al. Bayesian Data Analysis, 3rd ed. CRC Press, 2013.

7. Schoenfeld D. Asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 1981; 68(1):316–319.

8. Cox DR. Regression models and life-tables. JRSS-B 1972; 34(2):187–220.

Regulatory Guidance

9. FDA. Bayesian Statistical Methods in Medical Product Development: Draft Guidance. January 12, 2026.

10. FDA. Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. February 2010.

11. FDA. Adaptive Design Clinical Trials for Drugs and Biologics. November 2019.

12. ICH E9(R1). Estimands and Sensitivity Analysis. November 2019.

Full 13-reference bibliography available in PDF version, including O'Hagan et al. (2006) on expert elicitation and Kass & Wasserman (1996) on formal prior selection.

Suggested Citation

Qian, Lu. (2026). Zetyra Bayesian Toolkit: A Comprehensive Suite of Validated Bayesian Calculators for Clinical Trial Design (Version 1.0). Evidence in the Wild. https://zetyra.com/bayesian-whitepaper

Companion to: Zetyra Technical White Paper v2.0 (DOI: 10.5281/zenodo.18879839)
