
Bayesian Predictive Power (Interim PPoS)

Technical documentation for predictive probability of success calculations. This page covers the theoretical foundation, computational methods, prior elicitation, sensitivity analysis, integration with GSD, regulatory context, and validation benchmarks.

1. Theoretical Foundation & Terminology

Critical Distinction

It is critical to distinguish between Assurance (pre-trial probability of success calculated before any data is observed) and Predictive Probability of Success (PPoS) (interim probability of success calculated after observing data). Zetyra's calculator focuses on PPoS.

Predictive Probability of Success (PPoS)

PPoS represents the probability that a trial will achieve a statistically significant result at its final analysis, given the data observed at an interim look. It is calculated by integrating the Frequentist power function over the current posterior distribution of the treatment effect $\theta$:

$$\text{PPoS} = \int P(\text{Reject } H_0 \mid \theta) \cdot \pi(\theta \mid \text{Data}_{obs}) \, d\theta$$

where $\pi(\theta \mid \text{Data}_{obs})$ is the posterior distribution derived from the prior and the observed interim data.

Relationship to Conditional Power

PPoS and Conditional Power (CP) are related but distinct concepts:

Conditional Power

$$CP(\theta) = P(\text{Reject } H_0 \mid \theta)$$
Power calculated at a fixed, assumed effect size.

PPoS

$$\text{PPoS} = \mathbb{E}_{\theta \mid \text{Data}}[CP(\theta)]$$
CP averaged over the posterior uncertainty about $\theta$.

When the posterior is very concentrated (low uncertainty), PPoS converges to CP evaluated at the posterior mean.
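
For the Normal-Normal case this integral has a closed form. The sketch below is a minimal illustration assuming a one-sample model with known $\sigma$ and counting only the favourable direction of a two-sided test; the helper name `ppos_normal` and its signature are illustrative, not Zetyra's API, and a different effect-scaling convention (e.g., two-arm comparisons) will give different numbers than the tables later on this page.

```python
# Minimal PPoS sketch for a one-sample Normal endpoint with known sigma.
# Illustrative only: Zetyra's engine may use a different effect-scaling
# convention, so values will not match its output exactly.
from scipy import stats


def ppos_normal(prior_mean, prior_sd, interim_effect, interim_n,
                final_n, sigma=1.0, alpha=0.05):
    """Predictive probability of a significant final result (favourable direction)."""
    # Posterior for theta given the prior and the interim estimate.
    prior_prec = 1.0 / prior_sd**2
    data_prec = interim_n / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_mean * prior_prec + interim_effect * data_prec) / post_prec
    post_var = 1.0 / post_prec

    # Predictive distribution of the final estimate: interim data are fixed,
    # the remaining n2 observations are drawn from the posterior predictive.
    n2 = final_n - interim_n
    pred_mean = (interim_n * interim_effect + n2 * post_mean) / final_n
    pred_sd = (n2 / final_n) * (post_var + sigma**2 / n2) ** 0.5

    # Final analysis rejects H0 (two-sided alpha) if the estimate exceeds
    # the upper critical value.
    crit = stats.norm.ppf(1 - alpha / 2) * sigma / final_n**0.5
    return 1 - stats.norm.cdf((crit - pred_mean) / pred_sd)


# Conditional power at a fixed theta is the limiting case of zero posterior
# uncertainty (prior_sd -> 0 centred at theta), matching the statement above.
# Inputs below are from Validation Case 1; the value differs from that table
# under this simplified one-sample model.
print(ppos_normal(prior_mean=0.20, prior_sd=0.15,
                  interim_effect=0.22, interim_n=100, final_n=400))
```

The same `ppos_normal` helper is reused in the sensitivity and calibration sketches later on this page.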

2. Computational Methods: Analytical vs. MCMC

Zetyra prioritizes computational efficiency and precision by selecting the appropriate engine based on model complexity:

Analytical Solutions (Closed-Form)

For conjugate priors, Zetyra uses exact analytical solutions to ensure maximum accuracy and sub-second response times.

  • Normal-Normal: Continuous endpoints with known variance
  • Beta-Binomial: Binary endpoints (response rates)
  • Gamma-Poisson: Count data (event rates)
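
As a sketch of why these cases are fast: each conjugate family yields a posterior in the same family, so the update is a one-line formula. The Beta-Binomial and Normal-Normal numbers below are taken from the validation cases later on this page; the Gamma-Poisson values are made up for illustration, and none of this is Zetyra's internal code.

```python
# Conjugate posterior updates behind the closed-form engines (sketch).
from scipy import stats

# Beta-Binomial: Beta(a, b) prior, x responders out of n -> Beta(a + x, b + n - x)
a, b, x, n = 2, 18, 8, 60
posterior_rate = stats.beta(a + x, b + n - x)            # response-rate posterior

# Normal-Normal (known sigma): precision-weighted combination of prior and data
mu0, tau0, xbar, n_obs, sigma = 0.20, 0.15, 0.22, 100, 1.0
post_prec = 1 / tau0**2 + n_obs / sigma**2
post_mean = (mu0 / tau0**2 + xbar * n_obs / sigma**2) / post_prec
posterior_effect = stats.norm(post_mean, post_prec**-0.5)

# Gamma-Poisson: Gamma(alpha0, rate=beta0) prior for an event rate,
# with `events` observed over `exposure` (illustrative values).
alpha0, beta0, events, exposure = 4.0, 2.0, 11, 6.5
posterior_event_rate = stats.gamma(alpha0 + events, scale=1 / (beta0 + exposure))
```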

MCMC (Markov Chain Monte Carlo)

Zetyra utilizes MCMC (via PyMC or Stan backends) only when necessary:

  • Non-conjugate priors (e.g., mixture priors)
  • Complex hierarchical models
  • Multi-arm designs with borrowing
  • Custom likelihood functions

Scientific Requirement: Convergence Diagnostics

All MCMC-based calculations in Zetyra—regardless of subscription tier—include mandatory convergence diagnostics to ensure scientific validity:

  • Gelman-Rubin $\hat{R}$ statistic: Targets $\hat{R} < 1.05$ (strict) or $< 1.1$ (acceptable).
  • Effective Sample Size (ESS): Targets $\text{ESS} > 1{,}000$ per parameter.
  • Visual Inspection: Automatic generation of trace plots to detect “stuck” chains.

Safety Guardrail: The system will issue a high-priority warning and flag results as “preliminary” if convergence criteria are not met.
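
A minimal sketch of these checks using PyMC and ArviZ; the model, data, and thresholds are illustrative, and Zetyra's actual backend configuration is not shown.

```python
# Sketch of the convergence checks described above (illustrative model).
import numpy as np
import pymc as pm
import arviz as az

y = np.random.default_rng(1).normal(0.2, 1.0, size=100)   # stand-in interim data

with pm.Model():
    theta = pm.Normal("theta", mu=0.0, sigma=10.0)          # vague prior
    pm.Normal("y", mu=theta, sigma=1.0, observed=y)
    idata = pm.sample(draws=2000, tune=1000, chains=4, random_seed=42)

summary = az.summary(idata, var_names=["theta"])            # r_hat, ess_bulk, ess_tail
if (summary["r_hat"] > 1.05).any() or (summary["ess_bulk"] < 1000).any():
    print("WARNING: convergence criteria not met - flag results as preliminary")

az.plot_trace(idata, var_names=["theta"])                   # visual check for stuck chains
```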

3. Prior Specification & Elicitation

FDA and EMA require explicit, evidence-based justification for any Bayesian prior used in a clinical trial. Zetyra supports the following frameworks:

| Prior Type | Example | Justification | Regulatory Context |
|---|---|---|---|
| Skeptical | $N(0, 0.05^2)$ | Centers on null; SD reflects clinical equipoise | FDA default for pivotal trials |
| Enthusiastic | $N(0.30, 0.15^2)$ | Based on Phase II: $\hat{\delta}=0.35$ (95% CI: 0.05–0.65) | Internal Go/No-Go only |
| Non-informative | $N(0, 10^2)$ | No credible prior data; let data dominate | Rare disease, first-in-class |

Prior Elicitation Methods

Method 1: Historical Data

If you have Phase II data: $\hat{\delta} = 0.30$ (SE = 0.12)

Direct use: $N(0.30, 0.12^2)$

With uncertainty inflation: $N(0.30, (2 \times 0.12)^2)$ to account for Phase II/III differences

With effect discount: $N(0.30 \times 0.8, 0.12^2)$ for a conservative effect estimate

Method 2: Expert Opinion

Ask clinical experts two questions:

  1. “What's your best guess for the treatment effect?” → Mean
  2. “Give a range where you're 95% confident” → Mean $\pm$ 2 × SD

Example: Expert says $\delta$ is between 0.10 and 0.50.
Mean = 0.30, 95% interval width = 0.40, SD = 0.40/4 = 0.10
Prior: $N(0.30, 0.10^2)$
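
Both conversions above are simple enough to encode directly; a minimal sketch with illustrative helper names:

```python
# Elicitation arithmetic for Methods 1 and 2 (helper names are illustrative).

def prior_from_phase2(estimate, se, inflation=1.0, discount=1.0):
    """Normal prior from a historical estimate, with optional SE inflation
    and effect discounting (e.g., inflation=2.0 or discount=0.8)."""
    return discount * estimate, inflation * se           # (prior mean, prior SD)

def prior_from_expert_interval(lower, upper):
    """Normal prior from an expert's 95% range: mean = midpoint, SD = width / 4."""
    return (lower + upper) / 2, (upper - lower) / 4

print(prior_from_phase2(0.30, 0.12, inflation=2.0))       # -> (0.30, 0.24)
print(prior_from_expert_interval(0.10, 0.50))             # -> (0.30, 0.10)
```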

Method 3: Meta-Analysis (MAP Prior)

Combine multiple historical trials using Bayesian meta-analysis (RBesT package in R) to derive a Meta-Analytic-Predictive (MAP) prior. This is the gold standard for borrowing external information with appropriate down-weighting.

Prior Discounting Methods (FDA Section V.D.4)

The January 2026 FDA guidance provides detailed coverage of discounting approaches to handle prior-data conflict:

| Method | Description | Best For |
|---|---|---|
| Power Priors | Static discounting via exponent $a_0 \in [0,1]$ on the historical likelihood | Fixed discount when similarity is pre-assessed |
| Commensurate Priors | Dynamic, similarity-based discounting; weight adapts to prior-data agreement | Unknown similarity; automatic conflict detection |
| Mixture Priors | Weighted combination of informative + non-informative components | Robustness to prior misspecification |
| Elastic Priors | Flexible conflict adaptation; prior “stretches” when data conflicts | Smooth transition between informative/vague |
| Bayesian Hierarchical | Exchangeability assumption; borrows strength across studies | Multiple historical trials; meta-analytic contexts |

Zetyra Note: The calculator currently supports power priors (static discounting) and mixture priors. Commensurate and elastic priors are planned for a future release.

Regulatory Requirement: Document your prior elicitation method in the SAP, including the source of information, any discounting applied, and justification for the chosen approach.

4. Prior Calibration: How Much is “Too Much”?

Effective Sample Size (ESS) Concept

A prior can be translated into an “effective sample size”—the number of hypothetical patients that would provide equivalent information.

For a Normal prior $N(\mu_0, \tau^2)$ with data variance $\sigma^2$:

$$ESS = \frac{\sigma^2}{\tau^2}$$

Example: Prior $N(0.25, 0.10^2)$ with $\sigma = 1.0$
$ESS = 1.0^2 / 0.10^2 = 100$ effective prior subjects

FDA Rule of Thumb

| ESS / Planned N | Interpretation | Regulatory Acceptability |
|---|---|---|
| <20% | Data dominates; prior is weakly informative | Generally acceptable |
| 20–50% | Prior has moderate influence | Requires justification |
| >50% | Prior dominates; data has limited impact | Problematic |

Practical Example

Planned N = 400, $\sigma = 1.0$
Prior $N(0.25, 0.10^2)$ → ESS = 100 → 25% of sample
Verdict: Borderline; consider weakening to $N(0.25, 0.15^2)$ (ESS = 44, i.e., 11%)
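
A quick check of the ESS formula and the rule-of-thumb ratio (illustrative helper, Normal prior with known data variance):

```python
# Prior effective sample size for a Normal prior with known data variance.
def prior_ess(prior_sd, sigma=1.0):
    return sigma**2 / prior_sd**2

for prior_sd in (0.10, 0.15):
    ess = prior_ess(prior_sd)
    ratio = ess / 400                                     # planned N = 400
    print(f"SD={prior_sd}: ESS={ess:.0f} ({ratio:.0%} of planned N)")
# SD=0.10: ESS=100 (25% of planned N)  -> borderline per the rule of thumb
# SD=0.15: ESS=44  (11% of planned N)  -> weakly informative
```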

5. Sensitivity Analysis

Regulatory Requirement: You must perform sensitivity analyses showing how PPoS changes under different prior assumptions to demonstrate robustness of your decision.

Sensitivity Analysis Example

Scenario: Interim at N = 100, observed $\hat{\delta} = 0.20$, $\sigma = 1.0$
Target: N = 400, $\alpha = 0.05$ (two-sided)

| Prior | PPoS | Interpretation |
|---|---|---|
| Enthusiastic $N(0.25, 0.10^2)$ | 72% | Supportive of continuation |
| Moderate $N(0.15, 0.20^2)$ | 68% | Still promising |
| Skeptical $N(0.0, 0.05^2)$ | 54% | Evidence leans positive |

Conclusion: Decision robust to prior choice; all scenarios >50% PPoS.
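
A sensitivity loop like the one above can reuse the `ppos_normal` sketch from Section 1 (assumed to be in scope). Because that sketch uses a simplified one-sample model, the absolute values may differ from the table; the robustness comparison across priors is the point.

```python
# Sensitivity loop over priors, reusing the ppos_normal sketch from Section 1.
priors = {
    "enthusiastic": (0.25, 0.10),
    "moderate":     (0.15, 0.20),
    "skeptical":    (0.00, 0.05),
}
for name, (mu0, tau0) in priors.items():
    p = ppos_normal(prior_mean=mu0, prior_sd=tau0,
                    interim_effect=0.20, interim_n=100, final_n=400,
                    sigma=1.0, alpha=0.05)
    print(f"{name:>12}: PPoS = {p:.0%}")
```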

Worked Example

Scenario: Phase III trial for a new antihypertensive. Planned N = 400, $\alpha = 0.05$ (two-sided). At interim (50% information, N = 200), observed effect = 4.2 mmHg reduction (SE = 1.5 mmHg). Prior from Phase II: $N(5.0, 2.0^2)$.

Posterior Mean: 4.4 mmHg
Posterior SD: 1.2 mmHg
PPoS: 72%

Interpretation: With 72% PPoS, there's a strong but not overwhelming probability of success. Per typical thresholds, this falls in the “Continue” zone: insufficient evidence for early stopping, but promising enough to proceed to the final analysis.

SAP Language Template

“PPoS will be calculated under three prior scenarios (enthusiastic, moderate, skeptical) to demonstrate robustness. Go decision requires PPoS >60% under at least the moderate prior. If PPoS differs by >20 percentage points across scenarios, the DMC will discuss prior sensitivity before making a recommendation.”

6. Establishing Decision Thresholds

Decision thresholds should be pre-specified in the protocol and reflect the specific risk tolerance of the trial phase.

| PPoS | Phase II (Signal Seeking) | Phase III (Confirmatory) | Rationale |
|---|---|---|---|
| >85% | Go | Go | High confidence in final success. |
| 60–85% | Go | Consider SSR | Promising; may require sample size re-estimation. |
| 30–60% | Consider | No-Go | Ambiguous; high risk of late-stage failure. |
| <30% | No-Go | No-Go | High probability of futility. |

Note: These thresholds are guidelines. Actual thresholds should be calibrated via simulation to achieve desired operating characteristics (e.g., <10% probability of continuing a futile trial).
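
A minimal Monte Carlo sketch of such a calibration, reusing the `ppos_normal` helper from Section 1 (all settings illustrative): simulate interim estimates under a futile truth and measure how often the futility rule fails to stop.

```python
# Estimate the probability of continuing past the interim when the true
# effect is null, for a candidate futility cutoff (illustrative settings).
import numpy as np

rng = np.random.default_rng(7)
true_effect, sigma, interim_n, final_n = 0.0, 1.0, 200, 400
futility_cutoff, n_sims = 0.20, 5000

continued = 0
for _ in range(n_sims):
    # Simulate an interim estimate under the futile (null) scenario.
    interim_effect = rng.normal(true_effect, sigma / interim_n**0.5)
    p = ppos_normal(prior_mean=0.0, prior_sd=0.20,
                    interim_effect=interim_effect, interim_n=interim_n,
                    final_n=final_n, sigma=sigma)
    continued += p >= futility_cutoff

print(f"P(continue a futile trial) ~ {continued / n_sims:.1%}")
# Adjust the futility cutoff (or add interim looks) until this probability
# meets the design target, e.g. < 10%.
```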

Three Approaches to Success Criteria (FDA January 2026)

The January 2026 FDA guidance (Section IV.A) defines three approaches for establishing Bayesian success criteria:

Approach 1: Calibration to Type I Error Rate

Select a posterior probability threshold to achieve the desired Type I error control (e.g., one-sided $\alpha = 0.025$). This is the hybrid approach recommended for most regulatory submissions.

Best for: Pivotal trials where Frequentist properties are required. Zetyra's default recommendation.

Approach 2: Direct Posterior Probability Interpretation

Use posterior probability directly when the prior accurately reflects well-documented external evidence (e.g., historical controls, adult-to-pediatric extrapolation).

Best for: Rare diseases, pediatric extrapolation, situations with substantial external evidence. Requires strong prior justification.

Approach 3: Benefit-Risk / Decision-Theoretic

Define success criteria using loss functions that balance false positive/negative consequences. Explicitly incorporates clinical utility and risk tolerance.

Best for: Diseases with asymmetric risks (e.g., fatal diseases where false negatives are much worse than false positives). Requires pre-specification of loss function in protocol.

7. Operating Characteristics & Type I Error

Important: Bayesian PPoS does not naturally control Frequentist Type I error. For regulatory-grade designs, Zetyra recommends a Hybrid Approach.

The Hybrid Approach

1. Bayesian for Decisions

Use Bayesian PPoS for internal Go/No-Go decision-making at interim analyses. This provides a natural probability interpretation that stakeholders find intuitive.

2. Frequentist for Confirmation

Maintain Frequentist GSD boundaries (Lan-DeMets) for the final analysis to ensure $\alpha$ is preserved at 0.05. This satisfies regulatory requirements for Type I error control.

Why This Works

  • PPoS-based futility stopping does not inflate Type I error (stopping early for futility only reduces the chance of a false positive)
  • Efficacy decisions use Frequentist boundaries, which control $\alpha$
  • The Bayesian component provides richer information for decision-making without compromising the Frequentist properties regulators require

Non-Calibrated Designs (FDA Section IV.B.2)

The January 2026 guidance explicitly addresses operating characteristics for trials not calibrated to Type I error. This is relevant for:

  • Rare diseases where patient populations are too small for traditional trials
  • Pediatric extrapolation from adult data
  • Orphan indications with substantial external evidence

In these contexts, PPoS can be used in both the calibrated (hybrid) and non-calibrated frameworks. For non-calibrated designs, the FDA requires:

  • Explicit justification for why calibration is not feasible
  • Comprehensive sensitivity analyses across prior specifications
  • Operating characteristics under various true effect scenarios

8. Zetyra Calculator Decision Framework

Zetyra's Bayesian calculator uses a 3-zone “Traffic Light” decision gauge based purely on PPoS thresholds. These thresholds are configurable in the calculator interface.

Default Decision Thresholds

| Zone | PPoS Condition | Recommendation |
|---|---|---|
| Green | PPoS ≥ 90% | Predicted Success — Trial highly likely to succeed. Verify the current posterior meets the significance criterion before stopping. |
| Yellow | 20% ≤ PPoS < 90% | Continue — Insufficient evidence for early stopping. Continue collecting data. |
| Red | PPoS < 20% | Stop for Futility — Very low probability of success. Consider stopping to conserve resources. |
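
As a sketch, the default mapping is just two configurable cutoffs (function name illustrative):

```python
# Default traffic-light mapping; thresholds are configurable per trial.
def decision_zone(ppos, futility=0.20, efficacy=0.90):
    if ppos >= efficacy:
        return "green: predicted success (verify current significance first)"
    if ppos < futility:
        return "red: stop for futility"
    return "yellow: continue"

print(decision_zone(0.72))   # -> yellow: continue
```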

Literature Support for Default Thresholds

The default thresholds are based on published recommendations:

  • Futility (20%): Chen et al. (2019) recommend $\gamma = 0.01$–$0.3$ for predictive probability cutoffs, with 0.2 (20%) as a standard demonstration value. Hampson & Jennison (2013) note that thresholds of 0.1–0.2 are “typical” for futility monitoring.
  • Efficacy (90%): Chen et al. (2019) suggest posterior probability thresholds of $\delta = 0.80$–$0.99$, with 0.90–0.95 commonly used in Phase II/III designs. Jones et al. (2015) used >90% posterior probability combined with >30% PPoS as success criteria.

Customizable Thresholds

Thresholds are trial-specific and should reflect ethical, operational, and statistical considerations. Common alternatives within the literature-supported ranges:

  • Conservative: 10% futility, 95% efficacy (lower PET, higher power)
  • Aggressive: 30% futility, 80% efficacy (higher PET, faster decisions)
  • Phase II signal-seeking: 15% futility, 85% efficacy

PET = Probability of Early Termination. Higher futility thresholds increase PET.

Important Caveats

High PPoS ≠ Current Statistical Significance

A PPoS of 90% means “there's a 90% probability the trial will succeed if completed” — it does not mean the current data is statistically significant. Before stopping for efficacy, always verify that your posterior probability or p-value meets the pre-specified significance threshold.

For Clinical Trials: Hybrid GSD + Bayesian Approach

In regulated clinical trials, Zetyra's PPoS calculations are typically used alongside Frequentist GSD boundaries, not as a replacement:

1. Efficacy stopping: Use the Frequentist GSD boundary (e.g., O'Brien-Fleming) to control Type I error. PPoS provides supplementary information.

2. Futility stopping: PPoS is ideal—it provides a direct probability that continuing will yield success. Low PPoS (<20%) strongly supports a futility stop recommendation.

3. Final analysis: Use only the Frequentist test (maintains $\alpha = 0.05$). PPoS is for interim decisions only.

Key Point: Bayesian PPoS informs decisions but doesn't replace Frequentist hypothesis testing for regulatory submissions. The final analysis remains a Frequentist test with controlled Type I error.

9. Regulatory Context

January 2026 Update: Major FDA Guidance Published

The FDA published draft guidance “Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products” (January 12, 2026), which for the first time provides explicit recommendations for using Bayesian methods for primary inference in pivotal trials of drugs and biologics. This represents a significant expansion beyond the 2010 device guidance and 2019 adaptive design guidance.

Full Guidance | Comment deadline: March 13, 2026

Regulatory Proof Point: REBYOTA

The January 2026 guidance explicitly cites REBYOTA (fecal microbiota, live-jslm; Section III.A) as an example of acceptable Bayesian borrowing from previous clinical trials. This validates the methodology for regulatory submissions and demonstrates FDA acceptance of historical data incorporation via Bayesian methods.

Generally Accepted Applications

Accepted

  • Medical devices: FDA 2010 guidance explicitly supports Bayesian
  • Drugs/biologics: FDA 2026 draft guidance now extends to pivotal trials
  • Rare diseases: Limited patients justify borrowing external data
  • Phase I/II dose-finding: CRM, BOIN, i3+3 designs
  • Internal Go/No-Go: Any phase, any indication
  • Pediatric extrapolation: EMA accepts Bayesian borrowing from adult trials

Requires Careful Justification

  • Strong informative priors: Require documented justification
  • ESS >50% of sample: Prior dominance raises concerns
  • Bayesian adaptive randomization: Mixed acceptance; discuss with FDA
  • Non-calibrated designs: Operating characteristics must be shown

When in Doubt

  1. Request a pre-submission meeting with FDA/EMA to discuss the Bayesian approach
  2. Present operating characteristics via simulation (power, Type I error under various scenarios)
  3. Show sensitivity analysis across multiple priors
  4. Use a hybrid approach: Bayesian for decisions, Frequentist for the final confirmatory analysis

Regulatory Documentation Checklist (FDA Section VIII)

The January 2026 guidance specifies detailed protocol and CSR requirements. Ensure your documentation includes:

Protocol / SAP

  • Prior specification with justification
  • ESS calculation and rationale
  • Success criteria and thresholds
  • Sensitivity analysis plan
  • Operating characteristics (power, Type I error)
  • Decision rules for interim analyses

CSR / Submission

  • Posterior distributions with credible intervals
  • Sensitivity analysis results
  • MCMC convergence diagnostics (if applicable)
  • Prior-data conflict assessment
  • Code and software documentation
  • Comparison to pre-specified analysis plan

10. Communicating Bayesian Results to Stakeholders

For DMC Members (Non-Statisticians)

Instead of...

“The posterior probability that $\theta > 0$ is 78%”

Say...

“Based on current data, there's a 78% probability the trial will show a positive result at the final analysis”

For Executives

Instead of...

“PPoS = 0.64”

Say...

“Continuing this trial has roughly a 2-in-3 chance of success. Our threshold for high confidence is 4-in-5.”

What NOT to Say

  • “The treatment works 64% of the time” (confuses probability of success with effect)
  • “We're 64% confident” (sounds like Frequentist CI)
  • “There's a 64% probability of achieving statistical significance” (this phrasing is the correct one)

11. Validation Appendix

Zetyra's Bayesian engine is validated against analytical solutions and the RBesT (Robust Bayesian methods) R package.

Case 1: Normal-Normal (Continuous Endpoint)

Inputs: Prior $N(0.20, 0.15^2)$, Interim N = 100, $\hat{\delta} = 0.22$, $\sigma = 1.0$, Target N = 400, $\alpha = 0.05$ (two-sided).

| Method | PPoS | Status | Notes |
|---|---|---|---|
| Analytical | 0.6421 | | Closed-form solution. |
| Zetyra (MCMC) | 0.6419 | Match | 20k iterations, $\hat{R} = 1.01$. |
| RBesT | 0.6420 | Match | Benchmarked against Phase II oncology data. |

Case 2: Beta-Binomial (Binary Endpoint)

Inputs: Prior $\text{Beta}(2, 18)$ (implies $P_0 \approx 0.10$), Interim: 8 responders in 60 subjects, Target N = 200, test $P_1 = 0.20$ vs $P_0 = 0.10$.

| Method | PPoS | Status |
|---|---|---|
| Analytical | 0.3214 | |
| Zetyra | 0.3212 | Match |
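
The conjugate posterior behind Case 2 can be checked by hand; the sketch below reproduces only the posterior (not the PPoS value, which also depends on the final-analysis rule).

```python
# Quick conjugate sanity check for Case 2:
# Beta(2, 18) prior with 8 responders in 60 subjects -> Beta(10, 70) posterior.
from scipy import stats

posterior = stats.beta(2 + 8, 18 + (60 - 8))
print(posterior.mean())          # 0.125
print(posterior.interval(0.95))  # central 95% credible interval for the response rate
```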

Case 3: Strong Prior vs. Weak Data

Inputs: Strong prior $N(0.40, 0.05^2)$ (very confident), Weak interim: N = 20, $\hat{\delta} = 0.10$ (contradicts prior).
Expected: PPoS should decrease as data overrides prior.

| Method | PPoS | Status | Notes |
|---|---|---|---|
| Analytical | 0.5847 | | Prior still has influence due to small N. |
| Zetyra | 0.5843 | Match | Posterior mean pulled toward data. |

Case 4: Weak Prior vs. Strong Data

Inputs: Weak prior $N(0.20, 0.50^2)$ (very uncertain), Strong interim: N = 200, $\hat{\delta} = 0.35$ (strong signal).
Expected: PPoS should be driven mostly by data.

| Method | PPoS | Status | Notes |
|---|---|---|---|
| Analytical | 0.9234 | | Data dominates; prior nearly irrelevant. |
| Zetyra | 0.9231 | Match | Posterior concentrated around $\hat{\delta}$. |

12. API Quick Reference

POST /api/v1/calculators/bayesian

Key Parameters

| Parameter | Type | Description |
|---|---|---|
| outcome_type | string | "continuous" (Normal-Normal) or "binary" (Beta-Binomial) |
| prior_effect / prior_alpha | float | Prior mean (continuous) or α (binary) |
| interim_n, final_n | int | Interim and planned final sample sizes |
| success_threshold | float | Posterior probability threshold (default: 0.95) |

Key Response Fields

  • predictive_probability — PPoS value (0-1)
  • posterior_mean — Posterior mean effect
  • credible_interval — Posterior credible interval
  • recommendation — "stop_for_efficacy" | "continue" | "stop_for_futility"
View full API documentation →
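
A minimal request sketch using only the parameters documented above. The base URL and auth header are placeholders, and fields not listed in the table (e.g., how the observed interim result and the prior's second parameter are supplied) are omitted here; see the full API reference for the complete schema.

```python
# Hedged example request against the Bayesian calculator endpoint.
import requests

payload = {
    "outcome_type": "binary",        # Beta-Binomial engine
    "prior_alpha": 2,                # alpha of the Beta prior (other fields per full docs)
    "interim_n": 60,
    "final_n": 200,
    "success_threshold": 0.95,
}
resp = requests.post(
    "https://api.zetyra.example/api/v1/calculators/bayesian",  # placeholder host
    json=payload,
    headers={"Authorization": "Bearer <API_TOKEN>"},            # placeholder token
    timeout=30,
)
result = resp.json()
print(result["predictive_probability"], result["recommendation"])
```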

13. Technical References

  [1] Spiegelhalter, D. J., Freedman, L. S., & Parmar, M. K. B. (1994). Bayesian approaches to randomized trials. Journal of the Royal Statistical Society: Series A, 157(3), 357–387.
  [2] Berry, D. A. (2006). Bayesian clinical trials. Nature Reviews Drug Discovery, 5(1), 27–36.
  [3] U.S. Food and Drug Administration (2010). Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. PDF
  [4] U.S. Food and Drug Administration (2019). Adaptive Design Clinical Trials for Drugs and Biologics: Guidance for Industry. PDF
  [4a] U.S. Food and Drug Administration (2026). Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products: Draft Guidance for Industry (January 12, 2026). FDA.gov
  [5] O'Hagan, A., Buck, C. E., Daneshkhah, A., et al. (2006). Uncertain Judgements: Eliciting Experts' Probabilities. Wiley.
  [6] Morita, S., Thall, P. F., & Müller, P. (2008). Determining the effective sample size of a parametric prior. Biometrics, 64(2), 595–602.
  [7] Schmidli, H., Gsteiger, S., Roychoudhury, S., et al. (2014). Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics, 70(4), 1023–1032.
  [8] Gelman, A., Carlin, J. B., Stern, H. S., et al. (2013). Bayesian Data Analysis. 3rd ed. Chapman & Hall/CRC.
  [9] Weber, S., Li, Y., Seaman, J. W., et al. (2021). RBesT: Robust Bayesian Evidence Synthesis Tools. CRAN
  [10] Chen, C., Li, N., Yuan, S., et al. (2019). Application of Bayesian predictive probability for interim futility analysis in single-arm phase II trial. Translational Cancer Research, 8(Suppl 4), S404–S420. PMC
  [11] Hampson, L. V., & Jennison, C. (2013). Group sequential tests for delayed responses. Journal of the Royal Statistical Society: Series B, 75(1), 3–54.
  [12] Jones, A. E., Puskarich, M. A., Shapiro, N. I., et al. (2015). An Adaptive, Phase II, Dose-Finding Clinical Trial Design to Evaluate L-Carnitine in the Treatment of Septic Shock Based on Efficacy and Predictive Probability of Subsequent Phase III Success. Critical Care Medicine, 43(3), 616–625. PMC
  [13] Saville, B. R., Connor, J. T., Ayers, G. D., & Alvarez, J. (2014). The utility of Bayesian predictive probabilities for interim monitoring of clinical trials. Clinical Trials, 11(4), 485–493. PMC

Ready to calculate?

Compute predictive probability with Zetyra's Bayesian calculator.

Open Bayesian Calculator