Bayesian Predictive Power (Interim PPoS)
Technical documentation for predictive probability of success calculations. This page covers the theoretical foundation, computational methods, prior elicitation, sensitivity analysis, integration with GSD, regulatory context, and validation benchmarks.
1. Theoretical Foundation & Terminology
Critical Distinction
It is critical to distinguish between Assurance (pre-trial probability of success calculated before any data is observed) and Predictive Probability of Success (PPoS) (interim probability of success calculated after observing data). Zetyra's calculator focuses on PPoS.
Predictive Probability of Success (PPoS)
PPoS represents the probability that a trial will achieve a statistically significant result at its final analysis, given the data observed at an interim look. It is calculated by integrating the Frequentist power function over the current posterior distribution of the treatment effect θ:

PPoS = ∫ CP(θ) · π(θ | D_interim) dθ

where π(θ | D_interim) is the posterior distribution derived from the prior and the observed interim data.
Relationship to Conditional Power
PPoS and Conditional Power (CP) are related but distinct concepts:
Conditional Power
Power calculated at a fixed assumed effect size.
PPoS
CP averaged over posterior uncertainty about the treatment effect θ.
When the posterior is very concentrated (low uncertainty), PPoS converges to CP evaluated at the posterior mean.
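This relationship is easy to see numerically. The sketch below assumes a simple Normal-Normal model with known variance; `power_at` and `ppos` are illustrative helper names, not Zetyra functions. With a very tight posterior, the Monte Carlo PPoS collapses to CP at the posterior mean; with a wide posterior, averaging over uncertainty pulls PPoS below the high-power CP value.

```python
import numpy as np
from scipy.stats import norm

def power_at(theta, se_final, alpha=0.05):
    """Frequentist power of the final two-sided test for a true effect theta."""
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(theta / se_final - z_crit)

def ppos(post_mean, post_sd, se_final, alpha=0.05, n_draws=200_000, seed=1):
    """PPoS = conditional power averaged over posterior draws of theta."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(post_mean, post_sd, n_draws)
    return power_at(theta, se_final, alpha).mean()

cp = power_at(0.3, se_final=0.1)                      # CP at a fixed effect
pp_tight = ppos(0.3, post_sd=0.001, se_final=0.1)     # concentrated posterior
pp_wide = ppos(0.3, post_sd=0.15, se_final=0.1)       # diffuse posterior
```

With the tight posterior, `pp_tight` matches `cp` to Monte Carlo precision, illustrating the convergence claim above.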
2. Computational Methods: Analytical vs. MCMC
Zetyra prioritizes computational efficiency and precision by selecting the appropriate engine based on model complexity:
Analytical Solutions (Closed-Form)
For conjugate priors, Zetyra uses exact analytical solutions to ensure maximum accuracy and sub-second response times.
- Normal-Normal: Continuous endpoints with known variance
- Beta-Binomial: Binary endpoints (response rates)
- Gamma-Poisson: Count data (event rates)
MCMC (Markov Chain Monte Carlo)
Zetyra utilizes MCMC (via PyMC or Stan backends) only when necessary:
- Non-conjugate priors (e.g., mixture priors)
- Complex hierarchical models
- Multi-arm designs with borrowing
- Custom likelihood functions
Scientific Requirement: Convergence Diagnostics
All MCMC-based calculations in Zetyra—regardless of subscription tier—include mandatory convergence diagnostics to ensure scientific validity:
- Gelman-Rubin statistic: Targets R̂ < 1.01 (strict) or R̂ < 1.1 (acceptable).
- Effective Sample Size (ESS): Targets a minimum ESS per parameter (e.g., ESS ≥ 400).
- Visual Inspection: Automatic generation of trace plots to detect “stuck” chains.
Safety Guardrail: The system will issue a high-priority warning and flag results as “preliminary” if convergence criteria are not met.
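The Gelman-Rubin diagnostic can be computed directly from raw chains. This is a minimal sketch of the classic (non-split) R̂, not Zetyra's internal implementation; production engines typically use the rank-normalized split-R̂ variant.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic potential scale reduction factor R-hat.
    chains: array of shape (n_chains, n_draws) for one parameter."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
good = rng.normal(0.0, 1.0, size=(4, 2000))          # well-mixed chains
bad = good + np.array([[0.0], [0.0], [0.0], [5.0]])  # one "stuck" chain
```

Well-mixed chains yield R̂ near 1; the shifted chain in `bad` inflates R̂ well past the acceptable threshold, which is exactly the pattern trace-plot inspection is meant to catch.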
3. Prior Specification & Elicitation
FDA and EMA require explicit, evidence-based justification for any Bayesian prior used in a clinical trial. Zetyra supports the following frameworks:
| Prior Type | Example | Justification | Regulatory Context |
|---|---|---|---|
| Skeptical | Normal centered at the null, e.g., θ ~ N(0, τ²) | Centers on null; SD reflects clinical equipoise | FDA default for pivotal trials |
| Enthusiastic | Based on Phase II: mean 0.35 (95% CI: 0.05–0.65) | Reflects prior evidence of benefit | Internal Go/No-Go only |
| Non-informative | Flat or very diffuse prior | No credible prior data; let data dominate | Rare disease, first-in-class |
Prior Elicitation Methods
Method 1: Historical Data
If you have a Phase II estimate θ̂ (SE = 0.12):
- Direct use: θ ~ N(θ̂, 0.12²)
- With uncertainty inflation: θ ~ N(θ̂, (k × 0.12)²) with k > 1, to account for Phase II/III differences
- With effect discount: θ ~ N(d × θ̂, 0.12²) with d < 1, for a conservative effect estimate
Method 2: Expert Opinion
Ask clinical experts two questions:
1. “What's your best guess for the treatment effect?” → Mean
2. “Give a range where you're 95% confident” → Mean ± 2×SD
Example: Expert says θ is between 0.10 and 0.50.
Mean = 0.30; 95% range width = 0.40; SD = 0.40/4 = 0.10
Prior: θ ~ N(0.30, 0.10²)
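The two-question elicitation above reduces to a one-line transformation. A minimal sketch (the function name is illustrative), assuming the expert's range is interpreted as a central 95% interval spanning mean ± 2 SD:

```python
def normal_prior_from_range(lo, hi):
    """Convert an expert's central 95% range into a Normal prior (mean, sd).
    Assumes the range spans mean - 2*sd to mean + 2*sd."""
    mean = (lo + hi) / 2
    sd = (hi - lo) / 4
    return mean, sd

mean, sd = normal_prior_from_range(0.10, 0.50)  # the worked example above
```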
Method 3: Meta-Analysis (MAP Prior)
Combine multiple historical trials using Bayesian meta-analysis (RBesT package in R) to derive a Meta-Analytic-Predictive (MAP) prior. This is the gold standard for borrowing external information with appropriate down-weighting.
Prior Discounting Methods (FDA Section V.D.4)
The January 2026 FDA guidance provides detailed coverage of discounting approaches to handle prior-data conflict:
| Method | Description | Best For |
|---|---|---|
| Power Priors | Static discounting via exponent on historical likelihood | Fixed discount when similarity is pre-assessed |
| Commensurate Priors | Dynamic, similarity-based discounting; weight adapts to prior-data agreement | Unknown similarity; automatic conflict detection |
| Mixture Priors | Weighted combination of informative + non-informative components | Robustness to prior misspecification |
| Elastic Priors | Flexible conflict adaptation; prior “stretches” when data conflicts | Smooth transition between informative/vague |
| Bayesian Hierarchical | Exchangeability assumption; borrows strength across studies | Multiple historical trials; meta-analytic contexts |
Zetyra Note: The calculator currently supports power priors (static discounting) and mixture priors. Commensurate and elastic priors are planned for a future release.
Regulatory Requirement: Document your prior elicitation method in the SAP, including the source of information, any discounting applied, and justification for the chosen approach.
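For the Normal case, the static power-prior discounting named above has a simple closed form: raising the historical likelihood to an exponent a0 keeps the historical mean but inflates its standard error by 1/√a0. The sketch below is illustrative, not Zetyra's implementation, and assumes a Normal approximation to the historical likelihood.

```python
import numpy as np

def power_prior_normal(hist_mean, hist_se, a0):
    """Normal power prior: raise the historical likelihood to a0 in (0, 1].
    a0 = 1 -> full borrowing; a0 -> 0 -> the historical data is ignored."""
    if not 0 < a0 <= 1:
        raise ValueError("a0 must be in (0, 1]")
    return hist_mean, hist_se / np.sqrt(a0)

mean, se = power_prior_normal(0.35, 0.12, a0=0.5)  # borrow half the information
```

Setting a0 = 0.5 here doubles the prior variance, i.e., the historical study counts as half its actual sample size.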
4. Prior Calibration: How Much is “Too Much”?
Effective Sample Size (ESS) Concept
A prior can be translated into an “effective sample size”—the number of hypothetical patients that would provide equivalent information.
For a Normal prior with SD τ and per-subject data variance σ²:

ESS = σ² / τ²

Example: a prior with τ = 0.10 and σ = 1 gives
ESS = 1 / 0.10² = 100 effective prior subjects
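The ESS arithmetic is a one-liner; the sketch below assumes the Normal-prior formula with per-subject SD σ and prior SD τ, and the example values (σ = 1) are illustrative choices consistent with the surrounding examples.

```python
def prior_ess(data_sd, prior_sd):
    """Effective sample size of a Normal prior: the number of subjects
    (each with SD data_sd) carrying the same information as the prior."""
    return (data_sd / prior_sd) ** 2

ess = prior_ess(data_sd=1.0, prior_sd=0.10)       # 100 effective prior subjects
ess_weak = prior_ess(data_sd=1.0, prior_sd=0.15)  # ~44 after weakening the prior
```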
FDA Rule of Thumb
| ESS / Planned N | Interpretation | Regulatory Acceptability |
|---|---|---|
| <20% | Data dominates; prior is weakly informative | Generally acceptable |
| 20–50% | Prior has moderate influence | Requires justification |
| >50% | Prior dominates; data has limited impact | Problematic |
Practical Example
Planned N = 400, per-subject SD σ = 1
Prior with τ = 0.10 → ESS = 100 → 25% of sample
Verdict: Borderline; consider weakening to τ = 0.15 (ESS ≈ 44 ≈ 11%)
5. Sensitivity Analysis
Regulatory Requirement: You must perform sensitivity analyses showing how PPoS changes under different prior assumptions to demonstrate robustness of your decision.
Sensitivity Analysis Example
Scenario: Interim at N = 100, with an observed treatment effect and its standard error
Target: N = 400, α = 0.05 (two-sided)
| Prior | PPoS | Interpretation |
|---|---|---|
| Enthusiastic | 72% | Supportive of continuation |
| Moderate | 68% | Still promising |
| Skeptical | 54% | Evidence leans positive |
Conclusion: Decision robust to prior choice; all scenarios >50% PPoS.
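A sensitivity analysis like the one above amounts to recomputing PPoS in a loop over priors. The sketch below assumes a one-sample Normal mean with known per-subject SD; the interim data and the three prior parameterizations are hypothetical illustrations, not the table's exact inputs.

```python
import numpy as np
from scipy.stats import norm

def ppos_normal(prior_mean, prior_sd, xbar1, n1, n_final, sigma=1.0, alpha=0.05):
    """Closed-form PPoS: conjugate posterior after n1 subjects, then the
    predictive distribution of the final sample mean over the remaining n2."""
    n2 = n_final - n1
    post_var = 1.0 / (1.0 / prior_sd**2 + n1 / sigma**2)
    post_mean = post_var * (prior_mean / prior_sd**2 + n1 * xbar1 / sigma**2)
    # final sample mean = weighted average of observed and future data
    pred_mean = (n1 * xbar1 + n2 * post_mean) / n_final
    pred_sd = (n2 / n_final) * np.sqrt(post_var + sigma**2 / n2)
    crit = norm.ppf(1 - alpha / 2) * sigma / np.sqrt(n_final)  # significance cutoff
    return 1 - norm.cdf((crit - pred_mean) / pred_sd)

# Same interim data, three hypothetical priors
results = {
    label: ppos_normal(mu0, sd0, xbar1=0.25, n1=100, n_final=400)
    for label, mu0, sd0 in [("enthusiastic", 0.4, 0.2),
                            ("moderate", 0.2, 0.2),
                            ("skeptical", 0.0, 0.2)]
}
```

As in the table, PPoS degrades gracefully from the enthusiastic to the skeptical prior when the interim signal is positive, which is the robustness pattern a DMC looks for.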
Worked Example
Scenario: Phase III trial for a new antihypertensive. Planned N = 400, α = 0.05 (two-sided). At interim (50% information, N = 200), observed effect = 4.2 mmHg reduction (SE = 1.5 mmHg). Prior from Phase II: Normal with mean 4.8 mmHg and SD 2.0 mmHg.
Posterior Mean
4.4 mmHg
Posterior SD
1.2 mmHg
PPoS
72%
Interpretation: With 72% PPoS, there's a strong but not overwhelming probability of success. Per typical thresholds, this falls in the “Continue” zone: insufficient evidence for early stopping, but promising enough to proceed to final analysis.
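The conjugate update behind this worked example can be sketched in a few lines. The Phase II prior parameters used below (mean 4.8 mmHg, SD 2.0 mmHg) are illustrative values chosen to reproduce the reported posterior, since precision-weighted averaging of prior and interim estimate determines both posterior moments.

```python
def posterior_normal(prior_mean, prior_sd, est, se):
    """Conjugate Normal update: combine a Normal prior with an interim
    estimate whose standard error is treated as known."""
    w_prior, w_data = 1 / prior_sd**2, 1 / se**2     # precisions
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * est)
    return post_mean, post_var**0.5

m, s = posterior_normal(prior_mean=4.8, prior_sd=2.0, est=4.2, se=1.5)
# m rounds to 4.4 mmHg, s rounds to 1.2 mmHg, matching the summary above
```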
SAP Language Template
“PPoS will be calculated under three prior scenarios (enthusiastic, moderate, skeptical) to demonstrate robustness. Go decision requires PPoS >60% under at least the moderate prior. If PPoS differs by >20 percentage points across scenarios, the DMC will discuss prior sensitivity before making a recommendation.”
6. Establishing Decision Thresholds
Decision thresholds should be pre-specified in the protocol and reflect the specific risk tolerance of the trial phase.
| PPoS | Phase II (Signal Seeking) | Phase III (Confirmatory) | Rationale |
|---|---|---|---|
| >85% | Go | Go | High confidence in final success. |
| 60–85% | Go | Consider SSR | Promising; may require sample size re-estimation. |
| 30–60% | Consider | No-Go | Ambiguous; high risk of late-stage failure. |
| <30% | No-Go | No-Go | High probability of futility. |
Note: These thresholds are guidelines. Actual thresholds should be calibrated via simulation to achieve the desired operating characteristics (e.g., <10% probability of continuing a futile trial).
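Such a calibration-by-simulation can be sketched directly: simulate interim results under a truly futile trial (θ = 0) and measure how often a given futility cutoff lets the trial continue. All design parameters below (vague prior, N = 100 interim, N = 400 final, known SD) are hypothetical illustrations.

```python
import numpy as np
from scipy.stats import norm

def ppos_from_interim(xbar1, n1, n_final, sigma, mu0, sd0, alpha=0.05):
    """Closed-form PPoS for a Normal mean with known SD (conjugate prior).
    Vectorized over an array of interim sample means xbar1."""
    n2 = n_final - n1
    post_var = 1.0 / (1.0 / sd0**2 + n1 / sigma**2)
    post_mean = post_var * (mu0 / sd0**2 + n1 * xbar1 / sigma**2)
    pred_mean = (n1 * xbar1 + n2 * post_mean) / n_final
    pred_sd = (n2 / n_final) * np.sqrt(post_var + sigma**2 / n2)
    crit = norm.ppf(1 - alpha / 2) * sigma / np.sqrt(n_final)
    return 1 - norm.cdf((crit - pred_mean) / pred_sd)

# Interim sample means simulated under the null (theta = 0, sigma = 1, n1 = 100)
rng = np.random.default_rng(7)
interim_means = rng.normal(0.0, 1.0 / np.sqrt(100), 20_000)
pp = ppos_from_interim(interim_means, n1=100, n_final=400,
                       sigma=1.0, mu0=0.0, sd0=1.0)
p_continue_futile = (pp >= 0.20).mean()  # how often the 20% rule fails to stop
```

If `p_continue_futile` exceeds the trial's tolerance, the futility cutoff would be raised (or the interim moved later) and the simulation rerun.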
Three Approaches to Success Criteria (FDA January 2026)
The January 2026 FDA guidance (Section IV.A) defines three approaches for establishing Bayesian success criteria:
Approach 1: Calibration to Type I Error Rate
Select the posterior probability threshold to achieve the desired Type I error control (e.g., one-sided α = 0.025). This is the hybrid approach recommended for most regulatory submissions.
Best for: Pivotal trials where Frequentist properties are required. Zetyra's default recommendation.
Approach 2: Direct Posterior Probability Interpretation
Use posterior probability directly when the prior accurately reflects well-documented external evidence (e.g., historical controls, adult-to-pediatric extrapolation).
Best for: Rare diseases, pediatric extrapolation, situations with substantial external evidence. Requires strong prior justification.
Approach 3: Benefit-Risk / Decision-Theoretic
Define success criteria using loss functions that balance false positive/negative consequences. Explicitly incorporates clinical utility and risk tolerance.
Best for: Diseases with asymmetric risks (e.g., fatal diseases where false negatives are much worse than false positives). Requires pre-specification of loss function in protocol.
7. Operating Characteristics & Type I Error
Important: Bayesian PPoS does not naturally control Frequentist Type I error. For regulatory-grade designs, Zetyra recommends a Hybrid Approach.
The Hybrid Approach
Bayesian for Decisions
Use Bayesian PPoS for internal Go/No-Go decision-making at interim analyses. This provides a natural probability interpretation that stakeholders find intuitive.
Frequentist for Confirmation
Maintain Frequentist GSD Boundaries (Lan-DeMets) for the final analysis to ensure α is preserved at 0.05. This satisfies regulatory requirements for Type I error control.
Why This Works
- PPoS-based futility stopping does not inflate Type I error (stopping early for futility only reduces the chance of a false positive)
- Efficacy decisions use Frequentist boundaries, which control α
- The Bayesian component provides richer information for decision-making without compromising the Frequentist properties regulators require
Non-Calibrated Designs (FDA Section IV.B.2)
The January 2026 guidance explicitly addresses operating characteristics for trials not calibrated to Type I error. This is relevant for:
- Rare diseases where patient populations are too small for traditional trials
- Pediatric extrapolation from adult data
- Orphan indications with substantial external evidence
In these contexts, PPoS can be used in both the calibrated (hybrid) and non-calibrated frameworks. For non-calibrated designs, the FDA requires:
- Explicit justification for why calibration is not feasible
- Comprehensive sensitivity analyses across prior specifications
- Operating characteristics under various true effect scenarios
8. Zetyra Calculator Decision Framework
Zetyra's Bayesian calculator uses a 3-zone “Traffic Light” decision gauge based purely on PPoS thresholds. These thresholds are configurable in the calculator interface.
Default Decision Thresholds
| Zone | PPoS Condition | Recommendation |
|---|---|---|
| Green | PPoS ≥ 90% | Predicted Success — Trial highly likely to succeed. Verify current posterior meets significance criteria before stopping. |
| Yellow | 20% ≤ PPoS < 90% | Continue — Insufficient evidence for early stopping. Continue collecting data. |
| Red | PPoS < 20% | Stop for Futility — Very low probability of success. Consider stopping to conserve resources. |
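The 3-zone gauge reduces to a simple threshold function. This is a sketch of the decision logic described in the table, not Zetyra's actual code; the default cutoffs are the documented 20% and 90%.

```python
def decision_zone(ppos, futility_cut=0.20, efficacy_cut=0.90):
    """Map a PPoS value onto the 3-zone traffic-light gauge."""
    if ppos >= efficacy_cut:
        return "green"    # predicted success; verify current significance first
    if ppos < futility_cut:
        return "red"      # stop for futility
    return "yellow"       # continue collecting data

zone = decision_zone(0.72)  # -> "yellow"
```

Boundary convention matters: as written, a PPoS of exactly 20% continues and exactly 90% counts as predicted success, matching the inequalities in the table.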
Literature Support for Default Thresholds
The default thresholds are based on published recommendations:
- Futility (20%): Chen et al. (2019) recommend low predictive-probability cutoffs for futility, with 0.2 (20%) as a standard demonstration value. Hampson & Jennison (2013) note that thresholds of 0.1–0.2 are “typical” for futility monitoring.
- Efficacy (90%): Chen et al. (2019) suggest posterior probability thresholds in the 0.90–0.95 range, commonly used in Phase II/III designs. Jones et al. (2015) used >90% posterior probability combined with >30% PPoS as success criteria.
Customizable Thresholds
Thresholds are trial-specific and should reflect ethical, operational, and statistical considerations. Common alternatives within the literature-supported ranges:
- Conservative: 10% futility, 95% efficacy (lower PET, higher power)
- Aggressive: 30% futility, 80% efficacy (higher PET, faster decisions)
- Phase II signal-seeking: 15% futility, 85% efficacy
PET = Probability of Early Termination. Higher futility thresholds increase PET.
Important Caveats
High PPoS ≠ Current Statistical Significance
A PPoS of 90% means “there's a 90% probability the trial will succeed if completed”. It does not mean the current data is statistically significant. Before stopping for efficacy, always verify that your posterior probability or p-value meets the pre-specified significance threshold.
For Clinical Trials: Hybrid GSD + Bayesian Approach
In regulated clinical trials, Zetyra's PPoS calculations are typically used alongside Frequentist GSD boundaries, not as a replacement:
Efficacy stopping: Use Frequentist GSD boundary (e.g., O'Brien-Fleming) to control Type I error. PPoS provides supplementary information.
Futility stopping: PPoS is ideal—it provides a direct probability that continuing will yield success. Low PPoS (<20%) strongly supports futility stop recommendation.
Final analysis: Use only the Frequentist test (maintains α control). PPoS is for interim decisions only.
Key Point: Bayesian PPoS informs decisions but doesn't replace Frequentist hypothesis testing for regulatory submissions. The final analysis remains a Frequentist test with controlled Type I error.
9. Regulatory Context
January 2026 Update: Major FDA Guidance Published
The FDA published draft guidance “Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products” (January 12, 2026), which for the first time provides explicit recommendations for using Bayesian methods for primary inference in pivotal trials of drugs and biologics. This represents a significant expansion beyond the 2010 device guidance and 2019 adaptive design guidance.
Regulatory Proof Point: REBYOTA
The January 2026 guidance explicitly cites REBYOTA (fecal microbiota, live-jslm; Section III.A) as an example of acceptable Bayesian borrowing from previous clinical trials. This validates the methodology for regulatory submissions and demonstrates FDA acceptance of historical data incorporation via Bayesian methods.
Generally Accepted Applications
Accepted
- Medical devices: FDA 2010 guidance explicitly supports Bayesian
- Drugs/biologics: FDA 2026 draft guidance now extends to pivotal trials
- Rare diseases: Limited patients justify borrowing external data
- Phase I/II dose-finding: CRM, BOIN, i3+3 designs
- Internal Go/No-Go: Any phase, any indication
- Pediatric extrapolation: EMA accepts Bayesian borrowing from adult trials
Requires Careful Justification
- Strong informative priors: Require documented justification
- ESS >50% of sample: Prior dominance raises concerns
- Bayesian adaptive randomization: Mixed acceptance; discuss with FDA
- Non-calibrated designs: Operating characteristics must be shown
When in Doubt
1. Request a pre-submission meeting with FDA/EMA to discuss the Bayesian approach
2. Present operating characteristics via simulation (power, Type I error under various scenarios)
3. Show sensitivity analysis across multiple priors
4. Use the hybrid approach: Bayesian for decisions, Frequentist for final confirmatory analysis
Regulatory Documentation Checklist (FDA Section VIII)
The January 2026 guidance specifies detailed protocol and CSR requirements. Ensure your documentation includes:
Protocol / SAP
- Prior specification with justification
- ESS calculation and rationale
- Success criteria and thresholds
- Sensitivity analysis plan
- Operating characteristics (power, Type I error)
- Decision rules for interim analyses
CSR / Submission
- Posterior distributions with credible intervals
- Sensitivity analysis results
- MCMC convergence diagnostics (if applicable)
- Prior-data conflict assessment
- Code and software documentation
- Comparison to pre-specified analysis plan
10. Communicating Bayesian Results to Stakeholders
For DMC Members (Non-Statisticians)
Instead of...
“The posterior probability that θ > 0 is 78%”
Say...
“Based on current data, there's a 78% probability the trial will show a positive result at the final analysis”
For Executives
Instead of...
“PPoS = 0.64”
Say...
“Continuing this trial has roughly a 2-in-3 chance of success. Our threshold for high confidence is 4-in-5.”
What NOT to Say
- “The treatment works 64% of the time” (confuses probability of success with effect)
- “We're 64% confident” (sounds like Frequentist CI)
- “There's a 64% probability of achieving statistical significance” (correct)
11. Validation Appendix
Zetyra's Bayesian engine is validated against analytical solutions and the RBesT (Robust Bayesian Evidence Synthesis Tools) R package.
Case 1: Normal-Normal (Continuous Endpoint)
Inputs: Normal prior; interim N=100 with an observed effect and SE; target N=400, α = 0.05 (two-sided).
| Method | PPoS | Status | Notes |
|---|---|---|---|
| Analytical | 0.6421 | — | Closed-form solution. |
| Zetyra (MCMC) | 0.6419 | Match | 20k iterations; convergence diagnostics passed. |
| RBesT | 0.6420 | Match | Benchmarked against Phase II oncology data. |
Case 2: Beta-Binomial (Binary Endpoint)
Inputs: Beta(a, b) prior (implying a modest prior mean response rate); interim: 8 responders in 60 subjects; target N=200; test vs. a null response rate p₀.
| Method | PPoS | Status |
|---|---|---|
| Analytical | 0.3214 | — |
| Zetyra | 0.3212 | Match |
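The Beta-Binomial engine behind Case 2 can be sketched generically: average the end-of-trial posterior criterion over the Beta-Binomial predictive distribution of future responders. The Beta(1, 1) prior, null rate p0 = 0.10, and 0.95 posterior threshold below are illustrative placeholders, not the case's exact inputs.

```python
from scipy.stats import beta, betabinom

def ppos_binary(a, b, x, n, n_final, p0, post_cut=0.95):
    """Beta-Binomial PPoS: probability that, after all n_final subjects,
    the posterior will show P(p > p0) >= post_cut."""
    a1, b1 = a + x, b + n - x          # interim posterior Beta(a1, b1)
    n_rem = n_final - n
    total = 0.0
    for y in range(n_rem + 1):         # future responders y ~ Beta-Binomial
        w = betabinom.pmf(y, n_rem, a1, b1)
        final_post = beta.sf(p0, a1 + y, b1 + n_rem - y)  # P(p > p0 | all data)
        if final_post >= post_cut:
            total += w
    return total

pp = ppos_binary(a=1, b=1, x=8, n=60, n_final=200, p0=0.10)
```

Because the future-responder distribution is discrete, the sum is exact (no Monte Carlo), which is why conjugate binary endpoints get sub-second closed-form answers.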
Case 3: Strong Prior vs. Weak Data
Inputs: Strong informative prior (small prior SD, very confident); weak interim data: N=20 with an observed effect contradicting the prior.
Expected: PPoS should decrease as data overrides prior.
| Method | PPoS | Status | Notes |
|---|---|---|---|
| Analytical | 0.5847 | — | Prior still has influence due to small N. |
| Zetyra | 0.5843 | Match | Posterior mean pulled toward data. |
Case 4: Weak Prior vs. Strong Data
Inputs: Weak prior (large prior SD, very uncertain); strong interim data: N=200 with a clearly positive observed effect.
Expected: PPoS should be driven mostly by data.
| Method | PPoS | Status | Notes |
|---|---|---|---|
| Analytical | 0.9234 | — | Data dominates; prior nearly irrelevant. |
| Zetyra | 0.9231 | Match | Posterior concentrated around the observed effect. |
12. API Quick Reference
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| outcome_type | string | "continuous" (Normal-Normal) or "binary" (Beta-Binomial) |
| prior_effect / prior_alpha | float | Prior mean (continuous) or α (binary) |
| interim_n, final_n | int | Interim and planned final sample sizes |
| success_threshold | float | Posterior probability threshold (default: 0.95) |
Key Response Fields
- predictive_probability — PPoS value (0–1)
- posterior_mean — Posterior mean effect
- credible_interval — Posterior credible interval
- recommendation — "stop_for_efficacy" | "continue" | "stop_for_futility"
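The documented parameters and response fields can be illustrated with example payloads. The endpoint shape, the `prior_sd` field name, and all numeric values below are assumptions for illustration, not the documented API surface.

```python
# Hypothetical request payload using the documented parameter names
payload = {
    "outcome_type": "continuous",   # Normal-Normal engine
    "prior_effect": 4.8,            # prior mean
    "prior_sd": 2.0,                # assumed field name for the prior SD
    "interim_n": 200,
    "final_n": 400,
    "success_threshold": 0.95,
}

# Hypothetical response, using the documented response fields
response = {
    "predictive_probability": 0.72,
    "posterior_mean": 4.4,
    "credible_interval": [2.0, 6.8],
    "recommendation": "continue",
}
```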
13. Technical References
- [1]Spiegelhalter, D. J., Freedman, L. S., & Parmar, M. K. B. (1994). Bayesian approaches to randomized trials. Journal of the Royal Statistical Society: Series A, 157(3), 357-387.
- [2]Berry, D. A. (2006). Bayesian clinical trials. Nature Reviews Drug Discovery, 5(1), 27-36.
- [3]U.S. Food and Drug Administration (2010). Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. PDF
- [4]U.S. Food and Drug Administration (2019). Adaptive Design Clinical Trials for Drugs and Biologics: Guidance for Industry. PDF
- [4a]U.S. Food and Drug Administration (2026). Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products: Draft Guidance for Industry. FDA.gov (NEW - January 12, 2026)
- [5]O'Hagan, A., Buck, C. E., Daneshkhah, A., et al. (2006). Uncertain Judgements: Eliciting Experts' Probabilities. Wiley.
- [6]Morita, S., Thall, P. F., & Müller, P. (2008). Determining the effective sample size of a parametric prior. Biometrics, 64(2), 595-602.
- [7]Schmidli, H., Gsteiger, S., Roychoudhury, S., et al. (2014). Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics, 70(4), 1023-1032.
- [8]Gelman, A., Carlin, J. B., Stern, H. S., et al. (2013). Bayesian Data Analysis. 3rd ed. Chapman & Hall/CRC.
- [9]Weber, S., Li, Y., Seaman, J. W., et al. (2021). RBesT: Robust Bayesian Evidence Synthesis Tools. CRAN
- [10]Chen, C., Li, N., Yuan, S., et al. (2019). Application of Bayesian predictive probability for interim futility analysis in single-arm phase II trial. Translational Cancer Research, 8(Suppl 4), S404-S420. PMC
- [11]Hampson, L. V., & Jennison, C. (2013). Group sequential tests for delayed responses. Journal of the Royal Statistical Society: Series B, 75(1), 3-54.
- [12]Jones, A. E., Puskarich, M. A., Shapiro, N. I., et al. (2015). An Adaptive, Phase II, Dose-Finding Clinical Trial Design to Evaluate L-Carnitine in the Treatment of Septic Shock Based on Efficacy and Predictive Probability of Subsequent Phase III Success. Critical Care Medicine, 43(3), 616-625. PMC
- [13]Saville, B. R., Connor, J. T., Ayers, G. D., & Alvarez, J. (2014). The utility of Bayesian predictive probabilities for interim monitoring of clinical trials. Clinical Trials, 11(4), 485-493. PMC
Ready to calculate?
Compute predictive probability with Zetyra's Bayesian calculator.