CUPED vs. GSD vs. Bayesian: When to Use Each
These three methods solve different problems at different stages of your experiment. Understanding when they complement each other—and when they compete—helps you design more efficient trials without overcomplicating your analysis plan.
1. Overview: Complement vs. Compete
The three Pro calculators address fundamentally different questions:
CUPED
“How can I reduce noise using what I already know about subjects?”
Stage: Design & Analysis
GSD
“When should I stop early, and what boundaries preserve Type I error?”
Stage: Design & Monitoring
Bayesian PP
“Given current data, what's the probability we'll succeed at the end?”
Stage: Monitoring & Decision
Key insight: These methods rarely compete directly. CUPED operates on variance, GSD operates on stopping rules, and Bayesian PP operates on probability updating. You can often use all three in the same trial.
Method Relationships
| Pairing | Relationship | When Both Apply |
|---|---|---|
| CUPED + GSD | Complement | CUPED reduces required N; GSD allows early stopping. Use both for maximum efficiency. |
| GSD + Bayesian PP | Complement | GSD for regulatory boundaries; Bayesian PP for internal Go/No-Go. Different audiences. |
| CUPED + Bayesian PP | Complement | CUPED improves precision of interim estimates; Bayesian PP uses those estimates for projection. |
2. CUPED vs. GSD (Design-Stage Decisions)
Both CUPED and GSD can reduce your expected sample size, but they do so through completely different mechanisms. Understanding this distinction is crucial for choosing the right approach—or using both together.
CUPED Approach
Reduces variance by adjusting for pre-experiment covariates, making each observation more informative. Works at the analysis level.
- Reduces required N upfront (the saving is fixed at design time, roughly a factor of 1 − ρ² for a single covariate)
- No operational complexity during trial
- Works with single-look designs
- Requires pre-experiment data
- Benefit depends on covariate correlation
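The adjustment described above can be sketched in a few lines. This is a minimal illustration on simulated data, not the calculator's implementation: the function names `sample_cov` and `cuped_adjust` and the simulated correlation of about 0.7 are assumptions made for the example. In a real experiment, theta would normally be estimated on data pooled across arms.

```python
import random
from statistics import mean, variance

def sample_cov(a, b):
    """Sample covariance with the same n - 1 denominator as statistics.variance."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes y - theta * (x - mean(x)).

    theta is the regression slope of y on x, the value that minimizes
    the variance of the adjusted outcome.
    """
    theta = sample_cov(x, y) / variance(x)
    x_bar = mean(x)
    return [yi - theta * (xi - x_bar) for yi, xi in zip(y, x)]

# Simulated data (assumption): covariate x and outcome y with correlation near 0.7.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(5000)]
y = [0.7 * xi + random.gauss(0, (1 - 0.7 ** 2) ** 0.5) for xi in x]

y_adj = cuped_adjust(y, x)
rho = sample_cov(x, y) / (variance(x) * variance(y)) ** 0.5
variance_ratio = variance(y_adj) / variance(y)

print(f"sample correlation: {rho:.3f}")
print(f"variance ratio after CUPED: {variance_ratio:.3f} (1 - rho^2 = {1 - rho ** 2:.3f})")
```

The adjusted outcome keeps the same mean as the original, and its sample variance shrinks by the factor 1 − ρ², which is why the benefit depends so directly on the covariate correlation.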
GSD Approach
Allows early stopping when results are conclusive, reducing expected sample size while preserving Type I error. Works at the monitoring level.
- Savings depend on true effect (larger effect = more savings)
- No pre-experiment data required
- Provides futility stopping option
- Requires DMC, unblinding logistics
- Max N is larger than fixed design
When to Use Which
Use CUPED when:
- You have strong baseline covariates (ρ > 0.5)
- Single-look design is acceptable
- You want guaranteed variance reduction
Use GSD when:
- Early stopping has ethical/financial value
- Effect size uncertainty is high
- No good baseline covariates exist
| Dimension | CUPED | GSD |
|---|---|---|
| How it saves N | Variance reduction (certain) | Early stopping (probabilistic) |
| Savings magnitude | 10-50% (depends on ρ) | 0-50% (depends on true effect) |
| Operational complexity | Low | Medium-High |
| Data requirements | Pre-experiment covariate | Interim readouts |
| Regulatory acceptance | Well-established (ANCOVA) | Well-established (FDA guidance) |
| Can use together? | Yes — apply CUPED to GSD-adjusted N for maximum efficiency | |
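As a rough sketch of how the two adjustments combine at the design stage, the snippet below sizes a two-arm trial with a standard two-sided z-test formula, then applies a CUPED factor of 1 − ρ² and a GSD inflation factor. The function names and the inflation value of 1.03 are illustrative assumptions; the real inflation factor depends on the boundaries and number of looks you choose.

```python
from math import ceil
from statistics import NormalDist

def fixed_n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Per-arm N for a two-sided, two-arm z-test with known sigma."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2

def planned_n_per_arm(delta, sigma, rho=0.0, gsd_inflation=1.0,
                      alpha=0.05, power=0.80):
    """Scale the fixed-design N by the CUPED factor and a GSD inflation factor.

    rho: correlation between the baseline covariate and the outcome.
    gsd_inflation: maximum-N multiplier for the chosen boundaries
                   (hypothetical input; take it from your GSD design).
    """
    n = fixed_n_per_arm(delta, sigma, alpha, power)
    return ceil(n * (1 - rho ** 2) * gsd_inflation)

n_fixed = planned_n_per_arm(delta=0.2, sigma=1.0)
n_cuped = planned_n_per_arm(delta=0.2, sigma=1.0, rho=0.6)
n_both = planned_n_per_arm(delta=0.2, sigma=1.0, rho=0.6, gsd_inflation=1.03)
print(n_fixed, n_cuped, n_both)
```

The ordering of the two multipliers does not matter arithmetically. What matters is that the CUPED factor lowers the maximum N, while the GSD adds the possibility of stopping before that maximum is reached.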
3. GSD vs. Bayesian PP (Monitoring-Stage Decisions)
Both GSD and Bayesian Predictive Power inform interim decision-making, but they serve different audiences and answer different questions. Many trials use both in parallel—GSD for regulatory stopping rules, Bayesian PP for internal Go/No-Go.
GSD (Frequentist)
Pre-specified stopping boundaries that control overall Type I error. Binary decision: cross boundary = stop, otherwise continue.
- Preserves α at regulatory-required level
- Fully pre-specified (no post-hoc flexibility)
- Clear regulatory precedent
- Doesn't directly answer “will we succeed?”
- Requires commitment to spending function
Bayesian PP
Probability of trial success given current data and prior beliefs. Continuous measure that informs resource allocation decisions.
- Direct probability statement stakeholders want
- Incorporates prior information naturally
- Flexible thresholds for different decisions
- Doesn't control Type I error directly
- Prior specification can be contentious
Common pattern: Use GSD boundaries for the official stopping rules in your SAP/DMC charter, and use Bayesian PP in parallel to inform internal portfolio decisions. The DMC sees both; regulators see the frequentist analysis.
The Fundamental Difference
GSD asks:
“Is the evidence strong enough to stop now while controlling false positives?”
Bayesian PP asks:
“If we continue, what's the probability we'll declare success at the end?”
| Dimension | GSD | Bayesian PP |
|---|---|---|
| Statistical framework | Frequentist | Bayesian |
| Output type | Binary (stop/continue) | Continuous (0-100%) |
| Controls Type I error? | Yes (guaranteed) | Indirectly (via threshold) |
| Incorporates prior? | No | Yes (explicitly) |
| Best for | Regulatory decisions | Portfolio/internal decisions |
| Can use together? | Yes — GSD for boundaries, Bayesian PP for futility assessment | |
4. CUPED vs. Bayesian (Variance Reduction vs. Interim Assessment)
These two methods rarely compete because they solve completely different problems. CUPED is an analysis technique that improves precision; Bayesian PP is a monitoring tool that projects future success. However, they interact in useful ways.
How They Interact
CUPED reduces standard error of your interim treatment effect estimate, making it more precise.
More precise interim estimates lead to tighter posterior distributions in the Bayesian update.
Tighter posteriors yield more decisive predictive probabilities of success (PPoS, the quantity Bayesian PP reports): values land closer to 0% or 100% and less often in the “uncertain” zone.
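The chain above can be written out under a simple normal-normal model. The symbols here are assumptions introduced for illustration: SE is the unadjusted standard error of the interim estimate, ρ is the correlation between a single baseline covariate and the outcome, and v_0 and v_post are the prior and posterior variances of the treatment effect.

```latex
\mathrm{SE}_{\text{CUPED}}^{2} = (1-\rho^{2})\,\mathrm{SE}^{2},
\qquad
\frac{1}{v_{\text{post}}} = \frac{1}{v_{0}} + \frac{1}{(1-\rho^{2})\,\mathrm{SE}^{2}}
```

Because the data precision term grows as ρ increases, the posterior variance shrinks, and the posterior leans more on the interim data and less on the prior.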
CUPED is about...
Making your current data more informative by removing noise attributable to baseline differences.
Bayesian PP is about...
Projecting future success probability given what you've observed so far and what you believed beforehand.
5. Combined Workflows
For sophisticated trial designs, these methods work together. Here are common integration patterns used in industry and clinical research.
Pattern A: Full Integration (Large Phase III)
1. Use GSD to design overall sample size and stopping boundaries
2. Apply CUPED adjustment factor to reduce maximum N
3. At each interim: GSD for regulatory stopping, Bayesian PP for internal Go/No-Go
4. CUPED-adjusted estimates feed into both analyses
Pattern B: Efficiency Crosshair (Tech A/B Testing)
Most tech experiments don't need interim stopping; they run for fixed durations (1-2 weeks). CUPED alone can provide roughly 25-50% variance reduction when pre-experiment user behavior correlates strongly with the metric.
When to add GSD: Very long experiments (>4 weeks) or high-stakes launches where early stopping saves significant resources.
Pattern C: Portfolio Decision (Phase II/III Transition)
At the end of Phase II, use Bayesian PP to inform the Go/No-Go decision for Phase III investment. The prior comes from the Phase II data, and the likelihood comes from the anticipated Phase III design.
When to add CUPED: If Phase III will collect baseline biomarkers or patient history that correlate with the primary endpoint.
6. Decision Matrix
Use this matrix to quickly identify which methods apply to your situation.
| Scenario | CUPED | GSD | Bayesian PP |
|---|---|---|---|
| Short A/B test with baseline data | |||
| Phase III with planned interims | If baseline exists | ||
| Phase II Go/No-Go decision | |||
| Fixed-sample RCT, no interims | |||
| Adaptive dose-finding study | Complex* | ||
| Long-term outcomes (5+ years) | Limited value |
* Adaptive designs require specialized GSD extensions beyond standard spending functions.
7. Frequently Asked Questions
Can I use CUPED with a group sequential design?
Yes, and you should. CUPED reduces the variance of your test statistic at each interim analysis, which means you need fewer subjects to reach the same information fraction. Apply the CUPED variance reduction factor when sizing your GSD. The stopping boundaries remain unchanged—they're based on the information fraction, not raw sample size.
Should I use conditional power or Bayesian predictive power?
Bayesian PP is generally preferred because it averages over uncertainty in the true effect, rather than assuming a single value. Conditional power requires you to specify “the effect we assume going forward”—typically the observed effect or the design assumption—which can mislead when the observed effect is noisy. Bayesian PP naturally accounts for this uncertainty through the posterior distribution.
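For comparison, conditional power can be sketched with the standard B-value formulation. The function names are assumptions for this example, and the final test uses a fixed one-sided critical value rather than a GSD-adjusted boundary. The `drift` argument is the expected final z-statistic under whatever effect you assume going forward, which is exactly the choice the answer above warns about.

```python
from math import sqrt
from statistics import NormalDist

def conditional_power(z_interim, info_frac, drift, alpha=0.025):
    """Probability the final z-statistic exceeds the critical value.

    z_interim: observed z-statistic at the interim look
    info_frac: information fraction t at the interim, 0 <= t < 1
    drift: expected final z-statistic under the assumed effect
    """
    nd = NormalDist()
    b_value = z_interim * sqrt(info_frac)   # B(t) = Z(t) * sqrt(t)
    z_crit = nd.inv_cdf(1 - alpha)
    # B(1) given B(t) is normal with mean B(t) + drift * (1 - t), variance 1 - t.
    return 1 - nd.cdf((z_crit - b_value - drift * (1 - info_frac))
                      / sqrt(1 - info_frac))

def design_drift(alpha=0.025, power=0.90):
    """Drift implied by the original design assumptions."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)

z_t, t = 1.0, 0.5
cp_design = conditional_power(z_t, t, design_drift())    # assume the design effect
cp_trend = conditional_power(z_t, t, z_t / sqrt(t))      # assume the current trend
print(f"CP under design effect: {cp_design:.2f}")
print(f"CP under current trend: {cp_trend:.2f}")
```

The same interim data gives two quite different answers depending on the assumed effect. Bayesian PP replaces that single assumed value with an average over the posterior.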
What if I don't have pre-experiment data for CUPED?
Then CUPED doesn't apply. Consider whether you can collect baseline measurements before randomization (e.g., a run-in period). If not, focus on GSD for efficiency gains through early stopping. For tech experiments, even one week of pre-experiment data can yield meaningful variance reduction if the metric is stable.
Do regulators accept Bayesian methods for primary analysis?
Increasingly, yes—but with caveats. FDA has approved Bayesian designs for medical devices and certain drug indications. For Phase III confirmatory trials, most sponsors still use frequentist GSD as the primary analysis and Bayesian PP as a supportive/internal metric. The key is pre-specifying everything: prior, likelihood, decision thresholds.
How do I choose between O'Brien-Fleming and Pocock for GSD?
O'Brien-Fleming is the default for most confirmatory trials. It is conservative at early looks (hard to stop), so the final analysis uses a critical value close to that of a fixed design and loses little power. Pocock spends more alpha early, which makes early stopping easier but requires a larger maximum sample size to reach the same power. Use Pocock only when early stopping has exceptional value: pandemic response, diseases with rapid progression, or when the ethical imperative to stop is paramount.
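The difference in early-look behavior is easy to see from the Lan-DeMets spending-function versions of the two approaches. Note that these are the spending-function approximations, not the original fixed-look O'Brien-Fleming and Pocock boundaries; the function names are assumptions for this sketch.

```python
from math import e, log, sqrt
from statistics import NormalDist

def obf_spend(t, alpha=0.05):
    """O'Brien-Fleming-type spending: cumulative two-sided alpha spent by fraction t."""
    nd = NormalDist()
    return 2 * (1 - nd.cdf(nd.inv_cdf(1 - alpha / 2) / sqrt(t)))

def pocock_spend(t, alpha=0.05):
    """Pocock-type spending: cumulative alpha spent by information fraction t."""
    return alpha * log(1 + (e - 1) * t)

for t in (0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  OBF-type={obf_spend(t):.5f}  Pocock-type={pocock_spend(t):.5f}")
```

Both functions spend the full alpha by t = 1, but the O'Brien-Fleming-type function spends almost nothing at the first looks, while the Pocock-type function spends a substantial share early.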
Can these methods be used for non-inferiority trials?
Yes. CUPED works identically—variance reduction applies regardless of hypothesis type. GSD for non-inferiority uses similar spending functions but with the test statistic reframed around the non-inferiority margin. Bayesian PP would compute the probability of concluding non-inferiority at the end. The principles transfer directly.