CUPED vs. GSD vs. Bayesian: When to Use Each

These three methods solve different problems at different stages of your experiment. Understanding when they complement each other—and when they compete—helps you design more efficient trials without overcomplicating your analysis plan.

1. Overview: Complement vs. Compete

The three Pro calculators address fundamentally different questions:

CUPED

“How can I reduce noise using what I already know about subjects?”

Stage: Design & Analysis

GSD

“When should I stop early, and what boundaries preserve Type I error?”

Stage: Design & Monitoring

Bayesian PP

“Given current data, what's the probability we'll succeed at the end?”

Stage: Monitoring & Decision

Key insight: These methods rarely compete directly. CUPED operates on variance, GSD operates on stopping rules, and Bayesian PP operates on probability updating. You can often use all three in the same trial.

Method Relationships

Pairing | Relationship | When Both Apply
CUPED + GSD | Complement | CUPED reduces required N; GSD allows early stopping. Use both for maximum efficiency.
GSD + Bayesian PP | Complement | GSD for regulatory boundaries; Bayesian PP for internal Go/No-Go. Different audiences.
CUPED + Bayesian PP | Complement | CUPED improves precision of interim estimates; Bayesian PP uses those estimates for projection.

2. CUPED vs. GSD (Design-Stage Decisions)

Both CUPED and GSD can reduce your expected sample size, but they do so through completely different mechanisms. Understanding this distinction is crucial for choosing the right approach—or using both together.

CUPED Approach

Reduces variance by adjusting for pre-experiment covariates, making each observation more informative. Works at the analysis level.

  • Reduces required N upfront (guaranteed savings)
  • No operational complexity during trial
  • Works with single-look designs
  • Requires pre-experiment data
  • Benefit depends on covariate correlation
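The adjustment itself is short. The sketch below is a minimal illustration on synthetic data, not the calculator's implementation: `cuped_adjust` is a hypothetical helper name, and the simulated correlation of about 0.7 is an assumption chosen so that the variance reduction lands near 50%.

```python
import random
from statistics import fmean, variance

def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes: y - theta * (x - mean(x))."""
    x_bar, y_bar = fmean(x), fmean(y)
    # theta is the regression slope of the outcome on the pre-experiment covariate
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]

# Synthetic data (assumption): covariate x explains about half the outcome variance
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [0.7 * xi + rng.gauss(0, 0.7) for xi in x]

y_adj = cuped_adjust(y, x)
reduction = 1 - variance(y_adj) / variance(y)  # equals the squared sample correlation
print(f"variance reduction: {reduction:.2f}")
```

The adjusted outcomes keep the same mean as the raw outcomes, so the treatment effect estimate is unchanged in expectation while its variance shrinks by roughly ρ². In a real experiment, theta would normally be estimated on data pooled across arms.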

GSD Approach

Allows early stopping when results are conclusive, reducing expected sample size while preserving Type I error. Works at the monitoring level.

  • Savings depend on true effect (larger effect = more savings)
  • No pre-experiment data required
  • Provides futility stopping option
  • Requires DMC, unblinding logistics
  • Max N is larger than fixed design
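To see why GSD savings depend on the true effect, consider a two-look design with one interim at half the information. The sketch below is a simplified illustration under stated assumptions: the interim critical value is taken as z_{α/2}/√t (the first-look boundary implied by one commonly quoted O'Brien-Fleming-type spending form), futility stopping is ignored, and `drift` is a hypothetical name for the expected final z-statistic.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()

def expected_fraction_of_max_n(drift, t1=0.5, alpha=0.05):
    """Expected sample size as a fraction of maximum N for a two-look,
    two-sided design with efficacy stopping only.

    drift: expected z-statistic at the final analysis (0 under the null).
    t1: information fraction at the interim look.
    """
    # Assumed O'Brien-Fleming-type interim boundary (illustrative)
    b1 = N01.inv_cdf(1 - alpha / 2) / sqrt(t1)
    # The interim z-statistic is N(drift * sqrt(t1), 1)
    mean_z1 = drift * sqrt(t1)
    p_stop = (1 - N01.cdf(b1 - mean_z1)) + N01.cdf(-b1 - mean_z1)
    # Stop at t1 with probability p_stop, otherwise run to the maximum
    return t1 * p_stop + 1.0 * (1 - p_stop)

under_null = expected_fraction_of_max_n(0.0)
large_effect = expected_fraction_of_max_n(3.5)
print(f"null: {under_null:.3f}, large effect: {large_effect:.3f}")
```

Under the null the design almost never stops early, so expected N is close to the maximum. With a large true effect, a meaningful share of trials stop at the interim.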

When to Use Which

Use CUPED when:

  • You have strong baseline covariates (ρ > 0.5)
  • Single-look design is acceptable
  • You want guaranteed variance reduction

Use GSD when:

  • Early stopping has ethical/financial value
  • Effect size uncertainty is high
  • No good baseline covariates exist

Dimension | CUPED | GSD
How it saves N | Variance reduction (certain) | Early stopping (probabilistic)
Savings magnitude | 10-50% (depends on ρ) | 0-50% (depends on true effect)
Operational complexity | Low | Medium-High
Data requirements | Pre-experiment covariate | Interim readouts
Regulatory acceptance | Well-established (ANCOVA) | Well-established (FDA guidance)
Can use together? | Yes: apply CUPED to GSD-adjusted N for maximum efficiency
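The arithmetic behind combining the two can be sketched as follows. This is a rough illustration assuming a two-arm comparison of means with known σ. The function name `n_per_arm` is hypothetical, and the GSD inflation factor of 1.03 is a placeholder, not a computed value; take the real factor from your GSD software for the chosen spending function and number of looks.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80, rho=0.0, gsd_inflation=1.0):
    """Per-arm sample size for a two-sided z-test on a difference in means.

    rho: correlation between the baseline covariate and the outcome (CUPED).
    gsd_inflation: ratio of GSD maximum N to fixed-design N (assumed input).
    """
    z = NormalDist().inv_cdf
    n_fixed = 2 * sigma**2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta**2
    # CUPED scales the outcome variance by (1 - rho^2); GSD inflates the maximum N
    return ceil(n_fixed * (1 - rho**2) * gsd_inflation)

fixed = n_per_arm(delta=0.5, sigma=1.0)                            # 63
with_cuped = n_per_arm(delta=0.5, sigma=1.0, rho=0.5)              # 48
with_both = n_per_arm(delta=0.5, sigma=1.0, rho=0.5, gsd_inflation=1.03)
print(fixed, with_cuped, with_both)
```

The last figure is the maximum N for the combined design. The expected N is lower whenever early stopping occurs.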

3. GSD vs. Bayesian PP (Monitoring-Stage Decisions)

Both GSD and Bayesian Predictive Power inform interim decision-making, but they serve different audiences and answer different questions. Many trials use both in parallel—GSD for regulatory stopping rules, Bayesian PP for internal Go/No-Go.

GSD (Frequentist)

Pre-specified stopping boundaries that control overall Type I error. Binary decision: cross boundary = stop, otherwise continue.

  • Preserves α at regulatory-required level
  • Fully pre-specified (no post-hoc flexibility)
  • Clear regulatory precedent
  • Doesn't directly answer “will we succeed?”
  • Requires commitment to spending function

Bayesian PP

Probability of trial success given current data and prior beliefs. Continuous measure that informs resource allocation decisions.

  • Direct probability statement stakeholders want
  • Incorporates prior information naturally
  • Flexible thresholds for different decisions
  • Doesn't control Type I error directly
  • Prior specification can be contentious
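For a normally distributed endpoint, predictive power has a closed form. The sketch below is one possible formulation, assuming a known σ, a normal prior on the treatment difference, equal allocation, and a final one-sided z-test with no group sequential adjustment. The function and parameter names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()

def predictive_power(est1, n1, n_final, sigma,
                     prior_mean=0.0, prior_sd=1.0, alpha=0.025):
    """Predictive probability that the final one-sided z-test rejects.

    est1: interim difference in means; n1, n_final: per-arm sample sizes.
    """
    n2 = n_final - n1
    se1_sq = 2 * sigma**2 / n1
    # Normal-normal conjugate update of the treatment effect
    post_var = 1 / (1 / prior_sd**2 + 1 / se1_sq)
    post_mean = post_var * (prior_mean / prior_sd**2 + est1 / se1_sq)
    # The final estimate (n1*est1 + n2*est2) / n_final must exceed this value
    crit = N01.inv_cdf(1 - alpha) * sqrt(2 * sigma**2 / n_final)
    est2_needed = (n_final * crit - n1 * est1) / n2
    # Predictive distribution of the second-stage estimate
    pred_sd = sqrt(post_var + 2 * sigma**2 / n2)
    return 1 - N01.cdf((est2_needed - post_mean) / pred_sd)

pp_promising = predictive_power(est1=0.4, n1=50, n_final=100, sigma=1.0)
pp_weak = predictive_power(est1=0.1, n1=50, n_final=100, sigma=1.0)
print(f"promising interim: {pp_promising:.2f}, weak interim: {pp_weak:.2f}")
```

The output is a probability rather than a stop/continue decision, so the threshold for acting on it is left to the decision-maker.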

Common pattern: Use GSD boundaries for the official stopping rules in your SAP/DMC charter, and use Bayesian PP in parallel to inform internal portfolio decisions. The DMC sees both; regulators see the frequentist analysis.

The Fundamental Difference

GSD asks:

“Is the evidence strong enough to stop now while controlling false positives?”

Bayesian PP asks:

“If we continue, what's the probability we'll declare success at the end?”

Dimension | GSD | Bayesian PP
Statistical framework | Frequentist | Bayesian
Output type | Binary (stop/continue) | Continuous (0-100%)
Controls Type I error? | Yes (guaranteed) | Indirectly (via threshold)
Incorporates prior? | No | Yes (explicitly)
Best for | Regulatory decisions | Portfolio/internal decisions
Can use together? | Yes: GSD for boundaries, Bayesian PP for futility assessment

4. CUPED vs. Bayesian (Variance Reduction vs. Interim Assessment)

These two methods rarely compete because they solve completely different problems. CUPED is an analysis technique that improves precision; Bayesian PP is a monitoring tool that projects future success. However, they interact in useful ways.

How They Interact

  1. CUPED reduces the standard error of your interim treatment effect estimate, making it more precise.
  2. More precise interim estimates lead to tighter posterior distributions in the Bayesian update.
  3. Tighter posteriors yield more decisive Bayesian PP values: closer to 0% or 100%, and less often in the “uncertain” zone.
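The first two steps can be illustrated with a normal-normal update. This is a minimal sketch under assumed numbers: the interim estimate of 0.30, its unadjusted standard error of 0.20, the prior N(0, 1), and ρ = 0.6 are all illustrative.

```python
from math import sqrt

def posterior(est, se, prior_mean=0.0, prior_sd=1.0):
    """Posterior mean and SD for a normal prior and a normal estimate."""
    precision = 1 / prior_sd**2 + 1 / se**2
    mean = (prior_mean / prior_sd**2 + est / se**2) / precision
    return mean, sqrt(1 / precision)

rho = 0.6
se_raw = 0.20
# CUPED scales the variance by (1 - rho^2), so the SE by its square root
se_cuped = se_raw * sqrt(1 - rho**2)

raw_mean, raw_sd = posterior(0.30, se_raw)
cuped_mean, cuped_sd = posterior(0.30, se_cuped)
print(f"posterior SD without CUPED: {raw_sd:.3f}, with CUPED: {cuped_sd:.3f}")
```

The same point estimate produces a narrower posterior when it comes from the CUPED-adjusted analysis, and that narrower posterior is what feeds the predictive power calculation.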

CUPED is about...

Making your current data more informative by removing noise attributable to baseline differences.

Bayesian PP is about...

Projecting future success probability given what you've observed so far and what you believed beforehand.

5. Combined Workflows

For sophisticated trial designs, these methods work together. Here are common integration patterns used in industry and clinical research.

Pattern A: Full Integration (Large Phase III)

CUPED + GSD + Bayesian PP
  1. Use GSD to design overall sample size and stopping boundaries
  2. Apply CUPED adjustment factor to reduce maximum N
  3. At each interim: GSD for regulatory stopping, Bayesian PP for internal Go/No-Go
  4. CUPED-adjusted estimates feed into both analyses

Pattern B: Efficiency Crosshair (Tech A/B Testing)

CUPED only

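Most tech experiments don't need interim stopping because they run for fixed durations (1-2 weeks). CUPED alone can provide 25-50% variance reduction using pre-experiment user behavior. Because the reduction equals ρ², that range corresponds to a covariate correlation of roughly 0.5 to 0.7.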

When to add GSD: Very long experiments (>4 weeks) or high-stakes launches where early stopping saves significant resources.

Pattern C: Portfolio Decision (Phase II/III Transition)

Bayesian PP + GSD (optional)

At end of Phase II, use Bayesian PP to inform Go/No-Go for Phase III investment. Prior from Phase II data; likelihood from anticipated Phase III design.
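For this design-stage use there is no Phase III data yet, so the calculation averages the Phase III power over the Phase II-based prior. The sketch below assumes a normal prior on the treatment difference, known σ, and a one-sided z-test. The prior N(0.3, 0.15²) and the sample size are illustrative values, not recommendations.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()

def probability_of_success(prior_mean, prior_sd, n_per_arm, sigma, alpha=0.025):
    """Prior-averaged probability that the Phase III z-test rejects."""
    se = sqrt(2 * sigma**2 / n_per_arm)
    crit = N01.inv_cdf(1 - alpha) * se
    # Marginally, the Phase III estimate is N(prior_mean, prior_sd^2 + se^2)
    return 1 - N01.cdf((crit - prior_mean) / sqrt(prior_sd**2 + se**2))

def classical_power(effect, n_per_arm, sigma, alpha=0.025):
    """Power of the same test if the effect were known exactly."""
    se = sqrt(2 * sigma**2 / n_per_arm)
    return 1 - N01.cdf(N01.inv_cdf(1 - alpha) - effect / se)

pos = probability_of_success(prior_mean=0.3, prior_sd=0.15, n_per_arm=200, sigma=1.0)
pwr = classical_power(effect=0.3, n_per_arm=200, sigma=1.0)
print(f"probability of success: {pos:.2f}, power at prior mean: {pwr:.2f}")
```

When the design is well powered at the prior mean, the prior-averaged probability is lower than the classical power because it accounts for the chance that the true effect is smaller than hoped.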

When to add CUPED: If Phase III will collect baseline biomarkers or patient history that correlate with the primary endpoint.

6. Decision Matrix

Use this matrix to quickly identify which methods apply to your situation.

Scenario | CUPED | GSD | Bayesian PP
Short A/B test with baseline data
Phase III with planned interims | CUPED: if baseline exists
Phase II Go/No-Go decision
Fixed-sample RCT, no interims
Adaptive dose-finding study | GSD: complex*
Long-term outcomes (5+ years) | Limited value

* Adaptive designs require specialized GSD extensions beyond standard spending functions.

7. Frequently Asked Questions

Can I use CUPED with a group sequential design?

Yes, and you should. CUPED reduces the variance of your test statistic at each interim analysis, which means you need fewer subjects to reach the same information fraction. Apply the CUPED variance reduction factor when sizing your GSD. The stopping boundaries remain unchanged—they're based on the information fraction, not raw sample size.

Should I use conditional power or Bayesian predictive power?

Bayesian PP is generally preferred because it averages over uncertainty in the true effect, rather than assuming a single value. Conditional power requires you to specify “the effect we assume going forward”—typically the observed effect or the design assumption—which can mislead when the observed effect is noisy. Bayesian PP naturally accounts for this uncertainty through the posterior distribution.
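The sensitivity of conditional power to the assumed effect is easy to demonstrate. The sketch below assumes a known σ, equal allocation, and a final one-sided z-test. The interim numbers and the design effect of 0.4 are illustrative.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()

def conditional_power(est1, n1, n_final, sigma, assumed_effect, alpha=0.025):
    """Probability the final z-test rejects if the true effect equals assumed_effect."""
    n2 = n_final - n1
    crit = N01.inv_cdf(1 - alpha) * sqrt(2 * sigma**2 / n_final)
    # Second-stage estimate needed for the pooled estimate to reach crit
    est2_needed = (n_final * crit - n1 * est1) / n2
    se2 = sqrt(2 * sigma**2 / n2)
    return 1 - N01.cdf((est2_needed - assumed_effect) / se2)

# Same interim data, two different assumptions about the future effect
cp_observed = conditional_power(0.2, 50, 100, 1.0, assumed_effect=0.2)
cp_design = conditional_power(0.2, 50, 100, 1.0, assumed_effect=0.4)
print(f"CP at observed effect: {cp_observed:.2f}, CP at design effect: {cp_design:.2f}")
```

The two answers differ substantially for identical interim data, which is the ambiguity that averaging over a posterior is meant to remove.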

What if I don't have pre-experiment data for CUPED?

Then CUPED doesn't apply. Consider whether you can collect baseline measurements before randomization (e.g., a run-in period). If not, focus on GSD for efficiency gains through early stopping. For tech experiments, even one week of pre-experiment data can yield meaningful variance reduction if the metric is stable.

Do regulators accept Bayesian methods for primary analysis?

Increasingly, yes—but with caveats. FDA has approved Bayesian designs for medical devices and certain drug indications. For Phase III confirmatory trials, most sponsors still use frequentist GSD as the primary analysis and Bayesian PP as a supportive/internal metric. The key is pre-specifying everything: prior, likelihood, decision thresholds.

How do I choose between O'Brien-Fleming and Pocock for GSD?

O'Brien-Fleming is the default for most confirmatory trials. It is conservative at early looks (hard to stop), so the final critical value stays close to that of a fixed design and little power is lost. Pocock uses the same critical value at every look, which makes early stopping easier but requires a stricter final threshold and a larger maximum sample size for the same power. Use Pocock only when early stopping has exceptional value: pandemic response, diseases with rapid progression, or when the ethical imperative to stop is paramount.
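The difference is visible in how each approach spends alpha over time. The sketch below uses commonly quoted Lan-DeMets spending-function forms for the two shapes. Conventions for one-sided versus two-sided alpha vary between software packages, so treat the exact values as illustrative and check your tool's documentation.

```python
from math import e, log, sqrt
from statistics import NormalDist

N01 = NormalDist()

def obf_type_spend(t, alpha=0.05):
    """O'Brien-Fleming-type cumulative alpha spent at information fraction t."""
    return 2 * (1 - N01.cdf(N01.inv_cdf(1 - alpha / 2) / sqrt(t)))

def pocock_type_spend(t, alpha=0.05):
    """Pocock-type cumulative alpha spent at information fraction t."""
    return alpha * log(1 + (e - 1) * t)

for t in (0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  OBF-type={obf_type_spend(t):.5f}  Pocock-type={pocock_type_spend(t):.5f}")
```

Both functions spend the full alpha by the final analysis, but the O'Brien-Fleming-type curve spends almost nothing early while the Pocock-type curve spends a large share at the first looks.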

Can these methods be used for non-inferiority trials?

Yes. CUPED works identically—variance reduction applies regardless of hypothesis type. GSD for non-inferiority uses similar spending functions but with the test statistic reframed around the non-inferiority margin. Bayesian PP would compute the probability of concluding non-inferiority at the end. The principles transfer directly.
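As a small illustration of that reframing, the sketch below shifts the test statistic by the margin. It assumes higher values are better, a positive margin, and a one-sided test. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist

def noninferiority_z(est_diff, se, margin):
    """z-statistic for H0: (treatment - control) <= -margin."""
    return (est_diff + margin) / se

z_crit = NormalDist().inv_cdf(1 - 0.025)
# Treatment looks slightly worse, but well within a margin of 0.2
z_stat = noninferiority_z(est_diff=-0.05, se=0.05, margin=0.2)
non_inferior = z_stat > z_crit
print(f"z = {z_stat:.2f}, non-inferior: {non_inferior}")
```

A CUPED-adjusted analysis would shrink `se` in this statistic, and a GSD would replace the fixed `z_crit` with the boundary for the current look.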
