A Complete Guide to CUPED
A comprehensive overview of CUPED (Controlled-experiment Using Pre-experiment Data), a variance reduction technique used to increase the sensitivity and trustworthiness of controlled experiments.
I. Definition and Core Concept
CUPED is a statistical method designed to improve the power of an experiment—the probability of detecting a true treatment effect when it exists. It does this by using data collected before the experiment begins (pre-experiment data) to reduce the variance of the metrics analyzed during the experiment.
In online experimentation, the effects we care about are often small, frequently in the 0.1%–2% range. These signals are easy to miss because they are buried in random variation. The required sample size grows with the square of the outcome's standard deviation, that is, in direct proportion to its variance, so any reduction in variance translates one-for-one into fewer users or a shorter duration needed to reach a decision.
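To make the sample-size relationship concrete, here is a minimal sketch using the standard normal-approximation formula for comparing two means. The function name, the default significance level and power, and the example numbers are illustrative assumptions, not values taken from this guide.

```python
from statistics import NormalDist


def required_n_per_arm(sigma: float, delta: float,
                       alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate users per arm for a two-sided, two-sample z-test.

    sigma: standard deviation of the per-user metric (assumed equal in both arms)
    delta: absolute difference in means we want to detect
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2


# Hypothetical metric: standard deviation 10, target effect 0.1.
baseline = required_n_per_arm(sigma=10.0, delta=0.1)

# Halving the variance is the same as dividing sigma by sqrt(2).
reduced = required_n_per_arm(sigma=10.0 / 2 ** 0.5, delta=0.1)

ratio = reduced / baseline  # half the variance needs half the users
print(f"baseline n per arm: {baseline:,.0f}, after halving variance: {reduced:,.0f}")
```

Because `sigma` enters the formula squared, the required sample size is proportional to the variance, which is why variance reduction converts directly into traffic or time saved.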
Analogy
Using CUPED is like noise-canceling headphones for your data. Pre-experiment behavior identifies the steady background hum of a user's normal activity. By subtracting that hum, the treatment signal becomes much easier to hear.
II. The Mechanism: Variance Reduction
CUPED works by explaining away part of the variability in the primary outcome metric ($Y$) using a baseline covariate ($X$), typically the same metric measured during a period immediately preceding the experiment.
Correlation is key
The effectiveness of CUPED depends on the correlation ($\rho$) between the pre-experiment data and the experiment-period data.
Adjusted metric
CUPED constructs an adjusted version of the outcome ($\tilde{Y}$) by removing the component of $Y$ that can be predicted from baseline behavior.
Efficiency gain
The variance of the adjusted metric is approximately $(1 - \rho^2)$ times the original variance.
Building Intuition
| Correlation ($\rho$) | Variance Reduction | Interpretation |
|---|---|---|
| 0.5 | ~25% | Moderate baseline stability |
| 0.8 | ~64% | High baseline stability |
This reduction directly lowers the standard error, yielding tighter confidence intervals and higher statistical power for the same sample size.
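The figures in the table can be checked with a small simulation. The sketch below is an illustration on synthetic Gaussian data, not real experiment data: it generates a pre-period value and an outcome with a chosen correlation, applies the regression adjustment, and compares the resulting variance ratio with one minus the squared correlation. The function name, sample size, and seed are assumptions made for the example.

```python
import random
from statistics import fmean, variance


def simulate_variance_ratio(rho: float, n: int = 50_000, seed: int = 0) -> float:
    """Return Var(adjusted outcome) / Var(outcome) for synthetic data with corr = rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Build y so that corr(x, y) equals rho in expectation.
        y = rho * x + (1 - rho ** 2) ** 0.5 * noise
        xs.append(x)
        ys.append(y)
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    b = cov_xy / variance(xs)  # regression weight on the baseline covariate
    adjusted = [y - b * (x - mean_x) for x, y in zip(xs, ys)]
    return variance(adjusted) / variance(ys)


for rho in (0.5, 0.8):
    ratio = simulate_variance_ratio(rho)
    print(f"rho={rho}: simulated ratio={ratio:.3f}, theory={1 - rho ** 2:.3f}")
```

The simulated ratio should land close to the theoretical value, with small deviations due to sampling noise.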
III. CUPED vs. Traditional Baseline Adjustments
CUPED is best understood as a large-scale application of ANCOVA (Analysis of Covariance). It improves upon simpler baseline-adjustment approaches such as difference scores.
Difference Scores ($Y - X$)
This approach subtracts the baseline value $X$ from the outcome $Y$. When $X$ and $Y$ have the same variance $\sigma^2$ and correlation $\rho$, its variance is:
$\mathrm{Var}(Y - X) = 2\sigma^2(1 - \rho)$
Difference scores are therefore more efficient than a simple post-test mean only when the correlation between $X$ and $Y$ exceeds 0.5. Below that threshold, they actually increase noise.
CUPED / ANCOVA
CUPED uses a regression-based adjustment that assigns the optimal weight to the baseline covariate.
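Key Insight: Because the regression weight is chosen to minimize variance, the CUPED-adjusted metric is never noisier (up to estimation error in the weight) than either the raw outcome or the difference score. The two adjustments coincide only when the optimal weight happens to equal 1.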
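The "optimal weight" can be made explicit with a short derivation. This is a sketch that treats the weight $b$ as a fixed constant, writes $\rho$ for the correlation between the baseline $X$ and the outcome $Y$, and, for the difference-score comparison in the last two lines, assumes $X$ and $Y$ have equal variance.

```latex
\begin{aligned}
\mathrm{Var}(Y - bX) &= \mathrm{Var}(Y) - 2b\,\mathrm{Cov}(X, Y) + b^2\,\mathrm{Var}(X) \\
\frac{d}{db}\,\mathrm{Var}(Y - bX) = 0
  \;&\Longrightarrow\; b^{*} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} \\
\mathrm{Var}(Y - b^{*}X) &= \mathrm{Var}(Y)\,(1 - \rho^2) \\
\mathrm{Var}(Y - X) &= 2\,\mathrm{Var}(Y)\,(1 - \rho)
  \qquad (b = 1,\ \mathrm{Var}(X) = \mathrm{Var}(Y)) \\
2(1 - \rho) - (1 - \rho^2) &= (1 - \rho)^2 \;\ge\; 0
\end{aligned}
```

The last line shows that, under these assumptions, the difference score is never less noisy than the regression-based adjustment, and the two match only when $\rho = 1$.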
IV. How to Implement CUPED
A practical CUPED strategy typically follows these steps:
Define the Overall Evaluation Criterion (OEC)
Identify the primary metric the experiment is designed to move (e.g., revenue per user, sessions per user).
Collect pre-experiment covariates
Gather historical data for the same metric at the same unit of analysis (usually users) prior to experiment start.
Ensure unit consistency
The randomization unit must be identical in the pre-period and experiment period.
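Estimate the adjustment coefficient ($b$)
Compute:
$b = \mathrm{Cov}(X, Y) / \mathrm{Var}(X)$
Estimate it from historical data or from data pooled across all experiment arms, so that a single coefficient is applied to every unit.
Compute adjusted outcomes
For each experimental unit $i$:
$\tilde{Y}_i = Y_i - b\,(X_i - \bar{X})$
where $\bar{X}$ is the mean of $X$ over all units.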
Run the experiment analysis
Perform a standard two-sample t-test (or regression) on the adjusted outcomes.
Because CUPED reduces variance, the resulting test will have a smaller standard error, narrower confidence intervals, and more sensitivity to small effects.
Minimal Pseudocode
    # X: pre-period metric
    # Y: experiment-period metric
    b = cov(X, Y) / var(X)
    Y_tilde = Y - b * (X - mean(X))
    # Use Y_tilde in your standard A/B test
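For readers who want something runnable, the pseudocode can be expanded into a standard-library Python sketch. Everything below is illustrative: the data are synthetic, the helper names `cuped_adjust` and `z_test` are hypothetical, every unit is assumed to have a pre-period value, and the test uses a large-sample normal approximation in place of a t-test.

```python
import random
from statistics import NormalDist, fmean, variance


def cuped_adjust(pre, post):
    """Return CUPED-adjusted outcomes and the coefficient b, pooled over all units."""
    mean_pre, mean_post = fmean(pre), fmean(post)
    cov = sum((x - mean_pre) * (y - mean_post)
              for x, y in zip(pre, post)) / (len(pre) - 1)
    b = cov / variance(pre)
    return [y - b * (x - mean_pre) for x, y in zip(pre, post)], b


def z_test(control, treatment):
    """Difference in means, its standard error, and a two-sided p-value."""
    diff = fmean(treatment) - fmean(control)
    se = (variance(control) / len(control)
          + variance(treatment) / len(treatment)) ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return diff, se, p_value


# Synthetic data (hypothetical): 20,000 users, true lift of 0.05 in treatment.
rng = random.Random(42)
n = 20_000
pre = [rng.gauss(10.0, 2.0) for _ in range(n)]
is_treated = [i % 2 == 1 for i in range(n)]
post = [0.8 * x + rng.gauss(2.0, 1.2) + (0.05 if t else 0.0)
        for x, t in zip(pre, is_treated)]

adjusted, b = cuped_adjust(pre, post)

raw_diff, raw_se, raw_p = z_test(
    [y for y, t in zip(post, is_treated) if not t],
    [y for y, t in zip(post, is_treated) if t],
)
adj_diff, adj_se, adj_p = z_test(
    [y for y, t in zip(adjusted, is_treated) if not t],
    [y for y, t in zip(adjusted, is_treated) if t],
)
print(f"b={b:.3f}")
print(f"raw:   diff={raw_diff:.4f} se={raw_se:.4f} p={raw_p:.4f}")
print(f"cuped: diff={adj_diff:.4f} se={adj_se:.4f} p={adj_p:.4f}")
```

The coefficient is estimated once on all units and applied identically to both arms, so the adjustment changes the noise around the difference in means without favoring either group.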
V. When Not to Use CUPED
CUPED is powerful, but it is not universally helpful. Situations where it may offer little or no benefit include:
Low pre/post correlation
If user behavior is highly volatile or the metric has weak temporal stability, variance reduction will be minimal.
Mismatched units
If the pre-period data cannot be reliably joined to experiment units, the adjustment may introduce bias or noise.
Metrics with structural breaks
If the experiment fundamentally changes how a metric is generated (e.g., redefining the metric itself), pre-period values may no longer be predictive.
Very short pre-periods
Insufficient historical data can lead to unstable coefficient estimates.
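Note: CUPED applies a linear adjustment. Because the covariate is measured before treatment, the adjustment does not bias the treatment-effect estimate even when the true relationship is nonlinear, but a poor linear fit yields less variance reduction. Check the fit rather than assuming it.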
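One way to act on these caveats is to measure the pre/post correlation before committing to the adjustment. The sketch below estimates the share of outcome variance the covariate can explain (the squared correlation) and compares it with a threshold. The function names and the 5% default threshold are hypothetical choices for illustration, not recommendations from this guide.

```python
from statistics import fmean, variance


def expected_variance_reduction(pre, post) -> float:
    """Estimate the squared correlation: the share of outcome variance CUPED could remove."""
    mean_pre, mean_post = fmean(pre), fmean(post)
    cov = sum((x - mean_pre) * (y - mean_post)
              for x, y in zip(pre, post)) / (len(pre) - 1)
    var_pre, var_post = variance(pre), variance(post)
    if var_pre == 0 or var_post == 0:
        return 0.0  # a constant covariate or outcome carries no usable signal
    return cov ** 2 / (var_pre * var_post)


def worth_applying(pre, post, min_reduction: float = 0.05) -> bool:
    """Hypothetical rule of thumb: apply CUPED only above a chosen threshold."""
    return expected_variance_reduction(pre, post) >= min_reduction


# Tiny hand-made examples (not real data).
stable_pre, stable_post = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
volatile_pre, volatile_post = [1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]

print(worth_applying(stable_pre, stable_post))      # perfectly linear relationship
print(worth_applying(volatile_pre, volatile_post))  # zero sample covariance
```

Running the same check on a past period, where no treatment was active, gives a preview of the reduction to expect before the experiment starts.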
VI. Strategic and Financial Benefits
Organizations that adopt CUPED consistently realize several advantages:
Higher sensitivity
Smaller, previously undetectable effects become measurable.
Shorter experiments
Decisions can be reached faster with the same level of confidence.
Lower opportunity cost
Fewer users remain in control while improvements are delayed.
Cost efficiency
Leveraging existing data is often far cheaper than acquiring additional experimental traffic.
In mature experimentation programs, CUPED is less a nice-to-have optimization and more a baseline expectation—an example of statistical rigor that compounds over dozens of experiments.
VII. Summary and Next Steps
This guide focuses on the why and how of CUPED, and it is intentionally opinionated about scope.
If you want to move from theory to application, the natural next step is to quantify how much variance reduction you can realistically expect.
Ready to estimate the sample size reduction for your experiment?
Use the CUPED Calculator to model different correlation scenarios and see how variance reduction translates into faster, more sensitive experiments.