A Complete Guide to Sample Size Re-estimation
Sample Size Re-estimation (SSR) is an adaptive design strategy that allows a clinical trial to adjust its sample size at an interim analysis—compensating for planning assumptions that turn out to be wrong. This guide covers when to use SSR, how to choose between blinded and unblinded approaches, and how to integrate SSR into your protocol and Statistical Analysis Plan.
Analogy: Insurance Against Planning Uncertainty
Designing a clinical trial is like budgeting a construction project. You estimate material costs (variance, event rates) and scope (effect size) based on pilot data or literature. But once construction begins, lumber prices may spike or the foundation may need reinforcement. SSR is your change-order clause—a pre-specified mechanism to adjust the budget mid-project without starting over, while keeping the building code (Type I error) intact.
I. When to Use Sample Size Re-estimation
Every sample size calculation depends on planning assumptions: the expected treatment effect, the variance of the outcome, the response rate, or the event rate. When these assumptions come from small pilot studies, literature reviews, or expert opinion, they carry substantial uncertainty. An underpowered trial wastes patients and resources; an overpowered one is inefficient.
SSR addresses this by allowing a pre-specified interim look at the accumulating data to revise the sample size. It is appropriate when:
Nuisance parameter uncertainty
The variance, response rate, or event rate used in planning is based on limited data and could be substantially wrong. This is the most common and least-controversial use case.
Effect size uncertainty
The treatment effect may be smaller than hoped but still clinically meaningful. If the interim data show a “promising” trend, you may want to increase the sample size to rescue the trial rather than let it fail because the planning assumptions were too optimistic.
Regulatory or operational flexibility
The protocol needs to accommodate real-world uncertainty while maintaining rigorous Type I error control—a requirement for regulatory submissions.
Key principle: SSR is not about “fishing” for significance. It is a pre-planned mechanism written into the protocol before the trial begins, with rules that preserve the validity of the final analysis.
II. Blinded vs. Unblinded: Choosing Your Approach
The fundamental decision is whether to look at treatment-specific data or only pooled data. This determines everything downstream: what you can adjust, what statistical machinery is required, and who needs to be involved.
| Aspect | Blinded SSR | Unblinded SSR |
|---|---|---|
| What you observe | Pooled data only (no treatment labels) | Per-arm data (treatment unblinded) |
| What you re-estimate | Nuisance parameters only (variance, pooled rate, event rate) | Treatment effect and nuisance parameters |
| When to increase N | Always (if nuisance parameter changed) | Only in the “promising zone” |
| Type I error control | Preserved by blinding—no special test needed | Requires inverse-normal combination test |
| DMC required? | No (sponsor can perform) | Yes (independent DMC) |
| Regulatory acceptance | Widely accepted (FDA, EMA) | Accepted with pre-specification; more scrutiny |
| Operational complexity | Low | High (DMC charter, firewalls) |
| Best for | Uncertain variance or event rate | Uncertain effect size with “rescue” intent |
Rule of thumb: If your main concern is that variance or event rates may be misspecified, use blinded SSR—it is simpler, requires no DMC, and has broad regulatory acceptance. If your concern is that the true treatment effect may be smaller than planned, use unblinded SSR—but be prepared for the operational overhead of a DMC and pre-specified decision rules.
Can you combine both? In principle, a trial could include both a blinded SSR (for nuisance parameters) and an unblinded SSR (for effect size) at different interim fractions. In practice, this is rare and adds complexity. Choose the approach that addresses your primary source of uncertainty.
III. Blinded SSR: How It Works
Blinded SSR re-estimates nuisance parameters (the quantities that affect power but are not the treatment effect itself) from pooled interim data, then recalculates the required sample size using the original effect size assumption.
The Algorithm
Compute the initial sample size
Using planning assumptions (effect size, variance/rate, alpha, power), calculate the target sample size with the standard formula for your endpoint type.
Enroll to the interim fraction
Collect data on N_interim = t × N0 subjects, where t is the pre-specified interim fraction (typically 0.5).
Estimate the nuisance parameter from pooled data
Without unblinding, compute the blinded estimate: pooled variance for continuous endpoints, pooled response rate for binary, or pooled event rate for survival.
Recalculate N using the updated nuisance parameter
Plug the observed nuisance parameter into the sample size formula while keeping the planned effect size fixed. The treatment effect assumption does not change—only the “noise” estimate.
Apply constraints
Enforce the protocol cap (N_max, e.g. 2 × N0) and the interim floor (N1 ≥ N_interim—you cannot un-enroll patients). For continuous/binary endpoints, enforce even parity; for survival, split by allocation ratio.
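The five steps above can be condensed into a few lines of base R. This is a minimal sketch, not the calculator's backend; the numbers are the planning assumptions from the continuous worked example in Section V (δ = 5, planned σ² = 100, observed σ² = 144):

```r
# Minimal sketch of the blinded SSR algorithm (continuous endpoint).
# Nothing here unblinds treatment: only the pooled variance is observed.
z_a <- qnorm(1 - 0.025)             # one-sided alpha
z_b <- qnorm(0.90)                  # target power
delta <- 5; s2_plan <- 100          # planning assumptions
N0 <- 2 * ceiling(2 * (z_a + z_b)^2 * s2_plan / delta^2)   # step 1: initial N
N_int <- ceiling(0.5 * N0)          # step 2: interim fraction t = 0.5
s2_obs <- 144                       # step 3: blinded pooled variance
N1 <- 2 * ceiling(2 * (z_a + z_b)^2 * s2_obs / delta^2)    # step 4: recalc
N1 <- min(max(N1, N_int), 2 * N0)   # step 5: interim floor and 2x cap
c(N0 = N0, N_interim = N_int, N1 = N1)   # 170, 85, 244
```

The effect size δ never changes between step 1 and step 4—only the variance estimate does.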
Why does blinding preserve Type I error?
Because the sponsor never sees treatment-specific outcomes, the sample size adjustment depends only on the overall variability of the data—not on whether one arm is doing better. The decision to increase N carries no information about the treatment effect, so the standard test statistic at the final analysis remains valid. This was formally established by Kieser & Friede (2003) and is explicitly acknowledged in FDA Guidance on Adaptive Designs (2019).
Note on bias: The blinded pooled variance includes a bias term of approximately δ²/4 (Kieser & Friede 2003) because it mixes two distributions with different means. For small effects relative to the variance, this bias is negligible. The calculator uses the blinded estimate directly, which is conservative (slightly overestimates N).
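The size of that bias is easy to check by simulation: with equal allocation, the blinded pooled variance converges to roughly σ² + δ²/4 rather than σ². A quick sketch, with values chosen to match the continuous worked example:

```r
# Simulation: the blinded pooled variance estimates sigma^2 + delta^2/4.
set.seed(1)
sigma2 <- 100; delta <- 5
n <- 5000                                  # per arm, large to expose the bias
est <- replicate(400, {
  y <- c(rnorm(n, 0, sqrt(sigma2)),        # control arm
         rnorm(n, delta, sqrt(sigma2)))    # treatment arm, labels discarded
  var(y)                                   # blinded: one pooled sample
})
mean(est)   # near sigma2 + delta^2/4 = 106.25, not sigma2 = 100
```

For δ = 5 and σ² = 100 the bias is 6.25 on a variance of 100—small, and in the conservative direction.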
IV. Unblinded SSR: The Promising Zone Approach
Unblinded SSR (Mehta & Pocock 2011) allows the sample size to increase when the interim treatment effect falls in a “promising zone”—large enough to suggest a real benefit, but not large enough for the trial to succeed at its originally planned N. The key challenge is maintaining Type I error control when the sample size depends on the unblinded treatment effect.
The Four Zones
At the interim analysis, compute the conditional power (CP)—the probability of achieving significance at the final analysis given the data observed so far. The CP determines which zone the trial is in:
Favorable Zone (CP ≥ 80%)
The trial is on track to succeed. Keep the original N.
Promising Zone (30% ≤ CP < 80%)
The effect is trending in the right direction but the trial is underpowered at the current N. Increase N using the fixed-design sample size formula under the observed effect at the target power (typically 90%).
Unfavorable Zone (10% ≤ CP < 30%)
The effect is weak. Increasing N would require an impractically large increase. Keep the original N.
Futility Zone (CP < 10%)
The data suggest the treatment is unlikely to work. Consider stopping for futility.
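The zone lookup is mechanical once CP is computed. A minimal R sketch (a hypothetical helper, using the standard conditional-power formula under the current trend—the same one as in the Section IX code—with `R` the ratio of final to interim sample size and the Mehta & Pocock default thresholds):

```r
# Conditional power and promising-zone classification (sketch).
# z1: interim z-statistic; R: ratio of final N to interim N.
cp_zone <- function(z1, R, alpha = 0.025) {
  cp <- pnorm(z1 * sqrt(R) - qnorm(1 - alpha) * sqrt(R - 1))
  zone <- if (cp >= 0.80) "favorable"
          else if (cp >= 0.30) "promising"
          else if (cp >= 0.10) "unfavorable"
          else "futility"
  list(cp = cp, zone = zone)
}
cp_zone(z1 = 1.5664, R = 2)   # CP ~ 0.60, "promising" (binary worked example)
```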
The Combination Test
When N depends on the observed treatment effect, the standard z-test at the final analysis is no longer valid—the distribution of the test statistic changes because the sample size was influenced by the interim data. The inverse-normal combination test resolves this by splitting the evidence into two independent stages:
Z_c = w1·Z1 + w2·Z2, where w1 = √t and w2 = √(1 − t) are pre-specified weights based on the interim information fraction t, and Z1 and Z2 are the z-statistics from Stage 1 and Stage 2 data respectively. Reject H0 if Z_c ≥ z_{1−α}.
Why it works: Because w1² + w2² = 1 and Stage 1 and Stage 2 data are independent, Z_c follows a standard normal distribution under H0 regardless of how N was modified between stages. The critical value remains z_{1−α} = 1.96 for one-sided α = 0.025—the same as a fixed design.
For survival endpoints: The combination test uses the event-based information fraction rather than the sample-based fraction, since events (not patients) drive the statistical information.
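Putting the pieces together, the final-analysis computation is only a few lines. A sketch (the stage z-statistics below are hypothetical values chosen for illustration):

```r
# Inverse-normal combination test (sketch).
combo_test <- function(z1, z2, t, alpha = 0.025) {
  w1 <- sqrt(t); w2 <- sqrt(1 - t)    # fixed weights: w1^2 + w2^2 = 1
  zc <- w1 * z1 + w2 * z2
  list(z_combined = zc, reject = zc >= qnorm(1 - alpha))
}
combo_test(z1 = 1.60, z2 = 1.90, t = 0.5)   # zc ~ 2.47 >= 1.96: reject
```

Note that the weights come from the planned information fraction t, not from the realized one—this is what keeps the test valid when N changes.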
V. Worked Examples
Blinded SSR: Continuous Endpoint
Scenario: A two-arm RCT targets a mean difference of δ = 5 units with planned variance σ² = 100, one-sided α = 0.025, and 90% power.
An SSR interim look is planned at 50% enrollment, with a maximum sample size cap of 2.0× the initial N (N_max = 340).
Step 1: Initial sample size
n = 85 per arm, total N0 = 170. Cap N_max = 340. Interim at N_interim = 85.
Step 2: Blinded variance estimate
At the interim, the pooled (blinded) variance across all 85 subjects is σ²_obs = 144—44% higher than planned.
Step 3: Recalculate N
Raw recalculated total N1_raw = 244 (122 per arm). After constraints: floor (244 ≥ 85) ✔, cap (244 ≤ 340) ✔. Final N1 = 244 (inflation factor 1.44×).
Outcome: Without SSR, the trial would have been underpowered (actual power ~75% given the true variance). The SSR adjustment restores 90% power by increasing enrollment from 170 to 244—without unblinding anyone.
Unblinded SSR: Binary Endpoint
Scenario: A Phase III trial compares a new therapy (planned p_t = 0.45) vs. control (p_c = 0.30) with one-sided α = 0.025 and 90% power.
An unblinded SSR is planned at 50% enrollment. Zone thresholds: futility CP < 10%, promising 30–80%, favorable ≥ 80%.
Step 1: Initial sample size
Using the Fleiss, Levin & Paik pooled-variance formula: n = 217 per arm, N0 = 434. Cap N_max = 868. Interim at N_interim = 217.
Step 2: Unblinded interim results
The DMC reports observed rates: p_t = 0.38, p_c = 0.28. The observed effect (0.10) is smaller than planned (0.15).
Step 3: Conditional power and zone
CP under the observed effect at the current N = 434 is approximately 60%. This falls in the promising zone (30% ≤ 60% < 80%).
Step 4: Re-estimate N
Using the fixed-design formula under the observed effect (δ_obs = 0.10), the calculator computes the N required for 90% power: N1_raw = 926 (463 per arm). After constraints: floor ✔, cap 926 > 868—cap binds, N1 = 868.
Step 5: Final analysis with combination test
At trial completion, Z1 and Z2 are computed from Stage 1 and Stage 2 data. The combination statistic Z_c = w1·Z1 + w2·Z2 (with w1 = w2 = √0.5) is compared to the critical value 1.96.
Outcome: The trial was “rescued” by the SSR: the true effect (10 pp) was clinically meaningful but smaller than planned (15 pp). Without SSR, the trial would have been underpowered and likely failed. With SSR, the increase from 434 to 868 subjects (cap-bound at 2×) restored adequate power while maintaining strict Type I error control via the combination test.
VI. Planning Workflow
SSR must be written into the protocol and SAP before the trial begins. Here is the typical sequence of decisions during protocol development:
Identify the source of uncertainty
Is the concern primarily about the nuisance parameter (variance, rate) or the treatment effect? This determines blinded vs. unblinded.
Choose the interim fraction
Typical range: 25–75% of planned enrollment. Common choices are 50% (balanced information) or 33% (earlier look, but noisier estimate). For survival endpoints, this is the fraction of planned events, not patients.
Set the maximum cap
The protocol must specify the maximum allowed increase (e.g., N_max = 2.0 × N0). This limits operational and financial risk. Regulatory agencies expect this cap to be pre-specified.
Pre-specify zone thresholds (unblinded only)
Define the CP thresholds that delineate the four zones: futility (< 10%), unfavorable (10–30%), promising (30–80%), favorable (≥ 80%). These are the defaults from Mehta & Pocock (2011) but can be adjusted.
Run sensitivity analysis
Use the calculator's sensitivity table to explore how N changes across a range of plausible nuisance parameters (blinded) or effect sizes (unblinded). This informs the choice of cap and verifies feasibility.
Document in protocol and SAP
Write the SSR procedure, interim fraction, cap, and decision rules into the protocol and SAP using precise statistical language (see Section VIII below).
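Step 5's sensitivity table can be sketched directly in base R for the blinded continuous case. The planned effect here matches the Section V example; the variance grid is illustrative:

```r
# Sensitivity table: required total N across a grid of plausible variances.
z_sum <- qnorm(1 - 0.025) + qnorm(0.90)      # one-sided alpha, 90% power
delta <- 5                                   # planned effect (illustrative)
sigma2_grid <- seq(80, 160, by = 20)         # plausible variance range
N_grid <- 2 * ceiling(2 * z_sum^2 * sigma2_grid / delta^2)
data.frame(sigma2 = sigma2_grid, N_total = N_grid)
#   sigma2 N_total
#       80     136
#      100     170
#      120     202
#      140     236
#      160     270
```

Reading the table end to end shows whether the cap is plausible: if the worst-case variance already pushes N past the cap, the cap—or the design—needs rethinking.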
Simulation validation: Both calculators support Monte Carlo simulation (10,000 trials by default). Use this to verify that the empirical Type I error is preserved under the SSR procedure and that the operating characteristics (power, expected N) match expectations.
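A minimal version of that check for blinded SSR with a continuous endpoint (a sketch of the idea, not the calculator's simulation engine, and using a simple z-test at the final analysis): simulate under H0 with a misspecified variance and confirm the rejection rate stays near α.

```r
# Monte Carlo check: empirical Type I error of blinded SSR (H0 true).
set.seed(42)
alpha <- 0.025; z_crit <- qnorm(1 - alpha); z_beta <- qnorm(0.90)
delta_plan <- 5; s2_plan <- 100
sd_true <- 12                                    # true SD differs from plan
n0 <- ceiling(2 * (z_crit + z_beta)^2 * s2_plan / delta_plan^2)  # per arm
reject <- replicate(10000, {
  n_int <- ceiling(0.5 * n0)                     # interim, per arm
  x1 <- rnorm(n_int, 0, sd_true)                 # H0: both arms mean 0
  y1 <- rnorm(n_int, 0, sd_true)
  s2_obs <- var(c(x1, y1))                       # blinded pooled variance
  n_new <- ceiling(2 * (z_crit + z_beta)^2 * s2_obs / delta_plan^2)
  n_arm <- min(max(n_new, n_int), 2 * n0)        # floor and 2x cap, per arm
  x <- c(x1, rnorm(n_arm - n_int, 0, sd_true))   # complete enrollment
  y <- c(y1, rnorm(n_arm - n_int, 0, sd_true))
  z <- (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / n_arm)
  z >= z_crit
})
mean(reject)   # should stay near alpha = 0.025
```

Because the SSR decision used only the blinded variance, the rejection rate sits at the nominal level even though N was data-driven.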
VII. When NOT to Use SSR
SSR is a powerful tool, but it is not appropriate in every situation:
Planning assumptions are well-established
If the variance, event rate, and effect size are well-characterized from large Phase II trials or meta-analyses, there is little to gain from SSR and the operational overhead is not justified.
The sample size is already capped by feasibility
If the maximum feasible enrollment is already reached (e.g., rare disease with a fixed patient pool), SSR cannot increase N beyond what is available.
GSD early stopping is the primary concern
If the goal is to stop early for efficacy or futility (not to increase N), use Group Sequential Design instead. SSR and GSD serve different purposes and can be combined, but SSR alone does not provide stopping boundaries.
Non-proportional hazards or complex censoring (survival)
The survival SSR assumes exponential event times and proportional hazards. If these assumptions are substantially violated (e.g., immunotherapy with delayed separation), external simulation tools may be needed.
No DMC and treatment effect is the concern
Unblinded SSR requires an independent DMC with appropriate firewalls. If your trial does not have a DMC (common in tech A/B tests or small academic trials), you are limited to blinded SSR.
VIII. Example SAP Language
The following templates can be adapted for your protocol or Statistical Analysis Plan. Replace bracketed values with your trial-specific parameters.
Blinded SSR
“A blinded sample size re-estimation will be conducted after [50%] of the planned [168] subjects have been enrolled and have completed the primary endpoint assessment. The blinded pooled [variance / response rate / event rate] will be estimated from all available data without unblinding treatment assignment.
The sample size will be recalculated using the observed [nuisance parameter] and the originally planned treatment effect of [δ = 5 units / 15 percentage points / HR = 0.70], maintaining the [one-sided α = 0.025] significance level and [90%] power.
The recalculated sample size will be subject to a maximum cap of [2.0×] the initial sample size ([336] subjects) and a minimum of the number of subjects already enrolled. This procedure preserves the Type I error rate as the sample size adjustment is based only on blinded aggregate data (Kieser & Friede, 2003; FDA Guidance on Adaptive Designs, 2019, Section IV.B.1).”
Unblinded SSR
“An unblinded sample size re-estimation based on the promising zone approach (Mehta & Pocock, 2011) will be conducted after [50%] of the planned [324] subjects have been enrolled. The independent Data Monitoring Committee (DMC) will review unblinded treatment-arm data and compute the conditional power (CP) under the observed treatment effect.
The conditional power will be used to classify the interim result into one of four zones: futility (CP < [10%]), unfavorable ([10%] ≤ CP < [30%]), promising ([30%] ≤ CP < [80%]), or favorable (CP ≥ [80%]). If the result falls in the promising zone, the total sample size will be recalculated using the fixed-design formula under the observed effect at [90%] power, subject to a maximum of [2.0×] the initial sample size ([648] subjects). In all other zones, the original sample size will be maintained.
The final analysis will employ the inverse-normal combination test with pre-specified weights [w1 = √0.5] and [w2 = √0.5], rejecting the null hypothesis if [Z_c = w1·Z1 + w2·Z2 ≥ 1.96]. This procedure controls the familywise Type I error rate at [2.5%] one-sided regardless of the sample size modification (Müller & Schäfer, 2001; Mehta & Pocock, 2011).”
IX. R Code
Standalone R implementations for verifying the calculator results. These use base R only (no special packages required).
Blinded SSR: Continuous Endpoint
# Blinded SSR for continuous endpoint
blinded_ssr_continuous <- function(
  delta,            # Planned mean difference
  sigma2_planned,   # Planned variance
  sigma2_obs,       # Observed blinded pooled variance
  alpha = 0.025,    # One-sided alpha
  power = 0.90,     # Target power
  interim_frac = 0.50,
  n_max_factor = 2.0
) {
  z_alpha <- qnorm(1 - alpha)
  z_beta <- qnorm(power)
  # Initial N
  n_per_arm_0 <- ceiling(2 * (z_alpha + z_beta)^2 * sigma2_planned / delta^2)
  N0 <- 2 * n_per_arm_0
  N_interim <- ceiling(interim_frac * N0)
  N_cap <- ceiling(N0 * n_max_factor)
  # Recalculate with observed variance
  n_per_arm_1 <- ceiling(2 * (z_alpha + z_beta)^2 * sigma2_obs / delta^2)
  N1_raw <- 2 * n_per_arm_1
  # Constrain: floor at interim, cap at N_max, even parity
  # Backend logic: cap rounds DOWN (N_cap %/% 2), uncapped rounds UP
  if (N1_raw < N_interim) {
    n_per_arm <- ceiling(N_interim / 2)   # floor binding: cannot un-enroll
  } else if (N1_raw > N_cap) {
    n_per_arm <- N_cap %/% 2              # cap binding: round DOWN
  } else {
    n_per_arm <- ceiling(N1_raw / 2)      # uncapped: round UP
  }
  N1 <- 2 * n_per_arm
  # Conditional power at the new N
  n1_per_arm <- N_interim / 2
  z_expected <- delta * sqrt(n1_per_arm / (2 * sigma2_obs))
  R <- N1 / N_interim
  CP <- pnorm(z_expected * sqrt(R) - z_alpha * sqrt(R - 1))
  list(
    initial_N = N0,
    interim_N = N_interim,
    new_N = N1,
    inflation = N1 / N0,
    cond_power = round(CP, 4),
    cap_binding = N1_raw > N_cap
  )
}
# Example: planned sigma2=100, observed sigma2=144
blinded_ssr_continuous(delta = 5, sigma2_planned = 100, sigma2_obs = 144)
# initial_N=170, new_N=244, inflation=1.44, cond_power~0.72

Unblinded SSR: Binary Endpoint (Promising Zone)
# Unblinded SSR for binary endpoint (Mehta & Pocock 2011)
unblinded_ssr_binary <- function(
  p_c_planned,      # Planned control rate
  p_t_planned,      # Planned treatment rate
  p_c_obs,          # Observed control rate at interim
  p_t_obs,          # Observed treatment rate at interim
  alpha = 0.025,
  power = 0.90,
  interim_frac = 0.50,
  n_max_factor = 2.0,
  cp_futility = 0.10,
  cp_promising_lower = 0.30,
  cp_promising_upper = 0.80
) {
  z_alpha <- qnorm(1 - alpha)
  z_beta <- qnorm(power)
  # Initial N (Fleiss-Levin-Paik)
  delta_plan <- abs(p_t_planned - p_c_planned)
  p_bar_plan <- (p_c_planned + p_t_planned) / 2
  n0 <- ceiling(
    ((z_alpha * sqrt(2 * p_bar_plan * (1 - p_bar_plan))
      + z_beta * sqrt(p_c_planned * (1 - p_c_planned)
                      + p_t_planned * (1 - p_t_planned)))
     / delta_plan)^2
  )
  N0 <- 2 * n0
  N_interim <- ceiling(interim_frac * N0)
  N_cap <- ceiling(N0 * n_max_factor)
  # Stage 1 z-statistic
  n1_per_arm <- N_interim / 2
  p_bar_obs <- (p_c_obs + p_t_obs) / 2
  delta_obs <- p_t_obs - p_c_obs
  SE1 <- sqrt(p_bar_obs * (1 - p_bar_obs) * 2 / n1_per_arm)
  z1 <- delta_obs / SE1
  # Conditional power under observed effect
  R <- N0 / N_interim
  CP <- pnorm(z1 * sqrt(R) - z_alpha * sqrt(R - 1))
  # Zone classification
  zone <- if (CP >= cp_promising_upper) "favorable"
          else if (CP >= cp_promising_lower) "promising"
          else if (CP >= cp_futility) "unfavorable"
          else "futility"
  # Re-estimate N if promising
  if (zone == "promising") {
    n1_new <- ceiling(
      ((z_alpha * sqrt(2 * p_bar_obs * (1 - p_bar_obs))
        + z_beta * sqrt(p_c_obs * (1 - p_c_obs) + p_t_obs * (1 - p_t_obs)))
       / abs(delta_obs))^2
    )
    N1_raw <- max(2 * n1_new, N_interim)
  } else {
    N1_raw <- N0
  }
  # Constrain: floor at interim, cap at N_max (matches backend rounding)
  if (N1_raw < N_interim) {
    n_per_arm <- ceiling(N_interim / 2)   # floor binding
  } else if (N1_raw > N_cap) {
    n_per_arm <- N_cap %/% 2              # cap binding: round DOWN
  } else {
    n_per_arm <- ceiling(N1_raw / 2)      # uncapped: round UP
  }
  N1 <- 2 * n_per_arm
  # Combination weights
  w1 <- sqrt(interim_frac)
  w2 <- sqrt(1 - interim_frac)
  list(
    initial_N = N0,
    interim_N = N_interim,
    z1 = round(z1, 4),
    cp_obs = round(CP, 4),
    zone = zone,
    new_N = N1,
    inflation = N1 / N0,
    w1 = round(w1, 4),
    w2 = round(w2, 4),
    z_crit = round(z_alpha, 4)
  )
}
# Example: planned 45% vs 30%, observed 38% vs 28%
unblinded_ssr_binary(
  p_c_planned = 0.30, p_t_planned = 0.45,
  p_c_obs = 0.28, p_t_obs = 0.38
)
# zone="promising", N0=434, N1=868 (cap-bound at 2x)

Survival SSR: Events and N Conversion
# Schoenfeld events and N conversion for survival SSR
survival_ssr_events <- function(
  HR,               # Planned hazard ratio
  median_control,   # Control median survival (months)
  accrual_time,     # Accrual period (months)
  follow_up_time,   # Follow-up after accrual (months)
  alpha = 0.025,
  power = 0.90,
  alloc_ratio = 1.0,
  dropout_rate = 0.0
) {
  z_alpha <- qnorm(1 - alpha)
  z_beta <- qnorm(power)
  r <- alloc_ratio
  # Required events (Schoenfeld)
  d <- ceiling((z_alpha + z_beta)^2 * (1 + r)^2 / (r * log(HR)^2))
  # Event probability (exponential model, uniform accrual)
  lambda_c <- log(2) / median_control
  lambda_t <- lambda_c * HR
  total_time <- accrual_time + follow_up_time
  p_c <- 1 - exp(-lambda_c * (total_time - accrual_time / 2))
  p_t <- 1 - exp(-lambda_t * (total_time - accrual_time / 2))
  # Apply dropout
  if (dropout_rate > 0) {
    years <- total_time / 12
    retention <- (1 - dropout_rate)^years
    p_c <- p_c * retention
    p_t <- p_t * retention
  }
  p_avg <- (p_c + r * p_t) / (1 + r)
  p_avg <- max(p_avg, 0.01)
  N <- ceiling(d / p_avg)
  n_control <- ceiling(N / (1 + r))
  n_treatment <- N - n_control
  list(
    events_required = d,
    p_event_control = round(p_c, 4),
    p_event_treatment = round(p_t, 4),
    p_event_avg = round(p_avg, 4),
    N_total = N,
    n_control = n_control,
    n_treatment = n_treatment
  )
}
# Example: HR=0.7, median control=12 months
survival_ssr_events(HR = 0.7, median_control = 12,
                    accrual_time = 24, follow_up_time = 12)
# events_required=331, p_avg~0.69, N_total=483

X. References
Mehta CR, Pocock SJ. Adaptive increase in sample size when interim results are promising: A practical guide with examples. Statistics in Medicine. 2011;30(28):3267–3284.
Kieser M, Friede T. Simple procedures for blinded sample size adjustment that do not affect the type I error rate. Statistics in Medicine. 2003;22(23):3571–3581.
Müller HH, Schäfer H. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. Biometrics. 2001;57(3):886–891.
Schoenfeld D. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika. 1981;68(1):316–319.
Gould AL. Interim analyses for monitoring clinical trials that do not materially affect the type I error rate. Statistics in Medicine. 1992;11(1):55–66.
Friede T, et al. Blinded sample size re-estimation in event-driven clinical trials. Pharmaceutical Statistics. 2019;18(5):578–588.
Chen YHJ, DeMets DL, Lan KKG. Increasing the sample size when the unblinded interim result is promising. Statistics in Medicine. 2004;23(7):1023–1038.
FDA. Adaptive Designs for Clinical Trials of Drugs and Biologics: Guidance for Industry. 2019.
Ready to Calculate?
Use the SSR calculators to compute recalculated sample sizes, conditional power, zone classification, and sensitivity tables.
Related Documentation
Blinded SSR Technical Reference
Full mathematical derivations, API specification, and validation benchmarks.
Unblinded SSR Technical Reference
Combination test theory, zone classification, and API specification.
Complete Guide to GSD
Stopping boundaries and interim analysis planning—complementary to SSR.