Unblinded Sample Size Re-estimation (SSR)
Technical documentation for unblinded interim sample size re-estimation using the promising zone approach. This page covers the Mehta & Pocock (2011) framework, the inverse-normal combination test, zone classification, conditional power analysis, regulatory alignment, and Monte Carlo validation.
1. Overview & Motivation
While blinded SSR re-estimates nuisance parameters (variance, pooled rate), unblinded SSR goes further: it uses the observed treatment effect from unblinded interim data to decide whether to increase the sample size.
This is particularly valuable when the treatment effect was overestimated at planning. A trial designed with 90% power under an optimistic effect may have much lower conditional power at interim if the observed effect is smaller. Unblinded SSR can rescue such trials by increasing the sample size in the “promising zone”—where conditional power is too low to succeed at the current N but the effect is real enough to justify continued enrollment.
Key trade-off: Unblinded SSR provides more information for the re-estimation decision but requires the inverse-normal combination test to control Type I error. The FDA requires pre-specification of the SSR rule and simulation evidence of Type I error control (FDA Guidance 2019, Section IV.B).
2. Promising Zone Framework
Mehta & Pocock (2011) proposed a four-zone decision framework based on the conditional power (CP) at interim, computed under the current sample size plan:
Favorable Zone: CP ≥ cp_promising_upper (default 0.8)
The trial is already on track. Keep the current sample size.
Promising Zone: cp_promising_lower ≤ CP < cp_promising_upper (defaults 0.3–0.8)
Increase N to achieve target power under the observed effect. This is where SSR has the most value.
Unfavorable Zone: cp_futility ≤ CP < cp_promising_lower (defaults 0.1–0.3)
Effect too small for reasonable N increase. Continue without change; consider futility stopping if pre-specified.
Futility Zone: CP < cp_futility (default 0.1)
Very low likelihood of success. Stopping for futility is recommended.
Threshold selection: The default thresholds (0.1, 0.3, 0.8) follow Mehta & Pocock (2011). The promising zone boundaries should be pre-specified in the protocol. Narrowing the promising zone reduces the frequency of sample size increases but may miss opportunities to rescue underpowered trials.
3. Inverse-Normal Combination Test
When the sample size is adjusted based on unblinded interim data, the standard z-test at the final analysis may inflate the Type I error. The inverse-normal combination test (Lehmacher & Wassmer 1999; Müller & Schäfer 2001) provides rigorous Type I error control regardless of the re-estimation rule.
The Method
The trial is divided into two stages at the interim. Independent z-statistics Z₁ (stage 1) and Z₂ (stage 2) are computed from the data in each stage. The combined test statistic is:

Z = w₁·Z₁ + w₂·Z₂

where the weights are determined by the information fraction t at interim:

w₁ = √t,  w₂ = √(1 − t)

Under H₀, Z₁ ~ N(0, 1) and Z₂ ~ N(0, 1) independently, so Z ~ N(0, 1) because w₁² + w₂² = 1. The critical value is simply z_{1−α}: reject H₀ if Z ≥ z_{1−α}.

Key insight: The combination test controls Type I error at exactly α regardless of the sample size modification rule, because Z₁ and Z₂ are computed from non-overlapping data subsets and the weights are fixed before unblinding. The SSR decision (whether and how much to increase N) can depend on any function of Z₁ without affecting the null distribution of Z.

Survival note: For survival endpoints, the information fraction is based on events (not patients): t = d₁ / D, where d₁ is the number of events observed at interim and D is the planned total. The combination test weights therefore reflect the event-based information fraction, ensuring proper weighting of stage-wise logrank statistics.
Example Weights
| Interim fraction t | w₁ = √t | w₂ = √(1 − t) |
|---|---|---|
| 0.25 | 0.500 | 0.866 |
| 0.50 | 0.707 | 0.707 |
| 0.75 | 0.866 | 0.500 |
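The table above can be reproduced with a few lines. This is an illustrative sketch; the function names are ours, not part of the API:

```python
import math

def combination_weights(t: float) -> tuple[float, float]:
    """Weights fixed by the pre-specified information fraction t."""
    return math.sqrt(t), math.sqrt(1.0 - t)

def combined_z(z1: float, z2: float, t: float) -> float:
    """Z = w1*Z1 + w2*Z2; standard normal under H0 since w1^2 + w2^2 = 1."""
    w1, w2 = combination_weights(t)
    return w1 * z1 + w2 * z2

for t in (0.25, 0.50, 0.75):
    w1, w2 = combination_weights(t)
    print(f"t={t:.2f}  w1={w1:.3f}  w2={w2:.3f}")
```

Because the weights depend only on the pre-specified t, they can be documented in the SAP before any unblinding occurs.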
4. Re-estimation Algorithm
The unblinded SSR procedure follows these steps:
Initial Design
Compute the initial per-arm sample size N_initial using the planned effect size and nuisance parameters. For survival, compute the required events D via the Schoenfeld formula, then N_initial = D / P(event).
Stage 1 Analysis (Unblinded)
At interim, compute the unblinded stage 1 z-statistic Z₁ and the observed treatment effect δ̂. For survival, Z₁ is the logrank z-statistic and the observed HR is computed from unblinded interim data.
Conditional Power
Compute CP under the current N:

CP = 1 − Φ( (z_{1−α} − w₁·Z₁) / w₂ − E[Z₂] )

where E[Z₂] is the expected stage 2 z-statistic under the observed effect (for a continuous endpoint, (δ̂/σ̂)·√(n₂/2) with n₂ patients per arm remaining in stage 2).
Zone Classification
Classify CP into one of four zones (favorable, promising, unfavorable, futility) using pre-specified thresholds.
Sample Size Decision
If promising: compute the N_new needed to achieve target power under the observed effect (fixed-design formula). Otherwise: keep N_initial. For survival, recompute the required events from the observed HR:

D_new = ((1 + r)² / r) · (z_{1−α} + z_{power})² / (log HR_obs)²

then N_new = D_new / P(event) using the planned event probability.
Constrain
Apply the interim floor (N_new cannot fall below the number already enrolled at interim) and the protocol cap (N_new ≤ n_max_factor × N_initial). For continuous/binary, enforce even parity. For survival, split N_new by allocation ratio into per-arm sizes.
Final Analysis
Compute Z₂ from stage 2 data and evaluate Z = w₁·Z₁ + w₂·Z₂ ≥ z_{1−α}.
Re-estimation formula: In the promising zone, the required N is computed using the fixed-design sample size formula under the observed effect (Mehta & Pocock 2011). This replaces the naïve approach of inverting the conditional power formula, which is non-monotonic in N when Z₁ is small and would otherwise drive N_new to the upper bound.
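As a concrete illustration, steps 3–6 can be sketched for a continuous endpoint under 1:1 allocation. This is a minimal sketch under the stated assumptions; the helper names are illustrative, not the engine's actual API:

```python
import math
from statistics import NormalDist

norm = NormalDist()

def fixed_design_n(delta: float, var: float, alpha: float, power: float) -> int:
    """Per-arm N for a two-sample z-test (fixed design) under a given effect."""
    z_a, z_b = norm.inv_cdf(1 - alpha), norm.inv_cdf(power)
    return math.ceil(2 * var * (z_a + z_b) ** 2 / delta ** 2)

def conditional_power(z1, t, n2_per_arm, delta_obs, var_obs, alpha):
    """CP at the current plan, given stage 1 Z1 and the observed effect."""
    w1, w2 = math.sqrt(t), math.sqrt(1 - t)
    drift = delta_obs / math.sqrt(var_obs) * math.sqrt(n2_per_arm / 2)  # E[Z2]
    return 1 - norm.cdf((norm.inv_cdf(1 - alpha) - w1 * z1) / w2 - drift)

def reestimate(z1, delta_obs, var_obs, n_init, t, alpha=0.025, power=0.90,
               n_max_factor=2.0, zones=(0.1, 0.3, 0.8)):
    """Steps 3-6: conditional power, zone, sample size decision, constraints."""
    n1 = round(t * n_init)
    cp = conditional_power(z1, t, n_init - n1, delta_obs, var_obs, alpha)
    futility, promising_lo, promising_hi = zones
    if promising_lo <= cp < promising_hi:
        # promising: fixed-design N under the observed effect, floored and capped
        n_new = fixed_design_n(delta_obs, var_obs, alpha, power)
        n_new = min(max(n_new, n_init), math.ceil(n_max_factor * n_init))
    else:
        n_new = n_init  # favorable / unfavorable / futility: keep current N
    return cp, n_new
```

For example, a planned δ = 0.3 with σ² = 1 gives N_initial = 234 per arm at α = 0.025 and 90% power; an interim Z₁ = 1.3 at t = 0.5 with observed δ̂ = 0.2 lands in the promising zone, and the recalculated N is truncated at the 2× cap (468 per arm).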
5. Conditional Power & Zones
Conditional power is computed under two scenarios: the observed effect (used for zone classification) and the planned effect (for comparison). The formula is the same as for blinded SSR:

CP = 1 − Φ( (z_{1−α} − w₁·Z₁) / w₂ − E[Z₂ | δ] )

evaluated at δ = δ̂ (observed) and at δ = δ_planned.
The stage 1 z-statistic Z₁ differs from the blinded case because it uses unblinded per-arm estimates:
Continuous

Z₁ = (x̄_T − x̄_C) / √(σ̂² · 2/n₁)

where σ̂² is the pooled variance from both arms at interim and n₁ is the per-arm interim sample size.
Binary

Z₁ = (p̂_T − p̂_C) / √(p̄(1 − p̄) · 2/n₁)

where p̄ is the pooled proportion from both arms.
Survival
Z₁ is the logrank z-statistic computed from unblinded interim survival data (treatment vs. control).
Survival note: For survival endpoints, the information ratio uses events: t = d₁ / D (not n₁ / N).
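The continuous and binary stage 1 statistics above reduce to a few lines under 1:1 allocation (n₁ is the per-arm interim size; the function names are ours):

```python
import math

def z1_continuous(mean_t: float, mean_c: float,
                  pooled_var: float, n1_per_arm: int) -> float:
    """Z1 = (mean_T - mean_C) / sqrt(pooled_var * 2/n1) at interim."""
    return (mean_t - mean_c) / math.sqrt(pooled_var * 2 / n1_per_arm)

def z1_binary(p_t: float, p_c: float, n1_per_arm: int) -> float:
    """Z1 using the pooled proportion p_bar = (p_T + p_C)/2 under 1:1 allocation."""
    p_bar = (p_t + p_c) / 2
    return (p_t - p_c) / math.sqrt(p_bar * (1 - p_bar) * 2 / n1_per_arm)
```

Both are the usual two-sample z-statistics; the only difference from the blinded case is that the per-arm summaries come from unblinded interim data.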
Zone Decision Table
| Zone | CP Range | Action |
|---|---|---|
| Favorable | CP ≥ 0.8 | Keep current N |
| Promising | 0.3 ≤ CP < 0.8 | Increase N to achieve target power |
| Unfavorable | 0.1 ≤ CP < 0.3 | Continue without increase |
| Futility | CP < 0.1 | Consider stopping for futility |
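The decision table maps directly onto a small classifier; the default arguments mirror the API request fields (the function itself is an illustrative sketch):

```python
def classify_zone(cp: float, cp_futility: float = 0.1,
                  cp_promising_lower: float = 0.3,
                  cp_promising_upper: float = 0.8) -> str:
    """Classify conditional power into the four promising-zone categories."""
    if cp >= cp_promising_upper:
        return "favorable"
    if cp >= cp_promising_lower:
        return "promising"
    if cp >= cp_futility:
        return "unfavorable"
    return "futility"
```

Because the thresholds are plain function arguments, the pre-specified protocol values can be passed once and frozen before unblinding.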
6. Sensitivity Analysis
Continuous & Binary Endpoints
Unlike blinded SSR, which varies the nuisance parameter, unblinded SSR sensitivity analysis varies the observed treatment effect as multiples of the planned effect: 25%, 50%, 75%, 100% (planned), 125%, and 150%.
For each hypothetical effect size, the table reports:
- The resulting zone classification
- Conditional power at the current N
- Recalculated N (increased only in the promising zone)
- Inflation factor relative to the initial design
Protocol tip: The sensitivity table demonstrates the operating characteristics of the SSR rule across a range of plausible effect sizes. Include it in the SAP to show the DMC the expected behaviour of the re-estimation procedure.
Survival Endpoints
For survival endpoints, the sensitivity analysis varies the observed hazard ratio on the log scale using multipliers 0.50, 0.65, 0.75, 0.85, 1.0 (planned), and 1.15. The scenario HR is computed as HR_scenario = exp(m · log HR_planned), so the multiplier m scales log(HR). For example, with a planned HR = 0.70, m = 0.50 yields HR ≈ 0.84 (a weaker effect) and m = 1.15 yields HR ≈ 0.66 (a stronger effect).
In the promising zone, the required events are recomputed from the scenario HR via the Schoenfeld formula, and N_new is derived using the planned event probability. The sensitivity table shows how the zone classification, required events, and recalculated N shift as the true HR departs from the planned value.
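The scenario grid can be reproduced as follows. This sketch assumes the Schoenfeld formula stated in section 4; the helper name is illustrative:

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf

def schoenfeld_events(hr: float, alpha: float = 0.025, power: float = 0.90,
                      r: float = 1.0) -> int:
    """D = ((1+r)^2 / r) * (z_{1-alpha} + z_power)^2 / (log HR)^2."""
    return math.ceil((1 + r) ** 2 / r * (z(1 - alpha) + z(power)) ** 2
                     / math.log(hr) ** 2)

planned_hr = 0.70
for m in (0.50, 0.65, 0.75, 0.85, 1.00, 1.15):
    hr = math.exp(m * math.log(planned_hr))  # multiplier scales log(HR)
    print(f"multiplier={m:.2f}  scenario HR={hr:.2f}  "
          f"events={schoenfeld_events(hr)}")
```

Note how the event requirement grows sharply as the scenario HR approaches 1, which is why the protocol cap (n_max_factor) binds in the weak-effect scenarios.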
7. Statistical Assumptions
Allocation: 1:1 randomization by default. Survival endpoints support unequal allocation via allocation_ratio.
Independent stages: Data in stage 1 and stage 2 are independent, which is guaranteed by non-overlapping patient sets and the combination test structure.
Fixed weights: The combination test weights are determined by the pre-specified information fraction and do not change based on interim results.
Common variance (continuous): Homoscedasticity across arms is assumed for the z-test.
Normal approximation: Both stage-wise z-statistics require adequate sample sizes for the normal approximation. For survival, the logrank test is used.
Exponential survival model: Survival SSR assumes exponential event times with uniform accrual and proportional hazards throughout the trial.
Single interim look: One pre-specified point for unblinded assessment and re-estimation.
Pre-specified zones: Zone thresholds must be fixed before unblinding and documented in the protocol.
8. Limitations & When Not to Use
Potential unblinding risk: The DMC or independent statistician accesses unblinded data. This requires a firewall to prevent operational bias from knowledge of interim results.
Power loss from combination test: The inverse-normal combination test can be slightly less powerful than the standard z-test when no sample size increase occurs. The loss is typically small (<1–2%).
Regulatory complexity: Unlike blinded SSR, unblinded SSR requires simulation evidence of Type I error control and more detailed pre-specification in the protocol.
Exponential model assumption: Survival SSR assumes exponential event times and proportional hazards. Non-proportional hazards, cure-rate models, or complex censoring patterns may require external simulation.
Unequal allocation (continuous/binary): Unequal allocation ratios are currently supported for survival endpoints only.
Use blinded SSR first: If the primary uncertainty is the nuisance parameter (not the effect size), prefer blinded SSR for its simpler regulatory path.
9. Regulatory Considerations
Unblinded SSR requires more regulatory documentation than blinded SSR. The FDA Guidance (2019, Section IV.B) specifies that the re-estimation rule must be pre-specified and Type I error control demonstrated through simulation.
Documentation Checklist
Pre-specify the zone thresholds (cp_futility, cp_promising_lower, cp_promising_upper) and the re-estimation procedure in the protocol.
Document the combination test (inverse-normal) with the pre-specified weights and critical value.
Provide simulation evidence demonstrating Type I error control at the nominal level across the parameter space.
Define the maximum sample size cap and justify it based on feasibility, budget, and expected operating characteristics.
Include the sensitivity analysis table showing zone classifications and recalculated N across effect size scenarios.
Describe the operational firewall ensuring that unblinded results are accessible only to the independent statistician/DMC.
Automated Warnings
Combination test disabled: Flags that Type I error may be inflated and the analysis is exploratory only.
Futility zone: Recommends stopping per the pre-specified decision rule.
Substantial increase (>50%): Flags feasibility impact on budget and timeline.
Cap binding: Notes achievable power may be below target and cap justification is required.
10. Monte Carlo Validation
Simulation validation is particularly important for unblinded SSR because the FDA requires simulation evidence of Type I error control. For each replicate, the Tier 2 Monte Carlo engine performs these steps:
- Generate stage 1 data under true parameters (unblinded per arm)
- Compute Z₁, conditional power, and zone
- If promising: recalculate N using the observed effect
- Generate stage 2 data to the adjusted target
- Compute Z₂ from stage 2 data only
- Evaluate Z = w₁·Z₁ + w₂·Z₂ ≥ z_{1−α}
- Repeat 1,000–100,000 times
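The loop above can be sketched end-to-end for a continuous endpoint. This is a compact, self-contained illustration only; the production engine also covers binary/survival endpoints, futility stopping, and the full zone thresholds:

```python
import math
import random
from statistics import NormalDist

norm = NormalDist()
_rng = random.Random(2024)  # seeded for reproducibility

def simulate_trial(delta, sigma, n_init, t, alpha=0.025, power=0.90,
                   n_max_factor=2.0, rng=_rng):
    """One simulated trial; returns True if the combination test rejects."""
    n1 = round(t * n_init)
    # stage 1 z-statistic under the true effect delta
    z1 = (delta / sigma) * math.sqrt(n1 / 2) + rng.gauss(0, 1)
    delta_obs = z1 * sigma * math.sqrt(2 / n1)  # implied observed effect
    w1, w2 = math.sqrt(t), math.sqrt(1 - t)
    n2 = n_init - n1
    cp = 1 - norm.cdf((norm.inv_cdf(1 - alpha) - w1 * z1) / w2
                      - delta_obs / sigma * math.sqrt(n2 / 2))
    if 0.3 <= cp < 0.8:  # promising zone: refit N under the observed effect
        z_a, z_b = norm.inv_cdf(1 - alpha), norm.inv_cdf(power)
        n_need = math.ceil(2 * sigma ** 2 * (z_a + z_b) ** 2 / delta_obs ** 2)
        n2 = min(max(n_need, n_init), math.ceil(n_max_factor * n_init)) - n1
    # stage 2 z-statistic at the (possibly adjusted) stage 2 size
    z2 = (delta / sigma) * math.sqrt(n2 / 2) + rng.gauss(0, 1)
    return w1 * z1 + w2 * z2 >= norm.inv_cdf(1 - alpha)

# Type I error check: rejection rate under H0 should sit near alpha = 0.025
rate = sum(simulate_trial(0.0, 1.0, 234, 0.5) for _ in range(5000)) / 5000
print(f"empirical Type I error ≈ {rate:.4f}")
```

Because n2 is adjusted using only a function of Z₁ while the weights stay fixed, the empirical rejection rate under the null stays at the nominal level, which is exactly the property the FDA asks to be demonstrated.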
Reported Metrics
Type I Error
Rejection rate under H₀ using the combination test. Should be ≈ α, validating the combination test's error control.
Empirical Power
Rejection rate under the alternative. The combination test may be slightly less powerful than the unadjusted z-test.
Final N Distribution
Shows the distribution of actual sample sizes across simulations, reflecting how often and how much N increases.
Discordance Check
If simulated power deviates from analytical by >3%, a warning is raised.
Reproducibility: Every simulation is seeded and the seed is stored. The combination test is applied in simulation exactly as it would be in the trial, providing a true end-to-end validation of the adaptive procedure.
11. API Reference
POST /api/v1/calculators/ssr-unblinded
Computes unblinded sample size re-estimation with promising zone classification and optional Monte Carlo simulation validation.
Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint_type | string | "continuous" | "continuous", "binary", or "survival" |
| alpha | float | 0.025 | One-sided significance level (0, 1) |
| power | float | 0.90 | Target power (0.5, 1) |
| mean_difference | float | 0.3 | Planned treatment effect (continuous, >0) |
| initial_variance | float | 1.0 | Planned variance (continuous, >0) |
| control_rate | float? | null | Control arm rate (binary, (0,1)) |
| treatment_rate | float? | null | Treatment arm rate (0, 1); must be > control_rate |
| interim_fraction | float | 0.5 | Information fraction at interim (0.1, 0.9) |
| n_max_factor | float | 2.0 | Maximum inflation factor [1.0, 5.0] |
| cp_futility | float | 0.1 | Futility threshold [0, 1); must be < cp_promising_lower |
| cp_promising_lower | float | 0.3 | Promising zone lower bound (0, 1); must be < cp_promising_upper |
| cp_promising_upper | float | 0.8 | Favorable zone lower bound (0, 1) |
| observed_effect | float? | null | Observed treatment effect (continuous) |
| observed_variance | float? | null | Observed pooled variance (continuous, >0) |
| observed_control_rate | float? | null | Observed control rate (binary, (0,1)) |
| observed_treatment_rate | float? | null | Observed treatment rate (binary, (0,1)) |
| hazard_ratio | float? | null | Assumed hazard ratio (survival, (0,2), ≠1) |
| median_control | float? | null | Median control survival in months (>0) |
| accrual_time | float? | null | Accrual period in months (>0) |
| follow_up_time | float? | null | Follow-up after accrual in months (≥0) |
| dropout_rate | float | 0.0 | Annual dropout rate [0, 1) |
| allocation_ratio | float | 1.0 | Randomization ratio treatment:control (>0, survival only) |
| observed_hazard_ratio | float? | null | Observed HR at interim (survival, >0) |
| simulate | bool | false | Enable Monte Carlo simulation tier |
| simulation_seed | int? | null | Seed for reproducibility; auto-generated if omitted |
| n_simulations | int | 10000 | Number of simulations [1000, 100000] |
Example Request (Continuous Endpoint)
{
"endpoint_type": "continuous",
"alpha": 0.025,
"power": 0.90,
"mean_difference": 0.3,
"initial_variance": 1.0,
"interim_fraction": 0.5,
"n_max_factor": 2.0,
"cp_futility": 0.1,
"cp_promising_lower": 0.3,
"cp_promising_upper": 0.8,
"observed_effect": 0.2,
"observed_variance": 1.1,
"simulate": true,
"n_simulations": 10000
}

Example Request (Survival Endpoint)
{
"endpoint_type": "survival",
"alpha": 0.025,
"power": 0.90,
"hazard_ratio": 0.7,
"median_control": 12,
"accrual_time": 24,
"follow_up_time": 12,
"allocation_ratio": 1.0,
"interim_fraction": 0.5,
"n_max_factor": 2.0,
"observed_hazard_ratio": 0.75,
"simulate": true,
"n_simulations": 10000
}

Response Fields
| Field | Description |
|---|---|
| zone | Classified zone: favorable, promising, unfavorable, futility |
| z1 | Stage 1 z-statistic |
| conditional_power | CP under observed effect at current N |
| conditional_power_planned | CP under planned effect for comparison |
| recalculated_n_per_arm | Adjusted sample size per arm |
| combination_weight_1 | Stage 1 combination test weight |
| combination_weight_2 | Stage 2 combination test weight |
| adjusted_critical_value | Critical value for the combination test |
| recalculation_scenarios | Sensitivity table with 6 effect size scenarios |
| regulatory_notes | Context-specific regulatory guidance |
| events_required | Required events from Schoenfeld formula (survival only) |
| event_probability | Planned weighted event probability (survival only) |
| observed_event_probability | Event probability used for N conversion (currently planned value; survival only) |
| initial_n_control | Initial control arm size (survival with allocation_ratio) |
| initial_n_treatment | Initial treatment arm size (survival with allocation_ratio) |
| recalculated_n_control | Recalculated control arm size (survival with allocation_ratio) |
| recalculated_n_treatment | Recalculated treatment arm size (survival with allocation_ratio) |
12. Technical References
- Mehta CR, Pocock SJ. Adaptive increase in sample size when interim results are promising: a practical guide with examples. Statistics in Medicine. 2011;30(28):3267–3284.
- Chen YHJ, DeMets DL, Lan KKG. Increasing the sample size when the unblinded interim result is promising. Statistics in Medicine. 2004;23(7):1023–1038.
- Lehmacher W, Wassmer G. Adaptive sample size calculations in group sequential trials. Biometrics. 1999;55(4):1286–1290.
- Müller HH, Schäfer H. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. Biometrics. 2001;57(3):886–891.
- FDA. Adaptive Designs for Clinical Trials of Drugs and Biologics: Guidance for Industry. 2019. Section IV.B.
- EMA. Reflection Paper on Methodological Issues in Confirmatory Clinical Trials Planned with an Adaptive Design. CHMP/EWP/2459/02. 2007.
- Cui L, Hung HMJ, Wang SJ. Modification of sample size in group sequential clinical trials. Biometrics. 1999;55(3):853–857.
- Schoenfeld D. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika. 1981;68(1):316–319.
- Gould AL. Interim analyses for monitoring clinical trials that do not materially affect the type I error rate. Statistics in Medicine. 1992;11(1):55–66.
- Friede T, et al. Blinded sample size re-estimation in event-driven clinical trials. Pharmaceutical Statistics. 2019;18(5):578–588.
Related Documentation
Blinded SSR
When the treatment effect is well-characterized and only the nuisance parameter is uncertain, blinded SSR provides a simpler regulatory path.
Complete Guide to SSR
Practitioner guide with blinded vs. unblinded decision framework, worked examples, SAP language, and R code.
Group Sequential Design
Interim monitoring with early stopping rules. GSD can be combined with SSR for trials that need both efficacy/futility boundaries and sample size flexibility.