Unblinded Sample Size Re-estimation (SSR)

Technical documentation for unblinded interim sample size re-estimation using the promising zone approach. This page covers the Mehta & Pocock (2011) framework, the inverse-normal combination test, zone classification, conditional power analysis, regulatory alignment, and Monte Carlo validation.

1. Overview & Motivation

While blinded SSR re-estimates nuisance parameters (variance, pooled rate), unblinded SSR goes further: it uses the observed treatment effect from unblinded interim data to decide whether to increase the sample size.

This is particularly valuable when the treatment effect was overestimated at planning. A trial designed with 90% power under an optimistic effect may have much lower conditional power at interim if the observed effect is smaller. Unblinded SSR can rescue such trials by increasing the sample size in the “promising zone”—where conditional power is too low to succeed at the current N but the effect is real enough to justify continued enrollment.

Key trade-off: Unblinded SSR provides more information for the re-estimation decision but requires the inverse-normal combination test to control Type I error. The FDA requires pre-specification of the SSR rule and simulation evidence of Type I error control (FDA Guidance 2019, Section IV.B).

2. Promising Zone Framework

Mehta & Pocock (2011) proposed a four-zone decision framework based on the conditional power (CP) at interim, computed under the current sample size plan:

Favorable Zone: CP ≥ $\gamma_U$ (default 0.8)

The trial is already on track. Keep the current sample size.

Promising Zone: $\gamma_L$ ≤ CP < $\gamma_U$ (default 0.3–0.8)

Increase N to achieve target power under the observed effect. This is where SSR has the most value.

Unfavorable Zone: $\gamma_F$ ≤ CP < $\gamma_L$ (default 0.1–0.3)

Effect too small for reasonable N increase. Continue without change; consider futility stopping if pre-specified.

Futility Zone: CP < $\gamma_F$ (default 0.1)

Very low likelihood of success. Stopping for futility is recommended.

Threshold selection: The default thresholds (0.1, 0.3, 0.8) follow Mehta & Pocock (2011). The promising zone boundaries should be pre-specified in the protocol. Narrowing the promising zone reduces the frequency of sample size increases but may miss opportunities to rescue underpowered trials.

3. Inverse-Normal Combination Test

When the sample size is adjusted based on unblinded interim data, the standard z-test at the final analysis may inflate the Type I error. The inverse-normal combination test (Lehmacher & Wassmer 1999; Müller & Schäfer 2001) provides rigorous Type I error control regardless of the re-estimation rule.

The Method

The trial is divided into two stages at the interim. Independent z-statistics $Z_1$ (stage 1) and $Z_2$ (stage 2) are computed from the data in each stage. The combined test statistic is:

$$Z_{\text{comb}} = w_1 Z_1 + w_2 Z_2$$

where the weights are determined by the information fraction at interim:

$$w_1 = \sqrt{t}, \quad w_2 = \sqrt{1-t}, \quad w_1^2 + w_2^2 = 1$$

Under $H_0$, $Z_1 \sim N(0,1)$ and $Z_2 \sim N(0,1)$ independently, so $Z_{\text{comb}} \sim N(0,1)$ because $w_1^2 + w_2^2 = 1$. The critical value is simply $z_\alpha$:

$$\text{Reject } H_0 \iff Z_{\text{comb}} > z_\alpha$$

Key insight: The combination test controls Type I error at exactly $\alpha$ regardless of the sample size modification rule, because $Z_1$ and $Z_2$ are computed from non-overlapping data subsets and the weights are fixed before unblinding. The SSR decision (whether and how much to increase N) can depend on any function of $Z_1$ without affecting the null distribution of $Z_{\text{comb}}$.

Survival note: For survival endpoints, the information fraction $t$ is based on events (not patients): $t = d_{\text{interim}} / d_{\text{total}}$. The combination test weights $w_1, w_2$ therefore reflect the event-based information fraction, ensuring proper weighting of stage-wise logrank statistics.

Example Weights

| Interim Fraction | $w_1$ | $w_2$ |
|------------------|-------|-------|
| 0.25             | 0.500 | 0.866 |
| 0.50             | 0.707 | 0.707 |
| 0.75             | 0.866 | 0.500 |
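As a sketch under these definitions, the combination test reduces to a few lines. The function names below are illustrative, not part of any documented API; `statistics.NormalDist` supplies the standard normal quantile.

```python
from math import sqrt
from statistics import NormalDist

def combination_weights(t: float) -> tuple[float, float]:
    """Fixed inverse-normal weights from the pre-specified information fraction t."""
    return sqrt(t), sqrt(1.0 - t)

def combination_test(z1: float, z2: float, t: float, alpha: float = 0.025) -> bool:
    """One-sided inverse-normal combination test: reject H0 if Z_comb > z_alpha."""
    w1, w2 = combination_weights(t)
    z_comb = w1 * z1 + w2 * z2              # N(0,1) under H0 for any SSR rule
    return z_comb > NormalDist().inv_cdf(1.0 - alpha)
```

With `t = 0.25` the weights reproduce the first row of the table above (0.500, 0.866); the weights depend only on the pre-specified `t`, never on the interim results.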

4. Re-estimation Algorithm

The unblinded SSR procedure follows these steps:

Step 1: Initial Design

Compute $N_0 = 2n$ using the planned effect size and nuisance parameters. For survival, compute the required events $d$ via the Schoenfeld formula, then $N_0 = \lceil d / \bar{P}(\text{event}) \rceil$.

Step 2: Stage 1 Analysis (Unblinded)

At interim, compute the unblinded stage 1 z-statistic $Z_1$ and the observed treatment effect $\hat{\delta}$. For survival, $Z_1$ is the logrank z-statistic and the observed HR is computed from unblinded interim data.

Step 3: Conditional Power

Compute CP under the current N: $\text{CP} = \Phi(Z_1\sqrt{R} - z_\alpha\sqrt{R-1})$.

Step 4: Zone Classification

Classify CP into one of four zones (favorable, promising, unfavorable, futility) using pre-specified thresholds.

Step 5: Sample Size Decision

If promising: compute the $N_1$ needed to achieve target power under the observed effect (fixed-design formula). Otherwise: keep $N_0$. For survival, recompute the required events from the observed HR: $d_{\text{new}} = \lceil (z_\alpha + z_\beta)^2 (1+r)^2 / (r \cdot \log^2(\widehat{\text{HR}})) \rceil$, then $N_1 = \lceil d_{\text{new}} / \bar{P}(\text{event}) \rceil$ using the planned event probability.

Step 6: Constrain

Apply the interim floor ($N_1 \geq n_{\text{interim}}$) and the protocol cap ($N_1 \leq N_0 \cdot f_{\max}$). For continuous/binary, enforce even parity. For survival, split by allocation ratio into per-arm sizes.

Step 7: Final Analysis

Compute $Z_2$ from stage 2 data and reject $H_0$ if $Z_{\text{comb}} = w_1 Z_1 + w_2 Z_2 > z_\alpha$.

Re-estimation formula: In the promising zone, the required N is computed using the fixed-design sample size formula under the observed effect (Mehta & Pocock 2011). This replaces the naïve approach of inverting the conditional power formula, which is non-monotonic when $Z_1 < z_\alpha$ and would always drive to the upper bound.
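The steps above can be sketched end to end for a continuous endpoint. This is a minimal illustration, not the documented implementation: the function name and signature are invented, sizes are per arm, and the parity rule is applied per arm here.

```python
from math import ceil, sqrt
from statistics import NormalDist

_ND = NormalDist()

def reestimate_n(z1, n0_per_arm, n_interim_per_arm, delta_hat, var_hat,
                 alpha=0.025, power=0.90, cp_futility=0.1,
                 cp_lower=0.3, cp_upper=0.8, n_max_factor=2.0):
    """Promising-zone SSR sketch; returns (zone, conditional power, n per arm)."""
    z_a, z_b = _ND.inv_cdf(1 - alpha), _ND.inv_cdf(power)
    R = n0_per_arm / n_interim_per_arm                  # information ratio
    cp = _ND.cdf(z1 * sqrt(R) - z_a * sqrt(R - 1))      # conditional power
    if cp >= cp_upper:
        zone = "favorable"
    elif cp >= cp_lower:
        zone = "promising"
    elif cp >= cp_futility:
        zone = "unfavorable"
    else:
        zone = "futility"
    n1 = n0_per_arm
    if zone == "promising":
        # fixed-design per-arm formula under the observed effect
        n1 = ceil(2 * var_hat * (z_a + z_b) ** 2 / delta_hat ** 2)
    n1 = max(n1, n_interim_per_arm)                     # interim floor
    n1 = min(n1, ceil(n0_per_arm * n_max_factor))       # protocol cap
    n1 += n1 % 2                                        # even parity
    return zone, cp, n1
```

For example, with $Z_1 = 1.2$ at a 50% interim (50 of 100 per arm) and an observed effect of 0.2 with unit variance, the conditional power is about 0.40 (promising) and the recalculated size binds at the 2× cap of 200 per arm.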

5. Conditional Power & Zones

Conditional power is computed under two scenarios: the observed effect (used for zone classification) and the planned effect (for comparison). The formula is the same as for blinded SSR:

$$\text{CP} = \Phi\left( Z_1 \sqrt{R} - z_\alpha \sqrt{R - 1} \right)$$

The stage 1 z-statistic differs from the blinded case because it uses unblinded per-arm estimates:

Continuous

$$Z_1 = \frac{\bar{X}_T - \bar{X}_C}{\sqrt{S^2_p \, (1/n_{C,1} + 1/n_{T,1})}}$$

where $S^2_p$ is the pooled variance from both arms at interim.

Binary

$$Z_1 = \frac{\hat{p}_{T,1} - \hat{p}_{C,1}}{\sqrt{\hat{\bar{p}}_1(1-\hat{\bar{p}}_1)(1/n_{C,1} + 1/n_{T,1})}}$$

where $\hat{\bar{p}}_1$ is the pooled proportion from both arms.

Survival

$$Z_1 = \text{logrank } z\text{-statistic}$$

The logrank z-statistic from unblinded interim survival data (treatment vs. control).

Survival note: For survival endpoints, the information ratio uses events: $R = d_{\text{total}} / d_{\text{interim}}$ (not $N_{\text{total}} / N_{\text{interim}}$).

Zone Decision Table

| Zone        | CP Range       | Action                             |
|-------------|----------------|------------------------------------|
| Favorable   | CP ≥ 0.8       | Keep current N                     |
| Promising   | 0.3 ≤ CP < 0.8 | Increase N to achieve target power |
| Unfavorable | 0.1 ≤ CP < 0.3 | Continue without increase          |
| Futility    | CP < 0.1       | Consider stopping for futility     |
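The stage-1 statistics above follow directly from per-arm summaries. A short sketch (function and variable names are illustrative):

```python
from math import sqrt

def z1_continuous(mean_t, mean_c, sp2, n_t, n_c):
    """Unblinded stage-1 z for a continuous endpoint (sp2 = pooled variance)."""
    return (mean_t - mean_c) / sqrt(sp2 * (1 / n_c + 1 / n_t))

def z1_binary(p_t, p_c, n_t, n_c):
    """Unblinded stage-1 z for a binary endpoint (pooled-proportion SE)."""
    p_bar = (p_t * n_t + p_c * n_c) / (n_t + n_c)
    return (p_t - p_c) / sqrt(p_bar * (1 - p_bar) * (1 / n_c + 1 / n_t))
```

For example, means of 0.5 vs 0.2 with pooled variance 1.0 and 100 per arm give $Z_1 \approx 2.12$; rates of 0.6 vs 0.4 with 100 per arm give $Z_1 \approx 2.83$.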

6. Sensitivity Analysis

Continuous & Binary Endpoints

Unlike blinded SSR, which varies the nuisance parameter, unblinded SSR sensitivity analysis varies the observed treatment effect as multiples of the planned effect: 25%, 50%, 75%, 100% (planned), 125%, and 150%.

For each hypothetical effect size, the table reports:

  • The resulting zone classification
  • Conditional power at the current N
  • Recalculated N (increased only in the promising zone)
  • Inflation factor relative to the initial design

Protocol tip: The sensitivity table demonstrates the operating characteristics of the SSR rule across a range of plausible effect sizes. Include it in the SAP to show the DMC the expected behaviour of the re-estimation procedure.

Survival Endpoints

For survival endpoints, the sensitivity analysis varies the observed hazard ratio on the log scale using multipliers 0.50, 0.65, 0.75, 0.85, 1.0 (planned), and 1.15. The scenario HR is computed as $\text{HR}_{\text{scenario}} = \text{HR}^{m}$, so multiplier $m$ scales $\log(\text{HR})$. For example, with a planned HR = 0.70, $m = 0.5$ yields $0.70^{0.5} \approx 0.84$ (weaker effect) and $m = 1.15$ yields $0.70^{1.15} \approx 0.66$ (stronger effect).

In the promising zone, the required events $d$ are recomputed from the scenario HR via the Schoenfeld formula, and $N = \lceil d_{\text{new}} / \bar{P}(\text{event}) \rceil$ is derived using the planned event probability. The sensitivity table shows how the zone classification, required events, and recalculated $N$ shift as the true HR departs from the planned value.
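The scenario grid can be generated in a few lines. This sketch assumes 1:1 allocation by default and a fixed planned event probability; the function name and tuple layout are illustrative.

```python
from math import ceil, log
from statistics import NormalDist

def survival_scenarios(hr_planned, p_event, alpha=0.025, power=0.90, r=1.0,
                       multipliers=(0.50, 0.65, 0.75, 0.85, 1.0, 1.15)):
    """Schoenfeld events and total N for log-scale HR scenarios.

    Returns (multiplier, scenario HR, required events, total N) per scenario.
    """
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha), z(power)
    rows = []
    for m in multipliers:
        hr_m = hr_planned ** m                       # scales log(HR) by m
        d = ceil((z_a + z_b) ** 2 * (1 + r) ** 2 / (r * log(hr_m) ** 2))
        n = ceil(d / p_event)                        # planned event probability
        rows.append((m, round(hr_m, 3), d, n))
    return rows
```

With a planned HR of 0.70 and event probability 0.60, the planned-effect row ($m = 1.0$) requires 331 events and a total N of 552 at one-sided $\alpha = 0.025$ and 90% power.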

7. Statistical Assumptions

Allocation: 1:1 randomization by default. Survival endpoints support unequal allocation via allocation_ratio.

Independent stages: Data in stage 1 and stage 2 are independent, which is guaranteed by non-overlapping patient sets and the combination test structure.

Fixed weights: The combination test weights $w_1, w_2$ are determined by the pre-specified information fraction and do not change based on interim results.

Common variance (continuous): Homoscedasticity across arms is assumed for the z-test.

Normal approximation: Both stage-wise z-statistics require adequate sample sizes for the normal approximation. For survival, the logrank test is used.

Exponential survival model: Survival SSR assumes exponential event times with uniform accrual and proportional hazards throughout the trial.

Single interim look: One pre-specified point for unblinded assessment and re-estimation.

Pre-specified zones: Zone thresholds must be fixed before unblinding and documented in the protocol.
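Under the exponential and uniform-accrual assumptions above, the per-arm event probability has a closed form. A sketch (the function name is illustrative; the hazard is derived from the control median, with `hr` scaling it for the treatment arm):

```python
from math import exp, log

def event_probability(median, accrual, follow_up, hr=1.0):
    """P(event) for exponential times with uniform accrual over [0, accrual].

    A patient entering at time u is followed for accrual + follow_up - u,
    so P(event) = 1 - (e^{-lam*f} - e^{-lam*(a+f)}) / (lam * a).
    """
    lam = log(2) / median * hr        # hazard rate; hr=1 gives the control arm
    a, f = accrual, follow_up
    return 1 - (exp(-lam * f) - exp(-lam * (a + f))) / (lam * a)
```

For example, a 12-month control median with 24 months of accrual and 12 months of follow-up gives a control-arm event probability of about 0.73; a weighted average over the arms yields the pooled $\bar{P}(\text{event})$ used in the N conversions above.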

8. Limitations & When Not to Use

Potential unblinding risk: The DMC or independent statistician accesses unblinded data. This requires a firewall to prevent operational bias from knowledge of interim results.

Power loss from combination test: The inverse-normal combination test can be slightly less powerful than the standard z-test when no sample size increase occurs. The loss is typically small (<1–2%).

Regulatory complexity: Unlike blinded SSR, unblinded SSR requires simulation evidence of Type I error control and more detailed pre-specification in the protocol.

Exponential model assumption: Survival SSR assumes exponential event times and proportional hazards. Non-proportional hazards, cure-rate models, or complex censoring patterns may require external simulation.

Unequal allocation (continuous/binary): Unequal allocation ratios are currently supported for survival endpoints only.

Use blinded SSR first: If the primary uncertainty is the nuisance parameter (not the effect size), prefer blinded SSR for its simpler regulatory path.

9. Regulatory Considerations

Unblinded SSR requires more regulatory documentation than blinded SSR. The FDA Guidance (2019, Section IV.B) specifies that the re-estimation rule must be pre-specified and Type I error control demonstrated through simulation.

Documentation Checklist

Pre-specify the zone thresholds ($\gamma_F$, $\gamma_L$, $\gamma_U$) and the re-estimation procedure in the protocol.

Document the combination test (inverse-normal) with the pre-specified weights and critical value.

Provide simulation evidence demonstrating Type I error control at the nominal level across the parameter space.

Define the maximum sample size cap and justify it based on feasibility, budget, and expected operating characteristics.

Include the sensitivity analysis table showing zone classifications and recalculated N across effect size scenarios.

Describe the operational firewall ensuring that unblinded results are accessible only to the independent statistician/DMC.

Automated Warnings

Combination test disabled: Flags that Type I error may be inflated and the analysis is exploratory only.

Futility zone: Recommends stopping per the pre-specified decision rule.

Substantial increase (>50%): Flags feasibility impact on budget and timeline.

Cap binding: Notes achievable power may be below target and cap justification is required.

10. Monte Carlo Validation

Simulation validation is particularly important for unblinded SSR because the FDA requires simulation evidence of Type I error control. The Tier 2 Monte Carlo engine executes the following steps:

  1. Generate stage 1 data under true parameters (unblinded per arm)
  2. Compute $Z_1$, conditional power, and zone
  3. If promising: recalculate N using the observed effect
  4. Generate stage 2 data to the adjusted target
  5. Compute Z2Z_2 from stage 2 data only
  6. Evaluate $Z_{\text{comb}} = w_1 Z_1 + w_2 Z_2 > z_\alpha$
  7. Repeat 1,000–100,000 times
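A deliberately simplified version of this loop illustrates why the combination test preserves the Type I error: under $H_0$ the stage-wise z-statistics can be drawn directly as $N(0,1)$, because the SSR decision may inspect $Z_1$ without changing the null distribution of $Z_2$. This is a sketch, not the engine itself, which simulates patient-level data.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulate_type1(n_sims=10_000, t=0.5, alpha=0.025, seed=42):
    """Empirical Type I error of the combination test under H0.

    Stage-wise z-statistics are drawn directly as N(0,1); any promising-zone
    rule applied between the two draws leaves their null distributions intact.
    """
    rng = random.Random(seed)                    # stored seed for reproducibility
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    w1, w2 = sqrt(t), sqrt(1 - t)                # fixed pre-specified weights
    rejections = sum(
        w1 * rng.gauss(0, 1) + w2 * rng.gauss(0, 1) > z_alpha
        for _ in range(n_sims)
    )
    return rejections / n_sims
```

The empirical rejection rate should sit near the nominal 0.025 (within Monte Carlo error), mirroring the Type I error metric reported below.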

Reported Metrics

Type I Error

Rejection rate under $H_0$ using the combination test. Should be $\leq \alpha$, validating the combination test's error control.

Empirical Power

Rejection rate under $H_1$. The combination test may be slightly less powerful than the unadjusted z-test.

Final N Distribution

Shows the distribution of actual sample sizes across simulations, reflecting how often and how much N increases.

Discordance Check

If simulated power deviates from analytical by >3%, a warning is raised.

Reproducibility: Every simulation is seeded and the seed is stored. The combination test is applied in simulation exactly as it would be in the trial, providing a true end-to-end validation of the adaptive procedure.

11. API Reference

POST /api/v1/calculators/ssr-unblinded

Computes unblinded sample size re-estimation with promising zone classification and optional Monte Carlo simulation validation.

Request Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| endpoint_type | string | "continuous" | "continuous", "binary", or "survival" |
| alpha | float | 0.025 | One-sided significance level (0, 1) |
| power | float | 0.90 | Target power (0.5, 1) |
| mean_difference | float | 0.3 | Planned treatment effect (continuous, >0) |
| initial_variance | float | 1.0 | Planned variance (continuous, >0) |
| control_rate | float? | null | Control arm rate (binary, (0, 1)) |
| treatment_rate | float? | null | Treatment arm rate (0, 1); must be > control_rate |
| interim_fraction | float | 0.5 | Information fraction at interim (0.1, 0.9) |
| n_max_factor | float | 2.0 | Maximum inflation factor [1.0, 5.0] |
| cp_futility | float | 0.1 | Futility threshold [0, 1); must be < cp_promising_lower |
| cp_promising_lower | float | 0.3 | Promising zone lower bound (0, 1); must be < cp_promising_upper |
| cp_promising_upper | float | 0.8 | Favorable zone lower bound (0, 1) |
| observed_effect | float? | null | Observed treatment effect (continuous) |
| observed_variance | float? | null | Observed pooled variance (continuous, >0) |
| observed_control_rate | float? | null | Observed control rate (binary, (0, 1)) |
| observed_treatment_rate | float? | null | Observed treatment rate (binary, (0, 1)) |
| hazard_ratio | float? | null | Assumed hazard ratio (survival, (0, 2), ≠ 1) |
| median_control | float? | null | Median control survival in months (>0) |
| accrual_time | float? | null | Accrual period in months (>0) |
| follow_up_time | float? | null | Follow-up after accrual in months (≥0) |
| dropout_rate | float | 0.0 | Annual dropout rate [0, 1) |
| allocation_ratio | float | 1.0 | Randomization ratio treatment:control (>0, survival only) |
| observed_hazard_ratio | float? | null | Observed HR at interim (survival, >0) |
| simulate | bool | false | Enable Monte Carlo simulation tier |
| simulation_seed | int? | null | Seed for reproducibility; auto-generated if omitted |
| n_simulations | int | 10000 | Number of simulations [1000, 100000] |

Example Request (Continuous Endpoint)

{
  "endpoint_type": "continuous",
  "alpha": 0.025,
  "power": 0.90,
  "mean_difference": 0.3,
  "initial_variance": 1.0,
  "interim_fraction": 0.5,
  "n_max_factor": 2.0,
  "cp_futility": 0.1,
  "cp_promising_lower": 0.3,
  "cp_promising_upper": 0.8,
  "observed_effect": 0.2,
  "observed_variance": 1.1,
  "simulate": true,
  "n_simulations": 10000
}

Example Request (Survival Endpoint)

{
  "endpoint_type": "survival",
  "alpha": 0.025,
  "power": 0.90,
  "hazard_ratio": 0.7,
  "median_control": 12,
  "accrual_time": 24,
  "follow_up_time": 12,
  "allocation_ratio": 1.0,
  "interim_fraction": 0.5,
  "n_max_factor": 2.0,
  "observed_hazard_ratio": 0.75,
  "simulate": true,
  "n_simulations": 10000
}

Response Fields

| Field | Description |
|-------|-------------|
| zone | Classified zone: favorable, promising, unfavorable, futility |
| z1 | Stage 1 z-statistic |
| conditional_power | CP under observed effect at current N |
| conditional_power_planned | CP under planned effect for comparison |
| recalculated_n_per_arm | Adjusted sample size per arm |
| combination_weight_1 | Stage 1 combination test weight |
| combination_weight_2 | Stage 2 combination test weight |
| adjusted_critical_value | Critical value for the combination test |
| recalculation_scenarios | Sensitivity table with 6 effect size scenarios |
| regulatory_notes | Context-specific regulatory guidance |
| events_required | Required events $d$ from the Schoenfeld formula (survival only) |
| event_probability | Planned weighted event probability (survival only) |
| observed_event_probability | Event probability used for N conversion (currently the planned value; survival only) |
| initial_n_control | Initial control arm size (survival with allocation_ratio) |
| initial_n_treatment | Initial treatment arm size (survival with allocation_ratio) |
| recalculated_n_control | Recalculated control arm size (survival with allocation_ratio) |
| recalculated_n_treatment | Recalculated treatment arm size (survival with allocation_ratio) |

12. Technical References

  1. Mehta CR, Pocock SJ. Adaptive increase in sample size when interim results are promising: a practical guide with examples. Statistics in Medicine. 2011;30(28):3267–3284.
  2. Chen YHJ, DeMets DL, Lan KKG. Increasing the sample size when the unblinded interim result is promising. Statistics in Medicine. 2004;23(7):1023–1038.
  3. Lehmacher W, Wassmer G. Adaptive sample size calculations in group sequential trials. Biometrics. 1999;55(4):1286–1290.
  4. Müller HH, Schäfer H. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. Biometrics. 2001;57(3):886–891.
  5. FDA. Adaptive Designs for Clinical Trials of Drugs and Biologics: Guidance for Industry. 2019. Section IV.B.
  6. EMA. Reflection Paper on Methodological Issues in Confirmatory Clinical Trials Planned with an Adaptive Design. CHMP/EWP/2459/02. 2007.
  7. Cui L, Hung HMJ, Wang SJ. Modification of sample size in group sequential clinical trials. Biometrics. 1999;55(3):853–857.
  8. Schoenfeld D. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika. 1981;68(1):316–319.
  9. Gould AL. Interim analyses for monitoring clinical trials that do not materially affect the type I error rate. Statistics in Medicine. 1992;11(1):55–66.
  10. Friede T, et al. Blinded sample size re-estimation in event-driven clinical trials. Pharmaceutical Statistics. 2019;18(5):578–588.