Unblinded Sample Size Re-estimation (SSR)

Technical documentation for unblinded interim sample size re-estimation using the promising zone approach. This page covers the Mehta & Pocock (2011) framework, the inverse-normal combination test, zone classification, conditional power analysis, regulatory alignment, and Monte Carlo validation.

1. Overview & Motivation

While blinded SSR re-estimates nuisance parameters (variance, pooled rate), unblinded SSR goes further: it uses the observed treatment effect from unblinded interim data to decide whether to increase the sample size.

This is particularly valuable when the treatment effect was overestimated at planning. A trial designed with 90% power under an optimistic effect may have much lower conditional power at interim if the observed effect is smaller. Unblinded SSR can rescue such trials by increasing the sample size in the “promising zone”—where conditional power is too low to succeed at the current N but the effect is real enough to justify continued enrollment.

Key trade-off: Unblinded SSR provides more information for the re-estimation decision but requires the inverse-normal combination test to control Type I error. The FDA requires pre-specification of the SSR rule and simulation evidence of Type I error control (FDA Guidance 2019, Section IV.B).

2. Promising Zone Framework

Mehta & Pocock (2011) proposed a four-zone decision framework based on the conditional power (CP) at interim, computed under the current sample size plan:

Favorable Zone: CP ≥ $\gamma_U$ (default 0.8)

The trial is already on track. Keep the current sample size.

Promising Zone: $\gamma_L$ ≤ CP < $\gamma_U$ (default 0.3–0.8)

Increase N to achieve target power under the observed effect. This is where SSR has the most value.

Unfavorable Zone: $\gamma_F$ ≤ CP < $\gamma_L$ (default 0.1–0.3)

Effect too small for reasonable N increase. Continue without change; consider futility stopping if pre-specified.

Futility Zone: CP < $\gamma_F$ (default 0.1)

Very low likelihood of success. Stopping for futility is recommended.

Threshold selection: The default thresholds (0.1, 0.3, 0.8) follow Mehta & Pocock (2011). The promising zone boundaries should be pre-specified in the protocol. Narrowing the promising zone reduces the frequency of sample size increases but may miss opportunities to rescue underpowered trials.

3. Inverse-Normal Combination Test

When the sample size is adjusted based on unblinded interim data, the standard z-test at the final analysis may inflate the Type I error. The inverse-normal combination test (Lehmacher & Wassmer 1999; Müller & Schäfer 2001) provides rigorous Type I error control regardless of the re-estimation rule.

The Method

The trial is divided into two stages at the interim. Independent z-statistics $Z_1$ (stage 1) and $Z_2$ (stage 2) are computed from the data in each stage. The combined test statistic is:

$$Z_{\text{comb}} = w_1 Z_1 + w_2 Z_2$$

where the weights are determined by the information fraction at interim:

$$w_1 = \sqrt{t}, \quad w_2 = \sqrt{1-t}, \quad w_1^2 + w_2^2 = 1$$

Under $H_0$, $Z_1 \sim N(0,1)$ and $Z_2 \sim N(0,1)$ independently, so $Z_{\text{comb}} \sim N(0,1)$ because $w_1^2 + w_2^2 = 1$. The critical value is simply $z_\alpha$:

$$\text{Reject } H_0 \iff Z_{\text{comb}} > z_\alpha$$

Key insight: The combination test controls Type I error at exactly $\alpha$ regardless of the sample size modification rule, because $Z_1$ and $Z_2$ are computed from non-overlapping data subsets and the weights are fixed before unblinding. The SSR decision (whether and how much to increase N) can depend on any function of $Z_1$ without affecting the null distribution of $Z_{\text{comb}}$.

Survival note: For survival endpoints, the information fraction $t$ is based on events (not patients): $t = d_{\text{interim}} / d_{\text{total}}$. The combination test weights $w_1, w_2$ therefore reflect the event-based information fraction, ensuring proper weighting of stage-wise logrank statistics.

Example Weights

| Interim Fraction | $w_1$ | $w_2$ |
|------------------|-------|-------|
| 0.25             | 0.500 | 0.866 |
| 0.50             | 0.707 | 0.707 |
| 0.75             | 0.866 | 0.500 |
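As a sketch under these definitions, the combination test reduces to a few lines. The function names below are illustrative, not part of any documented API; `statistics.NormalDist` supplies the standard normal quantile.

```python
from math import sqrt
from statistics import NormalDist

def combination_weights(t: float) -> tuple[float, float]:
    """Fixed inverse-normal weights from the pre-specified information fraction t."""
    return sqrt(t), sqrt(1.0 - t)

def combination_test(z1: float, z2: float, t: float, alpha: float = 0.025) -> bool:
    """One-sided inverse-normal combination test: reject H0 if Z_comb > z_alpha."""
    w1, w2 = combination_weights(t)
    z_comb = w1 * z1 + w2 * z2              # N(0,1) under H0 for any SSR rule
    return z_comb > NormalDist().inv_cdf(1.0 - alpha)
```

With `t = 0.25` the weights reproduce the first row of the table above (0.500, 0.866); the weights depend only on the pre-specified `t`, never on the interim results.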

4. Re-estimation Algorithm

The unblinded SSR procedure follows these steps:

Step 1: Initial Design

Compute $N_0 = 2n$ using the planned effect size and nuisance parameters. For survival, compute the required events $d$ via the Schoenfeld formula, then $N_0 = \lceil d / \bar{P}(\text{event}) \rceil$.

Step 2: Stage 1 Analysis (Unblinded)

At interim, compute the unblinded stage 1 z-statistic $Z_1$ and the observed treatment effect $\hat{\delta}$. For survival, $Z_1$ is the logrank z-statistic and the observed HR is computed from unblinded interim data.

Step 3: Conditional Power

Compute CP under the current N: $\text{CP} = \Phi(Z_1\sqrt{R} - z_\alpha\sqrt{R-1})$.

Step 4: Zone Classification

Classify CP into one of four zones (favorable, promising, unfavorable, futility) using pre-specified thresholds.

Step 5: Sample Size Decision

If promising: compute the $N_1$ needed to achieve target power under the observed effect (fixed-design formula). Otherwise: keep $N_0$. For survival, recompute the required events from the observed HR: $d_{\text{new}} = \lceil (z_\alpha + z_\beta)^2 (1+r)^2 / (r \cdot \log^2(\widehat{\text{HR}})) \rceil$, then $N_1 = \lceil d_{\text{new}} / \bar{P}(\text{event}) \rceil$ using the planned event probability.

Step 6: Constrain

Apply the interim floor ($N_1 \geq n_{\text{interim}}$) and the protocol cap ($N_1 \leq N_0 \cdot f_{\max}$). For continuous/binary, enforce even parity. For survival, split by allocation ratio into per-arm sizes.

Step 7: Final Analysis

Compute $Z_2$ from stage 2 data and reject $H_0$ if $Z_{\text{comb}} = w_1 Z_1 + w_2 Z_2 > z_\alpha$.

Re-estimation formula: In the promising zone, the required N is computed using the fixed-design sample size formula under the observed effect (Mehta & Pocock 2011). This replaces the naïve approach of inverting the conditional power formula, which is non-monotonic when $Z_1 < z_\alpha$ and would always drive to the upper bound.
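The steps above can be sketched end to end for a continuous endpoint. This is a minimal illustration, not the documented implementation: the function name and signature are invented, sizes are per arm, and the parity rule is applied per arm here.

```python
from math import ceil, sqrt
from statistics import NormalDist

_ND = NormalDist()

def reestimate_n(z1, n0_per_arm, n_interim_per_arm, delta_hat, var_hat,
                 alpha=0.025, power=0.90, cp_futility=0.1,
                 cp_lower=0.3, cp_upper=0.8, n_max_factor=2.0):
    """Promising-zone SSR sketch; returns (zone, conditional power, n per arm)."""
    z_a, z_b = _ND.inv_cdf(1 - alpha), _ND.inv_cdf(power)
    R = n0_per_arm / n_interim_per_arm                  # information ratio
    cp = _ND.cdf(z1 * sqrt(R) - z_a * sqrt(R - 1))      # conditional power
    if cp >= cp_upper:
        zone = "favorable"
    elif cp >= cp_lower:
        zone = "promising"
    elif cp >= cp_futility:
        zone = "unfavorable"
    else:
        zone = "futility"
    n1 = n0_per_arm
    if zone == "promising":
        # fixed-design per-arm formula under the observed effect
        n1 = ceil(2 * var_hat * (z_a + z_b) ** 2 / delta_hat ** 2)
    n1 = max(n1, n_interim_per_arm)                     # interim floor
    n1 = min(n1, ceil(n0_per_arm * n_max_factor))       # protocol cap
    n1 += n1 % 2                                        # even parity
    return zone, cp, n1
```

For example, with $Z_1 = 1.2$ at a 50% interim (50 of 100 per arm) and an observed effect of 0.2 with unit variance, the conditional power is about 0.40 (promising) and the recalculated size binds at the 2× cap of 200 per arm.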

5. Conditional Power & Zones

Conditional power is computed under two scenarios: the observed effect (used for zone classification) and the planned effect (for comparison). The formula is the same as for blinded SSR:

$$\text{CP} = \Phi\left( Z_1 \sqrt{R} - z_\alpha \sqrt{R - 1} \right)$$

The stage 1 z-statistic differs from the blinded case because it uses unblinded per-arm estimates:

Continuous

$$Z_1 = \frac{\bar{X}_T - \bar{X}_C}{\sqrt{S^2_p \, (1/n_{C,1} + 1/n_{T,1})}}$$

where $S^2_p$ is the pooled variance from both arms at interim.

Binary

$$Z_1 = \frac{\hat{p}_{T,1} - \hat{p}_{C,1}}{\sqrt{\hat{\bar{p}}_1(1-\hat{\bar{p}}_1)(1/n_{C,1} + 1/n_{T,1})}}$$

where $\hat{\bar{p}}_1$ is the pooled proportion from both arms.

Survival

$$Z_1 = \text{logrank } z\text{-statistic}$$

The logrank z-statistic from unblinded interim survival data (treatment vs. control).

Survival note: For survival endpoints, the information ratio uses events: $R = d_{\text{total}} / d_{\text{interim}}$ (not $N_{\text{total}} / N_{\text{interim}}$).

Zone Decision Table

| Zone        | CP Range       | Action                             |
|-------------|----------------|------------------------------------|
| Favorable   | CP ≥ 0.8       | Keep current N                     |
| Promising   | 0.3 ≤ CP < 0.8 | Increase N to achieve target power |
| Unfavorable | 0.1 ≤ CP < 0.3 | Continue without increase          |
| Futility    | CP < 0.1       | Consider stopping for futility     |
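The stage-1 statistics above follow directly from per-arm summaries. A short sketch (function and variable names are illustrative):

```python
from math import sqrt

def z1_continuous(mean_t, mean_c, sp2, n_t, n_c):
    """Unblinded stage-1 z for a continuous endpoint (sp2 = pooled variance)."""
    return (mean_t - mean_c) / sqrt(sp2 * (1 / n_c + 1 / n_t))

def z1_binary(p_t, p_c, n_t, n_c):
    """Unblinded stage-1 z for a binary endpoint (pooled-proportion SE)."""
    p_bar = (p_t * n_t + p_c * n_c) / (n_t + n_c)
    return (p_t - p_c) / sqrt(p_bar * (1 - p_bar) * (1 / n_c + 1 / n_t))
```

For example, means of 0.5 vs 0.2 with pooled variance 1.0 and 100 per arm give $Z_1 \approx 2.12$; rates of 0.6 vs 0.4 with 100 per arm give $Z_1 \approx 2.83$.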

6. Sensitivity Analysis

Continuous & Binary Endpoints

Unlike blinded SSR, which varies the nuisance parameter, unblinded SSR sensitivity analysis varies the observed treatment effect as multiples of the planned effect: 25%, 50%, 75%, 100% (planned), 125%, and 150%.

For each hypothetical effect size, the table reports:

  • The resulting zone classification
  • Conditional power at the current N
  • Recalculated N (increased only in the promising zone)
  • Inflation factor relative to the initial design

Protocol tip: The sensitivity table demonstrates the operating characteristics of the SSR rule across a range of plausible effect sizes. Include it in the SAP to show the DMC the expected behaviour of the re-estimation procedure.

Survival Endpoints

For survival endpoints, the sensitivity analysis varies the observed hazard ratio on the log scale using multipliers 0.50, 0.65, 0.75, 0.85, 1.0 (planned), and 1.15. The scenario HR is computed as $\text{HR}_{\text{scenario}} = \text{HR}^{m}$, so multiplier $m$ scales $\log(\text{HR})$. For example, with a planned HR = 0.70, $m = 0.5$ yields $0.70^{0.5} \approx 0.84$ (weaker effect) and $m = 1.15$ yields $0.70^{1.15} \approx 0.66$ (stronger effect).

In the promising zone, the required events $d$ are recomputed from the scenario HR via the Schoenfeld formula, and $N = \lceil d_{\text{new}} / \bar{P}(\text{event}) \rceil$ is derived using the planned event probability. The sensitivity table shows how the zone classification, required events, and recalculated $N$ shift as the true HR departs from the planned value.
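The scenario grid can be generated in a few lines. This sketch assumes 1:1 allocation by default and a fixed planned event probability; the function name and tuple layout are illustrative.

```python
from math import ceil, log
from statistics import NormalDist

def survival_scenarios(hr_planned, p_event, alpha=0.025, power=0.90, r=1.0,
                       multipliers=(0.50, 0.65, 0.75, 0.85, 1.0, 1.15)):
    """Schoenfeld events and total N for log-scale HR scenarios.

    Returns (multiplier, scenario HR, required events, total N) per scenario.
    """
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha), z(power)
    rows = []
    for m in multipliers:
        hr_m = hr_planned ** m                       # scales log(HR) by m
        d = ceil((z_a + z_b) ** 2 * (1 + r) ** 2 / (r * log(hr_m) ** 2))
        n = ceil(d / p_event)                        # planned event probability
        rows.append((m, round(hr_m, 3), d, n))
    return rows
```

With a planned HR of 0.70 and event probability 0.60, the planned-effect row ($m = 1.0$) requires 331 events and a total N of 552 at one-sided $\alpha = 0.025$ and 90% power.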

7. Statistical Assumptions

Allocation: 1:1 randomization by default. Survival endpoints support unequal allocation via allocation_ratio.

Independent stages: Data in stage 1 and stage 2 are independent, which is guaranteed by non-overlapping patient sets and the combination test structure.

Fixed weights: The combination test weights $w_1, w_2$ are determined by the pre-specified information fraction and do not change based on interim results.

Common variance (continuous): Homoscedasticity across arms is assumed for the z-test.

Normal approximation: Both stage-wise z-statistics require adequate sample sizes for the normal approximation. For survival, the logrank test is used.

Exponential survival model: Survival SSR assumes exponential event times with uniform accrual and proportional hazards throughout the trial.

Single interim look: One pre-specified point for unblinded assessment and re-estimation.

Pre-specified zones: Zone thresholds must be fixed before unblinding and documented in the protocol.
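Under the exponential and uniform-accrual assumptions above, the per-arm event probability has a closed form. A sketch (the function name is illustrative; the hazard is derived from the control median, with `hr` scaling it for the treatment arm):

```python
from math import exp, log

def event_probability(median, accrual, follow_up, hr=1.0):
    """P(event) for exponential times with uniform accrual over [0, accrual].

    A patient entering at time u is followed for accrual + follow_up - u,
    so P(event) = 1 - (e^{-lam*f} - e^{-lam*(a+f)}) / (lam * a).
    """
    lam = log(2) / median * hr        # hazard rate; hr=1 gives the control arm
    a, f = accrual, follow_up
    return 1 - (exp(-lam * f) - exp(-lam * (a + f))) / (lam * a)
```

For example, a 12-month control median with 24 months of accrual and 12 months of follow-up gives a control-arm event probability of about 0.73; a weighted average over the arms yields the pooled $\bar{P}(\text{event})$ used in the N conversions above.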

8. Limitations & When Not to Use

Potential unblinding risk: The DMC or independent statistician accesses unblinded data. This requires a firewall to prevent operational bias from knowledge of interim results.

Power loss from combination test: The inverse-normal combination test can be slightly less powerful than the standard z-test when no sample size increase occurs. The loss is typically small (<1–2%).

Regulatory complexity: Unlike blinded SSR, unblinded SSR requires simulation evidence of Type I error control and more detailed pre-specification in the protocol.

Exponential model assumption: Survival SSR assumes exponential event times and proportional hazards. Non-proportional hazards, cure-rate models, or complex censoring patterns may require external simulation.

Unequal allocation (continuous/binary): Unequal allocation ratios are currently supported for survival endpoints only.

Use blinded SSR first: If the primary uncertainty is the nuisance parameter (not the effect size), prefer blinded SSR for its simpler regulatory path.

9. Regulatory Considerations

Unblinded SSR requires more regulatory documentation than blinded SSR. The FDA Guidance (2019, Section IV.B) specifies that the re-estimation rule must be pre-specified and Type I error control demonstrated through simulation.

Documentation Checklist

Pre-specify the zone thresholds ($\gamma_F$, $\gamma_L$, $\gamma_U$) and the re-estimation procedure in the protocol.

Document the combination test (inverse-normal) with the pre-specified weights and critical value.

Provide simulation evidence demonstrating Type I error control at the nominal level across the parameter space.

Define the maximum sample size cap and justify it based on feasibility, budget, and expected operating characteristics.

Include the sensitivity analysis table showing zone classifications and recalculated N across effect size scenarios.

Describe the operational firewall ensuring that unblinded results are accessible only to the independent statistician/DMC.

Automated Warnings

Combination test disabled: Flags that Type I error may be inflated and the analysis is exploratory only.

Futility zone: Recommends stopping per the pre-specified decision rule.

Substantial increase (>50%): Flags feasibility impact on budget and timeline.

Cap binding: Notes achievable power may be below target and cap justification is required.

10. Monte Carlo Validation

Simulation validation is particularly important for unblinded SSR because the FDA requires simulation evidence of Type I error control. The Tier 2 Monte Carlo engine executes the following steps:

  1. Generate stage 1 data under true parameters (unblinded per arm)
  2. Compute $Z_1$, conditional power, and zone
  3. If promising: recalculate N using the observed effect
  4. Generate stage 2 data to the adjusted target
  5. Compute Z2Z_2 from stage 2 data only
  6. Evaluate $Z_{\text{comb}} = w_1 Z_1 + w_2 Z_2 > z_\alpha$
  7. Repeat 1,000–100,000 times
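A deliberately simplified version of this loop illustrates why the combination test preserves the Type I error: under $H_0$ the stage-wise z-statistics can be drawn directly as $N(0,1)$, because the SSR decision may inspect $Z_1$ without changing the null distribution of $Z_2$. This is a sketch, not the engine itself, which simulates patient-level data.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulate_type1(n_sims=10_000, t=0.5, alpha=0.025, seed=42):
    """Empirical Type I error of the combination test under H0.

    Stage-wise z-statistics are drawn directly as N(0,1); any promising-zone
    rule applied between the two draws leaves their null distributions intact.
    """
    rng = random.Random(seed)                    # stored seed for reproducibility
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    w1, w2 = sqrt(t), sqrt(1 - t)                # fixed pre-specified weights
    rejections = sum(
        w1 * rng.gauss(0, 1) + w2 * rng.gauss(0, 1) > z_alpha
        for _ in range(n_sims)
    )
    return rejections / n_sims
```

The empirical rejection rate should sit near the nominal 0.025 (within Monte Carlo error), mirroring the Type I error metric reported below.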

Reported Metrics

Type I Error

Rejection rate under $H_0$ using the combination test. Should be $\leq \alpha$, validating the combination test's error control.

Empirical Power

Rejection rate under $H_1$. The combination test may be slightly less powerful than the unadjusted z-test.

Final N Distribution

Shows the distribution of actual sample sizes across simulations, reflecting how often and how much N increases.

Discordance Check

If simulated power deviates from analytical by >3%, a warning is raised.

Reproducibility: Every simulation is seeded and the seed is stored. The combination test is applied in simulation exactly as it would be in the trial, providing a true end-to-end validation of the adaptive procedure.

11. API Reference

POST /api/v1/calculators/ssr-unblinded

Computes unblinded sample size re-estimation with promising zone classification and optional Monte Carlo simulation validation.

Request Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| endpoint_type | string | "continuous" | "continuous", "binary", or "survival" |
| alpha | float | 0.025 | One-sided significance level (0, 1) |
| power | float | 0.90 | Target power (0.5, 1) |
| mean_difference | float | 0.3 | Planned treatment effect (continuous, >0) |
| initial_variance | float | 1.0 | Planned variance (continuous, >0) |
| control_rate | float? | null | Control arm rate (binary, (0, 1)) |
| treatment_rate | float? | null | Treatment arm rate (0, 1); must be > control_rate |
| interim_fraction | float | 0.5 | Information fraction at interim (0.1, 0.9) |
| n_max_factor | float | 2.0 | Maximum inflation factor [1.0, 5.0] |
| cp_futility | float | 0.1 | Futility threshold [0, 1); must be < cp_promising_lower |
| cp_promising_lower | float | 0.3 | Promising zone lower bound (0, 1); must be < cp_promising_upper |
| cp_promising_upper | float | 0.8 | Favorable zone lower bound (0, 1) |
| observed_effect | float? | null | Observed treatment effect (continuous) |
| observed_variance | float? | null | Observed pooled variance (continuous, >0) |
| observed_control_rate | float? | null | Observed control rate (binary, (0, 1)) |
| observed_treatment_rate | float? | null | Observed treatment rate (binary, (0, 1)) |
| hazard_ratio | float? | null | Assumed hazard ratio (survival, (0, 2), ≠ 1) |
| median_control | float? | null | Median control survival in months (>0) |
| accrual_time | float? | null | Accrual period in months (>0) |
| follow_up_time | float? | null | Follow-up after accrual in months (≥0) |
| dropout_rate | float | 0.0 | Annual dropout rate [0, 1) |
| allocation_ratio | float | 1.0 | Randomization ratio treatment:control (>0, survival only) |
| observed_hazard_ratio | float? | null | Observed HR at interim (survival, >0) |
| simulate | bool | false | Enable Monte Carlo simulation tier |
| simulation_seed | int? | null | Seed for reproducibility; auto-generated if omitted |
| n_simulations | int | 10000 | Number of simulations [1000, 100000] |

Example Request (Continuous Endpoint)

{
  "endpoint_type": "continuous",
  "alpha": 0.025,
  "power": 0.90,
  "mean_difference": 0.3,
  "initial_variance": 1.0,
  "interim_fraction": 0.5,
  "n_max_factor": 2.0,
  "cp_futility": 0.1,
  "cp_promising_lower": 0.3,
  "cp_promising_upper": 0.8,
  "observed_effect": 0.2,
  "observed_variance": 1.1,
  "simulate": true,
  "n_simulations": 10000
}

Example Request (Survival Endpoint)

{
  "endpoint_type": "survival",
  "alpha": 0.025,
  "power": 0.90,
  "hazard_ratio": 0.7,
  "median_control": 12,
  "accrual_time": 24,
  "follow_up_time": 12,
  "allocation_ratio": 1.0,
  "interim_fraction": 0.5,
  "n_max_factor": 2.0,
  "observed_hazard_ratio": 0.75,
  "simulate": true,
  "n_simulations": 10000
}

Response Fields

| Field | Description |
|-------|-------------|
| zone | Classified zone: favorable, promising, unfavorable, futility |
| z1 | Stage 1 z-statistic |
| conditional_power | CP under observed effect at current N |
| conditional_power_planned | CP under planned effect for comparison |
| recalculated_n_per_arm | Adjusted sample size per arm |
| combination_weight_1 | Stage 1 combination test weight |
| combination_weight_2 | Stage 2 combination test weight |
| adjusted_critical_value | Critical value for the combination test |
| recalculation_scenarios | Sensitivity table with 6 effect size scenarios |
| regulatory_notes | Context-specific regulatory guidance |
| events_required | Required events $d$ from the Schoenfeld formula (survival only) |
| event_probability | Planned weighted event probability (survival only) |
| observed_event_probability | Event probability used for N conversion (currently the planned value; survival only) |
| initial_n_control | Initial control arm size (survival with allocation_ratio) |
| initial_n_treatment | Initial treatment arm size (survival with allocation_ratio) |
| recalculated_n_control | Recalculated control arm size (survival with allocation_ratio) |
| recalculated_n_treatment | Recalculated treatment arm size (survival with allocation_ratio) |

12. Technical References

  1. Mehta CR, Pocock SJ. Adaptive increase in sample size when interim results are promising: a practical guide with examples. Statistics in Medicine. 2011;30(28):3267–3284.
  2. Chen YHJ, DeMets DL, Lan KKG. Increasing the sample size when the unblinded interim result is promising. Statistics in Medicine. 2004;23(7):1023–1038.
  3. Lehmacher W, Wassmer G. Adaptive sample size calculations in group sequential trials. Biometrics. 1999;55(4):1286–1290.
  4. Müller HH, Schäfer H. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. Biometrics. 2001;57(3):886–891.
  5. FDA. Adaptive Designs for Clinical Trials of Drugs and Biologics: Guidance for Industry. 2019. Section IV.B.
  6. EMA. Reflection Paper on Methodological Issues in Confirmatory Clinical Trials Planned with an Adaptive Design. CHMP/EWP/2459/02. 2007.
  7. Cui L, Hung HMJ, Wang SJ. Modification of sample size in group sequential clinical trials. Biometrics. 1999;55(3):853–857.
  8. Schoenfeld D. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika. 1981;68(1):316–319.
  9. Gould AL. Interim analyses for monitoring clinical trials that do not materially affect the type I error rate. Statistics in Medicine. 1992;11(1):55–66.
  10. Friede T, et al. Blinded sample size re-estimation in event-driven clinical trials. Pharmaceutical Statistics. 2019;18(5):578–588.