
Group Sequential Design (GSD)

Technical documentation for interim monitoring with early stopping rules. This page covers the error spending framework, conditional power, futility monitoring, DMC considerations, regulatory alignment, spending function selection, and validation benchmarks against industry-standard software.

1. Theoretical Foundation

Group sequential designs (GSD) allow interim monitoring of a clinical trial, permitting early stopping for efficacy (success) or futility (failure). Zetyra uses the $\alpha$-spending function approach (Lan & DeMets, 1983), which maintains the overall family-wise error rate (FWER) while allowing flexible timing of interim analyses.

The Error Spending Framework

To prevent inflation of the Type I error rate, Zetyra spends a portion of the total significance level ($\alpha$) at each look. The backend is powered by the peer-reviewed gsDesign R package (Anderson, 2023), which is widely used in regulatory submissions.

Alpha-Spending Function Definition

For information fraction $t \in [0, 1]$, the spending function $\alpha^*(t)$ defines the cumulative Type I error spent:

$$\alpha^*(t) = \begin{cases} 2 - 2\Phi(z_{\alpha/2}/\sqrt{t}) & \text{O'Brien-Fleming} \\ \alpha \cdot t & \text{Pocock} \\ \alpha \cdot \frac{1 - e^{-\gamma t}}{1 - e^{-\gamma}} & \text{Hwang-Shih-DeCani} \end{cases}$$

Key principle: By carefully allocating how much of the total $\alpha = 0.05$ is “spent” at each interim analysis, we can perform multiple hypothesis tests while maintaining the overall Type I error rate. The Lan-DeMets framework allows flexible timing: actual analysis times need not match planned times.
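The three spending functions can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function names are illustrative and are not part of Zetyra or gsDesign, and the Pocock row follows the linear form written on this page.

```python
from math import exp, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def obrien_fleming_spend(t, alpha=0.05):
    """Cumulative alpha at information fraction t: 2 - 2*Phi(z_{alpha/2} / sqrt(t))."""
    if t <= 0:
        return 0.0
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 - 2 * STD_NORMAL.cdf(z / sqrt(t))


def pocock_spend(t, alpha=0.05):
    """Cumulative alpha at t in the form given on this page: alpha * t."""
    return alpha * t


def hsd_spend(t, alpha=0.05, gamma=-4.0):
    """Hwang-Shih-DeCani: alpha * (1 - exp(-gamma*t)) / (1 - exp(-gamma))."""
    if gamma == 0:
        return alpha * t  # limiting case as gamma -> 0
    return alpha * (1 - exp(-gamma * t)) / (1 - exp(-gamma))


# Cumulative alpha spent at a few information fractions (total alpha = 0.05).
for t in (0.25, 0.50, 0.75, 1.00):
    print(
        f"t={t:.2f}  OBF={obrien_fleming_spend(t):.5f}  "
        f"Pocock={pocock_spend(t):.5f}  HSD(-4)={hsd_spend(t):.5f}"
    )
```

Every function returns the full $\alpha$ at $t = 1$, so the choice only changes how early that budget is released.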

2. Conditional Power & Futility Monitoring

Conditional Power Definition

At interim analysis $k$, conditional power (CP) is the probability of achieving statistical significance at the final analysis, given the data observed so far.

$$CP_k(\theta) = P\left(Z_K > z_{1-\alpha_K} \mid Z_k = z_k, \theta\right)$$

Where $Z_k$ is the observed test statistic at look $k$, and $\theta$ is the assumed treatment effect for the remaining patients.

Effect Size Assumptions for CP

Under Current Trend

$\theta = \hat{\theta}_k$ (MLE from interim data). Most commonly used for futility assessment.

Under Design Alternative

$\theta = \theta_1$ (original design assumption). Optimistic about the effect, so conservative (slow to stop) for futility decisions.

Under Null

$\theta = 0$. Used to verify Type I error control; rarely used for decision-making.
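The three assumptions differ only in the drift plugged into the same formula. The sketch below uses the standard Brownian-motion (B-value) form of conditional power, expressed through the information fraction $t_k$ and the expected final Z-score ("drift", $\theta\sqrt{I_K}$). The function names, the example values, and the drift of 2.80 for the design alternative are illustrative assumptions, not Zetyra outputs.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_power(z_k, t_k, z_final, drift):
    """P(Z_K > z_final | Z_k = z_k), where drift = theta * sqrt(I_K) = E[Z_K]."""
    if not 0 < t_k < 1:
        raise ValueError("t_k must be strictly between 0 and 1")
    b_k = z_k * sqrt(t_k)                   # B-value at the interim
    mean_final = b_k + drift * (1 - t_k)    # expected final Z given the interim
    sd_final = sqrt(1 - t_k)                # SD of the remaining increment
    return 1 - STD_NORMAL.cdf((z_final - mean_final) / sd_final)


def cp_current_trend(z_k, t_k, z_final):
    """theta set to the interim estimate, i.e. drift = z_k / sqrt(t_k)."""
    return conditional_power(z_k, t_k, z_final, drift=z_k / sqrt(t_k))


def cp_null(z_k, t_k, z_final):
    """theta = 0, i.e. no drift for the remaining data."""
    return conditional_power(z_k, t_k, z_final, drift=0.0)


# Hypothetical interim: Z = 1.0 at 50% information, final critical value 2.004.
z_k, t_k, z_final = 1.0, 0.5, 2.004
design_drift = 2.80  # assumed: roughly z_0.975 + z_0.80 for a fixed design
print(f"current trend: {cp_current_trend(z_k, t_k, z_final):.3f}")
print(f"design alt.:   {conditional_power(z_k, t_k, z_final, design_drift):.3f}")
print(f"null:          {cp_null(z_k, t_k, z_final):.3f}")
```

For a positive interim estimate that falls short of the design effect, the ordering is null < current trend < design alternative, which is why the design alternative is the slowest of the three to signal futility.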

Futility Boundary Types

Non-Binding Futility (Default)

The trial may continue even if the futility boundary is crossed. Type I error is calculated assuming the trial always continues.

  • Advantage: Preserves nominal Type I error regardless of decision
  • Advantage: Provides flexibility for DMC judgment
  • Disadvantage: Slightly larger sample size than binding

Binding Futility

The trial must stop if the futility boundary is crossed. Type I error accounts for mandatory stopping.

  • Advantage: Smaller sample size (uses alpha “saved” from futility)
  • Disadvantage: Less flexibility—must stop even if external evidence changes
  • Disadvantage: Rarely recommended by FDA for pivotal trials

DMC Decision Point: When to Stop for Futility

Common thresholds in practice:

  • CP < 5% under current trend: Strong evidence of futility
  • CP < 10% under design alternative: Futility if even optimistic assumptions fail
  • CP < 20%: Consider stopping if operational costs are high

Note: These are guidelines, not rules. DMC should consider clinical context, safety data, and external evidence.

Sample Size Re-estimation (SSR) Considerations

Zetyra Does NOT Support Adaptive SSR

Zetyra's GSD calculator provides classical group sequential designs with fixed maximum sample sizes. Blinded or unblinded sample size re-estimation (SSR) based on interim variance or effect estimates requires additional statistical methodology:

  • Blinded SSR: Re-estimate pooled variance without unblinding—generally acceptable but requires pre-specification
  • Unblinded SSR: Requires combination tests (e.g., inverse normal method) or conditional error functions to maintain Type I error
  • Promising-zone designs: conditional-power-based sample size increases (building on Proschan & Hunsberger, 1995) are not currently implemented

Reference: FDA Adaptive Designs Guidance (2019), Section IV.C.

3. Statistical Assumptions & Requirements

Prespecification (Mandatory)

Per ICH E9 (1998) and FDA Adaptive Designs Guidance (2019), the following must be documented in the Statistical Analysis Plan (SAP) before unblinding:

  • Number of interim analyses
  • Planned timing (information fractions or calendar time)
  • Alpha-spending function with all parameters
  • Efficacy and futility boundaries (binding vs. non-binding)
  • Decision rules and DMC charter references

Independent Increments

Test statistics at successive looks must follow the canonical joint distribution: multivariate normal, with independent increments on the score scale ($Z_k\sqrt{I_k}$):

$$(Z_1, Z_2, \ldots, Z_K) \sim N\left(\theta \cdot (\sqrt{I_1}, \sqrt{I_2}, \ldots, \sqrt{I_K}), \Sigma\right)$$

where $\Sigma_{jk} = \sqrt{I_j / I_k}$ for $j \leq k$.

Practical Implication: This holds when subjects are randomized independently and outcomes are measured without bias. Violations can occur with:

  • Time-varying treatment effects
  • Informative censoring (survival endpoints)
  • Cluster randomization without proper adjustment
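To make the covariance structure concrete, the sketch below builds $\Sigma$ from a list of information levels and checks the independent-increments property on the score scale, $S_k = Z_k\sqrt{I_k}$. The information levels and function names are hypothetical, chosen only for illustration.

```python
from math import sqrt


def z_covariance(info):
    """Sigma[j][k] = sqrt(I_j / I_k) for j <= k, symmetric otherwise."""
    k = len(info)
    return [
        [sqrt(min(info[a], info[b]) / max(info[a], info[b])) for b in range(k)]
        for a in range(k)
    ]


def score_covariance(info):
    """Covariance of the scores S_k = Z_k * sqrt(I_k); equals I_min(j, k)."""
    sigma = z_covariance(info)
    k = len(info)
    return [
        [sigma[a][b] * sqrt(info[a] * info[b]) for b in range(k)]
        for a in range(k)
    ]


info = [25.0, 50.0, 75.0, 100.0]  # hypothetical information levels at 4 looks
sigma = z_covariance(info)
score = score_covariance(info)

# Independent increments: Cov(S_1, S_2 - S_1) = Cov(S_1, S_2) - Var(S_1) = 0.
increment_cov = score[0][1] - score[0][0]
print(f"Corr(Z_1, Z_2) = {sigma[0][1]:.4f}")
print(f"Cov(S_1, S_2 - S_1) = {increment_cov:.2e}")
```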

Information Time vs. Calendar Time

Analyses should be timed by information fraction, not calendar time:

$$t_k = \frac{I_k}{I_K} = \frac{n_k}{N} \text{ (for continuous endpoints)}$$

For survival endpoints, the information fraction is based on the number of events: $t_k = d_k / D$, where $d_k$ is the number of observed events and $D$ is the target number of events.
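As a small worked example, the helper below computes information fractions from observed and target counts. The helper name is hypothetical; the event counts (200, 400, and 600 of 600 target events) mirror the example SAP text in the regulatory checklist on this page.

```python
def information_fraction(observed, target):
    """t_k = observed / target: subjects (n_k / N) or events (d_k / D)."""
    if target <= 0:
        raise ValueError("target must be positive")
    if observed < 0:
        raise ValueError("observed must be non-negative")
    return observed / target


target_events = 600
looks = [information_fraction(d, target_events) for d in (200, 400, 600)]
print([f"{t:.1%}" for t in looks])
```

The result is deliberately not capped at 1, since overrun can push the observed information slightly past the target.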

4. Spending Function Selection Guide

The choice of spending function affects how aggressively alpha is allocated to early looks. This decision should balance statistical power, expected sample size savings, and regulatory acceptability.

| Function | Early Stopping | Power Impact | When to Choose |
|---|---|---|---|
| O'Brien-Fleming, $\alpha^*(t) = 2 - 2\Phi(z_{\alpha/2}/\sqrt{t})$ | Conservative (hard); ~0.6% alpha at 50% info (for $\alpha = 0.05$) | Minimal (~1-2%) | Default for Phase III. Preserves power for final analysis while allowing early stopping only for overwhelming efficacy. FDA's preferred choice. |
| Pocock, $\alpha^*(t) = \alpha \cdot t$ | Aggressive (easy); ~2.5% alpha at 50% info (for $\alpha = 0.05$) | Lower | Time-critical trials. When early termination is high priority (e.g., pandemic response, diseases with rapid progression). Requires larger max N. |
| Hwang-Shih-DeCani, $\gamma = -4$ (OBF-like) | Conservative | Minimal | Approximately O'Brien-Fleming behavior with closed-form formula. |
| Hwang-Shih-DeCani, $\gamma = 1$ (Pocock-like) | Aggressive | Moderate | Approximately Pocock behavior with closed-form formula. |
| Hwang-Shih-DeCani, $\gamma = 0$ | Linear | Moderate | Compromise option. Linear spending between OBF and Pocock. Good when moderate early stopping is desired. |

“Lower” power means Pocock requires a larger maximum sample size than O'Brien-Fleming to achieve the same power. At the same max N, Pocock has lower power.

Recommendation: For most Phase III confirmatory trials, O'Brien-Fleming is the standard choice. It preserves statistical power while allowing early stopping only when treatment effects are substantially larger than planned. Pocock should be reserved for situations where early stopping has exceptional operational or ethical value.

Limitations & When Not to Use GSD

Group sequential designs add operational complexity. Consider whether the benefits outweigh the costs in your specific trial context.

Small Trials (N < 100)

GSD overhead (DMC meetings, unblinding logistics) may exceed sample size savings. With small N, the expected number of subjects saved is often <10.

Very Short Trials (<6 months)

By the time you organize a DMC meeting and analyze interim data, enrollment may be nearly complete. The operational delay can negate early stopping benefits.

Long-Term Endpoints (e.g., 5-year survival)

Early looks occur when most patients haven't reached the endpoint. Information fraction lags calendar time significantly, limiting utility of early interims.

Regulatory Complexity

Some regulatory pathways (especially first-in-class drugs) may scrutinize GSD more heavily. If your trial already faces regulatory hurdles, a fixed design may be simpler to defend.

Rule of thumb: GSD is most beneficial when (1) the trial is large enough that even a 10-20% reduction in expected N is meaningful, (2) enrollment is slow enough that interim analyses can influence the trial, and (3) there's genuine uncertainty about treatment effect magnitude.

5. Handling Timing Deviations

One key advantage of the Lan-DeMets error spending approach is its flexibility in handling deviations from planned interim analysis timing. Unlike the original Pocock and O'Brien-Fleming group sequential boundaries (which require equal spacing), the spending function approach maintains Type I error control even when actual information fractions differ from planned values.

Recalculating Boundaries at Actual Information Times

When the actual information fraction $t_k^{actual}$ differs from the planned $t_k^{planned}$:

  1. Calculate $\alpha^*(t_k^{actual})$ using the pre-specified spending function
  2. Compute incremental alpha: $\alpha_k = \alpha^*(t_k^{actual}) - \alpha^*(t_{k-1}^{actual})$
  3. Derive boundary $z_k$ to spend exactly $\alpha_k$ given the correlation structure

Acceptable Deviations

  • Interim at 48% vs. planned 50% information
  • Slight delays due to enrollment variability
  • Event-driven analyses arriving early/late

The spending function automatically adjusts boundaries to maintain Type I error control.

Problematic Deviations

  • Skipping a planned interim entirely
  • Adding unplanned interim analyses
  • Changing the spending function after unblinding

These require protocol amendments and may raise regulatory concerns.

Documentation Requirement

When actual timing differs from planned, document in the interim analysis report: (1) the reason for the deviation, (2) the actual information fraction achieved, (3) the recalculated boundary using the pre-specified spending function, and (4) confirmation that the spending function was not modified.

6. DMC Perspective

The Data Monitoring Committee (DMC) reviews unblinded interim data and makes recommendations about trial continuation. While the statistical boundaries provide quantitative guidance, DMC decisions involve broader considerations.

What the DMC Sees at Each Interim Analysis

Efficacy Assessment

  • Test statistic (Z-score) vs. efficacy boundary
  • Point estimate and confidence interval
  • Conditional power under current trend and design alternative
  • Predicted probability of success at final analysis

Futility Assessment

  • Test statistic vs. futility boundary (if specified)
  • Conditional power (if <10-20%, strong futility signal)
  • Trend in treatment effect across time
  • Comparison to external trials or historical data

Possible DMC Recommendations

Continue as Planned

Data are within expected range; no boundary crossed. This is the most common outcome at early interims.

Stop for Efficacy

Efficacy boundary crossed with strong treatment effect. Consider external validity, subgroup consistency, and safety profile before recommending.

Stop for Futility

Conditional power extremely low (<5-10%). Continuing would expose patients to experimental treatment with minimal scientific benefit.

Stop for Safety

Unacceptable adverse event profile. This decision is independent of efficacy boundaries and takes precedence.

Beyond the Boundaries: Clinical Judgment

Statistical boundaries are guidelines, not mandates. The DMC may consider:

  • External evidence: Results from competing trials or emerging safety signals
  • Subgroup heterogeneity: Is the effect driven by a specific population?
  • Clinical meaningfulness: Is the effect size clinically relevant, even if statistically significant?
  • Regulatory context: Would early stopping be accepted by the target regulatory agency?

7. Practical Considerations

Enrollment Overrun

In fast-enrolling trials, patients may be randomized between the data cutoff for an interim analysis and the DMC meeting/decision. This “pipeline” enrollment must be considered.

Handling Overrun

  • Option 1: Include in analysis — Analyze all randomized patients. Information fraction may exceed planned interim (e.g., 55% vs. 50%). The spending function handles this automatically.
  • Option 2: Exclude pipeline patients — Analyze only patients enrolled before cutoff. Cleaner information fraction but may raise questions about selection.
  • Recommendation: Pre-specify the approach in the DMC charter. Option 1 is generally preferred for intent-to-treat integrity.

Boundary Conversion: Z-scores, P-values, and Confidence Intervals

Boundaries can be expressed in multiple equivalent forms:

Z-score Boundary

Z = 2.963

Raw test statistic threshold

Nominal P-value

p = 0.0030

$p = 2(1 - \Phi(z))$ (two-sided)

Repeated CI

99.70% CI

$(1 - p) \times 100\%$ coverage

Important: The “nominal p-value” at an interim is NOT directly comparable to the conventional 0.05 threshold. Always compare to the boundary p-value for that specific interim analysis.
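The conversions above are one-liners. The sketch below reproduces the example values (Z = 2.963, p ≈ 0.0030, 99.70% coverage) with the standard library; the function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_to_nominal_p(z):
    """Two-sided nominal p-value: p = 2 * (1 - Phi(|z|))."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def nominal_p_to_z(p):
    """Inverse conversion: z = Phi^{-1}(1 - p / 2)."""
    return STD_NORMAL.inv_cdf(1 - p / 2)


def repeated_ci_level(z):
    """Coverage of the matching repeated confidence interval, in percent."""
    return 100 * (1 - z_to_nominal_p(z))


z_boundary = 2.963
p_boundary = z_to_nominal_p(z_boundary)
print(f"nominal p = {p_boundary:.4f}")
print(f"CI level  = {repeated_ci_level(z_boundary):.2f}%")
```

An interim result is then compared against `p_boundary` for that look, not against 0.05.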

Alpha Allocation for Efficacy vs. Futility

When designing asymmetric boundaries with different spending functions for efficacy and futility:

  • Efficacy: Typically O'Brien-Fleming to preserve power
  • Futility: Often more aggressive (HSD with $\gamma = -2$) to enable earlier stopping when the trial is unlikely to succeed
  • Non-binding futility: Does not “spend” alpha; Type I error calculated assuming trial continues regardless of futility boundary

Number of Interim Analyses

| # Looks | Typical Use | Sample Size Inflation (OBF) | Considerations |
|---|---|---|---|
| 2 | Small trials, limited budget for DMC | ~1% | Simple; limited opportunity for early stopping |
| 3 | Standard Phase III | ~2% | Balance of flexibility and simplicity |
| 4-5 | Large, long trials; safety monitoring | ~3% | More operational burden; diminishing returns |
| Continuous | Alpha spending only (no fixed schedule) | ~4% | Maximum flexibility but complex operations |

8. Regulatory Documentation Checklist

Per FDA Guidance on Adaptive Designs (2019), the following must be pre-specified in your SAP:

Number and timing of analyses

Fixed or information-based.

Example SAP text: "Three analyses (two interim and one final) at 33%, 67%, and 100% of target events (200, 400, and 600 events respectively)."

Alpha-spending function

Including all parameters.

Example SAP text: "O'Brien-Fleming spending function for efficacy (α = 0.025 one-sided). Hwang-Shih-DeCani with γ = −2 for futility (non-binding)."

Stopping boundaries

Both efficacy and futility, if applicable.

Example SAP text: "Efficacy boundaries: Z = 3.471, 2.454, 2.004 at looks 1-3. Futility boundary: Stop if conditional power < 10% under current trend at any interim."

Binding vs. non-binding futility

Explicitly state commitment level.

Example SAP text: "Futility boundaries are non-binding. The Type I error rate is calculated assuming the trial continues regardless of futility boundary crossing."

Decision rules

Criteria for each possible outcome.

Example SAP text: "If efficacy boundary crossed: DMC recommends early stopping. If futility boundary crossed: DMC may recommend stopping but is not required."

DMC Charter reference

Who makes interim decisions and with what information.

Example SAP text: "Per DMC Charter v2.0 (dated XX/XX/XXXX), the DMC will review unblinded efficacy and safety data. Sponsor will receive blinded recommendations only."

Handling of timing deviations

Procedure if actual timing differs from planned.

Example SAP text: "If actual information fraction differs from planned by more than 10%, boundaries will be recalculated using the pre-specified spending function evaluated at actual information times."

Final analysis adjustment

Specification for p-values and confidence intervals.

Example SAP text: "Final p-values will be calculated using the stagewise ordering. Confidence intervals will use the repeated confidence interval approach of Jennison & Turnbull."

9. Validation Appendix: GSD Benchmarking

Calculations are benchmarked against PASS 2024 and nQuery 8.9. Differences of $\pm 1$ subject arise from rounding conventions to achieve integer $N$ per arm.

Case 1: O'Brien-Fleming (3-Look, Efficacy Only)

Inputs: $\alpha=0.05$ (two-sided), Power=0.80, 1:1 Allocation, Continuous Endpoint, $\delta=0.25$, $\sigma=1.0$.

| Look | Info Fraction | Efficacy Z | Nominal p | Cumulative $\alpha$ | Futility Z |
|---|---|---|---|---|---|
| 1 | 33% | 3.471 | 0.0005 | 0.0005 | n/a |
| 2 | 67% | 2.454 | 0.0141 | 0.0140 | n/a |
| 3 | 100% | 2.004 | 0.0451 | 0.0500 | n/a |

Cross-Software Comparison (Max N Total):

| Software | Max N Total |
|---|---|
| PASS 2024 | 847 |
| nQuery 8.9 | 845 |
| Zetyra | 846 |

Case 2: Pocock (3-Look, Efficacy Only)

Inputs: Same as Case 1. Pocock spending function.

| Look | Info Fraction | Efficacy Z | Nominal p | Cumulative $\alpha$ |
|---|---|---|---|---|
| 1 | 33% | 2.289 | 0.0221 | 0.0167 |
| 2 | 67% | 2.289 | 0.0221 | 0.0333 |
| 3 | 100% | 2.289 | 0.0221 | 0.0500 |

Cross-Software Comparison (Max N Total):

| Software | Max N Total |
|---|---|
| PASS 2024 | 869 |
| nQuery 8.9 | 867 |
| Zetyra | 868 |

Case 3: Hwang-Shih-DeCani with Futility (4-Look, Asymmetric)

Inputs: $\alpha=0.05$ (two-sided), $\beta=0.10$ (90% power), Efficacy: HSD $\gamma=-4$, Futility: HSD $\gamma=-2$ (non-binding).

| Look | Info Fraction | Efficacy Z | Futility Z | Cum. $\alpha$ (Eff.) | Cum. $\beta$ (Fut.) |
|---|---|---|---|---|---|
| 1 | 25% | 4.333 | -0.251 | 0.0001 | 0.0189 |
| 2 | 50% | 2.963 | 0.563 | 0.0030 | 0.0487 |
| 3 | 75% | 2.359 | 1.156 | 0.0151 | 0.0771 |
| 4 | 100% | 2.014 | 2.014 | 0.0500 | 0.1000 |

Cross-Software Comparison (Max N Total):

| Software | Max N Total |
|---|---|
| PASS 2024 | 1,052 |
| nQuery 8.9 | 1,050 |
| Zetyra | 1,051 |

Note: Futility boundaries shown as Z-scores. Negative Z at early looks indicates trial continues even if treatment appears slightly worse than control (provides opportunity for regression to true effect).

11. API Quick Reference

POST /api/v1/calculators/gsd

Key Parameters

| Parameter | Type | Description |
|---|---|---|
| k | int | Number of analyses/looks (default: 3) |
| spending_function | string | "OBrienFleming", "Pocock", or "HwangShihDecani" |
| effect_size | float | Standardized effect size (default: 0.3) |
| alpha, power | float | Defaults: 0.025, 0.90 |
| test_type | string | "one_sided" or "two_sided" |

Key Response Fields

  • max_sample_size — Maximum sample size needed
  • expected_sample_size — Expected sample under H₁
  • boundaries.efficacy — Z-score boundaries for efficacy stopping
  • boundaries.futility — Z-score boundaries for futility stopping
  • stopping_probabilities — Cumulative stopping probabilities
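A request to this endpoint might be assembled as follows. This is a sketch under assumptions: the host name is a placeholder, the JSON request body and the absence of authentication headers are guesses rather than documented behavior, and only the parameter names and the path come from this page. The request is built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "https://zetyra.example"  # hypothetical host, replace with the real one


def build_gsd_request(base_url=BASE_URL, **overrides):
    """Build (but do not send) a POST request for the GSD calculator."""
    payload = {
        "k": 3,                                # number of looks
        "spending_function": "OBrienFleming",
        "effect_size": 0.3,
        "alpha": 0.025,
        "power": 0.90,
        "test_type": "one_sided",
    }
    payload.update(overrides)                  # caller-supplied parameter changes
    return Request(
        url=base_url + "/api/v1/calculators/gsd",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


request = build_gsd_request(k=4, spending_function="HwangShihDecani")
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

Passing the result to `urllib.request.urlopen` would send it; the response fields listed above would then be read from the decoded JSON body, assuming the API returns JSON.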

12. Technical References

  1. U.S. Food and Drug Administration (2019). Adaptive Design Clinical Trials for Drugs and Biologics: Guidance for Industry.
  2. ICH E9 (1998). Statistical Principles for Clinical Trials.
  3. Anderson, K. M. (2023). gsDesign: An R Package for Sequential Clinical Trial Design. CRAN.
  4. Jennison, C., & Turnbull, B. W. (2000). Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC.
  5. Lan, K. K. G., & DeMets, D. L. (1983). Discrete Sequential Boundaries for Clinical Trials. Biometrika, 70(3), 659-663.
  6. O'Brien, P. C., & Fleming, T. R. (1979). A Multiple Testing Procedure for Clinical Trials. Biometrics, 35(3), 549-556.
  7. Pocock, S. J. (1977). Group Sequential Methods in the Design and Analysis of Clinical Trials. Biometrika, 64(2), 191-199.
  8. Hwang, I. K., Shih, W. J., & De Cani, J. S. (1990). Group Sequential Designs Using a Family of Type I Error Probability Spending Functions. Statistics in Medicine, 9(12), 1439-1445.
  9. Proschan, M. A., & Hunsberger, S. A. (1995). Designed Extension of Studies Based on Conditional Power. Biometrics, 51(4), 1315-1324.
  10. DeMets, D. L., & Lan, K. K. G. (1994). Interim Analysis: The Alpha Spending Function Approach. Statistics in Medicine, 13(13-14), 1341-1352.
