Group Sequential Design (GSD)
Technical documentation for interim monitoring with early stopping rules. This page covers the error spending framework, conditional power, futility monitoring, DMC considerations, regulatory alignment, spending function selection, and validation benchmarks against industry-standard software.
1. Theoretical Foundation
Group Sequential Designs (GSD) allow for the interim monitoring of a clinical trial to permit early stopping for efficacy (success) or futility (failure). Zetyra utilizes the α-spending function approach (Lan & DeMets, 1983), which maintains the overall family-wise error rate (FWER) while allowing for flexible timing of interim analyses.
The Error Spending Framework
To prevent the inflation of the Type I Error rate, Zetyra spends a portion of the total significance level (α) at each look. The backend is powered by the peer-reviewed gsDesign R package (Anderson, 2023), which is widely used in regulatory submissions.
Alpha-Spending Function Definition
For information fraction t ∈ [0, 1], the spending function α(t) defines the cumulative Type I error spent by that point, with α(0) = 0, α(1) = α, and α(t) non-decreasing in t.
Key principle: By carefully allocating how much of the total α is “spent” at each interim analysis, we can perform multiple hypothesis tests while maintaining the overall Type I error rate. The Lan-DeMets framework allows flexible timing: actual analysis times need not match planned times.
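As an illustration of how cumulative α is allocated, the sketch below evaluates three commonly used spending functions at a few information fractions. The closed forms are the standard textbook versions (Lan-DeMets O'Brien-Fleming-type and Pocock-type, and Hwang-Shih-DeCani) for a one-sided α; the function names are hypothetical, and this is not a description of Zetyra's internal code.

```python
from math import e, exp, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_type_spend(t: float, alpha: float) -> float:
    """Lan-DeMets O'Brien-Fleming-type spending, one-sided alpha, 0 < t <= 1."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * _STD_NORMAL.cdf(-z / sqrt(t))


def pocock_type_spend(t: float, alpha: float) -> float:
    """Lan-DeMets Pocock-type spending: alpha * ln(1 + (e - 1) * t)."""
    return alpha * log(1.0 + (e - 1.0) * t)


def hsd_spend(t: float, alpha: float, gamma: float) -> float:
    """Hwang-Shih-DeCani spending; gamma = 0 reduces to linear spending."""
    if gamma == 0.0:
        return alpha * t
    return alpha * (1.0 - exp(-gamma * t)) / (1.0 - exp(-gamma))


if __name__ == "__main__":
    alpha = 0.025  # assumed one-sided significance level
    for t in (0.25, 0.50, 0.75, 1.00):
        print(
            f"t={t:.2f}  OBF-type={obf_type_spend(t, alpha):.5f}  "
            f"Pocock-type={pocock_type_spend(t, alpha):.5f}  "
            f"HSD(gamma=-4)={hsd_spend(t, alpha, -4.0):.5f}"
        )
```

All three functions reach exactly α at t = 1; they differ only in how much is released early, which is what drives the boundary shapes discussed later on this page.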
2. Conditional Power & Futility Monitoring
Conditional Power Definition
At interim analysis k, conditional power (CP) is the probability of achieving statistical significance at the final analysis K, given the data observed so far:
CP_k(θ) = P(Z_K ≥ c_K | Z_k = z_k; θ)
where z_k is the observed test statistic at look k, c_K is the final efficacy boundary, and θ is the assumed treatment effect for the remaining patients.
Effect Size Assumptions for CP
Under Current Trend
θ = θ̂ (MLE from interim data). Most commonly used for futility assessment.
Under Design Alternative
θ = θ_1 (original design assumption). Conservative for futility decisions.
Under Null
θ = 0. Used to verify Type I error control; rarely used for decision-making.
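A minimal sketch of the conditional power calculation under the three effect assumptions above, using the standard B-value formulation. It assumes a one-sided final boundary `c_final` and expresses the treatment effect as a drift (the expected value of the final Z-statistic); the function and parameter names are illustrative, not part of Zetyra's API.

```python
from math import sqrt
from statistics import NormalDist

_PHI = NormalDist().cdf


def conditional_power(z_k: float, t: float, c_final: float, drift: float) -> float:
    """P(Z_K >= c_final | Z_k = z_k) when the final Z has mean `drift`.

    t is the information fraction at the interim (0 < t < 1).
    """
    b_k = z_k * sqrt(t)                      # B-value observed so far
    mean_final = b_k + drift * (1.0 - t)     # expected final Z given the interim
    return 1.0 - _PHI((c_final - mean_final) / sqrt(1.0 - t))


def cp_current_trend(z_k: float, t: float, c_final: float) -> float:
    """Drift estimated from the interim data: z_k / sqrt(t)."""
    return conditional_power(z_k, t, c_final, drift=z_k / sqrt(t))


def cp_design(z_k: float, t: float, c_final: float, design_drift: float) -> float:
    """Drift fixed at the value assumed when the trial was designed."""
    return conditional_power(z_k, t, c_final, drift=design_drift)


def cp_null(z_k: float, t: float, c_final: float) -> float:
    """No treatment effect for the remaining patients."""
    return conditional_power(z_k, t, c_final, drift=0.0)


if __name__ == "__main__":
    # Hypothetical interim: Z = 0.5 at 50% information, final boundary 1.96
    print("current trend:", round(cp_current_trend(0.5, 0.5, 1.96), 3))
    print("design (drift 3.24):", round(cp_design(0.5, 0.5, 1.96, 3.24), 3))
    print("null:", round(cp_null(0.5, 0.5, 1.96), 3))
```

For a fixed design with one-sided α and power 1 − β, the design drift is z_{1−α} + z_{1−β}; a group sequential design inflates it slightly, so treat the 3.24 above as a placeholder.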
Futility Boundary Types
Non-Binding Futility (Default)
The trial may continue even if the futility boundary is crossed. Type I error is calculated assuming the trial always continues.
- Advantage: Preserves nominal Type I error regardless of decision
- Advantage: Provides flexibility for DMC judgment
- Disadvantage: Slightly larger sample size than binding
Binding Futility
The trial must stop if the futility boundary is crossed. Type I error accounts for mandatory stopping.
- Advantage: Smaller sample size (uses alpha “saved” from futility)
- Disadvantage: Less flexibility; the trial must stop even if external evidence changes
- Disadvantage: Rarely recommended by FDA for pivotal trials
DMC Decision Point: When to Stop for Futility
Common thresholds in practice:
- CP < 5% under current trend: Strong evidence of futility
- CP < 10% under design alternative: Futility if even optimistic assumptions fail
- CP < 20%: Consider stopping if operational costs are high
Note: These are guidelines, not rules. DMC should consider clinical context, safety data, and external evidence.
Sample Size Re-estimation (SSR) Considerations
Zetyra Does NOT Support Adaptive SSR
Zetyra's GSD calculator provides classical group sequential designs with fixed maximum sample sizes. Blinded or unblinded sample size re-estimation (SSR) based on interim variance or effect estimates requires additional statistical methodology:
- Blinded SSR: Re-estimate pooled variance without unblinding; generally acceptable but requires pre-specification
- Unblinded SSR: Requires combination tests (e.g., inverse normal method) or conditional error functions to maintain Type I error
- Promising Zone designs: Conditional-power-based sample size increases (building on Proschan & Hunsberger, 1995) are not currently implemented
Reference: FDA Adaptive Designs Guidance (2019), Section IV.C.
3. Statistical Assumptions & Requirements
Prespecification (Mandatory)
Per ICH E9 (1998) and FDA Adaptive Designs Guidance (2019), the following must be documented in the Statistical Analysis Plan (SAP) before unblinding:
- Number of interim analyses
- Planned timing (information fractions or calendar time)
- Alpha-spending function with all parameters
- Efficacy and futility boundaries (binding vs. non-binding)
- Decision rules and DMC charter references
Independent Increments
Test statistics at successive looks must follow a multivariate normal distribution with independent increments:
Cov(Z_j, Z_k) = √(I_j / I_k)
where I_j ≤ I_k is the statistical information at looks j ≤ k.
Practical Implication: This holds when subjects are randomized independently and outcomes are measured without bias. Violations can occur with:
- Time-varying treatment effects
- Informative censoring (survival endpoints)
- Cluster randomization without proper adjustment
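The covariance structure implied by the independent-increments assumption depends only on the information fractions. As a small sketch (the helper name is hypothetical), the correlation matrix of the interim Z-statistics can be built directly from the planned or actual fractions:

```python
from math import sqrt


def z_correlation(fractions):
    """Correlation matrix of (Z_1, ..., Z_K) under independent increments.

    Corr(Z_j, Z_k) = sqrt(t_j / t_k) for t_j <= t_k, where t is the
    information fraction at each look.
    """
    k = len(fractions)
    return [
        [
            sqrt(min(fractions[i], fractions[j]) / max(fractions[i], fractions[j]))
            for j in range(k)
        ]
        for i in range(k)
    ]


if __name__ == "__main__":
    for row in z_correlation([0.25, 0.50, 1.00]):
        print([round(value, 3) for value in row])
```

Looks that are close together in information time are highly correlated, which is why adding many closely spaced interims yields diminishing returns.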
Information Time vs. Calendar Time
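Analyses should be timed by information fraction, not calendar time:
t_k = I_k / I_max
where I_k is the information accrued at look k and I_max is the planned maximum information. For survival endpoints, information fraction is based on number of events: t_k = d_k / D, where d_k is observed events and D is target events.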
4. Spending Function Selection Guide
The choice of spending function affects how aggressively alpha is allocated to early looks. This decision should balance statistical power, expected sample size savings, and regulatory acceptability.
| Function | Early Stopping | Power Impact | When to Choose |
|---|---|---|---|
| O'Brien-Fleming | Conservative (Hard): <1% alpha at 50% info | Minimal (~1-2%) | Default for Phase III. Preserves power for final analysis while allowing early stopping only for overwhelming efficacy. FDA's preferred choice. |
| Pocock | Aggressive (Easy): ~2.5% alpha at 50% info | Lower† | Time-critical trials. When early termination is high priority (e.g., pandemic response, diseases with rapid progression). Requires larger max N. |
| Hwang-Shih-DeCani (OBF-like) | Conservative | Minimal | Approximately O'Brien-Fleming behavior with closed-form formula. |
| Hwang-Shih-DeCani (Pocock-like) | Aggressive | Moderate | Approximately Pocock behavior with closed-form formula. |
| Hwang-Shih-DeCani | Linear | Moderate | Compromise option. Linear spending between OBF and Pocock. Good when moderate early stopping is desired. |
† “Lower” power means Pocock requires a larger maximum sample size than O'Brien-Fleming to achieve the same power. At the same max N, Pocock has lower power.
Recommendation: For most Phase III confirmatory trials, O'Brien-Fleming is the standard choice. It preserves statistical power while allowing early stopping only when treatment effects are substantially larger than planned. Pocock should be reserved for situations where early stopping has exceptional operational or ethical value.
Limitations & When Not to Use GSD
Group sequential designs add operational complexity. Consider whether the benefits outweigh the costs in your specific trial context.
Small Trials (N < 100)
GSD overhead (DMC meetings, unblinding logistics) may exceed sample size savings. With small N, the expected number of subjects saved is often <10.
Very Short Trials (<6 months)
By the time you organize a DMC meeting and analyze interim data, enrollment may be nearly complete. The operational delay can negate early stopping benefits.
Long-Term Endpoints (e.g., 5-year survival)
Early looks occur when most patients haven't reached the endpoint. Information fraction lags calendar time significantly, limiting utility of early interims.
Regulatory Complexity
Some regulatory pathways (especially first-in-class drugs) may scrutinize GSD more heavily. If your trial already faces regulatory hurdles, a fixed design may be simpler to defend.
Rule of thumb: GSD is most beneficial when (1) the trial is large enough that even a 10-20% reduction in expected N is meaningful, (2) enrollment is slow enough that interim analyses can influence the trial, and (3) there's genuine uncertainty about treatment effect magnitude.
5. Handling Timing Deviations
One key advantage of the Lan-DeMets error spending approach is its flexibility in handling deviations from planned interim analysis timing. Unlike the original Pocock and O'Brien-Fleming group sequential boundaries (which require equal spacing), the spending function approach maintains Type I error control even when actual information fractions differ from planned values.
Recalculating Boundaries at Actual Information Times
When the actual information fraction t_k* differs from the planned t_k:
1. Calculate α(t_k*) using the pre-specified spending function
2. Compute incremental alpha: Δα_k = α(t_k*) − α(t_{k−1}*)
3. Derive boundary c_k to spend exactly Δα_k given correlation structure
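The three steps above can be sketched numerically. The following is a minimal illustration rather than Zetyra's or gsDesign's implementation: it assumes a one-sided efficacy boundary only, a Lan-DeMets O'Brien-Fleming-type spending function, and plain trapezoidal integration on a grid. All function names and the grid settings (`n`, `floor`) are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

_N = NormalDist()


def ld_obf_spend(t, alpha):
    """Step 1: Lan-DeMets O'Brien-Fleming-type spending (assumed one-sided form)."""
    return 2.0 * _N.cdf(-_N.inv_cdf(1.0 - alpha / 2.0) / sqrt(t))


def _pdf(x, sd):
    return exp(-0.5 * (x / sd) ** 2) / (sd * sqrt(2.0 * pi))


def _trapz(values, step):
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def efficacy_bounds(fractions, alpha, spend=ld_obf_spend, n=301, floor=-6.0):
    """One-sided efficacy Z boundaries at the actual information fractions.

    Works on the score scale S_k = Z_k * sqrt(t_k), whose increments are
    independent N(0, t_k - t_{k-1}) under the null hypothesis.
    """
    bounds, spent, t_prev = [], 0.0, 0.0
    grid, dens, step = [], [], 0.0   # sub-density of S on the continuation region
    for t in fractions:
        delta = spend(t, alpha) - spent          # step 2: incremental alpha
        sd = sqrt(t - t_prev)
        if not grid:
            b = sd * _N.inv_cdf(1.0 - delta)     # first look has a closed form
        else:
            def crossing(b_try):
                # P(no earlier stop, S_k >= b_try), integrated over the prior density
                return _trapz(
                    [f * _N.cdf((s - b_try) / sd) for s, f in zip(grid, dens)], step
                )
            lo, hi = floor, 10.0                 # crossing() decreases in b_try
            for _ in range(60):                  # step 3: bisection on the boundary
                b = 0.5 * (lo + hi)
                if crossing(b) > delta:
                    lo = b
                else:
                    hi = b
        # Propagate the sub-density to the new continuation region (floor, b)
        new_step = (b - floor) / (n - 1)
        new_grid = [floor + i * new_step for i in range(n)]
        if not grid:
            new_dens = [_pdf(s, sd) for s in new_grid]
        else:
            new_dens = [
                _trapz([f * _pdf(s2 - s, sd) for s, f in zip(grid, dens)], step)
                for s2 in new_grid
            ]
        grid, dens, step = new_grid, new_dens, new_step
        bounds.append(b / sqrt(t))               # convert back to the Z scale
        spent, t_prev = spent + delta, t
    return bounds


if __name__ == "__main__":
    planned = efficacy_bounds([0.25, 0.50, 0.75, 1.00], alpha=0.025)
    actual = efficacy_bounds([0.25, 0.48, 0.75, 1.00], alpha=0.025)
    print("planned:", [round(z, 3) for z in planned])
    print("actual :", [round(z, 3) for z in actual])
```

Because only the spending function is fixed in advance, the second call simply re-evaluates it at the achieved fraction (48% instead of 50%); no change to the design itself is needed.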
Acceptable Deviations
- Interim at 48% vs. planned 50% information
- Slight delays due to enrollment variability
- Event-driven analyses arriving early/late
The spending function automatically adjusts boundaries to maintain Type I error control.
Problematic Deviations
- Skipping a planned interim entirely
- Adding unplanned interim analyses
- Changing the spending function after unblinding
These require protocol amendments and may raise regulatory concerns.
Documentation Requirement
When actual timing differs from planned, document in the interim analysis report: (1) the reason for the deviation, (2) the actual information fraction achieved, (3) the recalculated boundary using the pre-specified spending function, and (4) confirmation that the spending function was not modified.
6. DMC Perspective
The Data Monitoring Committee (DMC) reviews unblinded interim data and makes recommendations about trial continuation. While the statistical boundaries provide quantitative guidance, DMC decisions involve broader considerations.
What the DMC Sees at Each Interim Analysis
Efficacy Assessment
- Test statistic (Z-score) vs. efficacy boundary
- Point estimate and confidence interval
- Conditional power under current trend and design alternative
- Predicted probability of success at final analysis
Futility Assessment
- Test statistic vs. futility boundary (if specified)
- Conditional power (if <10-20%, strong futility signal)
- Trend in treatment effect across time
- Comparison to external trials or historical data
Possible DMC Recommendations
Continue as Planned
Data are within expected range; no boundary crossed. This is the most common outcome at early interims.
Stop for Efficacy
Efficacy boundary crossed with strong treatment effect. Consider external validity, subgroup consistency, and safety profile before recommending.
Stop for Futility
Conditional power extremely low (<5-10%). Continuing would expose patients to experimental treatment with minimal scientific benefit.
Stop for Safety
Unacceptable adverse event profile. This decision is independent of efficacy boundaries and takes precedence.
Beyond the Boundaries: Clinical Judgment
Statistical boundaries are guidelines, not mandates. The DMC may consider:
- External evidence: Results from competing trials or emerging safety signals
- Subgroup heterogeneity: Is the effect driven by a specific population?
- Clinical meaningfulness: Is the effect size clinically relevant, even if statistically significant?
- Regulatory context: Would early stopping be accepted by the target regulatory agency?
7. Practical Considerations
Enrollment Overrun
In fast-enrolling trials, patients may be randomized between the data cutoff for an interim analysis and the DMC meeting/decision. This “pipeline” enrollment must be considered.
Handling Overrun
- Option 1: Include in analysis — Analyze all randomized patients. Information fraction may exceed planned interim (e.g., 55% vs. 50%). The spending function handles this automatically.
- Option 2: Exclude pipeline patients — Analyze only patients enrolled before cutoff. Cleaner information fraction but may raise questions about selection.
- Recommendation: Pre-specify the approach in the DMC charter. Option 1 is generally preferred for intent-to-treat integrity.
Boundary Conversion: Z-scores, P-values, and Confidence Intervals
Boundaries can be expressed in multiple equivalent forms:
- Z-score boundary: Z = 2.963 (raw test statistic threshold)
- Nominal p-value: p = 0.0030 (two-sided)
- Repeated CI: 99.70% CI coverage
Important: The “nominal p-value” at an interim is NOT directly comparable to the conventional 0.05 threshold. Always compare to the boundary p-value for that specific interim analysis.
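The conversion between the three forms is a direct normal-tail calculation. A short sketch, assuming a two-sided nominal p-value and a two-sided repeated confidence interval (the function name is illustrative):

```python
from statistics import NormalDist


def boundary_forms(z: float) -> dict:
    """Express a Z-score boundary as a two-sided nominal p-value and RCI level."""
    tail = NormalDist().cdf(-abs(z))      # one-sided tail area beyond |z|
    nominal_p = 2.0 * tail                # two-sided nominal p-value
    return {
        "z": z,
        "nominal_p_two_sided": nominal_p,
        "nominal_p_one_sided": tail,
        "rci_level": 1.0 - nominal_p,     # coverage of the repeated CI
    }


if __name__ == "__main__":
    forms = boundary_forms(2.963)
    print(f"p (two-sided) = {forms['nominal_p_two_sided']:.4f}")
    print(f"repeated CI   = {100 * forms['rci_level']:.2f}%")
```

An interim result is significant only if its observed p-value falls below the nominal p-value for that look, which is far stricter than 0.05 at early analyses.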
Alpha Allocation for Efficacy vs. Futility
When designing asymmetric boundaries with different spending functions for efficacy and futility:
- Efficacy: Typically O'Brien-Fleming to preserve power
- Futility: Often more aggressive (e.g., HSD with γ = −2) to enable earlier stopping when the trial is unlikely to succeed
- Non-binding futility: Does not “spend” alpha; Type I error calculated assuming trial continues regardless of futility boundary
Number of Interim Analyses
| # Looks | Typical Use | Sample Size Inflation (OBF) | Considerations |
|---|---|---|---|
| 2 | Small trials, limited budget for DMC | ~1% | Simple; limited opportunity for early stopping |
| 3 | Standard Phase III | ~2% | Balance of flexibility and simplicity |
| 4-5 | Large, long trials; safety monitoring | ~3% | More operational burden; diminishing returns |
| Continuous | Alpha spending only (no fixed schedule) | ~4% | Maximum flexibility but complex operations |
8. Regulatory Documentation Checklist
Per FDA Guidance on Adaptive Designs (2019), the following must be pre-specified in your SAP:
Number and timing of analyses
Fixed or information-based.
Example SAP text: "Three analyses (two interim and one final) at 33%, 67%, and 100% of target events (200, 400, and 600 events respectively)."
Alpha-spending function
Including all parameters.
Example SAP text: "O'Brien-Fleming spending function for efficacy (α = 0.025 one-sided). Hwang-Shih-DeCani with γ = −2 for futility (non-binding)."
Stopping boundaries
Both efficacy and futility, if applicable.
Example SAP text: "Efficacy boundaries: Z = 3.471, 2.454, 2.004 at looks 1-3. Futility boundary: Stop if conditional power < 10% under current trend at any interim."
Binding vs. non-binding futility
Explicitly state commitment level.
Example SAP text: "Futility boundaries are non-binding. The Type I error rate is calculated assuming the trial continues regardless of futility boundary crossing."
Decision rules
Criteria for each possible outcome.
Example SAP text: "If efficacy boundary crossed: DMC recommends early stopping. If futility boundary crossed: DMC may recommend stopping but is not required."
DMC Charter reference
Who makes interim decisions and with what information.
Example SAP text: "Per DMC Charter v2.0 (dated XX/XX/XXXX), the DMC will review unblinded efficacy and safety data. Sponsor will receive blinded recommendations only."
Handling of timing deviations
Procedure if actual timing differs from planned.
Example SAP text: "If actual information fraction differs from planned by more than 10%, boundaries will be recalculated using the pre-specified spending function evaluated at actual information times."
Final analysis adjustment
Specification for p-values and confidence intervals.
Example SAP text: "Final p-values will be calculated using the stagewise ordering. Confidence intervals will use the repeated confidence interval approach of Jennison & Turnbull."
9. Validation Appendix: GSD Benchmarking
Calculations are benchmarked against PASS 2024 and nQuery 8.9. Differences of ±1 subject arise from rounding conventions to achieve an integer sample size per arm.
Case 1: O'Brien-Fleming (3-Look, Efficacy Only)
Inputs: α = 0.05 (two-sided), Power = 0.80, 1:1 allocation, continuous endpoint.
| Look | Info Fraction | Efficacy Z | Nominal p | Cumulative α | Futility Z |
|---|---|---|---|---|---|
| 1 | 33% | 3.471 | 0.0005 | 0.0005 | — |
| 2 | 67% | 2.454 | 0.0141 | 0.0140 | — |
| 3 | 100% | 2.004 | 0.0451 | 0.0500 | — |
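The efficacy boundaries in this table follow the classical O'Brien-Fleming shape, in which each interim boundary equals the final boundary divided by the square root of the information fraction. A quick consistency check of the Z and nominal p columns, taking the final boundary 2.004 from the table as given (helper names are illustrative):

```python
from math import sqrt
from statistics import NormalDist


def obf_bounds_from_final(c_final, fractions):
    """Classical O'Brien-Fleming shape: Z_k = c_final / sqrt(t_k)."""
    return [c_final / sqrt(t) for t in fractions]


def two_sided_p(z):
    """Two-sided nominal p-value for a Z-score boundary."""
    return 2.0 * NormalDist().cdf(-abs(z))


if __name__ == "__main__":
    bounds = obf_bounds_from_final(2.004, [1 / 3, 2 / 3, 1.0])
    for look, z in enumerate(bounds, start=1):
        print(f"look {look}: Z = {z:.3f}, nominal p = {two_sided_p(z):.4f}")
```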
Cross-Software Comparison (Max N Total):
| Software | Max N (Total) |
|---|---|
| PASS 2024 | 847 |
| nQuery 8.9 | 845 |
| Zetyra | 846 |
Case 2: Pocock (3-Look, Efficacy Only)
Inputs: Same as Case 1. Pocock spending function.
| Look | Info Fraction | Efficacy Z | Nominal p | Cumulative α |
|---|---|---|---|---|
| 1 | 33% | 2.289 | 0.0221 | 0.0167 |
| 2 | 67% | 2.289 | 0.0221 | 0.0333 |
| 3 | 100% | 2.289 | 0.0221 | 0.0500 |
Cross-Software Comparison (Max N Total):
| Software | Max N (Total) |
|---|---|
| PASS 2024 | 869 |
| nQuery 8.9 | 867 |
| Zetyra | 868 |
Case 3: Hwang-Shih-DeCani with Futility (4-Look, Asymmetric)
Inputs: α = 0.05 (two-sided), β = 0.10 (90% power), Efficacy: HSD, Futility: HSD (non-binding).
| Look | Info Fraction | Efficacy Z | Futility Z | Cum. α (Eff.) | Cum. β (Fut.) |
|---|---|---|---|---|---|
| 1 | 25% | 4.333 | -0.251 | 0.0001 | 0.0189 |
| 2 | 50% | 2.963 | 0.563 | 0.0030 | 0.0487 |
| 3 | 75% | 2.359 | 1.156 | 0.0151 | 0.0771 |
| 4 | 100% | 2.014 | 2.014 | 0.0500 | 0.1000 |
Cross-Software Comparison (Max N Total):
| Software | Max N (Total) |
|---|---|
| PASS 2024 | 1,052 |
| nQuery 8.9 | 1,050 |
| Zetyra | 1,051 |
Note: Futility boundaries shown as Z-scores. Negative Z at early looks indicates trial continues even if treatment appears slightly worse than control (provides opportunity for regression to true effect).
11. API Quick Reference
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| k | int | Number of analyses/looks (default: 3) |
| spending_function | string | "OBrienFleming", "Pocock", or "HwangShihDecani" |
| effect_size | float | Standardized effect size (default: 0.3) |
| alpha, power | float | Defaults: 0.025, 0.90 |
| test_type | string | "one_sided" or "two_sided" |
Key Response Fields
- max_sample_size: Maximum sample size needed
- expected_sample_size: Expected sample size under H₁
- boundaries.efficacy: Z-score boundaries for efficacy stopping
- boundaries.futility: Z-score boundaries for futility stopping
- stopping_probabilities: Cumulative stopping probabilities
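As a sketch of how a request body might be assembled from the parameters above: the parameter names, allowed string values, and numeric defaults come from the table, while the helper name, the validation rules, and the defaults chosen for `spending_function` and `test_type` are assumptions made for illustration. No endpoint is shown because this page does not specify one.

```python
import json

SPENDING_FUNCTIONS = ("OBrienFleming", "Pocock", "HwangShihDecani")
TEST_TYPES = ("one_sided", "two_sided")


def build_gsd_payload(
    k=3,
    spending_function="OBrienFleming",  # assumed default, not documented
    effect_size=0.3,
    alpha=0.025,
    power=0.90,
    test_type="one_sided",              # assumed default, not documented
):
    """Validate inputs locally and return a JSON-serializable request body."""
    if spending_function not in SPENDING_FUNCTIONS:
        raise ValueError(f"spending_function must be one of {SPENDING_FUNCTIONS}")
    if test_type not in TEST_TYPES:
        raise ValueError(f"test_type must be one of {TEST_TYPES}")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 < alpha < 1.0 or not 0.0 < power < 1.0:
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    return {
        "k": k,
        "spending_function": spending_function,
        "effect_size": effect_size,
        "alpha": alpha,
        "power": power,
        "test_type": test_type,
    }


if __name__ == "__main__":
    payload = build_gsd_payload(k=4, spending_function="HwangShihDecani")
    print(json.dumps(payload, indent=2))
```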
12. Technical References
- [1] U.S. Food and Drug Administration (2019). Adaptive Design Clinical Trials for Drugs and Biologics: Guidance for Industry.
- [2] ICH E9 (1998). Statistical Principles for Clinical Trials.
- [3] Anderson, K. M. (2023). gsDesign: An R Package for Sequential Clinical Trial Design. CRAN.
- [4] Jennison, C., & Turnbull, B. W. (2000). Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC.
- [5] Lan, K. K. G., & DeMets, D. L. (1983). Discrete Sequential Boundaries for Clinical Trials. Biometrika, 70(3), 659-663.
- [6] O'Brien, P. C., & Fleming, T. R. (1979). A Multiple Testing Procedure for Clinical Trials. Biometrics, 35(3), 549-556.
- [7] Pocock, S. J. (1977). Group Sequential Methods in the Design and Analysis of Clinical Trials. Biometrika, 64(2), 191-199.
- [8] Hwang, I. K., Shih, W. J., & De Cani, J. S. (1990). Group Sequential Designs Using a Family of Type I Error Probability Spending Functions. Statistics in Medicine, 9(12), 1439-1445.
- [9] Proschan, M. A., & Hunsberger, S. A. (1995). Designed Extension of Studies Based on Conditional Power. Biometrics, 51(4), 1315-1324.
- [10] DeMets, D. L., & Lan, K. K. G. (1994). Interim Analysis: The Alpha Spending Function Approach. Statistics in Medicine, 13(13-14), 1341-1352.