Umbrella Trial Design
One disease, multiple biomarker-defined sub-studies, each testing a different treatment against a shared control arm. Supports frequentist and Bayesian analysis with binary, continuous, and survival endpoints, with Monte Carlo operating characteristics.
1. Overview & Motivation
An umbrella trial enrolls patients with a single disease (typically one tumor type) and stratifies them into biomarker-defined sub-studies, each testing a different targeted treatment against a shared control arm. The central question is whether each biomarker-matched therapy improves outcomes relative to the common standard of care within that disease population.
Biomarker Stratification
Patients are screened for a panel of molecular biomarkers at enrollment. Each biomarker-positive subgroup is assigned to a treatment tailored to that alteration. This ensures the right drug reaches the right patient subpopulation, guided by the molecular profile of the disease.
Shared Control Benefit
All sub-studies share a single control arm drawn from the same disease population. This reduces total enrollment compared to running independent trials for each biomarker-treatment pair, while maintaining concurrent controls for each comparison.
Key distinction: An umbrella trial tests many treatments within one disease, stratified by biomarker. A basket trial tests one treatment across many indications sharing a molecular target. A platform trial adds or drops arms adaptively over time. This calculator addresses the umbrella design specifically.
Landmark Examples
- LUNG-MAP (S1400): Squamous cell lung cancer patients screened for multiple genomic alterations, each assigned to a matched targeted therapy vs. standard docetaxel. A pioneering public-private partnership demonstrating the umbrella paradigm at scale.
- NCI-MATCH (EAY131): While primarily a basket trial, certain disease-specific cohorts within MATCH operate as umbrella sub-studies, screening for multiple actionable mutations within a given histology.
- BATTLE (2011): Biomarker-integrated Approaches of Targeted Therapy for Lung Cancer Elimination. One of the first adaptive umbrella trials in NSCLC, randomizing patients to matched therapies based on real-time tumor profiling.
- plasmaMATCH (2020): Breast cancer umbrella trial using ctDNA-based biomarker selection to allocate patients to matched targeted therapies, demonstrating the feasibility of liquid biopsy-guided umbrella designs.
When to Use an Umbrella Trial
- Single disease with molecular heterogeneity: The disease population harbors multiple distinct biomarker subgroups, each with a biologically plausible targeted therapy candidate.
- Shared standard of care: All subgroups have the same control treatment (e.g., standard chemotherapy), making a pooled control arm scientifically defensible.
- Efficient screening infrastructure: A centralized molecular screening platform can assign patients to sub-studies at enrollment, enabling real-time biomarker-driven randomization.
- Regulatory efficiency: A single master protocol supports multiple treatment-biomarker evaluations under one IND, with shared infrastructure for data monitoring, safety reporting, and regulatory interactions.
3. Frequentist Analysis
Per-Sub-Study Hypothesis Testing
Each sub-study tests its own null hypothesis against a one-sided alternative. The test statistic and rejection criterion depend on the endpoint type.
Binary Endpoint
For binary outcomes, a two-sample z-test compares the treatment response rate $\hat{p}_{T,j}$ to the shared control rate $\hat{p}_C$:

$$z_j = \frac{\hat{p}_{T,j} - \hat{p}_C}{\sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_{T,j}} + \frac{1}{n_C}\right)}}$$

where $\bar{p} = (x_{T,j} + x_C)/(n_{T,j} + n_C)$ is the pooled proportion under the null, $x_{T,j}$ and $x_C$ are the numbers of responders in the treatment and control arms, and $n_{T,j}$, $n_C$ are the respective sample sizes.
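A minimal Python sketch of this test (illustrative, not the engine's implementation), returning the statistic and a one-sided p-value via the standard normal survival function:

```python
import math

def binary_z_test(x_t, n_t, x_c, n_c):
    """Pooled two-sample z-test for response rates (one-sided: treatment > control)."""
    p_t, p_c = x_t / n_t, x_c / n_c
    p_bar = (x_t + x_c) / (n_t + n_c)  # pooled proportion under H0
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # 1 - Phi(z)
    return z, p_value
```

For example, 30/75 responders on treatment vs. 15/100 on the shared control gives z near 3.74, well past the 0.025 one-sided boundary.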
Continuous Endpoint
For continuous outcomes, a two-sample t-test (or z-test with known common standard deviation $\sigma$) is used:

$$z_j = \frac{\bar{X}_{T,j} - \bar{X}_C}{\sigma\sqrt{\frac{1}{n_{T,j}} + \frac{1}{n_C}}}$$

where $\bar{X}_{T,j}$ and $\bar{X}_C$ are the sample means for treatment arm $j$ and the shared control, respectively.
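The known-$\sigma$ version is a one-liner; a sketch for illustration:

```python
import math

def continuous_z_test(mean_t, n_t, mean_c, n_c, sigma):
    """Two-sample z-test with known common sigma (one-sided: treatment > control)."""
    se = sigma * math.sqrt(1 / n_t + 1 / n_c)
    z = (mean_t - mean_c) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # 1 - Phi(z)
    return z, p_value
```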
Survival Endpoint
For time-to-event outcomes under proportional hazards, the log-rank z-statistic for sub-study $j$ is approximated via the Schoenfeld formula:

$$z_j \approx -\log(\widehat{HR}_j)\sqrt{d_j\,\pi_T(1-\pi_T)}$$

where $d_j$ is the number of events in sub-study $j$ (treatment + control), $\widehat{HR}_j$ is the estimated hazard ratio, and $\pi_T$ is the treatment allocation fraction (under 1:1 allocation the factor reduces to $\sqrt{d_j}/2$). The required events per sub-study are computed from the Schoenfeld formula:

$$d_j = \frac{(z_{1-\alpha_j} + z_{1-\beta})^2}{\pi_T(1-\pi_T)\,[\log(HR_j)]^2}$$

where $\alpha_j$ is the multiplicity-adjusted significance level for sub-study $j$ and $1-\beta$ is the target power.
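The required-events formula translates directly into code; a sketch using the standard library's normal quantile function:

```python
import math
from statistics import NormalDist

def schoenfeld_events(hr, alpha_j, power=0.8, alloc_treat=0.5):
    """Required events d_j for a one-sided level-alpha_j test of HR < 1 (Schoenfeld, 1983)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha_j)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (alloc_treat * (1 - alloc_treat) * math.log(hr) ** 2)
```

For HR = 0.7 at a Bonferroni-adjusted one-sided level of 0.025/3 with 80% power and 1:1 allocation, this gives roughly 330 events.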
Multiplicity Correction
Because all $J$ sub-studies are tested simultaneously against a shared control, multiplicity adjustment is needed to control the family-wise error rate (FWER). Three options are available:
Bonferroni Correction
The simplest approach divides the overall significance level $\alpha$ equally across all $J$ sub-studies:

$$\alpha_j = \frac{\alpha}{J}$$

This strongly controls the FWER at level $\alpha$ regardless of the correlation structure among test statistics (which arises from the shared control). It is conservative when sub-studies are positively correlated, which they are in umbrella trials due to the common control arm.
Holm Step-Down
The Holm procedure is a uniformly more powerful alternative to Bonferroni. Order the p-values as $p_{(1)} \le p_{(2)} \le \cdots \le p_{(J)}$ and compare the $k$-th smallest to the threshold $\alpha/(J-k+1)$.
Testing proceeds sequentially from the smallest p-value. The first time a p-value exceeds its threshold, that hypothesis and all remaining hypotheses are retained. Holm controls the FWER at level $\alpha$ and is never less powerful than Bonferroni.
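A short illustrative implementation of the step-down rule (assumes one-sided p-values as input):

```python
def holm_reject(p_values, alpha=0.025):
    """Holm step-down: returns a per-hypothesis rejection flag."""
    J = len(p_values)
    order = sorted(range(J), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * J
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (J - rank):  # threshold alpha/(J - k + 1)
            reject[idx] = True
        else:
            break  # retain this hypothesis and all remaining ones
    return reject
```

For p-values (0.001, 0.01, 0.2) at alpha = 0.025, Holm rejects the first two, while Bonferroni (threshold 0.025/3 for all) would reject only the first.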
No Correction
Each sub-study is tested at the unadjusted level $\alpha$. This does not control the FWER but maximizes per-sub-study power. It is appropriate only when each sub-study is viewed as a separate confirmatory question rather than as part of a family of hypotheses.
Dunnett adjustment: For umbrella trials with a shared control, the Dunnett procedure exploits the known correlation structure among test statistics and is more powerful than Bonferroni or Holm. This calculator uses Bonferroni/Holm as conservative alternatives; Dunnett adjustment may be implemented in future versions.
4. Bayesian Analysis
The Bayesian analysis computes the posterior probability that treatment $j$ is superior to control, using conjugate or asymptotic posteriors depending on the endpoint type. A “Go” decision is declared when this posterior probability exceeds the decision threshold $\gamma$ (default 0.975).
Binary Endpoint: Beta-Binomial Conjugate
For binary outcomes, each arm receives an independent Beta prior. The posteriors for treatment arm $j$ and the shared control are:

$$p_{T,j} \mid \text{data} \sim \mathrm{Beta}(a + x_{T,j},\, b + n_{T,j} - x_{T,j}), \qquad p_C \mid \text{data} \sim \mathrm{Beta}(a + x_C,\, b + n_C - x_C)$$

where $a, b$ are the shared Beta prior hyperparameters (default: $a = b = 1$, i.e., a uniform prior). The decision criterion is:

$$\Pr(p_{T,j} > p_C \mid \text{data}) > \gamma$$

This probability is computed by numerical integration of the product of Beta densities, or equivalently through the regularized incomplete beta function.
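The engine evaluates this probability by numerical integration; an easy stand-in for illustration is to sample the two independent Beta posteriors (a Monte Carlo sketch, not the production method):

```python
import random

def prob_superiority(x_t, n_t, x_c, n_c, a=1.0, b=1.0, draws=200_000, seed=1):
    """Monte Carlo estimate of Pr(p_T > p_C | data) under independent Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(a + x_t, b + n_t - x_t) > rng.betavariate(a + x_c, b + n_c - x_c)
        for _ in range(draws)
    )
    return wins / draws
```

With identical data in both arms the probability is near 0.5; with 30/75 vs. 15/100 responders it is essentially 1, triggering a Go at $\gamma = 0.975$.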
Continuous Endpoint: Normal-Normal Conjugate
For continuous outcomes with known common variance $\sigma^2$ (and a vague prior on each mean), the posterior for each arm mean is:

$$\mu_{T,j} \mid \text{data} \sim N\!\left(\bar{X}_{T,j},\, \sigma^2/n_{T,j}\right), \qquad \mu_C \mid \text{data} \sim N\!\left(\bar{X}_C,\, \sigma^2/n_C\right)$$

The posterior for the treatment difference $\Delta_j = \mu_{T,j} - \mu_C$ is also normal:

$$\Delta_j \mid \text{data} \sim N\!\left(\bar{X}_{T,j} - \bar{X}_C,\; \sigma^2\left(\tfrac{1}{n_{T,j}} + \tfrac{1}{n_C}\right)\right)$$

The decision criterion $\Pr(\Delta_j > 0 \mid \text{data}) > \gamma$ reduces to checking whether the z-score of the posterior mean difference exceeds $\Phi^{-1}(\gamma)$.
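Because the posterior difference is normal, the tail probability is one call to the normal CDF; a sketch under the vague-prior assumption stated above:

```python
from statistics import NormalDist

def prob_mean_superiority(mean_t, n_t, mean_c, n_c, sigma):
    """Pr(Delta_j > 0 | data): posterior tail probability, flat priors, known sigma."""
    se = sigma * (1 / n_t + 1 / n_c) ** 0.5
    return NormalDist().cdf((mean_t - mean_c) / se)
```

With a mean difference of 0.3, sigma = 1, n_T = 100, and n_C = 150, the posterior probability is about 0.99, so a Go is declared at gamma = 0.975.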
Survival Endpoint: Asymptotic Posterior
For survival outcomes, the posterior is based on the asymptotic normal approximation to the log hazard ratio:

$$\log(HR_j) \mid \text{data} \sim N\!\left(\log(\widehat{HR}_j),\, 4/d_j\right)$$

where $d_j$ is the observed event count. The decision criterion is $\Pr(HR_j < 1 \mid \text{data}) > \gamma$, which translates to checking whether $\Phi\!\left(-\log(\widehat{HR}_j)\,\sqrt{d_j}/2\right) > \gamma$.
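The asymptotic posterior makes the decision rule a one-line computation; an illustrative sketch:

```python
import math
from statistics import NormalDist

def prob_hr_below_one(hr_hat, d_events):
    """Pr(HR_j < 1 | data) from the asymptotic posterior log(HR) ~ N(log(hr_hat), 4/d)."""
    return NormalDist().cdf(-math.log(hr_hat) * math.sqrt(d_events) / 2)
```

For an observed HR of 0.7 with 200 events, the posterior probability is about 0.994, comfortably above the default threshold; an observed HR of 1.0 gives exactly 0.5.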
Decision threshold: The default $\gamma = 0.975$ provides a Bayesian analog of the one-sided 0.025 significance level. Higher values (e.g., 0.99) yield more conservative decisions; lower values (e.g., 0.95) are more permissive. Unlike the frequentist approach, the Bayesian decision does not inherently control the FWER, though simulation can quantify the operating error rates.
5. Simulation Algorithm
Monte Carlo Operating Characteristics
The umbrella trial simulator estimates operating characteristics by repeating the full design—enrollment, biomarker assignment, shared-control randomization, per-sub-study analysis, and multiplicity correction—across many simulated datasets under both null and alternative scenarios.
Specify the truth
Define the true treatment effect for each sub-study. For binary endpoints, set the control response rate $p_C$ and the treatment response rate $p_{T,j}$ for each sub-study. Sub-studies where the treatment effect equals the null are truly inactive; sub-studies with a non-null effect are truly active.
Assign biomarker subgroups
For each simulated trial, generate $N$ patients and assign each to a biomarker subgroup according to the prevalence vector $(\pi_1, \ldots, \pi_J)$. Patients with none of the target biomarkers are assigned to the shared control arm with a pre-specified probability.
Randomize with shared control
Within each biomarker subgroup, randomize patients to treatment or shared control according to the allocation ratio. The control arm pools patients across all sub-studies to form the common comparator.
Generate outcomes
Simulate endpoint data under the true parameters. For binary: draw $x_{T,j} \sim \mathrm{Binomial}(n_{T,j}, p_{T,j})$ and $x_C \sim \mathrm{Binomial}(n_C, p_C)$. For continuous: draw individual outcomes from $N(\mu, \sigma^2)$. For survival: generate exponential event times with censoring.
Apply per-sub-study test
Run the selected analysis method (frequentist or Bayesian) for each sub-study independently, comparing each treatment arm to the shared control. Compute the test statistic or posterior probability for each sub-study.
Apply multiplicity correction
For frequentist analysis, apply the chosen multiplicity method (Bonferroni, Holm, or none) to the set of p-values. For Bayesian analysis, compare each posterior probability directly against the decision threshold (no explicit multiplicity correction is applied, though the threshold can be calibrated via simulation).
Aggregate metrics
Over all simulations, compute per-sub-study power (for active sub-studies), per-sub-study Type I error (for null sub-studies), FWER, mean number of Go decisions, and total sample size summaries.
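The steps above can be condensed into a simplified sketch (binary endpoint, frequentist z-tests, Bonferroni correction). It fixes arm sizes at their expected values and draws the pooled control once per trial, rather than modeling patient-level biomarker assignment as the engine does:

```python
import math
import random
from statistics import NormalDist

def simulate_umbrella(n_sims=2000, total_n=450, prevalences=(0.4, 0.3, 0.3),
                      control_alloc=0.33, p_control=0.15,
                      p_treat=(0.35, 0.15, 0.15), alpha=0.025, seed=7):
    """Monte Carlo OCs for a binary-endpoint umbrella trial with a pooled control
    arm. Sub-studies with p_treat[j] == p_control are truly null."""
    rng = random.Random(seed)
    J = len(prevalences)
    z_crit = NormalDist().inv_cdf(1 - alpha / J)   # Bonferroni-adjusted critical value
    n_c = round(total_n * control_alloc)           # shared control arm size
    n_t = [round(total_n * (1 - control_alloc) * w) for w in prevalences]
    go = [0] * J
    false_go_trials = 0
    for _ in range(n_sims):
        x_c = sum(rng.random() < p_control for _ in range(n_c))
        any_false_go = False
        for j in range(J):
            x_t = sum(rng.random() < p_treat[j] for _ in range(n_t[j]))
            p_bar = (x_t + x_c) / (n_t[j] + n_c)   # pooled proportion under H0
            se = math.sqrt(p_bar * (1 - p_bar) * (1 / n_t[j] + 1 / n_c)) or 1e-12
            z = (x_t / n_t[j] - x_c / n_c) / se
            if z > z_crit:                         # Go decision for sub-study j
                go[j] += 1
                if p_treat[j] == p_control:
                    any_false_go = True
        false_go_trials += any_false_go
    return {"go_rate": [g / n_sims for g in go], "fwer": false_go_trials / n_sims}
```

For a truly active sub-study, `go_rate` estimates power; for a null sub-study it estimates per-sub-study Type I error, and `fwer` is the fraction of trials with at least one false Go.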
Reproducibility: When a seed is provided, each simulation uses a deterministic RNG chain. The engine stores an input_hash (SHA-256 of all parameters) to verify that repeated runs produce identical results.
6. Operating Characteristics
When simulation is enabled, the calculator computes the following metrics across all Monte Carlo replicates:
| Metric | Description |
|---|---|
| per_substudy_power | Proportion of simulations where a truly active sub-study receives a Go decision |
| per_substudy_type1_error | Proportion of simulations where a truly null sub-study receives a false Go decision |
| fwer | Family-wise error rate: P(at least one false Go among null sub-studies) |
| mean_go_decisions | Expected number of Go decisions per trial across all sub-studies |
| mean_correct_go | Expected number of true-positive Go decisions per trial |
| control_n | Average shared control arm size across simulations |
Interpretation Guidance
- FWER control: With Bonferroni or Holm correction, the frequentist analysis controls the FWER at the nominal level. The Bayesian analysis does not explicitly control the FWER; the simulation-based FWER should be examined to calibrate the decision threshold.
- Power-multiplicity tradeoff: Stricter multiplicity correction (Bonferroni) reduces per-sub-study power. Holm provides a modest improvement. No correction maximizes per-sub-study power at the cost of an inflated FWER. The Bayesian approach sidesteps formal multiplicity but requires threshold calibration.
- Scenario sensitivity: Results depend on which sub-studies are truly active vs. null, and on the biomarker prevalence distribution. Always simulate the global null (all sub-studies inactive) to verify FWER, and the global alternative (all active) to assess overall power.
- Shared control effect: When the shared control arm is large (high control allocation or many sub-studies), the control estimate has low variance, which benefits all sub-studies. However, any systematic bias in the control arm (e.g., unexpected prognostic imbalance) propagates to all comparisons simultaneously.
7. Statistical Assumptions
All Endpoints
- Mutually exclusive biomarkers: Each patient belongs to at most one biomarker subgroup. If biomarkers overlap, the prevalences must be defined on the exclusive subgroups.
- Biomarker prevalence stability: The prevalence of each biomarker is assumed constant throughout the enrollment period. Temporal shifts in referral patterns or screening methods can invalidate this assumption.
- Shared control independence: The shared control arm is assumed to be representative of the disease population across all biomarker subgroups. This requires that control outcomes do not vary by biomarker status, or that the control arm is large enough to be internally representative.
- No cross-sub-study treatment switching: Patients remain on their assigned treatment for the duration of the trial. There is no provision for patients crossing over to another sub-study's treatment upon progression.
- Fixed sample sizes: Total enrollment and the allocation ratio are pre-specified. The calculator does not model adaptive sample size re-estimation.
Binary Endpoint
- Independent Bernoulli responses: Each patient's outcome is an independent Bernoulli draw with probability $p_{T,j}$ (treatment) or $p_C$ (control).
- Large-sample approximation: The z-test assumes the expected counts $np$ and $n(1-p)$ are each at least 5 in every arm for the normal approximation to hold.
Continuous Endpoint
- Known common variance: All arms share the same known standard deviation $\sigma$. In practice, $\sigma$ is estimated from pilot data; misspecification inflates or deflates power estimates.
- Normal distribution: Individual outcomes are assumed normally distributed. For non-normal data, the z-test remains robust at moderate sample sizes by the central limit theorem.
Survival Endpoint
- Proportional hazards: The hazard ratio is constant over time within each sub-study. Violations (e.g., a delayed treatment effect or crossing survival curves) invalidate the log-rank-based analysis.
- Exponential event times: The simulation assumes exponential survival distributions for both treatment and control arms. The analytical Schoenfeld formula holds more generally under proportional hazards, but the simulation uses exponential draws.
- Uniform accrual and random censoring: Patients accrue uniformly over the accrual period. Administrative censoring occurs at the end of follow-up. Additional random censoring (dropout) follows an exponential distribution with a rate derived from the annual dropout rate.
8. Limitations & When Not to Use
When an Umbrella Design May Not Be Appropriate
Rare biomarkers with very low prevalence: If one or more biomarker subgroups have prevalence below 5–10%, the corresponding sub-study may enroll too few patients for adequate power even with a shared control. In such cases, consider combining rare subgroups or using a biomarker-enriched design.
Highly unequal sub-study sizes: When biomarker prevalences are very unequal (e.g., one subgroup is 60% of the population and another is 5%), the shared control may be poorly matched to the smaller sub-studies. The control arm reflects the overall disease population, which may differ prognostically from a rare biomarker subgroup.
Cross-sub-study treatment switching: If patients who progress on one treatment are likely to switch to another sub-study's treatment, the independence assumption between sub-studies breaks down. This is a common operational challenge in oncology umbrella trials and can bias treatment effect estimates.
Biomarker-dependent control outcome: The shared control assumption requires that the control treatment has similar efficacy regardless of biomarker status. If certain biomarkers are prognostic (not just predictive), the pooled control rate may not be a valid comparator for each sub-study.
No information borrowing across sub-studies: Unlike basket trials with BHM/EXNEX, this umbrella calculator does not borrow information across sub-studies because each tests a different treatment. The shared control is the only structural link between sub-studies.
Adaptive modifications not modeled: This calculator does not support adaptive features such as adding or dropping sub-studies mid-trial, response-adaptive randomization across arms, or interim analyses with early stopping. For platform-like adaptations, a different design framework is needed.
9. Regulatory Considerations
FDA Master Protocols Guidance (2022)
- FDA's guidance on master protocols explicitly addresses umbrella trials as a subtype of master protocols. It recommends pre-specifying the biomarker panel, sub-study treatment assignments, shared control strategy, and statistical methods (including multiplicity correction) in the master protocol and SAP.
- The guidance emphasizes that “the use of a shared control arm should be scientifically justified” and that the control arm should be “representative of the patient population in each sub-study.” Sponsors should provide evidence that the control outcome is not meaningfully affected by biomarker status.
- FDA recommends simulation-based operating characteristics demonstrating adequate power for each sub-study and FWER control (or quantification) under the proposed multiplicity strategy. This calculator generates exactly these metrics.
- For sub-studies that reach their own efficacy endpoint, FDA may consider each sub-study as supporting a separate indication-specific or biomarker-specific claim. The master protocol framework facilitates regulatory interactions across all sub-studies.
EMA Considerations
- EMA has expressed support for master protocols but emphasizes caution with shared control arms, particularly when biomarker subgroups have different prognostic profiles. EMA guidance recommends that the protocol include pre-specified sensitivity analyses comparing results with and without the shared control.
- The European regulatory framework generally views each sub-study within an umbrella trial as a separate confirmatory evaluation. Marketing authorization applications may reference the master protocol but are evaluated on per-sub-study evidence.
Shared Control Acceptability
- Both FDA and EMA accept shared control arms when: (1) the control treatment is the undisputed standard of care for the disease, (2) biomarker subgroups share similar baseline prognosis under the control, and (3) enrollment is concurrent across sub-studies to avoid temporal bias.
- Regulators recommend pre-specifying how the shared control will be compared to each treatment arm, including the statistical test, multiplicity correction, and any biomarker-stratified analysis of the control arm to verify prognostic balance.
- If the shared control assumption is questioned during regulatory review, sponsors should be prepared to present sub-study-specific control arm analyses and sensitivity analyses using only biomarker-matched control patients.
10. API Reference
POST /api/v1/calculators/umbrella
Umbrella trial design and analysis with frequentist or Bayesian methods, supporting binary, continuous, and survival endpoints with optional Monte Carlo simulation for operating characteristics.
Core Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_substudies | int | 3 | Number of biomarker-defined sub-studies [2, 8] |
| substudy_names | string[]? | null | Optional labels for each sub-study (length = n_substudies) |
| endpoint_type | string | "binary" | "binary", "continuous", or "survival" |
| analysis_type | string | "frequentist" | "frequentist" or "bayesian" |
| total_n | int | 300 | Total planned enrollment [50, 10000] |
| control_allocation | float | 0.33 | Fraction of total N allocated to shared control (0.1, 0.8) |
| biomarker_prevalences | float[]? | [1/J] × J | Prevalence per sub-study (must sum to ~1.0) |
| multiplicity_method | string | "bonferroni" | "bonferroni", "holm", or "none" |
| alpha | float | 0.025 | One-sided significance level (frequentist) (0, 1) |
| decision_threshold | float | 0.975 | Posterior probability threshold for Go (Bayesian) (0.5, 1.0) |
Binary Endpoint Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| null_rates | float[] | [0.15] × J | Null (control) response rate per sub-study (0, 1) |
| alternative_rates | float[] | [0.35] × J | Alternative (treatment) response rate per sub-study (0, 1) |
| prior_alpha | float | 1.0 | Beta prior alpha (Bayesian only) (>0) |
| prior_beta | float | 1.0 | Beta prior beta (Bayesian only) (>0) |
Continuous Endpoint Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| null_means | float[] | [0.0] × J | Null (control) mean per sub-study |
| alternative_means | float[] | [0.3] × J | Alternative (treatment) mean per sub-study |
| common_sd | float | 1.0 | Common standard deviation across all arms (>0) |
Survival Endpoint Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| median_control | float | 12.0 | Median survival time for control arm (months, >0) |
| hazard_ratios | float[] | [0.7] × J | Treatment-to-control hazard ratio per sub-study (>0) |
| accrual_time | float | 24.0 | Accrual period in months (>0) |
| follow_up_time | float | 12.0 | Additional follow-up after last enrollment (months, ≥0) |
| dropout_rate | float | 0.0 | Annual dropout rate [0, 1) |
Simulation Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| simulate | bool | false | Enable Monte Carlo simulation for operating characteristics |
| simulation_seed | int? | null | Seed for reproducibility; auto-generated if omitted |
| n_simulations | int | 10000 | Number of Monte Carlo simulations [1000, 100000] |
Example Request (Binary, Frequentist)
{
"n_substudies": 3,
"substudy_names": ["EGFR+", "ALK+", "KRAS G12C"],
"endpoint_type": "binary",
"analysis_type": "frequentist",
"total_n": 450,
"control_allocation": 0.33,
"biomarker_prevalences": [0.40, 0.30, 0.30],
"null_rates": [0.15, 0.15, 0.15],
"alternative_rates": [0.35, 0.40, 0.30],
"multiplicity_method": "holm",
"alpha": 0.025,
"simulate": true,
"n_simulations": 10000
}
Example Request (Survival, Bayesian)
{
"n_substudies": 4,
"substudy_names": ["HER2+", "PIK3CA", "FGFR", "CDK4/6"],
"endpoint_type": "survival",
"analysis_type": "bayesian",
"total_n": 600,
"control_allocation": 0.30,
"biomarker_prevalences": [0.25, 0.25, 0.25, 0.25],
"median_control": 12.0,
"hazard_ratios": [0.65, 0.70, 0.75, 0.70],
"accrual_time": 24.0,
"follow_up_time": 12.0,
"dropout_rate": 0.05,
"decision_threshold": 0.975,
"simulate": true,
"n_simulations": 10000,
"simulation_seed": 42
}
Response Fields
| Field | Description |
|---|---|
| analytical_results.endpoint_type | Endpoint type used ("binary", "continuous", or "survival") |
| analytical_results.analysis_type | Analysis type ("frequentist" or "bayesian") |
| analytical_results.n_substudies | Number of sub-studies in the design |
| analytical_results.substudy_names | Labels for each sub-study |
| analytical_results.total_n | Total planned enrollment |
| analytical_results.control_allocation | Fraction allocated to shared control |
| analytical_results.multiplicity_method | Multiplicity correction applied |
| analytical_results.per_substudy | Per-sub-study results: sample sizes, test statistic/posterior, p-value, Go/No-Go |
| analytical_results.pooled_control | Shared control arm summary: N, response rate or mean or event count |
| analytical_results.n_go_decisions | Number of sub-studies receiving a Go decision |
| analytical_results.design_summary | Human-readable summary of the design configuration |
| analytical_results.regulatory_notes | FDA/EMA guidance citations and recommendations |
| simulation_results | Monte Carlo OC when simulate=true (power, FWER, etc.) |
| metadata | Engine version, input hash, computation time |
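For completeness, a sketch of calling the endpoint from Python's standard library. The base URL is hypothetical and depends on your deployment; the payload mirrors the binary frequentist example above:

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # hypothetical host -- adjust to your deployment

payload = {
    "n_substudies": 3,
    "endpoint_type": "binary",
    "analysis_type": "frequentist",
    "total_n": 450,
    "control_allocation": 0.33,
    "biomarker_prevalences": [0.40, 0.30, 0.30],
    "null_rates": [0.15, 0.15, 0.15],
    "alternative_rates": [0.35, 0.40, 0.30],
    "multiplicity_method": "holm",
    "alpha": 0.025,
}

# Client-side sanity checks mirroring the documented parameter constraints.
assert abs(sum(payload["biomarker_prevalences"]) - 1.0) < 1e-9
assert 0.1 < payload["control_allocation"] < 0.8

req = request.Request(
    f"{BASE_URL}/api/v1/calculators/umbrella",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# resp = json.load(request.urlopen(req))  # uncomment against a live server
# print(resp["analytical_results"]["n_go_decisions"])
```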
11. Technical References
- Park JJH, Siden E, Zoratti MJ, Dron L, Harari O, et al. (2019). Systematic review of basket trials, umbrella trials, and platform trials: a landscape analysis of master protocols. Trials, 20, 572.
- Woodcock J, LaVange LM (2017). Master protocols to study multiple therapies, multiple diseases, or both. New England Journal of Medicine, 377(1), 62–70.
- FDA (2022). Master protocols: Efficient clinical trial design strategies to expedite development of oncology drugs and biologics. Guidance for industry.
- Wason JMS, Abraham JE, Baird RD, Gounaris I, Vallier A-L, et al. (2015). A Bayesian adaptive design for biomarker trials with linked treatments. British Journal of Cancer, 113, 699–705.
- Redman MW, Allegra CJ (2015). The master protocol concept. Seminars in Oncology, 42(5), 724–730.
- Herbst RS, Gandara DR, Hirsch FR, Redman MW, LeBlanc M, et al. (2015). Lung Master Protocol (Lung-MAP)—a biomarker-driven protocol for accelerating development of therapies for squamous cell lung cancer: SWOG S1400. Clinical Cancer Research, 21(7), 1514–1524.
- Redig AJ, Janne PA (2015). Basket trials and the evolution of clinical trial design in an era of genomic medicine. Journal of Clinical Oncology, 33(9), 975–977.
- Holm S (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70.
- Schoenfeld DA (1983). Sample-size formula for the proportional-hazards regression model. Biometrics, 39(2), 499–503.
- EMA (2018). Reflection paper on the use of extrapolation in the development of medicines for paediatrics (applies similar principles to cross-indication extrapolation). EMA/189724/2018.