Composed Pipeline Type I Error Simulator

Pipeline-level Type I error simulator for two-arm Phase II oncology designs that compose up to four adaptive mechanisms — historical borrowing (MAP prior), Bayesian sequential monitoring, sample-size re-estimation (SSR), and response-adaptive randomization (Thompson Sampling). The engine is a Python port of the canonical R reference implementation underlying Qian (2026, JSM); the calculator page ships one-click presets that reproduce the paper's headline numbers and decompose the super-additive interaction between prior–data conflict and time trends.

Component-level validation does not imply pipeline-level control.

When two or more adaptive mechanisms update on overlapping interim information sets $\mathcal{F}_t$, the operating characteristics of the composed system are not the sum of its parts. Each individual mechanism can pass its validation check while the pipeline inflates Type I error. In the headline scenario (mild prior–data drift, $p_c = 0.27$ vs. prior mean 0.25, plus linear trend $\delta = 0.05$), pipeline-level T1E reaches 0.0771 (+208% above α = 0.025), a regulatory-relevant inflation that is invisible to component-level evaluation.

1. When to Use This Calculator

Use this calculator whenever a Phase II design composes two or more of the following four mechanisms:

Historical borrowing: a MAP (meta-analytic predictive) or robustified-mixture prior on the control arm, fitted from one or more historical cohorts.
Bayesian sequential monitoring: interim posterior probability tests $P(p_e > p_c \mid \text{data}) \ge \gamma^*$ evaluated at one or more pre-specified looks.
Sample-size re-estimation (SSR): blinded re-estimation of $N_{\text{final}}$ at an interim look using pooled response data, capped at $N_{\max}$.
Response-adaptive randomization (RAR): Thompson Sampling allocation that shifts treatment assignment toward the arm with the higher current posterior.

Single-mechanism designs are well-served by the existing calculators (Bayesian Borrowing, Bayesian Sequential, SSR Blinded/Unblinded/Single-Arm, Adaptive Randomization). This calculator is purpose-built for the composed case: when at least two mechanisms share an interim information set, component-level guarantees stop being invariant under composition and pipeline-level Type I error must be simulated explicitly. The FDA's January 2026 draft guidance on Bayesian adaptive designs requires exactly this pipeline-level evaluation under realistic prior–data conflict and time-trend scenarios.

2. Design Class & Mechanisms

2.1 Trial structure

Two-arm Phase II oncology design with a binary endpoint (e.g., ORR). The default schedule mirrors the JSM 2026 reference design: monitoring looks at $n \in \{30, 45, 60, 90\}$, planned final at $n_{\text{base}} = 90$, cap at $N_{\max} = 150$, and SSR firing at $n = 45$ via blinded pooled response. Treatment and control arms accrue concurrently; allocation is either fixed 1:1 alternation (when RAR is off) or Thompson Sampling after a 10-patient burn-in (when RAR is on).

2.2 MAP prior

The control prior is a two-component Beta mixture

\pi_c(p) = w \cdot \mathrm{Beta}(a_{\text{inf}}, b_{\text{inf}}) + (1 - w) \cdot \mathrm{Beta}(a_{\text{vag}}, b_{\text{vag}})

with default parameters $a_{\text{inf}} = 6.5,\ b_{\text{inf}} = 19.5,\ w = 0.70$ (informative component, ESS ≈ 26, prior mean ≈ 0.25) and $a_{\text{vag}} = b_{\text{vag}} = 0.5$ (Jeffreys robustification). Mixture weights are updated at each interim via the log marginal-likelihood ratio. The treatment-arm posterior uses an independent Jeffreys prior — historical borrowing applies only to control.
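The weight update can be illustrated with a minimal sketch, assuming the standard conjugate result for Beta mixtures: each component is updated as an ordinary Beta posterior, and the mixture weight is reweighted by the ratio of the components' Beta-binomial marginal likelihoods. The function name `update_map_mixture` and its signature are hypothetical and are not the engine's API; only the default hyperparameters come from the text above.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def update_map_mixture(x, n, a_inf=6.5, b_inf=19.5, w=0.70,
                       a_vag=0.5, b_vag=0.5):
    """Posterior of a two-component Beta mixture after x responders in n controls.

    Hypothetical helper for illustration only.
    Returns (posterior weight of the informative component,
             (a, b) of the informative posterior,
             (a, b) of the vague posterior).
    """
    # Log marginal likelihood of the data under each component.
    # The binomial coefficient is common to both and cancels in the ratio.
    lm_inf = log_beta(a_inf + x, b_inf + n - x) - log_beta(a_inf, b_inf)
    lm_vag = log_beta(a_vag + x, b_vag + n - x) - log_beta(a_vag, b_vag)

    # Posterior log-odds = prior log-odds + log marginal-likelihood ratio.
    log_odds = log(w / (1.0 - w)) + (lm_inf - lm_vag)
    w_post = 1.0 / (1.0 + exp(-log_odds))

    return w_post, (a_inf + x, b_inf + n - x), (a_vag + x, b_vag + n - x)


# Control data close to the prior mean keeps weight on the informative part;
# conflicting data shifts weight toward the vague (Jeffreys) component.
w_consistent, _, _ = update_map_mixture(x=4, n=15)
w_conflict, _, _ = update_map_mixture(x=12, n=15)
```

In this sketch the informative weight rises above its prior value of 0.70 when the control data agree with the historical mean and falls below it under strong conflict, which is the robustification behaviour the mixture is meant to provide.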

2.3 Bayesian monitoring

At each scheduled look the calculator computes

P(p_e > p_c \mid \text{data})

via Monte Carlo (default 2,000 samples per look). Stop for efficacy if this posterior probability exceeds $\gamma^*$. The default $\gamma^* = 0.970$ is the calibrated value that yields T1E ≈ α = 0.025 under H₀ with no conflict and no trend.
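A minimal sketch of this Monte Carlo step, assuming the treatment posterior is Beta under a Jeffreys prior and the control posterior is a Beta mixture given as a list of `(weight, a, b)` tuples. The function name, argument layout, and example counts are hypothetical; the engine's sampling scheme may differ.

```python
import random


def prob_treatment_better(x_e, n_e, control_mixture, n_draws=2000, rng=None):
    """Monte Carlo estimate of P(p_e > p_c | data).

    x_e, n_e        : responders and patients on the treatment arm
    control_mixture : posterior mixture for control, [(weight, a, b), ...]
    """
    rng = rng or random.Random()
    weights = [w for w, _, _ in control_mixture]
    wins = 0
    for _ in range(n_draws):
        # Treatment arm: Jeffreys Beta(0.5, 0.5) prior, conjugate update.
        p_e = rng.betavariate(0.5 + x_e, 0.5 + n_e - x_e)
        # Control arm: pick a mixture component by weight, then sample from it.
        _, a, b = rng.choices(control_mixture, weights=weights, k=1)[0]
        p_c = rng.betavariate(a, b)
        wins += p_e > p_c
    return wins / n_draws


# Illustrative posterior mixture for control (values are made up).
mixture = [(0.8, 10.5, 30.5), (0.2, 4.5, 11.5)]
post_prob = prob_treatment_better(4, 15, mixture, rng=random.Random(0))
stop_for_efficacy = post_prob > 0.970  # compare against gamma*
```

With 2,000 draws the estimate itself carries Monte Carlo noise of up to roughly ±0.02 (two standard errors at a probability near 0.5), which shrinks as the probability approaches γ*.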

2.4 Blinded SSR

At $n_{\text{ssr,trig}}$, the pooled response count $x_{\text{pool}} = x_e + x_c$ is used in a Wald-test conditional-power formula to compute the smallest $N \le N_{\max}$ achieving the target conditional power (default 80%). Futile interims ($\hat p_{\text{pool}} \le p_0$) leave $N_{\text{final}}$ unchanged.

2.5 Response-Adaptive Randomization

Thompson Sampling: at each enrollment after a 10-patient burn-in, sample one Beta variate from each arm's posterior and allocate to whichever variate is larger. This is the canonical TS reference implementation; it converges toward complete allocation to the better arm (a fundamentally different limit from the Rosenberger DBCD target).
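The allocation rule can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: the function name is hypothetical, posteriors are passed as `(a, b)` Beta parameters, and the burn-in is assumed to use 1:1 alternation (the text above does not say how burn-in patients are assigned).

```python
import random


def allocate_next(i, post_e, post_c, burn_in=10, rng=random):
    """Return 'E' or 'C' for the patient with 0-based enrollment index i.

    post_e, post_c : (a, b) parameters of each arm's current Beta posterior
    """
    if i < burn_in:
        # Assumed burn-in rule: simple alternation between arms.
        return "E" if i % 2 == 0 else "C"
    # Thompson Sampling: one posterior draw per arm, assign to the larger draw.
    draw_e = rng.betavariate(*post_e)
    draw_c = rng.betavariate(*post_c)
    return "E" if draw_e > draw_c else "C"


# When one posterior clearly dominates, nearly every patient goes to that arm.
rng = random.Random(1)
arms = [allocate_next(i, (30.5, 5.5), (5.5, 30.5), rng=rng) for i in range(10, 210)]
share_e = arms.count("E") / len(arms)
```

Because the allocation probability equals the posterior probability that an arm is better, the share assigned to the leading arm approaches 1 as evidence accumulates, which is the limiting behaviour described above.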

2.6 Time trend

Linear drift over fractional enrollment time: $p_{\text{arm}}(t) = p_{\text{arm,base}} + \delta \cdot t$, with $t \in [0, 1]$. Both arms drift in parallel under H₀, so the data-generating null is preserved. RAR coupled with this drift induces allocation imbalance that biases the final analysis.
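A short sketch of the drift model. How the engine maps a patient index to $t$ is not stated here, so the mapping $t = i / (n_{\text{total}} - 1)$ and the function name `response_prob` are assumptions made for illustration.

```python
def response_prob(p_base, delta, i, n_total):
    """Response probability for the patient with 0-based index i.

    Assumes fractional enrollment time t = i / (n_total - 1), so the first
    patient has t = 0 and the last has t = 1.
    """
    t = i / (n_total - 1) if n_total > 1 else 0.0
    p = p_base + delta * t
    return min(max(p, 0.0), 1.0)  # keep the result a valid probability


# Under H0 both arms share the same base rate, so they drift in parallel
# and p_e(t) == p_c(t) for every patient.
p_first = response_prob(0.27, 0.05, 0, 90)   # 0.27 at t = 0
p_last = response_prob(0.27, 0.05, 89, 90)   # about 0.32 at t = 1
```

Patients enrolled late respond at a higher rate on both arms, so any mechanism that changes the allocation ratio over time (such as RAR) leaves the two arms with different mixes of early and late patients.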

3. JSM Presets & What They Show

Each preset on the calculator runs a multi-scenario batch and reproduces a specific table from the JSM paper.

Section 4.3 — Headline (3 scenarios)

Calibration (no conflict, no trend) → T1E ≈ 0.029 — the γ* calibration target.
Mild conflict + trend ($p_c = 0.27$, $\delta = 0.05$) → T1E ≈ 0.0771 (+208% above α). The headline result.
Strong conflict + trend ($p_c = 0.30$, $\delta = 0.05$) → T1E ≈ 0.1274 (+410%). Stress test demonstrating that the inflation scales with conflict severity.
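The percentage figures in this list are relative inflation over the nominal level, $(\text{T1E} - \alpha)/\alpha$ with α = 0.025. A small check of that arithmetic (the helper name is hypothetical):

```python
def relative_inflation(t1e, alpha=0.025):
    """Inflation of the simulated Type I error relative to the nominal alpha."""
    return (t1e - alpha) / alpha


mild = relative_inflation(0.0771)    # about 2.08, reported as +208%
strong = relative_inflation(0.1274)  # about 4.10, reported as +410%
```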

Section 4.5 — Mechanism Isolation 2×2

Crosses prior–data conflict ($p_c \in \{0.25, 0.30\}$) with time trend ($\delta \in \{0, 0.05\}$) with all four mechanisms active. The 2×2 reveals the super-additive interaction:

\Delta_{\text{interaction}} = T1E_{\text{both}} - \big[T1E_{\text{neither}} + (T1E_{\text{conflict}} - T1E_{\text{neither}}) + (T1E_{\text{trend}} - T1E_{\text{neither}})\big]

The paper reports $\Delta_{\text{interaction}} = +0.0278$ (95% SI excluding zero). The calculator computes this term automatically at the bottom of the results table whenever this preset is run.
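The interaction formula reduces to $T1E_{\text{both}} - T1E_{\text{conflict}} - T1E_{\text{trend}} + T1E_{\text{neither}}$. The sketch below computes it from four cell estimates and, assuming the four cells come from independent simulation runs, combines their Monte Carlo standard errors in quadrature. The function names are hypothetical and the input values are made-up placeholders, not the paper's cell values.

```python
from math import sqrt


def interaction(t1e_neither, t1e_conflict, t1e_trend, t1e_both):
    """Super-additive interaction term for the 2x2 mechanism-isolation table."""
    additive_prediction = (
        t1e_neither
        + (t1e_conflict - t1e_neither)
        + (t1e_trend - t1e_neither)
    )
    return t1e_both - additive_prediction


def interaction_se(mcse_cells):
    """Standard error of the interaction, assuming independent cells.

    Each cell enters with coefficient +1 or -1, so the variances add.
    """
    return sqrt(sum(se ** 2 for se in mcse_cells))


# Placeholder inputs for illustration only.
delta = interaction(0.030, 0.060, 0.040, 0.100)
se = interaction_se([0.002, 0.003, 0.003, 0.004])
```

A positive `delta` that is large relative to `se` indicates that the joint inflation exceeds what the two perturbations would produce if their effects simply added.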

Section 4.4 — SSR × RAR Factorial

Decomposes the inflation under conflict + trend into independent SSR and RAR contributions and their interaction. The paper reports $\Delta_{\text{SSR}} \approx +0.013$, $\Delta_{\text{RAR}} \approx +0.012$, and a near-zero (slightly sub-additive) SSR × RAR interaction. The dominant driver is prior–data conflict, not the SSR or RAR mechanisms in isolation.

4. Interpreting Results

4.1 The estimand

The reported rejection rate is the frequentist Type I error rate under the data-generating null $p_e = p_c$ — not a posterior predictive error rate conditional on prior beliefs. The prior–data conflict scenarios are realistic situations where the historical prior is misspecified relative to the current trial population, but H₀ still holds in the current data. The inflation we report is the false-positive rate under the current trial's data-generating null — the regulatory-relevant quantity, not a Bayesian robustness diagnostic.

4.2 Monte Carlo uncertainty

The simulator reports the rejection rate, its binomial Monte Carlo standard error $\text{MCSE} = \sqrt{\hat p (1 - \hat p) / N_{\text{sim}}}$, and a 95% Wald CI. At $N_{\text{sim}} = 5{,}000$ with $\hat p \approx 0.08$, MCSE ≈ 0.0038 and the CI half-width is about ±0.0075. Because MCSE scales as $1/\sqrt{N_{\text{sim}}}$, halving it requires four times as many simulations: run at $N_{\text{sim}} = 20{,}000$ in the Advanced panel.
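The figures quoted above can be reproduced directly from the formula. The helper name `mcse_and_ci` is hypothetical; the 1.96 multiplier is the usual normal quantile for a 95% Wald interval.

```python
from math import sqrt


def mcse_and_ci(p_hat, n_sim, z=1.96):
    """Binomial Monte Carlo standard error and Wald confidence interval."""
    mcse = sqrt(p_hat * (1.0 - p_hat) / n_sim)
    return mcse, p_hat - z * mcse, p_hat + z * mcse


mcse_5k, lo_5k, hi_5k = mcse_and_ci(0.08, 5_000)   # MCSE about 0.0038
half_width = (hi_5k - lo_5k) / 2                   # about 0.0075
mcse_20k, _, _ = mcse_and_ci(0.08, 20_000)         # half of mcse_5k
```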

4.3 Why a single-seed Python run can drift from the paper

The paper's reference numbers were generated using R's default Mersenne Twister PRNG with specific seeds (e.g., 2028 for the mild-conflict headline). This calculator uses NumPy's PCG64 PRNG. Same seed values produce different random streams across the two engines, so a Python single-seed run can drift by 2–3 MCSE from the paper's single-seed value even when the engines are mathematically identical. The reported σ-deviation column is the right diagnostic: across multiple seeds the engine reproduces the paper's population values within MCSE.
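One way to read such a deviation is as the gap between the Python estimate and the paper's value, in units of MCSE. The sketch below assumes that definition and evaluates the MCSE at the reference value; the calculator's exact formula for the σ-deviation column is not given here, and the Python run value is a made-up placeholder.

```python
from math import sqrt


def sigma_deviation(p_hat, p_ref, n_sim):
    """Deviation of a simulated rate from a reference value, in MCSE units.

    Assumed definition: (p_hat - p_ref) / sqrt(p_ref * (1 - p_ref) / n_sim).
    """
    mcse = sqrt(p_ref * (1.0 - p_ref) / n_sim)
    return (p_hat - p_ref) / mcse


# Placeholder single-seed Python result compared with the paper's 0.0771.
z = sigma_deviation(p_hat=0.0850, p_ref=0.0771, n_sim=5_000)
within_expected_drift = abs(z) <= 3.0
```

A single run landing two to three MCSE away is unremarkable on its own; a deviation that persists when results are averaged over many seeds would point to a real difference between engines.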

4.4 Super-additive interaction term

When the Mechanism Isolation 2×2 preset finishes, the calculator displays $\Delta_{\text{interaction}}$ below the results table. A positive value means the joint effect (conflict + trend) exceeds the sum of the independent contributions: the mechanisms amplify each other beyond additivity. The paper's reference value is +0.0278 (about 2.78 percentage points). A near-zero interaction term indicates that the perturbations roughly add, and a negative one indicates sub-additivity; a strongly positive term (as observed) is the key qualitative finding.

5. Assumptions & Limitations

Two-arm, binary endpoint. Continuous and survival endpoints are on the roadmap but not implemented in v1; the JSM paper itself is binary-only.
Blinded SSR with pooled response. The calculator uses aggregate counts at the SSR look; the paper describes this as "single-arm in the sense that SSR uses only aggregate counts, not treatment-arm allocation". Unblinded treatment-arm-specific SSR is not modeled.
Thompson Sampling RAR only. DBCD, Neyman, and other allocation rules are not exposed in this calculator. Use the dedicated RAR calculator for those.
MAP prior is pre-pooled. Users supply $(a_{\text{inf}}, b_{\text{inf}}, w_{\text{inf}}, a_{\text{vag}}, b_{\text{vag}})$ directly via the Advanced panel, where $w_{\text{inf}}$ is the informative-component weight written as $w$ in Section 2.2. To fit a MAP prior from historical cohort data, use the Bayesian Borrowing calculator first and copy the fitted hyperparameters across.
Linear time trend only. Sigmoidal, step, or non-linear trends would require code changes; the JSM paper's empirical observation is that linear drift is sufficient to demonstrate the super-additive interaction.

6. API Reference

POST /api/v1/calculators/composed-pipeline: runs one scenario.

Request body (key fields)

use_borrowing, use_monitoring, use_ssr, use_rar — bool toggles for each mechanism
p_control, p_treatment, time_trend_delta — data-generating scenario
n_base, n_cap, n_ssr_trig, monitoring_looks, gamma — design
a_inf, b_inf, w_inf, a_vag, b_vag — MAP prior mixture
simulate, n_simulations, simulation_seed — Monte Carlo control

Response (key fields)

simulation.estimates.rejection_rate — pipeline-level T1E
simulation.estimates.monte_carlo_se, ci_95_lower, ci_95_upper
analytical_results.mechanisms_active — design summary

The endpoint returns 400 on invalid scenario combinations (e.g., n_ssr_trig >= n_base) and 500 on internal simulation failures. A Pro subscription is required.
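A hedged sketch of building a request for the headline scenario and reading the key response fields. The host name is a placeholder, authentication is omitted because its scheme is not documented here, and the value types (for example, `monitoring_looks` as a list of integers) and the nesting of `monte_carlo_se` under `simulation.estimates` are assumptions. Only the path and field names come from the reference above.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder host, not the real service
PATH = "/api/v1/calculators/composed-pipeline"

# Mild conflict + trend under H0 (p_treatment equals p_control).
payload = {
    "use_borrowing": True, "use_monitoring": True,
    "use_ssr": True, "use_rar": True,
    "p_control": 0.27, "p_treatment": 0.27, "time_trend_delta": 0.05,
    "n_base": 90, "n_cap": 150, "n_ssr_trig": 45,
    "monitoring_looks": [30, 45, 60, 90], "gamma": 0.970,
    "a_inf": 6.5, "b_inf": 19.5, "w_inf": 0.70, "a_vag": 0.5, "b_vag": 0.5,
    "simulate": True, "n_simulations": 5000, "simulation_seed": 2028,
}


def build_request(body, base_url=BASE_URL):
    """Build (but do not send) the POST request for one scenario."""
    return urllib.request.Request(
        base_url + PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_t1e(response):
    """Pull the pipeline-level T1E and its MCSE out of a parsed response."""
    estimates = response["simulation"]["estimates"]
    return estimates["rejection_rate"], estimates["monte_carlo_se"]


request = build_request(payload)
# To send it, add whatever credentials the service requires, then:
#   with urllib.request.urlopen(request) as resp:
#       t1e, mcse = extract_t1e(json.load(resp))
```

`urllib.request.urlopen` raises `urllib.error.HTTPError` for the 400 and 500 responses described above, so a caller would typically wrap the call in a `try`/`except` and inspect the error's `code` attribute.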

7. References

Qian L. Zetyra: A Validated, Regulatory-Aligned Calculator Suite for Adaptive and Bayesian Clinical Trial Design. Manuscript for the Joint Statistical Meetings 2026, August 2026.
Berry SM, Broglio KR, Groshen S, Berry DA. Bayesian hierarchical modeling of patient subpopulations: Efficient designs of Phase II oncology clinical trials. Clinical Trials. 2013;10(5):720-734.
Mehta CR, Pocock SJ. Adaptive increase in sample size when interim results are promising: A practical guide with examples. Statistics in Medicine. 2011;30(28):3267-3284.
Kieser M, Friede T. Simple procedures for blinded sample size adjustment that do not affect the type I error rate. Statistics in Medicine. 2003;22(23):3571-3581.
Hu F, Zhang LX. Asymptotic properties of doubly adaptive biased coin designs for multitreatment clinical trials. Annals of Statistics. 2004;32(1):268-301.
Schmidli H, et al. Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics. 2014;70(4):1023-1032.
U.S. Food and Drug Administration. Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products: Draft Guidance for Industry. January 12, 2026.
U.S. Food and Drug Administration. Adaptive Designs for Clinical Trials of Drugs and Biologics: Guidance for Industry. November 2019.

Last updated: May 2026
