Validation & Transparency

Validated. Transparent. Trustworthy.

Zetyra's calculators collectively pass 896 automated validation tests against industry gold standards. Our GSD module matches gsDesign within 0.034 z-score units. Our Bayesian Toolkit is validated end-to-end, from prior elicitation through sequential monitoring. Calculators are cross-checked against 11 published clinical trials (Salk 1954, HPTN 083, HeartMate II, PACIFIC, MONALEESA-7, DAPA-HF, REBYOTA / PUNCH CD2 & CD3, I-SPY 2, STAMPEDE, REMAP-CAP, NCT03377023), plus the Leyrat 2024 cluster-RCT worked example. Every table on this page links directly to the open-source script that produced the number.

Version 2.3 · Last updated April 2026

1. Overview

896 automated tests passing
Max deviation: 0.034 z-score
Benchmarked against gsDesign, pwr, scipy
11 published clinical trials used in validation
Open source (MIT license)
42 scripts, continuous integration

Unlike proprietary alternatives, our validation is public.

Our complete validation suite—42 scripts, 896 tests—runs continuously on GitHub Actions. Anyone can verify our accuracy, examine our methodology, and reproduce our results. No black boxes—just glass boxes.

View Live Validation Suite

2. Results Summary

Table 1

Validation Results by Calculator

Calculator | Tests | Reference | Max Deviation | Status
Two-Sample Sample Size | 50 | Cohen (1988), Schoenfeld (1981), closed-form normal approx | exact match | Pass
Chi-Square Test | 55 | scipy.stats.chi2, chi2_contingency, fisher_exact | < 1.5e-6 | Pass
Cluster-Randomized Trial | 61 | Donner & Klar (2000), small-cluster t-correction, live MC sims | MC-based | Pass
Longitudinal / Repeated Measures | 50 | Diggle et al. (2002), Frison & Pocock (1992), live MC sims | exact matrix form | Pass
Group Sequential Design | 30 | gsDesign R package | 0.034 z-score | Pass
GSD PACIFIC OS | 17 | Antonia et al. (2018) NEJM | 0.022 z-score | Pass
GSD MONALEESA-7 OS | 20 | Im et al. (2019) NEJM | 0.022 z-score | Pass
GSD Survival/TTE | 15 | Schoenfeld (1983), gsDesign | < 0.001 | Pass
GSD Survival gsDesign | 36 | gsDesign R (boundaries, alpha spending) | 0.034 z-score | Pass
CUPED | 12 | Analytical formulas | < 0.001 | Pass
CUPED Simulation | 43 | MC simulation, Deng et al. (2013) | < 0.02 | Pass
Beta-Binomial Conjugate | 9 | Lee & Liu (2008) PPoS + Gelman et al. (2013) | exact (conjugate formula) | Pass
Normal-Normal Conjugate | 8 | Spiegelhalter et al. (2004) + Gelman et al. (2013) | exact (conjugate formula) | Pass
Bayesian Survival | 21 | Normal-Normal on log(HR) | < 0.01 | Pass
Bayesian Survival Benchmark | 25 | Conjugate oracle, MC PP cross-val | < 0.03 | Pass
Prior Elicitation | 22 | ESS formula, scipy.optimize | < 0.001 | Pass
Bayesian Borrowing | 18 | Power prior, Cochran's Q | < 0.01 | Pass
Bayesian Sample Size | 26 | Binomial CI, MC search (binary + continuous) | CI-based | Pass
Bayesian Two-Arm | 24 | Binomial CI, MC search (binary + continuous) | CI-based | Pass
Bayesian Sequential | 20 | Zhou & Ji (2024) | < 0.0001 | Pass
Bayesian Sequential Table 3 | 27 | Zhou & Ji (2024) Table 3 + R code | < 0.005 | Pass
Bayesian Sequential Survival | 24 | Zhou & Ji (2024) + Schoenfeld | < 0.0001 | Pass
Bayesian Seq. Survival Benchmark | 24 | Zhou & Ji formula + Type I error | < 0.02 | Pass
SSR Blinded | 20 | Conditional power formulas | < 0.001 | Pass
SSR Unblinded | 21 | Zone classification, CP | < 0.001 | Pass
SSR Single-Arm (Phase II ORR) | 13 | Beta-Binomial conjugate, Lee & Liu (2008), Saville et al. (2014) | < 0.001 | Pass
NCT03377023 Replication (Nivo+Ipi+Nintedanib NSCLC) | 13 | Real Bayesian Phase II w/ published interim+final outcomes (Moffitt) | SAP boundary: PPoS(r₁=2)=0.31 > 0.20; PPoS(r₁=1)=0.08 ≤ 0.20 | Pass
SSR gsDesign Benchmark | 14 | gsDesign R, reference formulas | 0 (exact) | Pass
RAR (Adaptive Randomization) | 20 | Rosenberger optimal, DBCD, Thompson | < 0.001 | Pass
Minimization (Pocock-Simon) | 17 | Imbalance reduction benchmark | MC-based | Pass
Basket Trial | 21 | Independent, BHM, EXNEX | < 0.001 | Pass
Umbrella Trial | 21 | Freq/Bayesian × 3 endpoints | MC-based | Pass
Platform Trial (MAMS) | 24 | Boundaries, staggered entry, control | MC-based | Pass
I-SPY 2 Replication | 10 | Barker et al. (2009), pCR rates | < 0.001 | Pass
STAMPEDE Replication | 9 | Sydes et al. (2012), MAMS OS/FFS | MC-based | Pass
REMAP-CAP Replication | 8 | Angus et al. (2020), Bayesian | MC-based | Pass
Salk 1954 Polio Trial Replication | 8 | Francis Report (1955): 200,745 vaccine vs 201,229 placebo; 33 vs 115 paralytic cases | χ² and Fisher exact: < 1e-10 | Pass
DAPA-HF Replication | 11 | McMurray et al. (2019) EJHF: HR=0.80, α=0.025 (one-sided), 90% power, 844 events target | Schoenfeld events: 845 vs published 844 | Pass
Leyrat 2024 Primary-Care CRT | 6 | Leyrat, Eldridge, Taljaard, Hemming (2024): p0=0.50 → p1=0.65, ICC=0.05, m=46, 24 clusters | Total clusters: exact match; N: 1102 vs 1104 | Pass
Offline References | 23 | Pure math (no API) | < 1e-10 | Pass
Total | 896 | 42 scripts | – | All Pass

3. Bayesian Toolkit (NEW)

248 tests across 13 scripts.

Each of the 6 Bayesian calculators has a dedicated test suite. Tests cover 8 categories of validation:

Analytical Correctness

Conjugate posteriors, boundary formulas, and ESS derivations compared against closed-form references

MC Calibration

Type I error and power checked with Clopper-Pearson binomial CIs that scale with simulation count

Schema Contracts

Response keys, types, and value bounds validated for every API call with strict/non-strict lower bounds

Input Guards

Invalid inputs (negative rates, out-of-range priors) return 400/422 with the offending field named

Boundary Conditions

Extreme priors (ESS=1 to 1000), zero/all events, single-look designs, near-zero and near-one rates

Invariants & Properties

Higher power → larger n, larger effect → smaller n, higher discount → higher ESS, monotone boundaries

Seed Reproducibility

Same seed produces identical MC results across repeated calls for sample size and two-arm designs

Symmetry

Null hypothesis gives same type I error regardless of label swap; identical studies yield I²=0
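The MC-calibration category above can be sketched with scipy's exact binomial interval. A minimal illustration (counts are hypothetical, not Zetyra's actual test code): an empirical rejection rate from null simulations passes if the Clopper-Pearson interval covers the nominal alpha.

```python
from scipy.stats import binomtest

# Hypothetical calibration run: 10,000 null simulations, 508 rejections
n_sims, n_reject = 10_000, 508

# Clopper-Pearson (exact) 95% CI for the empirical type I error rate
ci = binomtest(n_reject, n_sims).proportion_ci(confidence_level=0.95,
                                               method="exact")
ok = ci.low <= 0.05 <= ci.high   # nominal alpha must fall inside the interval
print(round(ci.low, 4), round(ci.high, 4), ok)
```

Because the interval width shrinks like 1/√n_sims, the same assertion automatically tightens as the simulation count grows, which is what "CIs that scale with simulation count" refers to.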

4. Real-World Trial Replications

HPTN 083

Phase 3 HIV Prevention Trial

Design: 4-look O'Brien-Fleming
Boundaries tested: 4
Max deviation: 0.012 z-score

Early looks match reference to ≤ 0.001. The 0.012 max at look 4 reflects accumulated MVN integration precision between scipy and R's mvtnorm, the same source that drives the PACIFIC/MONALEESA-7 gaps.
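The boundary machinery can be sanity-checked with scipy alone. A minimal sketch, assuming four equally spaced looks (an illustration only; HPTN 083's actual information fractions may differ) and the textbook K=4 O'Brien-Fleming constant c ≈ 2.024:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Four equally spaced looks (assumed for illustration). Classic O'Brien-Fleming
# boundaries are c / sqrt(t_k); c = 2.024 is the textbook K=4 constant for
# two-sided alpha = 0.05 (~0.025 one-sided upper crossing).
t = np.array([0.25, 0.50, 0.75, 1.00])
c = 2.024
bounds = c / np.sqrt(t)

# Sequential z-statistics are jointly normal with corr(Z_i, Z_j) = sqrt(t_i/t_j)
corr = np.sqrt(np.minimum.outer(t, t) / np.maximum.outer(t, t))
p_no_cross = multivariate_normal(mean=np.zeros(4), cov=corr).cdf(bounds)
print(round(1 - p_no_cross, 4))   # overall upper-crossing probability, ~0.025
```

The final call here is exactly the multivariate normal integration whose precision differences (scipy vs R's mvtnorm) account for the small late-look deviations discussed on this page.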

HeartMate II

LVAD Clinical Trial

Design: 3-look O'Brien-Fleming
Info fractions: [0.27, 0.67, 1.00]
Status: All properties verified

PACIFIC

SURVIVAL

Durvalumab, Stage III NSCLC (OS)

Design: 3-look Lan-DeMets OBF
Published boundary: p < 0.00274 (z = 2.78)
Max deviation: 0.022 z-score

Looks 1–2 match reference exactly (0.000). Look 3 deviation (0.022) is from MVN integration precision.

MONALEESA-7

SURVIVAL

Ribociclib, HR+ Breast Cancer (OS)

Design: 3-look Lan-DeMets OBF
Published boundaries: z = 3.60, 2.32
Max deviation: 0.022 z-score *

* Looks 1–2 match reference exactly (0.000). The 0.022 gap at look 2 is relative to the boundary printed in the paper and reflects a discrepancy in the published values: Zetyra and our independent Lan-DeMets reference agree with each other.

I-SPY 2

BASKET

Adaptive Breast Cancer Trial (pCR endpoint)

Design: Bayesian basket, 10 signatures
Drugs validated: Veliparib, Pembrolizumab, Neratinib
Status: All graduation decisions match

Published pCR rates reproduced via Beta-Binomial conjugate posteriors. Veliparib TNBC (51% vs 26%), Pembrolizumab TNBC (60% vs 22%).

STAMPEDE

PLATFORM

Prostate Cancer MAMS Trial (OS/FFS)

Design: 5-arm MAMS, 4 stages
Key result: Docetaxel OS HR=0.78
Status: Boundaries and power verified

OBF spending boundaries, docetaxel power, celecoxib futility detection (HR=0.98), and total sample size calculations all verified.

REMAP-CAP

PLATFORM

COVID-19 Bayesian Adaptive Platform

Design: Bayesian, 99% posterior threshold
Domains validated: IL-6 RA, Antivirals
Status: Superiority/futility verified

Tocilizumab superiority (mortality 28% vs 36%) and lopinavir futility correctly detected. Multi-domain staggered entry validated.

NCT03377023 (Nivo + Ipi + Nintedanib NSCLC)

BAYESIAN · SINGLE-ARM SSR

Phase II at Moffitt Cancer Center — Bayesian two-stage design with predictive-probability futility monitoring; both arms' published interim and final outcomes replicated end-to-end (Chen et al. 2019; JTO 2021; JCO 2023)

Arm A: ICI-naïve (p₀=0.30, p₁=0.50)

Sim power vs published: 0.778 vs 0.85
Final outcome: 9/22 (40.9% ORR)
Posterior P(p>p₀): 0.880 (under-enrolled)

Arm B: ICI-treated (p₀=0.07, p₁=0.20)

SAP rule at r₁=2/20: PPoS 0.31 > 0.20 → continued ✓
Final outcome: 6/28 (21.4% ORR)
Posterior P(p>p₀): 0.997 → success ✓

Why the 7-percentage-point power gap? It is an engine-design mismatch, not a bug. Zetyra's Single-Arm SSR calculator is a sample-size re-estimation tool: it treats the interim as a decision point for the final N (initial N ≈ 31 from the normal approximation, re-estimated upward toward the cap of 40 only when the interim data warrant it). The NCT03377023 SAP instead uses a fixed N = 40 two-stage Simon-style rule with no re-estimation. Simulated paths that do not trigger an SSR extension end with fewer than 40 observations, giving fewer trials a chance to reach the ≥ 17-success threshold, hence the lower simulated power. Our validation accepts ±15 percentage points for this reason. The headline claim is the decision-rule replication: at r₁ = 2/20, PPoS = 0.31 > 0.20 → continue (the trial did ✓); at 6/28 final, P(p > p₀) = 0.997 → success ✓. Independent verification confirms the SAP's own two-stage rule produces 0.846 power at p = 0.5, matching the published 0.85. The SAP is correct, and Zetyra is correct on the decision-rule side; only the engine-design choice differs.
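The final Arm B posterior call can be reproduced in a few lines of scipy, assuming a flat Beta(1,1) prior (an assumption for illustration; the SAP's actual prior may differ):

```python
from scipy.stats import beta

p0, r, n = 0.07, 6, 28            # Arm B: null ORR, final responders / enrolled
a, b = 1 + r, 1 + n - r           # Beta(1,1) prior -> Beta(7, 23) posterior

post = beta.sf(p0, a, b)          # posterior P(p > p0 | data)
print(round(post, 3))             # 0.997, matching the value reported above
```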

Salk (1954)

CHI-SQUARE · BINARY N

Francis Field Trial of poliomyelitis vaccine — the canonical randomized 2×2 teaching example

Placebo-controlled arm: 200,745 vaccine vs 201,229 placebo
Paralytic cases: 33 vs 115
χ² (Yates): 44.15, p = 3.1×10⁻¹¹

Vaccine efficacy 71.2%. Fisher's exact and Yates-corrected χ² both match scipy to ≤ 1e-10. The a-priori required N (34,837/arm) was far below the 200k actually enrolled per arm; the trial was decisively over-powered.
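The 2×2 arithmetic can be re-run against scipy directly, using the Francis Report counts quoted above:

```python
from scipy.stats import chi2_contingency, fisher_exact

# 2x2 table: rows = vaccine / placebo, cols = paralytic cases / no cases
table = [[33, 200745 - 33],
         [115, 201229 - 115]]

chi2, p, dof, _ = chi2_contingency(table, correction=True)  # Yates-corrected
odds, p_fisher = fisher_exact(table)

print(round(chi2, 2), p)   # chi2 ~44.15, p ~3e-11, as reported above
```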

DAPA-HF (2019)

SURVIVAL · SCHOENFELD

Dapagliflozin in HFrEF — event-driven Schoenfeld sample-size replication

Design: HR=0.80, α=0.025 (1-sided), power=0.90
Target events (published): 844
Zetyra Schoenfeld: 845 (ceiling, off by 1)

The 1-event gap is ceiling vs rounding: the raw Schoenfeld value is 844.09 (exact z from scipy), which Zetyra ceils up to 845 and the paper presents rounded to 844. Both deliver the designed 90% power. HR / power / allocation-ratio sensitivity all monotone in the expected direction. McMurray et al., Eur J Heart Fail 2019. PMID 30895697.
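The Schoenfeld event count can be checked in a few lines (1:1 allocation, as in the trial):

```python
from math import ceil, log
from scipy.stats import norm

hr, alpha, power = 0.80, 0.025, 0.90   # DAPA-HF design parameters
z_a, z_b = norm.ppf(1 - alpha), norm.ppf(power)

# Schoenfeld: d = (z_a + z_b)^2 / (q1 * q2 * log(HR)^2), q1 = q2 = 0.5 for 1:1
d_raw = (z_a + z_b) ** 2 / (0.5 * 0.5 * log(hr) ** 2)
print(round(d_raw, 2), ceil(d_raw))    # 844.09 -> 845 (ceiling)
```

Rounding 844.09 down to 844, as the paper presents it, versus taking the ceiling to 845 is exactly the 1-event gap described above.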

Leyrat et al. (2024) Primary-Care CRT

CLUSTER-RCT

Methodological worked example: behavior-change counseling delivered at GP-practice level, detecting a health-behavior change from 50% to 65%

Design parameters

p₀ → p₁: 0.50 → 0.65
ICC, cluster size: 0.05, m = 46
α, power: 0.05, 0.80

Published vs Zetyra

Design effect: 3.25 (exact)
Total clusters: 24 (exact)
Total N: 1,102 vs 1,104

Leyrat C, Eldridge S, Taljaard M, Hemming K (2024), J Epidemiol Popul Health 72(1):202198. Sensitivity band at ICC ∈ [0.02, 0.10] brackets the point estimate monotonically: 14 < 24 < 42 clusters. The 2-patient gap in total N comes from ceiling order: Zetyra applies 2 × ⌈nind × DE⌉ = 1,102; the paper ceils individual N first (340) then multiplies by DE (340 × 3.25 = 1,105, reported as 1,104). Both deliver the target power; Zetyra's is 2 patients tighter.
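A sketch of the reconstruction. The pooled-null-variance two-proportion formula below is our assumption about the underlying individual-n calculation; under that assumption it reproduces the design effect, the cluster count, and Zetyra's N = 1,102 (inflate first, then ceil):

```python
from math import ceil, sqrt
from scipy.stats import norm

p0, p1, icc, m = 0.50, 0.65, 0.05, 46      # Leyrat 2024 worked example
alpha, power = 0.05, 0.80
z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)

de = 1 + (m - 1) * icc                     # design effect = 3.25
pbar = (p0 + p1) / 2
# per-arm individual n: two-proportion formula, pooled variance under H0
n_ind = (z_a * sqrt(2 * pbar * (1 - pbar))
         + z_b * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / (p1 - p0) ** 2

n_arm = ceil(n_ind * de)                   # inflate, then ceil (Zetyra order)
clusters = 2 * ceil(n_arm / m)
print(round(de, 2), 2 * n_arm, clusters)   # 3.25, 1102, 24
```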

REBYOTA (Fecal Microbiota)

BAYESIAN

FDA BLA 125739 — PUNCH CD2 (Phase 2b) & CD3 (Phase 3) for C. difficile infection

PUNCH CD2 (Phase 2b)

Data: 25/45 responders (55.6%)
Used in: Prior, Borrowing, Sample Size
Scenarios: 11 tests (δ = 0–1)

PUNCH CD3 (Phase 3)

Data: 126/177 treat, 53/85 placebo
Used in: Two-Arm, Borrowing (MAP)
Cross-phase I²: 40–90% detected

5. GSD Benchmark Details

Boundary accuracy vs gsDesign R package.

Two independent benchmarks validate GSD boundary accuracy against the gsDesign R package: one for standard (non-survival) designs, one for survival/TTE designs with Lan-DeMets spending functions.

Table 2a

Standard GSD Boundaries vs gsDesign (non-survival)

Design | Looks | Max Deviation | Status
O'Brien-Fleming | 2 | 0.0000 | Pass
O'Brien-Fleming | 3 | 0.0015 | Pass
O'Brien-Fleming | 4 | 0.0117* | Pass
O'Brien-Fleming | 5 | 0.0332* | Pass
Pocock | 2 | 0.0000 | Pass
Pocock | 3 | 0.0010 | Pass
Pocock | 4 | 0.0033 | Pass

* Deviations at later looks (k=4–5) reflect accumulated multivariate normal integration precision differences between scipy and R's mvtnorm. Looks 2–3 match reference to 3–4 decimals. The 0.0332 max at OBF k=5 is the headline "0.034 z-score" figure cited elsewhere on this page, reported conservatively (rounded up).

Table 2b

Survival GSD Boundaries vs gsDesign (Lan-DeMets spending)

Design | Looks | Max Deviation | Status
OBF (Lan-DeMets) | 3 | 0.0015 | Pass
OBF (Lan-DeMets) | 4 | 0.0117* | Pass
OBF (Lan-DeMets) | 5 | 0.0332* | Pass
Pocock (Lan-DeMets) | 3 | 0.0010 | Pass
Pocock (Lan-DeMets) | 4 | 0.0033 | Pass

* Deviations occur at later looks (k=4–5) due to accumulated multivariate normal integration precision differences between scipy and R's mvtnorm. Early looks match exactly (0.000). The max deviation of 0.033 is at OBF k=5 final look. Pocock boundaries show negligible deviation at all look counts.

6. Methodology

gsDesign

Group Sequential Design

Validated against the gold-standard gsDesign R package. O'Brien-Fleming and Pocock spending functions computed to match FDA submission standards. Survival/TTE via Schoenfeld.

\text{VRF} = 1 - \rho^2

CUPED

Variance reduction validated against analytical formulas. Sample size reduction proportional to baseline-outcome correlation squared.
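The VRF relation can be demonstrated on synthetic data (illustrative seed, correlation, and sample size; not Zetyra's validation script):

```python
import numpy as np

rng = np.random.default_rng(7)
n, rho = 100_000, 0.6
x = rng.normal(size=n)                                   # pre-experiment covariate
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)   # outcome with corr ~ rho

theta = np.cov(x, y)[0, 1] / np.var(x)
y_cuped = y - theta * (x - x.mean())        # CUPED-adjusted outcome

vrf = np.var(y_cuped) / np.var(y)           # remaining-variance factor
print(round(vrf, 2))                        # ~= 1 - rho^2 = 0.64
```

The remaining variance is 1 − ρ², so the sample-size saving grows with the square of the baseline-outcome correlation, as stated above.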

\text{Beta}(\alpha + x,\, \beta + n - x)

Bayesian Toolkit

6 calculators validated end-to-end: conjugate posteriors, Zhou & Ji boundaries, survival log(HR) mapping, Clopper-Pearson MC calibration, power priors, MAP heterogeneity, and ESS-based elicitation.

\text{Var}(\log \text{HR}) = 4/d

Survival/TTE

Event-driven designs validated via Schoenfeld variance mapping. GSD, Bayesian Sequential, and Bayesian Predictive Power all support time-to-event endpoints with HR-scale outputs.
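A minimal sketch of the Normal-Normal update on log(HR), with illustrative numbers (300 events, observed HR 0.80) and a deliberately vague N(0, 10²) prior rather than any Zetyra default:

```python
from math import log, sqrt
from scipy.stats import norm

d, hr_hat = 300, 0.80
se = sqrt(4 / d)                       # Schoenfeld: Var(log HR) = 4/d, 1:1 alloc.

# Normal-Normal conjugate update on the log(HR) scale, prior mean 0
prior_var, like_var = 10.0**2, se**2
post_var = 1 / (1 / prior_var + 1 / like_var)
post_mean = post_var * (log(hr_hat) / like_var)

p_benefit = norm.cdf(0, loc=post_mean, scale=sqrt(post_var))  # P(log HR < 0)
print(round(p_benefit, 3))             # ~0.973 with these illustrative inputs
```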

\text{CP}(z_{\text{interim}}, n)

Sample Size Re-estimation

Blinded and unblinded SSR validated against conditional power formulas. Zone classification, inflation caps, and threshold ordering verified for continuous, binary, and survival endpoints.
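A sketch of conditional power under the current-trend assumption (a standard textbook form; Zetyra's exact implementation may differ in detail):

```python
from math import sqrt
from scipy.stats import norm

def conditional_power(z1, t, alpha=0.025):
    """CP at information fraction t, assuming the interim trend continues."""
    z_crit = norm.ppf(1 - alpha)
    theta = z1 / sqrt(t)                        # drift implied by interim z
    return norm.cdf((z1 * sqrt(t) - z_crit) / sqrt(1 - t) + theta * sqrt(1 - t))

print(round(conditional_power(1.0, 0.5), 2))    # ~0.22: promising-zone territory
```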

\text{PPoS} + \text{Beta}(\alpha + r,\, \beta + n - r)

Single-Arm SSR

Bayesian single-arm Phase II (ORR) validated via Beta-Binomial conjugate posterior + predictive-probability futility monitoring. Decoupled γ_efficacy / γ_final thresholds, full operating-characteristics table with sample-size re-estimation. Replicated end-to-end against NCT03377023 interim and final decisions.
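The PPoS machinery can be sketched generically; every parameter below (interim counts, null rate, threshold γ, prior) is illustrative, not an NCT03377023 SAP value:

```python
from scipy.stats import beta, betabinom

# Hypothetical interim: r responders in n1, n2 still to enroll; final success
# if the posterior P(p > p0) clears gamma
r, n1, n2, p0, gamma = 5, 20, 20, 0.20, 0.95
a, b = 1 + r, 1 + n1 - r                      # flat Beta(1,1) prior

ppos = 0.0
for x in range(n2 + 1):                       # enumerate future responder counts
    post_sf = beta.sf(p0, a + x, b + n2 - x)  # final posterior P(p > p0)
    if post_sf >= gamma:
        ppos += betabinom.pmf(x, n2, a, b)    # predictive prob of that path
print(round(ppos, 3))
```

PPoS is the predictive probability, under the current posterior, that the finished trial will declare success; futility monitoring stops when it drops below a pre-specified floor.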

n = (z_\alpha + z_\beta)^2 \, \sigma^2 / \delta^2

Two-Sample & χ² Tests

Two-sample sample size (continuous / binary / survival) validated against closed-form Cohen's d benchmarks and Schoenfeld log-rank events. Pearson χ² with Yates correction, McNemar, and Fisher's exact validated against scipy via a Node bridge that exercises the exact TypeScript module shipped to users.
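The closed-form benchmark is easy to reproduce; for example, Cohen's d = 0.5 at 80% power and two-sided α = 0.05:

```python
from math import ceil
from scipy.stats import norm

alpha, power, d = 0.05, 0.80, 0.5             # two-sided alpha, Cohen's d
z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
n_per_arm = 2 * (z_a + z_b) ** 2 / d ** 2     # sigma = 1 on the d scale
print(ceil(n_per_arm))   # 63 per arm (normal approx; t-based software gives 64)
```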

\text{DE} = 1 + (m - 1)\,\rho

Cluster-Randomized Trial

Design-effect inflation validated against Donner & Klar. Small-cluster t-correction (the z-based formula is anti-conservative at k < 15/arm) verified via live random-intercept MC simulation. ICC sensitivity band recomputed independently at each endpoint. Replicated against the Leyrat 2024 worked example to the integer.

\text{Var}(\hat\beta) = \sigma^2 \, \mathbf{s}^{\top} \Sigma \mathbf{s} / (\mathbf{s}^{\top} \mathbf{s})^2

Longitudinal / Repeated Measures

Exact matrix slope variance under AR(1) and CS replaces the m→∞ asymptotic previously shipped (off by 2–14× in real-world regimes). ANCOVA, endpoint, and change-from-baseline variance formulas verified against Frison & Pocock (1992) Table 2. Empirical power confirmed via live LMM simulations.
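The exact matrix slope variance can be evaluated directly for an AR(1) working correlation; the visit count and correlation below are illustrative, not values from the validation suite:

```python
import numpy as np

m, rho, sigma2 = 5, 0.6, 1.0                    # visits, AR(1) corr, error var
t = np.arange(m, dtype=float)
s = t - t.mean()                                # centered time scores

Sigma = rho ** np.abs(np.subtract.outer(t, t))  # AR(1) correlation matrix
var_slope = sigma2 * s @ Sigma @ s / (s @ s) ** 2
print(round(var_slope, 4))                      # 0.1132 vs 0.1 under independence
```

Here the exact form inflates the iid slope variance (σ²/sᵀs = 0.1) by about 13%, the kind of gap the m→∞ asymptotic misses.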

\rho^{*} = \sqrt{p} \,/\, \textstyle\sum \sqrt{p}

Adaptive Randomization

RAR (DBCD, Thompson, Neyman) validated against Rosenberger optimal allocation theory. Minimization validated against pure random imbalance benchmarks. Binary, continuous, and survival endpoints.
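The Rosenberger optimal allocation above is a one-liner (arm response rates are illustrative):

```python
import numpy as np

p = np.array([0.30, 0.50])            # per-arm success probabilities
rho = np.sqrt(p) / np.sqrt(p).sum()   # Rosenberger optimal allocation
print(np.round(rho, 3))               # ~[0.436, 0.564]: tilt toward the better arm
```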

P(\theta > \theta_0 \mid \text{data})

Master Protocol

Basket (BHM, EXNEX), umbrella, and platform (MAMS) trials validated against conjugate theory and three published trials: I-SPY 2, STAMPEDE, and REMAP-CAP.

7. References

  1. GSD: Jennison & Turnbull (2000) Group Sequential Methods with Applications to Clinical Trials
  2. CUPED: Deng A, Xu Y, Kohavi R, Walker T (2013) Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data (WSDM)
  3. Bayesian: Gelman et al. (2013) Bayesian Data Analysis
  4. gsDesign: Anderson (2022) gsDesign R package
  5. Bayesian Sequential: Zhou T & Ji Y (2024) On Bayesian Sequential Clinical Trial Designs (New England J Statistics in Data Science 2(1))
  6. Prior Elicitation: Morita, Thall & Müller (2008) Determining the effective sample size of a parametric prior
  7. Survival: Schoenfeld (1983) Sample-size formula for the proportional-hazards regression model
  8. SSR: Cui, Hung & Wang (1999) Modification of sample size in group sequential clinical trials
  9. RAR: Rosenberger et al. (2001) Optimal adaptive designs for binary response trials
  10. Basket: Berry SM, Broglio KR, Groshen S, Berry DA (2013) Bayesian hierarchical modeling of patient subpopulations: Efficient designs of Phase II oncology clinical trials (Clin Trials 10(5):720-734)
  11. Platform: Saville BR & Berry SM (2016) Efficiencies of platform clinical trials: A vision of the future (Clin Trials 13(3):358-366)
  12. I-SPY 2: Barker AD, Sigman CC, Kelloff GJ, Hylton NM, Berry DA, Esserman LJ (2009) I-SPY 2: an adaptive breast cancer trial design in the setting of neoadjuvant chemotherapy (Clin Pharmacol Ther 86(1):97-100)
  13. STAMPEDE: Sydes MR et al. (2012) Flexible trial design in practice — stopping arms for lack-of-benefit and adding research arms mid-trial in STAMPEDE: a multi-arm multi-stage randomized controlled trial (Trials 13:168)
  14. REMAP-CAP: Angus DC et al. (2020) Effect of Hydrocortisone on Mortality and Organ Support in Patients With Severe COVID-19: The REMAP-CAP COVID-19 Corticosteroid Domain Randomized Clinical Trial (JAMA 324(13):1317-1329)
  15. Sample size textbook: Cohen (1988) Statistical Power Analysis for the Behavioral Sciences, 2nd ed.
  16. 2×2 continuity correction: Yates (1934) Contingency tables involving small numbers and the χ² test
  17. Cluster RCT: Donner & Klar (2000) Design and Analysis of Cluster Randomization Trials
  18. Longitudinal methods: Diggle, Heagerty, Liang & Zeger (2002) Analysis of Longitudinal Data, 2nd ed.
  19. Repeated measures: Frison L & Pocock SJ (1992) Repeated measures in clinical trials: analysis using mean summary statistics and its implications for design (Stat Med 11(13):1685-1704)
  20. Single-Arm SSR (PPoS): Lee & Liu (2008) A predictive probability design for phase II cancer clinical trials (Clin Trials)
  21. CRT worked example: Leyrat C, Eldridge S, Taljaard M, Hemming K (2024) Practical considerations for sample size calculation for cluster randomized trials (J Epidemiol Popul Health 72(1):202198, PMID 38477482)
  22. Salk polio trial: Francis T Jr. (1955) Evaluation of the 1954 Field Trial of Poliomyelitis Vaccine: Final Report (U. Michigan)
  23. DAPA-HF: McMurray JJV et al. (2019) A trial to evaluate the effect of dapagliflozin on morbidity and mortality in HFrEF (Eur J Heart Fail, PMID 30895697)
  24. PACIFIC: Antonia SJ et al. (2018) Overall Survival with Durvalumab after Chemoradiotherapy in Stage III NSCLC (NEJM 379:2342-2350)
  25. MONALEESA-7: Im SA et al. (2019) Overall Survival with Ribociclib plus Endocrine Therapy in Breast Cancer (NEJM 381:307-316)
  26. NCT03377023 Bayesian design: Chen DT, Schell MJ, Fulp WJ et al. (2019) Application of Bayesian predictive probability for interim futility analysis in single-arm phase II trial (Transl Cancer Res 8(Suppl 4):S404-S420, PMID 31456910)
  27. Promising-Zone SSR: Mehta CR & Pocock SJ (2011) Adaptive increase in sample size when interim results are promising: a practical guide with examples (Stat Med 30:3267–3284)
  28. Bayesian PP: Spiegelhalter DJ, Abrams KR & Myles JP (2004) Bayesian Approaches to Clinical Trials and Health-Care Evaluation (Wiley)
  29. RAR (DBCD): Hu F & Zhang LX (2004) Asymptotic properties of doubly adaptive biased coin designs for multitreatment clinical trials (Ann Stat 32(1):268–301)
  30. EXNEX: Neuenschwander B, Wandel S, Roychoudhury S, Bailey S (2016) Robust exchangeability designs for early phase clinical trials with multiple strata (Pharm Stat 15(2):123–134)
  31. Master Protocol: FDA (2022) Master Protocols: Efficient Clinical Trial Design Strategies to Expedite Development of Oncology Drugs and Biologics (Guidance for Industry)

The only clinical trial design platform with public, continuously validated accuracy.

896 tests. 42 scripts. Every calculator validated.