Validation & Transparency
Validated. Transparent. Trustworthy.
Zetyra's calculators collectively pass 499 automated validation tests against industry gold standards. Our group sequential design (GSD) module matches gsDesign within 0.04 z-score units across 5 tested OBF/Pocock configurations (k=3–5). Our Bayesian Toolkit is validated end-to-end, from prior elicitation through sequential monitoring, with schema contracts, Monte Carlo (MC) calibration, and analytical oracle comparisons. Survival/time-to-event (TTE) endpoints, sample size re-estimation (SSR), and Bayesian predictive power for survival are all independently validated.
At a Glance
Open Source
Unlike proprietary alternatives, our validation is public.
Our complete validation suite—25 scripts, 499 tests—runs continuously on GitHub Actions. Anyone can verify our accuracy, examine our methodology, and reproduce our results. No black boxes—just glass boxes.
Results Summary
Table 1
Validation Results by Calculator
| Calculator | Tests | Reference | Max Deviation | Status |
|---|---|---|---|---|
| Group Sequential Design | 30 | gsDesign R package | 0.034 z-score | Pass |
| GSD PACIFIC OS | 17 | Antonia et al. (2018) NEJM | 0.022 z-score | Pass |
| GSD MONALEESA-7 OS | 20 | Im et al. (2019) NEJM | 0.022 z-score | Pass |
| GSD Survival/TTE | 15 | Schoenfeld (1983), gsDesign | < 0.001 | Pass |
| GSD Survival gsDesign | 36 | gsDesign R (boundaries, alpha spending) | 0.034 z-score | Pass |
| CUPED | 12 | Analytical formulas | < 0.001 | Pass |
| CUPED Simulation | 43 | MC simulation, Deng et al. (2013) | < 0.02 | Pass |
| Bayesian Predictive Power | 17 | Conjugate priors | < 0.01 | Pass |
| Bayesian Survival | 21 | Normal-Normal on log(HR) | < 0.01 | Pass |
| Bayesian Survival Benchmark | 25 | Conjugate oracle, MC PP cross-val | < 0.03 | Pass |
| Prior Elicitation | 22 | ESS formula, scipy.optimize | < 0.001 | Pass |
| Bayesian Borrowing | 18 | Power prior, Cochran's Q | < 0.01 | Pass |
| Bayesian Sample Size | 26 | Binomial CI, MC search (binary + continuous) | CI-based | Pass |
| Bayesian Two-Arm | 24 | Binomial CI, MC search (binary + continuous) | CI-based | Pass |
| Bayesian Sequential | 20 | Zhou & Ji (2024) | < 0.0001 | Pass |
| Bayesian Sequential Table 3 | 27 | Zhou & Ji (2024) Table 3 + R code | < 0.005 | Pass |
| Bayesian Sequential Survival | 24 | Zhou & Ji (2024) + Schoenfeld | < 0.0001 | Pass |
| Bayesian Seq. Survival Benchmark | 24 | Zhou & Ji formula + Type I error | < 0.02 | Pass |
| SSR Blinded | 20 | Conditional power formulas | < 0.001 | Pass |
| SSR Unblinded | 21 | Zone classification, CP | < 0.001 | Pass |
| SSR gsDesign Benchmark | 14 | gsDesign R, reference formulas | 0 (exact) | Pass |
| Offline References | 23 | Pure math (no API) | < 1e-10 | Pass |
| Total | 499 | 25 scripts | | All Pass |
Bayesian Toolkit
271 tests across 13 scripts
Each of the 6 Bayesian calculators has a dedicated test suite. Tests cover 8 categories of validation:
Analytical Correctness
Conjugate posteriors, boundary formulas, and ESS derivations compared against closed-form references
MC Calibration
Type I error and power checked with Clopper-Pearson binomial CIs that scale with simulation count
Schema Contracts
Response keys, types, and value bounds validated for every API call with strict/non-strict lower bounds
Input Guards
Invalid inputs (negative rates, out-of-range priors) return 400/422 with the offending field named
Boundary Conditions
Extreme priors (ESS=1 to 1000), zero/all events, single-look designs, near-zero and near-one rates
Invariants & Properties
Higher power → larger n, larger effect → smaller n, higher discount → higher ESS, monotone boundaries
Seed Reproducibility
Same seed produces identical MC results across repeated calls for sample size and two-arm designs
Symmetry
Null hypothesis gives same type I error regardless of label swap; identical studies yield I²=0
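The MC calibration category can be illustrated with a short sketch. This is not Zetyra's test code: the helper name `clopper_pearson`, the nominal 2.5% level, and the rejection counts are assumptions chosen for the example. It uses the standard beta-quantile form of the exact binomial interval, which narrows as the simulation count grows.

```python
from scipy.stats import beta

def clopper_pearson(k, n, conf=0.95):
    """Exact (Clopper-Pearson) two-sided CI for a binomial proportion k/n."""
    tail = (1.0 - conf) / 2
    lower = 0.0 if k == 0 else float(beta.ppf(tail, k, n - k + 1))
    upper = 1.0 if k == n else float(beta.ppf(1 - tail, k + 1, n - k))
    return lower, upper

# Hypothetical calibration check: 260 rejections in 10,000 null simulations.
nominal_alpha = 0.025
lo, hi = clopper_pearson(260, 10_000)
calibrated = lo <= nominal_alpha <= hi  # nominal level lies inside the CI

# Same observed rate with 100x more simulations gives a narrower interval,
# so the same absolute miss would no longer pass.
lo_big, hi_big = clopper_pearson(26_000, 1_000_000)
print(f"n=1e4: [{lo:.4f}, {hi:.4f}]  n=1e6: [{lo_big:.4f}, {hi_big:.4f}]")
```

Tying the tolerance to the interval rather than to a fixed threshold means a test does not become easier to pass just because fewer simulations were run.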
Real-World Validation
HPTN 083
Phase 3 HIV Prevention Trial
HeartMate II
LVAD Clinical Trial
PACIFIC
Survival: Durvalumab, Stage III NSCLC (OS)
Looks 1–2 match reference exactly (0.000). Look 3 deviation (0.022) is from MVN integration precision.
MONALEESA-7
Survival: Ribociclib, HR+ Breast Cancer (OS)
Looks 1–2 match the reference exactly (0.000). The 0.022 gap versus the published boundary at look 2 reflects a discrepancy in the published values: Zetyra and our independent Lan-DeMets reference agree with each other.
REBYOTA (Fecal Microbiota)
Bayesian: FDA BLA 125739, PUNCH CD2 (Phase 2b) and CD3 (Phase 3) for C. difficile infection
PUNCH CD2 (Phase 2b)
PUNCH CD3 (Phase 3)
Boundary Accuracy
Two independent benchmarks validate GSD boundary accuracy against the gsDesign R package: one for standard (non-survival) designs, one for survival/TTE designs with Lan-DeMets spending functions.
Table 2a
Standard GSD Boundaries vs gsDesign (non-survival)
| Design | Looks | Max Deviation | Status |
|---|---|---|---|
| O'Brien-Fleming | 2 | 0.0000 | Pass |
| O'Brien-Fleming | 3 | 0.0015 | Pass |
| O'Brien-Fleming | 4 | 0.0117 | Pass |
| O'Brien-Fleming | 5 | 0.0332 | Pass |
| Pocock | 2 | 0.0000 | Pass |
| Pocock | 3 | 0.0010 | Pass |
| Pocock | 4 | 0.0033 | Pass |
Table 2b
Survival GSD Boundaries vs gsDesign (Lan-DeMets spending)
| Design | Looks | Max Deviation | Status |
|---|---|---|---|
| OBF (Lan-DeMets) | 3 | 0.0015 | Pass |
| OBF (Lan-DeMets) | 4 | 0.0117* | Pass |
| OBF (Lan-DeMets) | 5 | 0.0332* | Pass |
| Pocock (Lan-DeMets) | 3 | 0.0010 | Pass |
| Pocock (Lan-DeMets) | 4 | 0.0033 | Pass |
* Deviations occur at later looks (k=4–5) due to accumulated multivariate normal integration precision differences between scipy and R's mvtnorm. Early looks match exactly (0.000). The max deviation of 0.033 is at OBF k=5 final look. Pocock boundaries show negligible deviation at all look counts.
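For readers unfamiliar with Lan-DeMets spending, the sketch below computes the cumulative alpha spent at each look for the O'Brien-Fleming-type and Pocock-type spending functions. It is illustrative only: the function names, the one-sided alpha of 0.025, and the four equally spaced looks are assumptions, and the step that turns spent alpha into z-score boundaries (the multivariate normal integration mentioned above) is not shown.

```python
import math
from scipy.stats import norm

def obf_spending(t, alpha=0.025):
    """Lan-DeMets O'Brien-Fleming-type cumulative alpha at information fraction t."""
    z = norm.ppf(1 - alpha / 2)
    return float(2 * norm.sf(z / math.sqrt(t)))

def pocock_spending(t, alpha=0.025):
    """Lan-DeMets Pocock-type cumulative alpha at information fraction t."""
    return alpha * math.log(1 + (math.e - 1) * t)

# Four equally spaced looks (assumed for illustration).
looks = [k / 4 for k in range(1, 5)]
obf = [obf_spending(t) for t in looks]
poc = [pocock_spending(t) for t in looks]

for t, a_obf, a_poc in zip(looks, obf, poc):
    print(f"t={t:.2f}  OBF-type={a_obf:.6f}  Pocock-type={a_poc:.6f}")
```

Both functions spend the full alpha at t = 1. The OBF-type function spends very little at early looks, while the Pocock-type function spends it more evenly.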
Methodology
Group Sequential Design: gsDesign
Validated against the gold-standard gsDesign R package. O'Brien-Fleming and Pocock spending functions computed to match FDA submission standards. Survival/TTE via Schoenfeld.
CUPED: VRF = 1 - r²
Variance reduction is validated against analytical formulas. The sample size reduction is proportional to the squared baseline-outcome correlation (r²).
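The identity VRF = 1 - r² can be checked numerically. The sketch below runs on simulated data and is not taken from Zetyra's suite: the sample size, coefficients, and seed are assumptions. It applies the usual CUPED adjustment with theta = cov(X, Y) / var(X) and compares the empirical variance ratio with the formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)                        # pre-experiment covariate
y = 0.6 * x + rng.normal(scale=0.8, size=n)   # outcome; true correlation is 0.6

cov = np.cov(x, y)                            # 2x2 sample covariance (ddof=1)
theta = cov[0, 1] / cov[0, 0]                 # CUPED coefficient
y_cuped = y - theta * (x - x.mean())          # adjusted outcome, same mean as y

r = np.corrcoef(x, y)[0, 1]
vrf_empirical = y_cuped.var(ddof=1) / y.var(ddof=1)
vrf_formula = 1 - r**2

print(f"empirical VRF={vrf_empirical:.6f}  1 - r^2={vrf_formula:.6f}")
```

When theta and r are estimated from the same sample, the two quantities agree to floating-point precision, which is why an analytical check can use a very tight tolerance.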
Bayesian Toolkit: Beta(a+x, b+n-x)
6 calculators validated end-to-end: conjugate posteriors, Zhou & Ji boundaries, survival log(HR) mapping, Clopper-Pearson MC calibration, power priors, MAP heterogeneity, and ESS-based elicitation.
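The conjugate update Beta(a+x, b+n-x) is the kind of closed-form reference an analytical oracle can be compared against. A minimal sketch, with a hypothetical prior and data (the helper name `beta_posterior` and all numbers are assumptions, not Zetyra's API):

```python
from scipy.stats import beta

def beta_posterior(a, b, x, n):
    """Posterior for a binomial rate: Beta(a, b) prior, x responders out of n."""
    return beta(a + x, b + n - x)

# Hypothetical inputs: Beta(2, 8) prior (prior ESS = a + b = 10), 12/40 responders.
a, b = 2, 8
x, n = 12, 40

post = beta_posterior(a, b, x, n)
post_mean = float(post.mean())      # closed form: (a + x) / (a + b + n)
prob_gt = float(post.sf(0.20))      # posterior P(rate > 0.20)

print(f"posterior mean={post_mean:.3f}  P(rate > 0.20)={prob_gt:.3f}")
```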
Survival/TTE: Var(log HR) = 4/d
Event-driven designs validated via Schoenfeld variance mapping. GSD, Bayesian Sequential, and Bayesian Predictive Power all support time-to-event endpoints with HR-scale outputs.
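As a worked example of the Schoenfeld mapping, the sketch below computes the number of events required for a fixed (single-look) design and the implied standard error of log(HR). It assumes 1:1 allocation and a two-sided test. The function names and the chosen HR, alpha, and power are illustrative, not Zetyra's interface.

```python
import math
from scipy.stats import norm

def schoenfeld_events(hr, alpha=0.05, power=0.80):
    """Events needed for a two-sided level-alpha log-rank test, 1:1 allocation."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return float(4 * (z_a + z_b) ** 2 / math.log(hr) ** 2)

def log_hr_se(d):
    """Standard error of log(HR) with d events, from Var(log HR) = 4/d."""
    return math.sqrt(4 / d)

d = math.ceil(schoenfeld_events(0.70))   # hypothetical target HR of 0.70
se = log_hr_se(d)
print(f"required events={d}  SE(log HR)={se:.4f}")
```

Group sequential designs inflate this fixed-design event count to account for interim looks. That inflation is not shown here.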
Sample Size Re-estimation: CP(z, n)
Blinded and unblinded SSR validated against conditional power formulas. Zone classification, inflation caps, and threshold ordering verified for continuous, binary, and survival endpoints.
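One common form of the conditional power formula is the B-value formulation for a fixed final critical value, sketched below. Whether Zetyra uses exactly this parameterization is an assumption. The function name, the use of the information fraction t in place of n, and the one-sided alpha of 0.025 are all illustrative.

```python
import math
from scipy.stats import norm

def conditional_power(z, t, alpha=0.025, theta=None):
    """Probability of final one-sided significance given interim z at fraction t.

    theta is the assumed drift (expected final z-score); by default the
    current-trend estimate z / sqrt(t) is used.
    """
    if not 0 < t < 1:
        raise ValueError("information fraction t must be in (0, 1)")
    b = z * math.sqrt(t)                 # B-value at the interim
    if theta is None:
        theta = z / math.sqrt(t)         # current-trend drift
    z_crit = norm.ppf(1 - alpha)         # fixed final critical value
    return float(norm.cdf((b + theta * (1 - t) - z_crit) / math.sqrt(1 - t)))

# Hypothetical interim results at half the planned information.
for z in (0.0, 1.0, 2.0):
    print(f"z={z:.1f}  CP={conditional_power(z, 0.5):.4f}")
```

Conditional power rises monotonically with the interim z-score, which is the property that zone classifications in unblinded SSR rely on.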
References
1. GSD: Jennison & Turnbull (2000). Group Sequential Methods with Applications to Clinical Trials.
2. CUPED: Deng et al. (2013). Improving the Sensitivity of Online Controlled Experiments (WSDM).
3. Bayesian: Gelman et al. (2013). Bayesian Data Analysis.
4. gsDesign: Anderson (2022). gsDesign R package.
5. Bayesian Sequential: Zhou & Ji (2024). Bayesian sequential monitoring.
6. Prior Elicitation: Morita, Thall & Müller (2008). Determining the effective sample size of a parametric prior.
7. Survival: Schoenfeld (1983). Sample-size formula for the proportional-hazards regression model.
8. SSR: Cui, Hung & Wang (1999). Modification of sample size in group sequential clinical trials.
The only clinical trial design platform with public, continuously validated accuracy.
499 tests. 25 scripts. Every calculator validated.