Validation & Transparency

Validated. Transparent. Trustworthy.

Zetyra's calculators collectively pass 499 automated validation tests against industry gold standards. Our group sequential design (GSD) module matches gsDesign within 0.04 z-score units across 5 tested O'Brien-Fleming (OBF) and Pocock configurations (k=3–5). Our Bayesian Toolkit is validated end-to-end, from prior elicitation through sequential monitoring, with schema contracts, Monte Carlo (MC) calibration, and analytical oracle comparisons. Survival/time-to-event (TTE) endpoints, sample size re-estimation (SSR), and Bayesian predictive power for survival are all independently validated.

At a Glance

499 automated tests passing
Max deviation: 0.034 z-score
Benchmarked against gsDesign, pwr, scipy
5 real-world clinical trials replicated
Open source (MIT license)
25 scripts, continuous integration

Open Source

Unlike proprietary alternatives, our validation is public.

Our complete validation suite (25 scripts, 499 tests) runs continuously on GitHub Actions. Anyone can verify our accuracy, examine our methodology, and reproduce our results. No black boxes, just glass boxes.


Results Summary

Table 1

Validation Results by Calculator

Calculator | Tests | Reference | Max Deviation | Status
Group Sequential Design | 30 | gsDesign R package | 0.034 z-score | Pass
GSD PACIFIC OS | 17 | Antonia et al. (2018) NEJM | 0.022 z-score | Pass
GSD MONALEESA-7 OS | 20 | Im et al. (2019) NEJM | 0.022 z-score | Pass
GSD Survival/TTE | 15 | Schoenfeld (1983), gsDesign | < 0.001 | Pass
GSD Survival gsDesign | 36 | gsDesign R (boundaries, alpha spending) | 0.034 z-score | Pass
CUPED | 12 | Analytical formulas | < 0.001 | Pass
CUPED Simulation | 43 | MC simulation, Deng et al. (2013) | < 0.02 | Pass
Bayesian Predictive Power | 17 | Conjugate priors | < 0.01 | Pass
Bayesian Survival | 21 | Normal-Normal on log(HR) | < 0.01 | Pass
Bayesian Survival Benchmark | 25 | Conjugate oracle, MC PP cross-val | < 0.03 | Pass
Prior Elicitation | 22 | ESS formula, scipy.optimize | < 0.001 | Pass
Bayesian Borrowing | 18 | Power prior, Cochran's Q | < 0.01 | Pass
Bayesian Sample Size | 26 | Binomial CI, MC search (binary + continuous) | CI-based | Pass
Bayesian Two-Arm | 24 | Binomial CI, MC search (binary + continuous) | CI-based | Pass
Bayesian Sequential | 20 | Zhou & Ji (2024) | < 0.0001 | Pass
Bayesian Sequential Table 3 | 27 | Zhou & Ji (2024) Table 3 + R code | < 0.005 | Pass
Bayesian Sequential Survival | 24 | Zhou & Ji (2024) + Schoenfeld | < 0.0001 | Pass
Bayesian Seq. Survival Benchmark | 24 | Zhou & Ji formula + Type I error | < 0.02 | Pass
SSR Blinded | 20 | Conditional power formulas | < 0.001 | Pass
SSR Unblinded | 21 | Zone classification, CP | < 0.001 | Pass
SSR gsDesign Benchmark | 14 | gsDesign R, reference formulas | 0 (exact) | Pass
Offline References | 23 | Pure math (no API) | < 1e-10 | Pass
Total | 499 | 25 scripts | | All Pass

Bayesian Toolkit


271 tests across 13 scripts

Each of the 6 Bayesian calculators has a dedicated test suite. Tests cover 8 categories of validation:

Analytical Correctness

Conjugate posteriors, boundary formulas, and ESS derivations compared against closed-form references

MC Calibration

Type I error and power checked with Clopper-Pearson binomial CIs that scale with simulation count

Schema Contracts

Response keys, types, and value bounds are validated for every API call, with lower bounds checked as strict (>) or non-strict (≥) as appropriate

Input Guards

Invalid inputs (negative rates, out-of-range priors) return 400/422 with the offending field named

Boundary Conditions

Extreme priors (ESS=1 to 1000), zero/all events, single-look designs, near-zero and near-one rates

Invariants & Properties

Higher power → larger n, larger effect → smaller n, higher discount → higher ESS, monotone boundaries

Seed Reproducibility

Same seed produces identical MC results across repeated calls for sample size and two-arm designs

Symmetry

Null hypothesis gives same type I error regardless of label swap; identical studies yield I²=0
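The MC Calibration category above relies on exact binomial intervals, so a short sketch may help show why the tolerance tightens as the simulation count grows. The code below computes a Clopper-Pearson interval with the Python standard library only and checks whether a target error rate falls inside it. The function names (`binom_cdf`, `clopper_pearson`, `is_calibrated`) and the example counts are illustrative assumptions, not Zetyra's actual test code.

```python
import math


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    log_n_fact = math.lgamma(n + 1)
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(x + 1):
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect_root(f, iters=60):
    """Root of an increasing function f on (0, 1), found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x, n, conf=0.95):
    """Exact two-sided CI for a binomial proportion (x successes in n trials)."""
    tail = (1.0 - conf) / 2.0
    # Lower limit solves P(X >= x | p) = tail; it is 0 when x == 0.
    lower = 0.0 if x == 0 else _bisect_root(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - tail)
    # Upper limit solves P(X <= x | p) = tail; it is 1 when x == n.
    upper = 1.0 if x == n else _bisect_root(
        lambda p: tail - binom_cdf(x, n, p))
    return lower, upper


def is_calibrated(rejections, n_sims, target, conf=0.95):
    """True when the target rate lies inside the exact CI of the observed rate."""
    lower, upper = clopper_pearson(rejections, n_sims, conf)
    return lower <= target <= upper


# Hypothetical simulation outcomes, all with an observed rejection rate of 2.7%.
for rejections, n_sims in [(27, 1_000), (270, 10_000), (2_700, 100_000)]:
    lower, upper = clopper_pearson(rejections, n_sims)
    print(n_sims, round(upper - lower, 5), is_calibrated(rejections, n_sims, 0.025))
```

With the same observed rate, the interval narrows as the number of simulations increases, so a fixed offset from the nominal level that passes at a small simulation count can fail at a large one.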

Real-World Validation

HPTN 083

Phase 3 HIV Prevention Trial

Design: 4-look O'Brien-Fleming
Boundaries tested: 4
Max deviation: 0.012 z-score

HeartMate II

LVAD Clinical Trial

Design: 3-look O'Brien-Fleming
Info fractions: [0.27, 0.67, 1.00]
Status: All properties verified

PACIFIC

SURVIVAL

Durvalumab, Stage III NSCLC (OS)

Design: 3-look Lan-DeMets OBF
Published boundary: p < 0.00274 (z = 2.78)
Max deviation: 0.022 z-score

Looks 1–2 match reference exactly (0.000). Look 3 deviation (0.022) is from MVN integration precision.

MONALEESA-7

SURVIVAL

Ribociclib, HR+ Breast Cancer (OS)

Design: 3-look Lan-DeMets OBF
Published boundaries: z = 3.60, 2.32
Max deviation: 0.022 z-score *

* Looks 1–2 match the reference exactly (0.000). The 0.022 gap is measured against the boundary published in the paper at look 2 and reflects a discrepancy in the published values: Zetyra and our independent Lan-DeMets reference agree with each other.

REBYOTA (Fecal Microbiota)

BAYESIAN

FDA BLA 125739 — PUNCH CD2 (Phase 2b) & CD3 (Phase 3) for C. difficile infection

PUNCH CD2 (Phase 2b)

Data: 25/45 responders (55.6%)
Used in: Prior, Borrowing, Sample Size
Scenarios: 11 tests (δ = 0–1)

PUNCH CD3 (Phase 3)

Data: 126/177 treat, 53/85 placebo
Used in: Two-Arm, Borrowing (MAP)
Cross-phase I²: 40–90% detected
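To make the cross-phase heterogeneity figure concrete, here is a minimal sketch of Cochran's Q and I² applied to the PUNCH CD2 responder count and the PUNCH CD3 treatment arm quoted above. It assumes inverse-variance weights on the raw proportion scale, which may not be the scale Zetyra uses; the helper names are hypothetical. On these assumptions I² comes out at roughly 73%, and two identical studies give I² = 0.

```python
def proportion_estimate(responders, n):
    """Response rate and its binomial variance on the raw proportion scale."""
    p = responders / n
    return p, p * (1.0 - p) / n


def cochran_q_i2(estimates, variances):
    """Cochran's Q and I^2 = max(0, (Q - df) / Q) with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    q = sum(w * (est - pooled) ** 2 for w, est in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0.0 else 0.0
    return q, i2


cd2 = proportion_estimate(25, 45)      # PUNCH CD2 responders
cd3 = proportion_estimate(126, 177)    # PUNCH CD3 treatment arm

# Cross-phase heterogeneity between the two studies.
q, i2 = cochran_q_i2([cd2[0], cd3[0]], [cd2[1], cd3[1]])

# Symmetry check: a study compared with an identical copy shows no heterogeneity.
q_same, i2_same = cochran_q_i2([cd2[0], cd2[0]], [cd2[1], cd2[1]])

print(f"Cross-phase: Q = {q:.2f}, I^2 = {100 * i2:.0f}%")
print(f"Identical studies: I^2 = {100 * i2_same:.0f}%")
```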

Boundary Accuracy

Two independent benchmarks validate GSD boundary accuracy against the gsDesign R package: one for standard (non-survival) designs, one for survival/TTE designs with Lan-DeMets spending functions.

Table 2a

Standard GSD Boundaries vs gsDesign (non-survival)

Design | Looks | Max Deviation | Status
O'Brien-Fleming | 2 | 0.0000 | Pass
O'Brien-Fleming | 3 | 0.0015 | Pass
O'Brien-Fleming | 4 | 0.0117 | Pass
O'Brien-Fleming | 5 | 0.0332 | Pass
Pocock | 2 | 0.0000 | Pass
Pocock | 3 | 0.0010 | Pass
Pocock | 4 | 0.0033 | Pass

Table 2b

Survival GSD Boundaries vs gsDesign (Lan-DeMets spending)

Design | Looks | Max Deviation | Status
OBF (Lan-DeMets) | 3 | 0.0015 | Pass
OBF (Lan-DeMets) | 4 | 0.0117* | Pass
OBF (Lan-DeMets) | 5 | 0.0332* | Pass
Pocock (Lan-DeMets) | 3 | 0.0010 | Pass
Pocock (Lan-DeMets) | 4 | 0.0033 | Pass

* Deviations occur at later looks (k=4–5) due to accumulated multivariate normal integration precision differences between scipy and R's mvtnorm. Early looks match exactly (0.000). The max deviation of 0.033 is at OBF k=5 final look. Pocock boundaries show negligible deviation at all look counts.

Methodology

Group Sequential Design

Reference: gsDesign

Validated against the gsDesign R package, the gold-standard reference implementation. O'Brien-Fleming and Pocock spending functions are computed to match FDA submission standards. Survival/TTE designs use the Schoenfeld (1983) approximation.
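As an illustration of the spending functions named above, the sketch below implements the standard Lan-DeMets O'Brien-Fleming-type and Pocock-type alpha-spending forms for a one-sided test and derives the first-look boundary, which has a closed form. Boundaries at later looks require multivariate normal integration and are not attempted here. The function names and the one-sided alpha of 0.025 are assumptions for the example, and the information fractions are the HeartMate II values listed earlier.

```python
from math import e, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_spend(t, alpha=0.025):
    """Lan-DeMets OBF-type spending: alpha(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t))."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(z / sqrt(t)))


def pocock_spend(t, alpha=0.025):
    """Lan-DeMets Pocock-type spending: alpha(t) = alpha * ln(1 + (e - 1) * t)."""
    return alpha * log(1.0 + (e - 1.0) * t)


def first_look_boundary(t1, spend, alpha=0.025):
    """Upper z boundary at the first look, where P(Z_1 > z_1) = alpha(t_1)."""
    return _STD_NORMAL.inv_cdf(1.0 - spend(t1, alpha))


# Cumulative alpha spent at the HeartMate II information fractions.
for t in [0.27, 0.67, 1.00]:
    print(f"t = {t:.2f}  OBF: {obf_spend(t):.6f}  Pocock: {pocock_spend(t):.6f}")

print("First-look z (OBF):   ", round(first_look_boundary(0.27, obf_spend), 3))
print("First-look z (Pocock):", round(first_look_boundary(0.27, pocock_spend), 3))
```

Both functions spend the full alpha at t = 1, but the OBF-type form spends very little early, which is why its first-look boundary is much more stringent than Pocock's.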

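CUPED

Key formula: VRF = 1 - r²

Variance reduction validated against analytical formulas. The variance reduction factor (VRF) is 1 - r², where r is the correlation between the baseline covariate and the outcome, so the required sample size falls in proportion to r².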
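The relationship VRF = 1 - r² can be checked numerically. The sketch below applies the usual CUPED adjustment, y - θ(x - mean(x)) with θ = cov(x, y) / var(x), to simulated data and compares the empirical variance ratio with 1 - r². The simulated data, the seed, and the helper names are assumptions made for the example.

```python
import math
import random
import statistics


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes y - theta * (x - mean(x)) and the sample r^2."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    theta = cov_xy / var_x
    adjusted = [b - theta * (a - mean_x) for a, b in zip(x, y)]
    r_squared = cov_xy ** 2 / (var_x * var_y)
    return adjusted, r_squared


def adjusted_sample_size(n, r):
    """Sample size after CUPED, assuming it scales with the VRF 1 - r^2."""
    return math.ceil(n * (1.0 - r ** 2))


# Simulated baseline covariate x and correlated outcome y (illustrative only).
rng = random.Random(2013)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [0.6 * xi + rng.gauss(0.0, 0.8) for xi in x]

adjusted, r2 = cuped_adjust(y, x)
vrf_empirical = statistics.variance(adjusted) / statistics.variance(y)

print(f"r^2 = {r2:.4f}, 1 - r^2 = {1 - r2:.4f}, empirical VRF = {vrf_empirical:.4f}")
print("n = 1000 at r = 0.5 becomes", adjusted_sample_size(1000, 0.5))
```

When θ is estimated from the same sample, the empirical variance ratio equals 1 - r² up to floating-point error, which is what makes this a convenient analytical check.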

Bayesian Toolkit

Key formula: Beta(a+x, b+n-x)

6 calculators validated end-to-end: conjugate posteriors, Zhou & Ji boundaries, survival log(HR) mapping, Clopper-Pearson MC calibration, power priors, MAP heterogeneity, and ESS-based elicitation.
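A small sketch of the conjugate update and power-prior discounting may make these checks easier to picture. It uses a Beta(1, 1) base prior (an assumption), treats the PUNCH CD2 count of 25/45 as historical data, and steps the discount δ from 0 to 1 in increments of 0.1; that this grid corresponds to the 11 scenarios mentioned earlier is also an assumption. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaDist:
    a: float
    b: float

    @property
    def mean(self):
        return self.a / (self.a + self.b)

    @property
    def ess(self):
        # For a Beta(a, b) prior the effective sample size is a + b.
        return self.a + self.b


def update(prior, x, n):
    """Conjugate update: Beta(a, b) with x successes in n trials -> Beta(a+x, b+n-x)."""
    return BetaDist(prior.a + x, prior.b + n - x)


def power_prior(base, x0, n0, delta):
    """Discount historical data (x0 of n0) by delta in [0, 1] before borrowing."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return BetaDist(base.a + delta * x0, base.b + delta * (n0 - x0))


base = BetaDist(1.0, 1.0)            # assumed non-informative base prior
deltas = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
priors = [power_prior(base, 25, 45, d) for d in deltas]

for d, prior in zip(deltas, priors):
    print(f"delta = {d:.1f}  prior ESS = {prior.ess:.1f}  prior mean = {prior.mean:.3f}")

# Posterior after observing the PUNCH CD3 treatment arm with half-weight borrowing.
posterior = update(power_prior(base, 25, 45, 0.5), 126, 177)
print(f"posterior mean = {posterior.mean:.3f}")
```

The prior ESS grows linearly with δ, from 2 with no borrowing to 47 with full borrowing, which is the monotonicity property listed among the invariants.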

Survival/TTE

Key formula: Var(log HR) = 4/d

Event-driven designs validated via Schoenfeld variance mapping. GSD, Bayesian Sequential, and Bayesian Predictive Power all support time-to-event endpoints with HR-scale outputs.
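The Schoenfeld mapping can be sketched in a few lines. Under 1:1 allocation, Var(log HR) ≈ 4/d for d events, which gives both the required number of events for a fixed design and the data variance for a Normal-Normal update on log(HR). The function names, the vague prior, and the example hazard ratio are assumptions for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def var_log_hr(events):
    """Schoenfeld approximation under 1:1 allocation: Var(log HR) = 4 / d."""
    return 4.0 / events


def required_events(hr, alpha=0.025, power=0.9):
    """Events needed for a one-sided level-alpha test: d = 4 (z_a + z_b)^2 / (log HR)^2."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(4.0 * (z_alpha + z_beta) ** 2 / math.log(hr) ** 2)


def posterior_log_hr(prior_mean, prior_var, hr_hat, events):
    """Normal-Normal update on log(HR) using the Schoenfeld data variance."""
    data_var = var_log_hr(events)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / data_var)
    post_mean = post_var * (prior_mean / prior_var + math.log(hr_hat) / data_var)
    return post_mean, post_var


def prob_hr_below_one(post_mean, post_var):
    """Posterior probability that HR < 1, i.e. that log(HR) < 0."""
    return NormalDist(post_mean, math.sqrt(post_var)).cdf(0.0)


d = required_events(0.7)                          # HR = 0.7, 90% power
mean, var = posterior_log_hr(0.0, 100.0, 0.7, d)  # vague N(0, 100) prior (assumed)
print("events:", d)
print("posterior HR:", round(math.exp(mean), 3))
print("P(HR < 1):", round(prob_hr_below_one(mean, var), 4))
```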

Sample Size Re-estimation

Key quantity: conditional power CP(z, n)

Blinded and unblinded SSR validated against conditional power formulas. Zone classification, inflation caps, and threshold ordering verified for continuous, binary, and survival endpoints.
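For readers unfamiliar with conditional power, the following sketch shows one common form: conditional power under the current trend for a one-sided test, CP = Φ((z/√t - z_crit) / √(1 - t)) with information fraction t = n_interim / n_total. The zone thresholds (0.30 and 0.80) and the inflation cap (1.5×) are hypothetical values chosen for the example, not Zetyra's defaults, and the function names are likewise illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def conditional_power(z, n_interim, n_total, alpha=0.025):
    """Conditional power under the current trend, given interim z-statistic z."""
    t = n_interim / n_total  # information fraction, must satisfy 0 < t < 1
    if not 0.0 < t < 1.0:
        raise ValueError("n_interim must be strictly between 0 and n_total")
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return _STD_NORMAL.cdf((z / math.sqrt(t) - z_crit) / math.sqrt(1.0 - t))


def classify_zone(cp, cp_low=0.30, cp_high=0.80):
    """Map conditional power to a zone; thresholds must be ordered."""
    if not 0.0 <= cp_low < cp_high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= cp_low < cp_high <= 1")
    if cp < cp_low:
        return "unfavorable"
    if cp < cp_high:
        return "promising"
    return "favorable"


def capped_sample_size(n_planned, n_proposed, max_inflation=1.5):
    """Apply an inflation cap: never exceed max_inflation times the planned n."""
    return min(n_proposed, math.ceil(n_planned * max_inflation))


cp = conditional_power(z=1.5, n_interim=100, n_total=200)
print("CP:", round(cp, 3), "zone:", classify_zone(cp))
print("capped n:", capped_sample_size(200, 450))
```

Checking that the thresholds are ordered before classifying mirrors the threshold-ordering validation described above, and the cap keeps any re-estimated sample size within a pre-specified multiple of the original plan.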

References

  1. GSD: Jennison & Turnbull (2000) Group Sequential Methods with Applications to Clinical Trials
  2. CUPED: Deng et al. (2013) Improving the Sensitivity of Online Controlled Experiments (WSDM)
  3. Bayesian: Gelman et al. (2013) Bayesian Data Analysis
  4. gsDesign: Anderson (2022) gsDesign R package
  5. Bayesian Sequential: Zhou & Ji (2024) Bayesian sequential monitoring
  6. Prior Elicitation: Morita, Thall & Müller (2008) Determining the effective sample size of a parametric prior
  7. Survival: Schoenfeld (1983) Sample-size formula for the proportional-hazards regression model
  8. SSR: Cui, Hung & Wang (1999) Modification of sample size in group sequential clinical trials

The only clinical trial design platform with public, continuously validated accuracy.

499 tests. 25 scripts. Every calculator validated.