Two-Arm Bayesian Trial Design
Technical documentation for sample size determination in two-arm randomized Bayesian trials. Supports superiority and non-inferiority designs with flexible allocation ratios and independent arm priors.
1. Overview & Design Settings
This calculator determines the sample size for two-arm randomized Bayesian trials comparing a treatment arm to a control arm. It supports both superiority and non-inferiority designs with binary endpoints.
Design Types
Superiority
Test whether treatment is better than control
Non-Inferiority
Test whether treatment is not worse than control by more than a margin $\delta$
Comparison Metrics
Risk Difference
Absolute difference in response rates
Relative Risk
Ratio of response rates (superiority only)
Non-Inferiority + Ratio Not Supported
Non-inferiority margins are defined on the difference scale. Ratio-based non-inferiority (a margin on $\theta_T / \theta_C$) requires a different margin interpretation and is not currently supported.
2. Statistical Model
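Each arm is modeled independently with a conjugate Beta prior and a binomial likelihood:
Prior Distributions
Treatment Arm
$\theta_T \sim \text{Beta}(\alpha_T, \beta_T)$
Control Arm
$\theta_C \sim \text{Beta}(\alpha_C, \beta_C)$
Likelihood
Given $k_T$ responses in $n_T$ treatment patients and $k_C$ responses in $n_C$ control patients:
$k_T \mid \theta_T \sim \text{Binomial}(n_T, \theta_T), \quad k_C \mid \theta_C \sim \text{Binomial}(n_C, \theta_C)$
Posterior Distributions
$\theta_T \mid k_T \sim \text{Beta}(\alpha_T + k_T,\ \beta_T + n_T - k_T)$
$\theta_C \mid k_C \sim \text{Beta}(\alpha_C + k_C,\ \beta_C + n_C - k_C)$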
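The conjugate update amounts to adding responses to the first Beta parameter and non-responses to the second. The sketch below illustrates this with assumed Beta(1, 1) priors and made-up trial counts; the helper names `posterior_params` and `beta_mean` are hypothetical and not part of the calculator.

```python
def posterior_params(alpha, beta, k, n):
    """Conjugate update: Beta(alpha, beta) prior with k responses in n patients."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return alpha + k, beta + n - k


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Illustrative data (assumed): Beta(1, 1) priors,
# 18/40 responses on treatment and 12/40 on control.
post_T = posterior_params(1.0, 1.0, k=18, n=40)  # Beta(19, 23)
post_C = posterior_params(1.0, 1.0, k=12, n=40)  # Beta(13, 29)

print("treatment posterior:", post_T, "mean:", round(beta_mean(*post_T), 3))
print("control posterior:  ", post_C, "mean:", round(beta_mean(*post_C), 3))
```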
Shared vs. Independent Priors
By default, both arms use the same prior ($\alpha_T = \alpha_C$, $\beta_T = \beta_C$). Independent priors can be specified when the arms have different expected rates (e.g., when the control rate is well established from historical data).
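One common way to encode historical control information is to pick a prior mean and an effective sample size (ESS), using the standard identity that a Beta(α, β) prior has mean α / (α + β) and ESS α + β. The sketch below is illustrative only: `beta_from_mean_ess` is a hypothetical helper, and the rate of 0.30 and ESS of 20 are assumed values, not calculator defaults.

```python
def beta_from_mean_ess(mean, ess):
    """Return (alpha, beta) for a Beta prior with the given mean and ESS = alpha + beta."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must be strictly between 0 and 1")
    if ess <= 0:
        raise ValueError("ess must be positive")
    return mean * ess, (1.0 - mean) * ess


# Assumed example: historical control rate of 0.30, weighted like 20 patients.
control_prior = beta_from_mean_ess(0.30, 20)  # Beta(6, 14)

# Weakly informative treatment prior (uniform), ESS = 2.
treatment_prior = (1.0, 1.0)

print("control prior:", control_prior, "ESS:", sum(control_prior))
print("treatment prior:", treatment_prior, "ESS:", sum(treatment_prior))
```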
3. Decision Rules
Superiority Decision Rule
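For risk difference comparison, success is declared when:
$P(\theta_T - \theta_C > 0 \mid \text{data}) \ge \gamma$
For relative risk comparison, success is declared when:
$P(\theta_T / \theta_C > 1 \mid \text{data}) \ge \gamma$
Non-Inferiority Decision Rule
With margin $\delta > 0$, success is declared when:
$P(\theta_T - \theta_C > -\delta \mid \text{data}) \ge \gamma$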
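The three rules differ only in the comparison metric and the threshold that metric must exceed. A minimal sketch of that mapping follows, including the unsupported non-inferiority plus ratio combination. The function name `success_threshold` is hypothetical; the string values mirror those used elsewhere in this document.

```python
def success_threshold(design_type, comparison="difference", margin=0.0):
    """Threshold the comparison metric must exceed for a posterior draw to count as a success.

    Hypothetical helper: 0 for a superiority difference, 1 for a superiority
    ratio, and -margin for a non-inferiority difference.
    """
    if comparison not in ("difference", "ratio"):
        raise ValueError(f"unknown comparison: {comparison!r}")
    if design_type == "superiority":
        return 0.0 if comparison == "difference" else 1.0
    if design_type == "non_inferiority":
        if comparison == "ratio":
            raise ValueError("ratio-based non-inferiority is not supported")
        if margin <= 0:
            raise ValueError("non-inferiority margin must be positive")
        return -margin
    raise ValueError(f"unknown design_type: {design_type!r}")


print(success_threshold("superiority", "difference"))
print(success_threshold("superiority", "ratio"))
print(success_threshold("non_inferiority", "difference", margin=0.10))
```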
Monte Carlo Probability Estimation
The posterior probability is computed via Monte Carlo sampling:
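    import numpy as np  # NumPy is used here for illustration

    rng = np.random.default_rng()

    # Draw samples from the two independent Beta posteriors
    θ_T_samples = rng.beta(α_T + k_T, β_T + n_T - k_T, size=10_000)
    θ_C_samples = rng.beta(α_C + k_C, β_C + n_C - k_C, size=10_000)

    # Compute the comparison metric and the threshold it must exceed
    if comparison == "difference":
        metric = θ_T_samples - θ_C_samples
        threshold = margin  # 0 for superiority, -δ for NI
    else:  # ratio (superiority only)
        metric = θ_T_samples / θ_C_samples
        threshold = 1.0

    # Estimate the posterior probability as the fraction of draws above the threshold
    P_success = np.mean(metric > threshold)
4. Operating Characteristics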
Operating characteristics are computed via nested Monte Carlo simulation to evaluate the design under different true parameter values.
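Type I Error
Probability of declaring success when the null hypothesis is true
Superiority: $\theta_T = \theta_C$
NI: $\theta_T = \theta_C - \delta$
Power
Probability of declaring success when the alternative hypothesis is true
Superiority: $\theta_T = \theta_C + \Delta$, where $\Delta > 0$ is the design treatment effect
NI: $\theta_T = \theta_C$ (treatment equivalent)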
Simulation Algorithm
    type1_count = 0
    power_count = 0
    # Success event shown for superiority; for NI use θ_T - θ_C > -δ
    for sim in 1...N_simulations:
        # Under null hypothesis
        k_T_null = Binomial(n_T, θ_T_null)
        k_C_null = Binomial(n_C, θ_C_null)
        if P(θ_T > θ_C | k_T_null, k_C_null) >= γ:
            type1_count += 1
        # Under alternative hypothesis
        k_T_alt = Binomial(n_T, θ_T_alt)
        k_C_alt = Binomial(n_C, θ_C_alt)
        if P(θ_T > θ_C | k_T_alt, k_C_alt) >= γ:
            power_count += 1
    type1_error = type1_count / N_simulations
    power = power_count / N_simulations
Computational Note
By default, each evaluation uses 5,000 outer simulations (simulated trials), each with 5,000 inner Monte Carlo samples for posterior probability estimation. The final operating characteristics are computed with three times as many outer simulations (15,000) for higher precision.
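The nested simulation can be sketched in standard-library Python as follows. This is a minimal illustration rather than the calculator's implementation: the function names, the Beta(1, 1) prior, the rates, and the sample sizes are all assumptions, and the simulation counts are deliberately far smaller than the documented defaults so the example runs quickly.

```python
import random


def prob_success(post_T, post_C, threshold, n_draws, rng):
    """Inner loop: Monte Carlo estimate of P(theta_T - theta_C > threshold | data)."""
    hits = 0
    for _ in range(n_draws):
        theta_T = rng.betavariate(*post_T)
        theta_C = rng.betavariate(*post_C)
        if theta_T - theta_C > threshold:
            hits += 1
    return hits / n_draws


def draw_binomial(n, p, rng):
    """Binomial(n, p) draw as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def success_rate(theta_T, theta_C, n_T, n_C, gamma=0.95, threshold=0.0,
                 prior=(1.0, 1.0), n_sims=200, n_draws=500, seed=0):
    """Outer loop: fraction of simulated trials meeting the posterior success criterion."""
    rng = random.Random(seed)
    a, b = prior  # same assumed prior for both arms
    successes = 0
    for _ in range(n_sims):
        k_T = draw_binomial(n_T, theta_T, rng)
        k_C = draw_binomial(n_C, theta_C, rng)
        post_T = (a + k_T, b + n_T - k_T)
        post_C = (a + k_C, b + n_C - k_C)
        if prob_success(post_T, post_C, threshold, n_draws, rng) >= gamma:
            successes += 1
    return successes / n_sims


# Superiority example with assumed rates:
# null 0.30 vs 0.30, alternative 0.50 vs 0.30, 100 patients per arm.
type1_error = success_rate(0.30, 0.30, n_T=100, n_C=100)
power = success_rate(0.50, 0.30, n_T=100, n_C=100)
print(f"type I error ~ {type1_error:.3f}, power ~ {power:.3f}")
```

For a non-inferiority design, the same sketch applies with `threshold=-δ` and the null scenario set to a treatment rate δ below control. The Monte Carlo standard error of an estimated rate p from N simulated trials is sqrt(p(1 − p) / N): about 0.003 at p = 0.05 with 5,000 simulations, and about 0.002 with 15,000.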
5. Allocation Ratios
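The allocation ratio determines how patients are distributed between arms. Given a control sample size $n_C$ and allocation ratio $r = n_T / n_C$:
$n_T = r \cdot n_C, \quad n_{\text{total}} = n_T + n_C = (1 + r)\, n_C$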
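As a small illustration, arm sizes can be derived from the control size and the allocation ratio, taken here as n_T / n_C to match the `allocation_ratio` parameter in the API reference. The helper `arm_sizes` is hypothetical, and rounding the treatment arm up to a whole patient is an assumption; the calculator's rounding rule is not documented here.

```python
import math


def arm_sizes(n_control, allocation_ratio=1.0):
    """Return (n_T, n_C, n_total), with n_T = ceil(allocation_ratio * n_C)."""
    if n_control <= 0 or allocation_ratio <= 0:
        raise ValueError("n_control and allocation_ratio must be positive")
    # Assumption: round the treatment arm up so the ratio is never undershot.
    n_treatment = math.ceil(allocation_ratio * n_control)
    return n_treatment, n_control, n_treatment + n_control


print(arm_sizes(50, 1.0))  # 1:1
print(arm_sizes(50, 2.0))  # 2:1, more on treatment
print(arm_sizes(50, 0.5))  # 1:2, more on control
```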
Common Allocation Ratios
| Ratio (treatment:control) | Distribution | When to Use |
|---|---|---|
| 1:1 | Equal | Most efficient statistically; standard choice |
| 2:1 | More on treatment | To collect more safety data or accommodate patient preference for active treatment |
| 1:2 | More on control | Rare; when control data has added value |
| 3:1 | Heavy treatment | Ethical concerns with randomization to control |
Power Impact of Unequal Allocation
A 2:1 allocation requires roughly 10-15% more total patients than 1:1 to achieve the same power, and the penalty grows as the ratio becomes more extreme. The calculator accounts for this when searching for the optimal sample size.
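A rough sense of the allocation penalty comes from a textbook approximation, not from the calculator's simulation. If both arms have similar outcome variance, the variance of the estimated difference is proportional to 1/n_T + 1/n_C, so keeping the same precision at ratio r = n_T / n_C multiplies the total sample size by (1 + r)² / (4r) relative to 1:1. The function name below is hypothetical.

```python
def total_n_inflation(r):
    """Approximate total-sample-size multiplier versus 1:1 allocation.

    Assumes equal outcome variance in both arms (normal approximation).
    """
    if r <= 0:
        raise ValueError("allocation ratio must be positive")
    return (1.0 + r) ** 2 / (4.0 * r)


for label, r in [("1:1", 1.0), ("2:1", 2.0), ("1:2", 0.5), ("3:1", 3.0)]:
    print(f"{label}: x{total_n_inflation(r):.3f}")
```

Under this approximation a 2:1 or 1:2 split costs about 12.5% more patients and a 3:1 split about 33% more. The simulated penalty can differ somewhat because binomial variances depend on the arm-specific response rates.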
6. Power Curve Analysis
The power curve shows how the probability of declaring success varies with the true treatment effect. This is computed across a range of effect sizes at the recommended sample size.
Power Curve Grid
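For risk difference, power is evaluated over a grid of true effect sizes $\theta_T - \theta_C$.
Key points on the curve:
- $\theta_T - \theta_C = 0$ (superiority null): Type I error rate (should be ≤ 0.05)
- $\theta_T - \theta_C$ equal to the design treatment effect: Power at design alternative (should be ≥ 0.80)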
- Crossover point: Effect size where power = 50% (minimum detectable effect)
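Given a power curve evaluated on a grid, the crossover point can be located by linear interpolation between the two grid points that bracket 50% power. The sketch below assumes this interpolation approach. The helper name is hypothetical, and the grid and power values are made up for illustration; they are not calculator output.

```python
def crossover_effect(effects, powers, target=0.5):
    """Linearly interpolate the effect size at which power first rises to target.

    Returns None if the curve does not cross the target within the grid.
    """
    points = list(zip(effects, powers))
    for (e0, p0), (e1, p1) in zip(points, points[1:]):
        if p0 < target <= p1:
            return e0 + (target - p0) * (e1 - e0) / (p1 - p0)
    return None


# Made-up power curve on a risk-difference grid (illustrative values only).
effects = [0.00, 0.05, 0.10, 0.15, 0.20]
powers = [0.05, 0.18, 0.44, 0.72, 0.91]

print(crossover_effect(effects, powers))
```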
7. Regulatory Considerations
FDA Guidance on Two-Arm Bayesian Trials
Per the January 2026 guidance, sponsors must demonstrate that Bayesian two-arm designs maintain appropriate frequentist error control and provide operating characteristics under both null and alternative hypotheses.
SAP Documentation Checklist
- Design Type: Superiority or non-inferiority with margin justification
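- Priors: Report $\text{Beta}(\alpha_T, \beta_T)$ and $\text{Beta}(\alpha_C, \beta_C)$ with effective sample size (ESS) and source documentation
- Decision Rule: Threshold $\gamma$, comparison metric, and success criterion
- Sample Size: $n_T$ and $n_C$ with allocation ratio justification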
- Operating Characteristics: Type I error and power with simulation parameters
- Sensitivity Analysis: Prior sensitivity and sample size robustness
Non-Inferiority Margin Justification
For non-inferiority designs, the margin must be pre-specified and justified based on:
- Clinical significance (smallest effect that matters to patients)
- Historical effect size of active control vs. placebo
- Regulatory precedent in the therapeutic area
8. API Quick Reference
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| control_rate | float | Expected control arm response rate |
| treatment_effect | float | Expected difference θ_T - θ_C |
| design_type | string | "superiority" or "non_inferiority" |
| decision_threshold | float | Posterior probability threshold γ (default: 0.95) |
| allocation_ratio | float | n_T / n_C ratio (default: 1.0) |
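A request built from these parameters might look like the sketch below. The field names come from the table above; the values, the `validate_params` helper, and its range checks are illustrative assumptions, and no endpoint or client library is implied.

```python
import json

# Hypothetical payload: field names from the parameter table, values illustrative.
request_params = {
    "control_rate": 0.30,
    "treatment_effect": 0.15,
    "design_type": "superiority",
    "decision_threshold": 0.95,
    "allocation_ratio": 1.0,
}


def validate_params(params):
    """Return a list of problems found by basic (assumed) range checks."""
    problems = []
    if not 0.0 < params["control_rate"] < 1.0:
        problems.append("control_rate must be in (0, 1)")
    if not 0.0 < params["control_rate"] + params["treatment_effect"] < 1.0:
        problems.append("implied treatment rate must be in (0, 1)")
    if params["design_type"] not in ("superiority", "non_inferiority"):
        problems.append("design_type must be 'superiority' or 'non_inferiority'")
    if not 0.0 < params["decision_threshold"] < 1.0:
        problems.append("decision_threshold must be in (0, 1)")
    if params["allocation_ratio"] <= 0:
        problems.append("allocation_ratio must be positive")
    return problems


print(validate_params(request_params))
print(json.dumps(request_params, indent=2))
```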
Key Response Fields
- recommended_n_per_arm: Sample size per arm
- recommended_n_total: Total sample size
- operating_characteristics: Type I error and power
- decision_rule: Design type and interpretation