Sample Size for Non-Inferiority and Equivalence Trials
Power analysis for clinical trials that aim to show a new treatment is "not worse than" (non-inferiority) or "clinically equivalent to" (equivalence) an active control.
1. When to Use This Method
Use non-inferiority when:
- The new treatment is expected to have similar efficacy to an established treatment
- The new treatment offers advantages: reduced toxicity, lower cost, easier administration, better safety profile, or improved convenience
- A placebo control is unethical because effective treatment exists
- You want to show the new treatment is "not unacceptably worse"
Use equivalence when:
- You need to demonstrate two treatments are clinically interchangeable
- Generic drug approval (bioequivalence)
- Biosimilar development
- Comparing two active treatments where either direction of difference matters
Common Applications
- ✓ Generic drug approvals
- ✓ Biosimilar development
- ✓ New formulation vs. existing formulation
- ✓ Less invasive procedure vs. standard surgery
- ✓ Shorter or lower dose treatment regimens
Do NOT Use When
- ✗ You expect the new treatment to be superior (use superiority design)
- ✗ No proven active control exists (use placebo-controlled superiority)
- ✗ You cannot justify preservation of active control effect
- ✗ The trial is designed to "fail to find a difference"
2. Mathematical Formulation
2.1 Hypothesis Structure
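Non-inferiority (one-sided):
- $H_0: \mu_T - \mu_C \le -\delta$ (treatment is inferior by at least δ)
- $H_1: \mu_T - \mu_C > -\delta$ (treatment is non-inferior)
Equivalence (two-sided):
- $H_0: |\mu_T - \mu_C| \ge \delta$ (treatments differ by at least δ)
- $H_1: |\mu_T - \mu_C| < \delta$ (treatments are equivalent)
Where $\delta > 0$ = non-inferiority or equivalence margin, and $\mu_T$, $\mu_C$ are the new-treatment and control means, with higher values taken as better.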
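The one-sided non-inferiority hypothesis can be tested in base R by shifting the null value of a two-sample t-test to the negative margin. The sketch below assumes a continuous outcome where higher is better; `ni_t_test` is a hypothetical helper name, and the data are simulated purely for illustration.

```r
# Hypothetical helper: one-sided non-inferiority test for two means.
# Assumes higher outcome = better and delta > 0 is the pre-specified margin.
ni_t_test <- function(new, control, delta, alpha = 0.025) {
  # H0: mean(new) - mean(control) <= -delta  vs.  H1: difference > -delta
  fit <- t.test(new, control, mu = -delta, alternative = "greater",
                conf.level = 1 - alpha)
  list(
    estimate     = unname(fit$estimate[1] - fit$estimate[2]),
    lower_bound  = fit$conf.int[1],          # one-sided lower confidence limit
    p_value      = fit$p.value,
    non_inferior = fit$conf.int[1] > -delta  # same decision as p_value < alpha
  )
}

# Simulated data, not taken from any trial
set.seed(1)
control <- rnorm(100, mean = 50, sd = 10)
new     <- rnorm(100, mean = 50, sd = 10)
res <- ni_t_test(new, control, delta = 5)
res$non_inferior
```

Non-inferiority is concluded when the lower confidence limit for the difference lies above −δ, which is equivalent to rejecting the shifted null hypothesis at level α.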
2.2 Continuous Outcomes
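Non-Inferiority:
$$n = \frac{2\sigma^2 \left(z_{1-\alpha} + z_{1-\beta}\right)^2}{\left(\delta - |\Delta|\right)^2}$$
Per-group sample size, where $\Delta = \mu_T - \mu_C$ is the assumed true difference
If assuming $\Delta = 0$ (no true difference):
$$n = \frac{2\sigma^2 \left(z_{1-\alpha} + z_{1-\beta}\right)^2}{\delta^2}$$
Simplified formula
Equivalence (TOST), assuming $\Delta = 0$:
$$n = \frac{2\sigma^2 \left(z_{1-\alpha} + z_{1-\beta/2}\right)^2}{\delta^2}$$
Note: uses $z_{1-\beta/2}$ because the type II error β is split between the two one-sided tests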
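As a sanity check, the per-group relationship can be inverted to give the approximate power achieved by a chosen n. This is a minimal sketch under the same normal approximation; `ni_power_continuous` is a hypothetical name, not part of any package.

```r
# Approximate power of a one-sided NI test for two means, n per group.
# Normal approximation; true_diff = 0 means no true difference.
ni_power_continuous <- function(n, delta, sd, alpha = 0.025, true_diff = 0) {
  se <- sd * sqrt(2 / n)  # standard error of the difference in means
  pnorm((delta - abs(true_diff)) / se - qnorm(1 - alpha))
}

ni_power_continuous(n = 63, delta = 5, sd = 10)  # just above 0.80
ni_power_continuous(n = 62, delta = 5, sd = 10)  # just below 0.80
```

Checking the power at n and n − 1 confirms that rounding up lands on the smallest sample size meeting the target.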
2.3 Binary Outcomes
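Non-Inferiority:
$$n = \frac{\left(z_{1-\alpha} + z_{1-\beta}\right)^2 \left[p_T(1-p_T) + p_C(1-p_C)\right]}{\left(\delta - |p_T - p_C|\right)^2}$$
General formula
If assuming $p_T = p_C = p$:
$$n = \frac{2p(1-p)\left(z_{1-\alpha} + z_{1-\beta}\right)^2}{\delta^2}$$
Simplified for equal proportions
Equivalence:
$$n = \frac{2p(1-p)\left(z_{1-\alpha} + z_{1-\beta/2}\right)^2}{\delta^2}$$
TOST for binary outcomes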
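When the two arms are not assumed to share one response rate, the calculation uses the sum of the two binomial variances. The sketch below is one common unpooled normal approximation; the function name and the conservative use of `abs()` on the assumed difference are assumptions made here for illustration.

```r
# Hypothetical helper: per-group n for a binary NI trial with unequal rates.
# Unpooled normal approximation; any assumed difference shrinks the margin.
ni_binary_general <- function(p_control, p_new, delta,
                              alpha = 0.025, power = 0.80) {
  stopifnot(abs(p_new - p_control) < delta)
  z <- qnorm(1 - alpha) + qnorm(power)
  var_sum <- p_control * (1 - p_control) + p_new * (1 - p_new)
  ceiling(z^2 * var_sum / (delta - abs(p_new - p_control))^2)
}

# Equal rates reduce to the simplified formula 2p(1 - p)(z)^2 / delta^2
ni_binary_general(p_control = 0.20, p_new = 0.20, delta = 0.10)
# An assumed 2-point difference inflates the requirement
ni_binary_general(p_control = 0.20, p_new = 0.22, delta = 0.10)
```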
2.4 Survival Outcomes (Time-to-Event)
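Non-Inferiority:
$$D = \frac{4\left(z_{1-\alpha} + z_{1-\beta}\right)^2}{\left(\ln \delta_{HR}\right)^2}$$
Total events required across both arms, assuming 1:1 allocation and a true hazard ratio of 1
Where $\delta_{HR}$ = non-inferiority margin expressed as a hazard ratio (e.g., 1.3 means the new treatment's hazard may be up to 30% higher than the control's).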
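The event count drives power in a time-to-event trial, so the number of subjects follows from how many of them are expected to have an event. The sketch below assumes 1:1 allocation and a true hazard ratio of 1; `p_event`, the overall probability that an enrolled subject has an event during the trial, is a hypothetical planning input that is not part of the formula above.

```r
# Required number of events for a given hazard-ratio margin
ni_events <- function(hr_margin, alpha = 0.025, power = 0.80) {
  ceiling(4 * (qnorm(1 - alpha) + qnorm(power))^2 / log(hr_margin)^2)
}

# Hypothetical conversion: total subjects needed to observe those events,
# given an assumed overall event probability p_event
ni_subjects <- function(hr_margin, p_event, ...) {
  ceiling(ni_events(hr_margin, ...) / p_event)
}

events   <- ni_events(hr_margin = 1.30)
subjects <- ni_subjects(hr_margin = 1.30, p_event = 0.60)
c(events = events, subjects = subjects)
```

In practice the event probability depends on accrual, follow-up, and dropout, which is why dedicated tools model those inputs explicitly.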
2.5 Choosing the Non-Inferiority Margin (δ)
The margin must:
1. Preserve clinical benefit: Ensure the new treatment retains a meaningful fraction of the active control's effect vs. placebo
2. Be pre-specified: Cannot be chosen after seeing data
3. Be clinically justified: Based on what loss of efficacy is acceptable given other benefits
FDA 95-95 Method (Fixed Margin):
$$\delta = (1 - M_2) \times M_1$$
Where:
- $M_1$ = lower bound of 95% CI for active control effect vs. placebo (from historical data)
- $M_2$ = preservation fraction (typically 0.5, meaning preserve at least 50% of effect, which gives $\delta = 0.5\,M_1$)
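A worked instance with illustrative numbers may help: suppose historical data give an active control effect of 10 units with a standard error of 2, and at least 50% of that effect must be preserved. These inputs are assumptions chosen for the arithmetic, not values from a real meta-analysis.

```latex
M_1 = 10 - 1.96 \times 2 = 6.08, \qquad
\delta = (1 - 0.5) \times M_1 = 0.5 \times 6.08 = 3.04
```

The margin is therefore 3.04 units: a new treatment up to 3.04 units worse than the active control would still retain at least half of the conservatively estimated control effect.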
2.6 Sample Size Comparison: NI vs. Superiority
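Because n is proportional to 1/δ², sample size rises steeply as the margin narrows, and non-inferiority margins are typically smaller than the difference a superiority trial is powered to detect: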
| Design | Detectable Difference | n per group (σ=10, 80% power) |
|---|---|---|
| Superiority | Δ = 5 | 64 |
| Non-inferiority | δ = 5 | 86 |
| Non-inferiority | δ = 3 | 235 |
| Non-inferiority | δ = 2 | 527 |
| Equivalence | δ = 5 | 108 |
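The dependence on the margin can be explored directly. The sketch below uses the simplified normal-approximation formula with no true difference; absolute values depend on the power and approximation chosen, so they need not reproduce tabulated figures, but the inflation factors depend only on the ratio of margins. `n_per_group` is a hypothetical helper name.

```r
# Unrounded per-group n under the simplified formula (true difference = 0)
n_per_group <- function(delta, sd = 10, alpha = 0.025, power = 0.80) {
  2 * sd^2 * (qnorm(1 - alpha) + qnorm(power))^2 / delta^2
}

margins <- c(5, 3, 2)
scaling <- data.frame(
  delta = margins,
  n = ceiling(n_per_group(margins)),
  inflation_vs_delta5 = round(n_per_group(margins) / n_per_group(5), 2)
)
scaling
# inflation_vs_delta5: 1.00, 2.78, 6.25, i.e. (5 / delta)^2
```

Halving the margin quadruples the required sample size regardless of σ, α, or power.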
3. Assumptions
3.1 Core Assumptions
| Assumption | Description | Violation Consequence |
|---|---|---|
| Assay sensitivity | Trial can distinguish effective from ineffective treatment | Cannot infer efficacy vs. placebo; trial uninterpretable |
| Constancy | Active control effect is same as in historical trials | If control effect diminished, false NI conclusion possible |
| Margin validity | δ preserves clinically meaningful benefit | Approval of ineffective treatment ("biocreep") |
| Similar populations | Current trial population matches historical trials | Effect size may not translate |
3.2 Analysis Population Considerations
| Population | Superiority | Non-Inferiority |
|---|---|---|
| ITT | Primary (conservative) | Required but may be anti-conservative |
| Per-Protocol | Secondary | Required; often more conservative for NI |
| Concordance | Preferred | Required—both ITT and PP should show NI |
Critical Insight
In superiority trials, non-adherence biases toward the null (conservative). In non-inferiority trials, non-adherence makes groups look more similar, biasing toward false NI claims (anti-conservative).
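The direction of this bias can be illustrated with a deliberately simplified dilution model. It assumes, purely for illustration, that a fraction `switch_rate` of the new-treatment arm actually receives the control, so the expected ITT difference shrinks toward zero by that fraction; real non-adherence patterns are more complex.

```r
# Simplified dilution model (an assumption, not a general result):
# expected ITT difference when some new-arm patients receive the control
itt_difference <- function(true_diff, switch_rate) {
  (1 - switch_rate) * true_diff
}

delta     <- 5    # NI margin
true_diff <- -6   # new treatment is truly worse than the margin allows
observed  <- itt_difference(true_diff, switch_rate = 0.20)

observed            # -4.8
observed > -delta   # TRUE: the diluted estimate now falls inside the margin
```

A treatment that is truly inferior by more than δ can thus appear non-inferior in the ITT population, which is why a per-protocol analysis is examined alongside it.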
3.3 The Biocreep Problem
If a treatment proven non-inferior becomes the active control for the next trial, and that pattern repeats, efficacy can gradually degrade:
Drug A (proven vs. placebo) → Drug B (NI to A) → Drug C (NI to B) → ...
Each step allows small losses, potentially resulting in Drug C being no better than placebo.
Prevention: Rigorous margin selection, three-arm trials (including placebo when ethical), requiring preservation of substantial effect fraction.
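The erosion along the chain can be made concrete with a worst-case calculation. The sketch assumes each successor loses exactly the fraction of its predecessor's effect that the margin permits; the function name and the starting effect of 10 units are hypothetical.

```r
# Worst case: each generation retains only the preserved fraction
# of the previous drug's effect vs. placebo
retained_effect <- function(initial_effect, preservation, generations) {
  initial_effect * preservation^(0:generations)
}

chain <- retained_effect(initial_effect = 10, preservation = 0.5,
                         generations = 2)
chain
# 10.0  5.0  2.5   (Drug A, Drug B, Drug C vs. placebo)
```

After two non-inferiority steps with 50% preservation, only a quarter of the original effect is guaranteed, even though every individual trial met its margin.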
4. Regulatory Guidance
FDA Guidance
Non-Inferiority Clinical Trials to Establish Effectiveness (2016): Comprehensive guidance on margin selection, assay sensitivity, constancy assumption, and analysis requirements. Recommends 95-95 method for margin derivation. Requires both ITT and per-protocol analyses. Emphasizes pre-specification of margin with clinical and statistical justification.
ICH E10 (2000): Defines when active control (NI) designs are appropriate and requirements for historical evidence.
Biosimilars Guidance (2015): Specific requirements for equivalence demonstrations in biosimilar development.
EMA Guidance
CHMP Guideline on the Choice of Non-Inferiority Margin (2005): Margin should be "the largest difference that can be accepted as clinically irrelevant." Must be smaller than the smallest effect the active control would have vs. placebo.
Points to Consider on Switching Between Superiority and Non-Inferiority (2000): Pre-specification required. Cannot switch based on observed results.
Key Regulatory Requirements
1. Pre-specification: Margin, hypothesis, and analysis methods must be in protocol
2. Justification: Clinical and statistical rationale for margin documented
3. Historical evidence: Meta-analysis or systematic review supporting active control effect
4. Dual analysis: Both ITT and per-protocol required; conclusions must be concordant
5. Sensitivity analyses: Explore robustness to constancy assumption violations
5. Validation Against Industry Standards
Continuous Outcomes
| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra |
|---|---|---|---|---|
| Non-inferiority | δ=5, σ=10, α=0.025, 80% power | 86/group | 86/group | 86/group |
| Non-inferiority | δ=3, σ=10, α=0.025, 80% power | 235/group | 235/group | 235/group |
| Equivalence (TOST) | δ=5, σ=10, α=0.05, 80% power | 108/group | 108/group | 108/group |
Binary Outcomes
| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra |
|---|---|---|---|---|
| Non-inferiority | p=0.20, δ=0.10, α=0.025, 80% power | 199/group | 199/group | 199/group |
| Non-inferiority | p=0.80, δ=0.10, α=0.025, 80% power | 199/group | 199/group | 199/group |
| Equivalence | p=0.50, δ=0.15, α=0.05, 80% power | 174/group | 174/group | 174/group |
Survival Outcomes
| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra |
|---|---|---|---|---|
| Non-inferiority | δ_HR=1.30, α=0.025, 80% power | 376 events | 376 events | 376 events |
| Non-inferiority | δ_HR=1.25, α=0.025, 80% power | 559 events | 560 events | 559 events |
Minor variations (±1 subject) may occur due to rounding and continuity correction options.
6. Example SAP Language
Non-Inferiority Trial (Continuous Outcome)
Non-Inferiority Trial (Binary Outcome)
Equivalence Trial (Bioequivalence)
7. R Code
Non-Inferiority: Continuous Outcomes
# Basic formula (normal approximation)
ni_continuous <- function(delta, sd, alpha = 0.025, power = 0.80,
                          true_diff = 0) {
  z_alpha <- qnorm(1 - alpha)  # One-sided
  z_beta <- qnorm(power)
  n <- 2 * sd^2 * (z_alpha + z_beta)^2 / (delta - abs(true_diff))^2
  ceiling(n)
}
# Example: δ = 5, σ = 10, assuming no true difference
ni_continuous(delta = 5, sd = 10)
# Result: 63 per group (85 with power = 0.90)
# Smaller margin requires larger sample
ni_continuous(delta = 3, sd = 10)
# Result: 175 per group (234 with power = 0.90)
Equivalence: Continuous Outcomes (TOST)
equiv_continuous <- function(delta, sd, alpha = 0.05, power = 0.80,
                             true_diff = 0) {
  z_alpha <- qnorm(1 - alpha)  # For each one-sided test
  z_beta <- qnorm(1 - (1 - power) / 2)  # Type II error split across the two tests
  n <- 2 * sd^2 * (z_alpha + z_beta)^2 / (delta - abs(true_diff))^2
  ceiling(n)
}
# Example: δ = 5, σ = 10
equiv_continuous(delta = 5, sd = 10)
# Result: 69 per group
Non-Inferiority: Binary and Survival Outcomes
# Binary outcomes (equal true proportions in both arms)
ni_binary <- function(p, delta, alpha = 0.025, power = 0.80) {
  z_alpha <- qnorm(1 - alpha)
  z_beta <- qnorm(power)
  n <- 2 * p * (1 - p) * (z_alpha + z_beta)^2 / delta^2
  ceiling(n)
}
# Example: p = 0.20, δ = 0.10 (10 percentage points)
ni_binary(p = 0.20, delta = 0.10)
# Result: 252 per group
# Survival outcomes (total events, 1:1 allocation, true HR = 1)
ni_survival <- function(hr_margin, alpha = 0.025, power = 0.80) {
  z_alpha <- qnorm(1 - alpha)
  z_beta <- qnorm(power)
  D <- 4 * (z_alpha + z_beta)^2 / (log(hr_margin))^2
  ceiling(D)
}
# Example: NI margin HR = 1.3
ni_survival(hr_margin = 1.30)
# Result: 457 events
# Tighter margin requires more events
ni_survival(hr_margin = 1.25)
# Result: 631 events
Using TrialSize and gsDesign Packages
library(TrialSize)
# Non-inferiority for two means
TwoSampleMean.NIS(
  alpha = 0.025,  # One-sided
  beta = 0.20,    # 80% power
  sigma = 10,     # Standard deviation
  k = 1,          # Allocation ratio
  delta = 0,      # True difference (assumed 0)
  margin = 5      # NI margin
)
# Using gsDesign for survival NI
library(gsDesign)
nSurv(
  lambdaC = log(2)/12,  # Control hazard (median 12 months)
  hr = 1.0,             # Assumed true HR (no difference)
  hr0 = 1.3,            # NI margin
  eta = 0,              # Dropout rate
  gamma = 10,           # Accrual rate/month
  R = 24,               # Accrual period
  T = 48,               # Total duration
  ratio = 1,            # Allocation ratio
  alpha = 0.025,        # One-sided
  beta = 0.20,          # 80% power
  sided = 1             # One-sided test
)
Bioequivalence (PowerTOST) and Margin Calculation
# Using PowerTOST package for bioequivalence
library(PowerTOST)
sampleN.TOST(
  alpha = 0.05,        # Two one-sided tests at 0.05 each
  targetpower = 0.80,  # 80% power
  logscale = TRUE,     # Log-transformed data (standard for PK)
  theta0 = 0.95,       # Expected true ratio
  theta1 = 0.80,       # Lower equivalence limit
  theta2 = 1.25,       # Upper equivalence limit
  CV = 0.25,           # Within-subject CV
  design = "2x2"       # Crossover design
)
# Margin calculation (95-95 method)
calculate_ni_margin <- function(historical_effect, historical_se,
                                preservation = 0.5) {
  # M1: Lower bound of 95% CI for historical effect
  M1 <- historical_effect - 1.96 * historical_se
  # M2: Preservation fraction
  M2 <- preservation
  # NI margin: the share of M1 that may be lost
  delta <- M1 * (1 - M2)
  list(
    M1 = M1,
    M2 = M2,
    margin = delta,
    interpretation = paste0("Margin of ", round(delta, 2),
                            " preserves at least ", M2 * 100,
                            "% of historical effect")
  )
}
# Example: Historical effect = 10 units, SE = 2
calculate_ni_margin(historical_effect = 10, historical_se = 2)
# M1 = 6.08, margin = 3.04
8. References
FDA Guidance. Non-Inferiority Clinical Trials to Establish Effectiveness. 2016.
ICH E10. Choice of Control Group in Clinical Trials. 2000.
CHMP. Guideline on the Choice of the Non-Inferiority Margin. EMEA/CPMP/EWP/2158/99. 2005.
Snapinn S, Jiang Q. Controlling the type I error rate in non-inferiority trials. Statistics in Medicine. 2008;27(3):371-381.
Flight L, Julious SA. Practical guide to sample size calculations: non-inferiority and equivalence trials. Pharmaceutical Statistics. 2016;15(1):80-89.
D'Agostino RB, Massaro JM, Sullivan LM. Non-inferiority trials: design concepts and issues. Statistics in Medicine. 2003;22(2):169-186.
Blackwelder WC. "Proving the null hypothesis" in clinical trials. Controlled Clinical Trials. 1982;3(4):345-353.