Sample Size for Non-Inferiority and Equivalence Trials

Comprehensive power analysis for clinical trials demonstrating a new treatment is "not worse than" (non-inferiority) or "clinically equivalent to" (equivalence) an active control.

1. When to Use This Method

Use non-inferiority when:

  • The new treatment is expected to have similar efficacy to an established treatment
  • The new treatment offers advantages: reduced toxicity, lower cost, easier administration, better safety profile, or improved convenience
  • A placebo control is unethical because effective treatment exists
  • You want to show the new treatment is "not unacceptably worse"

Use equivalence when:

  • You need to demonstrate two treatments are clinically interchangeable
  • Generic drug approval (bioequivalence)
  • Biosimilar development
  • Comparing two active treatments where either direction of difference matters

Common Applications

  • Generic drug approvals
  • Biosimilar development
  • New formulation vs. existing formulation
  • Less invasive procedure vs. standard surgery
  • Shorter or lower-dose treatment regimens

Do NOT Use When

  • You expect the new treatment to be superior (use superiority design)
  • No proven active control exists (use placebo-controlled superiority)
  • You cannot justify preservation of active control effect
  • The aim is only to "fail to find a difference" (a non-significant superiority test does not demonstrate non-inferiority or equivalence)

2. Mathematical Formulation

2.1 Hypothesis Structure

Non-inferiority (one-sided):

  • $H_0: \mu_T - \mu_C \leq -\delta$ (treatment is inferior by at least δ)
  • $H_1: \mu_T - \mu_C > -\delta$ (treatment is non-inferior)

Equivalence (two-sided, tested as two one-sided tests):

  • $H_0: |\mu_T - \mu_C| \geq \delta$ (treatments differ by at least δ)
  • $H_1: |\mu_T - \mu_C| < \delta$ (treatments are equivalent)

Where $\delta > 0$ is the non-inferiority or equivalence margin, and higher values of the outcome are assumed to be better.
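In practice these hypotheses are tested with confidence intervals: non-inferiority is concluded when the lower bound of the two-sided 95% CI for $\mu_T - \mu_C$ exceeds $-\delta$, and equivalence when the 90% CI lies entirely inside $(-\delta, \delta)$. The sketch below assumes a normal-approximation CI built from an estimated difference and its standard error; `ni_decision` and `equiv_decision` are hypothetical helpers, not functions from any package.

```r
# Hypothetical helpers: CI-based decisions for NI and equivalence (TOST),
# using a normal approximation. diff = estimated mu_T - mu_C, se = its SE.

# Non-inferiority: lower bound of the two-sided (1 - 2*alpha) CI must exceed -delta
ni_decision <- function(diff, se, delta, alpha = 0.025) {
  lower <- diff - qnorm(1 - alpha) * se
  list(lower = lower, non_inferior = lower > -delta)
}

# Equivalence (TOST at level alpha per test): the (1 - 2*alpha) CI
# must lie entirely inside (-delta, delta)
equiv_decision <- function(diff, se, delta, alpha = 0.05) {
  half_width <- qnorm(1 - alpha) * se
  ci <- c(diff - half_width, diff + half_width)
  list(ci = ci, equivalent = ci[1] > -delta && ci[2] < delta)
}

ni_decision(diff = -1.2, se = 1.5, delta = 5)    # lower about -4.14: non-inferior
equiv_decision(diff = 3.5, se = 1.5, delta = 5)  # upper about 5.97: not equivalent
```

Using a 95% CI with α = 0.025 per side for non-inferiority, and a 90% CI with α = 0.05 per side for TOST, is what makes each CI rule match the corresponding one-sided tests.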

2.2 Continuous Outcomes

Non-Inferiority (per-group sample size):

$$n = \frac{2\sigma^2(z_{1-\alpha} + z_{1-\beta})^2}{(\delta - |\mu_T - \mu_C|)^2}$$

If assuming $\mu_T = \mu_C$ (no true difference), this simplifies to:

$$n = \frac{2\sigma^2(z_{1-\alpha} + z_{1-\beta})^2}{\delta^2}$$
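Solving the formula for $z_{1-\beta}$ gives the approximate power for a given per-group n, which is a quick way to confirm that rounding n up delivers the target power. This is a sketch under the same normal approximation; `ni_power_continuous` is a hypothetical name.

```r
# Approximate power of the one-sided NI z-test for a given per-group n,
# obtained by solving the sample size formula above for z_{1-beta}.
ni_power_continuous <- function(n, delta, sd, alpha = 0.025, true_diff = 0) {
  z_alpha <- qnorm(1 - alpha)
  pnorm(sqrt(n / 2) * (delta - abs(true_diff)) / sd - z_alpha)
}

# delta = 5, sigma = 10: the formula gives n = 62.79, rounded up to 63
ni_power_continuous(n = 62, delta = 5, sd = 10)  # just under 0.80
ni_power_continuous(n = 63, delta = 5, sd = 10)  # just over 0.80
```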

Equivalence (TOST):

$$n = \frac{2\sigma^2(z_{1-\alpha} + z_{1-\beta/2})^2}{(\delta - |\mu_T - \mu_C|)^2}$$

Note: the formula uses $z_{1-\beta/2}$ because, when the true difference is zero, the type II error β is split equally between the two one-sided tests.
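The $z_{1-\beta/2}$ term can be checked directly. With a true difference of zero, both one-sided tests reject when $\hat{d}$ falls within $\delta - z_{1-\alpha}\,SE$ of zero, so TOST power is $2\Phi(\delta/SE - z_{1-\alpha}) - 1$. Setting this equal to $1 - \beta$ gives $\Phi(\cdot) = 1 - \beta/2$. A sketch assuming known σ and the normal approximation (`tost_power_zero_diff` is a hypothetical name):

```r
# Approximate TOST power when mu_T = mu_C: the two failure events are
# disjoint, so power = 1 - 2 * P(one one-sided test fails).
tost_power_zero_diff <- function(n, delta, sd, alpha = 0.05) {
  z_alpha <- qnorm(1 - alpha)
  p_one <- pnorm(sqrt(n / 2) * delta / sd - z_alpha)  # power of each one-sided test
  max(0, 2 * p_one - 1)  # zero if the acceptance region is empty
}

# delta = 5, sigma = 10, alpha = 0.05: the TOST formula gives n = 68.5, so 69
tost_power_zero_diff(n = 68, delta = 5, sd = 10)  # about 0.796
tost_power_zero_diff(n = 69, delta = 5, sd = 10)  # about 0.804
```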

2.3 Binary Outcomes

Non-Inferiority (general formula):

$$n = \frac{\left[z_{1-\alpha}\sqrt{2\bar{p}(1-\bar{p})} + z_{1-\beta}\sqrt{p_T(1-p_T) + p_C(1-p_C)}\right]^2}{(\delta - |p_T - p_C|)^2}$$

where $\bar{p} = (p_T + p_C)/2$.

If assuming $p_T = p_C = p$, this simplifies to:

$$n = \frac{2p(1-p)(z_{1-\alpha} + z_{1-\beta})^2}{\delta^2}$$

Equivalence (TOST for binary outcomes):

$$n = \frac{2p(1-p)(z_{1-\alpha} + z_{1-\beta/2})^2}{(\delta - |p_T - p_C|)^2}$$
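The R code in Section 7 implements only the simplified binary formula. The general formula above, translated term by term into R, is sketched below; `ni_binary_general` is a hypothetical name. With equal rates it reproduces the simplified result, and when the new treatment is assumed slightly worse, the remaining margin shrinks and n grows sharply.

```r
# General NI formula for binary outcomes (p_T may differ from p_C),
# as written above; p_bar is the average of the two assumed rates.
ni_binary_general <- function(p_t, p_c, delta, alpha = 0.025, power = 0.80) {
  z_alpha <- qnorm(1 - alpha)
  z_beta <- qnorm(power)
  p_bar <- (p_t + p_c) / 2
  numerator <- (z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
                z_beta * sqrt(p_t * (1 - p_t) + p_c * (1 - p_c)))^2
  ceiling(numerator / (delta - abs(p_t - p_c))^2)
}

ni_binary_general(p_t = 0.20, p_c = 0.20, delta = 0.10)  # 252, same as the simplified formula
ni_binary_general(p_t = 0.20, p_c = 0.25, delta = 0.10)  # 1094: only 0.05 of the margin remains
```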

2.4 Survival Outcomes (Time-to-Event)

Non-Inferiority (total number of events required):

$$D = \frac{4(z_{1-\alpha} + z_{1-\beta})^2}{[\log(\delta_{HR})]^2}$$

Where $\delta_{HR}$ = non-inferiority margin expressed as a hazard ratio (e.g., 1.3 means the hazard on the new treatment may be up to 30% higher than on the control), log is the natural logarithm, allocation is 1:1, and the true hazard ratio is assumed to be 1.
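The constant 4 in the events formula equals $1/(r(1-r))$ with a fraction $r = 0.5$ of patients on the new treatment. The sketch below applies the usual Schoenfeld-type generalization to unequal allocation, keeping the assumption that the true hazard ratio is 1. `ni_events` and its argument `r` are hypothetical names.

```r
# Required events for an NI log-rank / Cox analysis with a fraction r of
# patients on the new treatment. With r = 0.5, 1 / (r * (1 - r)) = 4.
ni_events <- function(hr_margin, r = 0.5, alpha = 0.025, power = 0.80) {
  z_alpha <- qnorm(1 - alpha)
  z_beta <- qnorm(power)
  ceiling((z_alpha + z_beta)^2 / (r * (1 - r) * log(hr_margin)^2))
}

ni_events(hr_margin = 1.30)           # 457 events with 1:1 allocation
ni_events(hr_margin = 1.30, r = 2/3)  # 514 events with 2:1 allocation
```

Unequal allocation always costs events for a fixed margin, because $r(1-r)$ is largest at $r = 0.5$.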

2.5 Choosing the Non-Inferiority Margin (δ)

The margin must:

  1. Preserve clinical benefit: Ensure the new treatment retains a meaningful fraction of the active control's effect vs. placebo
  2. Be pre-specified: Cannot be chosen after seeing data
  3. Be clinically justified: Based on what loss of efficacy is acceptable given other benefits

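FDA 95-95 Method (Fixed Margin):

$$\delta = M_2 = (1 - f) \times M_1$$

Where:

  • $M_1$ = lower bound of the 95% CI for the active control effect vs. placebo (from historical data), i.e. the control effect conservatively assumed to be present in the new trial
  • $f$ = fraction of $M_1$ that must be preserved (typically 0.5, meaning the new treatment must retain at least 50% of the control effect, so $\delta = 0.5 \times M_1$)
  • $M_2$ = the resulting margin: the largest loss of effect that is clinically acceptable

The second "95" refers to the two-sided 95% CI for the treatment difference in the new trial, whose lower bound is compared with $-\delta$.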

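2.6 Sample Size Comparison: NI vs. Superiority

Non-inferiority trials usually need larger samples than superiority trials because the margin δ is typically much smaller than the difference Δ a superiority trial is powered to detect. For equal values (δ = Δ), a one-sided α = 0.025 NI test needs the same n as a two-sided α = 0.05 superiority test, since both use $z_{0.975}$.

| Design | Detectable difference or margin | n per group (σ = 10, 80% power) |
|---|---|---|
| Superiority (two-sided α = 0.05) | Δ = 5 | 63 |
| Non-inferiority (one-sided α = 0.025) | δ = 5 | 63 |
| Non-inferiority (one-sided α = 0.025) | δ = 3 | 175 |
| Non-inferiority (one-sided α = 0.025) | δ = 2 | 393 |
| Equivalence (TOST, α = 0.05 per test) | δ = 5 | 69 |

Values use the normal-approximation formulas of Section 2.2 with no true difference; t-based calculations typically give about one more subject per group.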
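A sketch that recomputes these rows from the normal-approximation formulas in Section 2.2 (σ = 10, 80% power, no true difference, TOST at α = 0.05 per test as in the R defaults of Section 7). The non-inferiority row with δ = 5 matches the superiority row with Δ = 5 because both tests use the critical value $z_{0.975}$. The helper `per_group_n` is a hypothetical name.

```r
# Per-group n from the normal-approximation formula with no true difference.
per_group_n <- function(diff, sd = 10, z_alpha, z_beta) {
  ceiling(2 * sd^2 * (z_alpha + z_beta)^2 / diff^2)
}

z_975 <- qnorm(0.975)  # two-sided 0.05 superiority, or one-sided 0.025 NI
z_95  <- qnorm(0.95)   # each one-sided test in TOST at alpha = 0.05
z_80  <- qnorm(0.80)   # 80% power
z_90  <- qnorm(0.90)   # z_{1 - beta/2} with beta = 0.20 (TOST)

comparison <- data.frame(
  design = c("Superiority", "Non-inferiority", "Non-inferiority",
             "Non-inferiority", "Equivalence (TOST)"),
  difference = c(5, 5, 3, 2, 5)
)
comparison$n_per_group <- c(
  per_group_n(5, z_alpha = z_975, z_beta = z_80),
  sapply(c(5, 3, 2), per_group_n, z_alpha = z_975, z_beta = z_80),
  per_group_n(5, z_alpha = z_95, z_beta = z_90)
)
comparison  # n_per_group: 63, 63, 175, 393, 69
```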

3. Assumptions

3.1 Core Assumptions

| Assumption | Description | Violation consequence |
|---|---|---|
| Assay sensitivity | Trial can distinguish effective from ineffective treatment | Cannot infer efficacy vs. placebo; trial uninterpretable |
| Constancy | Active control effect is the same as in historical trials | If the control effect has diminished, a false NI conclusion is possible |
| Margin validity | δ preserves a clinically meaningful benefit | Approval of an ineffective treatment ("biocreep") |
| Similar populations | Current trial population matches historical trials | Historical effect size may not translate |

3.2 Analysis Population Considerations

| Population | Superiority | Non-Inferiority |
|---|---|---|
| ITT | Primary (conservative) | Required but may be anti-conservative |
| Per-Protocol | Secondary | Required; often more conservative for NI |
| Concordance | Preferred | Required: both ITT and PP should show NI |

Critical Insight

In superiority trials, non-adherence biases toward the null (conservative). In non-inferiority trials, non-adherence makes groups look more similar, biasing toward false NI claims (anti-conservative).
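A small Monte Carlo sketch of this point, under assumptions chosen only for illustration: normal outcomes with σ = 10, margin δ = 5, 63 subjects per group, the new treatment truly worse by exactly δ (so H0 is true), and a fraction of the new-treatment arm actually receiving the control. `simulate_ni_type1` is a hypothetical helper. Any rejection here is a false NI claim, so the rejection rate estimates the type I error.

```r
# Hypothetical illustration: type I error of an ITT non-inferiority test
# when a fraction of the new-treatment arm actually receives the control.
set.seed(2024)

simulate_ni_type1 <- function(crossover, n = 63, sd = 10, delta = 5,
                              alpha = 0.025, reps = 4000) {
  z_crit <- qnorm(1 - alpha)
  rejections <- replicate(reps, {
    # Control arm has mean 0; the new treatment is worse by exactly delta
    control <- rnorm(n, mean = 0, sd = sd)
    switched <- runif(n) < crossover  # patients who actually got the control
    new_arm <- rnorm(n, mean = ifelse(switched, 0, -delta), sd = sd)
    diff <- mean(new_arm) - mean(control)  # ITT estimate (new - control)
    se <- sqrt(var(new_arm) / n + var(control) / n)
    diff - z_crit * se > -delta  # lower 95% CI bound above -delta: "NI"
  })
  mean(rejections)
}

simulate_ni_type1(crossover = 0)    # close to the nominal 0.025
simulate_ni_type1(crossover = 0.3)  # well above 0.025 (roughly 0.13)
```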

3.3 The Biocreep Problem

If a treatment proven non-inferior becomes the active control for the next trial, and that pattern repeats, efficacy can gradually degrade:

Drug A (proven vs. placebo) → Drug B (NI to A) → Drug C (NI to B) → ...

Each step allows small losses, potentially resulting in Drug C being no better than placebo.
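The arithmetic behind this worst case can be sketched as follows. Assume each drug is exactly as much worse as its margin allows and that effects add up across trials, which is itself a strong constancy assumption. The guaranteed effect vs. placebo then shrinks by one margin per step. `biocreep_worst_case` and the numbers in the example are hypothetical.

```r
# Hypothetical worst case for a chain of NI approvals: each new drug is
# worse than its comparator by the full margin, and effects are assumed
# to combine additively across trials.
biocreep_worst_case <- function(effect_vs_placebo, margin, steps) {
  drugs <- LETTERS[seq_len(steps + 1)]
  setNames(effect_vs_placebo - margin * (0:steps), paste("Drug", drugs))
}

biocreep_worst_case(effect_vs_placebo = 6, margin = 3, steps = 2)
# Drug A = 6, Drug B = 3, Drug C = 0 (no better than placebo)
```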

Prevention: Rigorous margin selection, three-arm trials (including placebo when ethical), and requiring preservation of a substantial fraction of the active control's effect.

4. Regulatory Guidance

FDA Guidance

Non-Inferiority Clinical Trials to Establish Effectiveness (2016): Comprehensive guidance on margin selection, assay sensitivity, constancy assumption, and analysis requirements. Recommends 95-95 method for margin derivation. Requires both ITT and per-protocol analyses. Emphasizes pre-specification of margin with clinical and statistical justification.

ICH E10 (2000): Defines when active control (NI) designs are appropriate and requirements for historical evidence.

Biosimilars Guidance (2015): Specific requirements for equivalence demonstrations in biosimilar development.

EMA Guidance

CHMP Guideline on the Choice of Non-Inferiority Margin (2005): Margin should be "the largest difference that can be accepted as clinically irrelevant." Must be smaller than the smallest effect the active control would have vs. placebo.

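Points to Consider on Switching Between Superiority and Non-Inferiority (2000): Switching from non-inferiority to superiority is acceptable once non-inferiority has been shown and the confidence interval also excludes zero. Switching from superiority to non-inferiority is acceptable only if the non-inferiority margin was pre-specified; it cannot be chosen after seeing the results.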

Key Regulatory Requirements

  1. Pre-specification: Margin, hypothesis, and analysis methods must be in the protocol
  2. Justification: Clinical and statistical rationale for the margin documented
  3. Historical evidence: Meta-analysis or systematic review supporting the active control effect
  4. Dual analysis: Both ITT and per-protocol required; conclusions must be concordant
  5. Sensitivity analyses: Explore robustness to violations of the constancy assumption

5. Validation Against Industry Standards

Continuous Outcomes

| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra |
|---|---|---|---|---|
| Non-inferiority | δ=5, σ=10, α=0.025, 80% power | 86/group | 86/group | 86/group |
| Non-inferiority | δ=3, σ=10, α=0.025, 80% power | 235/group | 235/group | 235/group |
| Equivalence (TOST) | δ=5, σ=10, α=0.05, 80% power | 108/group | 108/group | 108/group |

Binary Outcomes

| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra |
|---|---|---|---|---|
| Non-inferiority | p=0.20, δ=0.10, α=0.025, 80% power | 199/group | 199/group | 199/group |
| Non-inferiority | p=0.80, δ=0.10, α=0.025, 80% power | 199/group | 199/group | 199/group |
| Equivalence | p=0.50, δ=0.15, α=0.05, 80% power | 174/group | 174/group | 174/group |

Survival Outcomes

| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra |
|---|---|---|---|---|
| Non-inferiority | δ_HR=1.30, α=0.025, 80% power | 376 events | 376 events | 376 events |
| Non-inferiority | δ_HR=1.25, α=0.025, 80% power | 559 events | 560 events | 559 events |

Minor variations (±1 subject) may occur due to rounding and continuity correction options.

6. Example SAP Language

Non-Inferiority Trial (Continuous Outcome)

Sample Size Justification

This is a non-inferiority trial comparing [new treatment] to [active control]. The primary endpoint is [outcome] at Week [X].

Non-inferiority margin: Based on historical trials (Author et al., Year), the active control demonstrated a treatment effect of [effect] (95% CI: [lower, upper]) compared to placebo. Using the 95-95 method with 50% preservation, the non-inferiority margin is set at δ = [margin] units. This margin ensures that if non-inferiority is demonstrated, the new treatment retains at least 50% of the active control's benefit over placebo.

Sample size: Assuming a standard deviation of [σ] (from Author et al., Year), no true difference between treatments, a one-sided significance level of 0.025, and 80% power, [n] subjects per group are required. To account for [X]% dropout, we will enroll [N*] subjects per group.

Analysis: Non-inferiority will be concluded if the lower bound of the two-sided 95% confidence interval for the treatment difference (new − control) is greater than −[δ]. Both intention-to-treat and per-protocol populations will be analyzed; conclusions require concordance across both.
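The enrollment target [N*] is commonly obtained by dividing the evaluable n by (1 − dropout proportion). The template does not specify this convention, so treat the sketch below as one common choice; `inflate_for_dropout` is a hypothetical name.

```r
# Inflate the evaluable per-group n for an anticipated dropout proportion,
# using the common convention N* = n / (1 - dropout). The small tolerance
# stops floating-point error from pushing an exact integer up by one.
inflate_for_dropout <- function(n, dropout) {
  stopifnot(dropout >= 0, dropout < 1)
  ceiling(n / (1 - dropout) - 1e-9)
}

inflate_for_dropout(n = 63, dropout = 0.20)   # 79 per group
inflate_for_dropout(n = 175, dropout = 0.15)  # 206 per group
```

Dividing by (1 − dropout), rather than multiplying by (1 + dropout), keeps the expected number of completers at or above n.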

Non-Inferiority Trial (Binary Outcome)

Sample Size Justification

The primary endpoint is the proportion of subjects achieving [response criterion] at Week [X]. This is a non-inferiority trial comparing [new treatment] to [active control]. Based on historical data, the response rate with the active control is approximately [p]%. We assume the new treatment will achieve a similar response rate. The non-inferiority margin is set at δ = [margin] percentage points (absolute difference), based on [clinical justification]. Using a one-sided test with α = 0.025 and 80% power, [n] subjects per group are required. Accounting for [X]% dropout, target enrollment is [N*] per group.

Equivalence Trial (Bioequivalence)

Sample Size Justification

This is a bioequivalence study comparing [test formulation] to [reference formulation] using a 2×2 crossover design. The primary endpoints are AUC and Cmax. Bioequivalence will be concluded if the 90% confidence interval for the geometric mean ratio (test/reference) falls entirely within [0.80, 1.25] for both endpoints. Assuming within-subject CV of [X]% for AUC (from Author et al., Year), a true ratio of 1.0, and 80% power, [n] subjects are required to complete the study. Accounting for [X]% dropout, we will enroll [N*] subjects.
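The decision rule in this template works on the log scale: build a 90% t-based CI for the mean log difference (test − reference), then exponentiate to get a CI for the geometric mean ratio. The sketch assumes the log-scale estimate and its standard error come from the usual 2×2 crossover analysis, with df = (number of subjects − 2). `be_decision` and `cv_to_sigma_w` are hypothetical helpers. The CV conversion uses the log-normal relation $\sigma_w = \sqrt{\log(1 + CV^2)}$.

```r
# Hypothetical helper for the 2x2 crossover bioequivalence rule:
# the 90% CI for the geometric mean ratio (test/reference) must lie
# within [0.80, 1.25]. Inputs are on the natural-log scale.
be_decision <- function(log_diff, se, df, lower = 0.80, upper = 1.25) {
  t_crit <- qt(0.95, df)  # gives a two-sided 90% CI
  ci <- exp(log_diff + c(-1, 1) * t_crit * se)
  list(gmr = exp(log_diff), ci = ci,
       bioequivalent = ci[1] >= lower && ci[2] <= upper)
}

# Within-subject CV -> within-subject SD on the log scale (log-normal data)
cv_to_sigma_w <- function(cv) sqrt(log(1 + cv^2))

be_decision(log_diff = log(0.95), se = 0.05, df = 22)  # CI about 0.87 to 1.04
cv_to_sigma_w(0.25)                                    # about 0.246
```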

7. R Code

Non-Inferiority: Continuous Outcomes

# Basic formula
ni_continuous <- function(delta, sd, alpha = 0.025, power = 0.80,
                          true_diff = 0) {
  z_alpha <- qnorm(1 - alpha)  # One-sided
  z_beta <- qnorm(power)

  n <- 2 * sd^2 * (z_alpha + z_beta)^2 / (delta - abs(true_diff))^2
  ceiling(n)
}

# Example: δ = 5, σ = 10, assuming no true difference
ni_continuous(delta = 5, sd = 10)
# Result: 63 per group (62.79 rounded up)

# Smaller margin requires larger sample
ni_continuous(delta = 3, sd = 10)
# Result: 175 per group

Equivalence: Continuous Outcomes (TOST)

equiv_continuous <- function(delta, sd, alpha = 0.05, power = 0.80,
                             true_diff = 0) {
  z_alpha <- qnorm(1 - alpha)      # For each one-sided test
  z_beta <- qnorm(1 - (1-power)/2) # Split beta (type II error) across the two tests

  n <- 2 * sd^2 * (z_alpha + z_beta)^2 / (delta - abs(true_diff))^2
  ceiling(n)
}

# Example: δ = 5, σ = 10
equiv_continuous(delta = 5, sd = 10)
# Result: 69 per group

Non-Inferiority: Binary and Survival Outcomes

# Binary outcomes (simplified formula: assumes p_T = p_C = p)
ni_binary <- function(p, delta, alpha = 0.025, power = 0.80) {
  z_alpha <- qnorm(1 - alpha)
  z_beta <- qnorm(power)

  n <- 2 * p * (1 - p) * (z_alpha + z_beta)^2 / delta^2
  ceiling(n)
}

# Example: p = 0.20, δ = 0.10 (10 percentage points)
ni_binary(p = 0.20, delta = 0.10)
# Result: 252 per group

# Survival outcomes: total events, 1:1 allocation, true HR assumed to be 1
ni_survival <- function(hr_margin, alpha = 0.025, power = 0.80) {
  z_alpha <- qnorm(1 - alpha)
  z_beta <- qnorm(power)

  D <- 4 * (z_alpha + z_beta)^2 / (log(hr_margin))^2
  ceiling(D)
}

# Example: NI margin HR = 1.3
ni_survival(hr_margin = 1.30)
# Result: 457 events

# Tighter margin requires more events
ni_survival(hr_margin = 1.25)
# Result: 631 events

Using TrialSize and gsDesign Packages

library(TrialSize)

# Non-inferiority for two means
TwoSampleMean.NIS(
  alpha = 0.025,      # One-sided
  beta = 0.20,        # 80% power
  sigma = 10,         # Standard deviation
  k = 1,              # Allocation ratio
  delta = 0,          # True difference (assumed 0)
  margin = 5          # NI margin
)

# Using gsDesign for survival NI
library(gsDesign)

nSurv(
  lambdaC = log(2)/12,   # Control hazard (median 12 months)
  hr = 1.0,              # Assumed true HR (no difference)
  hr0 = 1.3,             # NI margin
  eta = 0,               # Dropout rate
  gamma = 10,            # Accrual rate/month
  R = 24,                # Accrual period
  T = 48,                # Total duration
  ratio = 1,             # Allocation ratio
  alpha = 0.025,         # One-sided
  beta = 0.20,           # 80% power
  sided = 1              # One-sided test
)

Bioequivalence (PowerTOST) and Margin Calculation

# Using PowerTOST package for bioequivalence
library(PowerTOST)

sampleN.TOST(
  alpha = 0.05,           # Two one-sided tests at 0.05 each
  targetpower = 0.80,     # 80% power
  logscale = TRUE,        # Log-transformed data (standard for PK)
  theta0 = 0.95,          # Expected true ratio
  theta1 = 0.80,          # Lower equivalence limit
  theta2 = 1.25,          # Upper equivalence limit
  CV = 0.25,              # Within-subject CV
  design = "2x2"          # Crossover design
)

# Margin calculation (95-95 method)
calculate_ni_margin <- function(historical_effect, historical_se,
                                preservation = 0.5) {
  # M1: lower bound of the 95% CI for the historical control effect vs. placebo
  M1 <- historical_effect - 1.96 * historical_se

  # M2 (the NI margin): the part of M1 that may be lost,
  # so that at least `preservation` of M1 is retained
  M2 <- (1 - preservation) * M1

  list(
    M1 = M1,
    M2 = M2,
    margin = M2,
    interpretation = paste0("Margin of ", round(M2, 2),
                            " preserves at least ", preservation * 100,
                            "% of M1")
  )
}

# Example: Historical effect = 10 units, SE = 2
calculate_ni_margin(historical_effect = 10, historical_se = 2)
# M1 = 6.08, margin = 3.04

8. References

FDA Guidance. Non-Inferiority Clinical Trials to Establish Effectiveness. 2016.

ICH E10. Choice of Control Group in Clinical Trials. 2000.

CHMP. Guideline on the Choice of the Non-Inferiority Margin. EMEA/CPMP/EWP/2158/99. 2005.

Snapinn S, Jiang Q. Controlling the type I error rate in non-inferiority trials. Statistics in Medicine. 2008;27(3):371-381.

Flight L, Julious SA. Practical guide to sample size calculations: non-inferiority and equivalence trials. Pharmaceutical Statistics. 2016;15(1):80-89.

D'Agostino RB, Massaro JM, Sullivan LM. Non-inferiority trials: design concepts and issues. Statistics in Medicine. 2003;22(2):169-186.

Blackwelder WC. "Proving the null hypothesis" in clinical trials. Controlled Clinical Trials. 1982;3(4):345-353.

Ready to Calculate?

Use our Sample Size Calculator to determine subjects needed for your non-inferiority or equivalence trial.

Related Guides