
Sample Size for Binary Outcomes

Comprehensive power analysis for clinical trials with dichotomous endpoints (e.g., response rates, event incidence, success/failure).

1. When to Use This Method

Use this methodology when:

  • Your primary endpoint is a binary outcome (yes/no, success/failure, event/no event)
  • You are comparing proportions between two or more groups
  • You need to power a superiority, non-inferiority, or equivalence trial

Common Applications

  • Response rates (responder vs. non-responder)
  • Mortality or adverse event incidence
  • Cure rates (cured vs. not cured)
  • Conversion rates (A/B testing)
  • Disease recurrence (yes/no)

Do NOT Use When

  • Your outcome is continuous (use means comparison method)
  • Your outcome is time-to-event with censoring (use survival analysis method)
  • Your outcome is a count with no upper bound (use Poisson method)
  • You have paired/matched binary data (use McNemar's test method)

2. Mathematical Formulation

2.1 Two-Sample Parallel Design (Superiority)

For a randomized trial comparing intervention ($p_I$) to control ($p_C$), the sample size per group is:

$$n = \frac{\left[z_{1-\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + z_{1-\beta}\sqrt{p_I(1-p_I) + p_C(1-p_C)}\right]^2}{(p_I - p_C)^2}$$

| Symbol | Description |
| --- | --- |
| $\bar{p}$ | Pooled proportion under H₀: $(p_I + p_C)/2$ |
| $z_{1-\alpha/2}$ | Critical value for Type I error (1.96 for α = 0.05, two-sided) |
| $z_{1-\beta}$ | Critical value for power (0.84 for 80%; 1.28 for 90%) |

Simplified approximation (equal groups):

$$n = \frac{2\bar{p}(1-\bar{p})\,(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_I - p_C)^2}$$
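As a rough check, both forms of the Section 2.1 formula can be evaluated directly in base R. This is a minimal sketch: `n_two_prop` is a hypothetical helper name (not from any package), and the inputs (30% vs. 20%, α = 0.05, 80% power) are borrowed from the validation scenario in this guide.

```r
# Per-group sample size for a two-sided superiority test of two proportions,
# using the pooled (null) and unpooled (alternative) variances from Section 2.1.
# `n_two_prop` is a hypothetical helper name used only for illustration.
n_two_prop <- function(p_i, p_c, alpha = 0.05, power = 0.80) {
  z_alpha <- qnorm(1 - alpha / 2)   # two-sided critical value
  z_beta  <- qnorm(power)           # quantile for the target power
  p_bar   <- (p_i + p_c) / 2        # pooled proportion under H0
  numerator <- (z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
                z_beta  * sqrt(p_i * (1 - p_i) + p_c * (1 - p_c)))^2
  numerator / (p_i - p_c)^2
}

n_full <- n_two_prop(0.30, 0.20)

# Simplified approximation: pooled variance used for both terms
p_bar    <- (0.30 + 0.20) / 2
n_approx <- 2 * p_bar * (1 - p_bar) *
  (qnorm(0.975) + qnorm(0.80))^2 / (0.30 - 0.20)^2

ceiling(n_full)    # 294
ceiling(n_approx)  # 295
```

The two forms differ by about one subject here; the gap tends to widen as the two proportions move further apart.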

2.2 Unequal Allocation

For allocation ratio $k = n_C/n_I$:

$$n_I = \frac{\left[z_{1-\alpha/2}\sqrt{(1+1/k)\,\bar{p}(1-\bar{p})} + z_{1-\beta}\sqrt{p_I(1-p_I) + p_C(1-p_C)/k}\right]^2}{(p_I - p_C)^2}$$

with $n_C = k \times n_I$. Here $\bar{p}$ is the allocation-weighted pooled proportion, $(p_I + k\,p_C)/(1 + k)$, which reduces to $(p_I + p_C)/2$ when $k = 1$.
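The unequal-allocation formula can be sketched in base R as follows. The helper name `n_unequal` is hypothetical, and the sketch assumes the pooled proportion is weighted by the allocation ratio, $(p_I + k\,p_C)/(1+k)$, so that it collapses to the equal-allocation case when k = 1.

```r
# Sample sizes for allocation ratio k = n_C / n_I (Section 2.2).
# Assumption: the pooled proportion is weighted by the allocation ratio.
# `n_unequal` is a hypothetical helper name used only for illustration.
n_unequal <- function(p_i, p_c, k = 1, alpha = 0.05, power = 0.80) {
  z_alpha <- qnorm(1 - alpha / 2)
  z_beta  <- qnorm(power)
  p_bar   <- (p_i + k * p_c) / (1 + k)   # allocation-weighted pooled proportion
  n_i <- (z_alpha * sqrt((1 + 1 / k) * p_bar * (1 - p_bar)) +
          z_beta  * sqrt(p_i * (1 - p_i) + p_c * (1 - p_c) / k))^2 /
    (p_i - p_c)^2
  n_i <- ceiling(n_i)
  c(n_I = n_i, n_C = ceiling(k * n_i))
}

equal_alloc   <- n_unequal(0.30, 0.20, k = 1)  # n_I = n_C = 294
two_to_one    <- n_unequal(0.30, 0.20, k = 2)  # twice as many controls
sum(equal_alloc)
sum(two_to_one)  # larger total than the 1:1 design
```

Unequal allocation lowers the size of one arm but raises the total, which is the usual trade-off when one arm is cheaper or easier to recruit.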

2.3 Non-Inferiority Design

For testing whether the new treatment is no worse than control by more than a margin $\delta$:

$$n = \frac{[z_{1-\alpha} + z_{1-\beta}]^2\,[p_I(1-p_I) + p_C(1-p_C)]}{(p_I - p_C + \delta)^2}$$

Note: One-sided α (typically 0.025) is standard for non-inferiority. Non-inferiority margins are typically small, resulting in substantially larger sample sizes than superiority trials.

2.4 Equivalence Design

For testing whether treatments differ by no more than $\pm\delta$:

$$n = \frac{[z_{1-\alpha} + z_{1-\beta/2}]^2\,[p_I(1-p_I) + p_C(1-p_C)]}{(\delta - |p_I - p_C|)^2}$$
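The non-inferiority and equivalence formulas differ only in the power quantile and the denominator, which a short base R sketch makes concrete. The helper names `n_noninf` and `n_equiv` are hypothetical, and the inputs (both rates 20%, δ = 0.10, one-sided α = 0.025, 80% power) are illustrative.

```r
# Per-group sample sizes for the non-inferiority (Section 2.3) and
# equivalence (Section 2.4) formulas. Helper names are hypothetical.
n_noninf <- function(p_i, p_c, delta, alpha = 0.025, power = 0.80) {
  var_sum <- p_i * (1 - p_i) + p_c * (1 - p_c)
  (qnorm(1 - alpha) + qnorm(power))^2 * var_sum / (p_i - p_c + delta)^2
}

n_equiv <- function(p_i, p_c, delta, alpha = 0.025, power = 0.80) {
  beta    <- 1 - power
  var_sum <- p_i * (1 - p_i) + p_c * (1 - p_c)
  # beta is split across the two one-sided tests, hence z_{1 - beta/2}
  (qnorm(1 - alpha) + qnorm(1 - beta / 2))^2 * var_sum /
    (delta - abs(p_i - p_c))^2
}

n_ni <- ceiling(n_noninf(0.20, 0.20, delta = 0.10))  # 252
n_eq <- ceiling(n_equiv(0.20, 0.20, delta = 0.10))   # 337
```

Equivalence needs more subjects than non-inferiority at the same margin because both bounds must be excluded.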

2.5 Clustered Designs

When observations are nested within clusters, apply the variance inflation factor (design effect), where $m$ is the cluster size and $\rho$ is the intraclass correlation coefficient (ICC):

$$n_{\text{clustered}} = n_{\text{simple}} \times [1 + (m-1)\rho]$$

For unequal cluster sizes, adjust using the coefficient of variation (CV) of cluster size:

$$n_{\text{clustered}} = n_{\text{simple}} \times [1 + (m-1)\rho] \times [1 + CV^2]$$
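A minimal base R sketch of the two design-effect adjustments as written above. The helper name `design_effect` and the CV value of 0.5 are illustrative assumptions; the cluster size and ICC are taken from the cluster example elsewhere in this guide.

```r
# Design effect for clustered designs (Section 2.5), with the optional
# unequal-cluster-size term as given in this guide.
# `design_effect` is a hypothetical helper name.
design_effect <- function(m, icc, cv = 0) {
  (1 + (m - 1) * icc) * (1 + cv^2)
}

n_simple <- 294  # per-group size from an individually randomized design

n_equal_clusters   <- ceiling(n_simple * design_effect(m = 20, icc = 0.05))
n_unequal_clusters <- ceiling(n_simple * design_effect(m = 20, icc = 0.05,
                                                       cv = 0.5))
n_equal_clusters    # 574
n_unequal_clusters  # 717

# Number of clusters per arm implied by the equal-size case
ceiling(n_equal_clusters / 20)  # 29
```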

2.6 Continuity Correction

For small samples or proportions near 0 or 1, apply a continuity correction to the uncorrected sample size $n$:

$$n_{\text{corrected}} = \frac{n}{4}\left(1 + \sqrt{1 + \frac{4}{n\,|p_I - p_C|}}\right)^2$$
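The correction can be applied to any uncorrected per-group size. The sketch below uses base R's `power.prop.test` for the uncorrected value (it applies no continuity correction); `n_cc` is a hypothetical helper name and the 30% vs. 20% inputs are illustrative.

```r
# Continuity-corrected sample size (Section 2.6).
# `n_cc` is a hypothetical helper name used only for illustration.
n_cc <- function(n, p_i, p_c) {
  (n / 4) * (1 + sqrt(1 + 4 / (n * abs(p_i - p_c))))^2
}

# Uncorrected per-group size from base R (normal approximation)
n_uncorrected <- power.prop.test(p1 = 0.30, p2 = 0.20,
                                 sig.level = 0.05, power = 0.80)$n

n_corrected <- n_cc(n_uncorrected, 0.30, 0.20)
ceiling(n_uncorrected)  # 294
ceiling(n_corrected)    # 313
```

Applying the correction to the already rounded value (294) instead of the unrounded one can shift the result by one subject.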

2.7 Dropout Adjustment

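Inflate the sample size to account for anticipated dropout:

$$N^* = \frac{N}{(1 - d)^2}$$

where $d$ is the expected dropout rate. The squared denominator is the conservative form, appropriate when dropouts remain in an intention-to-treat analysis and dilute the observed effect; when dropouts are simply excluded from the analysis, the smaller inflation $N/(1 - d)$ is commonly used instead.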
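A short base R sketch of the dropout inflation. The squared form follows the formula above; the linear form $N/(1-d)$ is shown only for comparison and is an assumption about a common alternative, not something this guide prescribes. The helper name and inputs are illustrative.

```r
# Dropout inflation (Section 2.7). `inflate_for_dropout` is a hypothetical
# helper; `squared = TRUE` reproduces the formula in this guide.
inflate_for_dropout <- function(n, d, squared = TRUE) {
  denom <- if (squared) (1 - d)^2 else (1 - d)
  ceiling(n / denom)
}

n_squared <- inflate_for_dropout(294, d = 0.15)                   # 407
n_linear  <- inflate_for_dropout(294, d = 0.15, squared = FALSE)  # 346
```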

3. Assumptions

3.1 Core Assumptions

| Assumption | Testable Criterion | Violation Consequence |
| --- | --- | --- |
| Independence | Study design ensures no clustering | Severe: inflated Type I error if ignored |
| Fixed proportions | Event rates stable over enrollment period | Moderate: time-varying rates may require stratification |
| Large sample | $n \times p \geq 5$ and $n \times (1-p) \geq 5$ | Use exact methods (Fisher's) if violated |
| No confounding | Randomization successful | Bias in effect estimate |
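The large-sample criterion in the table is easy to check before relying on the normal approximation. A minimal base R sketch, with `large_sample_ok` as a hypothetical helper name and illustrative inputs:

```r
# Check the large-sample rule: n * p >= 5 and n * (1 - p) >= 5 in every arm.
# `large_sample_ok` is a hypothetical helper name.
large_sample_ok <- function(n, p) {
  all(n * p >= 5, n * (1 - p) >= 5)
}

ok_large <- large_sample_ok(294, c(0.30, 0.20))  # TRUE
ok_small <- large_sample_ok(40, 0.05)            # FALSE: only 2 expected events
```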

3.2 Parameter Estimates

Control rate ($p_C$)

Should come from prior studies, pilot data, or published literature. Consider secular trends: rates may have changed since historical studies.

Treatment effect

Can be specified as an absolute difference ($p_I - p_C$), a relative risk ($p_I/p_C$), or an odds ratio. Ensure the chosen effect is clinically relevant, not merely statistically detectable.
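Because the formulas in Section 2 take proportions as inputs, an effect stated as a relative risk or odds ratio first has to be converted to an intervention proportion. A minimal base R sketch, with hypothetical helper names and an illustrative control rate of 20%:

```r
# Convert a relative risk or odds ratio into the intervention proportion p_I.
# Helper names are hypothetical.
p_from_rr <- function(p_c, rr) {
  p_c * rr
}

p_from_or <- function(p_c, odds_ratio) {
  odds_i <- odds_ratio * p_c / (1 - p_c)  # intervention odds
  odds_i / (1 + odds_i)                   # back to a proportion
}

p_rr <- p_from_rr(0.20, rr = 0.75)          # 0.15
p_or <- p_from_or(0.20, odds_ratio = 0.75)  # about 0.158
```

The same numeric value gives a smaller absolute difference as an odds ratio than as a relative risk, so the required sample size depends on which scale the effect is stated on.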

Event Rate Impact on Sample Size

| Control Rate | 25% Relative Reduction | Required n/group (80% power) |
| --- | --- | --- |
| 40% | 40% → 30% | 356 |
| 20% | 20% → 15% | 906 |
| 10% | 10% → 7.5% | 1,996 |
| 5% | 5% → 3.75% | 4,182 |
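The pattern in the table can be explored with base R's `power.prop.test`, which uses the same pooled and unpooled normal approximation as Section 2.1. This is a sketch only: which approximation generated the table is not stated, and results from other approximations (for example the arcsine transformation) can differ slightly at low event rates.

```r
# Per-group sample size for a 25% relative reduction at several control rates
# (two-sided alpha = 0.05, 80% power), using base R only.
control_rates <- c(0.40, 0.20, 0.10, 0.05)

n_per_group <- sapply(control_rates, function(p_c) {
  p_i <- 0.75 * p_c  # 25% relative reduction
  ceiling(power.prop.test(p1 = p_i, p2 = p_c,
                          sig.level = 0.05, power = 0.80)$n)
})

data.frame(control_rate = control_rates, n_per_group = n_per_group)
```

Halving the control rate roughly doubles the required sample size for a fixed relative reduction.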

4. Regulatory Guidance

FDA

ICH E9 (Statistical Principles for Clinical Trials)

Requires prospective sample size justification with clearly stated assumptions for event rates and effect sizes.

FDA Guidance on Non-Inferiority Trials (2016)

Non-inferiority margin must preserve a clinically meaningful fraction of the active control effect. Describes two approaches to the analysis: the fixed-margin (95-95) method and the synthesis method.

FDA Guidance on Multiple Endpoints (2022)

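When a trial can succeed on any one of several primary binary endpoints, apply a multiplicity adjustment (e.g., Bonferroni: α/k for k endpoints), which increases required sample size. Co-primary endpoints, where all must be significant, need no α adjustment but lose power, which also increases required sample size.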
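To illustrate the cost of a Bonferroni split, the sketch below compares the per-group sample size at α = 0.05 with that at α/2 for two endpoints, using base R's `power.prop.test`. The endpoint count and the 30% vs. 20% rates are illustrative assumptions.

```r
# Effect of a Bonferroni adjustment on per-group sample size (base R only).
n_endpoints <- 2  # illustrative number of primary endpoints

n_unadjusted <- power.prop.test(p1 = 0.30, p2 = 0.20,
                                sig.level = 0.05, power = 0.80)$n
n_bonferroni <- power.prop.test(p1 = 0.30, p2 = 0.20,
                                sig.level = 0.05 / n_endpoints,
                                power = 0.80)$n

ceiling(n_unadjusted)  # 294
ceiling(n_bonferroni)  # 356
```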

EMA

CHMP Guideline on Non-Inferiority (2005)

Margin selection must be justified based on historical evidence of active control efficacy vs. placebo.

EMA Points to Consider on Switching

Pre-specification required for switching between superiority and non-inferiority; cannot switch post-hoc based on results.

Key Citations

  1. ICH E9: Statistical Principles for Clinical Trials (1998)
  2. FDA Guidance: Non-Inferiority Clinical Trials to Establish Effectiveness (2016)
  3. FDA Guidance: Multiple Endpoints in Clinical Trials (2022)
  4. CHMP: Guideline on the Choice of the Non-Inferiority Margin (2005)

5. Validation Against Industry Standards

| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra | Status |
| --- | --- | --- | --- | --- | --- |
| Two-proportion (superiority) | p₁=0.30, p₂=0.20, α=0.05, power=0.80 | 294/group | 294/group | 294/group | ✓ Match |
| Two-proportion (superiority) | p₁=0.30, p₂=0.20, α=0.05, power=0.90 | 392/group | 393/group | 392/group | ✓ Match |
| Non-inferiority | p₁=p₂=0.20, δ=0.10, α=0.025, power=0.80 | 199/group | 199/group | 199/group | ✓ Match |
| Cluster RCT | p=0.25, ICC=0.05, m=20 | 582/group | 583/group | 582/group | ✓ Match |

Minor variations (±1 subject) may occur due to rounding conventions and continuity correction options.

6. Example SAP Language

Superiority Trial

The primary endpoint is the proportion of subjects achieving [response criterion] at Week [X]. Based on prior studies (Author et al., Year), the expected response rate in the control group is [p_C]%. We hypothesize that the intervention will achieve a response rate of [p_I]%, representing an absolute improvement of [difference]%.

Using a two-sided chi-square test with α = 0.05 and 80% power, [n] subjects per group are required. To account for an anticipated dropout rate of [X]%, we will enroll [N*] subjects per group ([total] subjects total).

Calculations were performed using [Zetyra / PASS / nQuery] and validated against published formulas (Fleiss et al., 2003).

Non-Inferiority Trial

The primary endpoint is the proportion of subjects achieving [outcome] at Week [X]. This is a non-inferiority trial comparing [new treatment] to [active control].

Based on historical trials (Author et al., Year), the active control achieves a response rate of approximately [p_C]%. We assume the new treatment will have a similar response rate. The non-inferiority margin is set at [δ]%, which preserves at least [X]% of the historical treatment effect over placebo, consistent with FDA guidance.

Using a one-sided test with α = 0.025 and 80% power, [n] subjects per group are required. To account for an anticipated dropout rate of [X]%, we will enroll [N*] subjects per group.

7. R Code

# Two-proportion superiority test
library(pwr)

# Method 1: Using pwr package (effect size h)
p1 <- 0.30  # Intervention proportion
p2 <- 0.20  # Control proportion
h <- ES.h(p1, p2)  # Cohen's h effect size

pwr.2p.test(
  h = h,
  sig.level = 0.05,
  power = 0.80,
  alternative = "two.sided"
)
# Result: n = 291.7, i.e. 292 per group (the arcsine approximation
# runs slightly below the 294 from Method 2)

# Method 2: Using power.prop.test (base R)
power.prop.test(
  p1 = 0.30,
  p2 = 0.20,
  sig.level = 0.05,
  power = 0.80,
  alternative = "two.sided"
)
# Result: n = 294 per group

# Non-inferiority test (base R only; no extra package required)

p_control <- 0.20
p_treatment <- 0.20  # Assume equal under H1
delta <- 0.10        # Non-inferiority margin
alpha <- 0.025       # One-sided

# Manual calculation
z_alpha <- qnorm(1 - alpha)
z_beta <- qnorm(0.80)
var_sum <- p_treatment*(1-p_treatment) + p_control*(1-p_control)

n_ni <- ((z_alpha + z_beta)^2 * var_sum) / (delta)^2
ceiling(n_ni)
# Result: n = 252 per group at one-sided alpha = 0.025
# (a one-sided alpha of 0.05 would give 198)

# Cluster RCT adjustment
n_simple <- 294
m <- 20          # cluster size
icc <- 0.05      # intraclass correlation
deff <- 1 + (m - 1) * icc  # design effect = 1.95
n_cluster <- ceiling(n_simple * deff)
# Result: n = 574 per group

# Dropout adjustment
dropout_rate <- 0.15
n_adjusted <- ceiling(n_cluster / (1 - dropout_rate)^2)
# Result: n = 795 per group

References

  1. Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. 3rd ed. Wiley; 2003.
  2. Chow SC, Shao J, Wang H, Lokhnygina Y. Sample Size Calculations in Clinical Research. 3rd ed. CRC Press; 2017.
  3. Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. Wiley; 2000.
  4. Blackwelder WC. "Proving the null hypothesis" in clinical trials. Controlled Clinical Trials. 1982;3(4):345-353.
  5. FDA Guidance. Non-Inferiority Clinical Trials to Establish Effectiveness. 2016.

Last updated: December 2024 | Validated against PASS 2024, nQuery 9.5
