Sample Size for Binary Outcomes
Comprehensive power analysis for clinical trials with dichotomous endpoints (e.g., response rates, event incidence, success/failure).
1. When to Use This Method
Use this methodology when:
- Your primary endpoint is a binary outcome (yes/no, success/failure, event/no event)
- You are comparing proportions between two or more groups
- You need to power a superiority, non-inferiority, or equivalence trial
Do NOT Use When
- Your outcome is continuous (use means comparison method)
- Your outcome is time-to-event with censoring (use survival analysis method)
- Your outcome is a count with no upper bound (use Poisson method)
- You have paired/matched binary data (use McNemar's test method)
2. Mathematical Formulation
2.1 Two-Sample Parallel Design (Superiority)
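For a randomized trial comparing intervention ($p_I$) to control ($p_C$), the required sample size per group is:
$$n = \frac{\left[ z_{1-\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + z_{1-\beta}\sqrt{p_I(1-p_I) + p_C(1-p_C)} \right]^2}{(p_I - p_C)^2}$$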
| Symbol | Description |
|---|---|
| $\bar{p}$ | Pooled proportion under H₀: $\bar{p} = (p_I + p_C)/2$ |
| $z_{1-\alpha/2}$ | Critical value for Type I error (1.96 for α = 0.05, two-sided) |
| $z_{1-\beta}$ | Critical value for power (0.84 for 80%; 1.28 for 90%) |
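Simplified approximation (equal groups, unpooled variance under both hypotheses):
$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\,\left[p_I(1-p_I) + p_C(1-p_C)\right]}{(p_I - p_C)^2}$$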
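As a minimal sketch, both the pooled-variance formula and the simplified approximation can be coded directly in base R. The helper names `n_superiority` and `n_simplified` are hypothetical illustrations, not functions from any package; the inputs reuse the 0.30 vs 0.20 scenario from the validation table.

```r
# Per-group n for a two-sided two-proportion superiority test,
# pooled variance under H0 and unpooled variance under H1 (hypothetical helper)
n_superiority <- function(p_I, p_C, alpha = 0.05, power = 0.80) {
  z_a <- qnorm(1 - alpha / 2)   # critical value for Type I error
  z_b <- qnorm(power)           # critical value for power
  p_bar <- (p_I + p_C) / 2      # pooled proportion under H0
  num <- z_a * sqrt(2 * p_bar * (1 - p_bar)) +
         z_b * sqrt(p_I * (1 - p_I) + p_C * (1 - p_C))
  num^2 / (p_I - p_C)^2
}

# Simplified approximation: unpooled variance in both terms (hypothetical helper)
n_simplified <- function(p_I, p_C, alpha = 0.05, power = 0.80) {
  (qnorm(1 - alpha / 2) + qnorm(power))^2 *
    (p_I * (1 - p_I) + p_C * (1 - p_C)) / (p_I - p_C)^2
}

ceiling(n_superiority(0.30, 0.20))                # 294
ceiling(n_superiority(0.30, 0.20, power = 0.90))  # 392
ceiling(n_simplified(0.30, 0.20))                 # 291
```

Always round up with `ceiling()`; the simplified form is slightly anti-conservative here because it ignores the larger pooled variance under H₀.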
2.2 Unequal Allocation
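For allocation ratio $k = n_I / n_C$:
$$n_C = \frac{\left[ z_{1-\alpha/2}\sqrt{\bar{p}(1-\bar{p})\left(1 + \frac{1}{k}\right)} + z_{1-\beta}\sqrt{\frac{p_I(1-p_I)}{k} + p_C(1-p_C)} \right]^2}{(p_I - p_C)^2}$$
With $n_I = k\,n_C$ and $\bar{p} = (k\,p_I + p_C)/(1 + k)$. For $k = 1$ this reduces to the formula in Section 2.1.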
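A short sketch of the unequal-allocation calculation, assuming $k = n_I / n_C$ (intervention subjects per control subject). The helper `n_unequal` is hypothetical; the 2:1 example is an illustrative assumption.

```r
# Per-arm n for a two-sided two-proportion test with allocation ratio k = n_I / n_C
# (hypothetical helper)
n_unequal <- function(p_I, p_C, k = 1, alpha = 0.05, power = 0.80) {
  p_bar <- (k * p_I + p_C) / (1 + k)  # weighted pooled proportion under H0
  num <- qnorm(1 - alpha / 2) * sqrt(p_bar * (1 - p_bar) * (1 + 1 / k)) +
         qnorm(power) * sqrt(p_I * (1 - p_I) / k + p_C * (1 - p_C))
  n_C <- num^2 / (p_I - p_C)^2
  c(control = ceiling(n_C), intervention = ceiling(k * n_C))
}

n_unequal(0.30, 0.20, k = 1)  # control 294, intervention 294
n_unequal(0.30, 0.20, k = 2)  # control 224, intervention 447
```

With 2:1 allocation the total rises from 588 to 671, the usual efficiency cost of departing from equal allocation.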
2.3 Non-Inferiority Design
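For testing whether the new treatment is no worse than control by more than a margin $\delta$ (higher proportions favorable; H₀: $p_I - p_C \le -\delta$):
$$n = \frac{(z_{1-\alpha} + z_{1-\beta})^2\,\left[p_I(1-p_I) + p_C(1-p_C)\right]}{(p_I - p_C + \delta)^2}$$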
Note: A one-sided α (typically 0.025) is standard for non-inferiority. Because n scales with 1/δ², the small margins typical of non-inferiority trials lead to substantially larger sample sizes than superiority trials.
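To make the 1/δ² scaling concrete, here is a minimal sketch of the non-inferiority formula with one-sided α = 0.025 and 80% power, assuming equal true rates of 0.20. The helper `n_noninferiority` is hypothetical.

```r
# Per-group n for a one-sided non-inferiority test of p_I - p_C > -delta
# (higher proportions favorable; hypothetical helper)
n_noninferiority <- function(p_I, p_C, delta, alpha = 0.025, power = 0.80) {
  (qnorm(1 - alpha) + qnorm(power))^2 *
    (p_I * (1 - p_I) + p_C * (1 - p_C)) / (p_I - p_C + delta)^2
}

ceiling(n_noninferiority(0.20, 0.20, delta = 0.10))  # 252
ceiling(n_noninferiority(0.20, 0.20, delta = 0.05))  # 1005, halving delta roughly quadruples n
```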
2.4 Equivalence Design
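For testing whether treatments differ by no more than a margin $\delta$, using two one-sided tests (TOST), each at level α (requires $|p_I - p_C| < \delta$):
$$n = \frac{(z_{1-\alpha} + z_{1-\beta/2})^2\,\left[p_I(1-p_I) + p_C(1-p_C)\right]}{(\delta - |p_I - p_C|)^2}$$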
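A minimal sketch of the TOST sample size, assuming α = 0.05 for each one-sided test, 80% power, equal true rates of 0.20, and δ = 0.10. The helper `n_equivalence` is hypothetical.

```r
# Per-group n for an equivalence trial via two one-sided tests (hypothetical helper)
n_equivalence <- function(p_I, p_C, delta, alpha = 0.05, power = 0.80) {
  true_diff <- abs(p_I - p_C)
  stopifnot(true_diff < delta)          # equivalence must be achievable
  z_a <- qnorm(1 - alpha)               # each one-sided test at level alpha
  z_b <- qnorm(1 - (1 - power) / 2)     # z_{1 - beta/2}
  (z_a + z_b)^2 * (p_I * (1 - p_I) + p_C * (1 - p_C)) / (delta - true_diff)^2
}

ceiling(n_equivalence(0.20, 0.20, delta = 0.10))  # 275
```

The β/2 term reflects that, when the true difference is zero, both one-sided tests must reject.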
2.5 Clustered Designs
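When observations are nested within clusters of size $m$ with intraclass correlation $\rho$ (ICC), inflate the individually randomized sample size $n$ by the design effect (variance inflation factor):
$$\text{DE} = 1 + (m - 1)\rho, \qquad n_{\text{cluster}} = n \times \text{DE}$$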
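For unequal cluster sizes with mean size $\bar{m}$, adjust using the coefficient of variation (CV) of cluster size:
$$\text{DE} = 1 + \left[(\text{CV}^2 + 1)\,\bar{m} - 1\right]\rho$$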
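A minimal R sketch of the design-effect adjustment in its unequal-cluster form, which reduces to 1 + (m − 1) × ICC when CV = 0. The helper `design_effect` is hypothetical, and CV = 0.4 is an assumed illustrative value; the other inputs match the cluster example in Section 7.

```r
# Design effect for cluster randomization; cv = 0 gives equal cluster sizes
# (hypothetical helper)
design_effect <- function(m_bar, icc, cv = 0) {
  1 + ((cv^2 + 1) * m_bar - 1) * icc
}

n_individual <- 294                                            # from Section 2.1
de_equal   <- design_effect(m_bar = 20, icc = 0.05)            # 1.95
de_unequal <- design_effect(m_bar = 20, icc = 0.05, cv = 0.4)  # 2.11

ceiling(n_individual * de_equal)    # 574 per group
ceiling(n_individual * de_unequal)  # 621 per group
```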
2.6 Continuity Correction
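For small samples or proportions near 0 or 1, apply a continuity correction (Fleiss et al., 2003) to the uncorrected per-group size $n$:
$$n_{cc} = \frac{n}{4}\left[1 + \sqrt{1 + \frac{4}{n\,|p_I - p_C|}}\right]^2$$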
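A short sketch of the continuity correction n_cc = (n/4)[1 + sqrt(1 + 4/(n|p_I − p_C|))]², applied to the uncorrected n from base R's `power.prop.test()` for the 0.30 vs 0.20 scenario. The helper `n_continuity` is hypothetical.

```r
# Uncorrected per-group n (normal approximation, two-sided alpha = 0.05)
n0 <- power.prop.test(p1 = 0.30, p2 = 0.20, sig.level = 0.05, power = 0.80)$n

# Continuity-corrected per-group n (hypothetical helper)
n_continuity <- function(n, p_I, p_C) {
  (n / 4) * (1 + sqrt(1 + 4 / (n * abs(p_I - p_C))))^2
}

ncc <- n_continuity(n0, 0.30, 0.20)
c(uncorrected = ceiling(n0), corrected = ceiling(ncc))  # 294, 313
```

Apply the correction to the unrounded n and round only once at the end.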
2.7 Dropout Adjustment
Inflate sample size to account for anticipated dropout:
$$N^* = \frac{n}{1 - d}$$
where $d$ is the expected dropout rate.
3. Assumptions
3.1 Core Assumptions
| Assumption | Testable Criterion | Violation Consequence |
|---|---|---|
| Independence | Study design ensures no clustering | Severe: inflated Type I error if ignored |
| Fixed proportions | Event rates stable over enrollment period | Moderate: time-varying rates may require stratification |
| Large sample | $np \ge 5$ and $n(1-p) \ge 5$ in each group | Use exact methods (Fisher's exact test) if violated |
| No confounding | Randomization successful | Bias in effect estimate |
3.2 Parameter Estimates
Control rate ($p_C$)
Should come from prior studies, pilot data, or published literature. Consider secular trends—rates may have changed since historical studies.
Treatment effect
Can be specified as an absolute difference ($p_I - p_C$), relative risk ($p_I / p_C$), or odds ratio ($\frac{p_I/(1-p_I)}{p_C/(1-p_C)}$). Ensure the effect is clinically relevant, not merely statistically detectable.
Event Rate Impact on Sample Size
| Control Rate | 25% Relative Reduction | Required n/group (80% power, two-sided α = 0.05) |
|---|---|---|
| 40% | 40% → 30% | 356 |
| 20% | 20% → 15% | 906 |
| 10% | 10% → 7.5% | 2,005 |
| 5% | 5% → 3.75% | 4,202 |
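These per-group sizes can be recomputed with base R's `power.prop.test()`, whose default normal approximation uses the same pooled-variance formula as Section 2.1. A short sketch, assuming two-sided α = 0.05 and 80% power:

```r
control <- c(0.40, 0.20, 0.10, 0.05)
treated <- control * 0.75  # 25% relative reduction

# Round each per-group n up to the next whole subject
n_per_group <- mapply(function(pc, pt) {
  ceiling(power.prop.test(p1 = pc, p2 = pt, sig.level = 0.05, power = 0.80)$n)
}, control, treated)

data.frame(control, treated, n_per_group)  # n_per_group: 356, 906, 2005, 4202
```

Halving the control rate roughly doubles the required n, because the absolute difference shrinks faster than the variance.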
4. Regulatory Guidance
FDA
ICH E9 (Statistical Principles for Clinical Trials)
Requires prospective sample size justification with clearly stated assumptions for event rates and effect sizes.
FDA Guidance on Non-Inferiority Trials (2016)
The non-inferiority margin must preserve a clinically meaningful fraction of the active control effect. The guidance describes the fixed-margin (95-95) method and the synthesis method for establishing the margin.
FDA Guidance on Multiple Endpoints (2022)
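When success requires all co-primary binary endpoints to be significant, no α adjustment is needed, but overall power falls below the per-endpoint power, so the sample size must increase. When success on any one of several primary endpoints suffices, apply a multiplicity adjustment (e.g., Bonferroni: α/k), which also increases the required sample size.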
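To see how an α/k adjustment raises the sample size, this sketch compares base R's `power.prop.test()` at α = 0.05 with a Bonferroni-adjusted α = 0.05/2, assuming k = 2 endpoints that share the illustrative rates 0.30 vs 0.20 and 80% power each:

```r
# Unadjusted two-sided test at alpha = 0.05
n_single <- power.prop.test(p1 = 0.30, p2 = 0.20, sig.level = 0.05, power = 0.80)$n
# Bonferroni-adjusted two-sided test at alpha / k with k = 2
n_bonf <- power.prop.test(p1 = 0.30, p2 = 0.20, sig.level = 0.05 / 2, power = 0.80)$n

c(unadjusted = ceiling(n_single), bonferroni = ceiling(n_bonf))  # 294, 356
```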
EMA
CHMP Guideline on Non-Inferiority (2005)
Margin selection must be justified based on historical evidence of active control efficacy vs. placebo.
EMA Points to Consider on Switching
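Switching from non-inferiority to superiority is generally acceptable when the trial is properly designed and analyzed. Switching from superiority to non-inferiority is defensible only if the non-inferiority margin was pre-specified; it cannot be justified post hoc based on results.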
Key Citations
- ICH E9: Statistical Principles for Clinical Trials (1998)
- FDA Guidance: Non-Inferiority Clinical Trials to Establish Effectiveness (2016)
- FDA Guidance: Multiple Endpoints in Clinical Trials (2022)
- CHMP: Guideline on the Choice of the Non-Inferiority Margin (2005)
5. Validation Against Industry Standards
| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra | Status |
|---|---|---|---|---|---|
| Two-proportion (superiority) | p₁=0.30, p₂=0.20, α=0.05, power=0.80 | 294/group | 294/group | 294/group | ✓ Match |
| Two-proportion (superiority) | p₁=0.30, p₂=0.20, α=0.05, power=0.90 | 392/group | 393/group | 392/group | ✓ Match |
| Non-inferiority | p₁=p₂=0.20, δ=0.10, α=0.025, power=0.80 | 199/group | 199/group | 199/group | ✓ Match |
| Cluster RCT | p=0.25, ICC=0.05, m=20 | 582/group | 583/group | 582/group | ✓ Match |
Differences of ±1 subject reflect rounding conventions. Enabling a continuity correction increases n by considerably more than this (see Section 2.6).
6. Example SAP Language
Superiority Trial
The primary endpoint is the proportion of subjects achieving [response criterion] at Week [X]. Based on prior studies (Author et al., Year), the expected response rate in the control group is [p_C]%. We hypothesize that the intervention will achieve a response rate of [p_I]%, representing an absolute improvement of [difference]%.
Using a two-sided chi-square test with α = 0.05 and 80% power, [n] subjects per group are required. To account for an anticipated dropout rate of [X]%, we will enroll [N*] subjects per group ([total] subjects total).
Calculations were performed using [Zetyra / PASS / nQuery] and validated against published formulas (Fleiss et al., 2003).
Non-Inferiority Trial
The primary endpoint is the proportion of subjects achieving [outcome] at Week [X]. This is a non-inferiority trial comparing [new treatment] to [active control].
Based on historical trials (Author et al., Year), the active control achieves a response rate of approximately [p_C]%. We assume the new treatment will have a similar response rate. The non-inferiority margin is set at [δ]%, which preserves at least [X]% of the historical treatment effect over placebo, consistent with FDA guidance.
Using a one-sided test with α = 0.025 and 80% power, [n] subjects per group are required. To account for an anticipated dropout rate of [X]%, we will enroll [N*] subjects per group.
7. R Code
```r
# Two-proportion superiority test
library(pwr)

# Method 1: pwr package (Cohen's h, arcsine transformation)
p1 <- 0.30  # Intervention proportion
p2 <- 0.20  # Control proportion
h <- ES.h(p1, p2)  # Cohen's h effect size
pwr.2p.test(h = h, sig.level = 0.05, power = 0.80,
            alternative = "two.sided")
# Result: n ≈ 291.7, i.e. 292 per group (arcsine approximation)

# Method 2: power.prop.test (base R, normal approximation)
power.prop.test(p1 = 0.30, p2 = 0.20, sig.level = 0.05, power = 0.80,
                alternative = "two.sided")
# Result: n ≈ 293.1, round up to 294 per group

# Non-inferiority test (manual calculation)
p_control   <- 0.20
p_treatment <- 0.20  # Assume equal true rates under H1
delta <- 0.10        # Non-inferiority margin
alpha <- 0.025       # One-sided
z_alpha <- qnorm(1 - alpha)
z_beta  <- qnorm(0.80)
var_sum <- p_treatment * (1 - p_treatment) + p_control * (1 - p_control)
n_ni <- ((z_alpha + z_beta)^2 * var_sum) / (p_treatment - p_control + delta)^2
ceiling(n_ni)  # Result: n = 252 per group

# Cluster RCT adjustment
n_simple <- 294
m   <- 20    # cluster size
icc <- 0.05  # intraclass correlation
deff <- 1 + (m - 1) * icc              # design effect = 1.95
n_cluster <- ceiling(n_simple * deff)  # Result: n = 574 per group

# Dropout adjustment: divide by the expected retention fraction (1 - d)
dropout_rate <- 0.15
n_adjusted <- ceiling(n_cluster / (1 - dropout_rate))  # Result: n = 676 per group
```
References
- Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. 3rd ed. Wiley; 2003.
- Chow SC, Shao J, Wang H, Lokhnygina Y. Sample Size Calculations in Clinical Research. 3rd ed. CRC Press; 2017.
- Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. Wiley; 2000.
- Blackwelder WC. "Proving the null hypothesis" in clinical trials. Controlled Clinical Trials. 1982;3(4):345-353.
- FDA Guidance. Non-Inferiority Clinical Trials to Establish Effectiveness. 2016.
Last updated: December 2024 | Validated against PASS 2024, nQuery 9.5
Ready to calculate your sample size?
Use our Chi-Square Calculator to determine the sample size needed for comparing proportions between groups.