
Sample Size for Cluster Randomized Trials

Comprehensive power analysis for trials where groups (clusters) rather than individuals are randomized to intervention or control.

1. When to Use This Method

Cluster randomization (group randomization) assigns entire groups—clinics, schools, villages, hospitals—to treatment arms rather than individual participants. This design is required when individual randomization is impossible, impractical, or would introduce contamination bias.

Criteria for Cluster Randomization

| Criterion | Description | Example |
|---|---|---|
| Intervention at group level | Intervention cannot be delivered to individuals independently | Hospital-wide infection control protocol |
| Contamination risk | Treatment effects may spread between participants in same setting | Hand hygiene education in shared wards |
| Administrative convenience | Randomizing groups is logistically simpler | Classroom-based educational intervention |
| Herd effects | Intervention benefits depend on community-level coverage | Vaccination campaigns |

Common Applications

Primary Care Research

GP practices randomized to implement screening protocols, prescribing guidelines, or quality improvement interventions.

Community Trials

Villages or neighborhoods assigned to public health interventions, water treatment, or vector control programs.

Educational Research

Schools or classrooms randomized to test curricula, teaching methods, or behavioral interventions.

Infection Control

Hospital wards or units testing hygiene protocols, antibiotic stewardship, or surveillance systems.

Contraindications

  • Very few clusters available (<6 per arm severely limits power)
  • Very small clusters where individual randomization is feasible
  • No plausible contamination pathway and intervention can target individuals
  • Cluster-level treatment assignment is not operationally required

2. Mathematical Formulation

2.1 Design Effect (DEFF)

The design effect quantifies the variance inflation due to clustering. It depends on the intraclass correlation coefficient (ICC, ρ) and cluster size (m):

DEFF = 1 + (m - 1)\rho

where ρ = ICC (proportion of total variance due to between-cluster differences), m = average cluster size

The ICC represents the correlation between two randomly selected individuals within the same cluster:

\rho = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_w}

σ²_b = between-cluster variance, σ²_w = within-cluster variance
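As a quick worked example (values chosen for illustration), a trial with clusters of m = 30 and ρ = 0.05 has

DEFF = 1 + (30 - 1) \times 0.05 = 2.45

so it needs roughly 2.45 times as many participants as an individually randomized trial with the same power.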

2.2 Continuous Outcomes

For a two-arm parallel CRT comparing means with k clusters per arm and m individuals per cluster:

k = \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 [1 + (m-1)\rho]}{m \cdot \delta^2}

k = clusters per arm, m = cluster size, δ = μ₁ - μ₂ (treatment effect), σ² = total variance

Rearranging for cluster size m given fixed k:

m = \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 (1 - \rho)}{k \cdot \delta^2 - 2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 \rho}
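When the number of available clusters is fixed in advance (for example, a known set of practices), this rearrangement can be applied directly. A minimal R sketch (the helper name crt_cluster_size and the example values are ours, for illustration only):

R
# Cluster size needed when the number of clusters per arm (k) is fixed
crt_cluster_size <- function(delta, sigma, k, icc, alpha = 0.05, power = 0.80) {
  c2 <- (qnorm(1 - alpha / 2) + qnorm(power))^2
  denom <- k * delta^2 - 2 * c2 * sigma^2 * icc
  if (denom <= 0) return(Inf)   # no cluster size can reach the target power with k clusters
  ceiling(2 * c2 * sigma^2 * (1 - icc) / denom)
}

# Example: 8 clusters per arm available, delta = 0.5 SD, ICC = 0.05
crt_cluster_size(delta = 0.5, sigma = 1, k = 8, icc = 0.05)
# ≈ 13 individuals per cluster under these assumptions

Note that when k·δ² is smaller than 2(z_{1-α/2} + z_{1-β})²σ²ρ, no cluster size achieves the target power; the only remedy is more clusters.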

2.3 Binary Outcomes

For comparing proportions p₁ and p₂:

k = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 [p_1(1-p_1) + p_2(1-p_2)][1 + (m-1)\rho]}{m(p_1 - p_2)^2}

For binary outcomes, this formula uses the ICC defined on the natural (proportion) scale; ICCs reported on the latent logistic scale are not directly interchangeable.

ICC for Binary Outcomes

For proportions, the relationship between the ICC and the between-cluster coefficient of variation (CV) of the true cluster-level proportions is ρ = CV² × p/(1 - p). Published ICCs may use different definitions; verify the scale when using literature values.
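For example (illustrative numbers), if cluster-level prevalences vary with a between-cluster CV of 0.25 around p = 0.30, the implied ICC is ρ = 0.25² × 0.30/0.70 ≈ 0.027.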

2.4 Survival Outcomes

For time-to-event outcomes, the design effect inflates the required number of events and accounts for within-cluster correlation in failure times:

E_{CRT} = E_{IRT} \times DEFF = \frac{4(z_{1-\alpha/2} + z_{1-\beta})^2}{(\log HR)^2} \times [1 + (m-1)\rho]

E = required events, HR = hazard ratio
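A minimal R sketch of this events calculation (the function name crt_survival_events and the example values are ours, for illustration only):

R
# Events required for a CRT with a time-to-event outcome
crt_survival_events <- function(hr, m, icc, alpha = 0.05, power = 0.80) {
  # hr: hazard ratio to detect; m: cluster size; icc: intraclass correlation
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  events_irt <- 4 * z^2 / log(hr)^2   # events for an individually randomized trial (1:1 allocation)
  deff <- 1 + (m - 1) * icc           # design effect
  ceiling(events_irt * deff)          # events required for the clustered design
}

# Example: HR = 0.75, 40 individuals per cluster, ICC = 0.02
crt_survival_events(hr = 0.75, m = 40, icc = 0.02)
# ≈ 676 events under these assumptions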

2.5 Unequal Cluster Sizes

When cluster sizes vary, the design effect increases. The relative efficiency (RE) compared to equal clusters depends on the coefficient of variation of cluster sizes (CV_m):

DEFF_{unequal} = [1 + (m-1)\rho] \times \left[1 + CV_m^2 \times \frac{(m-1)\rho}{1 + (m-1)\rho}\right]

CV_m = standard deviation of cluster sizes / mean cluster size

For moderate variability (CV_m ≤ 0.5), a simpler approximation uses:

DEFF_{unequal} \approx 1 + (\bar{m} + CV_m^2 \bar{m} - 1)\rho
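In practice, CV_m can be computed directly from an anticipated list of cluster sizes. A small R helper illustrating the approximation above (deff_unequal is our illustrative name, not a package function):

R
# Design effect allowing for unequal cluster sizes
deff_unequal <- function(cluster_sizes, icc) {
  m_bar <- mean(cluster_sizes)
  cv_m  <- sd(cluster_sizes) / m_bar          # coefficient of variation of cluster sizes
  list(
    mean_size     = m_bar,
    cv            = cv_m,
    deff_equal    = 1 + (m_bar - 1) * icc,
    deff_unequal  = 1 + ((1 + cv_m^2) * m_bar - 1) * icc
  )
}

# Example: practice list sizes varying from 10 to 60 patients
deff_unequal(cluster_sizes = c(10, 20, 25, 30, 40, 55, 60), icc = 0.05)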

2.6 Matched-Pair Designs

When clusters are matched into pairs (e.g., by geography, size) and one cluster from each pair is randomized to treatment:

k_{pairs} = \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 [1 + (m-1)\rho]}{m \cdot \delta^2 (1 + \rho_c)}

ρ_c = correlation between matched clusters (typically 0.3–0.6), k_pairs = number of matched pairs

Matching can substantially reduce the required number of clusters when between-pair heterogeneity is large.

2.7 Stepped Wedge Designs

In a stepped wedge design, all clusters start in control and sequentially switch to intervention at randomly assigned times. The design effect for a complete stepped wedge with J clusters and T time periods:

DEFF_{SW} = \frac{3(1-\rho)(1 + T\rho)}{T(T-\frac{1}{T}) \times \frac{2-\rho}{2(1-\rho)}}

T = number of time periods (steps + 1), complete design with one cluster switching per period

For a more general stepped wedge with multiple clusters randomized at each step:

Var(\hat{\theta}_{SW}) = \frac{\sigma^2_w}{JM} \times \frac{1 + \rho(Tm - 1)}{T \times m} \times f(T, S)

f(T,S) is a design-specific factor depending on the number of steps and transition pattern

2.8 Clusters vs. Individuals: Key Insight

The degrees of freedom for treatment comparison depend primarily on the number of clusters, not individuals:

df \approx 2k - 2 \quad \text{(parallel CRT with } k \text{ clusters per arm)}

Adding individuals to existing clusters provides diminishing returns; adding clusters is usually more efficient

Rule of Thumb

With high ICC (ρ > 0.05), increasing clusters from 20 to 40 per arm typically provides more power than doubling cluster size from 50 to 100 individuals per cluster.
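A quick check of this rule of thumb using effective sample sizes (total n per arm divided by the design effect), with ρ = 0.05 for illustration:

R
# Effective sample size per arm = (k * m) / [1 + (m - 1) * icc]
eff_n <- function(k, m, icc) k * m / (1 + (m - 1) * icc)

icc <- 0.05
eff_n(k = 20, m = 50,  icc)   # baseline: 20 clusters of 50   -> ~290
eff_n(k = 40, m = 50,  icc)   # doubling the clusters         -> ~580
eff_n(k = 20, m = 100, icc)   # doubling the cluster size     -> ~336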

3. Assumptions

Core Assumptions

| Assumption | Testable? | If Violated |
|---|---|---|
| ICC is constant across clusters | Yes (variance tests) | Use robust variance estimation; sensitivity analysis |
| No contamination between arms | Partially | Effect dilution; estimate contamination rate |
| Random cluster selection | Design feature | Generalizability limited to similar clusters |
| No informative cluster size | Partially | Consider size-weighted or hierarchical analysis |
| Exchangeable correlation structure | Yes (model comparison) | Consider nested or time-varying ICC |
| No differential attrition by cluster | Yes (descriptive) | Model cluster-level dropout; sensitivity analysis |

ICC Reference Values

| Outcome Type | Cluster Type | Typical ICC | Source |
|---|---|---|---|
| Physiological (BP, HbA1c) | GP practice | 0.01–0.05 | Adams et al. 2004 |
| Process (screening rates) | GP practice | 0.05–0.15 | Campbell et al. 2005 |
| Knowledge/attitudes | Schools | 0.10–0.25 | Hedges & Hedberg 2007 |
| Behavioral | Worksites | 0.02–0.08 | Murray 1998 |
| Infectious disease | Villages | 0.01–0.10 | Hayes & Moulton 2017 |

Minimum Cluster Requirements

| Clusters per Arm | Degrees of Freedom | Assessment |
|---|---|---|
| 3–5 | 4–8 | Severely limited; permutation tests required |
| 6–10 | 10–18 | Small sample corrections essential (e.g., Kenward-Roger) |
| 11–20 | 20–38 | Adequate for most designs with corrections |
| > 20 | > 38 | Standard methods appropriate |
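Where only a modest number of clusters is available, a common analysis-stage safeguard is a linear mixed model with small-sample degrees of freedom. A minimal sketch, assuming a participant-level data frame trial_data with columns outcome, arm, and cluster (these names are placeholders, not from the text above):

R
library(lme4)
library(lmerTest)   # Satterthwaite / Kenward-Roger df for lmer models (KR requires pbkrtest)

# Random intercept for cluster; fixed effect for treatment arm
fit <- lmer(outcome ~ arm + (1 | cluster), data = trial_data)

# Treatment effect test with the Kenward-Roger small-sample correction
anova(fit, ddf = "Kenward-Roger")

# Model-based ICC from the fitted variance components
performance::icc(fit)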

4. Regulatory Guidance

FDA Guidance

“Cluster randomized trials... require special design and analysis considerations. The protocol should specify the anticipated intraclass correlation coefficient (ICC) and its justification, the design effect used in sample size calculations, and the planned method for accounting for clustering in the analysis.”

— FDA Guidance: Design and Analysis of Shedding Studies for Virus or Bacteria-Based Gene Therapy and Oncolytic Products (2015)

EMA Guidance

“For cluster randomised trials, the statistical analysis should account for the clustering structure. Mixed effects models or generalised estimating equations (GEE) are recommended. The number of clusters should be sufficient to provide adequate degrees of freedom for inference.”

— EMA Guideline on multiplicity issues in clinical trials (EMA/CHMP/44762/2017)

CONSORT Extension for Cluster Trials

The CONSORT extension for cluster randomized trials (Campbell et al., 2012) requires reporting:

  • Rationale for using cluster randomization
  • How clustering was accounted for in sample size calculation
  • ICC value used and its justification
  • Observed ICC from the trial data
  • Analysis method accounting for clustering
  • Number of clusters randomized, receiving intervention, and analyzed

5. Validation Against Industry Standards

Parallel CRT, Continuous Outcome

Testing parameters: δ = 0.5 (effect size), σ = 1.0, α = 0.05 (two-sided), power = 80%, m = 30 individuals per cluster, ρ = 0.05.

| Software | Clusters per Arm | Total Clusters | Design Effect |
|---|---|---|---|
| PASS 2024 | 11 | 22 | 2.45 |
| nQuery 9.5 | 11 | 22 | 2.45 |
| Stata (clustersampsi) | 11 | 22 | 2.45 |
| R (clusterPower) | 11 | 22 | 2.45 |
| Zetyra | 11 | 22 | 2.45 |

Binary Outcome

Testing parameters: p₁ = 0.30, p₂ = 0.20, α = 0.05 (two-sided), power = 80%, m = 50 individuals per cluster, ρ = 0.03.

| Software | Clusters per Arm | Total Individuals |
|---|---|---|
| PASS 2024 | 16 | 1,600 |
| nQuery 9.5 | 16 | 1,600 |
| Zetyra | 16 | 1,600 |

Stepped Wedge Design

Testing parameters: δ = 0.3, σ = 1.0, α = 0.05 (two-sided), power = 80%, 4 time periods, 20 individuals per cluster-period, ρ = 0.05.

| Software | Total Clusters | Method |
|---|---|---|
| R (swCRTdesign) | 8 | Hussey & Hughes |
| Stata (steppedwedge) | 8 | Hussey & Hughes |
| Zetyra | 8 | Hussey & Hughes |

6. Example SAP Language

Parallel CRT

STATISTICAL ANALYSIS PLAN TEMPLATE
Sample Size Justification

This is a cluster randomized controlled trial with general practices as the unit of randomization. We assume an intraclass correlation coefficient (ICC) of 0.05, based on published estimates for similar outcomes in primary care settings (Campbell et al., 2005). With 30 patients per practice, the design effect is:

DEFF = 1 + (30 - 1) × 0.05 = 2.45

To detect a standardized mean difference of 0.4 with 80% power at α = 0.05 (two-sided), assuming σ = 1.0, an individually randomized trial would require approximately 100 patients per arm (200 total). Applying the design effect gives n = 100 × 2.45 = 245 per arm. With 30 patients per practice: k = 245/30 = 8.2 ≈ 9 practices per arm (18 total practices, 540 patients). Allowing for 15% practice dropout, we will recruit 11 practices per arm (22 total, 660 patients).

Matched-Pair CRT

STATISTICAL ANALYSIS PLAN TEMPLATE
Sample Size Justification

Villages will be matched into pairs based on population size, geographic region, and baseline prevalence of the outcome. Within each pair, one village will be randomized to intervention.

Assumptions:
  • Intraclass correlation coefficient (ICC): ρ = 0.08
  • Within-pair correlation: ρ_c = 0.5
  • Effect size: 15 percentage point reduction (50% to 35%)
  • Average of 40 eligible adults per village
  • Power: 80%, α = 0.05 (two-sided)

Design effect = [1 + (40 - 1) × 0.08] / (1 + 0.5) = 4.12 / 1.5 = 2.75

Compared to an unmatched design (DEFF = 4.12), matching reduces the required sample size by approximately 33%. Required: 24 matched pairs (48 villages, 1,920 individuals).

Stepped Wedge CRT

STATISTICAL ANALYSIS PLAN TEMPLATE
Sample Size Justification

This stepped wedge cluster randomized trial includes 12 hospitals with 4 time periods (3 steps). At each step, 4 hospitals cross from control to intervention. Outcomes are measured in repeated cross-sections at each period.

Assumptions:
  • ICC: ρ = 0.03
  • Cluster autocorrelation: 0.8
  • 50 patients per hospital per time period
  • Effect size: odds ratio = 0.70
  • Power: 80%, α = 0.05 (two-sided)

Using the Hussey & Hughes (2007) formula for a complete stepped wedge design with the above parameters, 12 clusters provide 82% power for detecting the specified effect. Total sample: 12 hospitals × 4 periods × 50 patients = 2,400.

Analysis will use a generalized linear mixed model with random effects for cluster and fixed effects for time period and intervention status.

7. R Code

Design Effect Calculation

R
# Design effect and effective sample size
design_effect <- function(m, icc) {
  # m: cluster size
  # icc: intraclass correlation coefficient
  deff <- 1 + (m - 1) * icc
  return(deff)
}

effective_n <- function(n_total, m, icc) {
  # n_total: total sample size (k * m)
  # Returns: effective sample size accounting for clustering
  deff <- design_effect(m, icc)
  n_eff <- n_total / deff
  return(list(
    deff = deff,
    n_effective = n_eff,
    efficiency = n_eff / n_total
  ))
}

# Example
effective_n(n_total = 1000, m = 50, icc = 0.05)
# $deff = 3.45
# $n_effective = 289.9
# $efficiency = 0.29

Parallel CRT: Continuous Outcome

R
crt_continuous <- function(delta, sigma, m, icc,
                            alpha = 0.05, power = 0.80) {
  # delta: treatment effect (difference in means)
  # sigma: total standard deviation
  # m: cluster size
  # icc: intraclass correlation coefficient
  # Returns: clusters per arm

  z_alpha <- qnorm(1 - alpha/2)
  z_beta <- qnorm(power)

  # Design effect
  deff <- 1 + (m - 1) * icc

  # Clusters per arm
  k <- (2 * (z_alpha + z_beta)^2 * sigma^2 * deff) / (m * delta^2)

  return(list(
    clusters_per_arm = ceiling(k),
    total_clusters = ceiling(k) * 2,
    total_n = ceiling(k) * 2 * m,
    design_effect = deff
  ))
}

# Example: detect 0.5 SD difference
crt_continuous(delta = 0.5, sigma = 1.0, m = 30, icc = 0.05)
# clusters_per_arm = 11
# total_clusters = 22
# total_n = 660
# design_effect = 2.45

Parallel CRT: Binary Outcome

R
crt_binary <- function(p1, p2, m, icc, alpha = 0.05, power = 0.80) {
  # p1: control proportion
  # p2: intervention proportion
  # m: cluster size
  # icc: intraclass correlation coefficient

  z_alpha <- qnorm(1 - alpha/2)
  z_beta <- qnorm(power)

  # Design effect
  deff <- 1 + (m - 1) * icc

  # Variance components
  var_pooled <- p1 * (1 - p1) + p2 * (1 - p2)

  # Clusters per arm
  k <- ((z_alpha + z_beta)^2 * var_pooled * deff) / (m * (p1 - p2)^2)

  return(list(
    clusters_per_arm = ceiling(k),
    total_clusters = ceiling(k) * 2,
    total_n = ceiling(k) * 2 * m,
    design_effect = deff
  ))
}

# Example: detect reduction from 30% to 20%
crt_binary(p1 = 0.30, p2 = 0.20, m = 50, icc = 0.03)
# clusters_per_arm = 16
# total_clusters = 32
# total_n = 1600

Using clusterPower Package

R
# install.packages("clusterPower")
library(clusterPower)

# Continuous outcome: find number of clusters
cpa.normal(nclusters = NA, nsubjects = 30,
           d = 0.5, ICC = 0.05,
           vart = 1, alpha = 0.05, power = 0.80)
# nclusters = 22 (11 per arm)

# Binary outcome: find power for given design
cpa.binary(nclusters = 40, nsubjects = 50,
           p1 = 0.30, p2 = 0.20, ICC = 0.03,
           pooled = TRUE, alpha = 0.05)
# power = 0.89

# Cluster size sensitivity analysis: clusters per arm across a range of cluster sizes
sapply(c(20, 30, 50, 100), function(m) {
  cpa.normal(nclusters = NA, nsubjects = m,
             d = 0.4, ICC = 0.05, vart = 1,
             alpha = 0.05, power = 0.80)
})

Matched-Pair CRT

R
matched_pair_crt <- function(delta, sigma, m, icc, rho_c,
                              alpha = 0.05, power = 0.80) {
  # rho_c: within-pair correlation (between matched clusters)
  # Returns: number of matched pairs

  z_alpha <- qnorm(1 - alpha/2)
  z_beta <- qnorm(power)

  # Design effect for matched pairs
  deff <- 1 + (m - 1) * icc

  # Variance reduction from matching
  var_reduction <- 1 + rho_c

  # Pairs needed
  pairs <- (2 * (z_alpha + z_beta)^2 * sigma^2 * deff) /
           (m * delta^2 * var_reduction)

  return(list(
    pairs = ceiling(pairs),
    total_clusters = ceiling(pairs) * 2,
    total_n = ceiling(pairs) * 2 * m,
    design_effect = deff,
    efficiency_gain = var_reduction
  ))
}

# Example: matched-pair design with moderate pair correlation
matched_pair_crt(delta = 0.4, sigma = 1.0, m = 40,
                 icc = 0.08, rho_c = 0.5)
# pairs = 7
# total_clusters = 14
# efficiency_gain = 1.5 (about 33% fewer clusters than an unmatched design)

Stepped Wedge Design (Hussey & Hughes)

R
# Closed-form power for a cross-sectional stepped wedge design,
# following Hussey & Hughes (2007). The swCRTdesign package
# (swDsn(), swPwr()) implements the same calculation and can be
# used as a cross-check.

sw_power <- function(X, n, delta, sigma, icc, alpha = 0.05) {
  # X: cluster-by-period treatment matrix (0 = control, 1 = intervention),
  #    one row per cluster
  # n: individuals per cluster-period
  # delta: treatment effect; sigma: total SD; icc: intraclass correlation

  I <- nrow(X)
  periods <- ncol(X)
  tau2 <- icc * sigma^2               # between-cluster variance
  s2   <- (1 - icc) * sigma^2 / n     # variance of a cluster-period mean

  U <- sum(X)                         # cluster-periods on intervention
  W <- sum(colSums(X)^2)
  V <- sum(rowSums(X)^2)

  var_theta <- I * s2 * (s2 + periods * tau2) /
    ((I * U - W) * s2 +
     (U^2 + I * periods * U - periods * W - I * V) * tau2)

  power <- pnorm(abs(delta) / sqrt(var_theta) - qnorm(1 - alpha / 2))
  list(var_treatment_effect = var_theta, power = power)
}

# Design: 4 periods, 3 steps, 4 clusters crossing over at each step (12 clusters)
X <- rbind(
  matrix(rep(c(0, 1, 1, 1), each = 4), nrow = 4),  # wave 1: switches at period 2
  matrix(rep(c(0, 0, 1, 1), each = 4), nrow = 4),  # wave 2: switches at period 3
  matrix(rep(c(0, 0, 0, 1), each = 4), nrow = 4)   # wave 3: switches at period 4
)

sw_power(X, n = 20, delta = 0.3, sigma = 1, icc = 0.05)
# power ≈ 0.76 under this simplified model (no cluster autocorrelation);
# increase the clusters per wave and re-run to reach a target such as 80%

ICC Estimation from Pilot Data

R
library(lme4)
library(performance)

# Estimate ICC from pilot data
# data: data frame with 'outcome' and 'cluster' variables
estimate_icc <- function(data, outcome_var, cluster_var) {
  formula <- as.formula(paste(outcome_var, "~ 1 + (1|", cluster_var, ")"))

  # Fit intercept-only model with a random cluster effect
  model <- lmer(formula, data = data)

  # Extract variance components
  vc <- as.data.frame(VarCorr(model))
  var_between <- vc[vc$grp == cluster_var, "vcov"]
  var_within <- sigma(model)^2

  icc_est <- var_between / (var_between + var_within)

  # Cross-check with the performance package (adjusted ICC)
  icc_perf <- performance::icc(model)

  return(list(
    icc = icc_est,
    icc_adjusted = icc_perf$ICC_adjusted,
    var_between = var_between,
    var_within = var_within
  ))
}

# Example with simulated data
set.seed(123)
n_clusters <- 30
n_per_cluster <- 20
true_icc <- 0.05

cluster_effects <- rnorm(n_clusters, 0, sqrt(true_icc))
pilot_data <- data.frame(
  cluster = rep(1:n_clusters, each = n_per_cluster),
  outcome = rnorm(n_clusters * n_per_cluster, 0, sqrt(1 - true_icc)) +
            rep(cluster_effects, each = n_per_cluster)
)

estimate_icc(pilot_data, "outcome", "cluster")
# icc ≈ 0.05 (with the performance::icc() adjusted ICC as a cross-check)

Sensitivity Analysis for ICC

R
# Sensitivity analysis across ICC range
icc_sensitivity <- function(delta, sigma, m,
                            icc_range = seq(0.01, 0.15, 0.01),
                            alpha = 0.05, power = 0.80) {

  results <- sapply(icc_range, function(icc) {
    res <- crt_continuous(delta, sigma, m, icc, alpha, power)
    c(icc = icc,
      deff = res$design_effect,
      clusters = res$total_clusters,
      total_n = res$total_n)
  })

  return(as.data.frame(t(results)))
}

# Generate sensitivity table
sens <- icc_sensitivity(delta = 0.4, sigma = 1.0, m = 30)
print(sens)

# Plot
library(ggplot2)
ggplot(sens, aes(x = icc, y = clusters)) +
  geom_line(size = 1, color = "#3B82F6") +
  geom_point(size = 2, color = "#3B82F6") +
  labs(
    title = "Required Clusters vs. ICC",
    x = "Intraclass Correlation Coefficient",
    y = "Total Clusters Required"
  ) +
  theme_minimal()

# Table for protocol
knitr::kable(sens, digits = 2,
             col.names = c("ICC", "Design Effect",
                          "Total Clusters", "Total N"))

8. References

Campbell MK, Piaggio G, Elbourne DR, Altman DG (2012). CONSORT 2010 statement: extension to cluster randomised trials. BMJ, 345:e5661.

Donner A, Klar N (2000). Design and Analysis of Cluster Randomization Trials in Health Research. Arnold Publishers.

Hayes RJ, Moulton LH (2017). Cluster Randomised Trials, 2nd ed. Chapman and Hall/CRC.

Hemming K, Taljaard M (2016). Sample size calculations for stepped wedge and cluster randomised trials: a unified approach. Journal of Clinical Epidemiology, 69:137-146.

Hussey MA, Hughes JP (2007). Design and analysis of stepped wedge cluster randomized trials. Contemporary Clinical Trials, 28(2):182-191.

Eldridge SM, Ashby D, Kerry S (2006). Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. International Journal of Epidemiology, 35(5):1292-1300.

Campbell MK, Fayers PM, Grimshaw JM (2005). Determinants of the intracluster correlation coefficient in cluster randomized trials: the case of implementation research. Clinical Trials, 2(2):99-107.

Adams G, Gulliford MC, Ukoumunne OC, et al. (2004). Patterns of intra-cluster correlation from primary care research to inform study design and analysis. Journal of Clinical Epidemiology, 57(8):785-794.

Ready to Calculate?

Use our Cluster Randomized Trial Calculator to determine the optimal number of clusters and cluster size for your study.

Sample Size Calculator
