
Sample Size for Cluster Randomized Trials

Comprehensive power analysis for trials where groups (clusters) rather than individuals are randomized to intervention or control.

1. When to Use This Method

Cluster randomization (group randomization) assigns entire groups—clinics, schools, villages, hospitals—to treatment arms rather than individual participants. This design is required when individual randomization is impossible, impractical, or would introduce contamination bias.

Criteria for Cluster Randomization

| Criterion | Description | Example |
|---|---|---|
| Intervention at group level | Intervention cannot be delivered to individuals independently | Hospital-wide infection control protocol |
| Contamination risk | Treatment effects may spread between participants in same setting | Hand hygiene education in shared wards |
| Administrative convenience | Randomizing groups is logistically simpler | Classroom-based educational intervention |
| Herd effects | Intervention benefits depend on community-level coverage | Vaccination campaigns |

Common Applications

Primary Care Research

GP practices randomized to implement screening protocols, prescribing guidelines, or quality improvement interventions.

Community Trials

Villages or neighborhoods assigned to public health interventions, water treatment, or vector control programs.

Educational Research

Schools or classrooms randomized to test curricula, teaching methods, or behavioral interventions.

Infection Control

Hospital wards or units testing hygiene protocols, antibiotic stewardship, or surveillance systems.

Contraindications

  • Very few clusters available (<6 per arm severely limits power)
  • Very small clusters where individual randomization is feasible
  • No plausible contamination pathway and intervention can target individuals
  • Cluster-level treatment assignment is not operationally required

2. Mathematical Formulation

2.1 Design Effect (DEFF)

The design effect quantifies the variance inflation due to clustering. It depends on the intraclass correlation coefficient (ICC, ρ) and cluster size (m):

DEFF = 1 + (m - 1)\rho

where ρ = ICC (proportion of total variance due to between-cluster differences), m = average cluster size

The ICC represents the correlation between two randomly selected individuals within the same cluster:

\rho = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_w}

σ²_b = between-cluster variance, σ²_w = within-cluster variance
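As a quick worked example (values chosen for illustration), a trial with clusters of m = 30 and ρ = 0.05 has

DEFF = 1 + (30 - 1) \times 0.05 = 2.45

so it needs roughly 2.45 times as many participants as an individually randomized trial with the same power.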

2.2 Continuous Outcomes

For a two-arm parallel CRT comparing means with k clusters per arm and m individuals per cluster:

k = \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 [1 + (m-1)\rho]}{m \cdot \delta^2}

k = clusters per arm, m = cluster size, δ = μ₁ - μ₂ (treatment effect), σ² = total variance

Rearranging for cluster size m given fixed k:

m = \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 (1 - \rho)}{k \cdot \delta^2 - 2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 \rho}
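When the number of available clusters is fixed in advance (for example, a known set of practices), this rearrangement can be applied directly. A minimal R sketch (the helper name crt_cluster_size and the example values are ours, for illustration only):

R
# Cluster size needed when the number of clusters per arm (k) is fixed
crt_cluster_size <- function(delta, sigma, k, icc, alpha = 0.05, power = 0.80) {
  c2 <- (qnorm(1 - alpha / 2) + qnorm(power))^2
  denom <- k * delta^2 - 2 * c2 * sigma^2 * icc
  if (denom <= 0) return(Inf)   # no cluster size can reach the target power with k clusters
  ceiling(2 * c2 * sigma^2 * (1 - icc) / denom)
}

# Example: 8 clusters per arm available, delta = 0.5 SD, ICC = 0.05
crt_cluster_size(delta = 0.5, sigma = 1, k = 8, icc = 0.05)
# ≈ 13 individuals per cluster under these assumptions

Note that when k·δ² is smaller than 2(z_{1-α/2} + z_{1-β})²σ²ρ, no cluster size achieves the target power; the only remedy is more clusters.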

2.3 Binary Outcomes

For comparing proportions p₁ and p₂:

k = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 [p_1(1-p_1) + p_2(1-p_2)][1 + (m-1)\rho]}{m(p_1 - p_2)^2}

For binary outcomes, this formula uses the ICC defined on the natural (proportion) scale; ICCs reported on the latent logistic scale are not directly interchangeable.

ICC for Binary Outcomes

For proportions, the relationship between the ICC and the between-cluster coefficient of variation (CV) of the true cluster-level proportions is ρ = CV² × p/(1 - p). Published ICCs may use different definitions; verify the scale when using literature values.
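For example (illustrative numbers), if cluster-level prevalences vary with a between-cluster CV of 0.25 around p = 0.30, the implied ICC is ρ = 0.25² × 0.30/0.70 ≈ 0.027.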

2.4 Survival Outcomes

For time-to-event outcomes, the design effect inflates the required number of events and accounts for within-cluster correlation in failure times:

E_{CRT} = E_{IRT} \times DEFF = \frac{4(z_{1-\alpha/2} + z_{1-\beta})^2}{(\log HR)^2} \times [1 + (m-1)\rho]

E = required events, HR = hazard ratio
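A minimal R sketch of this events calculation (the function name crt_survival_events and the example values are ours, for illustration only):

R
# Events required for a CRT with a time-to-event outcome
crt_survival_events <- function(hr, m, icc, alpha = 0.05, power = 0.80) {
  # hr: hazard ratio to detect; m: cluster size; icc: intraclass correlation
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  events_irt <- 4 * z^2 / log(hr)^2   # events for an individually randomized trial (1:1 allocation)
  deff <- 1 + (m - 1) * icc           # design effect
  ceiling(events_irt * deff)          # events required for the clustered design
}

# Example: HR = 0.75, 40 individuals per cluster, ICC = 0.02
crt_survival_events(hr = 0.75, m = 40, icc = 0.02)
# ≈ 676 events under these assumptions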

2.5 Unequal Cluster Sizes

When cluster sizes vary, the design effect increases. The relative efficiency (RE) compared to equal clusters depends on the coefficient of variation of cluster sizes (CV_m):

DEFF_{unequal} = [1 + (m-1)\rho] \times \left[1 + CV_m^2 \times \frac{(m-1)\rho}{1 + (m-1)\rho}\right]

CV_m = standard deviation of cluster sizes / mean cluster size

For moderate variability (CV_m ≤ 0.5), a simpler approximation uses:

DEFF_{unequal} \approx 1 + (\bar{m} + CV_m^2 \bar{m} - 1)\rho
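In practice, CV_m can be computed directly from an anticipated list of cluster sizes. A small R helper illustrating the approximation above (deff_unequal is our illustrative name, not a package function):

R
# Design effect allowing for unequal cluster sizes
deff_unequal <- function(cluster_sizes, icc) {
  m_bar <- mean(cluster_sizes)
  cv_m  <- sd(cluster_sizes) / m_bar          # coefficient of variation of cluster sizes
  list(
    mean_size     = m_bar,
    cv            = cv_m,
    deff_equal    = 1 + (m_bar - 1) * icc,
    deff_unequal  = 1 + ((1 + cv_m^2) * m_bar - 1) * icc
  )
}

# Example: practice list sizes varying from 10 to 60 patients
deff_unequal(cluster_sizes = c(10, 20, 25, 30, 40, 55, 60), icc = 0.05)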

2.6 Matched-Pair Designs

When clusters are matched into pairs (e.g., by geography, size) and one cluster from each pair is randomized to treatment:

k_{pairs} = \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 [1 + (m-1)\rho]}{m \cdot \delta^2 (1 + \rho_c)}

ρ_c = correlation between matched clusters (typically 0.3–0.6), k_pairs = number of matched pairs

Matching can substantially reduce the required number of clusters when between-pair heterogeneity is large.

2.7 Stepped Wedge Designs

In a stepped wedge design, all clusters start in control and sequentially switch to intervention at randomly assigned times. The design effect for a complete stepped wedge with J clusters and T time periods:

DEFF_{SW} = \frac{3(1-\rho)(1 + T\rho)}{T(T-\frac{1}{T}) \times \frac{2-\rho}{2(1-\rho)}}

T = number of time periods (steps + 1), complete design with one cluster switching per period

For a more general stepped wedge with multiple clusters randomized at each step:

Var(\hat{\theta}_{SW}) = \frac{\sigma^2_w}{JM} \times \frac{1 + \rho(Tm - 1)}{T \times m} \times f(T, S)

f(T,S) is a design-specific factor depending on the number of steps and transition pattern

2.8 Clusters vs. Individuals: Key Insight

The degrees of freedom for treatment comparison depend primarily on the number of clusters, not individuals:

df \approx 2k - 2 \quad \text{(parallel CRT with } k \text{ clusters per arm)}

Adding individuals to existing clusters provides diminishing returns; adding clusters is usually more efficient

Rule of Thumb

With high ICC (ρ > 0.05), increasing clusters from 20 to 40 per arm typically provides more power than doubling cluster size from 50 to 100 individuals per cluster.
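A quick check of this rule of thumb using effective sample sizes (total n per arm divided by the design effect), with ρ = 0.05 for illustration:

R
# Effective sample size per arm = (k * m) / [1 + (m - 1) * icc]
eff_n <- function(k, m, icc) k * m / (1 + (m - 1) * icc)

icc <- 0.05
eff_n(k = 20, m = 50,  icc)   # baseline: 20 clusters of 50   -> ~290
eff_n(k = 40, m = 50,  icc)   # doubling the clusters         -> ~580
eff_n(k = 20, m = 100, icc)   # doubling the cluster size     -> ~336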

3. Assumptions

Core Assumptions

| Assumption | Testable? | If Violated |
|---|---|---|
| ICC is constant across clusters | Yes (variance tests) | Use robust variance estimation; sensitivity analysis |
| No contamination between arms | Partially | Effect dilution; estimate contamination rate |
| Random cluster selection | Design feature | Generalizability limited to similar clusters |
| No informative cluster size | Partially | Consider size-weighted or hierarchical analysis |
| Exchangeable correlation structure | Yes (model comparison) | Consider nested or time-varying ICC |
| No differential attrition by cluster | Yes (descriptive) | Model cluster-level dropout; sensitivity analysis |

ICC Reference Values

| Outcome Type | Cluster Type | Typical ICC | Source |
|---|---|---|---|
| Physiological (BP, HbA1c) | GP practice | 0.01–0.05 | Adams et al. 2004 |
| Process (screening rates) | GP practice | 0.05–0.15 | Campbell et al. 2005 |
| Knowledge/attitudes | Schools | 0.10–0.25 | Hedges & Hedberg 2007 |
| Behavioral | Worksites | 0.02–0.08 | Murray 1998 |
| Infectious disease | Villages | 0.01–0.10 | Hayes & Moulton 2017 |

Minimum Cluster Requirements

| Clusters per Arm | Degrees of Freedom | Assessment |
|---|---|---|
| 3–5 | 4–8 | Severely limited; permutation tests required |
| 6–10 | 10–18 | Small sample corrections essential (e.g., Kenward-Roger) |
| 11–20 | 20–38 | Adequate for most designs with corrections |
| > 20 | > 38 | Standard methods appropriate |
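Where only a modest number of clusters is available, a common analysis-stage safeguard is a linear mixed model with small-sample degrees of freedom. A minimal sketch, assuming a participant-level data frame trial_data with columns outcome, arm, and cluster (these names are placeholders, not from the text above):

R
library(lme4)
library(lmerTest)   # Satterthwaite / Kenward-Roger df for lmer models (KR requires pbkrtest)

# Random intercept for cluster; fixed effect for treatment arm
fit <- lmer(outcome ~ arm + (1 | cluster), data = trial_data)

# Treatment effect test with the Kenward-Roger small-sample correction
anova(fit, ddf = "Kenward-Roger")

# Model-based ICC from the fitted variance components
performance::icc(fit)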

4. Regulatory Guidance

FDA Guidance

“Cluster randomized trials... require special design and analysis considerations. The protocol should specify the anticipated intraclass correlation coefficient (ICC) and its justification, the design effect used in sample size calculations, and the planned method for accounting for clustering in the analysis.”

— FDA Guidance: Design and Analysis of Shedding Studies for Virus or Bacteria-Based Gene Therapy and Oncolytic Products (2015)

EMA Guidance

“For cluster randomised trials, the statistical analysis should account for the clustering structure. Mixed effects models or generalised estimating equations (GEE) are recommended. The number of clusters should be sufficient to provide adequate degrees of freedom for inference.”

— EMA Guideline on multiplicity issues in clinical trials (EMA/CHMP/44762/2017)

CONSORT Extension for Cluster Trials

The CONSORT extension for cluster randomized trials (Campbell et al., 2012) requires reporting:

  • Rationale for using cluster randomization
  • How clustering was accounted for in sample size calculation
  • ICC value used and its justification
  • Observed ICC from the trial data
  • Analysis method accounting for clustering
  • Number of clusters randomized, receiving intervention, and analyzed

5. Validation Against Industry Standards

Parallel CRT, Continuous Outcome

Testing parameters: δ = 0.5 (effect size), σ = 1.0, α = 0.05 (two-sided), power = 80%, m = 30 individuals per cluster, ρ = 0.05.

| Software | Clusters per Arm | Total Clusters | Design Effect |
|---|---|---|---|
| PASS 2024 | 11 | 22 | 2.45 |
| nQuery 9.5 | 11 | 22 | 2.45 |
| Stata (clustersampsi) | 11 | 22 | 2.45 |
| R (clusterPower) | 11 | 22 | 2.45 |
| Zetyra | 11 | 22 | 2.45 |

Binary Outcome

Testing parameters: p₁ = 0.30, p₂ = 0.20, α = 0.05 (two-sided), power = 80%, m = 50 individuals per cluster, ρ = 0.03.

| Software | Clusters per Arm | Total Individuals |
|---|---|---|
| PASS 2024 | 16 | 1,600 |
| nQuery 9.5 | 16 | 1,600 |
| Zetyra | 16 | 1,600 |

Stepped Wedge Design

Testing parameters: δ = 0.3, σ = 1.0, α = 0.05 (two-sided), power = 80%, 4 time periods, 20 individuals per cluster-period, ρ = 0.05.

| Software | Total Clusters | Method |
|---|---|---|
| R (swCRTdesign) | 8 | Hussey & Hughes |
| Stata (steppedwedge) | 8 | Hussey & Hughes |
| Zetyra | 8 | Hussey & Hughes |

6. Example SAP Language

Parallel CRT

STATISTICAL ANALYSIS PLAN TEMPLATE
Sample Size Justification

This is a cluster randomized controlled trial with general practices as the unit of randomization. We assume an intraclass correlation coefficient (ICC) of 0.05, based on published estimates for similar outcomes in primary care settings (Campbell et al., 2005). With 30 patients per practice, the design effect is:

DEFF = 1 + (30 - 1) × 0.05 = 2.45

To detect a standardized mean difference of 0.4 with 80% power at α = 0.05 (two-sided), assuming σ = 1.0, an individually randomized trial would require approximately 100 patients per arm (200 total). Applying the design effect gives n = 100 × 2.45 = 245 per arm. With 30 patients per practice: k = 245/30 = 8.2 ≈ 9 practices per arm (18 total practices, 540 patients). Allowing for 15% practice dropout, we will recruit 11 practices per arm (22 total, 660 patients).

Matched-Pair CRT

STATISTICAL ANALYSIS PLAN TEMPLATE
Sample Size Justification

Villages will be matched into pairs based on population size, geographic region, and baseline prevalence of the outcome. Within each pair, one village will be randomized to intervention.

Assumptions:
  • Intraclass correlation coefficient (ICC): ρ = 0.08
  • Within-pair correlation: ρ_c = 0.5
  • Effect size: 15 percentage point reduction (50% to 35%)
  • Average of 40 eligible adults per village
  • Power: 80%, α = 0.05 (two-sided)

Design effect = [1 + (40 - 1) × 0.08] / (1 + 0.5) = 4.12 / 1.5 = 2.75

Compared to an unmatched design (DEFF = 4.12), matching reduces the required sample size by approximately 33%. Required: 24 matched pairs (48 villages, 1,920 individuals).

Stepped Wedge CRT

STATISTICAL ANALYSIS PLAN TEMPLATE
Sample Size Justification

This stepped wedge cluster randomized trial includes 12 hospitals with 4 time periods (3 steps). At each step, 4 hospitals cross from control to intervention. Outcomes are measured in repeated cross-sections at each period.

Assumptions:
  • ICC: ρ = 0.03
  • Cluster autocorrelation: 0.8
  • 50 patients per hospital per time period
  • Effect size: odds ratio = 0.70
  • Power: 80%, α = 0.05 (two-sided)

Using the Hussey & Hughes (2007) formula for a complete stepped wedge design with the above parameters, 12 clusters provide 82% power for detecting the specified effect. Total sample: 12 hospitals × 4 periods × 50 patients = 2,400.

Analysis will use a generalized linear mixed model with random effects for cluster and fixed effects for time period and intervention status.

7. R Code

Design Effect Calculation

R
# Design effect and effective sample size
design_effect <- function(m, icc) {
  # m: cluster size
  # icc: intraclass correlation coefficient
  deff <- 1 + (m - 1) * icc
  return(deff)
}

effective_n <- function(n_total, m, icc) {
  # n_total: total sample size (k * m)
  # Returns: effective sample size accounting for clustering
  deff <- design_effect(m, icc)
  n_eff <- n_total / deff
  return(list(
    deff = deff,
    n_effective = n_eff,
    efficiency = n_eff / n_total
  ))
}

# Example
effective_n(n_total = 1000, m = 50, icc = 0.05)
# $deff = 3.45
# $n_effective = 289.9
# $efficiency = 0.29

Parallel CRT: Continuous Outcome

R
crt_continuous <- function(delta, sigma, m, icc,
                            alpha = 0.05, power = 0.80) {
  # delta: treatment effect (difference in means)
  # sigma: total standard deviation
  # m: cluster size
  # icc: intraclass correlation coefficient
  # Returns: clusters per arm

  z_alpha <- qnorm(1 - alpha/2)
  z_beta <- qnorm(power)

  # Design effect
  deff <- 1 + (m - 1) * icc

  # Clusters per arm
  k <- (2 * (z_alpha + z_beta)^2 * sigma^2 * deff) / (m * delta^2)

  return(list(
    clusters_per_arm = ceiling(k),
    total_clusters = ceiling(k) * 2,
    total_n = ceiling(k) * 2 * m,
    design_effect = deff
  ))
}

# Example: detect 0.5 SD difference
crt_continuous(delta = 0.5, sigma = 1.0, m = 30, icc = 0.05)
# clusters_per_arm = 11
# total_clusters = 22
# total_n = 660
# design_effect = 2.45

Parallel CRT: Binary Outcome

R
crt_binary <- function(p1, p2, m, icc, alpha = 0.05, power = 0.80) {
  # p1: control proportion
  # p2: intervention proportion
  # m: cluster size
  # icc: intraclass correlation coefficient

  z_alpha <- qnorm(1 - alpha/2)
  z_beta <- qnorm(power)

  # Design effect
  deff <- 1 + (m - 1) * icc

  # Variance components
  var_pooled <- p1 * (1 - p1) + p2 * (1 - p2)

  # Clusters per arm
  k <- ((z_alpha + z_beta)^2 * var_pooled * deff) / (m * (p1 - p2)^2)

  return(list(
    clusters_per_arm = ceiling(k),
    total_clusters = ceiling(k) * 2,
    total_n = ceiling(k) * 2 * m,
    design_effect = deff
  ))
}

# Example: detect reduction from 30% to 20%
crt_binary(p1 = 0.30, p2 = 0.20, m = 50, icc = 0.03)
# clusters_per_arm = 16
# total_clusters = 32
# total_n = 1600

Using clusterPower Package

R
# install.packages("clusterPower")
library(clusterPower)

# Continuous outcome: find number of clusters
cpa.normal(nclusters = NA, nsubjects = 30,
           d = 0.5, ICC = 0.05,
           vart = 1, alpha = 0.05, power = 0.80)
# nclusters = 22 (11 per arm)

# Binary outcome: find power for given design
cpa.binary(nclusters = 40, nsubjects = 50,
           p1 = 0.30, p2 = 0.20, ICC = 0.03,
           pooled = TRUE, alpha = 0.05)
# power = 0.89

# Cluster size sensitivity analysis: clusters per arm across a range of cluster sizes
sapply(c(20, 30, 50, 100), function(m) {
  cpa.normal(nclusters = NA, nsubjects = m,
             d = 0.4, ICC = 0.05, vart = 1,
             alpha = 0.05, power = 0.80)
})

Matched-Pair CRT

R
matched_pair_crt <- function(delta, sigma, m, icc, rho_c,
                              alpha = 0.05, power = 0.80) {
  # rho_c: within-pair correlation (between matched clusters)
  # Returns: number of matched pairs

  z_alpha <- qnorm(1 - alpha/2)
  z_beta <- qnorm(power)

  # Design effect for matched pairs
  deff <- 1 + (m - 1) * icc

  # Variance reduction from matching
  var_reduction <- 1 + rho_c

  # Pairs needed
  pairs <- (2 * (z_alpha + z_beta)^2 * sigma^2 * deff) /
           (m * delta^2 * var_reduction)

  return(list(
    pairs = ceiling(pairs),
    total_clusters = ceiling(pairs) * 2,
    total_n = ceiling(pairs) * 2 * m,
    design_effect = deff,
    efficiency_gain = var_reduction
  ))
}

# Example: matched-pair design with moderate pair correlation
matched_pair_crt(delta = 0.4, sigma = 1.0, m = 40,
                 icc = 0.08, rho_c = 0.5)
# pairs = 7
# total_clusters = 14
# efficiency_gain = 1.5 (about 33% fewer clusters than an unmatched design)

Stepped Wedge Design (Hussey & Hughes)

R
# Closed-form power for a cross-sectional stepped wedge design,
# following Hussey & Hughes (2007). The swCRTdesign package
# (swDsn(), swPwr()) implements the same calculation and can be
# used as a cross-check.

sw_power <- function(X, n, delta, sigma, icc, alpha = 0.05) {
  # X: cluster-by-period treatment matrix (0 = control, 1 = intervention),
  #    one row per cluster
  # n: individuals per cluster-period
  # delta: treatment effect; sigma: total SD; icc: intraclass correlation

  I <- nrow(X)
  periods <- ncol(X)
  tau2 <- icc * sigma^2               # between-cluster variance
  s2   <- (1 - icc) * sigma^2 / n     # variance of a cluster-period mean

  U <- sum(X)                         # cluster-periods on intervention
  W <- sum(colSums(X)^2)
  V <- sum(rowSums(X)^2)

  var_theta <- I * s2 * (s2 + periods * tau2) /
    ((I * U - W) * s2 +
     (U^2 + I * periods * U - periods * W - I * V) * tau2)

  power <- pnorm(abs(delta) / sqrt(var_theta) - qnorm(1 - alpha / 2))
  list(var_treatment_effect = var_theta, power = power)
}

# Design: 4 periods, 3 steps, 4 clusters crossing over at each step (12 clusters)
X <- rbind(
  matrix(rep(c(0, 1, 1, 1), each = 4), nrow = 4),  # wave 1: switches at period 2
  matrix(rep(c(0, 0, 1, 1), each = 4), nrow = 4),  # wave 2: switches at period 3
  matrix(rep(c(0, 0, 0, 1), each = 4), nrow = 4)   # wave 3: switches at period 4
)

sw_power(X, n = 20, delta = 0.3, sigma = 1, icc = 0.05)
# power ≈ 0.76 under this simplified model (no cluster autocorrelation);
# increase the clusters per wave and re-run to reach a target such as 80%

ICC Estimation from Pilot Data

R
library(lme4)
library(performance)

# Estimate ICC from pilot data
# data: data frame with 'outcome' and 'cluster' variables
estimate_icc <- function(data, outcome_var, cluster_var) {
  formula <- as.formula(paste(outcome_var, "~ 1 + (1|", cluster_var, ")"))

  # Fit intercept-only model with a random cluster effect
  model <- lmer(formula, data = data)

  # Extract variance components
  vc <- as.data.frame(VarCorr(model))
  var_between <- vc[vc$grp == cluster_var, "vcov"]
  var_within <- sigma(model)^2

  icc_est <- var_between / (var_between + var_within)

  # Cross-check with the performance package (adjusted ICC)
  icc_perf <- performance::icc(model)

  return(list(
    icc = icc_est,
    icc_adjusted = icc_perf$ICC_adjusted,
    var_between = var_between,
    var_within = var_within
  ))
}

# Example with simulated data
set.seed(123)
n_clusters <- 30
n_per_cluster <- 20
true_icc <- 0.05

cluster_effects <- rnorm(n_clusters, 0, sqrt(true_icc))
pilot_data <- data.frame(
  cluster = rep(1:n_clusters, each = n_per_cluster),
  outcome = rnorm(n_clusters * n_per_cluster, 0, sqrt(1 - true_icc)) +
            rep(cluster_effects, each = n_per_cluster)
)

estimate_icc(pilot_data, "outcome", "cluster")
# icc ≈ 0.05 (with the performance::icc() adjusted ICC as a cross-check)

Sensitivity Analysis for ICC

R
# Sensitivity analysis across ICC range
icc_sensitivity <- function(delta, sigma, m,
                            icc_range = seq(0.01, 0.15, 0.01),
                            alpha = 0.05, power = 0.80) {

  results <- sapply(icc_range, function(icc) {
    res <- crt_continuous(delta, sigma, m, icc, alpha, power)
    c(icc = icc,
      deff = res$design_effect,
      clusters = res$total_clusters,
      total_n = res$total_n)
  })

  return(as.data.frame(t(results)))
}

# Generate sensitivity table
sens <- icc_sensitivity(delta = 0.4, sigma = 1.0, m = 30)
print(sens)

# Plot
library(ggplot2)
ggplot(sens, aes(x = icc, y = clusters)) +
  geom_line(size = 1, color = "#3B82F6") +
  geom_point(size = 2, color = "#3B82F6") +
  labs(
    title = "Required Clusters vs. ICC",
    x = "Intraclass Correlation Coefficient",
    y = "Total Clusters Required"
  ) +
  theme_minimal()

# Table for protocol
knitr::kable(sens, digits = 2,
             col.names = c("ICC", "Design Effect",
                          "Total Clusters", "Total N"))

8. References

Campbell MK, Piaggio G, Elbourne DR, Altman DG (2012). CONSORT 2010 statement: extension to cluster randomised trials. BMJ, 345:e5661.

Donner A, Klar N (2000). Design and Analysis of Cluster Randomization Trials in Health Research. Arnold Publishers.

Hayes RJ, Moulton LH (2017). Cluster Randomised Trials, 2nd ed. Chapman and Hall/CRC.

Hemming K, Taljaard M (2016). Sample size calculations for stepped wedge and cluster randomised trials: a unified approach. Journal of Clinical Epidemiology, 69:137-146.

Hussey MA, Hughes JP (2007). Design and analysis of stepped wedge cluster randomized trials. Contemporary Clinical Trials, 28(2):182-191.

Eldridge SM, Ashby D, Kerry S (2006). Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. International Journal of Epidemiology, 35(5):1292-1300.

Campbell MK, Fayers PM, Grimshaw JM (2005). Determinants of the intracluster correlation coefficient in cluster randomized trials: the case of implementation research. Clinical Trials, 2(2):99-107.

Adams G, Gulliford MC, Ukoumunne OC, et al. (2004). Patterns of intra-cluster correlation from primary care research to inform study design and analysis. Journal of Clinical Epidemiology, 57(8):785-794.

Ready to Calculate?

Use our Cluster Randomized Trial Calculator to determine the optimal number of clusters and cluster size for your study.

Sample Size Calculator
