Sample Size for Cluster Randomized Trials
Comprehensive power analysis for trials where groups (clusters) rather than individuals are randomized to intervention or control.
1. When to Use This Method
Cluster randomization (group randomization) assigns entire groups—clinics, schools, villages, hospitals—to treatment arms rather than individual participants. This design is required when individual randomization is impossible, impractical, or would introduce contamination bias.
Criteria for Cluster Randomization
| Criterion | Description | Example |
|---|---|---|
| Intervention at group level | Intervention cannot be delivered to individuals independently | Hospital-wide infection control protocol |
| Contamination risk | Treatment effects may spread between participants in same setting | Hand hygiene education in shared wards |
| Administrative convenience | Randomizing groups is logistically simpler | Classroom-based educational intervention |
| Herd effects | Intervention benefits depend on community-level coverage | Vaccination campaigns |
Common Applications
Primary Care Research
GP practices randomized to implement screening protocols, prescribing guidelines, or quality improvement interventions.
Community Trials
Villages or neighborhoods assigned to public health interventions, water treatment, or vector control programs.
Educational Research
Schools or classrooms randomized to test curricula, teaching methods, or behavioral interventions.
Infection Control
Hospital wards or units testing hygiene protocols, antibiotic stewardship, or surveillance systems.
Contraindications
- Very few clusters available (<6 per arm severely limits power)
- Very small clusters where individual randomization is feasible
- No plausible contamination pathway and intervention can target individuals
- Cluster-level treatment assignment is not operationally required
2. Mathematical Formulation
2.1 Design Effect (DEFF)
The design effect quantifies the variance inflation due to clustering. It depends on the intraclass correlation coefficient (ICC, ρ) and cluster size (m):
$$\mathrm{DEFF} = 1 + (m - 1)\rho$$
where ρ = ICC (proportion of total variance due to between-cluster differences), m = average cluster size
The ICC represents the correlation between two randomly selected individuals within the same cluster:
$$\rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}$$
σ²_b = between-cluster variance, σ²_w = within-cluster variance
2.2 Continuous Outcomes
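For a two-arm parallel CRT comparing means with k clusters per arm and m individuals per cluster:
$$k = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2\,[1 + (m - 1)\rho]}{m\,\delta^2}$$
k = clusters per arm, m = cluster size, δ = μ₁ - μ₂ (treatment effect), σ² = total variance
Rearranging for cluster size m given fixed k:
$$m = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2\,(1 - \rho)}{k\,\delta^2 - 2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2\,\rho}$$
A solution exists only when the denominator is positive, i.e. when k exceeds 2(z_{1-α/2} + z_{1-β})²σ²ρ/δ²; below that number of clusters, no cluster size reaches the target power.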
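The rearrangement can be sketched in R. This is a minimal illustration under the normal approximation, assuming the continuous-outcome formula k = 2(z_{1-α/2} + z_{1-β})²σ²[1 + (m - 1)ρ] / (mδ²); the function name `cluster_size_given_k` is hypothetical, not part of any package.

```r
# Solve the continuous-outcome CRT formula for cluster size m,
# given a fixed number of clusters per arm k (normal approximation).
cluster_size_given_k <- function(k, delta, sigma, icc,
                                 alpha = 0.05, power = 0.80) {
  z_sum_sq <- (qnorm(1 - alpha / 2) + qnorm(power))^2
  # Per-arm sample size under individual randomization
  n_ind <- 2 * z_sum_sq * sigma^2 / delta^2
  # m = n_ind * (1 - icc) / (k - n_ind * icc); needs a positive denominator
  denom <- k - n_ind * icc
  if (denom <= 0) {
    return(NA_real_)  # too few clusters: no cluster size reaches the target power
  }
  ceiling(n_ind * (1 - icc) / denom)
}

cluster_size_given_k(k = 11, delta = 0.5, sigma = 1, icc = 0.05)
cluster_size_given_k(k = 3,  delta = 0.5, sigma = 1, icc = 0.05)  # NA: infeasible
```

Returning `NA` rather than stopping makes it easy to scan a range of k values and see where the design first becomes feasible.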
2.3 Binary Outcomes
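For comparing proportions p₁ and p₂:
$$k = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\,[p_1(1 - p_1) + p_2(1 - p_2)]\,[1 + (m - 1)\rho]}{m\,(p_1 - p_2)^2}$$
For binary outcomes, the ICC is defined on the correlation scale, not the variance scale.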
ICC for Binary Outcomes
For proportions, the relationship between ICC and the between-cluster coefficient of variation (CV) is: ρ = CV² × p / (1 − p). This follows from ρ = σ²_b / [p(1 − p)] with CV = σ_b / p. Published ICCs may use different definitions; verify the scale when using literature values.
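A small conversion helper can make the scale check concrete. This sketch assumes CV is the between-cluster standard deviation of the true cluster proportions divided by the overall proportion p, and that ρ = σ²_b / [p(1 − p)], which gives ρ = CV² × p / (1 − p). The function names are hypothetical.

```r
# Convert between the between-cluster coefficient of variation (CV)
# and the ICC for a binary outcome with overall proportion p.
cv_to_icc <- function(cv, p) {
  stopifnot(p > 0, p < 1, cv >= 0)
  cv^2 * p / (1 - p)
}

icc_to_cv <- function(icc, p) {
  stopifnot(p > 0, p < 1, icc >= 0)
  sqrt(icc * (1 - p) / p)
}

# The same CV implies a different ICC at different prevalences
cv_to_icc(cv = 0.25, p = 0.20)
cv_to_icc(cv = 0.25, p = 0.50)
```

Because the implied ICC grows with p for a fixed CV, a literature value reported at one prevalence should not be carried over unchanged to a setting with a very different prevalence.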
2.4 Survival Outcomes
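For time-to-event outcomes, the design effect inflates the required number of events and accounts for within-cluster correlation in failure times:
$$E = \frac{4\,(z_{1-\alpha/2} + z_{1-\beta})^2}{(\ln \mathrm{HR})^2}\,[1 + (m - 1)\rho]$$
E = required events (total across both arms, assuming 1:1 allocation), HR = hazard ratio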
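As an illustration, the sketch below assumes the standard Schoenfeld approximation for the number of events under individual randomization with 1:1 allocation, 4(z_{1-α/2} + z_{1-β})² / (ln HR)², multiplied by the design effect 1 + (m − 1)ρ. The function name `crt_survival_events` is hypothetical.

```r
# Required number of events for a two-arm CRT with a time-to-event outcome.
# Assumes Schoenfeld's approximation (1:1 allocation) inflated by the
# usual design effect; a planning sketch, not a validated routine.
crt_survival_events <- function(hr, m, icc, alpha = 0.05, power = 0.80) {
  stopifnot(hr > 0, hr != 1)
  z_sum_sq <- (qnorm(1 - alpha / 2) + qnorm(power))^2
  events_ind <- 4 * z_sum_sq / log(hr)^2   # individually randomized trial
  deff <- 1 + (m - 1) * icc                # variance inflation from clustering
  list(
    events_individual = ceiling(events_ind),
    design_effect = deff,
    events_crt = ceiling(events_ind * deff)
  )
}

crt_survival_events(hr = 0.7, m = 20, icc = 0.02)
```

Converting events into clusters additionally requires an assumed event probability per participant over follow-up, which is not covered here.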
2.5 Unequal Cluster Sizes
When cluster sizes vary, the design effect increases. The relative efficiency (RE) compared to equal clusters depends on the coefficient of variation of cluster sizes (CV_m):
CV_m = standard deviation of cluster sizes / mean cluster size
For moderate variability (CV_m ≤ 0.5), a simpler approximation uses:
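One widely cited adjustment, from Eldridge et al. (2006, listed in the references), inflates the design effect to 1 + [(CV_m² + 1)·m̄ − 1]·ρ, where m̄ is the mean cluster size. The sketch below assumes that form; other relative-efficiency expressions exist and may give slightly different answers. Function names are hypothetical.

```r
# Design effect allowing for variable cluster sizes, assuming the
# Eldridge et al. (2006) form: 1 + ((cv_m^2 + 1) * m_bar - 1) * icc
deff_unequal <- function(m_bar, cv_m, icc) {
  1 + ((cv_m^2 + 1) * m_bar - 1) * icc
}

# Coefficient of variation of cluster sizes from a vector of planned sizes
cv_cluster_size <- function(sizes) sd(sizes) / mean(sizes)

deff_equal <- deff_unequal(m_bar = 30, cv_m = 0,   icc = 0.05)  # 1 + (m - 1) * icc
deff_var   <- deff_unequal(m_bar = 30, cv_m = 0.5, icc = 0.05)

# Relative efficiency of the unequal design versus equal cluster sizes
deff_equal / deff_var
```

With CV_m = 0 the expression reduces to the usual 1 + (m − 1)ρ, which is a quick consistency check on any implementation.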
2.6 Matched-Pair Designs
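When clusters are matched into pairs (e.g., by geography, size) and one cluster from each pair is randomized to treatment:
$$k_{\text{pairs}} = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2\,[1 + (m - 1)\rho]}{m\,\delta^2\,(1 + \rho_c)}$$
ρ_c = correlation between matched clusters (typically 0.3–0.6), k_pairs = number of matched pairs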
Matching can substantially reduce the required number of clusters when between-pair heterogeneity is large.
2.7 Stepped Wedge Designs
In a stepped wedge design, all clusters start in control and sequentially switch to intervention at randomly assigned times. The design effect for a complete stepped wedge with J clusters and T time periods:
T = number of time periods (steps + 1), complete design with one cluster switching per period
For a more general stepped wedge with clusters per step:
f(T,S) is a design-specific factor depending on the number of steps and transition pattern
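For a cross-sectional stepped wedge, power can also be computed directly from the design matrix using the closed-form variance of Hussey & Hughes (2007, listed in the references). The sketch below assumes a random cluster intercept only, fixed period effects, equal cluster-period sizes, and a normal approximation; `sw_power_hh` and its argument names are hypothetical.

```r
# Power for a stepped wedge CRT via the Hussey & Hughes (2007) variance.
# X: sequence-by-period 0/1 matrix (0 = control, 1 = intervention)
sw_power_hh <- function(X, clusters_per_seq, m, delta, sigma_total, icc,
                        alpha = 0.05) {
  # One row per cluster: repeat each sequence row clusters_per_seq times
  Xc <- X[rep(seq_len(nrow(X)), each = clusters_per_seq), , drop = FALSE]
  n_cl  <- nrow(Xc)                       # total clusters
  n_per <- ncol(Xc)                       # time periods
  tau2 <- icc * sigma_total^2             # between-cluster variance
  sig2 <- (1 - icc) * sigma_total^2 / m   # variance of a cluster-period mean
  U <- sum(Xc)
  W <- sum(colSums(Xc)^2)
  V <- sum(rowSums(Xc)^2)
  var_theta <- n_cl * sig2 * (sig2 + n_per * tau2) /
    ((n_cl * U - W) * sig2 +
       (U^2 + n_cl * n_per * U - n_per * W - n_cl * V) * tau2)
  pnorm(abs(delta) / sqrt(var_theta) - qnorm(1 - alpha / 2))
}

# 3 sequences, 4 periods, all clusters start in control
X <- matrix(c(0, 1, 1, 1,
              0, 0, 1, 1,
              0, 0, 0, 1), nrow = 3, byrow = TRUE)
sw_power_hh(X, clusters_per_seq = 4, m = 20,
            delta = 0.3, sigma_total = 1, icc = 0.05)
```

Working from the design matrix rather than a single design-effect factor means incomplete or unbalanced roll-out patterns can be evaluated by editing X alone.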
2.8 Clusters vs. Individuals: Key Insight
The degrees of freedom for treatment comparison depend primarily on the number of clusters, not individuals. For a two-arm parallel design with k clusters per arm analyzed at the cluster level:
$$df = 2(k - 1)$$
Adding individuals to existing clusters provides diminishing returns; adding clusters is usually more efficient.
Rule of Thumb
With high ICC (ρ > 0.05), increasing clusters from 20 to 40 per arm typically provides more power than doubling cluster size from 50 to 100 individuals per cluster.
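The rule of thumb can be checked numerically. The sketch below inverts the continuous-outcome sample size formula to obtain power under the normal approximation, then compares doubling the number of clusters with doubling the cluster size. The function name `crt_power` and the effect size δ = 0.2 are illustrative assumptions.

```r
# Approximate power for a two-arm parallel CRT with a continuous outcome
# (normal approximation; k clusters per arm, m individuals per cluster).
crt_power <- function(k, m, delta, sigma, icc, alpha = 0.05) {
  deff <- 1 + (m - 1) * icc
  se <- sigma * sqrt(2 * deff / (k * m))  # SE of the difference in means
  pnorm(abs(delta) / se - qnorm(1 - alpha / 2))
}

baseline        <- crt_power(k = 20, m = 50,  delta = 0.2, sigma = 1, icc = 0.05)
double_clusters <- crt_power(k = 40, m = 50,  delta = 0.2, sigma = 1, icc = 0.05)
double_size     <- crt_power(k = 20, m = 100, delta = 0.2, sigma = 1, icc = 0.05)

round(c(baseline = baseline,
        double_clusters = double_clusters,
        double_size = double_size), 2)
```

Both alternatives recruit the same total number of individuals, so the difference in power comes entirely from how the design effect grows with m.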
3. Assumptions
Core Assumptions
| Assumption | Testable? | If Violated |
|---|---|---|
| ICC is constant across clusters | Yes (variance tests) | Use robust variance estimation; sensitivity analysis |
| No contamination between arms | Partially | Effect dilution; estimate contamination rate |
| Random cluster selection | Design feature | Generalizability limited to similar clusters |
| No informative cluster size | Partially | Consider size-weighted or hierarchical analysis |
| Exchangeable correlation structure | Yes (model comparison) | Consider nested or time-varying ICC |
| No differential attrition by cluster | Yes (descriptive) | Model cluster-level dropout; sensitivity analysis |
ICC Reference Values
| Outcome Type | Cluster Type | Typical ICC | Source |
|---|---|---|---|
| Physiological (BP, HbA1c) | GP practice | 0.01–0.05 | Adams et al. 2004 |
| Process (screening rates) | GP practice | 0.05–0.15 | Campbell et al. 2005 |
| Knowledge/attitudes | Schools | 0.10–0.25 | Hedges & Hedberg 2007 |
| Behavioral | Worksites | 0.02–0.08 | Murray 1998 |
| Infectious disease | Villages | 0.01–0.10 | Hayes & Moulton 2017 |
Minimum Cluster Requirements
| Clusters per Arm | Degrees of Freedom | Assessment |
|---|---|---|
| 3–5 | 4–8 | Severely limited; permutation tests required |
| 6–10 | 10–18 | Small sample corrections essential (e.g., Kenward-Roger) |
| 11–20 | 20–38 | Adequate for most designs with corrections |
| > 20 | > 38 | Standard methods appropriate |
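With few clusters, normal quantiles understate the required sample size. A simple planning heuristic, sketched below, replaces them with t quantiles on 2(k − 1) degrees of freedom and searches for the smallest k that satisfies the continuous-outcome formula. This is not a Kenward-Roger correction and is not a substitute for simulation; the function name `crt_clusters_t` is hypothetical.

```r
# Smallest number of clusters per arm satisfying the sample size formula
# when t quantiles with df = 2 * (k - 1) replace normal quantiles.
crt_clusters_t <- function(delta, sigma, m, icc,
                           alpha = 0.05, power = 0.80, k_max = 1000) {
  deff <- 1 + (m - 1) * icc
  for (k in 2:k_max) {
    df <- 2 * (k - 1)
    mult <- (qt(1 - alpha / 2, df) + qt(power, df))^2
    k_req <- 2 * mult * sigma^2 * deff / (m * delta^2)
    if (k >= k_req) {
      return(k)  # first k at which the requirement is met
    }
  }
  NA_integer_    # not reached within k_max
}

crt_clusters_t(delta = 0.5, sigma = 1, m = 30, icc = 0.05)
```

The gap between this answer and the normal-approximation answer shrinks as k grows, which is why the correction matters most in the top rows of the table above.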
4. Regulatory Guidance
FDA Guidance
“Cluster randomized trials... require special design and analysis considerations. The protocol should specify the anticipated intraclass correlation coefficient (ICC) and its justification, the design effect used in sample size calculations, and the planned method for accounting for clustering in the analysis.”
— FDA Guidance: Design and Analysis of Shedding Studies for Virus or Bacteria-Based Gene Therapy and Oncolytic Products (2015)
EMA Guidance
“For cluster randomised trials, the statistical analysis should account for the clustering structure. Mixed effects models or generalised estimating equations (GEE) are recommended. The number of clusters should be sufficient to provide adequate degrees of freedom for inference.”
— EMA Guideline on multiplicity issues in clinical trials (EMA/CHMP/44762/2017)
CONSORT Extension for Cluster Trials
The CONSORT extension for cluster randomized trials (Campbell et al., 2012) requires reporting:
- Rationale for using cluster randomization
- How clustering was accounted for in sample size calculation
- ICC value used and its justification
- Observed ICC from the trial data
- Analysis method accounting for clustering
- Number of clusters randomized, receiving intervention, and analyzed
5. Validation Against Industry Standards
Parallel CRT, Continuous Outcome
Testing parameters: δ = 0.5 (effect size), σ = 1.0, α = 0.05 (two-sided), power = 80%, m = 30 individuals per cluster, ρ = 0.05.
| Software | Clusters per Arm | Total Clusters | Design Effect |
|---|---|---|---|
| PASS 2024 | 11 | 22 | 2.45 |
| nQuery 9.5 | 11 | 22 | 2.45 |
| Stata (clustersampsi) | 11 | 22 | 2.45 |
| R (clusterPower) | 11 | 22 | 2.45 |
| Zetyra | 11 | 22 | 2.45 |
Binary Outcome
Testing parameters: p₁ = 0.30, p₂ = 0.20, α = 0.05 (two-sided), power = 80%, m = 50 individuals per cluster, ρ = 0.03.
| Software | Clusters per Arm | Total Individuals |
|---|---|---|
| PASS 2024 | 16 | 1,600 |
| nQuery 9.5 | 16 | 1,600 |
| Zetyra | 16 | 1,600 |
Stepped Wedge Design
Testing parameters: δ = 0.3, σ = 1.0, α = 0.05 (two-sided), power = 80%, 4 time periods, 20 individuals per cluster-period, ρ = 0.05.
| Software | Total Clusters | Method |
|---|---|---|
| R (swCRTdesign) | 8 | Hussey & Hughes |
| Stata (steppedwedge) | 8 | Hussey & Hughes |
| Zetyra | 8 | Hussey & Hughes |
6. Example SAP Language
Parallel CRT
Matched-Pair CRT
Stepped Wedge CRT
7. R Code
Design Effect Calculation
# Design effect and effective sample size
design_effect <- function(m, icc) {
  # m: cluster size
  # icc: intraclass correlation coefficient
  deff <- 1 + (m - 1) * icc
  return(deff)
}
effective_n <- function(n_total, m, icc) {
  # n_total: total sample size (k * m)
  # Returns: effective sample size accounting for clustering
  deff <- design_effect(m, icc)
  n_eff <- n_total / deff
  return(list(
    deff = deff,
    n_effective = n_eff,
    efficiency = n_eff / n_total
  ))
}
# Example
effective_n(n_total = 1000, m = 50, icc = 0.05)
# $deff = 3.45
# $n_effective = 289.9
# $efficiency = 0.29
Parallel CRT: Continuous Outcome
crt_continuous <- function(delta, sigma, m, icc,
                           alpha = 0.05, power = 0.80) {
  # delta: treatment effect (difference in means)
  # sigma: total standard deviation
  # m: cluster size
  # icc: intraclass correlation coefficient
  # Returns: clusters per arm (normal approximation, no small-sample correction)
  z_alpha <- qnorm(1 - alpha/2)
  z_beta <- qnorm(power)
  # Design effect
  deff <- 1 + (m - 1) * icc
  # Clusters per arm
  k <- (2 * (z_alpha + z_beta)^2 * sigma^2 * deff) / (m * delta^2)
  return(list(
    clusters_per_arm = ceiling(k),
    total_clusters = ceiling(k) * 2,
    total_n = ceiling(k) * 2 * m,
    design_effect = deff
  ))
}
# Example: detect 0.5 SD difference
crt_continuous(delta = 0.5, sigma = 1.0, m = 30, icc = 0.05)
# k = 5.13 before rounding, so this function returns:
# clusters_per_arm = 6
# total_clusters = 12
# total_n = 360
# design_effect = 2.45
Parallel CRT: Binary Outcome
crt_binary <- function(p1, p2, m, icc, alpha = 0.05, power = 0.80) {
  # p1: control proportion
  # p2: intervention proportion
  # m: cluster size
  # icc: intraclass correlation coefficient
  z_alpha <- qnorm(1 - alpha/2)
  z_beta <- qnorm(power)
  # Design effect
  deff <- 1 + (m - 1) * icc
  # Variance components (unpooled)
  var_pooled <- p1 * (1 - p1) + p2 * (1 - p2)
  # Clusters per arm
  k <- ((z_alpha + z_beta)^2 * var_pooled * deff) / (m * (p1 - p2)^2)
  return(list(
    clusters_per_arm = ceiling(k),
    total_clusters = ceiling(k) * 2,
    total_n = ceiling(k) * 2 * m,
    design_effect = deff
  ))
}
# Example: detect reduction from 30% to 20%
crt_binary(p1 = 0.30, p2 = 0.20, m = 50, icc = 0.03)
# k = 14.35 before rounding, so this function returns:
# clusters_per_arm = 15
# total_clusters = 30
# total_n = 1500
# design_effect = 2.47
Using clusterPower Package
# install.packages("clusterPower")
library(clusterPower)
# Argument names below follow the cpa.* interface (clusterPower 0.7 and later);
# confirm them with ?cpa.normal and ?cpa.binary for your installed version.
# Exactly one of power, nclusters, nsubjects, d must be NA: that is the quantity solved for.
# Continuous outcome: find number of clusters
cpa.normal(nclusters = NA, nsubjects = 30,
           d = 0.5, ICC = 0.05,
           vart = 1, alpha = 0.05, power = 0.80)
# Returns the required number of clusters per arm
# Binary outcome: find power for given design
cpa.binary(nclusters = 40, nsubjects = 50,
           p1 = 0.30, p2 = 0.20, ICC = 0.03,
           pooled = TRUE, alpha = 0.05, power = NA)
# Returns the power for 40 clusters per arm
# Cluster size sensitivity analysis
sapply(c(20, 30, 50, 100), function(m) {
  result <- cpa.normal(nclusters = NA, nsubjects = m,
                       d = 0.4, ICC = 0.05, vart = 1,
                       alpha = 0.05, power = 0.80)
  c(m = m, clusters = result[["nclusters"]])
})
Matched-Pair CRT
matched_pair_crt <- function(delta, sigma, m, icc, rho_c,
                             alpha = 0.05, power = 0.80) {
  # rho_c: within-pair correlation (between matched clusters)
  # Returns: number of matched pairs
  z_alpha <- qnorm(1 - alpha/2)
  z_beta <- qnorm(power)
  # Design effect for matched pairs
  deff <- 1 + (m - 1) * icc
  # Variance reduction from matching
  var_reduction <- 1 + rho_c
  # Pairs needed
  pairs <- (2 * (z_alpha + z_beta)^2 * sigma^2 * deff) /
    (m * delta^2 * var_reduction)
  return(list(
    pairs = ceiling(pairs),
    total_clusters = ceiling(pairs) * 2,
    total_n = ceiling(pairs) * 2 * m,
    design_effect = deff,
    efficiency_gain = var_reduction
  ))
}
# Example: matched-pair design with moderate pair correlation
matched_pair_crt(delta = 0.4, sigma = 1.0, m = 40,
                 icc = 0.08, rho_c = 0.5)
# pairs = 6.74 before rounding, so this function returns:
# pairs = 7
# total_clusters = 14
# total_n = 560
# design_effect = 4.12
# efficiency_gain = 1.5 (about one third fewer clusters than unmatched)
Stepped Wedge Design (swCRTdesign)
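# install.packages("swCRTdesign")
library(swCRTdesign)
# Calls below follow the swDsn()/swPwr() interface documented for swCRTdesign;
# confirm argument names with ?swPwr for your installed version.
# Define stepped wedge design: 4 periods, 3 steps, 4 clusters per sequence
# swDsn() builds the design matrix (0 = control, 1 = intervention):
#   0 1 1 1   Cluster group 1: switches at period 2
#   0 0 1 1   Cluster group 2: switches at period 3
#   0 0 0 1   Cluster group 3: switches at period 4
design <- swDsn(clusters = c(4, 4, 4))
# Split the total variance (total SD = 1) using the ICC
icc <- 0.05
sigma_y <- 1
tau <- sqrt(icc) * sigma_y        # between-cluster SD
sigma <- sqrt(1 - icc) * sigma_y  # within-cluster SD
# Power calculation for continuous outcome
swPwr(design, distn = "gaussian",
      n = 20,              # individuals per cluster-period
      mu0 = 0, mu1 = 0.3,  # effect size = mu1 - mu0
      sigma = sigma, tau = tau,
      eta = 0, rho = 0,    # no random treatment effect
      alpha = 0.05, retDATA = FALSE)
# Find clusters per sequence giving at least 80% power
for (n_cl in 2:20) {
  pw <- swPwr(swDsn(clusters = rep(n_cl, 3)), distn = "gaussian",
              n = 20, mu0 = 0, mu1 = 0.3,
              sigma = sigma, tau = tau, eta = 0, rho = 0,
              alpha = 0.05, retDATA = FALSE)
  if (pw >= 0.80) break
}
c(clusters_per_sequence = n_cl, total_clusters = 3 * n_cl, power = pw)
ICC Estimation from Pilot Data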
library(lme4)
library(performance)
# Estimate ICC from pilot data
# data: data frame with 'outcome' and 'cluster' variables
estimate_icc <- function(data, outcome_var, cluster_var) {
  formula <- as.formula(paste(outcome_var, "~ 1 + (1 |", cluster_var, ")"))
  # Fit null model with random cluster effect
  model <- lmer(formula, data = data)
  # Extract variance components
  vc <- as.data.frame(VarCorr(model))
  var_between <- vc[vc$grp == cluster_var, "vcov"]
  var_within <- sigma(model)^2
  icc_hat <- var_between / (var_between + var_within)
  # Cross-check against performance::icc() (point estimate, not an interval)
  icc_check <- performance::icc(model)
  return(list(
    icc = icc_hat,
    icc_performance = icc_check$ICC_adjusted,
    var_between = var_between,
    var_within = var_within
  ))
}
# Example with simulated data (total variance 1, so between-cluster variance = ICC)
set.seed(123)
n_clusters <- 30
n_per_cluster <- 20
true_icc <- 0.05
cluster_effects <- rnorm(n_clusters, 0, sqrt(true_icc))
pilot_data <- data.frame(
  cluster = rep(1:n_clusters, each = n_per_cluster),
  outcome = rnorm(n_clusters * n_per_cluster, 0, sqrt(1 - true_icc)) +
    rep(cluster_effects, each = n_per_cluster)
)
estimate_icc(pilot_data, "outcome", "cluster")
# The estimate should fall near the true value of 0.05, but pilot ICCs from
# 30 clusters are imprecise; treat the result as one input to a sensitivity analysis
Sensitivity Analysis for ICC
# Sensitivity analysis across ICC range
# Requires crt_continuous() from the parallel CRT section above
icc_sensitivity <- function(delta, sigma, m,
                            icc_range = seq(0.01, 0.15, 0.01),
                            alpha = 0.05, power = 0.80) {
  results <- sapply(icc_range, function(icc) {
    res <- crt_continuous(delta, sigma, m, icc, alpha, power)
    c(icc = icc,
      deff = res$design_effect,
      clusters = res$total_clusters,
      total_n = res$total_n)
  })
  return(as.data.frame(t(results)))
}
# Generate sensitivity table
sens <- icc_sensitivity(delta = 0.4, sigma = 1.0, m = 30)
print(sens)
# Plot (linewidth replaces the deprecated size argument for lines in ggplot2 >= 3.4)
library(ggplot2)
ggplot(sens, aes(x = icc, y = clusters)) +
  geom_line(linewidth = 1, color = "#3B82F6") +
  geom_point(size = 2, color = "#3B82F6") +
  labs(
    title = "Required Clusters vs. ICC",
    x = "Intraclass Correlation Coefficient",
    y = "Total Clusters Required"
  ) +
  theme_minimal()
# Table for protocol
knitr::kable(sens, digits = 2,
             col.names = c("ICC", "Design Effect",
                           "Total Clusters", "Total N"))
8. References
Campbell MK, Piaggio G, Elbourne DR, Altman DG (2012). CONSORT 2010 statement: extension to cluster randomised trials. BMJ, 345:e5661.
Donner A, Klar N (2000). Design and Analysis of Cluster Randomization Trials in Health Research. Arnold Publishers.
Hayes RJ, Moulton LH (2017). Cluster Randomised Trials, 2nd ed. Chapman and Hall/CRC.
Hemming K, Taljaard M (2016). Sample size calculations for stepped wedge and cluster randomised trials: a unified approach. Journal of Clinical Epidemiology, 69:137-146.
Hussey MA, Hughes JP (2007). Design and analysis of stepped wedge cluster randomized trials. Contemporary Clinical Trials, 28(2):182-191.
Eldridge SM, Ashby D, Kerry S (2006). Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. International Journal of Epidemiology, 35(5):1292-1300.
Campbell MK, Fayers PM, Grimshaw JM (2005). Determinants of the intracluster correlation coefficient in cluster randomized trials: the case of implementation research. Clinical Trials, 2(2):99-107.
Adams G, Gulliford MC, Ukoumunne OC, et al. (2004). Patterns of intra-cluster correlation from primary care research to inform study design and analysis. Journal of Clinical Epidemiology, 57(8):785-794.