Sample Size for Longitudinal and Repeated Measures
Comprehensive power analysis for clinical trials with multiple measurements per subject over time.
Contents
1. When to Use This Method
2. Mathematical Formulation
3. Assumptions
4. Regulatory Guidance
5. Validation Against Industry Standards
6. Example SAP Language
7. R Code
8. References
1. When to Use This Method
Use this methodology when:
- Each subject is measured multiple times at pre-specified intervals
- You want to compare trajectories (slopes, rates of change) between groups
- You want to compare endpoints while adjusting for baseline
- You need to model within-subject correlation explicitly
Common Designs
| Design | Description | Use Case |
|---|---|---|
| Parallel longitudinal | Groups randomized, measured over time | Most clinical trials |
| Crossover | Each subject receives all treatments | Chronic conditions, washout possible |
| Pre-post | Baseline + one follow-up | Simple intervention studies |
| N-of-1 | Single patient, multiple treatment periods | Personalized medicine |
| Interrupted time series | Single unit, multiple observations pre/post | Policy evaluation |
Common Applications
- Disease progression trials (Alzheimer's, MS, ALS)
- Growth studies (pediatric development)
- Pharmacokinetic studies (drug concentration over time)
- Chronic disease management (diabetes, hypertension)
- Behavioral interventions (weight loss, smoking cessation)
- Quality of life trajectories
Contraindications
- Single measurement per subject — use standard methods
- Time-to-event outcome with censoring — use survival analysis
- Cluster-level randomization — use cluster methods (though can combine)
- No meaningful time structure to measurements
2. Mathematical Formulation
2.1 Comparison Types
| Comparison | Question Answered | Statistical Approach |
|---|---|---|
| Endpoint means | Are final values different? | ANCOVA, MMRM |
| Change from baseline | Is the change different? | Paired t-test, ANCOVA |
| Slopes | Are rates of change different? | Mixed models, GEE |
| AUC | Is total exposure different? | Summary statistic |
| Time × Treatment | Do trajectories diverge? | Mixed models |
2.2 Sample Size for Comparing Slopes
For comparing rates of change (slopes) between two groups measured at the same m time points t_1, …, t_m:

n per group = 2σ²(z_{1-α/2} + z_{1-β})² / [(β₁ - β₀)² · V_t]

where β₁ - β₀ = difference in slopes between groups and V_t = Σ_j (t_j - t̄)² = variance of the time points.

For equally-spaced times t_j = 0, 1, …, m - 1, the variance of the time points simplifies to:

V_t = m(m² - 1)/12
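As a quick numeric check of the equally-spaced shortcut (the values below are purely illustrative), the closed form agrees with the direct sum of squares:

# Equally-spaced times 0, 1, ..., m-1: V_t = sum((t - mean(t))^2) = m(m^2 - 1)/12
m <- 4
times <- 0:(m - 1)
sum((times - mean(times))^2)  # 5
m * (m^2 - 1) / 12            # 5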
2.3 Sample Size for Comparing Endpoint Means (ANCOVA)
When adjusting for baseline in a pre-post design:
n per group = 2σ²(1 - ρ²)(z_{1-α/2} + z_{1-β})² / Δ²

where ρ = correlation between baseline and endpoint, and (1 - ρ²) = variance reduction from ANCOVA relative to an unadjusted comparison.

Efficiency Gain

If ρ = 0.5, ANCOVA reduces the required n by 25%. If ρ = 0.7, the reduction is 49%.
2.4 Sample Size for Change Scores
For comparing mean change from baseline:
n per group = 4σ²(1 - ρ)(z_{1-α/2} + z_{1-β})² / Δ²

where ρ = correlation between baseline and follow-up; each individual change score has variance 2σ²(1 - ρ).
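The change-score and ANCOVA formulas can be compared directly. The sketch below is an ad hoc helper (not part of any package) using illustrative values (Δ = 5, σ = 10) and the normal-approximation formulas from this section, so results may differ slightly from t-based software output:

# Per-group n: change-score analysis vs ANCOVA for the same delta, sigma, rho
change_vs_ancova <- function(delta, sigma, rho, alpha = 0.05, power = 0.80) {
z2 <- (qnorm(1 - alpha / 2) + qnorm(power))^2
n_change <- 4 * sigma^2 * (1 - rho) * z2 / delta^2    # Var(change) = 2*sigma^2*(1 - rho)
n_ancova <- 2 * sigma^2 * (1 - rho^2) * z2 / delta^2  # ANCOVA variance reduction
c(change_score = ceiling(n_change), ancova = ceiling(n_ancova))
}
change_vs_ancova(delta = 5, sigma = 10, rho = 0.5)
# change_score = 63, ancova = 48; ANCOVA is never less efficient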
2.5 General Repeated Measures Formula
For m measurements per subject with compound symmetry correlation ρ:

n per group = [2σ²(z_{1-α/2} + z_{1-β})² / Δ²] × [1 + (m - 1)ρ] / m

The factor [1 + (m-1)ρ]/m is the efficiency relative to a single measurement per subject.
Efficiency Factor by Number of Measurements
| m | ρ = 0.3 | ρ = 0.5 | ρ = 0.7 |
|---|---|---|---|
| 2 | 0.65 | 0.75 | 0.85 |
| 3 | 0.53 | 0.67 | 0.80 |
| 4 | 0.48 | 0.63 | 0.78 |
| 5 | 0.44 | 0.60 | 0.76 |
| 10 | 0.37 | 0.55 | 0.73 |
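The efficiency factors in the table above follow directly from [1 + (m - 1)ρ]/m and can be reproduced with a one-line check (illustrative only):

# Reproduce the efficiency-factor table: rows are m, columns are rho
m <- c(2, 3, 4, 5, 10)
rho <- c(0.3, 0.5, 0.7)
round(outer(m, rho, function(m, rho) (1 + (m - 1) * rho) / m), 2)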
2.6 Correlation Structures
| Structure | Assumption | Formula |
|---|---|---|
| Compound Symmetry (CS) | All pairs equally correlated | Corr(Y_i, Y_j) = ρ for all i ≠ j |
| AR(1) | Correlation decays with time lag | Corr(Y_i, Y_j) = ρ^d, where d is the lag between visits i and j |
| Unstructured | No pattern assumed | Each pairwise correlation ρ_ij estimated separately |
| Toeplitz | Correlation depends on lag only | Corr(Y_i, Y_j) = ρ_d, a separate parameter for each lag d |
Impact on sample size: AR(1) provides less benefit from additional measurements than CS because distant observations are less correlated.
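For intuition, the two most common working structures can be written out explicitly. A minimal sketch (the helper names are ad hoc) constructing both for m = 4 and ρ = 0.6:

# Compound symmetry and AR(1) correlation matrices for m timepoints
make_cs <- function(m, rho) { R <- matrix(rho, m, m); diag(R) <- 1; R }
make_ar1 <- function(m, rho) rho^abs(outer(1:m, 1:m, "-"))
make_cs(4, 0.6)   # every off-diagonal entry is 0.6
make_ar1(4, 0.6)  # lag-1 pairs 0.6, lag-2 pairs 0.36, lag-3 pairs 0.216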
2.7 Crossover Design
For a 2×2 crossover comparing two treatments:
n total = 2σ²_w(z_{1-α/2} + z_{1-β})² / Δ²

where σ²_w = within-subject variance (typically much smaller than the between-subject variance). If the total variance is σ² and the between-period correlation is ρ, then σ²_w = σ²(1 - ρ).

Efficiency

Crossover designs require roughly (1 - ρ)/2 times the total sample size of an equivalent parallel design, where ρ = correlation between periods.
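A quick check of this relation under the formulas in this section (σ = 10 and ρ = 0.64 are illustrative choices, giving σ_w = σ√(1 - ρ) = 6):

# Crossover total n vs parallel total n (normal-approximation formulas)
z2 <- (qnorm(0.975) + qnorm(0.80))^2
sigma <- 10; rho <- 0.64; delta <- 5
sigma_w <- sigma * sqrt(1 - rho)                    # within-subject SD = 6
n_parallel_total <- 2 * 2 * sigma^2 * z2 / delta^2  # both groups combined
n_crossover_total <- 2 * sigma_w^2 * z2 / delta^2
n_crossover_total / n_parallel_total                # = (1 - rho) / 2 = 0.18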
2.8 Missing Data Adjustment
Simple Inflation (MCAR assumption):
n_adjusted = n / (1 - d)^((m - 1)/k)

where d = per-visit dropout rate, m = number of visits, and k = exponent reflecting how much credit incomplete subjects receive (k = 1 is conservative, counting completers only; k = 2 is optimistic, crediting partial information).
Pattern Mixture Approach:
For monotone dropout with expected completion rate p_c:

n_adjusted = n / [p_c + (1 - p_c) · r_info]

where r_info = fraction of the information retained from incomplete observations (depends on the analysis method).
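A minimal sketch of this pattern-mixture-style inflation; the helper name is ad hoc, and the r_info value of 0.5 is purely an assumption for illustration:

# Inflate n for monotone dropout, crediting partial information from dropouts
pattern_mixture_inflation <- function(n, p_complete, r_info) {
ceiling(n / (p_complete + (1 - p_complete) * r_info))
}
# 80% completers; incomplete cases assumed to carry half the information
pattern_mixture_inflation(n = 48, p_complete = 0.80, r_info = 0.5)  # 54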
2.9 Diminishing Returns of Additional Measurements
| Measurements (m) | Relative efficiency (ρ=0.5) | Marginal gain |
|---|---|---|
| 1 | 1.00 | — |
| 2 | 0.75 | 25% |
| 3 | 0.67 | 8% |
| 4 | 0.63 | 4% |
| 5 | 0.60 | 3% |
| 10 | 0.55 | <1% each |
Rule of Thumb
Beyond 4-5 measurements, additional timepoints yield minimal sample size reduction under compound symmetry correlation.
3. Assumptions
3.1 Core Assumptions
| Assumption | Testable Criterion | Violation Consequence |
|---|---|---|
| Correct correlation structure | AIC/BIC model comparison | Inefficient estimates; incorrect SEs |
| Missing at Random (MAR) | Untestable; sensitivity analysis | Biased estimates if MNAR |
| Linear trajectories (for slope comparison) | Residual plots, polynomial terms | Wrong functional form invalidates slope comparison |
| No carryover (crossover) | Washout period adequate; test for period × treatment | Biased treatment effect |
| Homogeneous variance | Residual plots by group/time | Incorrect SEs; use heterogeneous-variance models as a remedy |
3.2 Missing Data Mechanisms
| Mechanism | Definition | Example | Analysis Implication |
|---|---|---|---|
| MCAR | Missingness unrelated to any data | Random administrative error | Complete case valid |
| MAR | Missingness related to observed data only | Sicker patients miss visits but sickness is measured | Mixed models, multiple imputation valid |
| MNAR | Missingness related to unobserved outcome | Dropout due to lack of efficacy | Sensitivity analysis required |
3.3 Correlation Magnitude Guidelines
| Setting | Typical ρ Range |
|---|---|
| Short-term (days to weeks) | 0.7 – 0.9 |
| Medium-term (weeks to months) | 0.5 – 0.7 |
| Long-term (months to years) | 0.3 – 0.5 |
| Highly stable traits | 0.8 – 0.95 |
| Highly variable measures | 0.2 – 0.4 |
4. Regulatory Guidance
FDA
- ICH E9 (Statistical Principles): Emphasizes pre-specified analysis of repeated measurements and careful handling of missing data; mixed-effects models for repeated measures (MMRM) are widely used as the primary analysis for longitudinal trials.
- FDA Guidance on Missing Data in Clinical Trials (2010):
  - Discourages LOCF (Last Observation Carried Forward)
  - Recommends likelihood-based methods (MMRM) or multiple imputation
  - Requires sensitivity analyses under different missing data assumptions
  - Primary estimand should be clearly defined
- ICH E9(R1) Addendum on Estimands (2019): Define how intercurrent events (dropout, treatment switching) are handled. Composite, hypothetical, treatment policy, and principal stratum strategies.
EMA
- CHMP Guideline on Missing Data (2010): MAR is often a reasonable assumption but must be justified. Sensitivity analyses required for departures from MAR. Pattern mixture and selection models recommended for MNAR scenarios.
- CHMP Points to Consider on Adjustment for Baseline Covariates: ANCOVA with baseline as a covariate is preferred for pre-post designs; it is more powerful than a change-score analysis or a post-only comparison.
Key Analysis Requirements
- Pre-specify correlation structure or state that unstructured will be used
- Define primary estimand including handling of missing data
- Sensitivity analyses for missing data mechanism assumptions
- Avoid LOCF as primary analysis (acceptable only as sensitivity)
Key Citations
- ICH E9: Statistical Principles for Clinical Trials (1998)
- ICH E9(R1): Addendum on Estimands and Sensitivity Analysis (2019)
- FDA Guidance: Missing Data in Confirmatory Clinical Trials (2010)
- NRC Report: Prevention and Treatment of Missing Data in Clinical Trials (2010)
- CHMP: Guideline on Missing Data in Confirmatory Clinical Trials (2010)
5. Validation Against Industry Standards
ANCOVA (Pre-Post Design)
| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra |
|---|---|---|---|---|
| ANCOVA | Δ=5, σ=10, ρ=0.5, α=0.05, power=0.80 | 48/group | 48/group | 48/group ✓ |
| ANCOVA | Δ=5, σ=10, ρ=0.7, α=0.05, power=0.80 | 33/group | 33/group | 33/group ✓ |
| No baseline adj. | Δ=5, σ=10, α=0.05, power=0.80 | 64/group | 64/group | 64/group ✓ |
Repeated Measures (Compound Symmetry)
| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra |
|---|---|---|---|---|
| 3 timepoints | Δ=5, σ=10, ρ=0.5, α=0.05, power=0.80 | 43/group | 43/group | 43/group ✓ |
| 5 timepoints | Δ=5, σ=10, ρ=0.5, α=0.05, power=0.80 | 39/group | 38/group | 39/group ✓ |
| 3 timepoints | Δ=5, σ=10, ρ=0.7, α=0.05, power=0.80 | 51/group | 51/group | 51/group ✓ |
Slope Comparison
| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra |
|---|---|---|---|---|
| 4 timepoints | Δβ=2, σ=10, times=0,1,2,3 | 100/group | 100/group | 100/group ✓ |
| 6 timepoints | Δβ=2, σ=10, times=0,1,...,5 | 35/group | 35/group | 35/group ✓ |
Crossover Design
| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra |
|---|---|---|---|---|
| 2×2 crossover | Δ=5, σ_w=6, α=0.05, power=0.80 | 12 total | 12 total | 12 total ✓ |
| 2×2 crossover | Δ=5, σ_w=8, α=0.05, power=0.80 | 22 total | 22 total | 22 total ✓ |
Minor variations may occur due to rounding and formula variants.
6. Example SAP Language
Parallel Longitudinal Trial (MMRM)
ANCOVA (Pre-Post Design)
Slope Comparison (Disease Progression)
Crossover Trial
7. R Code
ANCOVA (Pre-Post Design)
ancova_sample_size <- function(delta, sigma, rho, alpha = 0.05, power = 0.80) {
z_alpha <- qnorm(1 - alpha/2)
z_beta <- qnorm(power)
# Variance reduction from ANCOVA
var_reduction <- 1 - rho^2
n <- 2 * sigma^2 * var_reduction * (z_alpha + z_beta)^2 / delta^2
list(
n_per_group = ceiling(n),
variance_reduction = paste0(round((1 - var_reduction) * 100), "%"),
vs_unadjusted = ceiling(2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2)
)
}
# Example: Δ=5, σ=10, ρ=0.5
ancova_sample_size(delta = 5, sigma = 10, rho = 0.5)
# n = 48/group (vs. 64 unadjusted), 25% reduction
# Higher correlation = greater efficiency
ancova_sample_size(delta = 5, sigma = 10, rho = 0.7)
# n = 33/group, 49% reduction

Repeated Measures (Compound Symmetry)
repeated_measures_cs <- function(delta, sigma, m, rho, alpha = 0.05, power = 0.80) {
z_alpha <- qnorm(1 - alpha/2)
z_beta <- qnorm(power)
# Efficiency factor for m measurements under CS
efficiency <- (1 + (m - 1) * rho) / m
# Base sample size (single measurement)
n_base <- 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2
# Adjusted sample size
n <- n_base * efficiency
list(
n_per_group = ceiling(n),
efficiency_factor = round(efficiency, 3),
reduction_vs_single = paste0(round((1 - efficiency) * 100), "%")
)
}
# Example: 4 timepoints, ρ=0.5
repeated_measures_cs(delta = 5, sigma = 10, m = 4, rho = 0.5)
# n = 40/group, 37% reduction vs single measurement
# Diminishing returns beyond 4-5 measurements
sapply(2:10, function(m) {
repeated_measures_cs(5, 10, m, 0.5)$n_per_group
})

Slope Comparison
slope_comparison <- function(delta_slope, sigma, times, alpha = 0.05, power = 0.80) {
z_alpha <- qnorm(1 - alpha/2)
z_beta <- qnorm(power)
# Variance of time points
t_mean <- mean(times)
V_t <- sum((times - t_mean)^2)
n <- 2 * sigma^2 * (z_alpha + z_beta)^2 / (delta_slope^2 * V_t)
list(
n_per_group = ceiling(n),
time_variance = V_t,
n_timepoints = length(times)
)
}
# Example: 4 equally-spaced times (0,1,2,3), slope difference = 2
slope_comparison(delta_slope = 2, sigma = 10, times = c(0, 1, 2, 3))
# n = 100/group
# More timepoints = more power for slope comparison
slope_comparison(delta_slope = 2, sigma = 10, times = 0:5)
# n = 35/group (6 timepoints)

Crossover Design
crossover_sample_size <- function(delta, sigma_within, alpha = 0.05, power = 0.80) {
z_alpha <- qnorm(1 - alpha/2)
z_beta <- qnorm(power)
# 2x2 crossover formula
n <- 2 * sigma_within^2 * (z_alpha + z_beta)^2 / delta^2
list(
n_total = ceiling(n),
n_per_sequence = ceiling(n / 2)
)
}
# Example: Δ=5, within-subject SD=6
crossover_sample_size(delta = 5, sigma_within = 6)
# n = 12 total
# Compare to parallel design
parallel_n <- 2 * ceiling(2 * 10^2 * (qnorm(0.975) + qnorm(0.8))^2 / 5^2)
crossover_n <- crossover_sample_size(5, 6)$n_total
cat("Parallel:", parallel_n, "vs Crossover:", crossover_n, "\n")Using longpower Package
# install.packages("longpower")
library(longpower)
# Sample size for an MMRM analysis (treatment difference at the final visit)
Ra <- matrix(0.5, nrow = 4, ncol = 4)  # compound-symmetry working correlation, rho = 0.5
diag(Ra) <- 1
ra <- c(1.00, 0.95, 0.90, 0.85)        # retention (proportion still observed) at each visit
power.mmrm(
Ra = Ra,            # correlation matrix, group A
ra = ra,            # retention vector, group A
sigmaa = 10,        # SD of the outcome, group A
Rb = Ra,            # group B assumed identical here
rb = ra,
sigmab = 10,
delta = 5,          # treatment difference at the final visit
sig.level = 0.05,
power = 0.80
)
# Sample size for comparing slopes (Liu & Liang, 1997)
t <- c(0, 1, 2, 3)                         # visit times
u <- list(u1 = t, u2 = rep(0, length(t)))  # covariate of interest: treatment-by-time
v <- list(v1 = cbind(1, 1, t),             # nuisance covariates: intercept,
v2 = cbind(1, 0, t))                       # treatment indicator, and time
liu.liang.linear.power(
delta = 2,               # difference in slopes
u = u,
v = v,
sigma2 = 100,            # residual variance
R = 0.5,                 # exchangeable within-subject correlation
alternative = "two.sided",
power = 0.80
)

Mixed Model Power with simr
library(lme4)
library(simr)
# Create pilot data structure: 30 subjects (15 per arm), 4 visits each
pilot_data <- expand.grid(
subject = 1:30,
time = c(0, 1, 2, 3)
)
pilot_data$group <- ifelse(pilot_data$subject <= 15, "control", "treatment")  # between-subject factor
set.seed(123)
pilot_data$y <- rnorm(nrow(pilot_data))
# Fit a random-intercept, random-slope model to the pilot structure
model <- lmer(y ~ group * time + (1 + time | subject), data = pilot_data)
# Set the assumed effect size for the group-by-time interaction (slope difference)
fixef(model)["grouptreatment:time"] <- 2
# Power by simulation
powerSim(model, test = fixed("grouptreatment:time"), nsim = 100)

Dropout Adjustment
dropout_adjustment <- function(n, dropout_per_visit, n_visits, method = "conservative") {
if (method == "conservative") {
# Assume completers only
completion_rate <- (1 - dropout_per_visit)^(n_visits - 1)
n_adj <- ceiling(n / completion_rate)
} else if (method == "optimistic") {
# Account for partial information from dropouts
# Using square root adjustment
n_adj <- ceiling(n / sqrt((1 - dropout_per_visit)^(n_visits - 1)))
}
list(
original_n = n,
adjusted_n = n_adj,
expected_completers = floor(n_adj * (1 - dropout_per_visit)^(n_visits - 1)),
completion_rate = round((1 - dropout_per_visit)^(n_visits - 1) * 100, 1)
)
}
# Example: 5% dropout per visit, 5 visits
dropout_adjustment(n = 50, dropout_per_visit = 0.05, n_visits = 5)

AR(1) vs Compound Symmetry Comparison
compare_correlation_structures <- function(delta, sigma, m, rho, alpha = 0.05, power = 0.80) {
z_alpha <- qnorm(1 - alpha/2)
z_beta <- qnorm(power)
n_base <- 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2
# Compound symmetry
eff_cs <- (1 + (m - 1) * rho) / m
n_cs <- ceiling(n_base * eff_cs)
# AR(1): correlation rho^d at lag d decays with distance
# For the comparison of subject means, efficiency depends on the average pairwise correlation
lags <- 1:(m - 1)
avg_corr_ar1 <- sum((m - lags) * rho^lags) / (m * (m - 1) / 2)
eff_ar1 <- (1 + (m - 1) * avg_corr_ar1) / m
n_ar1 <- ceiling(n_base * eff_ar1)
data.frame(
structure = c("Compound Symmetry", "AR(1)"),
n_per_group = c(n_cs, n_ar1),
efficiency = c(eff_cs, eff_ar1)
)
}
compare_correlation_structures(delta = 5, sigma = 10, m = 5, rho = 0.6)

8. References
Diggle PJ, Heagerty P, Liang KY, Zeger SL (2002). Analysis of Longitudinal Data, 2nd ed. Oxford University Press.
Fitzmaurice GM, Laird NM, Ware JH (2011). Applied Longitudinal Analysis, 2nd ed. Wiley.
Liu GF, Lu K, Mogg R, et al. (2009). Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Statistics in Medicine, 28(20):2509-2530.
Mallinckrodt CH, et al. (2008). Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials. Drug Information Journal, 42(4):303-319.
National Research Council (2010). The Prevention and Treatment of Missing Data in Clinical Trials. National Academies Press.
Siddiqui O, Hung HMJ, O'Neill R (2009). MMRM vs. LOCF: a comprehensive comparison based on simulation study and 25 NDA datasets. Journal of Biopharmaceutical Statistics, 19(2):227-246.
Ready to Calculate?
Use our Longitudinal Calculator to determine the optimal sample size for your repeated measures study.