Chi-Square Tests for Categorical Data
A complete guide to chi-square tests of independence and homogeneity, effect size measures, Fisher's exact test, power analysis, sample size determination, and McNemar's test for paired proportions.
1. Overview
The chi-square test is one of the most widely used statistical procedures for analyzing categorical data. It evaluates whether the observed frequencies in a contingency table differ significantly from the frequencies we would expect if the row and column variables were independent.
Two Interpretations
- Test of independence — a single sample is cross-classified on two categorical variables. Is there an association between them?
- Test of homogeneity — independent samples from two or more populations are compared on one categorical variable. Do the populations share the same distribution?
Despite the different study designs, the arithmetic is identical: both compute the same Pearson chi-square statistic from a contingency table.
When to Use Chi-Square Tests
Use a chi-square test when independent observations are counted into mutually exclusive categories and you want to know whether two categorical variables are associated, or whether several populations share the same categorical distribution.
Tip: Our Chi-Square Calculator supports four modes: Test (2x2 and RxC), Power Analysis, Sample Size, and McNemar's Test — all computed entirely in the browser.
2. The 2x2 Contingency Table
The simplest contingency table is the 2x2 layout, comparing two groups on a binary outcome. Each cell contains the count of observations falling in that row-column combination.
| | Outcome + | Outcome − | Row Total |
|---|---|---|---|
| Group 1 | a | b | a + b |
| Group 2 | c | d | c + d |
| Col Total | a + c | b + d | N |
Expected Counts
Under the null hypothesis of independence, the expected count for each cell is:

$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N}$$
Degrees of Freedom
For a 2x2 table the degrees of freedom are:

$$df = (2 - 1)(2 - 1) = 1$$
Yates' Continuity Correction
Because the chi-square distribution is continuous but the test statistic is computed from discrete counts, Yates (1934) proposed subtracting 0.5 from each absolute deviation before squaring. For the 2x2 case:

$$\chi^2_{\text{Yates}} = \sum_{i,j} \frac{\left(|O_{ij} - E_{ij}| - 0.5\right)^2}{E_{ij}}$$
Note: Yates' correction is conservative — it reduces the test statistic, making it harder to reject H₀. Many modern references recommend using the uncorrected statistic or Fisher's exact test instead.
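As a quick check of the arithmetic, the 2x2 statistic with and without Yates' correction can be computed in a few lines of standard-library Python. `chi2_2x2` is an illustrative helper, not the calculator's implementation:

```python
import math

def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected count for each cell: row total * column total / N
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    stat = 0.0
    for o, e in zip(observed, expected):
        dev = abs(o - e)
        if yates:
            dev = max(dev - 0.5, 0.0)   # Yates' continuity correction
        stat += dev * dev / e
    # df = 1 for a 2x2 table; the chi-square survival function with
    # df = 1 reduces to erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

stat, p = chi2_2x2(50, 10, 30, 20)                   # uncorrected: approx 7.49
stat_y, p_y = chi2_2x2(50, 10, 30, 20, yates=True)   # corrected statistic is smaller
```

Running both versions on the same table shows the conservatism directly: the Yates statistic is always less than or equal to the uncorrected one.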
3. Chi-Square Test Statistic
The Pearson chi-square statistic measures the overall discrepancy between observed and expected counts across all cells in an r × c contingency table:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Degrees of Freedom
The general formula for degrees of freedom is:

$$df = (r - 1)(c - 1)$$

For example, a 3x4 table has (3 − 1)(4 − 1) = 6 degrees of freedom.
P-value and Decision Rule
The p-value is the probability of observing a chi-square statistic as large or larger than the computed value under the null hypothesis:

$$p = P\left(\chi^2_{df} \geq \chi^2_{\text{obs}}\right)$$

Reject H₀ (independence) when p ≤ α. The chi-square test is always one-tailed (right tail) because larger values indicate greater deviation from independence.
Key insight: A statistically significant chi-square result tells you the variables are associated but says nothing about the direction or strength of the association. For that you need effect size measures (Section 4).
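To make the general formula concrete, here is a sketch that computes the Pearson statistic and df for an arbitrary RxC table of counts. The p-value helper uses the closed-form chi-square tail that exists when df is even (as for a 3x4 table, df = 6); it is illustrative, not the calculator's code:

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic and df for an r x c table of counts."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    stat = 0.0
    for i in range(r):
        for j in range(c):
            e = row_tot[i] * col_tot[j] / n   # expected count under independence
            stat += (table[i][j] - e) ** 2 / e
    return stat, (r - 1) * (c - 1)

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    k = df // 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

table = [[20, 15, 10, 5], [10, 15, 15, 10], [5, 10, 15, 20]]
stat, df = pearson_chi2(table)       # df = (3 - 1)(4 - 1) = 6
p = chi2_sf_even_df(stat, df)
```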
4. Effect Sizes
Statistical significance depends on sample size: a trivially small association will reach p < α with a large enough N. Effect size measures quantify how strong the association is, independent of sample size.
Phi Coefficient (2x2 tables)
For 2x2 tables, the phi coefficient is the correlation between two binary variables:

$$\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$

Equivalently, |φ| = √(χ²/N).
Cramér's V (RxC tables)
Cramér's V generalizes phi to tables larger than 2x2 by normalizing by the smaller table dimension:

$$V = \sqrt{\frac{\chi^2}{N\left(\min(r, c) - 1\right)}}$$
Cohen's Benchmarks
| Effect Size | Small | Medium | Large |
|---|---|---|---|
| w (phi / V) | 0.1 | 0.3 | 0.5 |
Odds Ratio (2x2 tables)
The odds ratio quantifies the multiplicative change in odds of the outcome between the two groups:

$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$
Relative Risk (2x2 tables)
The relative risk (risk ratio) compares proportions directly:

$$RR = \frac{a/(a+b)}{c/(c+d)}$$
Note: Odds ratio and relative risk are only meaningful for 2x2 tables. For larger tables, use Cramér's V or examine standardized residuals cell by cell.
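For a worked 2x2 example (a = 50, b = 10, c = 30, d = 20), all four effect sizes are one-liners in plain Python, shown here just to exercise the formulas:

```python
import math

# 2x2 cell counts: rows are groups, columns are outcome +/-
a, b, c, d = 50, 10, 30, 20
n = a + b + c + d

phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
chi2 = n * phi ** 2                     # since |phi| = sqrt(chi2 / N)
cramers_v = math.sqrt(chi2 / (n * 1))   # min(r, c) - 1 = 1 for a 2x2 table
odds_ratio = (a * d) / (b * c)          # ad / bc
rel_risk = (a / (a + b)) / (c / (c + d))  # risk in group 1 / risk in group 2
```

Note that for a 2x2 table Cramér's V collapses to |φ|, as the code makes explicit.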
5. Fisher's Exact Test
The Pearson chi-square relies on a large-sample approximation: the statistic is approximately chi-square distributed. When expected cell counts are small (commonly < 5), this approximation breaks down. Fisher's exact test avoids the approximation entirely.
Hypergeometric Distribution
Given fixed marginal totals, the probability of observing cell count a follows the hypergeometric distribution:

$$P(a) = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{N}{a+c}}$$
The exact p-value sums the probabilities of all tables as extreme as or more extreme than the observed table, conditional on the margins.
When to Prefer Fisher's Over Chi-Square
- Any expected cell count is less than 5
- Total sample size is less than about 20–30
- The table is very unbalanced (one margin much larger than the other)
- You want an exact p-value rather than an asymptotic approximation
Tip: Our calculator automatically reports both Pearson chi-square and Fisher's exact p-values for 2x2 tables. If they disagree meaningfully, trust the Fisher's exact result.
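Because the hypergeometric probabilities involve only binomial coefficients, an exact-test sketch needs nothing beyond the standard library. `fisher_exact_2x2` is a hypothetical helper implementing the common two-sided rule of summing every table no more probable than the observed one:

```python
import math

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d          # row totals (fixed margins)
    c1, n = a + c, a + b + c + d   # first column total, grand total

    def hyper(x):
        # P(top-left cell = x | margins), hypergeometric
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = hyper(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)   # feasible values of the cell
    # Sum every table whose probability does not exceed the observed one
    return sum(hyper(x) for x in range(lo, hi + 1)
               if hyper(x) <= p_obs * (1 + 1e-9))

# Fisher's classic tea-tasting table: two-sided p = 34/70
p = fisher_exact_2x2(3, 1, 1, 3)
```

The small tolerance on the comparison guards against floating-point ties when two tables have mathematically equal probability.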
6. Power Analysis
Power is the probability of correctly rejecting H₀ when the alternative is true. For chi-square tests, power depends on the sample size, the significance level, the degrees of freedom, and the effect size.
Cohen's Effect Size w
Cohen defined effect size for chi-square tests as:

$$w = \sqrt{\sum_{i} \frac{(p_{1i} - p_{0i})^2}{p_{0i}}}$$

where p₀ᵢ are the cell probabilities under H₀ and p₁ᵢ are the cell probabilities under H₁. For a 2x2 table, w = |φ|.
Non-centrality Parameter
Under the alternative hypothesis, the test statistic follows a non-central chi-square distribution with non-centrality parameter:

$$\lambda = N w^2$$
Power Calculation
Power equals the probability that a non-central chi-square variate exceeds the critical value from the central distribution:

$$\text{Power} = P\left(\chi^2_{df}(\lambda) > \chi^2_{df,\,1-\alpha}\right)$$
Sample Size Determination
To find the minimum N for a desired power 1 − β, invert the power equation. A useful approximation for the df = 1 case:

$$N \approx \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{w}\right)^2$$

Note: This approximation applies to df = 1. The calculator uses a normal approximation with Newton refinement for the non-centrality parameter.
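For df = 1 the non-central chi-square is just a squared shifted normal, χ²₁(λ) = (Z + √λ)², which makes both power and the closed-form N computable with the standard library alone. A sketch of the math, not the calculator's code:

```python
import math
from statistics import NormalDist

nd = NormalDist()

def chi2_power_df1(w, n, alpha=0.05):
    """Power of a df = 1 chi-square test via chi2_1(lambda) = (Z + sqrt(lambda))^2."""
    z_crit = nd.inv_cdf(1 - alpha / 2)   # sqrt of the df = 1 critical value
    delta = w * math.sqrt(n)             # sqrt of the non-centrality lambda = N w^2
    return nd.cdf(delta - z_crit) + nd.cdf(-delta - z_crit)

def sample_size_df1(w, alpha=0.05, power=0.80):
    """Invert the power equation for df = 1: N = ((z_crit + z_power) / w)^2."""
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return math.ceil((z / w) ** 2)

n_needed = sample_size_df1(0.3)            # medium effect, alpha = .05, power = .80
achieved = chi2_power_df1(0.3, n_needed)   # power actually attained at that N
```

The closed-form N drops the (negligible) second tail term, so the achieved power at the returned N lands slightly above the target.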
7. McNemar's Test
McNemar's test is designed for paired binary data — before/after measurements on the same subjects, matched case-control studies, or any design where each observation in one condition is paired with an observation in the other.
The Discordant Pairs
Consider paired binary outcomes arranged in a 2x2 table:
| | After + | After − |
|---|---|---|
| Before + | a (concordant) | b (discordant) |
| Before − | c (discordant) | d (concordant) |
Only the discordant pairs (b and c) carry information about change. The test statistic is:

$$\chi^2 = \frac{(b - c)^2}{b + c}$$

This follows a chi-square distribution with 1 degree of freedom under H₀ (i.e., the probability of change in each direction is equal).
When to Use McNemar vs Standard Chi-Square
- McNemar: paired or matched data — the same subjects measured at two time points, or case-control pairs
- Standard chi-square: independent groups — different subjects in each cell of the contingency table
Warning: Applying a standard chi-square test to paired data ignores the within-pair correlation and can produce misleading results. Always check whether your data are paired before choosing a test.
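The McNemar statistic and an exact binomial check both fit in a few lines of standard-library Python. These helpers are illustrative; the continuity-corrected variant is an optional extra beyond the uncorrected formula given above:

```python
import math

def mcnemar(b, c, correction=False):
    """McNemar chi-square from the discordant counts b and c (df = 1)."""
    dev = abs(b - c) - (1 if correction else 0)   # optional continuity correction
    stat = max(dev, 0) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2))            # chi-square tail for df = 1
    return stat, p

def mcnemar_exact(b, c):
    """Exact two-sided binomial p-value: X ~ Binomial(b + c, 1/2) under H0."""
    n, k = b + c, min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Discordant counts from the Mode 4 example URL (b = 15, c = 5)
stat, p = mcnemar(15, 5)        # (b - c)^2 / (b + c) = 100 / 20 = 5.0
p_exact = mcnemar_exact(15, 5)
```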
8. Assumptions & Limitations
Key Assumptions
- Independence: each observation is independent of every other. Clustered or repeated-measures data violate this assumption.
- Expected cell counts: the chi-square approximation requires all expected counts to be reasonably large. The common rule of thumb is an expected count of at least 5 in every cell.
- Fixed margins: the total sample size (and sometimes row/column totals) is fixed by design.
- Mutually exclusive categories: each observation falls into exactly one cell.
Alternatives When Assumptions Fail
| Situation | Alternative Method |
|---|---|
| Small expected counts (< 5) | Fisher's exact test |
| Paired or matched data | McNemar's test |
| Prefer likelihood-based test | G-test (log-likelihood ratio) |
| Stratified / confounded data | Cochran-Mantel-Haenszel test |
| Ordered categories | Cochran-Armitage trend test |
Rule of thumb: If more than 20% of expected cell counts fall below 5, or any expected count is below 1, do not rely on the chi-square approximation — use Fisher's exact test or collapse categories.
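The 20% / minimum-of-1 rule of thumb is easy to automate. A hypothetical helper that flags tables where the chi-square approximation should not be trusted:

```python
def expected_counts_ok(table):
    """True if no expected count is below 1 and at most 20% are below 5."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    expected = [row_tot[i] * col_tot[j] / n for i in range(r) for j in range(c)]
    below_5 = sum(e < 5 for e in expected)
    return min(expected) >= 1 and below_5 <= 0.2 * len(expected)

ok = expected_counts_ok([[50, 10], [30, 20]])   # True: smallest E is about 13.6
bad = expected_counts_ok([[2, 1], [1, 2]])      # False: every E is 1.5
```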
9. API Reference
The Chi-Square Calculator runs entirely in the browser — there is no backend API. State is captured in URL parameters so results can be shared via links. Below are the four modes and their parameters.
Mode 1: Test (2x2 and RxC)
| Parameter | Type | Description |
|---|---|---|
| mode | string | "test" |
| rows | number | Number of rows (2–10) |
| cols | number | Number of columns (2–10) |
| data | string | Comma-separated cell values (row-major order) |
| alpha | number | Significance level (default 0.05) |
Outputs: chi-square statistic, df, p-value, phi (2x2), Cramér's V, odds ratio (2x2), relative risk (2x2), Fisher's exact p-value (2x2), expected counts.
Example URL: /calculators/chi-square?mode=test&rows=2&cols=2&data=50,10,30,20&alpha=0.05
Mode 2: Power Analysis
| Parameter | Type | Description |
|---|---|---|
| mode | string | "power" |
| w | number | Effect size w |
| alpha | number | Significance level (default 0.05) |
| power | number | Target power (default 0.80) |
| df | number | Degrees of freedom (default 1) |
Outputs: required total sample size N, non-centrality parameter, critical value.
Example URL: /calculators/chi-square/power?w=0.3&alpha=0.05&power=0.8&df=1
Mode 3: Sample Size
| Parameter | Type | Description |
|---|---|---|
| ssrows | number | Number of rows (default 2) |
| sscols | number | Number of columns (default 2) |
| expected | string | Comma-separated expected proportions (row-major) |
| alpha | number | Significance level (default 0.05) |
| power | number | Desired power (default 0.80) |
Outputs: required total sample size, per-cell N, derived Cohen's w, chi-square critical value.
Example URL: /calculators/chi-square/sample-size?ssrows=2&sscols=2&expected=0.3,0.2,0.2,0.3&alpha=0.05&power=0.8
Mode 4: McNemar's Test
| Parameter | Type | Description |
|---|---|---|
| mode | string | "mcnemar" |
| a | number | Count: +/+ (concordant) |
| b | number | Count: +/− (discordant) |
| c | number | Count: −/+ (discordant) |
| d | number | Count: −/− (concordant) |
Outputs: McNemar chi-square statistic, p-value, discordant pair count, exact binomial p-value.
Example URL: /calculators/chi-square?mode=mcnemar&a=40&b=15&c=5&d=30
10. References
- Pearson K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine. 1900;50(302):157–175.
- Fisher RA. On the interpretation of chi-square from contingency tables, and the calculation of P. Journal of the Royal Statistical Society. 1922;85(1):87–94.
- Yates F. Contingency tables involving small numbers and the chi-square test. Supplement to the Journal of the Royal Statistical Society. 1934;1(2):217–235.
- Cramér H. Mathematical Methods of Statistics. Princeton University Press; 1946.
- McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 1947;12(2):153–157.
- Cochran WG. Some methods for strengthening the common chi-square tests. Biometrics. 1954;10(4):417–451.
- Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Lawrence Erlbaum Associates; 1988.
Last updated: April 2026
Ready to run your chi-square analysis?
Use our Chi-Square Calculator for contingency table tests, power analysis, sample size determination, and McNemar's test — all computed in the browser.
Open Chi-Square Calculator