Sample Size Calculation with G*Power

"How did you determine your sample size?" is now a standard question on ethics-committee forms, and "as many participants as we could reach" is no longer an acceptable answer. Sample size calculation means determining, before data collection, the smallest number of participants that gives your planned statistical test adequate power — and the free G*Power software does it in minutes. This guide covers the logic of the four interlocking parameters, how to choose an expected effect size, step-by-step G*Power procedures for t-tests, ANOVA and regression, and a ready-made reporting template for the methods section.

Why power analysis has become compulsory

The requirement is not mere box-ticking. An underpowered sample inflates the risk of missing an effect that genuinely exists (a Type II error); a needlessly large one burdens participants and wastes resources. Ethics committees scrutinise both directions. On the journal side, APA 7 reporting standards and most author guidelines now require an explicit statement of how the sample size was determined. A thesis without a power analysis meets the question at the viva; a paper that ran one but failed to report it meets the same question from reviewers.

Four interlocking parameters

The whole of power analysis rests on the mathematical relationship among four quantities; fix any three and the fourth is determined:

Significance level (α): The probability of a Type I error; 0.05 is the near-universal convention in the social sciences.
Power (1−β): The probability of detecting a true effect; 0.80 is the accepted floor, with 0.90 common in clinical research.
Effect size: The expected magnitude of the effect you are hunting (d, f, f², r...); the most critical of the four and the one most often got wrong.
Sample size (N): The unknown solved for in an a-priori analysis; once the other three are fixed, G*Power computes it.
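
The interlocking of the four quantities can be made concrete with a minimal Python sketch for a two-group comparison. It uses the textbook normal approximation rather than the noncentral t distribution that G*Power works with, so it tends to land a participant or so below G*Power's exact answer. The function names are illustrative only and are not part of G*Power.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group mean comparison.

    Normal approximation; ignores the tiny chance of a significant
    result in the wrong direction.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = d * sqrt(n_per_group / 2)
    return STD_NORMAL.cdf(noncentrality - z_crit)


def approx_n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group N needed to detect effect size d."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


# Fix alpha, power and d, and N follows.
print(approx_n_per_group(0.5))          # approximation gives 63
# Fix alpha, d and N, and power follows.
print(round(approx_power(0.5, 64), 3))  # a little above 0.80
```

Treat the output as a sanity check on the logic, not as a substitute for the exact G*Power calculation that goes into the report.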

A priori or post hoc?

An a-priori analysis is run before data collection and yields the required N; this is what ethics committees and journals expect. Post-hoc "observed power", calculated after the analysis from the effect size actually found, is heavily criticised in the methodological literature: observed power is a one-to-one function of the p-value, so it adds no information, and the defence "the result was non-significant but power was low" is circular reasoning. If the data have already been collected, the sound alternative is a sensitivity analysis: report the smallest effect size detectable with the available N at α = 0.05 and power = 0.80, and discuss your findings against that threshold.
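
A sensitivity analysis of this kind can be sketched in a few lines of Python. The version below again relies on the normal approximation for two equal groups, so the exact value from G*Power's "Sensitivity" option will be slightly larger; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest Cohen's d detectable with the available per-group N.

    Normal approximation for a two-tailed, two-group comparison.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_crit + z_power) * sqrt(2 / n_per_group)


# Example: data already collected from 40 participants per group.
print(round(min_detectable_d(40), 2))  # roughly 0.63
```

With 40 participants per group, the study could reliably detect only effects of about d = 0.63 or larger, and that is the threshold against which a non-significant finding should be discussed.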

Choosing the expected effect size

Everything in the calculation hinges on this choice; a touch of optimism can halve the computed N and leave the study underpowered from day one. The order of preference is: (1) pilot data collected with the same instruments, (2) effect sizes reported in the closest meta-analyses or prior studies (shaded downwards somewhat, since publication bias inflates them), and (3) failing both, Cohen's conventions (for d: 0.2/0.5/0.8 = small/medium/large). Treat the conventions as a last resort: if a "medium" effect is unrealistic in your field, the calculation built on it is unrealistic too. When in doubt, assume the smaller effect — that is, plan for the larger N — as the safe side of the trade-off.

Sample size calculation in G*Power, step by step

For an independent-samples t-test, the typical sequence is:

Test family: t tests → Statistical test: Means: Difference between two independent means (two groups).
Type of power analysis: A priori: Compute required sample size.
Inputs: Tails = Two; Effect size d = 0.5; α err prob = 0.05; Power (1−β) = 0.80; Allocation ratio N2/N1 = 1.
Calculate: The output is 64 per group, 128 in total. The Determine button computes d for you from group means and standard deviations.
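
To see what a calculation of d from means and standard deviations involves, here is a minimal sketch using the standard pooled-SD formula for Cohen's d. The pilot numbers are invented for illustration, and G*Power's own panel may pool the standard deviations slightly differently when group sizes are unequal.

```python
from math import sqrt


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_variance = (
        (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    ) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_variance)


# Hypothetical pilot: 15 participants per group, SD of 10 in both,
# and a 5-point difference between the group means.
print(cohens_d(52, 10, 15, 47, 10, 15))  # 0.5
```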

For one-way ANOVA, choose F tests → ANOVA: Fixed effects, omnibus, one-way; the effect size is entered as f (the Determine panel converts from η²) along with the number of groups. For multiple regression, choose F tests → Linear multiple regression: Fixed model, R² deviation from zero; the effect size is f² = R²/(1−R²), entered with the number of predictors. The table below summarises required sample sizes at common effect sizes for these three workhorse tests, plus the Pearson correlation:

Required sample sizes computed in G*Power at α = 0.05 (two-tailed) and power = 0.80
Test	Small effect	Medium effect	Large effect
Independent-samples t-test (d; per group)	d = 0.2 → 394	d = 0.5 → 64	d = 0.8 → 26
One-way ANOVA, 3 groups (f; total)	f = 0.10 → 969	f = 0.25 → 159	f = 0.40 → 66
Multiple regression, 5 predictors (f²; total)	f² = 0.02 → 647	f² = 0.15 → 92	f² = 0.35 → 43
Pearson correlation (r; total)	r = 0.10 → 782	r = 0.30 → 84	r = 0.50 → 29

Figure: Required N per group for an independent-samples t-test (two-tailed, α = 0.05, power = 0.80); N grows steeply as the effect shrinks
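
The effect-size conversions needed for the ANOVA and regression procedures (f from η², and f² from R²) are simple enough to check by hand or in a short script. The sketch below uses the standard formulas f = √(η² / (1 − η²)) and f² = R² / (1 − R²); the function names and input values are illustrative.

```python
from math import sqrt


def f_from_eta_squared(eta_squared):
    """Cohen's f for ANOVA from eta squared."""
    if not 0 <= eta_squared < 1:
        raise ValueError("eta squared must lie in [0, 1)")
    return sqrt(eta_squared / (1 - eta_squared))


def f_squared_from_r_squared(r_squared):
    """Cohen's f squared for multiple regression from R squared."""
    if not 0 <= r_squared < 1:
        raise ValueError("R squared must lie in [0, 1)")
    return r_squared / (1 - r_squared)


# An eta squared of 0.06 corresponds to roughly a "medium" f of 0.25.
print(round(f_from_eta_squared(0.06), 3))
# An R squared of 0.13 corresponds to roughly a "medium" f squared of 0.15.
print(round(f_squared_from_r_squared(0.13), 3))
```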

Reporting the calculation and allowing for attrition

A complete report contains five elements: the software and version, the type of analysis, the target test, the input parameters and the result. A ready-made template: "An a-priori power analysis in G*Power 3.1 indicated that an independent-samples t-test with a medium effect size (d = 0.5), α = 0.05 and power = 0.80 requires 64 participants per group, 128 in total." Adding one sentence on where the effect size came from (pilot, literature or convention) closes off the reviewer's follow-up question in advance. For the wider conventions on presenting results, see our guide on reporting statistics in APA 7 style.

Remember that the computed N is the number of participants who must enter the analysis, not the number to recruit. Dropout is unavoidable in longitudinal designs, online surveys and clinical follow-ups; depending on the field, an attrition allowance of 10–20% is added to set the recruitment target. With 128 required participants and 15% expected attrition, for example, the target becomes 128 / 0.85 ≈ 151. Reporting the allowance together with its rationale builds confidence with both the ethics committee and reviewers.
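
The attrition adjustment is a one-line calculation, sketched here in Python with a hypothetical helper name. The required N is divided by the expected retention rate and rounded up.

```python
from math import ceil


def recruitment_target(required_n, attrition_rate):
    """Number to recruit so that required_n remain after dropout."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must lie in [0, 1)")
    return ceil(required_n / (1 - attrition_rate))


print(recruitment_target(128, 0.15))  # 151
```

Dividing by 0.85 is not the same as multiplying by 1.15: the latter gives 148, and 15% dropout from 148 leaves fewer than the 128 participants the analysis needs.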

Power analysis is not a formality; it is the insurance that tells you whether the effect you failed to find was truly absent or simply sought with too few participants.

Frequently Asked Questions

Is G*Power free, and where do I download it?

Yes, G*Power is entirely free and can be downloaded for Windows and macOS from the Heinrich Heine University Düsseldorf website. No licence is needed for academic use; just use a current version and cite the version number in your report.

A reviewer asked for post-hoc power — what should I do?

Observed power is a direct function of the p-value, so it is methodologically criticised; offer a sensitivity analysis instead where possible. Reporting the smallest effect detectable with your sample at α = 0.05 and power = 0.80 answers the same concern with information that is actually new.

What if I can find no basis for an expected effect size?

Consider running a small pilot study first; if that is not feasible, look at the typical range of effects in the nearest meta-analyses in your field. As a last resort, pick a conservative small-to-medium value from Cohen's conventions and justify the choice explicitly in your methods section.

How does Celsus support power analysis?

We provide end-to-end support: identifying the right test for your research question, deriving an effect size from the literature, running the a-priori calculation in G*Power, planning the attrition allowance and drafting a methods paragraph ready for ethics committees and theses. For complex designs beyond what G*Power covers, we also run power analyses in R: analytic calculations with the pwr package, and simulation-based analyses where no closed-form solution exists.