# ANOVA Explained: Analysis of Variance for Beginners
You are comparing life satisfaction scores across four university faculties: Psychology, Law, Engineering, and Medicine. Your first instinct might be to run six separate t-tests, one for each pair of groups. That instinct is understandable, but it is wrong, and understanding why it is wrong is the gateway to understanding ANOVA.
ANOVA (Analysis of Variance) is the statistical method designed for comparing means across three or more groups simultaneously. It is one of the most widely used tests in the social sciences, and if you understand the t-test, you already have the conceptual foundation. ANOVA is, in many ways, the t-test's older sibling.
## Why You Cannot Just Run Multiple T-Tests
This is the single most important concept to grasp before learning ANOVA, so let us be concrete about it.
If you have 4 groups, you would need 6 pairwise t-tests:
- Psychology vs. Law
- Psychology vs. Engineering
- Psychology vs. Medicine
- Law vs. Engineering
- Law vs. Medicine
- Engineering vs. Medicine
Each t-test has a 5% chance of producing a false positive (Type I error) at alpha = 0.05. The probability of getting at least one false positive across all 6 tests is:
1 - (0.95)^6 = 0.265, or about 26.5%
That means you have roughly a 1-in-4 chance of declaring a significant difference that does not actually exist. With 5 groups (10 tests), it rises to 40%. With 10 groups (45 tests), it is 90%.
This is called the familywise error rate, and it is the fundamental problem that ANOVA solves. ANOVA tests all groups simultaneously with a single test, keeping your Type I error rate at 5%.
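The arithmetic behind the familywise error rate is easy to verify yourself. This short sketch (plain Python, no libraries) reproduces the figures above:

```python
# Familywise error rate: the probability of at least one false positive
# across k independent tests, each run at the given alpha.
def familywise_error(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

# 4 groups -> 6 tests, 5 groups -> 10 tests, 10 groups -> 45 tests
for groups, tests in [(4, 6), (5, 10), (10, 45)]:
    print(f"{groups} groups ({tests} tests): {familywise_error(tests):.3f}")
```

With 6 tests the rate is about 0.265, matching the 26.5% quoted above; with 45 tests it climbs past 0.90.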
## What ANOVA Actually Does
Despite its name, the goal of ANOVA is to compare means, not variances. It uses variance as the tool: it compares two types of variability:
- Between-group variance: How much the group means differ from each other.
- Within-group variance: How much individual scores vary within each group (essentially, noise).
The F-statistic is the ratio of these two:
F = Between-group variance / Within-group variance
If the groups are truly different, the between-group variance will be large relative to the within-group variance, producing a large F. If the groups are essentially the same, both variances will be similar, and F will be close to 1.
Think of it as a signal-to-noise ratio. A large F means the signal (group differences) is loud relative to the noise (individual variation).
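The F-ratio can be computed by hand from its definition. This sketch (invented toy data, plain Python) walks through the sums of squares that an ANOVA table summarizes:

```python
# Minimal by-hand F-ratio for three small groups of invented scores.
groups = [
    [5, 6, 5, 6],   # group 1
    [3, 4, 3, 4],   # group 2
    [7, 8, 7, 8],   # group 3
]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total sample size
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-group SS: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group SS: how far individual scores sit from their own group mean
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

ms_between = ss_between / (k - 1)       # df_between = k - 1
ms_within = ss_within / (n_total - k)   # df_within = N - k
F = ms_between / ms_within
print(f"F = {F:.2f}")
```

Here the group means (5.5, 3.5, 7.5) are far apart relative to the tiny spread within each group, so F comes out very large: a loud signal over quiet noise.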
## Types of ANOVA
### One-Way ANOVA
When to use it: You have one independent variable (factor) with three or more levels (groups), and one continuous dependent variable.
Example: Comparing life satisfaction scores (dependent variable) across four faculties (independent variable with 4 levels).
This is the simplest and most common form of ANOVA. If you have only two groups, one-way ANOVA gives exactly the same result as an independent samples t-test (in fact, F = t-squared).
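The F = t-squared relationship is easy to confirm empirically. A sketch with SciPy (two invented groups of scores):

```python
# With exactly two groups, one-way ANOVA and the equal-variance
# independent t-test agree, and F equals t squared.
from scipy import stats

a = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3]  # invented scores, group A
b = [4.2, 4.5, 3.9, 4.4, 4.1, 4.6]  # invented scores, group B

t, p_t = stats.ttest_ind(a, b)   # independent samples t-test
F, p_F = stats.f_oneway(a, b)    # one-way ANOVA on the same data

print(f"t^2 = {t**2:.4f}, F = {F:.4f}")           # identical values
print(f"p (t-test) = {p_t:.4f}, p (ANOVA) = {p_F:.4f}")  # identical values
```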
### Two-Way ANOVA (Factorial ANOVA)
When to use it: You have two independent variables and want to examine their individual effects and their interaction.
Example: Examining the effects of both faculty (Psychology, Law, Engineering, Medicine) and gender (male, female) on life satisfaction. This design answers three questions:
- Is there a main effect of faculty? (Do faculties differ in satisfaction?)
- Is there a main effect of gender? (Do men and women differ in satisfaction?)
- Is there an interaction? (Does the effect of faculty depend on gender?)
The interaction is often the most interesting finding. Maybe men and women do not differ overall, but women in Engineering report much lower satisfaction than men in Engineering, while the reverse is true in Psychology.
### Repeated Measures ANOVA
When to use it: The same participants are measured at three or more time points.
Example: Measuring anxiety levels before therapy, after 4 weeks, and after 8 weeks in the same group of patients.
This is the extension of the paired t-test to three or more measurements. Because the same individuals are measured repeatedly, individual differences are controlled for, giving the test more statistical power.
### Mixed ANOVA
When to use it: You have both a between-subjects factor and a within-subjects factor.
Example: Comparing anxiety reduction across two therapy types (between-subjects: CBT vs. psychodynamic) measured at three time points (within-subjects: before, mid, after).
## Assumptions of ANOVA
### 1. Independence of Observations
Each observation must be independent. Participants should not influence each other's scores. This assumption is the hardest to fix if violated.
### 2. Normality
The dependent variable should be approximately normally distributed within each group. ANOVA is quite robust to normality violations when group sizes are similar and each group has at least 20 to 30 observations. For serious violations, consider the Kruskal-Wallis test (non-parametric alternative to one-way ANOVA).
### 3. Homogeneity of Variances
The variance in each group should be roughly equal. This is tested with Levene's test.
- If Levene's test is non-significant (p > 0.05): Assumption met, proceed normally.
- If Levene's test is significant (p < 0.05): Variances are unequal. Use Welch's ANOVA (available in R and SPSS) or the Brown-Forsythe correction.
Rule of thumb: If the largest group variance is no more than 4 times the smallest, ANOVA is generally robust, especially with equal group sizes.
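In practice, the homogeneity check is one function call. A sketch using SciPy's Levene test on three invented groups:

```python
# Homogeneity of variances check with Levene's test (invented data).
from scipy import stats

g1 = [5, 6, 5, 7, 6, 5]
g2 = [4, 5, 4, 6, 5, 4]
g3 = [6, 7, 6, 8, 7, 6]

stat, p = stats.levene(g1, g2, g3)
if p > 0.05:
    print(f"Levene p = {p:.3f}: variances look equal, standard ANOVA is fine")
else:
    print(f"Levene p = {p:.3f}: consider Welch's ANOVA instead")
```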
## A Complete Example: Student Satisfaction Across Faculties
Suppose you surveyed 200 students (50 per faculty) about their satisfaction with their study program on a scale from 1 to 7.
| Faculty | N | Mean | SD |
|---|---|---|---|
| Psychology | 50 | 5.24 | 1.12 |
| Law | 50 | 4.56 | 1.28 |
| Engineering | 50 | 4.78 | 1.19 |
| Medicine | 50 | 5.41 | 1.05 |
Step 1: State the hypotheses.
- H0: All four group means are equal (mu_Psych = mu_Law = mu_Eng = mu_Med).
- H1: At least one group mean differs from the others.
Note that H1 does not specify which groups differ. ANOVA is an omnibus test: it tells you that at least one difference exists but not where.
Step 2: Check assumptions.
- Independence: Each student belongs to one faculty. Check.
- Normality: With N = 50 per group, ANOVA is robust. Shapiro-Wilk tests yield p-values above 0.10 for all groups. Check.
- Homogeneity: Levene's test gives F(3, 196) = 1.23, p = 0.30 (non-significant). Check.
Step 3: Run ANOVA.
| Source | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between groups | 23.41 | 3 | 7.80 | 5.73 | 0.001 |
| Within groups | 266.88 | 196 | 1.36 | | |
| Total | 290.29 | 199 | | | |
F(3, 196) = 5.73, p = 0.001
Since p < 0.05, we reject the null hypothesis. At least one faculty differs from the others in satisfaction. But which ones?
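If your software reports only the F-statistic, the p-value can be recovered from the F-distribution. A sketch using SciPy, with the F and degrees of freedom from the table above:

```python
# Recover the p-value from F and its degrees of freedom using the
# F-distribution's survival function (P(X > F)).
from scipy import stats

F, df_between, df_within = 5.73, 3, 196
p = stats.f.sf(F, df_between, df_within)
print(f"F({df_between}, {df_within}) = {F}, p = {p:.4f}")
```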
Step 4: Compute effect size.
Eta-squared (eta-sq) = SS_between / SS_total = 23.41 / 290.29 = 0.081
This means that faculty membership explains about 8.1% of the variance in satisfaction scores.
| Eta-squared | Interpretation |
|---|---|
| 0.01 | Small |
| 0.06 | Medium |
| 0.14 | Large |
Our eta-squared of 0.081 is a medium effect. It is also useful for planning: in future studies, this effect size could inform a power analysis for sample size determination.
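The eta-squared computation is a single division on values read straight off the ANOVA table:

```python
# Eta-squared: the share of total variability attributable to group
# membership, from the sums of squares in the ANOVA table above.
ss_between = 23.41
ss_total = 290.29

eta_sq = ss_between / ss_total
print(f"eta-squared = {eta_sq:.3f}")  # about 0.081, a medium effect
```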
## Post-Hoc Tests: Finding Out Where the Differences Are
A significant ANOVA tells you that the groups are not all equal, but it does not tell you which specific pairs differ. Post-hoc tests fill this gap. They compare every pair of groups while adjusting for multiple comparisons.
### Tukey's HSD (Honestly Significant Difference)
Best for: Equal group sizes, when you want to compare all pairs.
How it works: Computes a critical difference based on the studentized range distribution. Any pair whose mean difference exceeds this critical value is declared significant.
This is the most commonly used post-hoc test and is the default recommendation when you have no specific reason to choose another.
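For those working in Python, a sketch of Tukey's HSD using SciPy (requires SciPy 1.8 or later; the three groups of scores are invented for illustration):

```python
# All pairwise comparisons with Tukey's HSD (invented data).
from scipy.stats import tukey_hsd

psych = [5.5, 5.0, 5.8, 5.2, 4.9, 5.6]
law   = [4.1, 4.6, 4.3, 4.8, 4.4, 4.2]
eng   = [4.9, 4.6, 5.1, 4.5, 4.8, 5.0]

res = tukey_hsd(psych, law, eng)
print(res.pvalue)  # matrix of adjusted p-values, one entry per pair
```

`res.pvalue[i, j]` is the multiplicity-adjusted p-value for the comparison of group i against group j, so you can read off which specific pairs differ.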
### Bonferroni Correction
Best for: When you have a small number of planned comparisons rather than all possible pairs.
How it works: Divides alpha by the number of comparisons. For 6 comparisons, each test uses alpha = 0.05 / 6 = 0.0083.
Bonferroni is more conservative (less likely to find significance) than Tukey. Use it when you have specific hypotheses about which pairs to compare.
### Scheffe's Test
Best for: Complex comparisons (contrasts involving combinations of groups, not just simple pairs).
How it works: The most conservative of the three. It controls the error rate for any possible comparison, including non-pairwise contrasts.
Use Scheffe when you want to compare, say, the average of Psychology and Medicine against the average of Law and Engineering.
### Games-Howell
Best for: When the homogeneity of variances assumption is violated (unequal variances).
How it works: Like Tukey's, but does not assume equal variances. It is the post-hoc equivalent of Welch's t-test.
### Post-Hoc Results for Our Example
Using Tukey's HSD at alpha = 0.05:
| Comparison | Mean Difference | p-value | Significant? |
|---|---|---|---|
| Psychology vs. Law | 0.68 | 0.012 | Yes |
| Psychology vs. Engineering | 0.46 | 0.148 | No |
| Psychology vs. Medicine | -0.17 | 0.882 | No |
| Law vs. Engineering | -0.22 | 0.732 | No |
| Law vs. Medicine | -0.85 | 0.001 | Yes |
| Engineering vs. Medicine | -0.63 | 0.021 | Yes |
Interpretation: Psychology and Medicine students report significantly higher satisfaction than Law students. Engineering students differ significantly only from Medicine students. Psychology and Engineering do not differ significantly, nor do Psychology and Medicine.
## Reporting ANOVA in APA Format
Here is how to report our example:
A one-way ANOVA was conducted to examine differences in study satisfaction across four faculties. There was a statistically significant difference, F(3, 196) = 5.73, p = .001, eta-sq = .08. Post-hoc comparisons using Tukey's HSD indicated that Law students (M = 4.56, SD = 1.28) reported significantly lower satisfaction than Psychology students (M = 5.24, SD = 1.12, p = .012) and Medicine students (M = 5.41, SD = 1.05, p = .001). Medicine students also reported significantly higher satisfaction than Engineering students (M = 4.78, SD = 1.19, p = .021). Engineering students did not differ significantly from Psychology or Law students.
Key elements to include:
- Type of ANOVA
- F-statistic with between-groups and within-groups degrees of freedom
- Exact p-value
- Effect size (eta-squared or partial eta-squared)
- Post-hoc test used and specific pairwise results
- Group means and standard deviations
## Two-Way ANOVA: A Brief Look at Interactions
Suppose we add gender to our faculty example, creating a 4 x 2 factorial design. We now get three F-tests:
- Main effect of faculty: F(3, 192) = 5.41, p = 0.001 (same pattern as before).
- Main effect of gender: F(1, 192) = 0.89, p = 0.346 (no overall gender difference).
- Interaction (faculty x gender): F(3, 192) = 3.12, p = 0.027 (significant).
The interaction is the interesting finding. It means that the pattern of satisfaction across faculties is different for men and women. Perhaps women in Engineering report lower satisfaction than men in Engineering, while the gender gap is reversed or absent in other faculties.
When an interaction is significant, the main effects should be interpreted with caution (or not at all), because the story is not about overall group differences but about how one factor modifies the effect of the other.
## Common Mistake: Forgetting the Post-Hoc Test
This is the most frequent error students make with ANOVA. They run the omnibus F-test, get a significant result, and then stop. They report "there was a significant difference across the four groups" without identifying which groups actually differ.
Why this is a problem: A significant ANOVA tells you almost nothing useful by itself. Saying "the groups differ" without specifying how is like saying "something happened" without saying what. Your reader, your committee, and the scientific community need to know the specific pattern of differences.
What to do:
1. Run the ANOVA (omnibus test).
2. If significant, run post-hoc tests.
3. Report both the omnibus result and the post-hoc pairwise comparisons.
4. If the ANOVA is non-significant, do not run post-hoc tests (there is nothing to follow up on).
A related mistake is running post-hoc tests when the omnibus ANOVA is not significant. If F is non-significant, you should conclude that you have no evidence the groups differ and stop there. Running pairwise tests after a non-significant ANOVA is a form of fishing for results.
## ANOVA vs. T-Test: A Quick Decision Guide
| Situation | Test |
|---|---|
| 2 independent groups | Independent t-test |
| 2 related measurements | Paired t-test |
| 3+ independent groups | One-way ANOVA |
| 3+ related measurements | Repeated measures ANOVA |
| 2 factors, both between-subjects | Two-way ANOVA |
| 1 between-subjects factor + 1 within-subjects factor | Mixed ANOVA |
If you are unsure whether your design calls for a t-test or ANOVA, count your groups. Two groups means a t-test. Three or more means ANOVA. It really is that straightforward.
## Beyond ANOVA: When to Use Something Else
ANOVA is powerful, but it is not always the right tool:
- Non-normal data, small samples: Kruskal-Wallis test (non-parametric alternative).
- Unequal variances: Welch's ANOVA or Brown-Forsythe test.
- Covariates to control for: ANCOVA (Analysis of Covariance).
- Multiple dependent variables: MANOVA (Multivariate ANOVA).
- Complex nested or crossed designs: Mixed-effects models (linear mixed models).
For most undergraduate and master's thesis work, one-way and two-way ANOVA will cover the vast majority of designs.
## Wrapping Up
ANOVA is a logical extension of the t-test that allows you to compare three or more groups while keeping your Type I error rate under control. The key steps are: check assumptions, run the omnibus test, compute effect size, and (if significant) follow up with post-hoc comparisons. The most common mistakes are skipping the post-hoc step and running multiple t-tests instead.
If you are collecting survey data from multiple groups, the Istrazimo platform runs one-way ANOVA with automatic post-hoc tests (Tukey HSD), computes eta-squared, and formats everything in APA style. It handles the computation so you can focus on what the numbers actually mean for your research question.