# As simple as possible ANOVA

### Here we explain, as simply as possible, what an ANOVA is and how to do it. We use a simplified two-group example, which makes it basically the same as a t-test. The whole point of the ANOVA is to test whether groups differ from each other, on average. The method is called an ANalysis Of VAriance.

Basically, we compare the amount of variation between the groups with the amount of variation within each group. If the variation between the groups is large enough relative to the variation within the groups, we say there is a ‘statistically significant’ difference.
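To make that comparison concrete, here is the ratio usually written down for a one-way ANOVA. The symbols are my own notation rather than anything from the figures: $k$ is the number of groups, $N$ the total number of people, $n_j$ and $\bar{x}_j$ the size and mean of group $j$, $\bar{x}$ the mean of everyone together, and $x_{ij}$ the score of person $i$ in group $j$.

$$
F = \frac{\text{between-group variation}}{\text{within-group variation}}
  = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2 \,/\, (k - 1)}
         {\sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 \,/\, (N - k)}
$$

A large $F$ means the group averages sit far apart compared with how much people vary inside their own group. With only two groups ($k = 2$), $F$ is the square of the usual equal-variance t statistic.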

This relies on two concepts. The first is variance, which is the spread of the scores, in general (i.e., all of the scores you have). The second is the distinction between what is called ‘within’ group variance and ‘between’ group variance. In Figure 1 the within group variance is how much the number of books varies within each group, those who read the blog and those who don’t. The between group variance is how much the two group averages differ. We essentially examine if the averages of the two groups are sufficiently different, given how much variation there is within each group. 🙂
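If it helps to see the two kinds of variation computed, here is a minimal Python sketch. The function names and the ten book counts are made up for illustration (they are not the survey data), and it uses only the standard library.

```python
from statistics import mean


def sums_of_squares(groups):
    """Return (between-group, within-group) sums of squares."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)

    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = mean(g)
        # Between: how far this group's average is from the overall average,
        # counted once per person in the group.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Within: how far each person is from their own group's average.
        ss_within += sum((x - group_mean) ** 2 for x in g)
    return ss_between, ss_within


def f_ratio(groups):
    """Between-group variation divided by within-group variation."""
    k = len(groups)                    # number of groups
    n = sum(len(g) for g in groups)    # total number of people
    ss_between, ss_within = sums_of_squares(groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical book counts, invented for this example only.
followers = [50, 55, 60, 65, 70]
non_followers = [30, 35, 40, 45, 50]

ss_b, ss_w = sums_of_squares([followers, non_followers])
f_value = f_ratio([followers, non_followers])
print(ss_b, ss_w, f_value)
```

For these invented numbers the group averages are 60 and 40, the between-group sum of squares is 1000, the within-group sum of squares is 500, and the ratio works out to 16.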

For instance, let’s pretend we have some data on the number of books read by two different types of people: those who follow this blog and those who do not. We have asked 100 people in each group and have the results in Table 1. The means are the blue diamonds, and the standard deviations (how we quantify the variation) for each group are the green or orange bars around the diamonds.

Now, here is where the ANOVA comes in. The critical question is whether the two groups are different, on average (note that we can rarely say anything about the individual level).

If there is no difference between the two groups, then the mean number of books read should be close to equal. If the means are very different, then there might be reason to say that those who follow the blog are somehow different from those who do not. We use the ANalysis Of VAriance to tell us.

## Specifically, we look at the averages of the groups and how much the scores vary around those averages.

In Figure 1, you can see three potential outcomes of our book study, which together demonstrate how the ANOVA works.

In the first two potential outcomes, the group means are the same in both: 36 and 58. In the third potential outcome, the group means are about 25 and 75. The question in all three situations is whether these averages are different, and whether they are depends upon how much variation there is within each group (the orange and green ‘error’ bars). In the first potential outcome, there is a lot of variation within each group, and it is hard to tell if a person in the middle follows the blog or not. In the second potential outcome, there is less variation within the groups, and we can more confidently predict whether a person follows the blog or not based upon how many books they read. Notice that there does not necessarily need to be ‘a lot’ of distance between the groups if the variation within the groups is small (Potential outcome 2). Also recognize that even if there is a lot of variation within the groups, if the group differences are large enough, the groups still may be considered ‘statistically significantly’ different (Potential outcome 3).
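Here is a rough simulation of the three potential outcomes, for anyone who wants to play with the idea. The group means (36 and 58, 25 and 75) and the 100 people per group come from the example above. The standard deviations (30 for a wide spread, 5 for a narrow one), the random seed, and the function names are my own assumptions, chosen only to mimic the pattern in the figure.

```python
import random
from statistics import mean


def f_ratio(groups):
    """Between-group variation divided by within-group variation."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = mean(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def simulate(mean_a, mean_b, sd, n=100, seed=0):
    """Draw two made-up groups of n people each and return their F ratio."""
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    group_a = [rng.gauss(mean_a, sd) for _ in range(n)]
    group_b = [rng.gauss(mean_b, sd) for _ in range(n)]
    return f_ratio([group_a, group_b])


# The sd values below are assumed, not taken from the original figure.
f1 = simulate(36, 58, sd=30)  # outcome 1: same means, wide spread
f2 = simulate(36, 58, sd=5)   # outcome 2: same means, narrow spread
f3 = simulate(25, 75, sd=30)  # outcome 3: wide spread, means far apart

for label, f in [("Outcome 1", f1), ("Outcome 2", f2), ("Outcome 3", f3)]:
    print(f"{label}: F = {f:.1f}")
```

Under these assumptions the ratio for outcome 2 comes out larger than for outcome 1, even though the means are the same, because the spread within each group is smaller. Outcome 3 also beats outcome 1, because the means are further apart while the spread stays the same.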

### The keys in ANOVA are how much the two groups differ, and how much the individuals within each group differ. If there are large differences within each group, it makes it harder to say that the groups are ‘statistically’ different because there is more overlap between groups (given the same mean difference; see Figure 1).

For an excellent mathematical treatment of the ANOVA, please see here.

And that is how we can explain ANOVA as simply as possible.