I'm sure I've got this completely wrapped round my head, but I just can't figure it out.

The t-test compares the means of two normally distributed samples using the t distribution. That's why there's an assumption of normality in the DATA.

ANOVA is equivalent to linear regression with dummy variables, and uses sums of squares, just like OLS. That's why there's an assumption of normality of RESIDUALS.

It's taken me several years, but I think I've finally grasped those basic facts. So why is it that the t-test is equivalent to ANOVA with two groups? How can they be equivalent if they don't even assume the same things about the data?

The t-test with two groups assumes that each group is normally distributed with the same variance (although the means may differ under the alternative hypothesis). That is equivalent to a regression with a dummy variable as the regression allows the mean of each group to differ but not the variance. Hence the residuals (equal to the data with the group means subtracted) have the same distribution --- that is, they are normally distributed with zero mean.
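To make the dummy-variable argument concrete, here is a minimal sketch using only the Python standard library. The data values and the names `group_a`, `group_b`, `ols_residuals`, and `centered` are made up for illustration. It fits a one-predictor OLS regression on a 0/1 dummy and checks that the residuals are just the data with each group's mean subtracted.

```python
from statistics import mean

# Hypothetical illustrative data for two groups (not taken from the thread).
group_a = [4.1, 5.0, 5.6, 4.8, 5.3]
group_b = [6.2, 5.9, 7.1, 6.8, 6.5, 7.0]

# Stack the responses and build a 0/1 dummy variable marking group B.
y = group_a + group_b
x = [0.0] * len(group_a) + [1.0] * len(group_b)

# Closed-form OLS for a single predictor: slope = Sxy / Sxx.
x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Regression residuals versus the data with each group mean subtracted.
ols_residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
centered = ([v - mean(group_a) for v in group_a]
            + [v - mean(group_b) for v in group_b])
max_gap = max(abs(r - c) for r, c in zip(ols_residuals, centered))

print(f"intercept = {intercept:.4f}, mean of group A = {mean(group_a):.4f}")
print(f"slope = {slope:.4f}, mean difference = {mean(group_b) - mean(group_a):.4f}")
print(f"largest gap between the two sets of residuals = {max_gap:.2e}")
```

The intercept recovers the mean of the reference group and the slope recovers the difference in means, so checking the residuals for normality is the same as checking each mean-centered group.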

A t-test with unequal variances (Welch's t-test) is not equivalent to a one-way ANOVA.
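As a rough check of this, the sketch below computes the unequal-variance (Welch) t statistic and the one-way ANOVA F statistic by hand, using only the standard library. The data are hypothetical and were chosen to have unequal group sizes and clearly different spreads. With equal group sizes the two t statistics coincide and only the degrees of freedom differ, which is why the groups here are deliberately unbalanced.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical data: unequal group sizes and very different spreads.
group_a = [10.1, 9.8, 10.3, 10.0]
group_b = [12.5, 8.0, 15.2, 9.1, 13.7, 11.0, 7.4, 14.3]

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = mean(group_a), mean(group_b)
var_a, var_b = variance(group_a), variance(group_b)  # sample variances

# Welch t statistic: each group keeps its own variance estimate.
welch_t = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b)

# Welch-Satterthwaite approximation to the degrees of freedom.
welch_df = (var_a / n_a + var_b / n_b) ** 2 / (
    (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
)

# One-way ANOVA F statistic, which pools the within-group variance.
grand_mean = mean(group_a + group_b)
ss_between = n_a * (mean_a - grand_mean) ** 2 + n_b * (mean_b - grand_mean) ** 2
ss_within = (n_a - 1) * var_a + (n_b - 1) * var_b
f_stat = (ss_between / 1) / (ss_within / (n_a + n_b - 2))

print(f"Welch t^2 = {welch_t ** 2:.4f} on about {welch_df:.2f} df")
print(f"ANOVA F   = {f_stat:.4f} on (1, {n_a + n_b - 2}) df")
```

Here the squared Welch statistic and F disagree, and the reference distributions differ as well, so the two procedures can give different p-values.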

I totally agree with Rob's answer, but let me put it another way (using Wikipedia):

Assumptions ANOVA:

- Independence of cases – this is an assumption of the model that simplifies the statistical analysis.
- Normality – the distributions of the residuals are normal.
- Equality (or "homogeneity") of variances, called homoscedasticity

Assumptions t-test:

- Each of the two populations being compared should follow a normal distribution ...
- ... the two populations being compared should have the same variance ...
- The data used to carry out the test should be sampled independently from the two populations being compared.

Hence, I would reject the premise of the question, as the two tests obviously have the same assumptions (although listed in a different order :-) ).

The t-test is simply a special case of the F-test where only two groups are being compared. The result of either will be exactly the same in terms of the p-value (for the two-sided t-test), and there is a simple relationship between the F and t statistics as well: F = t^2. The two tests are algebraically equivalent and their assumptions are the same.
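The F = t^2 relationship is easy to verify numerically. The following is a minimal standard-library sketch with made-up data. It computes the equal-variance (pooled) t statistic and the one-way ANOVA F statistic from their textbook definitions and compares them.

```python
from math import sqrt
from statistics import mean

# Hypothetical illustrative data for two groups.
group_a = [4.1, 5.0, 5.6, 4.8, 5.3]
group_b = [6.2, 5.9, 7.1, 6.8, 6.5, 7.0]

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = mean(group_a), mean(group_b)

# Within-group sums of squares, shared by both tests.
ss_a = sum((v - mean_a) ** 2 for v in group_a)
ss_b = sum((v - mean_b) ** 2 for v in group_b)
df_within = n_a + n_b - 2

# Pooled-variance two-sample t statistic.
pooled_var = (ss_a + ss_b) / df_within
t_stat = (mean_a - mean_b) / sqrt(pooled_var * (1 / n_a + 1 / n_b))

# One-way ANOVA F statistic with two groups (between-group df = 1).
grand_mean = mean(group_a + group_b)
ss_between = n_a * (mean_a - grand_mean) ** 2 + n_b * (mean_b - grand_mean) ** 2
f_stat = (ss_between / 1) / ((ss_a + ss_b) / df_within)

print(f"t = {t_stat:.6f}, t^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
```

The squared t statistic matches F up to floating-point rounding. Since a squared t variable with k degrees of freedom follows an F(1, k) distribution, the two-sided t-test p-value and the ANOVA p-value agree too.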

In fact, these equivalences extend to the whole class of ANOVAs, t-tests, and linear regression models. The t-test is a special case of ANOVA. ANOVA is a special case of regression. All of these procedures are subsumed under the General Linear Model and share the same assumptions.

- Independence of observations.
- Normality of residuals = normality in each group in the special case.
- Equality of variances of residuals = equal variances across groups in the special case.

You might think of it as normality in the data, but you are checking for normality in each group--which is actually the same as checking for normality in the residuals when the only predictor in the model is an indicator of group. Likewise with equal variances.

Just as an aside, R does not have separate routines for ANOVA. The ANOVA functions in R (such as aov()) are just wrappers to the lm() function, the same thing that is used to fit linear regression models, packaged a little differently to provide what is typically found in an ANOVA summary rather than a regression summary.

One obvious point that everyone's overlooked: with ANOVA you're testing the null that the mean is identical regardless of the values of your explanatory variables. With a t-test you can also test the one-sided case, that the mean is specifically greater given one value of your explanatory variable than given the other.
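One way to see why only the t-test supports a one-sided alternative: the t statistic carries the sign of the mean difference, while F is a ratio of sums of squares and cannot. The sketch below uses hypothetical data, and the helper names `pooled_t` and `anova_f` are my own. It swaps the group order and compares the two statistics.

```python
from math import sqrt
from statistics import mean


def pooled_t(first, second):
    """Equal-variance t statistic for mean(first) - mean(second)."""
    n1, n2 = len(first), len(second)
    m1, m2 = mean(first), mean(second)
    ss_within = (sum((v - m1) ** 2 for v in first)
                 + sum((v - m2) ** 2 for v in second))
    pooled_var = ss_within / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def anova_f(first, second):
    """One-way ANOVA F statistic for two groups (between-group df = 1)."""
    n1, n2 = len(first), len(second)
    m1, m2 = mean(first), mean(second)
    grand = mean(first + second)
    ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
    ss_within = (sum((v - m1) ** 2 for v in first)
                 + sum((v - m2) ** 2 for v in second))
    return ss_between / (ss_within / (n1 + n2 - 2))


# Hypothetical illustrative data.
group_a = [4.1, 5.0, 5.6, 4.8, 5.3]
group_b = [6.2, 5.9, 7.1, 6.8, 6.5, 7.0]

t_ab, t_ba = pooled_t(group_a, group_b), pooled_t(group_b, group_a)
f_ab, f_ba = anova_f(group_a, group_b), anova_f(group_b, group_a)

print(f"t(A, B) = {t_ab:.4f}, t(B, A) = {t_ba:.4f}")  # sign flips
print(f"F(A, B) = {f_ab:.4f}, F(B, A) = {f_ba:.4f}")  # unchanged
```

Swapping the groups flips the sign of t but leaves F as it was, so the direction of the difference has to be read from the group means (or from t) rather than from F.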

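I prefer to use the t-test for comparing two groups and ANOVA for more than two groups. The main reason is the assumption of equal variances: for two groups, the t-test has an unequal-variance version (Welch's test), whereas standard one-way ANOVA assumes equal variances across groups.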