# [R] Strange degrees of freedom and SS from car::Anova with type II SS?

Ramon Diaz-Uriarte rdi@z02 @ending from gm@il@com
Thu Dec 6 01:33:52 CET 2018

```
Dear All,

I do not understand the degrees of freedom returned by car::Anova under
some models. There seem to be too many (e.g., numerical variables getting
more than 1 df, factors getting more df than they have levels).

This is a reproducible example:

library(car)
data(Prestige)

## Drop rows with NAs so they cannot affect the SS comparisons below
prestige_nona <- na.omit(Prestige)

Anova(lm(prestige ~ women * type * income * education,
         data = prestige_nona))

## Notice how women, a numerical variable, has 3 df
## and type (factor with 3 levels) has 4 df.

## In contrast this seems to get the df right:
Anova(lm(prestige ~ women * type * income * education,
         data = prestige_nona), type = "III")

## And also gives the df I'd expect
anova(lm(prestige ~ women * type * income * education,
         data = prestige_nona))

## I do not understand the type II SS for women in the above model either.
m_1 <- lm(prestige ~ type * income * education, data = prestige_nona)
m_2 <- lm(prestige ~ type * income * education + women,
          data = prestige_nona)
## This does not match the SS for women reported by Anova
sum(residuals(m_1)^2) - sum(residuals(m_2)^2)
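## A possible cross-check (a sketch, not part of the car output): the same
## difference in residual SS should also appear in the "Sum of Sq" column of
## a nested-model comparison with stats::anova(), using m_1 and m_2 from above.
anova(m_1, m_2)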

## See [1] below for examples where they match.

Looking at the code, I do not understand what the call to
linearHypothesis returns here (especially compared to other models), and the
problem seems to be in the value returned by ConjComp, possibly due to the
vcov of the model? (But this is over my head.)

I understand this is not a reasonable model to fit, and there are possibly
serious collinearity problems. But I was surprised to get these dfs without
any warning that something had gone wrong. So I think there is
something very basic I do not understand.
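
## A quick way to gauge the collinearity mentioned above (a sketch only;
## m_full is a name introduced here for illustration, and it assumes the
## prestige_nona data frame from the example; output not shown):
m_full <- lm(prestige ~ women * type * income * education,
             data = prestige_nona)
## Condition number of the model matrix; a very large value would point to
## an ill-conditioned design.
kappa(model.matrix(m_full), exact = TRUE)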

Thanks,

R.

[1] In contrast, in other models I see what I'd expect. For example:

## 1 df for women, 2 for type
Anova(lm(prestige ~ type * income * women, data = prestige_nona))
m_1 <- lm(prestige ~ type * income, data = prestige_nona)
m_2 <- lm(prestige ~ type * income + women, data = prestige_nona)
## Type II SS for women
sum(residuals(m_1)^2) - sum(residuals(m_2)^2)

## 1 df for women, income, education
Anova(lm(prestige ~ education * income * women, data = prestige_nona))
m_1 <- lm(prestige ~ education * income, data = prestige_nona)
m_2 <- lm(prestige ~ education * income + women, data = prestige_nona)
## Type II SS for women
sum(residuals(m_1)^2) - sum(residuals(m_2)^2)

--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Arzobispo Morcillo, 4