# [R] Strange degrees of freedom and SS from car::Anova with type II SS?

Fox, John jfox @ending from mcm@@ter@c@
Thu Dec 6 03:43:06 CET 2018

```Dear R.,

The problem you constructed is too ill-conditioned for the method that Anova() uses to compute type-II sums of squares and the associated degrees of freedom, with an immense condition number of the coefficient covariance matrix:

> library(car)

> mod <- lm(prestige ~ women * type * income * education, data=Prestige)
> e <- eigen(vcov(mod))\$values
> max(e)/min(e)
[1] 2.776205e+17

Simply centering the numerical predictors reduces the condition number by a factor of 10^3, which allows Anova() to work, even though the problem is still extremely ill-conditioned:

> Prestige.c <- within(Prestige, {
+   income <- income - mean(income)
+   education <- education - mean(education)
+   women <- women - mean(women)
+ })
> mod.c <- lm(prestige ~ women * type * income * education, data=Prestige.c)
> e.c <- eigen(vcov(mod.c))\$values
> max(e)/min(e)
[1] 2.776205e+17

> Anova(mod.c)
Anova Table (Type II tests)

Response: prestige
Sum Sq Df F value    Pr(>F)
women                        167.29  1  4.9516 0.0291142 *
type                         744.30  2 11.0150 6.494e-05 ***
income                       789.00  1 23.3529 7.112e-06 ***
education                    699.54  1 20.7050 2.057e-05 ***
women:type                   140.32  2  2.0766 0.1326023
women:income                  33.14  1  0.9807 0.3252424
type:income                  653.40  2  9.6697 0.0001859 ***
women:education               30.36  1  0.8986 0.3462316
type:education                 0.72  2  0.0107 0.9893462
income:education               7.88  1  0.2331 0.6306681
women:type:income            136.80  2  2.0245 0.1393087
women:type:education         140.18  2  2.0745 0.1328633
women:income:education       100.42  1  2.9722 0.0888832 .
type:income:education         82.02  2  1.2138 0.3029069
women:type:income:education    2.05  2  0.0303 0.9701334
Residuals                   2500.16 74
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> mod.c.2 <- update(mod.c, . ~ . - women:type:income:education)
> sum(residuals(mod.c.2)^2) - sum(residuals(mod.c)^2)
[1] 2.049735

Beyond demonstrating that the algorithm that Anova() uses can be made to fail if the coefficient covariance matrix is sufficiently ill-conditioned problem, I’m not sure what the point of this is. I suppose that we could try to detect this condition, which falls in the small region between where lm() detects a singularity and the projections used by Anova() break down.

Best,
John

-------------------------------------------------
John Fox, Professor Emeritus
McMaster University
Web: http::/socserv.mcmaster.ca/jfox

> On Dec 5, 2018, at 7:33 PM, Ramon Diaz-Uriarte <rdiaz02 using gmail.com> wrote:
>
>
> Dear All,
>
> I do not understand the degrees of freedom returned by car::Anova under
> some models. They seem to be too many (e.g., numerical variables getting
> more than 1 df, factors getting more df than levels there are).
>
> This is a reproducible example:
>
> library(car)
> data(Prestige)
>
> ## Make sure no issues from NAs in comparisons of SS below
> prestige_nona <- na.omit(Prestige)
>
> Anova(lm(prestige ~ women * type * income * education,
>         data = prestige_nona))
>
> ## Notice how women, a numerical variable, has 3 df
> ## and type (factor with 3 levels) has 4 df.
>
>
> ## In contrast this seems to get the df right:
> Anova(lm(prestige ~ women * type * income * education,
>         data = prestige_nona), type = "III")
>
> ## And also gives the df I'd expect
> anova(lm(prestige ~ women * type * income * education,
>         data = prestige_nona))
>
>
>
> ## Type II SS for women in the above model I do not understand either.
> m_1 <- lm(prestige ~ type * income * education, data = prestige_nona)
> m_2 <- lm(prestige ~ type * income * education + women, data = prestige_nona)
> ## Does not match women SS
> sum(residuals(m_1)^2) - sum(residuals(m_2)^2)
>
> ## See [1] below for examples where they match.
>
>
> Looking at the code, I do not understand what the call from
> linearHypothesis returns here (specially compared to other models), and the
> problem seems to be in the return from ConjComp, possibly due to the the
> vcov of the model? (But this is over my head).
>
>
> I understand this is not a reasonable model to fit, and there are possibly
> serious collinearity problems. But I was surprised by the dfs in the
> absence of any warning of something gone wrong. So I think there is
> something very basic I do not understand.
>
>
>
> Thanks,
>
>
> R.
>
>
> [1] In contrast, in other models I see what I'd expect. For example:
>
> ## 1 df for women, 2 for type
> Anova(lm(prestige ~ type * income * women, data = prestige_nona))
> m_1 <- lm(prestige ~ type * income, data = prestige_nona)
> m_2 <- lm(prestige ~ type * income + women, data = prestige_nona)
> ## Type II SS for women
> sum(residuals(m_1)^2) - sum(residuals(m_2)^2)
>
> ## 1 df for women, income, education
> Anova(lm(prestige ~ education * income * women, data = prestige_nona))
> m_1 <- lm(prestige ~ education * income, data = prestige_nona)
> m_2 <- lm(prestige ~ education * income + women, data = prestige_nona)
> ## Type II SS for women
> sum(residuals(m_1)^2) - sum(residuals(m_2)^2)
>
>
>
>
> --
> Ramon Diaz-Uriarte
> Department of Biochemistry, Lab B-25
> Arzobispo Morcillo, 4
> Spain
>
> Phone: +34-91-497-2412
>
> Email: rdiaz02 using gmail.com
>       ramon.diaz using iib.uam.es
>
> http://ligarto.org/rdiaz
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help