[R] type III ANOVA for a nested linear model
Peter Dalgaard
P.Dalgaard at biostat.ku.dk
Wed Jul 11 17:03:13 CEST 2007
Carsten Jaeger wrote:
> Hello Peter,
>
> thanks for your help. I'm quite sure that I specified the right model.
> Factor C is indeed nested within factor A. I think you were confused by
> the numbering of C (1..11), and it is easier to understand when I code
> it as you suggested (1,2,3 within each level of A, as in mydata1 [see
> below]). However, it does not matter which numbering I choose for
> carrying the analysis, as
>
> anova(lm(resp ~ A * B + (C %in% A), mydata))
> anova(lm(resp ~ A * B + (C %in% A), mydata1))
>
> both give the same results (as at least I had expected because of the
> nesting).
>
> However, I found that Anova() from the car package only accepts the
> second version. So,
>
> Anova(lm(resp ~ A * B + (C %in% A), mydata)) does not work (giving an
> error) but
> Anova(lm(resp ~ A * B + (C %in% A), mydata1)) does.
>
> This behaviour is rather confusing, or is there anything I'm missing?
>
>
You're not listening to what I told you!
A term C %in% A (or A/C) is not a _specification_ that C is nested in
A, it is a _directive_ to include the terms A and C:A. Now, C:A involves
a term for each combination of A and C, of which many are empty if C is
strictly coarser than A. This may well be what is confusing Anova().
In fact, with this (c(1:3,6:11)) coding of C, A:C is completely
equivalent to C, but if you look at summary(lm(....)) you will see a lot
of NA coefficients in the A:C case. If you use resp ~ A*B+C, then you
still get a couple of missing coefficients in the C terms because of
collinearity with the A terms. (Notice that this is one case where the
order inside the model formula will matter; C+A*B is not the same.)
Whether you'd want C as a random factor is a different matter. It is
often the natural model if C is "subject" and A is "group". Let's assume
that this is the case: In an ordinary linear model, you can test whether
you can remove C (or A:C) , which implies that all subjects in the same
group have the same level of the response. In your case, the hypothesis
is accepted, but the F statistic is around 3 (on (6, 6) DF) , which
suggests that there might be some variation of subjects within groups.
In a mixed-effects model, you assume that this variation exists and
therefore you use the SSD for C as the denominator when testing A, which
is arguably safer than pooling it with the somewhat smaller residual SSD.
> Thanks for your help again,
>
> Carsten
>
>
> R> mydata
> A B C resp
> 1 1 1 1 34.12
> 2 1 1 2 32.45
> 3 1 1 3 44.55
> 4 1 2 1 20.88
> 5 1 2 2 22.32
> 6 1 2 3 27.71
> 7 2 1 6 38.20
> 8 2 1 7 31.62
> 9 2 1 8 38.71
> 10 2 2 6 18.93
> 11 2 2 7 20.57
> 12 2 2 8 31.55
> 13 3 1 9 40.81
> 14 3 1 10 42.23
> 15 3 1 11 41.26
> 16 3 2 9 28.41
> 17 3 2 10 24.07
> 18 3 2 11 21.16
>
> R> mydata1
> A B C resp
> 1 1 1 1 34.12
> 2 1 1 2 32.45
> 3 1 1 3 44.55
> 4 1 2 1 20.88
> 5 1 2 2 22.32
> 6 1 2 3 27.71
> 7 2 1 1 38.20
> 8 2 1 2 31.62
> 9 2 1 3 38.71
> 10 2 2 1 18.93
> 11 2 2 2 20.57
> 12 2 2 3 31.55
> 13 3 1 1 40.81
> 14 3 1 2 42.23
> 15 3 1 3 41.26
> 16 3 2 1 28.41
> 17 3 2 2 24.07
> 18 3 2 3 21.16
>
>
> On Tue, 2007-07-10 at 13:54 +0200, Peter Dalgaard wrote:
>
>> Carsten Jaeger wrote:
>>
>>> Hello,
>>>
>>> is it possible to obtain type III sums of squares for a nested model as
>>> in the following:
>>>
>>> lmod <- lm(resp ~ A * B + (C %in% A), mydata))
>>>
>>> I have tried
>>>
>>> library(car)
>>> Anova(lmod, type="III")
>>>
>>> but this gives me an error (and I also understand from the documentation
>>> of Anova as well as from a previous request
>>> (http://finzi.psych.upenn.edu/R/Rhelp02a/archive/64477.html) that it is
>>> not possible to specify nested models with car's Anova).
>>>
>>> anova(lmod) works, of course.
>>>
>>> My data (given below) is balanced so I expect the results to be similar
>>> for both type I and type III sums of squares. But are they *exactly* the
>>> same? The editor of the journal which I'm sending my manuscript to
>>> requests what he calls "conventional" type III tests and I'm not sure if
>>>
>>> can convince him to accept my type I analysis.
>>>
>> In balanced designs, type I-IV SSD's are all identical. However, I don't think the model does what I think you think it does.
>>
>> Notice that "nesting" is used with two diferent meanings, in R it would be that the codings of C only makes sense within levels of A - e.g. if they were numbered 1:3 within each group, but with C==1 when A==1 having nothing to do with C==1 when A==2. SAS does something. er. else...
>>
>> What I think you want is a model where C is a random terms so that main effects of A can be tested, like in
>>
>>
>>> summary(aov(resp ~ A * B + Error(C), dd))
>>>
>> Error: C
>> Df Sum Sq Mean Sq F value Pr(>F)
>> A 2 33.123 16.562 0.4981 0.6308
>> Residuals 6 199.501 33.250
>>
>> Error: Within
>> Df Sum Sq Mean Sq F value Pr(>F)
>> B 1 915.21 915.21 83.7846 9.57e-05 ***
>> A:B 2 16.13 8.07 0.7384 0.5168
>> Residuals 6 65.54 10.92
>> ---
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>>
>> (This is essentially the same structure as Martin Bleichner had earlier today, also @web.de. What is this? an epidemic? ;-))
>>
>>
>>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
More information about the R-help
mailing list