[R] ss's are incorrect from aov with multiple factors (EXAMPLE!)
Peter Dalgaard BSA
p.dalgaard at biostat.ku.dk
Sat Jul 12 12:37:20 CEST 2003
John Christie <jc at or.psychology.dal.ca> writes:
> OK, I do see that there is a problem in my first email. I have
> noticed this with repeated measures designs. Otherwise, of course,
> there is only one error term for all factors. But, with repeated
> measures designs this is not the case.
>
>
> On Friday, July 11, 2003, at 10:00 PM, Spencer Graves wrote:
>
> > People tend to get the quickest and most helpful responses
> > when they provide a toy problem that produces what they think are
> > anomalous results
>
> here is an admittedly poor example with factors a and b and s subjects.
>
> a<-factor(rep(c(0,1),12))
> b<-factor(rep(c(0,0,1,1),6))
> s<- factor(rep(1:6,each=4))
> x <- c(49.5, 62.8, 46.8, 57, 59.8, 58.5, 55.5, 56, 62.8, 55.8, 69.5,
> 55, 62, 48.8, 45.5, 44.2, 52, 51.5, 49.8, 48.8, 57.2, 59, 53.2, 56)
>
> now
>
> summary(aov(x~a*b+Error(s/(a*b))))
>
> gives a table of results
> but, if one wanted to generate a confidence interval for factor b one
> needs to reanalyze the results thusly
>
> ss<-aggregate(x, list(s=s, b=b), mean)
> summary(aov(x~b+Error(s/b), data=ss))
>
> This yields an error term half the size of that reported for b in the
> combined ANOVA. I would suggest that the way the ss and MSE are
> reported is erroneous since they should be able to be used to directly
> calculate confidence intervals or make mean comparisons without having
> to collapse and reanalyze for every effect.
>
> Furthermore, I am guessing that this problem makes it impossible to
> get a correct average MSE that includes the interaction term. OK, far
> from impossible, but very difficult to verify that the term is correct.
>
> NOTE: F for b is the same in the first ANOVA and the second.

As far as I can tell: yes, you get different results if you analyse
the original data than if you first collapse by taking means over the a
factor, and no, you should not expect otherwise. The various SS in the
full analysis are (squared) distance measures in 24-dimensional space,
whereas in the aggregated analysis you get a distance in 12-dimensional
space. The relation is that every value entering the b and s:b terms
appears twice in the former (once for each level of a), hence those SS
are twice as big. The df are unchanged, so the MS double as well and
the F ratio for b is the same in both analyses.
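
The factor of two can be checked by hand on the toy data quoted above.
The following is only a sketch: it recomputes the b and s:b sums of
squares directly from group means with ave() instead of reading them
off the aov() tables, and the object names (ss_b_full, ss_sb_agg, and
so on) are made up for the illustration.

```r
# Toy data from the quoted message
a <- factor(rep(c(0, 1), 12))
b <- factor(rep(c(0, 0, 1, 1), 6))
s <- factor(rep(1:6, each = 4))
x <- c(49.5, 62.8, 46.8, 57, 59.8, 58.5, 55.5, 56, 62.8, 55.8, 69.5,
       55, 62, 48.8, 45.5, 44.2, 52, 51.5, 49.8, 48.8, 57.2, 59, 53.2, 56)

# Full data: 24 rows, so every (s, b) mean enters twice, once per level of a
grand      <- mean(x)
ss_b_full  <- sum((ave(x, b) - grand)^2)
ss_sb_full <- sum((ave(x, s, b) - ave(x, s) - ave(x, b) + grand)^2)

# Aggregated data: 12 rows, one mean per (s, b) cell
ss         <- aggregate(x, list(s = s, b = b), mean)
grand_agg  <- mean(ss$x)
ss_b_agg   <- sum((ave(ss$x, ss$b) - grand_agg)^2)
ss_sb_agg  <- sum((ss$x - ave(ss$x, ss$s) - ave(ss$x, ss$b) + grand_agg)^2)

# b has 1 df and s:b has 5 df in both analyses
F_full <- (ss_b_full / 1) / (ss_sb_full / 5)
F_agg  <- (ss_b_agg  / 1) / (ss_sb_agg  / 5)

# Ratios of full to aggregated SS, and the two F values side by side
print(c(b = ss_b_full / ss_b_agg, sb = ss_sb_full / ss_sb_agg))
print(c(F_full = F_full, F_agg = F_agg))
```

Both SS ratios come out as 2 because the design is balanced; with
unequal cell counts the relation would not be this simple.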

This is standard procedure, and R does the same as e.g. Genstat in
this respect. It is also necessary to ensure that the residual MS are
comparable across strata, e.g. so that you can test for a significant
s:b random effect by comparing the residual MS of the s:b stratum to
that of the s:a:b stratum.
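
To make the comparison between strata concrete, here is a rough sketch
on the same toy data. It assumes the usual normal-theory F ratio of the
s:b residual MS to the s:a:b residual MS, both computed on the scale of
the full 24 observations; the names ms_sb, ms_sab, F_sb and p_sb are
invented for the example.

```r
# Toy data from the quoted message
a <- factor(rep(c(0, 1), 12))
b <- factor(rep(c(0, 0, 1, 1), 6))
s <- factor(rep(1:6, each = 4))
x <- c(49.5, 62.8, 46.8, 57, 59.8, 58.5, 55.5, 56, 62.8, 55.8, 69.5,
       55, 62, 48.8, 45.5, 44.2, 52, 51.5, 49.8, 48.8, 57.2, 59, 53.2, 56)

grand <- mean(x)

# s:b interaction effects (residual of the s:b stratum once b is removed)
sb <- ave(x, s, b) - ave(x, s) - ave(x, b) + grand

# s:a:b interaction effects; with one observation per cell this is
# what remains after all main effects and two-way terms are removed
sab <- x - ave(x, s, a) - ave(x, s, b) - ave(x, a, b) +
  ave(x, s) + ave(x, a) + ave(x, b) - grand

df_sb  <- (nlevels(s) - 1) * (nlevels(b) - 1)
df_sab <- (nlevels(s) - 1) * (nlevels(a) - 1) * (nlevels(b) - 1)

ms_sb  <- sum(sb^2)  / df_sb
ms_sab <- sum(sab^2) / df_sab

# Ratio of the two residual MS and its upper-tail probability
F_sb <- ms_sb / ms_sab
p_sb <- pf(F_sb, df_sb, df_sab, lower.tail = FALSE)
print(c(MS_sb = ms_sb, MS_sab = ms_sab, F = F_sb, p = p_sb))
```

Had ms_sb been taken from the aggregated analysis instead, it would be
half as large and the ratio would no longer be on a common scale.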
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907