[R] Lack of independence in anova()
Phillip Good
pigood at verizon.net
Mon Jul 4 18:45:54 CEST 2005
To my surprise, the R functions I employed did NOT verify the independence
property. I've no quarrel with the theory--it's the R functions that worry
me.
pG
-----Original Message-----
From: Douglas Bates [mailto:dmbates at gmail.com]
Sent: Monday, July 04, 2005 9:13 AM
To: pigood at verizon.net; r-help
Subject: Re: [R] Lack of independence in anova()
I have already had email exchanges off-list with Phillip Good pointing
out that the independence property that he wishes to establish by
simulation is a consequence of orthogonality of the column span of the
row contrasts and the interaction contrasts. If you set the contrasts
option to a set of orthogonal contrasts such as contr.helmert or
contr.sum, which has no effect on the results of the anova, this is
easily established.
> build
function(size, v = rnorm(sum(size))) {
col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6]
+size[7]+size[8]))
return(data.frame(c=factor(col), r=factor(row),yield=v))
}
> size
[1] 3 3 3 0 3 3 3 0
> set.seed(1234321)
> vv <- build(size)
> vv
c r yield
1 0 0 1.2369081
2 0 0 1.5616230
3 0 0 1.8396185
4 1 0 0.3173245
5 1 0 1.0715115
6 1 0 -1.1459955
7 2 0 0.2488894
8 2 0 0.1158625
9 2 0 2.6200816
10 0 1 1.2624048
11 0 1 -0.9862578
12 0 1 -0.3235653
13 1 1 0.2039706
14 1 1 -1.4574576
15 1 1 1.9158713
16 2 1 -2.0333909
17 2 1 1.0050272
18 2 1 0.6789184
> options(contrasts = c('contr.helmert', 'contr.poly'))
> crossprod(model.matrix(~c*r, vv))
(Intercept) c1 c2 r1 c1:r1 c2:r1
(Intercept) 18 0 0 0 0 0
c1 0 12 0 0 0 0
c2 0 0 36 0 0 0
r1 0 0 0 18 0 0
c1:r1 0 0 0 0 12 0
c2:r1 0 0 0 0 0 36
On 7/4/05, Phillip Good <pigood at verizon.net> wrote:
> If the observations are normally distributed and the 2xk design is
> balanced, theory requires that the tests for interaction and row effects
be
> independent. In my program, appended below, this would translate to cntT
> (approx)= cntR*cntI/N if all R routines were functioning correctly. They
> aren't.
>
> sim2=function(size,N,p){
> cntR=0
> cntC=0
> cntI=0
> cntT=0
> cntP=0
> for(i in 1:N){
> #generate data
> v=gendata(size)
> #analyze after build(ing) design containing data
> lm.out=lm(yield~c*r,build(size,v))
> av.out=anova(lm.out)
> #if column effect is significant, increment cntC
> if (av.out[[5]][1]<=p)cntC=cntC+1
> #if row effect is significant, increment cntR
> if (av.out[[5]][2]<=p){
> cntR=cntR+1
> tmp = 1
> }
> else tmp =0
> if (av.out[[5]][3]<=p){
> #if interaction is significant, increment cntI
> cntI=cntI+1
> #if both interaction and row effect are significant, increment
cntT
> cntT=cntT + tmp
> }
> }
> list(cntC=cntC, cntR=cntR, cntI=cntI, cntT=cntT)
> }
>
> build=function(size,v){
> #size is a vector containing the sample sizes
> col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
> rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
> row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6]
> +size[7]+size[8]))
> return(data.frame(c=factor(col), r=factor(row),yield=v))
> }
>
> gendata=function(size){
> ssize=sum(size);
> return (rnorm(ssize))
> }
>
> #Example
> size=c(3,3,3,0,3,3,3,0)
> sim2(size,10000,10,.16)
>
>
>
> Phillip Good
> Huntington Beach CA
>
>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>
>
More information about the R-help
mailing list