[R] Lack of independence in anova()

Mon Jul 4 18:45:54 CEST 2005

To my surprise, the R functions I employed did NOT verify the independence
property.  I've no quarrel with the theory--it's the R functions that worry
me.

pG

-----Original Message-----
From: Douglas Bates [mailto:dmbates at gmail.com]
Sent: Monday, July 04, 2005 9:13 AM
To: pigood at verizon.net; r-help
Subject: Re: [R] Lack of independence in anova()

I have already had email exchanges off-list with Phillip Good pointing
out that the independence property that he wishes to establish by
simulation is a consequence of orthogonality of the column span of the
row contrasts and the interaction contrasts.  If you set the contrasts
option to a set of orthogonal contrasts such as contr.helmert or
contr.sum, which has no effect on the results of the anova, this is
easily established.

> build
function(size, v = rnorm(sum(size))) {
    col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
    rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
    row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6]
    +size[7]+size[8]))
    return(data.frame(c=factor(col), r=factor(row),yield=v))
}
> size
[1] 3 3 3 0 3 3 3 0
> set.seed(1234321)
> vv <- build(size)
> vv
   c r      yield
1  0 0  1.2369081
2  0 0  1.5616230
3  0 0  1.8396185
4  1 0  0.3173245
5  1 0  1.0715115
6  1 0 -1.1459955
7  2 0  0.2488894
8  2 0  0.1158625
9  2 0  2.6200816
10 0 1  1.2624048
11 0 1 -0.9862578
12 0 1 -0.3235653
13 1 1  0.2039706
14 1 1 -1.4574576
15 1 1  1.9158713
16 2 1 -2.0333909
17 2 1  1.0050272
18 2 1  0.6789184
> options(contrasts = c('contr.helmert', 'contr.poly'))
> crossprod(model.matrix(~c*r, vv))
            (Intercept) c1 c2 r1 c1:r1 c2:r1
(Intercept)          18  0  0  0     0     0
c1                    0 12  0  0     0     0
c2                    0  0 36  0     0     0
r1                    0  0  0 18     0     0
c1:r1                 0  0  0  0    12     0
c2:r1                 0  0  0  0     0    36

On 7/4/05, Phillip Good <pigood at verizon.net> wrote:
>  If the observations are normally distributed and the 2xk design is
> balanced,  theory requires that the tests for interaction and row effects
be
> independent.  In my program, appended below, this would translate to cntT
> (approx)= cntR*cntI/N if all R routines were functioning correctly.  They
> aren't.
>
> sim2=function(size,N,p){
>   cntR=0
>   cntC=0
>   cntI=0
>   cntT=0
>   cntP=0
>   for(i in 1:N){
>     #generate data
>      v=gendata(size)
>     #analyze after build(ing) design containing data
>      lm.out=lm(yield~c*r,build(size,v))
>      av.out=anova(lm.out)
>     #if column effect is significant, increment cntC
>      if (av.out[[5]][1]<=p)cntC=cntC+1
>    #if row effect is significant, increment cntR
>      if (av.out[[5]][2]<=p){
>            cntR=cntR+1
>            tmp = 1
>            }
>      else tmp =0
>      if (av.out[[5]][3]<=p){
>            #if interaction is significant, increment cntI
>             cntI=cntI+1
>         #if both interaction and row effect are significant, increment
cntT
>             cntT=cntT + tmp
>             }
>      }
>     list(cntC=cntC, cntR=cntR, cntI=cntI, cntT=cntT)
> }
>
> build=function(size,v){
> #size is a vector containing the sample sizes
> col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
> rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
> row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6]
> +size[7]+size[8]))
> return(data.frame(c=factor(col), r=factor(row),yield=v))
> }
>
> gendata=function(size){
>   ssize=sum(size);
>   return (rnorm(ssize))
> }
>
> #Example
>  size=c(3,3,3,0,3,3,3,0)
>  sim2(size,10000,10,.16)
>
>
>
> Phillip Good
> Huntington Beach CA
>
>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>
>