[R] Anova - interpretation of the interaction term

Bill.Venables@csiro.au Bill.Venables at csiro.au
Sat Apr 23 04:57:38 CEST 2005



: -----Original Message-----
: From: r-help-bounces at stat.math.ethz.ch 
: [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of 
: michael watson (IAH-C)
: Sent: Friday, 22 April 2005 7:47 PM
: To: r-help at stat.math.ethz.ch
: Subject: [R] Anova - interpretation of the interaction term
: 
: 
: Hi
: 
: So carrying on my use of analysis of variance to check for the effects
: of two factors.  It's made simpler by the fact that both my 
: factors have
: only two levels each, creating four unique groups.
: 
: I have a highly significant interaction term.  In the context of the
: experiment, this makes sense.  I can visualise the data 
: graphically, and
: sure enough I can see that both factors have different effects on the
: data DEPENDING on what the value of the other factor is.  
: 
: I explain this all to my colleague - and she asks "but which ones are
: different?"  This is best illustrated with an example.  We have either
: infected | uninfected, and vaccinated | unvaccinated (the two 
: factors).
: We're measuring expression of a gene.  Graphically, in the infected
: group, vaccination makes expression go up.  In the uninfected group,
: vaccination makes expression go down.  In both the vaccinated and
: unvaccinated groups, infection makes expression go down, but it goes
: down further in unvaccinated than it does in vaccinated.
: 
: So from a statistical point of view, I can see exactly why the
: interaction term is significant, but what my colleague wants to know is
: that WITHIN the vaccinated group, does infection decrease expression
: significantly?  And within the unvaccinated group, does infection
: decrease expression significantly?  Etc etc etc  Can I get this
: information from the output of the ANOVA, or do I carry out a separate
: test on e.g. just the vaccinated group? (seems a cop out to me)

No, you can't get this kind of specific information out of the anova
table and yes, anova tables *are* a bit of a cop out.  (I sometimes 
think they should only be allowed between consenting adults in private.)

What you are asking for is a non-standard, but perfectly reasonable
partition of the degrees of freedom between the classes of a single
factor with four levels, got by pairing up the levels of vaccination and
infection.  Of course you can get this information, but you have to
do a bit of work for it.  

Before I give the example, which I don't expect too many people to read
entirely, let me issue a little challenge: write tools to automate a
generalized version of the procedure below.

Here is the example, (drawing from the explanation given in a certain 
book, to wit chapter 6):

> dat <- expand.grid(vac = c("N", "Y"), inf = c("-", "+"))
> dat <- rbind(dat, dat)  # to get a bit of replication

Now we make a 4-level factor from vaccination and infection and
generate a bit of data with an infection effect built into it:

> dat <- transform(dat, vac_inf = vac:inf,
                        y = as.numeric(inf) + rnorm(8))
> dat
   vac inf vac_inf          y
1    N   -     N:-  0.2285096
2    Y   -     Y:-  1.3504610
3    N   +     N:+  2.5581254
4    Y   +     Y:+  2.9208313
11   N   -     N:- -0.8403039
21   Y   -     Y:- -0.2440574
31   N   +     N:+  2.4844055
41   Y   +     Y:+  2.0772671

Now give the joint factor contrasts reflecting the partition
we want to effect:

> levels(dat$vac_inf)
[1] "N:-" "N:+" "Y:-" "Y:+"
> m <- matrix(scan(), ncol = 4, byrow = TRUE)
1: -1  1  0  0
5:  0  0 -1  1
9:  1  1 -1 -1
13: 
Read 12 items
> library(MASS)  ## provides ginv() and fractions()
> fractions(ginv(m))  ## just to see what it looks like
     [,1] [,2] [,3]
[1,] -1/2    0  1/4
[2,]  1/2    0  1/4
[3,]    0 -1/2 -1/4
[4,]    0  1/2 -1/4
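
As a quick check on why the generalized inverse is the right coding
matrix, here is a minimal sketch that rebuilds m without the interactive
scan() and verifies three properties.  The object names C, id_check,
zero_check and orth_check and the row labels are assumptions introduced
for illustration only.

```r
library(MASS)  # for ginv()

# Same contrast matrix as above, built without interactive scan().
# Columns follow levels(dat$vac_inf): "N:-" "N:+" "Y:-" "Y:+"
m <- rbind("- vs +|N" = c(-1,  1,  0,  0),
           "- vs +|Y" = c( 0,  0, -1,  1),
           "N vs Y"   = c( 1,  1, -1, -1))

C <- ginv(m)  # 4 x 3 coding matrix, as handed to contrasts()

# m %*% C is the 3 x 3 identity, so each coefficient estimates one row of m
# applied to the four cell means
id_check <- all.equal(m %*% C, diag(3), check.attributes = FALSE)

# Columns of C sum to zero, so the intercept remains the unweighted mean
# of the four cell means
zero_check <- all.equal(colSums(C), rep(0, 3))

# Rows of m are mutually orthogonal (m %*% t(m) is diagonal)
orth_check <- all.equal(m %*% t(m), diag(diag(m %*% t(m))),
                        check.attributes = FALSE)
```

All three checks should come back TRUE for this particular m.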

Note that, because the rows of m are mutually orthogonal, we could
have simply used t(m) (the coefficients are then merely rescaled), but
this is not always possible.  Associate these contrasts, fit
and analyse:

> contrasts(dat$vac_inf) <- ginv(m)
> gm <- aov(y ~ vac_inf, dat)
> summary(gm)
            Df  Sum Sq Mean Sq F value  Pr(>F)
vac_inf      3 12.1294  4.0431   7.348 0.04190
Residuals    4  2.2009  0.5502

This doesn't tell us too much other than there are differences,
probably.  Now to specify the partition:

                
> summary(gm,
          split = list(vac_inf = list("- vs +|N" = 1,
                                      "- vs +|Y" = 2)))
                    Df  Sum Sq Mean Sq F value  Pr(>F)
vac_inf              3 12.1294  4.0431  7.3480 0.04190
  vac_inf: - vs +|N  1  7.9928  7.9928 14.5262 0.01892
  vac_inf: - vs +|Y  1  3.7863  3.7863  6.8813 0.05860
Residuals            4  2.2009  0.5502                
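
The split table can be cross-checked against the coefficient table of
the same model.  The following is a minimal sketch, not part of the
original session: the seed is arbitrary (so the numbers will differ from
those printed above), and the names fit, est, direct, f_from_t and
f_split are introduced here for illustration.

```r
library(MASS)  # for ginv()

set.seed(1)    # arbitrary seed, assumed here only for reproducibility
dat <- expand.grid(vac = c("N", "Y"), inf = c("-", "+"))
dat <- rbind(dat, dat)
dat <- transform(dat, vac_inf = vac:inf,
                      y = as.numeric(inf) + rnorm(8))

m <- rbind(c(-1, 1,  0,  0),   # + minus - within unvaccinated (N)
           c( 0, 0, -1,  1),   # + minus - within vaccinated (Y)
           c( 1, 1, -1, -1))   # N total minus Y total
contrasts(dat$vac_inf) <- ginv(m)

fit <- lm(y ~ vac_inf, data = dat)
cell_means <- tapply(dat$y, dat$vac_inf, mean)

# Coefficients 2..4 equal the rows of m applied to the four cell means
est    <- unname(coef(fit)[-1])
direct <- drop(m %*% as.numeric(cell_means))

# Each comparison has one degree of freedom, so in this balanced layout
# the squared t statistics match the F values from the split table
f_from_t <- summary(fit)$coefficients[-1, "t value"]^2

split_tab <- summary(aov(y ~ vac_inf, data = dat),
                     split = list(vac_inf = list("- vs +|N" = 1,
                                                 "- vs +|Y" = 2)))
f_split <- split_tab[[1]][["F value"]][2:3]
```

Printing summary(fit) gives the estimated difference, standard error
and t test for each comparison directly, which is often what a
colleague asking "which ones are different?" wants to see.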


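As expected, infection changes the mean for both vaccinated and
unvaccinated, as we arranged when we generated the data, although with
only two replicates per cell the evidence within the vaccinated group
is marginal (p = 0.059).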
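
One possible starting point for the challenge posed earlier is sketched
below.  It is only a sketch under stated assumptions: the helper name
within_contrasts is hypothetical, the joint factor is assumed to be
built as A:B (so the levels of B vary fastest), and each level is
compared with the first level of its factor.

```r
library(MASS)  # for ginv()

# Hypothetical helper, not from the original post.  Rows compare each
# level of B with the first level of B inside every level of A, followed
# by rows comparing each level of A with the first level of A, summed
# over the levels of B.
within_contrasts <- function(A, B) {
  a <- nlevels(A)
  b <- nlevels(B)
  dB <- cbind(-1, diag(b - 1))        # (b-1) x b: level j minus level 1
  dA <- cbind(-1, diag(a - 1))        # (a-1) x a: level i minus level 1
  m <- rbind(diag(a) %x% dB,          # B within each level of A
             dA %x% matrix(1, 1, b))  # A, summed over B
  rownames(m) <- c(
    paste0(rep(levels(B)[-1], times = a), " vs ", levels(B)[1], "|",
           rep(levels(A), each = b - 1)),
    paste0(levels(A)[-1], " vs ", levels(A)[1]))
  colnames(m) <- levels(A:B)
  m
}

# Usage with the two factors from the example
vac <- factor(c("N", "Y"), levels = c("N", "Y"))
inf <- factor(c("-", "+"), levels = c("-", "+"))
m2 <- within_contrasts(vac, inf)
C2 <- ginv(m2)  # would be assigned via contrasts(dat$vac_inf) <- C2

# A 3 x 2 layout, to show the helper is not tied to two levels per factor
m32 <- within_contrasts(factor(c("a", "b", "c")), factor(c("x", "y")))
```

For the 2 x 2 case this reproduces the matrix used above, apart from the
sign of the last row.  When a factor has more than two levels the rows
are no longer all mutually orthogonal, so sums of squares obtained
through split are sequential and can depend on the order of the rows;
the coefficient estimates themselves are not affected.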

: 
: Many thanks, and sorry, but it's Friday.
: 
: Mick



