[R] anova planned comparisons/contrasts

Thu Nov 22 20:38:12 CET 2007

Hi,

I'm trying to figure out how anova works in R by translating the
examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a
snag with planned comparisons, their box 9.4 and section 9.6. It's a
basic anova design:

treatment <- factor(rep(c("control", "glucose", "fructose", 
	                  "gluc+fruct", "sucrose"), each = 10))

length <- c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68,
            57, 58, 60, 59, 62, 60, 60, 57, 59, 61,
            58, 61, 56, 58, 57, 56, 61, 60, 57, 58,
            58, 59, 58, 61, 57, 56, 58, 57, 57, 59,
            62, 66, 65, 63, 64, 62, 65, 65, 62, 67)

sugars <- data.frame(treatment, length)

The basic analysis is easy enough:

anova(lm(length ~ treatment, sugars))

S&R proceed to calculate planned comparisons between control and all
other groups, between gluc+fruct and the remaining sugars, and among
the three pure sugars. I can get the first two comparisons using the
contrasts() function:

contrasts(sugars$treatment) <- matrix(c(-4, 1, 1,  1,  1,
                                        0, -1, 3, -1, -1), 5, 2) 

summary(lm(length ~ treatment, sugars))

The results appear to be the same, excepting that the book calculates
an F value, whereas R produces an equivalent t value. However, I can't
figure out how to calculate a contrast among three groups. For those
without access to Sokal and Rohlf, I've written an R function that
performs the analysis they use, copied below. Is there a better way to
do this in R?

Also, I don't know how to interpret the Estimate and Std. Error
columns from the summary of the lm with contrasts. It's clear the
intercept in this case is the grand mean, but what are the other
values? They appear to be the difference between one of the contrast
groups and the grand mean -- wouldn't an estimate of the differences
between the contrasted groups be more appropriate, or am I (likely)
misinterpreting this?

Thanks!

Tyler

MyContrast <- function(Var, Group, G1, G2, G3=NULL) {
  ## Var == data vector, Group == factor
  ## G1, G2, G3 == character vectors of factor levels to contrast

  nG1 = sum(Group %in% G1)
  nG2 = sum(Group %in% G2)
  GrandMean = mean(Var[Group %in% c(G1, G2, G3)]) 
  G1Mean = mean(Var[Group %in% G1])
  G2Mean = mean(Var[Group %in% G2])

  if(is.null(G3))
    MScontr = nG1 * ((G1Mean - GrandMean)^2) + 
    	      nG2 * ((G2Mean - GrandMean)^2)
  else {
      nG3 = sum(Group %in% G3)
      G3Mean = mean(Var[Group %in% G3])
      MScontr = (nG1 * ((G1Mean - GrandMean)^2) + 
              	 nG2 * ((G2Mean - GrandMean)^2) + 
	    	 nG3 * ((G3Mean - GrandMean)^2))/2
    }

  An <- anova(lm(Var ~ Group))
  MSwithin = An[3]['Residuals',]
  DegF = An$Df[length(An$Df)]
  Fval = MScontr / MSwithin
  Pval = 1 - pf(Fval, 1, DegF)

  return (list(MS_contrasts = MScontr, MS_within = MSwithin, 
  	       F_value = Fval, P_value = Pval))
}

## The first two contrasts produce the same (+/- rounding error)
## p-values as obtained using contrasts()
MyContrast(sugars$length, sugars$treatment, 'control', 
			  c("fructose", "gluc+fruct", "glucose",
			  "sucrose"))
MyContrast(sugars$length, sugars$treatment, 'gluc+fruct',
			  c("fructose", "glucose", "sucrose"))

## How do you do this in standard R?
MyContrast(sugars$length, sugars$treatment, "fructose", "glucose",
			  "sucrose")