[R] anova planned comparisons/contrasts

Tue Nov 27 00:26:18 CET 2007

Others have shown some approaches that work well for after you fit the
model.  Here is another approach starting with the model fit itself:

tmp <- c("control", "glucose", "fructose", 
	                  "gluc+fruct", "sucrose")

treatment <- factor(rep( tmp, each=10 ), levels=tmp)

length <- c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68,
            57, 58, 60, 59, 62, 60, 60, 57, 59, 61,
            58, 61, 56, 58, 57, 56, 61, 60, 57, 58,
            58, 59, 58, 61, 57, 56, 58, 57, 57, 59,
            62, 66, 65, 63, 64, 62, 65, 65, 62, 67)

cm1 <- rbind( int=1/5, cntrl = c(4, -1,-1,-1,-1)/4,
	plus=c( 0, -1, -1, 3, -1 )/3,
	sug1=c( 0, -1, 1, 0, 0),
	sug2=c( 0, -1, -1, 0, 2)/2 )

cm2 <- zapsmall(solve(cm1))

contrasts(treatment) <- cm2[,-1]

treatment

sugars <- data.frame( length=length, treatment=treatment )

fit <- aov( length ~ treatment, data=sugars )
summary.lm(fit)
summary(fit)
summary(fit, split=list(treatment=list(
  'Control vs. Sugars'=1, 'Gluc+Fruc vs. sugars'=2, 'Sugars'=3:4)))

Is this more what you were looking for?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Tyler Smith
> Sent: Thursday, November 22, 2007 12:38 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] anova planned comparisons/contrasts
> 
> Hi,
> 
> I'm trying to figure out how anova works in R by translating 
> the examples in Sokal And Rohlf's (1995 3rd edition) 
> Biometry. I've hit a snag with planned comparisons, their box 
> 9.4 and section 9.6. It's a basic anova design:
> 
> treatment <- factor(rep(c("control", "glucose", "fructose", 
> 	                  "gluc+fruct", "sucrose"), each = 10))
> 
> length <- c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68,
>             57, 58, 60, 59, 62, 60, 60, 57, 59, 61,
>             58, 61, 56, 58, 57, 56, 61, 60, 57, 58,
>             58, 59, 58, 61, 57, 56, 58, 57, 57, 59,
>             62, 66, 65, 63, 64, 62, 65, 65, 62, 67)
> 
> sugars <- data.frame(treatment, length)
> 
> The basic analysis is easy enough:
> 
> anova(lm(length ~ treatment, sugars))
> 
> S&R proceed to calculate planned comparisons between control 
> and all other groups, between gluc+fruct and the remaining 
> sugars, and among the three pure sugars. I can get the first 
> two comparisons using the
> contrasts() function:
> 
> contrasts(sugars$treatment) <- matrix(c(-4, 1, 1,  1,  1,
>                                         0, -1, 3, -1, -1), 5, 2) 
> 
> summary(lm(length ~ treatment, sugars))
> 
> The results appear to be the same, excepting that the book 
> calculates an F value, whereas R produces an equivalent t 
> value. However, I can't figure out how to calculate a 
> contrast among three groups. For those without access to 
> Sokal and Rohlf, I've written an R function that performs the 
> analysis they use, copied below. Is there a better way to do 
> this in R?
> 
> Also, I don't know how to interpret the Estimate and Std. 
> Error columns from the summary of the lm with contrasts. It's 
> clear the intercept in this case is the grand mean, but what 
> are the other values? They appear to be the difference 
> between one of the contrast groups and the grand mean -- 
> wouldn't an estimate of the differences between the 
> contrasted groups be more appropriate, or am I (likely) 
> misinterpreting this?
> 
> Thanks!
> 
> Tyler
> 
> MyContrast <- function(Var, Group, G1, G2, G3=NULL) {
>   ## Var == data vector, Group == factor
>   ## G1, G2, G3 == character vectors of factor levels to contrast
>   
>   nG1 = sum(Group %in% G1)
>   nG2 = sum(Group %in% G2)
>   GrandMean = mean(Var[Group %in% c(G1, G2, G3)])
>   G1Mean = mean(Var[Group %in% G1])
>   G2Mean = mean(Var[Group %in% G2])
> 
>   if(is.null(G3))
>     MScontr = nG1 * ((G1Mean - GrandMean)^2) + 
>     	      nG2 * ((G2Mean - GrandMean)^2)
>   else {
>       nG3 = sum(Group %in% G3)
>       G3Mean = mean(Var[Group %in% G3])
>       MScontr = (nG1 * ((G1Mean - GrandMean)^2) + 
>               	 nG2 * ((G2Mean - GrandMean)^2) + 
> 	    	 nG3 * ((G3Mean - GrandMean)^2))/2
>     }
>   
>   An <- anova(lm(Var ~ Group))
>   MSwithin = An[3]['Residuals',]
>   DegF = An$Df[length(An$Df)]
>   Fval = MScontr / MSwithin
>   Pval = 1 - pf(Fval, 1, DegF)
>   
>   return (list(MS_contrasts = MScontr, MS_within = MSwithin, 
>   	       F_value = Fval, P_value = Pval)) }
> 
> ## The first two contrasts produce the same (+/- rounding 
> error) ## p-values as obtained using contrasts() 
> MyContrast(sugars$length, sugars$treatment, 'control', 
> 			  c("fructose", "gluc+fruct", "glucose",
> 			  "sucrose"))
> MyContrast(sugars$length, sugars$treatment, 'gluc+fruct',
> 			  c("fructose", "glucose", "sucrose"))
> 
> ## How do you do this in standard R?
> MyContrast(sugars$length, sugars$treatment, "fructose", "glucose",
> 			  "sucrose")
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>