[R] Understanding linear contrasts in Anova using R
David Howell
David.Howell at uvm.edu
Wed Sep 29 21:42:03 CEST 2010
#I am trying to understand how R fits models for contrasts in a
#simple one-way anova. This is an example, I am not stupid enough to want
#to simultaneously apply all of these contrasts to real data. With a few
#exceptions, the tests that I would compute by hand (or by other software)
#will give the same t or F statistics. It is the contrast estimates that
R produces
#that I can't seem to understand.
#
# In searching for answers to this problem, I found a great PowerPoint
slide (I think by John Fox).
# The slide pointed to the coefficients, said something like "these are
coeff. that no one could love," and
#then suggested looking at the means to understand where they came from.
I have stared
# and stared at his means and then my means, but can't find a relationship.
# The following code and output illustrates the problem.
# Various examples of Anova using R
dv <- c(1.28, 1.35, 3.31, 3.06, 2.59, 3.25, 2.98, 1.53, -2.68,
2.64, 1.26, 1.06,
-1.18, 0.15, 1.36, 2.61, 0.66, 1.32, 0.73, -1.06, 0.24,
0.27, 0.72, 2.28,
-0.41, -1.25, -1.33, -0.47, -0.60, -1.72, -1.74, -0.77, -0.41,
-1.20, -0.31, -0.74,
-0.45, 0.54, -0.98, 1.68, 2.25, -0.19, -0.90, 0.78, 0.05,
2.69, 0.15, 0.91,
2.01, 0.40, 2.34, -1.80, 5.00, 2.27, 6.47, 2.94, 0.47,
3.22, 0.01, -0.66)
group <- factor(rep(1:5, each = 12))
# Use treatment contrasts to compare each group to the first group.
options(contrasts = c("contr.treatment","contr.poly")) # The default
model2 <- lm(dv ~ group)
summary(model2)
# Summary table is the same--as it should be
# Intercept is Group 1 mean and other coeff. are deviations from that.
# This is what I would expect.
#summary(model1)
# Df Sum Sq Mean Sq F value Pr(>F)
# group 4 62.46 15.6151 6.9005 0.0001415 ***
# Residuals 55 124.46 2.2629
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 1.80250 0.43425 4.151 0.000116 ***
#group2 -1.12750 0.61412 -1.836 0.071772 .
#group3 -2.71500 0.61412 -4.421 4.67e-05 ***
#group4 -1.25833 0.61412 -2.049 0.045245 *
#group5 0.08667 0.61412 0.141 0.888288
# Use sum contrasts to compare each group against grand mean.
options(contrasts = c("contr.sum","contr.poly"))
model3 <- lm(dv ~ group)
summary(model3)
# Again, this is as expected. Intercept is grand mean and others are
deviatoions from that.
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.7997 0.1942 4.118 0.000130 ***
# group1 1.0028 0.3884 2.582 0.012519 *
# group2 -0.1247 0.3884 -0.321 0.749449
# group3 -1.7122 0.3884 -4.408 4.88e-05 ***
# group4 -0.2555 0.3884 -0.658 0.513399
#SO FAR, SO GOOD
# IF I wanted polynomial contrasts BY HAND I would use
# a(i) = -2 -1 0 1 2 for linear contrast (or some
linear function of this )
# Effect = Sum(a(j)M(i)) # where M = mean
# Effect(linear) = -2(1.805) -1(0.675) +0(-.912) +1(.544) +2(1.889) =
0.043
# SS(linear) = n*(Effect(linear)^2)/Sum((a(j)^2)) = 12(.043)/10 = .002
# F(linear) = SS(linear)/MS(error) = .002/2.263 = .001
# t(linear) = sqrt(.001) = .031
# To do this in R I would use
order.group <- ordered(group)
model4 <- lm(dv~order.group)
summary(model4)
# This gives:
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.79967 0.19420 4.118 0.000130 ***
# order.group.L 0.01344 0.43425 0.031 0.975422
# order.group.Q 2.13519 0.43425 4.917 8.32e-06 ***
# order.group.C 0.11015 0.43425 0.254 0.800703
# order.group^4 -0.79602 0.43425 -1.833 0.072202 .
# The t value for linear is same as I got (as are others) but I don't
understand
# the estimates. The intercept is the grand mean, but I don't see the
relationship
# of other estimates to that or to the ones I get by hand.
# My estimates are the sum of (coeff times means) i.e. 0 (intercept),
.0425, 7.989, .3483, -6.66
# and these are not a linear (or other nice pretty) function of est. from R.
# Just as a check, I forced intercept through zero with with deviation
scores or -1 in model.
# Now force intercept to 0 by using deviation scores
devdv <- dv-mean(dv)
model5 <- lm(devdv~order.group)
summary(model5)
#Same as above except intercept = 0
# Now do it by removing the intercept
model6 <- lm(dv~order.group -1)
summary(model6)
# Weird because coeff = cell means
# Estimate Std. Error t value Pr(>|t|)
# order.group1 1.8025 0.4342 4.151 0.000116 ***
# order.group2 0.6750 0.4342 1.554 0.125824
# order.group3 -0.9125 0.4342 -2.101 0.040208 *
# order.group4 0.5442 0.4342 1.253 0.215464
# order.group5 1.8892 0.4342 4.350 5.94e-05 ***
# BUT note that row labels would suggest that we were not solving for
polynomials,
# but the contrast matrix is made up of polynomial coefficients.
# contrasts(order.group)
# .L .Q .C ^4
#[1,] -6.324555e-01 0.5345225 -3.162278e-01 0.1195229
#[2,] -3.162278e-01 -0.2672612 6.324555e-01 -0.4780914
#[3,] -3.287978e-17 -0.5345225 1.595204e-16 0.7171372
#[4,] 3.162278e-01 -0.2672612 -6.324555e-01 -0.4780914
#[5,] 6.324555e-01 0.5345225 3.162278e-01 0.1195229
#So, when I use polynomials (either through contr.poly or by supplying a
matrix of
#coefficients, what do the estimates represent?
# My problem is not restricted to polynomials. If I try a set of
orthogonal linear contrasts
# on group means I get
#contrasts(group) <- cbind(c(1,1,0,-1,-1), c(1,-1,0,0,0),
c(-0,0,0,1,-1), c(1,1,4,-1,1))
#model3 <- lm(dv ~ group)
#summary(model3)
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 1.5335 0.2558 5.995 1.64e-07 ***
#group1 0.3168 0.2279 1.390 0.170185
#group2 0.5638 0.3071 1.836 0.071772 .
#group3 -1.2840 0.3369 -3.811 0.000352 ***
#group4 -0.6115 0.1387 -4.408 4.88e-05 ***
#These are not even close to what I would expect. By hand I would
compute the contrasts as
# .0442, 1.1275, 1.3450, and 8.5608 with different t values.
# Any help would be appreciated.
Thanks,
Dave Howell
More information about the R-help
mailing list