[R] Checking for orthogonal contrasts

Peter Dalgaard pdalgd at gmail.com
Sun Dec 5 19:42:32 CET 2010

On Dec 3, 2010, at 17:17 , S Ellison wrote:

> David,
> Thanks for the comments.
> I think, though, that I have found the answer to my own post.
>> Question: How would one check, in R, that [contrasts .. are 
>> 'orthogonal in the row-basis of the model matrix'] for a particular
>> fitted linear model object?
> ?lm illustrates the use of crossprod() for probing the orthogonality of
> a model matrix. If I understand correctly, the necessary condition is
> essentially that all between-term off-diagonal elements of crossprod(m)
> are zero if the contrasts are orthogonal, where 'term' refers to the
> collection of columns related to a single term in the model formula.
> Example:
> y<-rnorm(27)
> g <- gl(3, 9)
> h <- gl(3,3,27)
> m1 <- model.matrix(y~g*h, contrasts = list(g="contr.sum", h="contr.sum"))
> crossprod(m1)
> #Compare with
> m2 <- model.matrix(y~g*h, contrasts = list(g="contr.treatment", h="contr.treatment"))
> crossprod(m2)
> 	#Note the nonzero off-diagonal elements between, say, g and h,
> 	#or between g, h and the various gi:hj interaction columns
> That presumably implies that one could test a linear model explicitly
> for contrast orthogonality (and, implicitly, balanced design?) using
> something like
> model.orthogonal.lm <- function(l) {
> 	#l is a fitted linear model
> 	m <- model.matrix(l)
> 	#"assign" maps each column of m to its model term (0 = intercept)
> 	a <- attr(m, "assign")
> 	#TRUE wherever two columns belong to different terms
> 	a.outer <- outer(a, a, FUN="!=")
> 	m.xprod <- crossprod(m)
> 	#use a tolerance rather than == 0, since contrasts such as
> 	#contr.poly produce floating-point columns
> 	all( abs(m.xprod[a.outer]) < 1e-8 )
> }
> l1 <- lm(y~g*h,  contrasts = list(g="contr.sum", h="contr.sum"))
> l2 <- lm(y~g*h,  contrasts = list(g="contr.treatment", h="contr.treatment"))
> model.orthogonal.lm(l1) 
> 	#TRUE
> model.orthogonal.lm(l2)
> 	#FALSE
> Not sure how it would work on balanced incomplete block designs,
> though. I'll have to try it.

You'll find that the block and treatment terms are NOT orthogonal. That's where all the stuff about "efficiency factors" and "recovery of interblock information" comes from.
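For concreteness, here is a minimal sketch of that non-orthogonality, applying the model.orthogonal.lm() check quoted above to a tiny balanced incomplete block design (3 treatments in 3 blocks of size 2; the factor names block and treat are illustrative):

```r
# Tolerance-based version of the orthogonality check from the post above
model.orthogonal.lm <- function(l) {
  m <- model.matrix(l)
  a <- attr(m, "assign")              # term index for each column (0 = intercept)
  a.outer <- outer(a, a, FUN = "!=")  # TRUE between columns of different terms
  all(abs(crossprod(m)[a.outer]) < 1e-8)
}

# A BIBD with v = 3 treatments, b = 3 blocks, block size k = 2:
# blocks {1,2}, {1,3}, {2,3}
block <- factor(rep(1:3, each = 2))
treat <- factor(c(1, 2, 1, 3, 2, 3))
y <- rnorm(6)

fit <- lm(y ~ block + treat,
          contrasts = list(block = "contr.sum", treat = "contr.sum"))
model.orthogonal.lm(fit)  # FALSE: block and treatment terms are not orthogonal
```

Here the cross-product between the block and treatment contrast columns is nonzero because each treatment appears in only some of the blocks, which is exactly where efficiency factors enter.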

> Before I do, though, a) do I have the stats right? and b) this now
> seems so obvious that someone must already have done it somewhere... ?

a) basically, yes, I think you do

b) yes, many have, but there is an amazing amount of sloppily thought-out "folklore" going around, including the common misconception that sum-to-zero contrasts are somehow inherently better than the other types. What does seem to be true is that they have computational advantages in completely balanced designs, because there they imply orthogonality of COLUMNS of the design matrix. That in turn means that the sum of squares for each model term can be constructed from its own columns alone. In unbalanced designs, they just tend to give incorrect results...
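The balance point can be seen directly: with sum contrasts the between-term off-diagonals of crossprod() vanish only while the design is completely balanced, and dropping a single observation destroys that. A minimal sketch, reusing the crossprod() idea from earlier in the thread:

```r
y <- rnorm(27)
g <- gl(3, 9)
h <- gl(3, 3, 27)

# Balanced 3 x 3 design with 3 replicates per cell, sum-to-zero contrasts
m.bal <- model.matrix(y ~ g * h,
                      contrasts = list(g = "contr.sum", h = "contr.sum"))
a <- attr(m.bal, "assign")
off <- outer(a, a, FUN = "!=")   # entries linking columns of different terms

all(crossprod(m.bal)[off] == 0)  # TRUE: balanced, so columns are orthogonal

m.unbal <- m.bal[-1, ]           # drop one observation: design now unbalanced
all(crossprod(m.unbal)[off] == 0)  # FALSE: column orthogonality is lost
```

This is why term sums of squares can be computed column-by-column only in the fully balanced case.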

Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
