[R] hypothesis testing for rank-deficient linear models

Wed Jan 11 02:53:30 CET 2006

Take the following example:

    a <- rnorm(100)
    b <- trunc(3*runif(100))
    g <- factor(trunc(4*runif(100)),labels=c('A','B','C','D'))
    y <- rnorm(100) + a + (b+1) * (unclass(g)+2)
    m <- lm(y~a+b*g)
    summary(m)

Here b is discrete but not treated as a factor.  I am interested in
computing the effect of b within groups defined by the factor g.  One
way to do that is with the estimable() function from gmodels:

    ct <- cbind(1,contr.treatment(4))
    dimnames(ct)[[2]] <- c('b','b:gB','b:gC','b:gD')
    estimable(m, ct, conf.int=0.95)

Another way is the contrast() function from the Design library:

    dd <- datadist(a,b,g)
    options(datadist='dd')
    o <- ols(y~a+b*g)
    contrast(o, list(b=1,g=levels(g)), list(b=0,g=levels(g)))

Now take the situation where there are empty cells in the b x g table,
so that the model is rank deficient, i.e.:

    m <- lm(y~a+b*g, subset=(b==0 | g!='B'))
    summary(m)

Now there's trouble.  The estimable() function and Design library do
not seem to handle rank deficient models well.  Here is my current
best attempt to get what I want:

    my.coef <- function(n)
    {
        ct <- contr.treatment(levels(g), base=n)
        u <- update(m, contrasts=list(g=ct))
        uc <- coef(summary(u))['b',]
        if (any(is.na(coef(u))) && any(!is.na(alias(u)$Complete)))
            uc[1:4] <- NA
        uc
    }
    t(sapply(1:4, my.coef))

This seems adequate for my immediate purposes and I can clean it up to
be more general.  But I'm wondering if there is an easier/better way
using existing R facilities?  Here, I'm doing more model fitting than
necessary, but (for now) speed is not a factor and I was having
trouble writing a concise solution that worked on just the original
model object.

-- David Hinds