R-beta: a suggestion for constraints

Fri Sep 4 22:00:59 CEST 1998

On Fri, 4 Sep 1998, Thomas Lumley wrote:

> On Fri, 4 Sep 1998, Jim Lindsey wrote:
> 
> > One small thing that would be very nice for baseline-constraint (R
> > calls them treatment for some reason) factor variables would be an
> > option to specify which category is baseline. This would be extremely
> > useful in experimental contexts, especially medicine. For the moment,
> 
> I actually agree here. There seem to be two possibilities. One is to make
> it easy to use as.factor() to create factors with code 1 being whatever
> level you want. The other is to make it easy to use contrasts by providing
> a contr.treatment(baseline) function that creates treatment contrasts with
> the specified baseline level, so you could do
> glm(outcome~trt,contrasts=list(trt=contr.treatment("PL"))
> 
> The first seems a more natural solution to this specific problem, as we
> have a natural baseline category and the problem is that as.factor doesn't
> know that. In fact the function to change is factor(), which does all the
> work of factor creation.  I have a draft version of factor(,baseline=)
> but I would like to test it some more (and see if other people think it is
> a good idea).  I would personally prefer to call the argument 'reference'
> rather than 'baseline' but I don't really care all that much.

I posted the following functions to S-news on May 27. Might they do what
you want?

relevel <- function(x, ref, ...) UseMethod("relevel")
relevel.default <- function(x, ref, ...) stop("relevel only for factors")
relevel.ordered <- function(x, ref, ...) stop("relevel only for factors")
relevel.factor <- function(x, ref, ...)
{
  lev <- levels(x); nlev <- length(lev)
  if(is.character(ref)) ref <- match(ref, lev)
  if(is.na(ref)) stop("ref must be an existing level")
  if(ref < 1 || ref > nlev) stop(paste("ref =", ref, "must be in 1 :",
nlev))
  factor(x, levels = lev[c(ref, seq(along=lev)[-ref])] )
}
(and I seemed to prefer `ref' for reference to baseline then).  I would
oppose adding further arguments to factor: they are not needed as
using `levels' or `labels' should suffice.  But one does sometimes need to
change the ordering of levels, not least in data frames read by read.table.

This gives:

> xx <- factor(c("L", "M", "H"))
> xx
[1] L M H
> levels(xx)
[1] "H" "L" "M"
> yy <- relevel(xx, "L")
> yy
[1] L M H
> levels(yy)
[1] "L" "H" "M"

I see Thomas' second solution as the better one, or, even better, to set a
contrast on the factor by the contrasts assignment function. As
contr.treatment already has two arguments, I suggest something like

contrasts(myfactor) <- contr.reference(myfactor, baseline="one")

where

contr.reference <- function(fac, reference)
{
  lev <- levels(fac)
  if(!missing(reference)) fac <- relevel(fac, ref=reference)
  ans <- contr.treatment(levels(fac))
  ans[lev, ]	
}

(It needs to be done this way as the internal code in model.matrix
makes lots of assumptions.)

For example:

> data(warpbreaks)
> glm(breaks ~ wool + tension, poisson, data=warpbreaks)

Coefficients:
(Intercept)         wool     tensionM     tensionH  
     3.6920      -0.2060      -0.3213      -0.5185  

> contrasts(warpbreaks$tension) <- contr.reference(warpbreaks$tension, "M")
> test1 <- glm(breaks ~ wool + tension, poisson, data=warpbreaks)

Coefficients:
(Intercept)         wool     tensionL     tensionH  
     3.3706      -0.2060       0.3213      -0.1972  

and this reference level will be used in all subsequent fits with that
(local copy of) the data frame.

Alternatively,

> data(warpbreaks)
> warpbreaks$tension <- relevel(warpbreaks$tension, "M")
> test2 <- glm(breaks ~ wool + tension, poisson, data=warpbreaks)

As an answer to how to see which coding(s) were used, look at the
contrasts and xlevels components. For example:

codings <- function(object)
{
  for(nm in names(object$contrasts))
  {
     cat("factor", nm, "has levels ")
     xl <- object$xlevels[[nm]]
     if(length(xl) > 5) cat(xl[1:5], "...") else cat(xl)
     cn <- object$contrasts[[nm]]
     if(is.character(cn)) cat(" and used ", cn)
     if(is.numeric(cn)) {
       cat(" with contrast matrix\n")
       print(cn)
     }
     cat("\n")
  }
}

Then

> codings(test2)
factor wool has levels A B and used  contr.treatment
factor tension has levels M L H and used  contr.treatment

> codings(test1)
factor wool has levels A B and used  contr.treatment
factor tension has levels L M H with contrast matrix
  L H
L 1 0
M 0 0
H 0 1

This should be tidied up, but I do see this as separate from print or
summary. After all, this is the late 1990s and at least at Oxford we are no
longer submitting decks of cards and poring over line printer output. We
can ask additional questions (and that is the spirit of S, I am sure).

For those who have not seen it:

> dummy.coef(test1)
$(Intercept)
(Intercept) 
   3.370648 

$wool
         A          B 
 0.0000000 -0.2059918 

$tension
         L          M          H 
 0.3213203  0.0000000 -0.1971704 

That is slated for 0.63, again probably with a more elegant printout. (This
one follows S, although not in the method of computation.)  It gives the
coefficients for all the levels whatever the coding. 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._