[R] how to get "lsmeans"?

John Fox jfox at mcmaster.ca
Thu Mar 22 01:59:10 CET 2007


Dear Brian et al.,

My apologies for chiming in late: It's been a busy day.

First some general comments on "least-squares means" and "effect displays."
The general idea behind the two is similar -- to examine fitted values
corresponding to a term in a model while holding other terms to typical
values -- but the implementation is not identical. There are other similar
ideas floating around as well. My formulation is more general in the
sense that it applies to a wider variety of models, both linear and
otherwise.

"Least-squares means" (a horrible term, by the way: in a 1980 paper in the
American Statistician, Searle, Speed, and Milliken suggested the more
descriptive term "population marginal means") apply to factors and
combinations of factors; covariates are set to mean values and the levels of
other factors are averaged over, in effect applying equal weight to each
level. (This is from memory, so it's possible that I'm not getting it quite
right, but I believe that I am.) In my effect displays, each level of a
factor is weighted by its proportion in the data. In models in which
least-squares means can be computed, they should differ from the
corresponding effect display by a constant when there are different numbers
of observations in the levels of the factors that are averaged over (and
coincide with it when those counts are equal).
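
To make the distinction concrete, here is a small illustrative sketch (mine,
not code from either implementation), with made-up data: A is the displayed
factor, B is averaged over, and x is a covariate set to its mean.

set.seed(1)
toy <- data.frame(y = rnorm(60),
                  A = factor(sample(1:2, 60, replace = TRUE)),
                  B = factor(sample(1:3, 60, replace = TRUE,
                                    prob = c(.5, .3, .2))),
                  x = rnorm(60))
mod <- lm(y ~ A + B + x, data = toy)
grid <- expand.grid(A = levels(toy$A), B = levels(toy$B))
grid$x <- mean(toy$x)
grid$yhat <- predict(mod, grid)
## "least-squares means" style: each level of B gets equal weight
with(grid, tapply(yhat, A, mean))
## effect-display style: levels of B weighted by their sample proportions
wB <- prop.table(table(toy$B))
sapply(split(grid, grid$A), function(g) sum(g$yhat * wB[as.character(g$B)]))

Because this toy model is additive, the two sets of values differ only by a
constant shift, as described above.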

The obstacle to computing either least-squares means or effect displays in R
via predict() is that predict() wants factors in the "new data" to be set to
particular levels. The effect() function in the effects package bypasses
predict() and works directly with the model matrix, averaging over the
columns that pertain to a factor (and reconstructing interactions as
necessary). As mentioned, this has the effect of setting the factor to its
proportional distribution in the data. This approach also has the advantage
of being invariant with respect to the choice of contrasts for a factor.
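
By way of illustration only (my sketch of the idea, not the effects package
source), the column averaging can be done by hand for the toy model above:
build a model-matrix row for each level of A, replace the columns for B by
their observed column means, and set the covariate to its mean.

X  <- model.matrix(mod)                  # mod and toy from the sketch above
nd <- data.frame(A = factor(levels(toy$A), levels = levels(toy$A)),
                 B = factor(levels(toy$B)[1], levels = levels(toy$B)),
                 x = mean(toy$x))
X0 <- model.matrix(~ A + B + x, nd)      # same formula and contrasts as mod
bcols <- grep("^B", colnames(X0))        # the columns pertaining to B
X0[, bcols] <- matrix(colMeans(X[, bcols, drop = FALSE]),
                      nrow(X0), length(bcols), byrow = TRUE)
drop(X0 %*% coef(mod))     # reproduces the proportionally weighted values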

The only convenient way that I can think of to implement least-squares means
in R would be to use deviation-coded regressors for a factor (that is,
contr.sum) and then to set the columns of the model matrix for the factor(s)
to be averaged over to 0. It may just be that I'm having a failure of
imagination and that there's a better way to proceed. I've not implemented
this solution because it is dependent upon the choice of contrasts and
because I don't see a general advantage to it, but since the issue has come
up several times now, maybe I should take a crack at it. Remember that I
want this to work more generally, not just for levels of factors, and not
just for linear models.
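
For what it is worth, a rough sketch of that route on the toy model above
(my illustration, not an implemented feature of the effects package): refit
with contr.sum and zero out the model-matrix columns for the factor to be
averaged over.

mod2 <- lm(y ~ A + B + x, data = toy,
           contrasts = list(A = "contr.sum", B = "contr.sum"))
nd <- data.frame(A = factor(levels(toy$A), levels = levels(toy$A)),
                 B = factor(levels(toy$B)[1], levels = levels(toy$B)),
                 x = mean(toy$x))
X1 <- model.matrix(~ A + B + x, nd,
                   contrasts.arg = list(A = "contr.sum", B = "contr.sum"))
X1[, grep("^B", colnames(X1))] <- 0  # with sum-to-zero coding, zero is the
                                     # unweighted average over B's levels
drop(X1 %*% coef(mod2))              # "least-squares means" style values for A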

Brian is quite right in mentioning that he suggested some time ago that I
use critical values of t rather than of the standard normal distribution for
producing confidence intervals, and I agree that it makes sense to do so in
models in which the dispersion is estimated. My only excuse for not yet
doing this is that I want to undertake a more general revision of the
effects package, and haven't had time to do it. There are several changes
that I'd like to make to the package. For example, I have results for
multinomial and proportional odds logit models (described in a paper by me
and Bob Andersen in the 2006 issue of Sociological Methodology) that I want
to incorporate, and I'd like to improve the appearance of the default
graphs. But Brian's suggestion is very straightforward, and I guess that I
shouldn't wait to implement it; I'll do so very soon.
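
For the record, the difference Brian describes below comes down only to the
critical value used; a two-line illustration with the toy model above:

qnorm(0.975)                 # normal-based, roughly 1.96
qt(0.975, df.residual(mod))  # t-based, larger (wider intervals) in small samples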

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Prof 
> Brian Ripley
> Sent: Wednesday, March 21, 2007 12:03 PM
> To: Chuck Cleland
> Cc: r-help
> Subject: Re: [R] how to get "lsmeans"?
> 
> On Wed, 21 Mar 2007, Chuck Cleland wrote:
> 
> > Liaw, Andy wrote:
> >> I verified the result from the following with output from JMP 6 on 
> >> the same data (don't have SAS: don't need it):
> >>
> >> set.seed(631)
> >> n <- 100
> >> dat <- data.frame(y=rnorm(n), A=factor(sample(1:2, n, replace=TRUE)),
> >>                   B=factor(sample(1:2, n, replace=TRUE)),
> >>                   C=factor(sample(1:2, n, replace=TRUE)),
> >>                   d=rnorm(n))
> >> fm <- lm(y ~ A + B + C + d, dat)
> >> ## Form a data frame of points to predict: all combinations of the
> >> ## three factors and the mean of the covariate.
> >> p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
> >> p[] <- lapply(p, factor)
> >> p <- cbind(p, d=mean(dat$d))
> >> p <- cbind(yhat=predict(fm, p), p)
> >> ## lsmeans for the three factors:
> >> with(p, tapply(yhat, A, mean))
> >> with(p, tapply(yhat, B, mean))
> >> with(p, tapply(yhat, C, mean))
> >
> >  Using Andy's example data, these are the LSMEANS and intervals I get
> > from SAS:
> >
> > A        y LSMEAN      95% Confidence Limits
> > 1       -0.071847       -0.387507     0.243813
> > 2       -0.029621       -0.342358     0.283117
> >
> > B        y LSMEAN      95% Confidence Limits
> > 1       -0.104859       -0.397935     0.188216
> > 2        0.003391       -0.333476     0.340258
> >
> > C        y LSMEAN      95% Confidence Limits
> > 1       -0.084679       -0.392343     0.222986
> > 2       -0.016789       -0.336374     0.302795
> >
> >  One way of reproducing the LSMEANS and intervals from SAS using
> > predict() seems to be the following:
> >
> >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
> >> newdat <- expand.grid(A=factor(c(1,2)),B=1.5,C=1.5,d=mean(dat$d))
> >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
> >  A   B   C          d         fit        lwr       upr
> > 1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
> > 2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165
> >
> >  However, another possibility seems to be:
> >
> >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
> >> newdat <- expand.grid(A=factor(c(1,2)), B=mean(as.numeric(dat$B)),
> >>                       C=mean(as.numeric(dat$C)), d=mean(dat$d))
> >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
> >  A    B    C          d         fit        lwr       upr
> > 1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012
> > 2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465
> >
> >  The predictions directly above match what effect() in the effects 
> > package by John Fox returns:
> >
> > library(effects)
> >
> >> effect("A", fm, xlevels=list(d = mean(dat$d)))
> >
> > A effect
> > A
> >          1           2
> > -0.08078243 -0.03855619
> >
> >  But for some reason the predict() and effect() intervals are a little
> > different:
> >
> >> effect("A", fm, xlevels=list(d = mean(dat$d)))$lower
> >          [,1]
> > 101 -0.3924451
> > 102 -0.3440179
> >
> >> effect("A", fm, xlevels=list(d = mean(dat$d)))$upper
> >         [,1]
> > 101 0.2308802
> > 102 0.2669055
> >
> >  I would be interested in any comments on these different approaches
> > and on the difference in intervals returned by predict() and effect().
> 
> AFAIR, the effects package uses normal-based confidence
> intervals and predict.lm uses t-based ones, and I have 
> suggested to John Fox that t-based intervals would be 
> preferable, at least as an option.
> 
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


