[R] how to get "lsmeans"?
John Fox
jfox at mcmaster.ca
Thu Mar 22 15:24:09 CET 2007
Dear Bob,
I think that you make a valid point -- I have, for example, included
"Type-III" tests in the Anova() function in the car package, though not
entirely for consistency with SAS -- but my object in writing the effects
package was different. I wanted to provide a general approach to
visualizing terms in complex models with linear predictors. If I can, with
reasonable effort, provide a solution that is the same as "least-squares
means" for models in which "least-squares means" are defined, then I'll do
so at some point, but duplicating what SAS does was not my goal.
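
For instance, a minimal sketch using Andy's dat from the quoted thread
below (a sketch only; note that sum-to-zero contrasts are needed for the
"Type-III" tests to be sensible):

library(car)
fm3 <- lm(y ~ A + B + C + d, data = dat,
          contrasts = list(A = "contr.sum", B = "contr.sum",
                           C = "contr.sum"))
Anova(fm3, type = "III")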
Regards,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> Muenchen, Robert A (Bob)
> Sent: Thursday, March 22, 2007 9:36 AM
> To: R-help at stat.math.ethz.ch
> Subject: Re: [R] how to get "lsmeans"?
>
> Hi All,
>
> Perhaps I'm stating the obvious, but to increase the use of R
> in places where SAS & SPSS dominate, it's important to make
> getting the same answers as easy as possible. That includes
> things like lsmeans and type III sums of squares. I've read
> lots of discussions here on sums of squares & I'm not
> advocating type III use, just looking at it from a marketing
> perspective. Too many people look for excuses to not change.
> The fewer excuses, the better.
>
> Of course, this is easy for me to say, as I'm not the one who
> does the work! Many thanks to those who do.
>
> Cheers,
> Bob
>
> =========================================================
> Bob Muenchen (pronounced Min'-chen), Manager
> Statistical Consulting Center
> U of TN Office of Information Technology
> 200 Stokely Management Center, Knoxville, TN 37996-0520
> Voice: (865) 974-5230
> FAX: (865) 974-4810
> Email: muenchen at utk.edu
> Web: http://oit.utk.edu/scc
> News: http://listserv.utk.edu/archives/statnews.html
> =========================================================
>
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-
> > bounces at stat.math.ethz.ch] On Behalf Of John Fox
> > Sent: Wednesday, March 21, 2007 8:59 PM
> > To: 'Prof Brian Ripley'
> > Cc: 'r-help'; 'Chuck Cleland'
> > Subject: Re: [R] how to get "lsmeans"?
> >
> > Dear Brian et al.,
> >
> > My apologies for chiming in late: It's been a busy day.
> >
> > First some general comments on "least-squares means" and "effect
> > displays." The general idea behind the two is similar -- to examine
> > fitted values corresponding to a term in a model while holding other
> > terms to typical values -- but the implementation is not identical.
> > There are other similar ideas floating around as well. My formulation
> > is more general in the sense that it applies to a wider variety of
> > models, both linear and otherwise.
> >
> > "Least-squares means" (a horrible term, by the way: in a 1980 paper in
> > the American Statistician, Searle, Speed, and Milliken suggested the
> > more descriptive term "population marginal means") apply to factors
> > and combinations of factors; covariates are set to mean values and the
> > levels of other factors are averaged over, in effect applying equal
> > weight to each level. (This is from memory, so it's possible that I'm
> > not getting it quite right, but I believe that I am.) In my effect
> > displays, each level of a factor is weighted by its proportion in the
> > data. In models in which least-squares means can be computed, they
> > should differ from the corresponding effect display by a constant (if
> > there are different numbers of observations in the different levels of
> > the factors that are held constant).
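> >
> > For concreteness, a minimal sketch in R of the two weighting schemes
> > in an additive model (the data and names here are invented for
> > illustration; this is not code from the effects package):
> >
> > set.seed(2007)
> > n <- 60
> > A <- factor(sample(c("a1", "a2"), n, replace = TRUE))
> > B <- factor(sample(c("b1", "b2"), n, replace = TRUE,
> >                    prob = c(0.7, 0.3)))   # unbalanced on purpose
> > x <- rnorm(n)
> > y <- rnorm(n) + (A == "a2") + (B == "b2") + x
> > m <- lm(y ~ A + B + x)
> > ## predictions over the full A x B grid, covariate at its mean
> > nd <- expand.grid(A = levels(A), B = levels(B))
> > nd$x <- mean(x)
> > nd$yhat <- predict(m, nd)
> > ## "least-squares mean" style: equal weight to each level of B
> > tapply(nd$yhat, nd$A, mean)
> > ## effect-display style: weight levels of B by sample proportions
> > wB <- as.vector(table(B)) / n
> > tapply(nd$yhat * wB[as.integer(nd$B)], nd$A, sum)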
> >
> > The obstacle to computing either least-squares means or effect
> > displays in R via predict() is that predict() wants factors in the
> > "new data" to be set to particular levels. The effect() function in
> > the effects package bypasses predict() and works directly with the
> > model matrix, averaging over the columns that pertain to a factor (and
> > reconstructing interactions as necessary). As mentioned, this has the
> > effect of setting the factor to its proportional distribution in the
> > data. This approach also has the advantage of being invariant with
> > respect to the choice of contrasts for a factor.
> >
> > The only convenient way that I can think of to implement
> > least-squares means in R would be to use deviation-coded regressors
> > for a factor (that is, contr.sum) and then to set the columns of the
> > model matrix for the factor(s) to be averaged over to 0. It may just
> > be that I'm having a failure of imagination and that there's a better
> > way to proceed. I've not implemented this solution because it is
> > dependent upon the choice of contrasts and because I don't see a
> > general advantage to it, but since the issue has come up several times
> > now, maybe I should take a crack at it. Remember that I want this to
> > work more generally, not just for levels of factors, and not just for
> > linear models.
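> >
> > A rough sketch of that contr.sum idea, with the same invented
> > variables (and, as said, dependent on the contrast coding):
> >
> > m2 <- lm(y ~ A + B + x,
> >          contrasts = list(A = "contr.sum", B = "contr.sum"))
> > Xs <- matrix(colMeans(model.matrix(m2)), nrow = nlevels(A),
> >              ncol = length(coef(m2)), byrow = TRUE,
> >              dimnames = list(levels(A), names(coef(m2))))
> > Xs[, "A1"] <- c(1, -1)   # deviation codes for the focal factor A
> > Xs[, "B1"] <- 0          # zero out B: equal weight to its levels
> > drop(Xs %*% coef(m2))    # "least-squares means" for A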
> >
> > Brian is quite right in mentioning that he suggested some time ago
> > that I use critical values of t rather than of the standard normal
> > distribution for producing confidence intervals, and I agree that it
> > makes sense to do so in models in which the dispersion is estimated.
> > My only excuse for not yet doing this is that I want to undertake a
> > more general revision of the effects package, and haven't had time to
> > do it. There are several changes that I'd like to make to the package.
> > For example, I have results for multinomial and proportional-odds
> > logit models (described in a paper by me and Bob Andersen in the 2006
> > issue of Sociological Methodology) that I want to incorporate, and I'd
> > like to improve the appearance of the default graphs. But Brian's
> > suggestion is very straightforward, and I guess that I shouldn't wait
> > to implement it; I'll do so very soon.
> >
> > Regards,
> > John
> >
> > --------------------------------
> > John Fox
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> >
> > > -----Original Message-----
> > > From: r-help-bounces at stat.math.ethz.ch
> > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Prof Brian
> > > Ripley
> > > Sent: Wednesday, March 21, 2007 12:03 PM
> > > To: Chuck Cleland
> > > Cc: r-help
> > > Subject: Re: [R] how to get "lsmeans"?
> > >
> > > On Wed, 21 Mar 2007, Chuck Cleland wrote:
> > >
> > > > Liaw, Andy wrote:
> > > >> I verified the result from the following with output from JMP 6
> > > >> on the same data (don't have SAS: don't need it):
> > > >>
> > > >> set.seed(631)
> > > >> n <- 100
> > > >> dat <- data.frame(y=rnorm(n),
> > > >>                   A=factor(sample(1:2, n, replace=TRUE)),
> > > >>                   B=factor(sample(1:2, n, replace=TRUE)),
> > > >>                   C=factor(sample(1:2, n, replace=TRUE)),
> > > >>                   d=rnorm(n))
> > > >> fm <- lm(y ~ A + B + C + d, dat)
> > > >> ## Form a data frame of points to predict: all combinations of
> > > >> ## the three factors and the mean of the covariate.
> > > >> p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
> > > >> p[] <- lapply(p, factor)
> > > >> p <- cbind(p, d=mean(dat$d))
> > > >> p <- cbind(yhat=predict(fm, p), p)
> > > >> ## lsmeans for the three factors:
> > > >> with(p, tapply(yhat, A, mean))
> > > >> with(p, tapply(yhat, B, mean))
> > > >> with(p, tapply(yhat, C, mean))
> > > >
> > > > Using Andy's example data, these are the LSMEANS and intervals I
> > > > get from SAS:
> > > >
> > > > A     y LSMEAN    95% Confidence Limits
> > > > 1    -0.071847    -0.387507    0.243813
> > > > 2    -0.029621    -0.342358    0.283117
> > > >
> > > > B     y LSMEAN    95% Confidence Limits
> > > > 1    -0.104859    -0.397935    0.188216
> > > > 2     0.003391    -0.333476    0.340258
> > > >
> > > > C     y LSMEAN    95% Confidence Limits
> > > > 1    -0.084679    -0.392343    0.222986
> > > > 2    -0.016789    -0.336374    0.302795
> > > >
> > > > One way of reproducing the LSMEANS and intervals from SAS using
> > > > predict() seems to be the following:
> > > >
> > > >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
> > > >> newdat <- expand.grid(A=factor(c(1,2)), B=1.5, C=1.5, d=mean(dat$d))
> > > >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
> > > >   A   B   C          d         fit        lwr       upr
> > > > 1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
> > > > 2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165
> > > >
> > > > However, another possibility seems to be:
> > > >
> > > >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
> > > >> newdat <- expand.grid(A=factor(c(1,2)),
> > > >>                       B=mean(as.numeric(dat$B)),
> > > >>                       C=mean(as.numeric(dat$C)),
> > > >>                       d=mean(dat$d))
> > > >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
> > > >   A    B    C          d         fit        lwr       upr
> > > > 1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012
> > > > 2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465
> > > >
> > > > The predictions directly above match what effect() in the effects
> > > > package by John Fox returns:
> > > >
> > > > library(effects)
> > > >
> > > >> effect("A", fm, xlevels=list(d = mean(dat$d)))
> > > >
> > > > A effect
> > > > A
> > > >           1           2
> > > > -0.08078243 -0.03855619
> > > >
> > > > But for some reason the predict() and effect() intervals are a
> > > > little different:
> > > >
> > > >> effect("A", fm, xlevels=list(d = mean(dat$d)))$lower
> > > >           [,1]
> > > > 101 -0.3924451
> > > > 102 -0.3440179
> > > >
> > > >> effect("A", fm, xlevels=list(d = mean(dat$d)))$upper
> > > >          [,1]
> > > > 101 0.2308802
> > > > 102 0.2669055
> > > >
> > > > I would be interested in any comments on these different
> > > > approaches and on the difference in intervals returned by
> > > > predict() and effect().
> > >
> > > AFAIR, the effects package uses normal-based confidence intervals
> > > and predict.lm uses t-based ones, and I have suggested to John Fox
> > > that t-based intervals would be preferable, at least as an option.
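> > >
> > > For illustration only (the estimate and standard error here are
> > > invented; with n = 100 and five coefficients, Andy's model above has
> > > 95 residual df):
> > >
> > > est <- -0.07; se <- 0.16
> > > est + c(-1, 1) * qnorm(0.975) * se        # normal-based interval
> > > est + c(-1, 1) * qt(0.975, df = 95) * se  # t-based, as predict.lm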
> > >
> > >
> > > --
> > > Brian D. Ripley,                  ripley at stats.ox.ac.uk
> > > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > > University of Oxford,             Tel:  +44 1865 272861 (self)
> > > 1 South Parks Road,                     +44 1865 272866 (PA)
> > > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> > >