[R] how to get "lsmeans"?
Muenchen, Robert A (Bob)
muenchen at utk.edu
Sat Mar 24 02:02:27 CET 2007
The Exegeses paper gave me a great look at the history of all this. I
had not been aware that S-PLUS had gone that route. There is much to be
said for sticking to your own perspective even when you know another
route might be more successful. And in the long run, that may be the
more successful route anyway.
Thanks,
Bob
=========================================================
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230
FAX: (865) 974-4810
Email: muenchen at utk.edu
Web: http://oit.utk.edu/scc
News: http://listserv.utk.edu/archives/statnews.html
=========================================================
> -----Original Message-----
> From: Liaw, Andy [mailto:andy_liaw at merck.com]
> Sent: Thursday, March 22, 2007 5:27 PM
> To: Douglas Bates; Muenchen, Robert A (Bob)
> Cc: R-help at stat.math.ethz.ch
> Subject: RE: [R] how to get "lsmeans"?
>
> From: Douglas Bates
> >
> > On 3/22/07, Muenchen, Robert A (Bob) <muenchen at utk.edu> wrote:
> >
> > > Perhaps I'm stating the obvious, but to increase the use of R in
> > > places where SAS & SPSS dominate, it's important to make getting
> > > the same answers as easy as possible. That includes things like
> > > lsmeans and type III sums of squares. I've read lots of
> > > discussions here on sums of squares & I'm not advocating type III
> > > use, just looking at it from a marketing perspective. Too many
> > > people look for excuses to not change. The fewer excuses, the
> > > better.
> >
> > You may get strong reactions to such a suggestion. I
> > recommend reading Bill Venables' famous unpublished paper
> > "Exegeses on linear models" (google for the title - very few
> > people use "Exegeses" and "linear models" in the same
> > sentence - in fact I would not be surprised if Bill was the
> > only one who has ever done so).
>
> It's on the MASS page:
> http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf
> I believe it's based on a talk Bill gave at an S-PLUS User's
> Conference. I think it deserves to be required reading for all
> graduate-level linear models courses.
>
> > You must realize that R is written by experts in statistics
> > and statistical computing who, despite popular opinion, do
> > not believe that everything in SAS and SPSS is worth copying.
> > Some things done in such packages, which trace their roots
> > back to the days of punched cards and magnetic tape when
> > fitting a single linear model may take several days because
> > your first 5 attempts failed due to syntax errors in the JCL
> > or the SAS code, still reflect the approach of "give me every
> > possible statistic that could be calculated from this model,
> > whether or not it makes sense". The approach taken in R is
> > different.
> > The underlying assumption is that the useR is thinking about
> > the analysis while doing it.
> >
> > The fact that it is so difficult to explain what lsmeans are
> > and why they would be of interest is an indication of why
> > they aren't implemented in any of the required packages.
>
> Perhaps I should have made it clear in my original post: I gave the
> example and code more to show what the mysterious "least squares
> means" are (which John explained lucidly) than how to replicate what
> SAS (or JMP) outputs. I do not understand how people can feel
> comfortable reporting things like lsmeans and p-values from type
> <insert your favorite Roman numeral here> tests when they do not
> know how such things arise or, at the very least, what they _really_
> mean. (Given how simply lsmeans are computed, not knowing how to
> compute them is pretty much the same as not knowing what they are.)
> One of the dangers of the wholesale output that SAS or SPSS gives is
> that the user simply picks an answer and runs with it, without
> understanding what that answer is, or whether it corresponds to the
> question of interest.
>
> As to whether to weight the levels of the factors being held
> constant, my suggestion to John would be to offer both choices
> (unweighted and weighted by observed frequencies). I can see why one
> would want to weight by observed frequencies (if the data are
> sampled from a population), but there are certainly situations
> (perhaps more often than not in the cases I've encountered) in which
> the observed frequencies do not come close to approximating what
> they are in the population. In such cases the unweighted average
> would make more sense to me.
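
A minimal R sketch of the two choices, reusing the fm and dat objects
from Andy's code quoted further down (the weighting code here is
illustrative, not from any package):

ab <- expand.grid(A = factor(1:2), B = factor(1:2), C = factor(1:2))
ab$d <- mean(dat$d)
yhat <- predict(fm, ab)
## Unweighted: every (B, C) cell counts equally (the lsmeans choice).
tapply(yhat, ab$A, mean)
## Weighted: each (B, C) cell counts in proportion to its observed
## frequency (the effect-display choice).
w <- as.vector(prop.table(table(dat$B, dat$C)))
sapply(split(seq_len(nrow(ab)), ab$A),
       function(i) weighted.mean(yhat[i], w))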
>
> Cheers,
> Andy
>
>
> > > > -----Original Message-----
> > > > From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-
> > > > bounces at stat.math.ethz.ch] On Behalf Of John Fox
> > > > Sent: Wednesday, March 21, 2007 8:59 PM
> > > > To: 'Prof Brian Ripley'
> > > > Cc: 'r-help'; 'Chuck Cleland'
> > > > Subject: Re: [R] how to get "lsmeans"?
> > > >
> > > > Dear Brian et al.,
> > > >
> > > > My apologies for chiming in late: It's been a busy day.
> > > >
> > > > First some general comments on "least-squares means" and
> > > > "effect displays." The general idea behind the two is similar
> > > > -- to examine fitted values corresponding to a term in a model
> > > > while holding other terms to typical values -- but the
> > > > implementation is not identical. There are other similar ideas
> > > > floating around as well. My formulation is more general in the
> > > > sense that it applies to a wider variety of models, both
> > > > linear and otherwise.
> > > >
> > > > "Least-squares means" (a horrible term, by the way: in a 1980
> > > > paper in the American Statistician, Searle, Speed, and
> > > > Milliken suggested the more descriptive term "population
> > > > marginal means") apply to factors and combinations of factors;
> > > > covariates are set to mean values and the levels of other
> > > > factors are averaged over, in effect applying equal weight to
> > > > each level. (This is from memory, so it's possible that I'm
> > > > not getting it quite right, but I believe that I am.) In my
> > > > effect displays, each level of a factor is weighted by its
> > > > proportion in the data. In models in which least-squares means
> > > > can be computed, they should differ from the corresponding
> > > > effect display by a constant (if there are different numbers
> > > > of observations in the different levels of the factors that
> > > > are held constant).
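
A quick numerical check of that constant-difference claim, reusing fm
and the prediction frame p from Andy's code quoted further down, and
assuming the effects object exposes its fitted values as $fit (as it
does the $lower and $upper components used further down); a sketch,
not package code:

library(effects)
lsm <- with(p, tapply(yhat, A, mean))  # equal-weight averages
eff <- as.vector(effect("A", fm)$fit)  # proportion-weighted averages
lsm - eff  # in this additive model the two differences should match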
> > > >
> > > > The obstacle to computing either least-squares means or effect
> > > > displays in R via predict() is that predict() wants factors in
> > > > the "new data" to be set to particular levels. The effect()
> > > > function in the effects package bypasses predict() and works
> > > > directly with the model matrix, averaging over the columns
> > > > that pertain to a factor (and reconstructing interactions as
> > > > necessary). As mentioned, this has the effect of setting the
> > > > factor to its proportional distribution in the data. This
> > > > approach also has the advantage of being invariant with
> > > > respect to the choice of contrasts for a factor.
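
A sketch of that column-averaging idea for the A effect in Andy's
additive model fm (quoted further down), assuming the default
treatment contrasts, under which the factor columns of the model
matrix are named A2, B2, and C2; this is illustrative, not effect()'s
actual code:

X <- model.matrix(fm)
xbar <- colMeans(X)       # averaging the B2 and C2 dummy columns sets
                          # them to their observed proportions, and d
                          # to its mean
X.A <- rbind(xbar, xbar)  # one candidate row per level of A
X.A[, "A2"] <- c(0, 1)    # set the A dummy to each of its levels
drop(X.A %*% coef(fm))    # the effect display for A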
> > > >
> > > > The only convenient way that I can think of to implement
> > > > least-squares means in R would be to use deviation-coded
> > > > regressors for a factor (that is, contr.sum) and then to set
> > > > the columns of the model matrix for the factor(s) to be
> > > > averaged over to 0. It may just be that I'm having a failure
> > > > of imagination and that there's a better way to proceed. I've
> > > > not implemented this solution because it is dependent upon the
> > > > choice of contrasts and because I don't see a general
> > > > advantage to it, but since the issue has come up several times
> > > > now, maybe I should take a crack at it. Remember that I want
> > > > this to work more generally, not just for levels of factors,
> > > > and not just for linear models.
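
A sketch of that deviation-coding idea with Andy's model (quoted
further down): refit with contr.sum, then zero out the B and C
columns of a hand-built model matrix to average over their levels
with equal weight.

fm.sum <- lm(y ~ A + B + C + d, data = dat,
             contrasts = list(A = "contr.sum", B = "contr.sum",
                              C = "contr.sum"))
## Coefficient order: (Intercept), A1, B1, C1, d.  Under contr.sum a
## two-level factor is coded +1 / -1, so zeroing B1 and C1 averages
## over B and C with equal weight.
X <- rbind(c(1,  1, 0, 0, mean(dat$d)),  # A = level 1
           c(1, -1, 0, 0, mean(dat$d)))  # A = level 2
drop(X %*% coef(fm.sum))                 # least-squares means for A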
> > > >
> > > > Brian is quite right in mentioning that he suggested some time
> > > > ago that I use critical values of t rather than of the
> > > > standard normal distribution for producing confidence
> > > > intervals, and I agree that it makes sense to do so in models
> > > > in which the dispersion is estimated. My only excuse for not
> > > > yet doing this is that I want to undertake a more general
> > > > revision of the effects package, and haven't had time to do
> > > > it. There are several changes that I'd like to make to the
> > > > package. For example, I have results for multinomial and
> > > > proportional odds logit models (described in a paper by me and
> > > > Bob Andersen in the 2006 issue of Sociological Methodology)
> > > > that I want to incorporate, and I'd like to improve the
> > > > appearance of the default graphs. But Brian's suggestion is
> > > > very straightforward, and I guess that I shouldn't wait to
> > > > implement it; I'll do so very soon.
> > > >
> > > > Regards,
> > > > John
> > > >
> > > > --------------------------------
> > > > John Fox
> > > > Department of Sociology
> > > > McMaster University
> > > > Hamilton, Ontario
> > > > Canada L8S 4M4
> > > > 905-525-9140x23604
> > > > http://socserv.mcmaster.ca/jfox
> > > > --------------------------------
> > > >
> > > > > -----Original Message-----
> > > > > From: r-help-bounces at stat.math.ethz.ch
> > > > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> > > > > Prof Brian Ripley
> > > > > Sent: Wednesday, March 21, 2007 12:03 PM
> > > > > To: Chuck Cleland
> > > > > Cc: r-help
> > > > > Subject: Re: [R] how to get "lsmeans"?
> > > > >
> > > > > On Wed, 21 Mar 2007, Chuck Cleland wrote:
> > > > >
> > > > > > Liaw, Andy wrote:
> > > > > >> I verified the result from the following with output from
> > > > > >> JMP 6 on the same data (don't have SAS: don't need it):
> > > > > >>
> > > > > >> set.seed(631)
> > > > > >> n <- 100
> > > > > >> dat <- data.frame(y=rnorm(n),
> > > > > >>                   A=factor(sample(1:2, n, replace=TRUE)),
> > > > > >>                   B=factor(sample(1:2, n, replace=TRUE)),
> > > > > >>                   C=factor(sample(1:2, n, replace=TRUE)),
> > > > > >>                   d=rnorm(n))
> > > > > >> fm <- lm(y ~ A + B + C + d, dat)
> > > > > >> ## Form a data frame of points to predict: all
> > > > > >> ## combinations of the three factors and the mean of the
> > > > > >> ## covariate.
> > > > > >> p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
> > > > > >> p[] <- lapply(p, factor)
> > > > > >> p <- cbind(p, d=mean(dat$d))
> > > > > >> p <- cbind(yhat=predict(fm, p), p)
> > > > > >> ## lsmeans for the three factors:
> > > > > >> with(p, tapply(yhat, A, mean))
> > > > > >> with(p, tapply(yhat, B, mean))
> > > > > >> with(p, tapply(yhat, C, mean))
> > > > > >
> > > > > > Using Andy's example data, these are the LSMEANS and
> > > > > > intervals I get from SAS:
> > > > > >
> > > > > > A    y LSMEAN     95% Confidence Limits
> > > > > > 1    -0.071847    -0.387507    0.243813
> > > > > > 2    -0.029621    -0.342358    0.283117
> > > > > >
> > > > > > B    y LSMEAN     95% Confidence Limits
> > > > > > 1    -0.104859    -0.397935    0.188216
> > > > > > 2     0.003391    -0.333476    0.340258
> > > > > >
> > > > > > C    y LSMEAN     95% Confidence Limits
> > > > > > 1    -0.084679    -0.392343    0.222986
> > > > > > 2    -0.016789    -0.336374    0.302795
> > > > > >
> > > > > > One way of reproducing the LSMEANS and intervals from SAS
> > > > > > using predict() seems to be the following:
> > > > > >
> > > > > >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d,
> > > > > >>              data = dat)
> > > > > >> newdat <- expand.grid(A=factor(c(1,2)), B=1.5, C=1.5,
> > > > > >>                       d=mean(dat$d))
> > > > > >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
> > > > > >   A   B   C          d         fit        lwr       upr
> > > > > > 1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
> > > > > > 2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165
> > > > > >
> > > > > > However, another possibility seems to be:
> > > > > >
> > > > > >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d,
> > > > > >>              data = dat)
> > > > > >> newdat <- expand.grid(A=factor(c(1,2)),
> > > > > >>                       B=mean(as.numeric(dat$B)),
> > > > > >>                       C=mean(as.numeric(dat$C)),
> > > > > >>                       d=mean(dat$d))
> > > > > >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
> > > > > >   A    B    C          d         fit        lwr       upr
> > > > > > 1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012
> > > > > > 2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465
> > > > > >
> > > > > > The predictions directly above match what effect() in the
> > > > > > effects package by John Fox returns:
> > > > > >
> > > > > > library(effects)
> > > > > >
> > > > > >> effect("A", fm, xlevels=list(d = mean(dat$d)))
> > > > > >
> > > > > > A effect
> > > > > > A
> > > > > >           1           2
> > > > > > -0.08078243 -0.03855619
> > > > > >
> > > > > > But for some reason the predict() and effect() intervals
> > > > > > are a little different:
> > > > > >
> > > > > >> effect("A", fm, xlevels=list(d = mean(dat$d)))$lower
> > > > > > [,1]
> > > > > > 101 -0.3924451
> > > > > > 102 -0.3440179
> > > > > >
> > > > > >> effect("A", fm, xlevels=list(d = mean(dat$d)))$upper
> > > > > > [,1]
> > > > > > 101 0.2308802
> > > > > > 102 0.2669055
> > > > > >
> > > > > > I would be interested in any comments on these different
> > > > > > approaches and on the difference in intervals returned by
> > > > > > predict() and effect().
> > > > >
> > > > > AFAIR, the effects package uses normal-based confidence
> > > > > intervals and predict.lm uses t-based ones, and I have
> > > > > suggested to John Fox that t-based intervals would be
> > > > > preferable, at least as an option.
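
A sketch of the two interval flavors for a single fitted value from
Andy's model fm above (illustrative only, not the internals of either
function):

nd <- data.frame(A = factor(1, levels = 1:2),
                 B = factor(1, levels = 1:2),
                 C = factor(1, levels = 1:2),
                 d = mean(dat$d))
pr <- predict(fm, nd, se.fit = TRUE)
pr$fit + c(-1, 1) * qnorm(0.975) * pr$se.fit      # normal-based
pr$fit + c(-1, 1) * qt(0.975, pr$df) * pr$se.fit  # t-based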
> > > > >
> > > > >
> > > > > --
> > > > > Brian D. Ripley,                  ripley at stats.ox.ac.uk
> > > > > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > > > > University of Oxford,             Tel:  +44 1865 272861 (self)
> > > > > 1 South Parks Road,                     +44 1865 272866 (PA)
> > > > > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> > > > >