[R] Are least-squares means useful or appropriate?
John Fox
jfox at mcmaster.ca
Sat Sep 24 15:04:14 CEST 2005
Dear Peter, Doug, and Felipe,
My effects package (on CRAN; also see the article at
http://www.jstatsoft.org/counter.php?id=75&url=v08/i15/effect-displays-revised.pdf)
will compute and graph adjusted effects of various kinds for linear and
generalized linear models -- generalizing so-called "least-squares means"
(or "population marginal means" or "adjusted means").
A couple of comments:
By default, the all.effects() function in the effects package computes
effects for high-order terms in the model, absorbing terms marginal to them.
You can ask the effect() function to compute an effect for a term that's
marginal to a higher-order term, and it will do so with a warning, but this
is rarely sensible.
Peter's mention of floating variances (or quasi-variances) in this context
is interesting, but what I would most like to see, I think, are the
quasi-variances for the adjusted effects, that is, for terms merged with
their lower-order relatives. These, for example, are unaffected by contrast
coding. How to define reasonable quasi-variances in this context has been
puzzling me for a while.
Regards,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Peter Dalgaard
> Sent: Friday, September 23, 2005 10:23 AM
> To: Douglas Bates
> Cc: Felipe; R-help at stat.math.ethz.ch
> Subject: Re: [R] Are least-squares means useful or appropriate?
>
> Douglas Bates <dmbates at gmail.com> writes:
>
> > On 9/20/05, Felipe <felipe at unileon.es> wrote:
> > > Hi.
> > > My question was purely theoretical. I was wondering whether someone
> > > who uses both SAS and R could give me an opinion on the topic. I was
> > > trying to use least-squares means for comparison in R, but then I
> > > found some indications against them, and I wanted to know whether they
> > > were well founded (as I said earlier, they were not very detailed).
> > > Greetings.
> > >
> > > Felipe
> >
> > As Deepayan said in his reply, the concept of least squares means is
> > associated with SAS and is not generally part of the theory of linear
> > models in statistics. My vague understanding of these (I too am not a
> > SAS user) is that they are an attempt to estimate the "mean" response
> > for a particular level of a factor in a model in which that factor has
> > a non-ignorable interaction with another factor. There is no clearly
> > acceptable definition of such a thing.
>
> (PD goes and fetches the SAS manual....)
>
> Well, yes. It'll do that too, although only if you ask for
> the lsmeans of A when an interaction like A*B is present in
> the model. This is related to the tests of main effects when
> an interaction is present using type III sums of squares,
> which has been beaten to death repeatedly on the list. In
> both cases, there seems to be an implicit assumption that
> categorical variables by nature come from an underlying
> fully balanced design.
>
> If the interaction is absent from the model, the lsmeans are
> somewhat more sensible in that they at least reproduce the
> parameter estimates as contrasts between different groups.
> All continuous variables in the design will be set to their
> mean, but values for categorical design variables are
> weighted inversely as the number of groups. So if you're
> doing an lsmeans of lung function by smoking adjusted for age
> and sex you get estimates for the mean of a population of
> which everyone has the same age and half are male and half
> are female. This makes some sense, but if you do it for sex
> adjusting for smoking and age, you are not only forcing the
> sexes to smoke equally much, but actually adjusting to
> smoking rates of 50%, which could be quite far from reality.
>
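The weighting described in the paragraph above can be made concrete with a little arithmetic. The sketch below is in Python rather than R and does not call any lsmeans implementation; the coefficients, the mean age and the observed proportion of men are all made-up values for an additive model of lung function on smoking, sex and age. It compares averaging over sex with equal weights (the lsmeans convention as described above) against averaging with the assumed sample proportions.

```python
# Hypothetical coefficients from an additive model (no interaction):
#   lung = intercept + smoker*[smoker] + male*[male] + age*age
COEF = {"intercept": 4.0, "smoker": -0.5, "male": 0.8, "age": -0.02}


def predict(smoker, male, age):
    """Fitted value for one covariate pattern; smoker and male are 0 or 1."""
    return (COEF["intercept"] + COEF["smoker"] * smoker
            + COEF["male"] * male + COEF["age"] * age)


def adjusted_mean(smoker, prop_male, mean_age):
    """Average fitted values over sex with weight prop_male, age held at mean_age."""
    return (prop_male * predict(smoker, 1, mean_age)
            + (1 - prop_male) * predict(smoker, 0, mean_age))


MEAN_AGE = 45.0            # assumed sample mean age
OBSERVED_PROP_MALE = 0.7   # assumed sample proportion of men

# Equal weights over sex versus the assumed observed proportions.
lsm = {s: adjusted_mean(s, 0.5, MEAN_AGE) for s in (0, 1)}
obs = {s: adjusted_mean(s, OBSERVED_PROP_MALE, MEAN_AGE) for s in (0, 1)}

print("equal weights:      ", lsm)
print("observed weights:   ", obs)
print("smoker - non-smoker:", lsm[1] - lsm[0], obs[1] - obs[0])
```

Under these assumptions the adjusted means themselves shift with the choice of weights, while the smoker versus non-smoker difference equals the smoking coefficient either way, which is why the choice matters for the plotted levels but not for the comparison in a model without interaction.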
> The whole operation really seems to revolve around 2 things:
>
> (1) pairwise comparisons between factor levels. This can alternatively
> be done fairly easily using parameter estimates for the relevant
> variable and associated covariances. You don't really need all the
> mumbo-jumbo of adjusting to particular values of other variables.
>
> (2) plotting effects of a factor with error bars as if they were
> simple group means. This has some merit since the standard
> parametrizations are misleading at times (e.g. if you choose the
> group with the least data as the reference level, std. err. for
> the other groups will seem high). However, it seems to me that
> concepts like floating variances (see float() in the Epi package)
> are more to the point.
>
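Point (1) above can be sketched as follows. This is an illustration in Python, not R; the estimates and covariances are invented numbers standing in for what a fitted model with treatment contrasts would report for a three-level factor, with level "a" assumed to be the reference (estimate 0, no sampling variance of its own). The difference between two levels is the difference of their coefficients, and its variance is var_j + var_k - 2 cov_jk.

```python
import math

# Hypothetical treatment-contrast estimates; "a" is the reference level.
EST = {"a": 0.0, "b": 1.2, "c": 2.0}

# Hypothetical (co)variances of the estimates; anything involving the
# reference level is zero and is simply left out of the table.
COV = {("b", "b"): 0.09, ("c", "c"): 0.16, ("b", "c"): 0.04}


def cov(j, k):
    """Look up a covariance in either order, defaulting to 0 for the reference."""
    return COV.get((j, k), COV.get((k, j), 0.0))


def pairwise(j, k):
    """Estimated difference between levels j and k and its standard error."""
    diff = EST[j] - EST[k]
    var = cov(j, j) + cov(k, k) - 2 * cov(j, k)
    return diff, math.sqrt(var)


diff_cb, se_cb = pairwise("c", "b")   # comparison not involving the reference
diff_ba, se_ba = pairwise("b", "a")   # comparison with the reference level
print("c - b:", diff_cb, "s.e.", se_cb)
print("b - a:", diff_ba, "s.e.", se_ba)
```

No values of the other covariates enter the calculation, which is the sense in which the comparison does not need the adjustment step.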
> > R is an interactive language where it is a simple matter to fit a
> > series of models and base your analysis on a model that is
> > appropriate. An approach of "give me the answer to any possible
> > question about this model, whether or not it makes sense" is
> > unnecessary.
> >
> > In many ways statistical theory and practice have not caught up with
> > statistical computing. There are concepts that are regarded as part
> > of established statistical theory when they are, in fact,
> > approximations or compromises motivated by the fact that you can't
> > compute the answer you want - except now you can compute it. However,
> > that won't stop people who were trained in the old system from
> > assuming that things *must* be done in that way.
> >
> > In short, I agree with Deepayan - the best thing to do is to ask
> > someone who uses SAS and least squares means to explain to you what
> > they are.
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >
>
> --
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>