[R] Are least-squares means useful or appropriate?

Fri Sep 23 17:22:45 CEST 2005

Douglas Bates <dmbates at gmail.com> writes:

> On 9/20/05, Felipe <felipe at unileon.es> wrote:
> > Hi.
> > My question was just theoretical. I was wondering if someone who uses
> > both SAS and R could give me their opinion on the topic. I was trying
> > to use least-squares means for comparison in R, but then I found some
> > indications against them, and I wanted to know if they had a good basis
> > (as I said earlier, they were not very detailed).
> > Greetings.
> >
> > Felipe
> 
> As Deepayan said in his reply, the concept of least squares means is
> associated with SAS and is not generally part of the theory of linear
> models in statistics.  My vague understanding of these (I too am not a
> SAS user) is that they are an attempt to estimate the "mean" response
> for a particular level of a factor in a model in which that factor has
> a non-ignorable interaction with another factor.  There is no clearly
> acceptable definition of such a thing.

(PD goes and fetches the SAS manual....)

Well, yes, it'll do that too, although only if you ask for the lsmeans
of A when an interaction like A*B is present in the model. This is
related to the tests of main effects when an interaction is present
using type III sums of squares, which has been beaten to death
repeatedly on the list. In both cases, there seems to be an implicit
assumption that categorical variables by nature come from an
underlying fully balanced design.
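
To make the balanced-design assumption concrete, here is a minimal sketch in
base R, not taken from the SAS manual. The data are simulated and the names
(A, B, y, lsm_A, raw_A) are made up for illustration. With A*B in the model
the fitted values are just the cell means, and an "lsmean" for a level of A
averages those cell means with equal weight on each level of B, whatever the
cell counts happen to be.

```r
## Simulated, deliberately unbalanced two-factor data (all made up).
set.seed(3)
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(60, 40))),
  B = factor(c(rep(c("b1", "b2"), times = c(50, 10)),
               rep(c("b1", "b2"), times = c(10, 30))))
)
d$y <- 10 + 2 * (d$A == "a2") + 3 * (d$B == "b2") +
       4 * (d$A == "a2" & d$B == "b2") + rnorm(nrow(d))

fit <- lm(y ~ A * B, data = d)

## Fitted cell means of the saturated model.
cells <- expand.grid(A = levels(d$A), B = levels(d$B))
cells$pred <- predict(fit, newdata = cells)

## "LS-mean" of A: each B level gets weight 1/2, whatever the cell counts.
lsm_A <- tapply(cells$pred, cells$A, mean)

## Raw marginal means of A, which weight B by the observed counts instead.
raw_A <- tapply(d$y, d$A, mean)

print(rbind(lsm_A, raw_A))
```

The two rows differ because the first describes a hypothetical balanced
population and the second describes the sample that was actually observed.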

If the interaction is absent from the model, the lsmeans are somewhat
more sensible in that they at least reproduce the parameter estimates
as contrasts between different groups. All continuous variables in the
design will be set to their mean, but the levels of categorical design
variables are weighted equally, i.e. by one over the number of levels.
So if you're doing an lsmeans of lung function by smoking adjusted for
age and sex, you get estimates for the mean of a population in which
everyone has the same age and half are male and half are female. This
makes some sense, but if you do it for sex adjusting for smoking and
age, you are not only forcing the sexes to smoke equally much, but
actually adjusting to smoking rates of 50%, which could be quite far
from reality.
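
The lung-function example can be sketched by hand in base R. This is an
illustration under stated assumptions, not SAS's actual algorithm: the data
are simulated, the variable names (fev, sex, smoker, age) are invented, and
default treatment contrasts are assumed. The sketch predicts on a grid with
age at its mean, then averages over smoking status either with equal weights
(the lsmeans-style answer) or with the observed smoking frequencies.

```r
## Simulated data, purely illustrative (all names and numbers are made up).
set.seed(1)
n <- 200
d <- data.frame(
  sex    = factor(sample(c("F", "M"), n, replace = TRUE)),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE,
                         prob = c(0.8, 0.2))),
  age    = round(runif(n, 20, 70))
)
d$fev <- 4 - 0.02 * d$age - 0.3 * (d$smoker == "yes") +
         0.5 * (d$sex == "M") + rnorm(n, sd = 0.4)

fit <- lm(fev ~ sex + smoker + age, data = d)

## Reference grid: every sex x smoker cell, with age fixed at its mean.
grid <- expand.grid(sex = levels(d$sex), smoker = levels(d$smoker),
                    age = mean(d$age))
grid$pred <- predict(fit, newdata = grid)

## lsmeans-style value for each sex: unweighted average over smoker levels,
## i.e. an implied population in which 50% smoke.
lsm_sex <- tapply(grid$pred, grid$sex, mean)

## Alternative: weight the smoker levels by their observed frequencies.
w <- prop.table(table(d$smoker))
grid$w <- as.numeric(w[as.character(grid$smoker)])
obs_sex <- tapply(grid$pred * grid$w, grid$sex, sum)

print(rbind(lsm_sex, obs_sex))
print(coef(fit)["sexM"])
```

In this additive model both weightings give the same M minus F difference,
namely the sexM coefficient. Only the level of the two estimates shifts, by
the smoking coefficient times the gap between 50% and the observed smoking
rate.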

The whole operation really seems to revolve around 2 things: 

(1) pairwise comparisons between factor levels. This can alternatively
    be done fairly easily using parameter estimates for the relevant
    variable and associated covariances. You don't really need all the
    mumbo-jumbo of adjusting to particular values of other variables.

(2) plotting effects of a factor with error bars as if they were
    simple group means. This has some merit since the standard
    parametrizations are misleading at times (e.g. if you choose the
    group with the least data as the reference level, std. err. for
    the other groups will seem high). However, it seems to me that
    concepts like floating variances (see float() in the Epi package)
    are more to the point.
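
Point (1) can be sketched in base R as follows. The data are simulated and
the names (grp, age, y, k) are invented for illustration; default treatment
contrasts are assumed. The comparison between two non-reference levels is a
linear combination of the coefficients, and its standard error comes from
the estimated covariance matrix.

```r
## Simulated data, purely illustrative (all names and numbers are made up).
set.seed(2)
n <- 150
d <- data.frame(
  grp = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  age = round(runif(n, 20, 70))
)
d$y <- 1 + 0.5 * (d$grp == "b") + 1.2 * (d$grp == "c") +
       0.01 * d$age + rnorm(n)

fit <- lm(y ~ grp + age, data = d)
b <- coef(fit)   # parameter estimates
V <- vcov(fit)   # their estimated covariance matrix

## Contrast vector for "c minus b": +1 on grpc, -1 on grpb, 0 elsewhere.
k <- setNames(numeric(length(b)), names(b))
k[c("grpc", "grpb")] <- c(1, -1)

est <- sum(k * b)                        # estimated difference c - b
se  <- sqrt(drop(t(k) %*% V %*% k))      # its standard error
ci  <- est + c(-1, 1) * qt(0.975, df.residual(fit)) * se

print(c(estimate = est, se = se, lower = ci[1], upper = ci[2]))
```

Refitting with relevel(grp, "b") should give the same estimate and standard
error directly as the coefficient for level c, which is a handy check. No
adjustment to particular values of age enters the calculation.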

> R is an interactive language where it is a simple matter to fit a
> series of models and base your analysis on a model that is
> appropriate.  An approach of "give me the answer to any possible
> question about this model, whether or not it makes sense" is
> unnecessary.
> 
> In many ways statistical theory and practice have not caught up with
> statistical computing.  There are concepts that are regarded as part
> of established statistical theory when they are, in fact, 
> approximations or compromises motivated by the fact that you can't
> compute the answer you want - except now you can compute it.  However,
> that won't stop people who were trained in the old system from
> assuming that things *must* be done in that way.
> 
> In short, I agree with Deepayan - the best thing to do is to ask
> someone who uses SAS and least squares means to explain to you what
> they are.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907