[R] Are least-squares means useful or appropriate?

Fri Sep 23 16:00:26 CEST 2005

On 9/20/05, Felipe <felipe at unileon.es> wrote:
> Hi.
> My question was just theoretical. I was wondering if someone who uses
> SAS and R could give me their opinion on the topic. I was trying to use
> least-squares means for comparison in R, but then I found some
> indications against them, and I wanted to know if they had a good basis
> (as I said earlier, they were not very detailed).
> Greetings.
>
> Felipe

As Deepayan said in his reply, the concept of least squares means is
associated with SAS and is not generally part of the theory of linear
models in statistics.  My vague understanding of these (I too am not a
SAS user) is that they are an attempt to estimate the "mean" response
for a particular level of a factor in a model in which that factor has
a non-ignorable interaction with another factor.  There is no clearly
acceptable definition of such a thing.
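
To make the difficulty concrete, here is a small sketch in Python (chosen
only because it runs with no add-on packages; the numbers are invented and
the function names are mine, not anything from SAS or R).  As far as I
understand it, in a two-factor model with interaction the usual "least
squares mean" for a level of A is the unweighted average of the cell means
over the levels of B.  The raw marginal mean instead weights each cell by
its count.  With unbalanced data the two disagree:

```python
# Hypothetical cell means and cell counts for a two-factor layout (A x B).
# All values are made up purely for illustration.
cell_mean = {("a1", "b1"): 10.0, ("a1", "b2"): 20.0,
             ("a2", "b1"): 18.0, ("a2", "b2"): 14.0}
cell_n = {("a1", "b1"): 8, ("a1", "b2"): 2,
          ("a2", "b1"): 2, ("a2", "b2"): 8}
b_levels = ["b1", "b2"]


def equal_weight_mean(a):
    """Average the cell means for level `a`, giving each level of B equal weight."""
    return sum(cell_mean[(a, b)] for b in b_levels) / len(b_levels)


def observed_weight_mean(a):
    """Average the cell means for level `a`, weighting each cell by its count."""
    total = sum(cell_n[(a, b)] for b in b_levels)
    return sum(cell_n[(a, b)] * cell_mean[(a, b)] for b in b_levels) / total


def a_effect_at(b):
    """Difference a2 - a1 within a single level of B."""
    return cell_mean[("a2", b)] - cell_mean[("a1", b)]


for a in ("a1", "a2"):
    print(a, equal_weight_mean(a), observed_weight_mean(a))
for b in b_levels:
    print(b, a_effect_at(b))
```

With these made-up numbers a1 comes out as 15 under equal weights and 12
under observed weights, while the a2 minus a1 difference is +8 at b1 and
-6 at b2.  Any single "mean for a1" therefore depends on a weighting
choice that the data do not make for you.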

To understand why there should be an attempt to answer a question that
doesn't make sense, remember the history of SAS, which was developed
in the era of punched cards and magnetic tape.  Beneath the surface of
SAS with its GUI, etc. is the fundamental assumption that your data
are on a reel of magnetic tape over in the "Computer Center" that
houses an IBM System/360 computer and that the way you are going to use
this program is by keypunching a deck of punched cards, putting some
mysterious JCL (the IBM Job Control Language which no one understood
and you learned only by imitation) cards at the beginning and end, and
submitting them at the I/O Window.  The next day you will go to the
computer center to pick up your output only to discover that you had a
JCL error.  You will spend most of the morning tracking down the one
person on campus who can tell you that "ERROR IEH92345" was caused by
the blank between the "DD" and the "*" in the card that reads //SYSIN
DD * so you change that and submit again.  After two or three days of
this you get the JCL right but discover that you have a syntax error
in your SAS code.  Another two or three cycles finally get you to the
point where you have a card deck that runs and produces output.  At
that point you don't really care if the output makes sense or not -
all you want is some numbers for the report that is now a week
overdue.  You also want all the numbers that you might possibly need,
which is why SAS PROCs always have the potential to produce tons of
output if you ask for it.

R is an interactive language where it is a simple matter to fit a
series of models and base your analysis on a model that is
appropriate.  An approach of "give me the answer to any possible
question about this model, whether or not it makes sense" is
unnecessary.

In many ways statistical theory and practice have not caught up with
statistical computing.  There are concepts that are regarded as part
of established statistical theory when they are, in fact, 
approximations or compromises motivated by the fact that you can't
compute the answer you want - except now you can compute it.  However,
that won't stop people who were trained in the old system from
assuming that things *must* be done in that way.

In short, I agree with Deepayan - the best thing to do is to ask
someone who uses SAS and least squares means to explain to you what
they are.