[R-sig-ME] lme and prediction intervals

Mon Apr 5 16:40:02 CEST 2010

On Sun, Apr 4, 2010 at 8:54 PM, Ben Bolker <bolker at ufl.edu> wrote:
> Douglas Bates wrote:
>> On Sat, Apr 3, 2010 at 11:12 PM, D Chaws <cat.dev.urandom at gmail.com> wrote:
>>> Ok, issue solved for the most straightforward random effects cases.
>>> Not sure about nested random effects or more complex cases.
>>
>> Assuming that you can make sense of lsmeans in such a case.  You may
>> notice that lsmeans are not provided in base and recommended R
>> packages.  That isn't an oversight.  Try to explain what lsmeans are
>> in terms of the probability model.
>>
>> Anyway, if you are happy with it, then go for it.  I'll just give you
>> a warning from a professional statistician that they are a nonsensical
>> construction.
>
>  lsmeans may not make sense in general (I don't really know, I have a
> somewhat weird background that mostly doesn't include SAS), but there's
> nothing wrong with wanting predictions and standard errors of
> predictions, which be definable (?) if one can specify (a) whether a
> given random effect is set to zero or included at its conditional
> mean/mode value (or, for a simulation, chosen from a normal distribution
> with the appropriate variance-covariance structure (b) whether random
> effects not included in the prediction (and the residual error) are
> included in the SE or not.  I agree that specifying all this is not as
> easy as specifying "level", but can't one in principle do this by
> specifying which random effects are in/out of the prediction or the SE?

Your first sentence is absolutely right - those whose backgrounds do
not include an introduction to SASspeak have difficulty in
understanding what lsmeans are, and that group includes me.  Once
again, I have phrased my objections poorly.  What I should have said
is that I do not understand what lsmeans are.  I have tried to read
the documentation on them in various SAS publications and also in some
books and I still can't make sense of them.

I have a strong suspicion that, for most users, the definition of
lsmeans is "the numbers that I get from SAS when I use an lsmeans
statement".  My suggestion for obtaining such numbers is to buy a SAS
license and use SAS to fit your models.

Those who have read Bill Venables unpublished paper, "Exegeses on
Linear Models" (just put the title into a search engine) will
recognize this situation.  Insightfull or whatever their name was at
the time had important customers (read "pharmaceutical companies") who
wanted them to change S-PLUS so that it created both Type I and Type
III sums of squares.  They consulted with statisticians who knew
S-PLUS well who told them "don't do that, it doesn't make sense".  Of
course the marketing folks won out and the company proceeded to ignore
this advice and implement (poorly) the Type X sums of squares where X
is the number that means "give me something that I will regard as a
marginal sum of squares for a factor in the presence of non-ignorable
interactions".  Apparently the fact that such a concept doesn't make
sense is not an adequate reason to avoid emulating SAS in producing
these numbers.

I should have phrased my objection as a deficiency in my background.
I don't know what lsmeans are and therefore cannot advise anyone on
how to calculate them.  If you or anyone else can explain to me - in
terms of the random variables Y and B and the model parameters - what
you wish to calculate then I can indicate how it could be calculated.
I think that lsmeans are, in some sense, elements of the mean of Y but
I don't know if they are conditional on a value of B or not.  If they
are means of Y then there must be a parameter vector beta specified.
This is where I begin to lose it.  Many people believe that they can
specify an incomplete parameter vector and evaluate something that
represents means.  In other words, many people believe that when there
are multiple factors in the fixed-effects formula they can evaluate
the mean response for levels of factor A in some way that is marginal
with respect to the levels of the other factors or numerical
covariates.  I can't understand how that can be done.

So I need to know what the values of the fixed-effects parameters,
beta, should be and whether you want to condition on a particular
value, B = b, or evaluate the mean of the marginal distribution of Y,
in the sense of integrating with respect to the distribution of B.  If
the latter, then you need to specify the parameters that determine the
distribution of B.  If the former, then I imagine that you wish to
evaluate the mean of the distribution of Y conditional on B at the
BLUPs.  As you know I prefer to use the term "conditional means" or,
for more general models like GLMMs, "conditional modes", instead of
BLUPs.  The values returned by ranef for a linear mixed model are the
conditional means of B given the observed value Y = y evaluated at the
parameter estimates.  To say that you want the conditional mean of Y
given B = the conditional mean of B given the observed y is a bit too
intricate for me to understand.  I really don't know how to interpret
such a concept.

>  My hope is that, after building code for a reasonable number of
> examples, the general principles will become sufficiently clear that a
> method with an appropriate interface can then be written (note use of
> the passive voice).  The hardest part I discovered for doing this with
> existing lme and lme4 objects is recalculating the random-effects design
> matrix appropriately when a new set of data (with different
> random-effects factor structure) is specified ...

You may find the sparse.model.matrix function in the Matrix package helpful.