[R-sig-ME] lme and prediction intervals

D Chaws cat.dev.urandom at gmail.com
Tue Apr 6 04:48:33 CEST 2010


I knew bringing up lsmeans would get a response out of Doug :)

I certainly believe that lsmeans can spit out garbage, like any other
function (including R/lme/etc.), but how is the routine a "nonsensical
construction"?  It generates a list of values associated with a design
matrix and a set of model parameters.  That seems reasonable, as long
as you request something reasonable.  It also seems reasonable to want
to know the SEs of such values.  Sure, lsmeans has a bunch of crazy
bells and whistles that may be questionable, and I realize the name
"least squares means" doesn't mean much of anything I can understand,
but at its base the routine appears to be the same as predict.lme.
What's so wrong with that?

At the very beginning of this thread I listed exactly what I wanted to
calculate based on the data in lme, a known model, a set of parameters,
and a design matrix.  Others have been asking about this for years,
with no solution.  Again, take the model:

fm1 <- lme(distance ~ age*Sex, random = ~ 1 + age | Subject, data = Orthodont)

Wouldn't it be interesting to see how the distance/age relationship changes
for Males and Females?  Ok, then:

newdat <- expand.grid(age=c(8,10,12,14), Sex=c("Male","Female"))
newdat$pred <- predict(fm1, newdat, level = 0)
R# newdat
 age    Sex  pred
1   8   Male 22.62
2  10   Male 24.18
3  12   Male 25.75
4  14   Male 27.32
5   8 Female 21.21
6  10 Female 22.17
7  12 Female 23.13
8  14 Female 24.09

As a professional statistician, what would you say is the appropriate
method to compute the SEs of these predictions, one which accounts for
the variance of both the fixed and random effects?
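
To pin down what I mean, here is the fixed-effects-only piece of the
computation, which I assume is the standard X V X' calculation (a
sketch, variable names mine; it ignores the random-effects variance,
which is exactly the part I am asking about):

des <- model.matrix(~ age * Sex, data = newdat)  # fixed-effects design matrix
predvar <- diag(des %*% vcov(fm1) %*% t(des))    # Var(x' beta-hat), row by row
newdat$SE <- sqrt(predvar)                       # SE from fixed effects only
newdat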

-- DC

On Mon, Apr 5, 2010 at 10:40 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
> On Sun, Apr 4, 2010 at 8:54 PM, Ben Bolker <bolker at ufl.edu> wrote:
>> Douglas Bates wrote:
>>> On Sat, Apr 3, 2010 at 11:12 PM, D Chaws <cat.dev.urandom at gmail.com> wrote:
>>>> Ok, issue solved for the most straightforward random effects cases.
>>>> Not sure about nested random effects or more complex cases.
>>>
>>> Assuming that you can make sense of lsmeans in such a case.  You may
>>> notice that lsmeans are not provided in base and recommended R
>>> packages.  That isn't an oversight.  Try to explain what lsmeans are
>>> in terms of the probability model.
>>>
>>> Anyway, if you are happy with it, then go for it.  I'll just give you
>>> a warning from a professional statistician that they are a nonsensical
>>> construction.
>>
>>  lsmeans may not make sense in general (I don't really know, I have a
>> somewhat weird background that mostly doesn't include SAS), but there's
>> nothing wrong with wanting predictions and standard errors of
>> predictions, which should be definable if one can specify (a) whether a
>> given random effect is set to zero or included at its conditional
>> mean/mode value (or, for a simulation, chosen from a normal distribution
>> with the appropriate variance-covariance structure), and (b) whether
>> random effects not included in the prediction (and the residual error)
>> are included in the SE or not.  I agree that specifying all this is not
>> as easy as specifying "level", but can't one in principle do this by
>> specifying which random effects are in/out of the prediction or the SE?
>
> Your first sentence is absolutely right - those whose backgrounds do
> not include an introduction to SASspeak have difficulty in
> understanding what lsmeans are, and that group includes me.  Once
> again, I have phrased my objections poorly.  What I should have said
> is that I do not understand what lsmeans are.  I have tried to read
> the documentation on them in various SAS publications and also in some
> books and I still can't make sense of them.
>
> I have a strong suspicion that, for most users, the definition of
> lsmeans is "the numbers that I get from SAS when I use an lsmeans
> statement".  My suggestion for obtaining such numbers is to buy a SAS
> license and use SAS to fit your models.
>
> Those who have read Bill Venables' unpublished paper, "Exegeses on
> Linear Models" (just put the title into a search engine), will
> recognize this situation.  Insightful, or whatever their name was at
> the time had important customers (read "pharmaceutical companies") who
> wanted them to change S-PLUS so that it created both Type I and Type
> III sums of squares.  They consulted with statisticians who knew
> S-PLUS well, who told them "don't do that, it doesn't make sense".  Of
> course the marketing folks won out and the company proceeded to ignore
> this advice and implement (poorly) the Type X sums of squares where X
> is the number that means "give me something that I will regard as a
> marginal sum of squares for a factor in the presence of non-ignorable
> interactions".  Apparently the fact that such a concept doesn't make
> sense is not an adequate reason to avoid emulating SAS in producing
> these numbers.
>
> I should have phrased my objection as a deficiency in my background.
> I don't know what lsmeans are and therefore cannot advise anyone on
> how to calculate them.  If you or anyone else can explain to me - in
> terms of the random variables Y and B and the model parameters - what
> you wish to calculate, then I can indicate how it could be calculated.
> I think that lsmeans are, in some sense, elements of the mean of Y but
> I don't know if they are conditional on a value of B or not.  If they
> are means of Y then there must be a parameter vector beta specified.
> This is where I begin to lose it.  Many people believe that they can
> specify an incomplete parameter vector and evaluate something that
> represents means.  In other words, many people believe that when there
> are multiple factors in the fixed-effects formula they can evaluate
> the mean response for levels of factor A in some way that is marginal
> with respect to the levels of the other factors or numerical
> covariates.  I can't understand how that can be done.
>
> So I need to know what the values of the fixed-effects parameters,
> beta, should be and whether you want to condition on a particular
> value, B = b, or evaluate the mean of the marginal distribution of Y,
> in the sense of integrating with respect to the distribution of B.  If
> the latter, then you need to specify the parameters that determine the
> distribution of B.  If the former, then I imagine that you wish to
> evaluate the mean of the distribution of Y conditional on B at the
> BLUPs.  As you know I prefer to use the term "conditional means" or,
> for more general models like GLMMs, "conditional modes", instead of
> BLUPs.  The values returned by ranef for a linear mixed model are the
> conditional means of B given the observed value Y = y evaluated at the
> parameter estimates.  To say that you want the conditional mean of Y
> given B = the conditional mean of B given the observed y is a bit too
> intricate for me to understand.  I really don't know how to interpret
> such a concept.
>
>>  My hope is that, after building code for a reasonable number of
>> examples, the general principles will become sufficiently clear that a
>> method with an appropriate interface can then be written (note use of
>> the passive voice).  The hardest part I discovered in doing this with
>> existing lme and lme4 objects is recalculating the random-effects design
>> matrix appropriately when a new set of data (with different
>> random-effects factor structure) is specified ...
>
> You may find the sparse.model.matrix function in the Matrix package helpful.
>
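
P.S.  To make Ben's simulation suggestion concrete for the "new
subject" case, here is one sketch of what I imagine he means: draw
random effects from a normal distribution with the estimated
variance-covariance matrix and see how much they spread the
predictions.  This is only an illustration, not a method I am claiming
is correct:

library(MASS)                                # mvrnorm, for multivariate draws
set.seed(101)
psi <- as.matrix(getVarCov(fm1))             # 2x2 vcov of (Intercept, age) effects
bsim <- mvrnorm(1000, mu = c(0, 0), Sigma = psi)  # simulated new-subject effects
Zb <- cbind(1, newdat$age) %*% t(bsim)       # random-effect contribution per draw
newdat$reSD <- apply(Zb, 1, sd)              # between-subject SD at each row
## a rough total SE might be sqrt(predvar + newdat$reSD^2 + fm1$sigma^2),
## but whether the residual variance belongs there depends on what the
## interval is supposed to cover -- which is Ben's point (b)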

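And to follow up on Doug's pointer, a small sketch of what
sparse.model.matrix can do for rebuilding a random-effects design
matrix on new data (the data frame below is made up for illustration):

library(Matrix)
nd <- data.frame(Subject = factor(c("M01", "M01", "F02")),
                 age     = c(8, 10, 8))
## one intercept column and one age column per subject, stored sparsely,
## mirroring random = ~ 1 + age | Subject
Z <- sparse.model.matrix(~ 0 + Subject + Subject:age, data = nd)
Z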


