[R-sig-ME] lme and prediction intervals

Jarrod Hadfield j.hadfield at ed.ac.uk
Tue Apr 6 18:59:29 CEST 2010


I have written a predict function for MCMCglmm, would anyone like to  
beta test it before I release it?

It computes confidence intervals (credible intervals on f(x), where x  
are the model parameters) and prediction intervals (posterior  
predictive intervals for future observations).

Residuals are always marginalised. Random terms can be marginalised or  
not. I have no issue with having a random effect in x, but that's my  
own preference.

For some distributions the predictions can be made on the data scale,  
but I'm having trouble formulating a general solution for things like  
ZIPs and ordinal models. Any suggestions or recommended reading would  
be welcome.

I have tested it on a few data sets and it looks reasonable. However,  
I will be busy with field work soon and will not have time to test it  
thoroughly in the near future.



On 6 Apr 2010, at 03:48, D Chaws wrote:

> I knew bringing up lsmeans would get a response out of Doug :)
> I certainly believe that lsmeans can spit out garbage, like any other
> function (including
> R/lme/etc...), but how is the routine a "nonsensical construction"?
> It generates a list of
> values associated with a design matrix and set of model parameters.   
> That seems
> reasonable, as long as you request something reasonable.  It also
> seems reasonable
> to want to know the SEs of such values.  Sure lsmeans has a bunch of
> crazy bells and
> whistles that may be questionable, and I realize the name "least
> squares means" doesn't
> mean much of anything I can understand, but at its base the routine
> appears to be the
> same as predict.lme.  What's so wrong with that?
> At the very beginning of this thread I listed exactly what I wanted  
> to calculate
> based on data in lme, a known model, set of parameters, and design
> matrix.  Others
> have been asking about this for years, with no solution.  Again,  
> take the model:
> fm1 <- lme(distance ~ age*Sex, random = ~ 1 + age | Subject, data =  
> Orthodont)
> Wouldn't it be interesting to see how the distance/age relationship  
> changes
> for Males and Females?  Ok, then:
> newdat <- expand.grid(age=c(8,10,12,14), Sex=c("Male","Female"))
> newdat$pred <- predict(fm1, newdat, level = 0)
> R# newdat
>   age    Sex  pred
> 1   8   Male 22.62
> 2  10   Male 24.18
> 3  12   Male 25.75
> 4  14   Male 27.32
> 5   8 Female 21.21
> 6  10 Female 22.17
> 7  12 Female 23.13
> 8  14 Female 24.09
> As a professional statistician, what is the appropriate method to
> compute the SE of these predictions which accounts for the variance of
> the fixed and random effects?
> -- DC
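One sketch of an answer for the fixed-effects part of that question, assuming the model above and using only nlme: the variance of X %*% beta-hat follows from the fixed-effects variance-covariance matrix. Note that this deliberately accounts only for the uncertainty in beta-hat, not the random-effect or residual variance, which is exactly what the rest of the thread argues about.

```r
## SEs for population-level (level = 0) predictions:
## Var(X beta-hat) = X V X', where V = vcov of the fixed effects
library(nlme)
fm1 <- lme(distance ~ age * Sex, random = ~ 1 + age | Subject,
           data = Orthodont)
newdat <- expand.grid(age = c(8, 10, 12, 14), Sex = c("Male", "Female"))
newdat$pred <- predict(fm1, newdat, level = 0)

X <- model.matrix(~ age * Sex, data = newdat)  # fixed-effects design matrix
predvar <- diag(X %*% vcov(fm1) %*% t(X))      # variance of each prediction
newdat$se <- sqrt(predvar)
## adding the estimated between-subject variance of (1, age) %*% b and
## sigma^2 here would widen these toward a prediction interval for a
## new subject, rather than a confidence interval on the mean
```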
> On Mon, Apr 5, 2010 at 10:40 AM, Douglas Bates <bates at stat.wisc.edu>  
> wrote:
>> On Sun, Apr 4, 2010 at 8:54 PM, Ben Bolker <bolker at ufl.edu> wrote:
>>> Douglas Bates wrote:
>>>> On Sat, Apr 3, 2010 at 11:12 PM, D Chaws  
>>>> <cat.dev.urandom at gmail.com> wrote:
>>>>> Ok, issue solved for the most straightforward random effects  
>>>>> cases.
>>>>> Not sure about nested random effects or more complex cases.
>>>> Assuming that you can make sense of lsmeans in such a case.  You  
>>>> may
>>>> notice that lsmeans are not provided in base and recommended R
>>>> packages.  That isn't an oversight.  Try to explain what lsmeans  
>>>> are
>>>> in terms of the probability model.
>>>> Anyway, if you are happy with it, then go for it.  I'll just give  
>>>> you
>>>> a warning from a professional statistician that they are a  
>>>> nonsensical
>>>> construction.
>>> lsmeans may not make sense in general (I don't really know, I have a
>>> somewhat weird background that mostly doesn't include SAS), but  
>>> there's
>>> nothing wrong with wanting predictions and standard errors of
>>> predictions, which should be definable (?) if one can specify (a) whether a
>>> given random effect is set to zero or included at its conditional
>>> mean/mode value (or, for a simulation, chosen from a normal  
>>> distribution
>>> with the appropriate variance-covariance structure) and (b) whether  
>>> random
>>> effects not included in the prediction (and the residual error) are
>>> included in the SE or not.  I agree that specifying all this is  
>>> not as
>>> easy as specifying "level", but can't one in principle do this by
>>> specifying which random effects are in/out of the prediction or  
>>> the SE?
>> Your first sentence is absolutely right - those whose backgrounds do
>> not include an introduction to SASspeak have difficulty in
>> understanding what lsmeans are, and that group includes me.  Once
>> again, I have phrased my objections poorly.  What I should have said
>> is that I do not understand what lsmeans are.  I have tried to read
>> the documentation on them in various SAS publications and also in  
>> some
>> books and I still can't make sense of them.
>> I have a strong suspicion that, for most users, the definition of
>> lsmeans is "the numbers that I get from SAS when I use an lsmeans
>> statement".  My suggestion for obtaining such numbers is to buy a SAS
>> license and use SAS to fit your models.
>> Those who have read Bill Venables' unpublished paper, "Exegeses on
>> Linear Models" (just put the title into a search engine) will
>> recognize this situation.  Insightful or whatever their name was at
>> the time had important customers (read "pharmaceutical companies")  
>> who
>> wanted them to change S-PLUS so that it created both Type I and Type
>> III sums of squares.  They consulted with statisticians who knew
>> S-PLUS well who told them "don't do that, it doesn't make sense".  Of
>> course the marketing folks won out and the company proceeded to  
>> ignore
>> this advice and implement (poorly) the Type X sums of squares where X
>> is the number that means "give me something that I will regard as a
>> marginal sum of squares for a factor in the presence of non-ignorable
>> interactions".  Apparently the fact that such a concept doesn't make
>> sense is not an adequate reason to avoid emulating SAS in producing
>> these numbers.
>> I should have phrased my objection as a deficiency in my background.
>> I don't know what lsmeans are and therefore cannot advise anyone on
>> how to calculate them.  If you or anyone else can explain to me - in
>> terms of the random variables Y and B and the model parameters - what
>> you wish to calculate then I can indicate how it could be calculated.
>> I think that lsmeans are, in some sense, elements of the mean of Y  
>> but
>> I don't know if they are conditional on a value of B or not.  If they
>> are means of Y then there must be a parameter vector beta specified.
>> This is where I begin to lose it.  Many people believe that they can
>> specify an incomplete parameter vector and evaluate something that
>> represents means.  In other words, many people believe that when  
>> there
>> are multiple factors in the fixed-effects formula they can evaluate
>> the mean response for levels of factor A in some way that is marginal
>> with respect to the levels of the other factors or numerical
>> covariates.  I can't understand how that can be done.
>> So I need to know what the values of the fixed-effects parameters,
>> beta, should be and whether you want to condition on a particular
>> value, B = b, or evaluate the mean of the marginal distribution of Y,
>> in the sense of integrating with respect to the distribution of B.   
>> If
>> the latter, then you need to specify the parameters that determine  
>> the
>> distribution of B.  If the former, then I imagine that you wish to
>> evaluate the mean of the distribution of Y conditional on B at the
>> BLUPs.  As you know I prefer to use the term "conditional means" or,
>> for more general models like GLMMs, "conditional modes", instead of
>> BLUPs.  The values returned by ranef for a linear mixed model are the
>> conditional means of B given the observed value Y = y evaluated at  
>> the
>> parameter estimates.  To say that you want the conditional mean of Y
>> given B = the conditional mean of B given the observed y is a bit too
>> intricate for me to understand.  I really don't know how to interpret
>> such a concept.
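As a concrete referent for "conditional means", a sketch using lme4 on the same Orthodont data (the model specification here is illustrative):

```r
## ranef() returns the conditional means of B given the observed Y = y,
## evaluated at the parameter estimates -- one row per level of the
## grouping factor
library(lme4)
data(Orthodont, package = "nlme")
fm <- lmer(distance ~ age * Sex + (1 + age | Subject), data = Orthodont)
re <- ranef(fm)$Subject   # per-subject (Intercept, age) deviations
head(re)
```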
>>> My hope is that, after building code for a reasonable number of
>>> examples, the general principles will become sufficiently clear  
>>> that a
>>> method with an appropriate interface can then be written (note use  
>>> of
>>> the passive voice).  The hardest part I discovered for doing this  
>>> with
>>> existing lme and lme4 objects is recalculating the random-effects  
>>> design
>>> matrix appropriately when a new set of data (with different
>>> random-effects factor structure) is specified ...
>> You may find the sparse.model.matrix function in the Matrix package  
>> helpful.
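A sketch of that suggestion (the grouping values below are made up): sparse.model.matrix can build the indicator part of a random-effects design matrix Z for new data without ever forming it densely.

```r
library(Matrix)
## hypothetical new data with a random-effects grouping factor
nd <- data.frame(Subject = factor(c("M01", "M01", "F02")),
                 age = c(8, 10, 8))
## sparse indicator matrix: one column per level of Subject
Z <- sparse.model.matrix(~ 0 + Subject, data = nd)
```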
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
