[R-sig-ME] lme and prediction intervals

Jarrod Hadfield j.hadfield at ed.ac.uk
Tue Apr 6 18:59:29 CEST 2010


I have written a predict function for MCMCglmm, would anyone like to  
beta test it before I release it?

It computes confidence intervals (credible intervals on f(x), where x  
are the model parameters) and prediction intervals (posterior  
predictive intervals for future observations).

Residuals are always marginalised. Random terms can be marginalised or  
not. I have no issue with having a random effect in x, but that's my  
own preference.

For some distributions the predictions can be made on the data scale,  
but I'm having trouble formulating a general solution for things like  
ZIPs and ordinal models. Any suggestions or recommended reading would  
be welcome.

I have tested it on a few data sets and it looks reasonable. However,  
I will be busy with field work soon and will not have time to test it  
thoroughly in the near future.



On 6 Apr 2010, at 03:48, D Chaws wrote:

> I knew bringing up lsmeans would get a response out of Doug :)
> I certainly believe that lsmeans can spit out garbage, like any other
> function (including
> R/lme/etc...), but how is the routine a "nonsensical construction"?
> It generates a list of
> values associated with a design matrix and set of model parameters.   
> That seems
> reasonable, as long as you request something reasonable.  It also
> seems reasonable
> to want to know the SEs of such values.  Sure lsmeans has a bunch of
> crazy bells and
> whistles that may be questionable, and I realize the name "least
> squares means" doesn't
> mean much of anything I can understand, but at its base the routine
> appears to be the
> same as predict.lme.  What's so wrong with that?
> At the very beginning of this thread I listed exactly what I wanted  
> to calculate
> based on data in lme, a known model, set of parameters, and design
> matrix.  Others
> have been asking about this for years, with no solution.  Again,  
> take the model:
> fm1 <- lme(distance ~ age*Sex, random = ~ 1 + age | Subject, data =  
> Orthodont)
> Wouldn't it be interesting to see how the distance/age relationship  
> changes
> for Males and Females?  Ok, then:
> newdat <- expand.grid(age=c(8,10,12,14), Sex=c("Male","Female"))
> newdat$pred <- predict(fm1, newdat, level = 0)
> R# newdat
>   age    Sex  pred
> 1   8   Male 22.62
> 2  10   Male 24.18
> 3  12   Male 25.75
> 4  14   Male 27.32
> 5   8 Female 21.21
> 6  10 Female 22.17
> 7  12 Female 23.13
> 8  14 Female 24.09
> As a professional statistician, what is the appropriate method to
> compute the SE of these predictions which accounts for the variance of
> the fixed and random effects?
> -- DC
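One sketch of an answer for the fixed-effects part of that question, assuming the model above and using only nlme: the variance of X %*% beta-hat follows from the fixed-effects variance-covariance matrix. Note that this deliberately accounts only for the uncertainty in beta-hat, not the random-effect or residual variance, which is exactly what the rest of the thread argues about.

```r
## SEs for population-level (level = 0) predictions:
## Var(X beta-hat) = X V X', where V = vcov of the fixed effects
library(nlme)
fm1 <- lme(distance ~ age * Sex, random = ~ 1 + age | Subject,
           data = Orthodont)
newdat <- expand.grid(age = c(8, 10, 12, 14), Sex = c("Male", "Female"))
newdat$pred <- predict(fm1, newdat, level = 0)

X <- model.matrix(~ age * Sex, data = newdat)  # fixed-effects design matrix
predvar <- diag(X %*% vcov(fm1) %*% t(X))      # variance of each prediction
newdat$se <- sqrt(predvar)
## adding the estimated between-subject variance of (1, age) %*% b and
## sigma^2 here would widen these toward a prediction interval for a
## new subject, rather than a confidence interval on the mean
```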
> On Mon, Apr 5, 2010 at 10:40 AM, Douglas Bates <bates at stat.wisc.edu>  
> wrote:
>> On Sun, Apr 4, 2010 at 8:54 PM, Ben Bolker <bolker at ufl.edu> wrote:
>>> Douglas Bates wrote:
>>>> On Sat, Apr 3, 2010 at 11:12 PM, D Chaws  
>>>> <cat.dev.urandom at gmail.com> wrote:
>>>>> Ok, issue solved for the most straightforward random effects  
>>>>> cases.
>>>>> Not sure about nested random effects or more complex cases.
>>>> Assuming that you can make sense of lsmeans in such a case.  You  
>>>> may
>>>> notice that lsmeans are not provided in base and recommended R
>>>> packages.  That isn't an oversight.  Try to explain what lsmeans  
>>>> are
>>>> in terms of the probability model.
>>>> Anyway, if you are happy with it, then go for it.  I'll just give  
>>>> you
>>>> a warning from a professional statistician that they are a  
>>>> nonsensical
>>>> construction.
>>> lsmeans may not make sense in general (I don't really know, I have a
>>> somewhat weird background that mostly doesn't include SAS), but  
>>> there's
>>> nothing wrong with wanting predictions and standard errors of
>>> predictions, which should be definable (?) if one can specify (a) whether a
>>> given random effect is set to zero or included at its conditional
>>> mean/mode value (or, for a simulation, chosen from a normal  
>>> distribution
>>> with the appropriate variance-covariance structure) and (b) whether  
>>> random
>>> effects not included in the prediction (and the residual error) are
>>> included in the SE or not.  I agree that specifying all this is  
>>> not as
>>> easy as specifying "level", but can't one in principle do this by
>>> specifying which random effects are in/out of the prediction or  
>>> the SE?
>> Your first sentence is absolutely right - those whose backgrounds do
>> not include an introduction to SASspeak have difficulty in
>> understanding what lsmeans are, and that group includes me.  Once
>> again, I have phrased my objections poorly.  What I should have said
>> is that I do not understand what lsmeans are.  I have tried to read
>> the documentation on them in various SAS publications and also in  
>> some
>> books and I still can't make sense of them.
>> I have a strong suspicion that, for most users, the definition of
>> lsmeans is "the numbers that I get from SAS when I use an lsmeans
>> statement".  My suggestion for obtaining such numbers is to buy a SAS
>> license and use SAS to fit your models.
>> Those who have read Bill Venables' unpublished paper, "Exegeses on
>> Linear Models" (just put the title into a search engine) will
>> recognize this situation.  Insightful or whatever their name was at
>> the time had important customers (read "pharmaceutical companies")  
>> who
>> wanted them to change S-PLUS so that it created both Type I and Type
>> III sums of squares.  They consulted with statisticians who knew
>> S-PLUS well who told them "don't do that, it doesn't make sense".  Of
>> course the marketing folks won out and the company proceeded to  
>> ignore
>> this advice and implement (poorly) the Type X sums of squares where X
>> is the number that means "give me something that I will regard as a
>> marginal sum of squares for a factor in the presence of non-ignorable
>> interactions".  Apparently the fact that such a concept doesn't make
>> sense is not an adequate reason to avoid emulating SAS in producing
>> these numbers.
>> I should have phrased my objection as a deficiency in my background.
>> I don't know what lsmeans are and therefore cannot advise anyone on
>> how to calculate them.  If you or anyone else can explain to me - in
>> terms of the random variables Y and B and the model parameters - what
>> you wish to calculate then I can indicate how it could be calculated.
>> I think that lsmeans are, in some sense, elements of the mean of Y  
>> but
>> I don't know if they are conditional on a value of B or not.  If they
>> are means of Y then there must be a parameter vector beta specified.
>> This is where I begin to lose it.  Many people believe that they can
>> specify an incomplete parameter vector and evaluate something that
>> represents means.  In other words, many people believe that when  
>> there
>> are multiple factors in the fixed-effects formula they can evaluate
>> the mean response for levels of factor A in some way that is marginal
>> with respect to the levels of the other factors or numerical
>> covariates.  I can't understand how that can be done.
>> So I need to know what the values of the fixed-effects parameters,
>> beta, should be and whether you want to condition on a particular
>> value, B = b, or evaluate the mean of the marginal distribution of Y,
>> in the sense of integrating with respect to the distribution of B.   
>> If
>> the latter, then you need to specify the parameters that determine  
>> the
>> distribution of B.  If the former, then I imagine that you wish to
>> evaluate the mean of the distribution of Y conditional on B at the
>> BLUPs.  As you know I prefer to use the term "conditional means" or,
>> for more general models like GLMMs, "conditional modes", instead of
>> BLUPs.  The values returned by ranef for a linear mixed model are the
>> conditional means of B given the observed value Y = y evaluated at  
>> the
>> parameter estimates.  To say that you want the conditional mean of Y
>> given B = the conditional mean of B given the observed y is a bit too
>> intricate for me to understand.  I really don't know how to interpret
>> such a concept.
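As a concrete referent for "conditional means", a sketch using lme4 on the same Orthodont data (the model specification here is illustrative):

```r
## ranef() returns the conditional means of B given the observed Y = y,
## evaluated at the parameter estimates -- one row per level of the
## grouping factor
library(lme4)
data(Orthodont, package = "nlme")
fm <- lmer(distance ~ age * Sex + (1 + age | Subject), data = Orthodont)
re <- ranef(fm)$Subject   # per-subject (Intercept, age) deviations
head(re)
```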
>>> My hope is that, after building code for a reasonable number of
>>> examples, the general principles will become sufficiently clear  
>>> that a
>>> method with an appropriate interface can then be written (note use  
>>> of
>>> the passive voice).  The hardest part I discovered for doing this  
>>> with
>>> existing lme and lme4 objects is recalculating the random-effects  
>>> design
>>> matrix appropriately when a new set of data (with different
>>> random-effects factor structure) is specified ...
>> You may find the sparse.model.matrix function in the Matrix package  
>> helpful.
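A sketch of that suggestion (the grouping values below are made up): sparse.model.matrix can build the indicator part of a random-effects design matrix Z for new data without ever forming it densely.

```r
library(Matrix)
## hypothetical new data with a random-effects grouping factor
nd <- data.frame(Subject = factor(c("M01", "M01", "F02")),
                 age = c(8, 10, 8))
## sparse indicator matrix: one column per level of Subject
Z <- sparse.model.matrix(~ 0 + Subject, data = nd)
```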
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
