# [R-sig-ME] lme and prediction intervals

Douglas Bates bates at stat.wisc.edu
Wed Apr 7 15:53:56 CEST 2010

Thanks for the discussion on this point.  I think I am beginning to
understand what is requested in general although I am not sure I am
quite up to John's level of detail.

I have a formula in mind, based on what John describes as "teasing the
variance-covariance structure apart, then putting it back together",
but the justification for the calculation is still somewhat vague.

The left Cholesky factor for the penalized least squares problem to
determine the conditional estimates of the fixed effects and the
conditional modes of the random effects (also the conditional means,
in the case of a linear model) is of the form

L        0
RZX'  RX'

where L, RZX and RX are a sparse Cholesky decomposition (L), a dense q
by p matrix (RZX) and an upper triangular p by p matrix (RX), all
stored in the mer object returned by lmer.  The variance-covariance
matrix of the estimators of the fixed-effects parameters, conditional
on a value of the random effects vector is (RX'RX)^{-1} =
RX^{-1}RX^{-1}'.  (This is one of the few cases when it is appropriate
to evaluate the inverse of a matrix, because RX is triangular.)
When multiplied by s^2 this is the matrix used to evaluate standard
errors and correlations of the fixed-effects estimators.
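
As a quick numeric check of that identity (sketched in Python rather
than R, with an illustrative 2 by 2 upper-triangular RX): inverting RX
by back-substitution and forming RX^{-1}RX^{-1}' reproduces (RX'RX)^{-1}.

```python
# Sketch: (RX'RX)^{-1} = RX^{-1} RX^{-1}' for upper-triangular RX.
# The 2x2 matrix below is illustrative, not from a fitted model.

def backsolve_upper(R, b):
    """Solve R x = b for upper-triangular R by back-substitution."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x

def inv_upper(R):
    """Invert an upper-triangular matrix one unit vector at a time."""
    n = len(R)
    cols = [backsolve_upper(R, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]  # columns -> rows

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

RX = [[2.0, 1.0],
      [0.0, 3.0]]
RXinv = inv_upper(RX)
vcov = matmul(RXinv, transpose(RXinv))            # RX^{-1} RX^{-1}'
check = matmul(matmul(transpose(RX), RX), vcov)   # (RX'RX) vcov, should be I
```

This is why a triangular RX is one of the few cases where forming an
explicit inverse is cheap and stable.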

I think what we are seeking is the marginal variance-covariance matrix
of the parameter estimators (marginal with respect to the random
effects random variable, B), which would have the form of the inverse
of the crossproduct of  a (q+p) by p matrix composed of the vertical
concatenation of -L^{-1}RZX RX^{-1} and RX^{-1}.  (Note: you do *not*
want to calculate the first term by inverting L; use solve(L, RZX,
system = "L") - don't even think about using solve(L) - don't!,
don't!, don't! - have I made myself clear? - don't do that, and we all
know that someone will do exactly that for a very large L and then
send out messages about "R is SOOOOO SLOOOOW!!!!" :-) )

Anyway, that is what I think should be calculated - something like

chol2inv(chol(tcrossprod(solve(RX, cbind(t(solve(L, RZX, system = "L")),
                                         diag(ncol(RZX)))))))

and then multiply by s^2

I haven't debugged that expression so it may need some fixes.
Unfortunately there are other tasks to which I must first attend.
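
In the scalar case (q = p = 1) the pieces of that calculation are easy
to follow.  A sketch in Python rather than R, writing L = l, RZX = r,
RX = x, with illustrative numbers rather than values from any fit:

```python
# Scalar (q = p = 1) sketch of the marginal variance-covariance
# calculation: the (q+p) x p matrix M is the vertical concatenation of
# l^{-1} r x^{-1} and x^{-1}; the marginal vcov, before scaling by
# s^2, is (M'M)^{-1}.
l, r, x = 2.0, 1.5, 3.0               # illustrative values only

m_top = (r / l) / x                   # l^{-1} r x^{-1}
m_bot = 1.0 / x                       # x^{-1}
crossprod = m_top ** 2 + m_bot ** 2   # M'M (a scalar here)
marginal_vcov = 1.0 / crossprod       # = x^2 / (1 + (r/l)^2) = 5.76
```

Multiplying by an estimate of s^2 then gives the marginal variance of
the fixed-effects estimator in this toy case.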

On Wed, Apr 7, 2010 at 12:47 AM, John Maindonald
<john.maindonald at anu.edu.au> wrote:
> There is no uncertainty about SEs of predictions, once the
> prediction problem has been properly delimited.  As often,
> the problem has to be sufficiently well defined that "integrate
> random and fixed effects variance components" (not the
> language that I would use) has, in any specific instance,
> a well-defined meaning. Specifically, note that the SEs change
> depending on the population for which predictions are made.
> Use of MCMC does not cause this issue to go away, though
> it may provide a way to address it once the prediction has
> been properly delimited.
>
> The DAAG package has the dataset ant111b.  One may fit
> the model:
>
> ant111b.lme <- lme(fixed=harvwt ~ 1, random = ~1 | site,
>                   data=ant111b)
>
> One might make predictions:
> 1) for a new parcel at an existing site
> 2) for a new parcel at a new site
> 3) for the average of two parcels at a new site
>
> These will all lead to different SEs.  1 < 3 < 2
>
> In these instances it is fairly easy, if one believes
> the model, to calculate the different SEs.
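
For this random-intercept model the three SEs reduce, if one treats
the variance components as known and ignores the contribution from
estimating the fixed intercept, to combinations of the between-site
and within-site variances.  A sketch in Python, with illustrative
values rather than the ant111b estimates:

```python
import math

# Illustrative variance components (NOT the ant111b estimates):
sigma_b2 = 0.35   # between-site variance
sigma_e2 = 0.20   # within-site (parcel-to-parcel) variance

# 1) new parcel at an existing site: residual variance only
#    (the site effect is treated as known via its conditional mode)
se_existing_site = math.sqrt(sigma_e2)

# 2) new parcel at a new site: site variance plus residual variance
se_new_site = math.sqrt(sigma_b2 + sigma_e2)

# 3) average of two parcels at a new site: averaging halves the
#    residual contribution but not the site contribution
se_mean_two_parcels = math.sqrt(sigma_b2 + sigma_e2 / 2)
```

Whatever the variance components, the ordering 1 < 3 < 2 holds.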
>
> More generally, it is necessary to tease the
> variance-covariance structure apart, then put
> it back together in a manner that reflects the
> desired generalization.  At least for various
> special cases, this can be automated, based on
> conventions for describing the intended
> generalization, and there is software that does
> this.
>
> This is an issue for the use of AIC to compare mixed
> effects models.  There is an implicit assumption that
> generalizations are for individual units at the population
> level.  This does not necessarily reflect the intended
> use of the model results.
>
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> http://www.maths.anu.edu.au/~johnm
>
> On 07/04/2010, at 2:17 PM, D Chaws wrote:
>
>> Very interesting.  Is it really the state of the field that there is uncertainty
>> about how to integrate random and fixed effects variance components?
>> That seems like quite a hole.  Does that mean the best approach is probably
>> an empirical one then, through something like MCMC?
>>
>> -- DC
>>
>> On Tue, Apr 6, 2010 at 5:00 PM, David Hsu <dhsu2 at uw.edu> wrote:
>>> D Chaws wrote, Tue Apr 6 04:48:33 CEST 2010:
>>>
>>>> As a professional statistician, what is the appropriate method to
>>>> compute the SE of these predictions which accounts for the variance of
>>>> the fixed and random effects?
>>>
>>> I asked the same question myself about prediction last month.  I think
>>> Doug's response to me was, in a nutshell, how can you statistically
>>> *define* the variance contribution of the random effects?  (Doug,
>>> please correct me if I am wrong or over-simplifying).  After thinking
>>> about it for a while, I indeed can *not* see what the variance of the
>>> random effect means from a classical perspective.
>>>
>>> Nobody seems to have conceptual difficulties with the errors from the
>>> fixed effects, so I'll sketch out a toy example to show how I've
>>> thought about it in terms of the theoretical statistics that I dimly
>>> remember (again, corrections from anyone are most welcome).
>>>
>>> Writing it out, a simple linear model is Y ~ N( A + BX, s^2).  Our
>>> vector of parameters in the first distribution is (theta_1) = (A, B,
>>> s).  From a classical perspective, A and B are fixed parameters, so
>>> that when we estimate A, B, we are maximizing the likelihood L:
>>>
>>> L ( theta_1 | X ) = P ( Y=y | theta_1 ) = 1/sqrt(2 pi s^2) exp( -
>>> (Y - A - BX)^2 / (2 s^2) )
>>>
>>> where (theta_1) is set by definite numbers, (theta_1) = (A, B, s).
>>>
>>> Now, thinking about the random effects model, we're saying that A is a
>>> random effect, so it's not just a set of numbers.  Therefore A is a
>>> random variable, distributed by A ~ N(mu, sigma^2).  Our second vector
>>> of parameters for this distribution is (theta_2) = (mu, sigma).  On
>>> the face of it, this doesn't necessarily seem catastrophic, since we
>>> just want the marginal probability distribution function for Y based
>>> on the joint probability of Y, A,
>>>
>>> P ( Y=y | theta_1, theta_2 ) = \int_{-\infty}^{\infty} P ( Y=y, A=a | theta_1, theta_2 ) da
>>>
>>> However, this is where I get stuck in the classical perspective.  We
>>> need to state *something* about the quantity P ( Y=y, A=a | theta_1,
>>> theta_2) inside the integral, but it's not clear to me what we can say
>>> about the relationship of A and Y, and presumably theta_2 is derived
>>> from the observations of Y.  I think Doug's point is that if we have
>>> the probability P ( Y=y, A=a | theta_1, theta_2), you can't just use
>>> conditioning to break it down such as:
>>>
>>> P ( Y=y, A=a | theta_1, theta_2) *not* equal to P ( Y=y | theta_1)
>>> times P ( A=a | theta_2)
>>>
>>> because somehow you got (or would like to get) theta_2 based on the Y
>>> (data) as well.  So A and Y are not independent.
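
The toy model does have a closed-form marginal, Y ~ N(mu + BX, s^2 +
sigma^2), and the integral above can be checked numerically.  A sketch
in Python, with illustrative parameter values:

```python
import math

def dnorm(z, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy model: Y | A=a ~ N(a + B*x, s2), with A ~ N(mu, sigma2).
mu, sigma2 = 1.0, 0.5    # theta_2: mean and variance of the random effect A
B, s2 = 2.0, 0.3         # slope and residual variance from theta_1
x, y = 1.5, 4.2          # a covariate value and a point at which to evaluate

# Marginal density of Y by Riemann-sum quadrature over a:
step = 0.001
lo = mu - 10 * math.sqrt(sigma2)
hi = mu + 10 * math.sqrt(sigma2)
n = int((hi - lo) / step)
marginal_quad = sum(
    dnorm(y, (lo + i * step) + B * x, s2) * dnorm(lo + i * step, mu, sigma2)
    for i in range(n)
) * step

# Closed form: Y ~ N(mu + B*x, s2 + sigma2)
marginal_exact = dnorm(y, mu + B * x, s2 + sigma2)
```

The two agree to quadrature accuracy, which is the sense in which the
random effect is "integrated out" of the likelihood.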
>>>
>>> In the Bayesian perspective, I guess you'd say that the posterior =
>>> prior x likelihood, so that:
>>>
>>> P ( theta_1, theta_2 | A, Y)  is proportional to P ( theta_1) times P
>>> ( theta_2 | A, Y, theta_1)
>>>
>>> and then insert some belief about the prior distribution of
>>> theta_1, and do this iteratively for each of the parameter values
>>> until you get convergence.
>>>
>>> Anyway, this is how I've thought about it.  Corrections welcome.
>>>
>>> David
>>>
>>>
>>> --
>>> David Hsu
>>> PhD Candidate, Urban Planning
>>> University of Washington
>>> (e-mail) dhsu2 at uw.edu
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>