[R-sig-ME] lme and prediction intervals

Douglas Bates bates at stat.wisc.edu
Wed Apr 7 16:15:17 CEST 2010


On Wed, Apr 7, 2010 at 8:53 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
> Thanks for the discussion on this point.  I think I am beginning to
> understand what is requested in general although I am not sure I am
> quite up to John's level of detail.
>
> I have a formula in mind, based on what John describes as "teasing the
> variance-covariance structure apart, then putting it back together",
> but the justification for the calculation is still somewhat vague.
>
> The left Cholesky factor for the penalized least squares problem to
> determine the conditional estimates of the fixed effects and the
> conditional modes of the random effects (also the conditional means,
> in the case of a linear model) is of the form
>
>   [ L     0   ]
>   [ RZX'  RX' ]
>
> where L, RZX and RX are a sparse Cholesky decomposition (L), a dense q
> by p matrix (RZX) and an upper triangular p by p matrix (RX), all
> stored in the mer object returned by lmer.  The variance-covariance
> matrix of the estimators of the fixed-effects parameters, conditional
> on a value of the random effects vector, is (RX'RX)^{-1} =
> RX^{-1}RX^{-1}'.  (This is one of the few cases when it is appropriate
> to evaluate the inverse of a matrix, because RX is triangular.)
> When multiplied by s^2 this is the matrix used to evaluate standard
> errors and correlations of the fixed-effects estimators.
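>
> As a quick check of that claim in R, using the getME accessor from
> more recent lme4 - I am assuming the RX it exposes is the RX above:
>
> library(lme4)
> fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
> RX <- as.matrix(getME(fm, "RX"))   # p by p upper triangular
> ## chol2inv(RX) evaluates (RX'RX)^{-1} from the triangular factor
> all.equal(as.matrix(vcov(fm)), sigma(fm)^2 * chol2inv(RX),
>           check.attributes = FALSE)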
>
> I think what we are seeking is the marginal variance-covariance matrix
> of the parameter estimators (marginal with respect to the random
> effects random variable, B), which would have the form of the inverse
> of the crossproduct of  a (q+p) by p matrix composed of the vertical
> concatenation of - L^{-1}RZX RX^{-1} and RX^{-1}.  (Note: You do *not*
> want to calculate the first term by inverting L; use solve(L, RZX,
> system = "L") - don't even think about using solve(L) - don't!,
> don't!, don't! - have I made myself clear? - don't do that (and we all
> know that someone will do exactly that for a very large L and then
> send out messages about "R is SOOOOO SLOOOOW!!!!" :-) )

I know that someone is going to be tempted to try it and then find
that it isn't that slow.  I should have been more precise.  If you
have a nested sequence of grouping factors for the random effects
(which includes the case of a single grouping factor) then the matrix
L has the special property that the pattern of non-zeros in the inverse
is the same as the pattern in L itself.  You still don't want to
invert large matrices to solve a linear system when you can do that
directly, but evaluating the inverse will not explode in complexity on
you.  However, if you try it with non-nested grouping factors it will
explode in complexity.
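
To make that concrete, here is a small sketch using the Matrix
package; the matrix Zt and all the dimensions are made up purely for
illustration, and the point is the triangular solve with system = "L":

library(Matrix)
set.seed(1)
q  <- 200
Zt <- rsparsematrix(q, 1000, density = 0.02)      # stand-in for Z'
A  <- forceSymmetric(tcrossprod(Zt) + Diagonal(q)) # stand-in for Z'Z + I
Lf <- Cholesky(A)                # sparse Cholesky factor (a CHMfactor)
M  <- matrix(rnorm(q * 2), q, 2) # stand-in for RZX (q by p, p = 2)
x1 <- solve(Lf, M, system = "L") # triangular solve: no fill-in
## solve(Lf) would form the inverse of A instead; with non-nested
## grouping factors the inverse of L is far denser than L itself
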
> Anyway, that is what I think should be calculated - something like
>
> chol2inv(chol(tcrossprod(solve(RX, cbind(t(solve(L, RZX, system =
> "L")), diag(ncol(RZX)))))))
>
> and then multiply by s^2
>
> I haven't debugged that expression so it may need some fixes.
> Unfortunately there are other tasks to which I must first attend.
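>
> For anyone who wants to experiment in the meantime, here is the same
> expression wrapped up as a function of L, RZX and RX (in later lme4
> versions these pieces can be pulled out with getME()); remember this
> is an undebugged sketch, and the result still needs the s^2 factor:
>
> unscaledMargVcov <- function(L, RZX, RX) {
>     ## t(L^{-1} RZX) via a triangular solve, never via solve(L)
>     tLiRZX <- t(as.matrix(solve(L, RZX, system = "L")))
>     ## p by (q+p) matrix: RX^{-1} %*% [t(L^{-1} RZX), I]
>     M <- solve(RX, cbind(tLiRZX, diag(ncol(RZX))))
>     ## inverse of the crossproduct, via its Cholesky factor
>     chol2inv(chol(as.matrix(tcrossprod(M))))
> }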
>
> On Wed, Apr 7, 2010 at 12:47 AM, John Maindonald
> <john.maindonald at anu.edu.au> wrote:
>> There is no uncertainty about SEs of predictions, once the
>> prediction problem has been properly delimited.  As so often,
>> the problem has to be sufficiently well defined that "integrate
>> random and fixed effects variance components" (not the
>> language that I would use) has, in any specific instance,
>> a well-defined meaning. Specifically, note that the SEs change
>> depending on the population for which predictions are made.
>> Use of MCMC does not cause this issue to go away, though
>> it may provide a way to address it once the prediction has
>> been properly delimited.
>>
>> The DAAG package has the dataset ant111b.  One may fit
>> the model:
>>
>> library(DAAG)    # supplies the ant111b dataset
>> library(nlme)
>> ant111b.lme <- lme(fixed = harvwt ~ 1, random = ~1 | site,
>>                    data = ant111b)
>>
>> One might make predictions:
>> 1) for a new parcel at an existing site
>> 2) for a new parcel at a new site
>> 3) for the average of two parcels at a new site
>>
>> These will all lead to different SEs.  1 < 3 < 2
>>
>> In these instances it is fairly easy, if one believes
>> the model, to calculate the different SEs.
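>>
>> If it helps, a rough sketch of those three calculations from the
>> fitted model (ignoring the uncertainty in the estimated fixed effect
>> and in the predicted site effects, so these slightly understate):
>>
>> vc <- VarCorr(ant111b.lme)
>> sig2.site   <- as.numeric(vc["(Intercept)", "Variance"])
>> sig2.parcel <- as.numeric(vc["Residual", "Variance"])
>> se1 <- sqrt(sig2.parcel)                # new parcel, existing site
>> se2 <- sqrt(sig2.site + sig2.parcel)    # new parcel, new site
>> se3 <- sqrt(sig2.site + sig2.parcel/2)  # mean of 2 parcels, new site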
>>
>> More generally, it is necessary to tease the
>> variance-covariance structure apart, then put
>> it back together in a manner that reflects the
>> desired generalization.  At least for various
>> special cases, this can be automated, based on
>> conventions for describing the intended
>> generalization, and there is software that does
>> this.
>>
>> This is an issue for the use of AIC to compare mixed
>> effects models.  There is an implicit assumption that
>> generalizations are for individual units at the population
>> level.  This does not necessarily reflect the intended
>> use of the model results.
>>
>> John Maindonald             email: john.maindonald at anu.edu.au
>> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
>> Centre for Mathematics & Its Applications, Room 1194,
>> John Dedman Mathematical Sciences Building (Building 27)
>> Australian National University, Canberra ACT 0200.
>> http://www.maths.anu.edu.au/~johnm
>>
>> On 07/04/2010, at 2:17 PM, D Chaws wrote:
>>
>>> Very interesting.  Is it really the state of the field that there is uncertainty
>>> about how to integrate random and fixed effects variance components?
>>> That seems like quite a hole.  Does that mean the best approach is probably
>>> an empirical one then, through something like MCMC?
>>>
>>> -- DC
>>>
>>> On Tue, Apr 6, 2010 at 5:00 PM, David Hsu <dhsu2 at uw.edu> wrote:
>>>> D Chaws wrote, Tue Apr 6 04:48:33 CEST 2010:
>>>>
>>>>> As a professional statistician, what is the appropriate method to
>>>>> compute the SE of these predictions which accounts for the variance of
>>>>> the fixed and random effects?
>>>>
>>>> I asked the same question myself about prediction last month.  I think
>>>> Doug's response to me was, in a nutshell, how can you statistically
>>>> *define* the variance contribution of the random effects?  (Doug,
>>>> please correct me if I am wrong or over-simplifying).  After thinking
>>>> about it for a while, I indeed can *not* see what the variance of the
>>>> random effect means from a classical perspective.
>>>>
>>>> Nobody seems to have conceptual difficulties with the errors from the
>>>> fixed effects, so I'll sketch out a toy example to show how I've
>>>> thought about it in terms of the theoretical statistics that I dimly
>>>> remember (again, corrections from anyone are most welcome).
>>>>
>>>> Writing it out, a simple linear model is Y ~ N( A + BX, s^2 ).  Our
>>>> vector of parameters in the first distribution is (theta_1) = (A, B,
>>>> s).  From a classical perspective, A and B are fixed parameters, so
>>>> that when we estimate A and B, we are maximizing the likelihood L:
>>>>
>>>> L ( theta_1 | Y, X ) = P ( Y=y | theta_1 )
>>>>     = 1/sqrt( 2 pi s^2 ) exp( -(Y - A - BX)^2 / (2 s^2) )
>>>>
>>>> where (theta_1) is a set of definite numbers, (theta_1) = (A, B, s).
>>>>
>>>> Now, thinking about the random effects model, we're saying that A is a
>>>> random effect, so it's not just a set of numbers.  Therefore A is a
>>>> random variable, distributed by A ~ N(mu, sigma^2).  Our second vector
>>>> of parameters for this distribution is (theta_2) = (mu, sigma).  On
>>>> the face of it, this doesn't necessarily seem catastrophic, since we
>>>> just want the marginal probability distribution function for Y based
>>>> on the joint probability of Y, A,
>>>>
>>>> P ( Y=y | theta_1, theta_2 ) = \int_{-\infty}^{\infty} P ( Y=y, A=a | theta_1, theta_2 ) da
>>>>
>>>> However, this is where I get stuck in the classical perspective.  We
>>>> need to state *something* about the quantity P ( Y=y, A=a | theta_1,
>>>> theta_2) inside the integral, but it's not clear to me what we can say
>>>> about the relationship of A and Y, and presumably theta_2 is derived
>>>> from the observations of Y.  I think Doug's point is that if we have
>>>> the probability P ( Y=y, A=a | theta_1, theta_2), you can't just use
>>>> conditioning to break it down such as:
>>>>
>>>> P ( Y=y, A=a | theta_1, theta_2) *not* equal to P ( Y=y | theta_1)
>>>> times P ( A=a | theta_2)
>>>>
>>>> because somehow you got (or would like to get) theta_2 based on the Y
>>>> (data) as well.  So A and Y are not independent.
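>>>>
>>>> (Incidentally, whatever its interpretation, the integral itself is
>>>> easy to check numerically in this toy model: marginally Y should be
>>>> N( mu + BX, sigma^2 + s^2 ).  The numbers below are made up purely
>>>> for illustration.)
>>>>
>>>> mu <- 2; B <- 0.5; sigma <- 1.5; s <- 1   # theta_2 = (mu, sigma)
>>>> x <- 1; y <- 3.2
>>>> ## P( Y=y, A=a | theta_1, theta_2 ) as a function of a
>>>> joint <- function(a)
>>>>     dnorm(y, mean = a + B * x, sd = s) * dnorm(a, mean = mu, sd = sigma)
>>>> integrate(joint, -Inf, Inf)$value         # numerical marginal density
>>>> dnorm(y, mean = mu + B * x, sd = sqrt(sigma^2 + s^2))  # same value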
>>>>
>>>> In the Bayesian perspective, I guess you'd say that the posterior
>>>> is proportional to the likelihood times the prior, so that (with
>>>> theta_1 now just (B, s), since A has become a random variable):
>>>>
>>>> P ( theta_1, theta_2, A | Y ) is proportional to P ( Y | A, theta_1 )
>>>> times P ( A | theta_2 ) times P ( theta_1, theta_2 )
>>>>
>>>> and then insert some belief about the prior distribution of
>>>> (theta_1, theta_2), and sample each parameter in turn conditional
>>>> on the others until you get convergence, as sketched below.
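>>>>
>>>> A minimal sketch of that iterative scheme (Gibbs sampling) for the
>>>> toy random-intercept model; the data, starting values and vague
>>>> conjugate priors here are all made up for illustration:
>>>>
>>>> set.seed(42)
>>>> I <- 30; J <- 5                       # groups, observations per group
>>>> y <- matrix(10 + rep(rnorm(I, 0, 1.5), each = J) + rnorm(I * J), J, I)
>>>> a0 <- b0 <- 0.01                      # vague inverse-gamma priors
>>>> mu <- mean(y); s2 <- sb2 <- 1         # starting values
>>>> draws <- matrix(NA, 2000, 3)
>>>> for (it in seq_len(nrow(draws))) {
>>>>     prec <- J / s2 + 1 / sb2          # precision of each A_i | rest
>>>>     b <- rnorm(I, (J / s2) * (colMeans(y) - mu) / prec, sqrt(1 / prec))
>>>>     mu <- rnorm(1, mean(sweep(y, 2, b)), sqrt(s2 / (I * J)))
>>>>     resid <- sweep(y, 2, b) - mu
>>>>     s2  <- 1 / rgamma(1, a0 + I * J / 2, b0 + sum(resid^2) / 2)
>>>>     sb2 <- 1 / rgamma(1, a0 + I / 2, b0 + sum(b^2) / 2)
>>>>     draws[it, ] <- c(mu, s2, sb2)
>>>> }
>>>> colMeans(draws[-(1:500), ])  # compare with mu = 10, s2 = 1, sb2 = 2.25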
>>>>
>>>> Anyway, this is how I've thought about it.  Corrections welcome.
>>>>
>>>> David
>>>>
>>>>
>>>> --
>>>> David Hsu
>>>> PhD Candidate, Urban Planning
>>>> University of Washington
>>>> (e-mail) dhsu2 at uw.edu
>>>>



