[R-sig-ME] [R] lmer - BLUP prediction intervals

Daniel Caro dcarov at gmail.com
Fri Feb 8 15:32:53 CET 2013


Dear Jarrod

Thank you for the practical advice and thank you all for the
interesting comments.

I was able to calculate Cinv. In my model,

m1 <- lmer(Difference ~ 1+ (1|Item) + (1|Examiner), data=englisho.data)

this matrix is 47x47, i.e. the number of items (ni = 24) plus the
number of examiners (ne = 22) plus the number of fixed effects
(nb = 1), or p (1) + q (46) in the lme4 manual's notation. Cinv does
not appear to be symmetric, and I am wondering how to interpret the
positions of the covariances in the matrix. I thought the rows/columns
would be [overall intercept, uitem1...uitem24, uexaminer1...uexaminer22],
but it seems they are not. And is the diagonal block for the examiner
random effects, for example, comparable with the results of

attr(ranef(m1, postVar=T)[["Examiner"]], "postVar")[1,1,]

?
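
One way to check the ordering directly (a sketch, assuming lme4's
getME accessors) would be to look at the dimension names of the model
matrices, since W = cBind(X, Z) puts the fixed-effects columns first:

colnames(getME(m1, "X"))   # fixed-effects columns, which come first in W
rownames(getME(m1, "Zt"))  # ordering of the random-effect columns of Z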

Again, thank you for your help.

All the best,
Daniel

On Thu, Feb 7, 2013 at 11:01 PM, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
> Dear Daniel,
>
> To give a practical, rather than philosophical, solution to your problem,
> you could perhaps try this:
>
> X <- getME(m1, "X")      # fixed-effects model matrix
> Z <- t(getME(m1, "Zt"))  # random-effects model matrix
> W <- cBind(X, Z)
>
> # one zero block for the fixed effects, then one block per variance;
> # the block order must match the column order of Z:
> Cinv <- solve(t(W) %*% W / Vr +
>               bdiag(Diagonal(nb) * 0, Diagonal(ne) / Ve, Diagonal(ni) / Vi))
>
>
> where Vr is the residual variance, nb is the number of fixed effects,
> ne is the number of examiners, Ve is the examiner variance estimate,
> ni is the number of items, and Vi is the item variance estimate.
>
> Cinv is the (co)variance matrix of the fixed and random effects (stacked
> on top of each other). If you take the sub-matrix of Cinv that pertains
> to the effects in your prediction, then summing the elements of that
> sub-matrix will give you something close to the prediction variance. In
> general the sub-matrix will not be diagonal (i.e. there are covariances
> between the random effects and the things we are *calling* fixed
> effects). Vr, Ve and Vi are assumed known. If you want to include their
> uncertainty in the prediction interval you will probably have to wear
> the B-badge more visibly on your sleeve, I'm afraid.
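>
> For example, for a prediction involving the intercept, examiner j and
> item i, something like this (a sketch; the indices assume the
> [fixed, examiners, items] block order used in bdiag above, so check
> the column order of Z first):
>
> j <- 1; i <- 1                   # hypothetical examiner and item
> idx <- c(1, 1 + j, 1 + ne + i)   # intercept, examiner j, item i
> pred.var <- sum(Cinv[idx, idx])  # sums variances and (twice) covariances
> sqrt(pred.var)                   # approximate prediction standard error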
>
> Cheers,
>
> Jarrod
>
> Quoting "Doran, Harold" <HDoran at air.org> on Thu, 7 Feb 2013 10:02:49 -0500:
>
>> Doug:
>>
>> Sorry if this is too simple a question. Call \hat{\beta} the estimate
>> of a fixed parameter and \hat{b}_i the BLUP for unit i.
>>
>> If the following exists
>>
>> \hat{\theta} = \hat{\beta} + \hat{b}_i
>>
>> then does var(\hat{\theta}) also exist?
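>>
>> (If it does, then expanding gives
>>
>>   var(\hat{\theta}) = var(\hat{\beta}) + var(\hat{b}_i) + 2 cov(\hat{\beta}, \hat{b}_i),
>>
>> so the question is really about that covariance term.)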
>>
>>
>> From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of Douglas
>> Bates
>> Sent: Thursday, February 07, 2013 9:57 AM
>> To: Doran, Harold
>> Cc: Andrew Robinson; dcarov at gmail.com; r-sig-mixed-models at r-project.org;
>> Ben Bolker
>> Subject: Re: [R-sig-ME] [R] lmer - BLUP prediction intervals
>>
>> On Wed, Feb 6, 2013 at 9:36 AM, Doran, Harold <HDoran at air.org> wrote:
>> Andrew
>>
>> Ignoring for just a moment the important theoretical question of
>> whether it is sensible to do this: there is a covariance term between
>> the BLUPs and the fixed effects.
>>
>> But the picky mathematician in me can't understand in what distribution
>> this covariance occurs.  It makes sense in the Bayesian formulation but not
>> in a classical (sampling theory) formulation.  The distribution of the
>> estimator of the fixed effects for known values of the parameters is a
>> multivariate normal that depends on \beta, \sigma^2 and \Sigma, the model
>> matrices X and Z being known.  The random variable B doesn't enter into it.
>> (One way of writing the variance-covariance of this multivariate normal
>> involves X'V^{-1}X, where V is the matrix that involves Z and \Sigma - I
>> have forgotten the exact form.)
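>>
>> (For the record, the standard form, treating the variance parameters
>> as known, is
>>
>>   \hat{\beta} = (X'V^{-1}X)^{-1} X'V^{-1} y,   V = Z \Sigma Z' + \sigma^2 I,
>>
>> and the variance-covariance is Var(\hat{\beta}) = (X'V^{-1}X)^{-1},
>> the inverse of the matrix written above.)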
>>
>> I know these considerations sound like needless theoretical niceties but
>> to me they're not. I have to be able to formulate the theoretical basis
>> before I can make sense of the computational results and, after 20 years or
>> so, I'm still having trouble making sense of this.
>>
>>
>> If that term exists under the model, but it is ignored for purposes of
>> variance estimation, what would be the reason for doing so?
>>
>>> -----Original Message-----
>>> From: r-sig-mixed-models-bounces at r-project.org
>>> On Behalf Of Andrew Robinson
>>> Sent: Tuesday, February 05, 2013 9:15 PM
>>> To: dcarov at gmail.com
>>> Cc: r-sig-mixed-models at r-project.org; Ben Bolker
>>> Subject: Re: [R-sig-ME] [R] lmer - BLUP prediction intervals
>>>
>>> I think that it is a reasonable way to proceed just so long as you
>>> interpret the
>>> intervals guardedly and document your assumptions carefully.
>>>
>>> Cheers
>>>
>>> Andrew
>>>
>>> On Wednesday, February 6, 2013, Daniel Caro wrote:
>>>
>>> > Dear all
>>> >
>>> > I have not been able to follow the discussion, but I would like to
>>> > know whether it makes sense to calculate prediction intervals like
>>> > this:
>>> >
>>> > var(fixed effect + random effect) = var(fixed effect) + var(random
>>> > effect) + 0 (i.e., assuming the covariance is zero)
>>> >
>>> > and, based on this, to create the prediction intervals. Does this
>>> > make sense?
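>>> >
>>> > In code, the naive calculation I have in mind would be something
>>> > like this (a sketch, using the intercept and the examiner effects
>>> > from m1):
>>> >
>>> > v.fix <- vcov(m1)[1, 1]   # sampling variance of the intercept
>>> > v.ran <- attr(ranef(m1, postVar = TRUE)[["Examiner"]], "postVar")[1, 1, ]
>>> > est <- fixef(m1)[1] + ranef(m1)[["Examiner"]][, 1]
>>> > lwr <- est - 1.96 * sqrt(v.fix + v.ran)   # covariance term set to zero
>>> > upr <- est + 1.96 * sqrt(v.fix + v.ran)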
>>> >
>>> > All the best,
>>> > Daniel
>>> >
>>> > On Tue, Feb 5, 2013 at 8:54 PM, Douglas Bates
>>> > <bates at stat.wisc.edu> wrote:
>>> > > On Tue, Feb 5, 2013 at 2:14 PM, Andrew Robinson
>>> > > <A.Robinson at ms.unimelb.edu.au> wrote:
>>> > >
>>> > >> I'd have thought that the joint correlation matrix would be of the
>>> > >> estimates of the fixed effects and the random effects, rather than
>>> > >> the things themselves.
>>> > >>
>>> > >
>>> > > Well, it may be because I have turned into a grumpy old man, but
>>> > > I get picky about terminology, and the random effects are not
>>> > > parameters - they are unobserved random variables. They don't
>>> > > have "estimates" in the sense of parameter estimates. The
>>> > > quantities returned by the ranef function are the conditional
>>> > > means (in the case of a linear mixed model; conditional modes in
>>> > > general) of the random effects given the observed data, evaluated
>>> > > with the parameters at their estimated values. In the Bayesian
>>> > > point of view none of this is problematic because they're all
>>> > > random variables, but otherwise I struggle with the
>>> > > interpretation of how these can be considered jointly. If you
>>> > > want to consider the distribution of the random effects you need
>>> > > to have known values of the parameters.
>>> > >
>>> > >
>>> > >> The estimates are statistical quantities, with specified
>>> > >> distributions, under the model. The model posits these different
>>> > >> roles (parameter, random variable) for the quantities that are
>>> > >> the targets of the estimates, but the estimates are just
>>> > >> estimates, and as such they have a correlation structure under
>>> > >> the model, and that correlation structure can be estimated.
>>> > >>
>>> > >> An imperfect analogy from least-squares regression is the
>>> > >> correlation structure of the residual estimates, induced by the
>>> > >> model. We say that the errors are independent, but the model
>>> > >> creates a (modest) correlation structure that can be measured,
>>> > >> again, conditional on the model.
>>> > >>
>>> > >
>>> > > Well, the residuals are random variables, and we can show that
>>> > > at the least-squares estimates of the parameters they have a
>>> > > known Gaussian distribution which, it turns out, doesn't depend
>>> > > on the values of the coefficients. But those are the easy cases.
>>> > > In the linear mixed model we still have a Gaussian distribution
>>> > > and a linear predictor, but that is for the conditional
>>> > > distribution of the response given the random effects. For the
>>> > > complete model things get much messier.
>>> > >
>>> > > I'm not making these points just to be difficult. I have spent a
>>> > > lot of time thinking about these models and trying to come up
>>> > > with a coherent way of describing them. Along the way I have come
>>> > > to the conclusion that the way these models are often described
>>> > > is, well, wrong. And those descriptions include some that I have
>>> > > written. For example, you often see the model described as the
>>> > > linear predictor for an observation plus a "noise" term, epsilon,
>>> > > and the statement that the distribution of the random effects is
>>> > > independent of the distribution of the noise term. I now view the
>>> > > linear predictor as a part of the conditional distribution of the
>>> > > response given the random effects, so it wouldn't make sense to
>>> > > talk about these distributions being independent. The biggest
>>> > > pitfall in transferring your thinking from a linear model to any
>>> > > other kind (GLM, LMM, GLMM) is the fact that we can make sense of
>>> > > a Gaussian distribution minus its mean, so we write the linear
>>> > > model in the "signal plus noise" form as Y = X\beta + \epsilon,
>>> > > where Y is an n-dimensional random variable, X is the n by p
>>> > > model matrix, \beta is the p-dimensional vector of coefficients,
>>> > > and \epsilon is an n-dimensional Gaussian with mean zero. That
>>> > > doesn't work in the other cases, despite the heroic attempts of
>>> > > many people to write things in that way.
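>>> > >
>>> > > (In symbols, the conditional formulation described here, with the
>>> > > variance parameters treated as known, is
>>> > >
>>> > >   (Y | B = b) ~ N(X\beta + Zb, \sigma^2 I),   B ~ N(0, \Sigma),
>>> > >
>>> > > so the "signal plus noise" decomposition is a special feature of
>>> > > the Gaussian case rather than the definition of the model.)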
>>> > >
>>> > > Here endeth the sermon.
>>> > >
>>> > >
>>> > >> On Wed, Feb 6, 2013 at 6:34 AM, Douglas Bates
>>> > >> <bates at stat.wisc.edu> wrote:
>>> > >>
>>> > >>> It is possible to create the correlation matrix of the fixed
>>> > >>> effects and the random effects jointly using the results from
>>> > >>> lmer, but I have difficulty deciding what this would represent
>>> > >>> statistically. If you adopt a B
>>>
>>>
>>>
>>> --
>>> Andrew Robinson
>>> Director (A/g), ACERA
>>> Senior Lecturer in Applied Statistics        Tel: +61-3-8344-6410
>>> Department of Mathematics and Statistics     Fax: +61-3-8344-4599
>>> University of Melbourne, VIC 3010 Australia
>>> Email: a.robinson at ms.unimelb.edu.au
>>> Website: http://www.ms.unimelb.edu.au
>>>
>>> FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
>>> SPuR: http://www.ms.unimelb.edu.au/spuRs/