[R-sig-ME] [R] lmer - BLUP prediction intervals

Fri Feb 8 06:24:00 CET 2013

Hi Harold,

I think that we're on the same page.  My opinion is that the estimates
and the predictions are both random variables.  They have a joint
distribution under the model, and it is possible to estimate the
parameters of that joint distribution, under the model.  

I tend to think that if I can't write down their joint distribution
then I'm not sure what are the grounds for taking any linear
combination - including the very useful adding of them - either.

If obtaining the covariance term is reasonable within epsilon time and
effort, then I would advocate doing that and following your
prescription.

Andrew.

On Thu, Feb 07, 2013 at 09:45:38AM -0500, Doran, Harold wrote:
> Andrew:
> 
> I agree with you and think that is reasonable too, but would offer the following thoughts. Doug and I have talked about this for a while. As usual, he makes excellent points and I've struggled with this specific issue. 
> 
> In the end, I have found myself taking the position that you argued. It remains uncontroversial to add the estimate of the fixed effects with the BLUPs. Doug noted one is _estimated_ as a parameter (i.e., the fixed effect) and the BLUP is not estimated in the same way the fixed effect is as a model parameter.
> 
> But, it is _kind of_ an estimate. It is predicted conditional on the variance components and the fixed effects themselves and it has a sampling distribution.
> 
> So, I've taken the position that these are two statistical "things" both with model-based sampling distributions and a covariance term. 
> 
> In my world, there are often stakes associated with the use of the data and if I were to ignore the covariance term between the BLUPs and the fixed effects I could risk underestimating the true sampling variance of that linear combination.
> 
> So, in my world, I view them as a linear combination with a covariance term and represent the variance of them as such. 
> 
> 
> 
> 
> > -----Original Message-----
> > From: Andrew Robinson [mailto:A.Robinson at ms.unimelb.edu.au]
> > Sent: Wednesday, February 06, 2013 6:29 PM
> > To: Doran, Harold
> > Cc: dcarov at gmail.com; r-sig-mixed-models at r-project.org; Ben Bolker
> > Subject: Re: [R-sig-ME] [R] lmer - BLUP prediction intervals
> > 
> > Hi Harold,
> > 
> > just pragmatism.  If the model says that there's a quantity but it's hard to
> > estimate, eg the software doesn't provide it, then I wouild contend that - so
> > long as one is honest - it's reasonable to ignore it.
> > 
> > Intuitively I would expect that the correlation between the fixed estimates
> > and random predictions would be negative.  This is because, in a sense,
> > they're trying to rerepsent the same variation. I know that you have a lot of
> > experience and a lot of time thinking about these models.  How does that
> > conjecture sound to you?  If that were the case then ignoring the correlation
> > between them when computing a prediction interval would be conservative.
> > 
> > Best wishes
> > 
> > Andrew
> > 
> > 
> > 
> > On Wed, Feb 06, 2013 at 10:36:40AM -0500, Doran, Harold wrote:
> > > Andrew
> > >
> > > Ignoring the important theoretical question for just a moment on whether
> > it is sensible to do this, there is a covariance term between the BLUPs and
> > the fixed effects.
> > >
> > > If that term exists under the model, but it is ignored for purposes of
> > variance estimation, what would be the reason for doing so?
> > >
> > > > -----Original Message-----
> > > > From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-
> > > > models-bounces at r-project.org] On Behalf Of Andrew Robinson
> > > > Sent: Tuesday, February 05, 2013 9:15 PM
> > > > To: dcarov at gmail.com
> > > > Cc: r-sig-mixed-models at r-project.org; Ben Bolker
> > > > Subject: Re: [R-sig-ME] [R] lmer - BLUP prediction intervals
> > > >
> > > > I think that it is a reasonable way to proceed just so long as you
> > > > interpret the intervals guardedly and document your assumptions
> > carefully.
> > > >
> > > > Cheers
> > > >
> > > > Andrew
> > > >
> > > > On Wednesday, February 6, 2013, Daniel Caro wrote:
> > > >
> > > > > Dear all
> > > > >
> > > > > I have not been able to follow the discussion. But I would like to
> > > > > know if it makes sense to calculate prediction intervals like this:
> > > > >
> > > > > var(fixed effect+random effect)= var(fixed effect) + var(random
> > > > > effect) + 0 (i.e., the cov is zero)
> > > > >
> > > > > and based on this create the prediction intervals. Does this make
> > sense?
> > > > >
> > > > > All the best,
> > > > > Daniel
> > > > >
> > > > > On Tue, Feb 5, 2013 at 8:54 PM, Douglas Bates
> > > > > <bates at stat.wisc.edu>
> > > > wrote:
> > > > > > On Tue, Feb 5, 2013 at 2:14 PM, Andrew Robinson <
> > > > > > A.Robinson at ms.unimelb.edu.au> wrote:
> > > > > >
> > > > > >> I'd have thought that the joint correlation matrix would be of
> > > > > >> the estimates of the fixed effects and the random effects,
> > > > > >> rather than the things themselves.
> > > > > >>
> > > > > >
> > > > > > Well, it may be because I have turned into a grumpy old man but
> > > > > > I get
> > > > > picky
> > > > > > about terminology and the random effects are not parameters -
> > > > > > they are unobserved random variables.  They don't have
> > > > > > "estimates" in the sense of parameter estimates.  The quantities
> > > > > > returned by the ranef function are
> > > > > the
> > > > > > conditional means (in the case of a linear mixed model,
> > > > > > conditional modes in general) of the random effects given the
> > > > > > observed data evaluated with the parameters at their estimated
> > > > > > values. In the Bayesian point of view none of this is
> > > > > > problematic because they're all random variables but otherwise I
> > > > > > struggle with the interpretation of how these can be
> > > > > considered
> > > > > > jointly.   If you want to consider the distribution of the random effects
> > > > > > you need to have known values of the parameters.
> > > > > >
> > > > > >
> > > > > >> The estimates are statistical quantities, with specified
> > > > > >> distributions, under the model.  The model posits these
> > > > > >> different roles (parameter,
> > > > > random
> > > > > >> variable) for the quantities that are the targets of the
> > > > > >> estimates, but
> > > > > the
> > > > > >> estimates are just estimates, and as such, they have a
> > > > > >> correlation structure under the model, and that correlation
> > > > > >> structure can be
> > > > > estimated.
> > > > > >>
> > > > > >> An imperfect analogy from least-squares regression is the
> > > > > >> correlation structure of residual estimates, induced by the model.
> > > > > >> We say that the errors are independent, but the model creates a
> > > > > >> (modest) correlation structure than can be measured, again,
> > > > > >> conditional
> > > > on the model.
> > > > > >>
> > > > > >
> > > > > > Well the residuals are random variables and we can show that at
> > > > > > the least squares estimates of the parameters they will have a
> > > > > > known Gaussian distribution which, it turns out, doesn't depend
> > > > > > on the values of the coefficients.  But those are the easy
> > > > > > cases.  In the linear mixed model
> > > > > we
> > > > > > still have a Gaussian distribution and a linear predictor but
> > > > > > that is for the conditional distribution of the response given
> > > > > > the random
> > > > effects.
> > > > >  For
> > > > > > the complete model things get much messier.
> > > > > >
> > > > > > I'm not making these points just to be difficult.  I have spent
> > > > > > a lot of time thinking about these models and trying to come up
> > > > > > with a coherent
> > > > > way
> > > > > > of describing them.  Along the way I have come to the conclusion
> > > > > > that the way these models are often described is, well, wrong.
> > > > > > And those descriptions include some that I have written.  For
> > > > > > example, you often
> > > > > see
> > > > > > the model described as the linear predictor for an observation
> > > > > > plus a "noise" term, epsilon, and the statement that the
> > > > > > distribution of the random effects is independent of the
> > > > > > distribution of the noise term.  I
> > > > > now
> > > > > > view the linear predictor as a part of the conditional
> > > > > > distribution of
> > > > > the
> > > > > > response given the random effects so it wouldn't make sense to
> > > > > > talk about these distributions being independent.  The biggest
> > > > > > pitfall in
> > > > > transferring
> > > > > > your thinking from a linear model to any other kind (GLM, LMM,
> > > > > > GLMM) is
> > > > > the
> > > > > > fact that we can make sense of a Gaussian distribution minus its
> > > > > > mean so
> > > > > we
> > > > > > write the linear model in the "signal plus noise" form as Y =
> > > > > > X\beta
> > > > > > + \epsilon where Y is an n-dimensional random variable, X is the
> > > > > > + n
> > > > > > by p
> > > > > model
> > > > > > matrix, \beta is the p-dimensional vector of coefficients and
> > > > > > \epsilon is an n-dimensional Gaussian with mean zero.  That
> > > > > > doesn't work in the other cases, despite the heroic attempts of
> > > > > > many people to write things in that way.
> > > > > >
> > > > > > Here endeth the sermon.
> > > > > >
> > > > > >
> > > > > >> On Wed, Feb 6, 2013 at 6:34 AM, Douglas Bates
> > > > > >> <bates at stat.wisc.edu>
> > > > > wrote:
> > > > > >>
> > > > > >>> It is possible to create the correlation matrix of the fixed
> > > > > >>> effects
> > > > > and
> > > > > >>> the random effects jointly using the results from lmer but I
> > > > > >>> have difficulty deciding what this would represent
> > > > > >>> statistically.  If you
> > > > > adopt
> > > > > >>> a B
> > > >
> > > >
> > > >
> > > > --
> > > > Andrew Robinson
> > > > Director (A/g), ACERA
> > > > Senior Lecturer in Applied Statistics                      Tel:
> > > > +61-3-8344-6410
> > > > Department of Mathematics and Statistics            Fax: +61-3-8344 4599
> > > > University of Melbourne, VIC 3010 Australia
> > > > Email: a.robinson at ms.unimelb.edu.au    Website:
> > > > http://www.ms.unimelb.edu.au
> > > >
> > > > FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
> > > > SPuR: http://www.ms.unimelb.edu.au/spuRs/
> > > >
> > > > 	[[alternative HTML version deleted]]
> > > >
> > > > _______________________________________________
> > > > R-sig-mixed-models at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> > 
> > --
> > Andrew Robinson
> > Director (A/g), ACERA
> > Department of Mathematics and Statistics            Tel: +61-3-8344-6410
> > University of Melbourne, VIC 3010 Australia               (prefer email)
> > http://www.ms.unimelb.edu.au/~andrewpr              Fax: +61-3-8344-4599
> > http://www.acera.unimelb.edu.au/
> > 
> > Forest Analytics with R (Springer, 2011)
> > http://www.ms.unimelb.edu.au/FAwR/
> > Introduction to Scientific Programming and Simulation using R (CRC, 2009):
> > http://www.ms.unimelb.edu.au/spuRs/

-- 
Andrew Robinson  
Director (A/g), ACERA 
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia               (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr              Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011) 
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009): 
http://www.ms.unimelb.edu.au/spuRs/