[R-sig-ME] [R] lmer - BLUP prediction intervals

Thu Feb 7 00:28:42 CET 2013

Hi Harold,

just pragmatism.  If the model says that there's a quantity but it's
hard to estimate, eg the software doesn't provide it, then I wouild
contend that - so long as one is honest - it's reasonable to ignore
it.

Intuitively I would expect that the correlation between the fixed
estimates and random predictions would be negative.  This is because,
in a sense, they're trying to rerepsent the same variation. I know
that you have a lot of experience and a lot of time thinking about
these models.  How does that conjecture sound to you?  If that were
the case then ignoring the correlation between them when computing a
prediction interval would be conservative.

Best wishes

Andrew

On Wed, Feb 06, 2013 at 10:36:40AM -0500, Doran, Harold wrote:
> Andrew
> 
> Ignoring the important theoretical question for just a moment on whether it is sensible to do this, there is a covariance term between the BLUPs and the fixed effects. 
> 
> If that term exists under the model, but it is ignored for purposes of variance estimation, what would be the reason for doing so?
> 
> > -----Original Message-----
> > From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-
> > models-bounces at r-project.org] On Behalf Of Andrew Robinson
> > Sent: Tuesday, February 05, 2013 9:15 PM
> > To: dcarov at gmail.com
> > Cc: r-sig-mixed-models at r-project.org; Ben Bolker
> > Subject: Re: [R-sig-ME] [R] lmer - BLUP prediction intervals
> > 
> > I think that it is a reasonable way to proceed just so long as you interpret the
> > intervals guardedly and document your assumptions carefully.
> > 
> > Cheers
> > 
> > Andrew
> > 
> > On Wednesday, February 6, 2013, Daniel Caro wrote:
> > 
> > > Dear all
> > >
> > > I have not been able to follow the discussion. But I would like to
> > > know if it makes sense to calculate prediction intervals like this:
> > >
> > > var(fixed effect+random effect)= var(fixed effect) + var(random
> > > effect) + 0 (i.e., the cov is zero)
> > >
> > > and based on this create the prediction intervals. Does this make sense?
> > >
> > > All the best,
> > > Daniel
> > >
> > > On Tue, Feb 5, 2013 at 8:54 PM, Douglas Bates <bates at stat.wisc.edu>
> > wrote:
> > > > On Tue, Feb 5, 2013 at 2:14 PM, Andrew Robinson <
> > > > A.Robinson at ms.unimelb.edu.au> wrote:
> > > >
> > > >> I'd have thought that the joint correlation matrix would be of the
> > > >> estimates of the fixed effects and the random effects, rather than
> > > >> the things themselves.
> > > >>
> > > >
> > > > Well, it may be because I have turned into a grumpy old man but I
> > > > get
> > > picky
> > > > about terminology and the random effects are not parameters - they
> > > > are unobserved random variables.  They don't have "estimates" in the
> > > > sense of parameter estimates.  The quantities returned by the ranef
> > > > function are
> > > the
> > > > conditional means (in the case of a linear mixed model, conditional
> > > > modes in general) of the random effects given the observed data
> > > > evaluated with the parameters at their estimated values. In the
> > > > Bayesian point of view none of this is problematic because they're
> > > > all random variables but otherwise I struggle with the
> > > > interpretation of how these can be
> > > considered
> > > > jointly.   If you want to consider the distribution of the random effects
> > > > you need to have known values of the parameters.
> > > >
> > > >
> > > >> The estimates are statistical quantities, with specified
> > > >> distributions, under the model.  The model posits these different
> > > >> roles (parameter,
> > > random
> > > >> variable) for the quantities that are the targets of the estimates,
> > > >> but
> > > the
> > > >> estimates are just estimates, and as such, they have a correlation
> > > >> structure under the model, and that correlation structure can be
> > > estimated.
> > > >>
> > > >> An imperfect analogy from least-squares regression is the
> > > >> correlation structure of residual estimates, induced by the model.
> > > >> We say that the errors are independent, but the model creates a
> > > >> (modest) correlation structure than can be measured, again, conditional
> > on the model.
> > > >>
> > > >
> > > > Well the residuals are random variables and we can show that at the
> > > > least squares estimates of the parameters they will have a known
> > > > Gaussian distribution which, it turns out, doesn't depend on the
> > > > values of the coefficients.  But those are the easy cases.  In the
> > > > linear mixed model
> > > we
> > > > still have a Gaussian distribution and a linear predictor but that
> > > > is for the conditional distribution of the response given the random
> > effects.
> > >  For
> > > > the complete model things get much messier.
> > > >
> > > > I'm not making these points just to be difficult.  I have spent a
> > > > lot of time thinking about these models and trying to come up with a
> > > > coherent
> > > way
> > > > of describing them.  Along the way I have come to the conclusion
> > > > that the way these models are often described is, well, wrong.  And
> > > > those descriptions include some that I have written.  For example,
> > > > you often
> > > see
> > > > the model described as the linear predictor for an observation plus
> > > > a "noise" term, epsilon, and the statement that the distribution of
> > > > the random effects is independent of the distribution of the noise
> > > > term.  I
> > > now
> > > > view the linear predictor as a part of the conditional distribution
> > > > of
> > > the
> > > > response given the random effects so it wouldn't make sense to talk
> > > > about these distributions being independent.  The biggest pitfall in
> > > transferring
> > > > your thinking from a linear model to any other kind (GLM, LMM, GLMM)
> > > > is
> > > the
> > > > fact that we can make sense of a Gaussian distribution minus its
> > > > mean so
> > > we
> > > > write the linear model in the "signal plus noise" form as Y = X\beta
> > > > + \epsilon where Y is an n-dimensional random variable, X is the n
> > > > by p
> > > model
> > > > matrix, \beta is the p-dimensional vector of coefficients and
> > > > \epsilon is an n-dimensional Gaussian with mean zero.  That doesn't
> > > > work in the other cases, despite the heroic attempts of many people
> > > > to write things in that way.
> > > >
> > > > Here endeth the sermon.
> > > >
> > > >
> > > >> On Wed, Feb 6, 2013 at 6:34 AM, Douglas Bates <bates at stat.wisc.edu>
> > > wrote:
> > > >>
> > > >>> It is possible to create the correlation matrix of the fixed
> > > >>> effects
> > > and
> > > >>> the random effects jointly using the results from lmer but I have
> > > >>> difficulty deciding what this would represent statistically.  If
> > > >>> you
> > > adopt
> > > >>> a B
> > 
> > 
> > 
> > --
> > Andrew Robinson
> > Director (A/g), ACERA
> > Senior Lecturer in Applied Statistics                      Tel:
> > +61-3-8344-6410
> > Department of Mathematics and Statistics            Fax: +61-3-8344 4599
> > University of Melbourne, VIC 3010 Australia
> > Email: a.robinson at ms.unimelb.edu.au    Website:
> > http://www.ms.unimelb.edu.au
> > 
> > FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
> > SPuR: http://www.ms.unimelb.edu.au/spuRs/
> > 
> > 	[[alternative HTML version deleted]]
> > 
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Andrew Robinson  
Director (A/g), ACERA 
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia               (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr              Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011) 
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009): 
http://www.ms.unimelb.edu.au/spuRs/