[R-sig-ME] Fwd: lmer stand dev of coefficients

Sun Dec 21 21:35:20 CET 2008

On Sun, Dec 21, 2008 at 2:12 PM, Andrew Robinson
<A.Robinson at ms.unimelb.edu.au> wrote:
> Hi all,
>
> This article might help:
>
> The BLUPs are not "best" when it comes to bootstrapping
>
> Jeffrey S. Morris
>
> Statistics & Probability Letters 56 (2002) 425-430
>
> In the setting of mixed models, some researchers may construct a
> semiparametric bootstrap by sampling from the best linear unbiased
> predictor residuals.  This paper demonstrates both mathematically and
> by simulation that such a bootstrap will consistently underestimate
> the variation in the data in finite samples.
>
> Cheers,
>
> Andrew

Thanks, Andrew.

It occurred to me after I wrote my response that simulation would be a
good way of seeing this effect.  In other words, simulate data from a
simple model with a known variance for the random effects and the
noise then check what the mle and REML estimates of the variance are
and what the variance or standard deviation of the conditional modes
are.

Also, are there formulas for the BLUPs in the case of a simple
one-factor balanced design like the Dyestuff data?  Can these be used
to show that the BLUPs will tend to have an empirical standard
deviation whose expectation is less than the standard deviation of the
random effects?

>
> On Sun, Dec 21, 2008 at 10:59:01AM -0600, Douglas Bates wrote:
>> On Sun, Dec 21, 2008 at 9:40 AM, Daniel Ezra Johnson
>> <danielezrajohnson at gmail.com> wrote:
>> > ---------- Forwarded message ----------
>> > From: Daniel Ezra Johnson <danielezrajohnson at gmail.com>
>> > Date: Sun, Dec 21, 2008 at 3:39 PM
>> > Subject: Re: [R-sig-ME] lmer stand dev of coefficients
>> > To: Douglas Bates <bates at stat.wisc.edu>
>>
>> > Can you explain briefly what circumstances would lead these quantities
>> > to be quite different?
>>
>> First, I misspoke. (Note to self: Don't try to answer questions on
>> theory before the second cup of coffee.)  The standard deviation of
>> the BLUPs (or, as I prefer to call them, the conditional modes) of the
>> random effects are not an estimate of the conditional standard
>> deviation of the random effects given the data.  I can only make sense
>> of the conditional standard deviation of a particular random effect
>> and that would be much smaller than the observed standard deviation of
>> the conditional modes.
>>
>> What I should have said is somewhat more subtle.  We know that the
>> conditional modes of the random effects have less variability than the
>> corresponding individual estimates of a parameter.  I enclose a script
>> and its output for a particularly simple example - a random-effects
>> model fit to the Dyestuff data from the lme4 package.  The design is a
>> balanced, one-way classification so the estimate of the mean yield is
>> simply the mean of the Yield variable.
>>
>> We see that the conditional modes are always smaller in magnitude than
>> the deviations of the individual means from the overall mean.  The
>> fact that the ratio is constant is a consequence of the balanced
>> design.   We say that the conditional modes are shrunk towards zero
>> because the random effects have a finite variance.
>>
>> The conditional modes are also shrunk relative to what would be
>> expected from the unconditional variance of the random effects, but I
>> find it more difficult to explain why.  It makes sense to me that the
>> mle of the unconditional standard deviation would be larger than the
>> standard deviation of the conditional modes but of the way the way the
>> likelihood criterion is formulated.
>>
>> Perhaps someone else can explain why.
>>
>> > Suppose the random effect grouping factor is Subject.
>> >
>> > On what basis would the software estimate the unconditional SD of (the
>> > population of) Subjects to be something quite different (and as you
>> > say, usually larger) than that of the particular group of Subjects in
>> > the data?
>> >
>> > Dan
>> >
>> > On Sun, Dec 21, 2008 at 3:32 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
>> >> On Sun, Dec 21, 2008 at 3:55 AM, Iasonas Lamprianou
>> >> <lamprianou at yahoo.com> wrote:
>> >>> Dear friends
>> >>> when I use sd(coef(mymodel)$myvariable) I get 0.21
>> >>> However, the summary of the model gives
>> >>> Error terms:
>> >>>  Groups      Name        Std.Dev.
>> >>>  myvariable (Intercept) 0.33
>> >>>  Residual               0.76
>> >>>
>> >>> Why dont I get the same value (0.21 instead of 0.33)?
>> >>
>> >> Because they are estimates of different quantities:
>> >> sd(coef(mymodel)$myvariable) is an estimate (although it is not
>> >> entirely clear what the properties of such an estimate would be) of
>> >> the conditional standard deviation of the random effects given the
>> >> data, whereas 0.33 is the maximum likelihood estimate or REML estimate
>> >> of the unconditional standard deviation of the random effects.  We
>> >> would expect the conditional standard deviation to be smaller than the
>> >> unconditional standard deviation.
>> >>
>> >> P.S. If you are starting a new topic on the mailing list you don't
>> >> need to quote a previous message to the list and especially not an
>> >> entire digest message.
>> >>
>> >> _______________________________________________
>> >> R-sig-mixed-models at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >>
>> >
>> > _______________________________________________
>> > R-sig-mixed-models at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >
>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
> --
> Andrew Robinson
> Department of Mathematics and Statistics            Tel: +61-3-8344-6410
> University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
> http://www.ms.unimelb.edu.au/~andrewpr
> http://blogs.mbs.edu/fishing-in-the-bay/
>