[R-sig-ME] basic question
bbolker at gmail.com
Fri Aug 3 23:37:58 CEST 2012
Rachel Cohen <stat.list at ...> writes:
> Hi, thank you both for your replies. However I remain a little
> confused. Firstly apologies as I noticed that in my original
> posting I had left out the sqrt in the formula for the residual
> standard error given by Sprugel (1983) - RSE =
> sqrt((sum(residuals^2))/(N - no.of parameters)). In addition to a
> random effects term for the intercept my model (see below) also has
> a random effects term for both explanatory variables (dbh and
> height) so I guess this changes things and makes it more
(I was a bit amused that Sprugel 1983 makes a big deal about
the denominator needing to contain N-2 instead of N-1; yes,
he's right, but in practical situations it should rarely make
any difference ... his point about the base of the logarithm
is much more important.)
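For a plain log-log regression, Sprugel's correction factor can be sketched in a few lines of base R. The data below are simulated, and the coefficient values are made up purely for illustration:

```r
## Sketch of Sprugel's (1983) correction factor for a simple log-log
## allometric regression.  All numbers here are invented for illustration.
set.seed(1)
dbh      <- runif(100, 5, 50)
log_mass <- -2 + 2.4 * log(dbh) + rnorm(100, sd = 0.2)  # simulated allometry
fit <- lm(log_mass ~ log(dbh))

## RSE = sqrt(sum(residuals^2) / (N - no. of parameters))
rse <- sqrt(sum(resid(fit)^2) / (length(log_mass) - length(coef(fit))))
cf  <- exp(rse^2 / 2)  # multiplicative bias correction on the original scale
```

Note that `rse` here is the same thing `sigma(fit)` reports for an `lm` fit, so for the simple case Sprugel's formula and the model's residual standard error agree.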
Yes (I should have noticed this previously). The mean of the
predicted distribution for any particular set of predictors (e.g. a
tree of a particular size) will be exp([predicted log
mean])*exp([predicted log variance]/2); I believe (but haven't worked
it out) that when there are random effects of continuous predictors,
the variance as well as the mean will depend on those continuous
predictors, e.g.
log(Y) = a + eps_1 + (b+eps_2)*dbh
var(log(Y)) = var(eps_1) + var(eps_2)*dbh^2
so the bias correction term will depend on the predictors.
(It's even a little bit worse than that, since some of the
random effects are correlated with each other ...)
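A minimal sketch of that dependence, using made-up variance components (standing in for what `VarCorr()` would report from a real fit), and including the intercept-slope covariance just mentioned:

```r
## Sketch: how var(log(Y)) varies with dbh under the random-slope model
##   log(Y) = a + eps_1 + (b + eps_2)*dbh
## All variance-component values below are invented for illustration.
v1  <- 0.04    # var(eps_1), random intercept
v2  <- 0.0005  # var(eps_2), random slope for dbh
c12 <- -0.003  # cov(eps_1, eps_2)
s2  <- 0.01    # residual variance

var_logY <- function(dbh) v1 + 2 * c12 * dbh + v2 * dbh^2 + s2
## so the bias-correction factor depends on the predictor:
cf <- function(dbh) exp(var_logY(dbh) / 2)
cf(c(5, 20, 40))
```

The point is just that `cf()` is a function of dbh, not a single number, once there is a random slope for a continuous predictor.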
> Although I understand what the code you suggested is doing (summing
> the variances) I'm afraid my understanding of statistics is perhaps
> not good enough to know how and if this is equivalent to the formula
> given by Sprugel (which seems to be the accepted way of working out
> the residual error and correcting estimates for log-bias). Can
> Sprugel's formula as it's given not be used with mixed effects
> models? I used resid(model) to grab the residuals for use in the
> above formula, maybe this wasn't correct? I'm unclear as to why
> all the variances should be used to correct the final estimates of
> biomass for each tree and not just the residual variance (as in
Sprugel's formula is definitely intended for simpler cases -- i.e.
simple log-log allometric regressions.
The basic idea is that, if mu is the mean of the data on the log
scale, and if the data (or the predictions) have any variability,
then the mean of the data on the original scale is greater than
exp(mu) -- this is a consequence of a mathematical rule called
Jensen's inequality. As Jim Baldwin says, you may need to think
hard about what situation you're actually trying to predict.
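A quick numerical check of Jensen's inequality for the lognormal case (simulated data; the values of mu and the SD are arbitrary):

```r
## If log(Y) ~ Normal(mu, s), then E[Y] = exp(mu + s^2/2) > exp(mu),
## so naively back-transforming exp(mu) underestimates the mean.
set.seed(42)
mu <- 2
s  <- 0.5
y  <- exp(rnorm(1e6, mu, s))

mean(y)            # close to exp(mu + s^2/2), roughly 8.37
exp(mu)            # exp(2), roughly 7.39: biased low as an estimate of E[Y]
```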
> When I work out the CF using the above formula I get a value of
> ~1.0444 (very reasonable) and using the code you sent me I get a
> value of 1.529! (not very reasonable and quite worrying). So any
> further help in explaining this would be much appreciated as I am
> now quite concerned about the goodness-of-fit of my model if the
> value of ~1.5 is correct.
It sounds like you might need to dig a little deeper ...
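To see where a number that large can come from, here is a sketch of the summed-variance correction with a made-up stand-in for `VarCorr(model)` (the matrix entries and residual variance are invented); note that this gives one global factor and ignores both the covariances and the dbh-dependence discussed above:

```r
## Stand-in for VarCorr(model): a list of per-grouping-factor covariance
## matrices (diagonal = intercept/slope variances).  Values are invented.
vc <- list(site = matrix(c(0.04, -0.003, -0.003, 0.0005), 2, 2,
                         dimnames = rep(list(c("(Intercept)", "dbh")), 2)))
resid_var <- 0.01                    # stand-in for sigma(model)^2

tot_var <- sum(sapply(vc, diag)) + resid_var
cf <- exp(tot_var / 2)               # global correction factor
```

With large slope variances (or a predictor far from zero), this sum can easily push the factor toward 1.5 even when each component looks modest, which is one reason the two calculations in the thread can disagree.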
> Many thanks for your help,
> From: "Baldwin, Jim -FS" <jbaldwin at ...>
> To: "r-sig-mixed-models at ..."
> <r-sig-mixed-models at ...>
> Sent: Friday, 3 August 2012, 2:37
> Subject: RE: [R-sig-ME] basic question
> If the objective is to obtain an approximately unbiased estimate for
> the prediction of a future observation that would have the same
> variance components as found in the fitted model, then multiplying
> the backtransformed value by exp of "something like
> sum(sapply(VarCorr(model),diag))+sigma(model)^2 ..."/2 is a
> reasonable thing to do as Dr. Bolker suggests.
> But if the approximately unbiased prediction is for a different
> situation which might not include either the same or all of the
> variance components in the fitted model, you'll need to think about
> which ones to use.
> Also, (and this is likely very obvious) if the multiplicative
> correction is much larger than around 1.5, I'd have to wonder about
> the goodness-of-fit of the distributional assumptions of the model.
> Sometimes this bias correction can give results that are way outside
> of observed values on the original scale. (And, at least for me,
> I'd also check my code in such situations. It's many times my fault
> and not the data's fault.)
> Jim Baldwin
> Station Statistician
> Pacific Southwest Research Station
> USDA Forest Service
[snipping stuff to make Gmane happy]