[R-sig-ME] basic question

Baldwin, Jim -FS jbaldwin at fs.fed.us
Fri Aug 3 03:37:59 CEST 2012

If the objective is to obtain an approximately unbiased estimate for the prediction of a future observation that would have the same variance components as found in the fitted model, then multiplying the backtransformed value by exp of "something like sum(sapply(VarCorr(model),diag))+sigma(model)^2 ..."/2 is a reasonable thing to do as Dr. Bolker suggests.

But if the approximately unbiased prediction is for a different situation which might not include either the same or all of the variance components in the fitted model, you'll need to think about which ones to use.

Also, (and this is likely very obvious) if the multiplicative correction is much larger than around 1.5, I'd have to wonder about the goodness-of-fit of the distributional assumptions of the model.  Sometimes this bias correction can give results that are way outside of observed values on the original scale.  (And, at least for me, I'd also check my code in such situations.  It's many times my fault and not the data's fault.)


Jim Baldwin
Station Statistician
Pacific Southwest Research Station
USDA Forest Service

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
Sent: Thursday, August 02, 2012 5:56 PM
To: Rachel Cohen; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] basic question

 [cc'ing back to r-sig-mixed-models]

On 12-08-02 07:14 AM, Rachel Cohen wrote:
> Hi, thanks for your reply.  I was reading the Pinheiro and Bates book
> (pg.21) which says that for a basic model the parameters are: β, σ2b ,
> and σ2 which is why I thought the residual variance was counted as a
> parameter?  Also, on the same page: "Although the random effects, bi,
> i = 1, . . .,M may behave like parameters, formally they are just
> another level of random variation in the model so we do not “estimate”
> them as such."  That lead me to believe that the random effects are
> not counted as parameters?

   First, you need to make the distinction between the b_i value ('BLUPs', or 'conditional modes', or 'random effect estimates' -- the estimates of what's happening in each grouping level) -- and the theta values (the variances of the random effects).
> I am intending on using the RSE in the formula: CF = exp (RSE^2
> /2) which is supposed to correct for log-bias in estimates (in this
> case estimated tree biomass).  Each unlogged estimate is multiplied by
> the correction factor (CF).  I haven't come across any mention in the
> literature of whether it is appropriate to use the RSE from a
> mixed-effects model in this way.  I hope so?

  If you want to fit on the log scale but estimate the *mean* effects on the unlogged scale, you should (I think) add the exp(TOTAL_variance/2); in this case TOTAL_variance will be the sum of *all of the variance
components* in the model, including the residual variance (I'm going to assume you have only intercept random effects in the model, otherwise things would get a bit more complicated ...) I *think* it would be something like sum(sapply(VarCorr(model),diag))+sigma(model)^2 ...  I think there has been a reasonably recent post by Jarrod Hadfield answering someone's question related to this topic ...

  If I'm right, then there's no need to go messing around with numbers of parameters ...
  Ben Bolker

> *From:* Ben Bolker <bbolker at gmail.com>
> *To:* r-sig-mixed-models at r-project.org
> *Sent:* Thursday, 2 August 2012, 2:51
> *Subject:* Re: [R-sig-ME] basic question
> Rachel Cohen <stat.list at ...> writes:
>>  Hi, I have a very basic query (I think).  If I fit a model of the
>> specification given below how many model parameters are there?  Am I
>> correct in thinking that the model parameters are the fixed effects
>> intercept and coefficients (in this case two of them) and also a
>> parameter for within and for between group variance?  So in total 5
>> fitted parameters? or are there also covariance parameters?.  I am
>> trying to calculate the standard error of the residuals as:
>   Unfortunately, this is an easy question to state but not (in my
> opinion) an easy question to answer unambiguously.
>> RSE = ((sum(residuals^2))/(N - no.of parameters))
>> and therefore need to know how many parameters my model actually has.
>> I've gotten a little confused by the
>> literature definitions of what constitutes a parameter.
> It's confusing!
> [snip]
> A *reasonable* definition (although not the only one) would be to
> count the number of fixed-effect parameters ('beta' in much of the
> literature) and the number of random-effect parameters (generally
> referred to as 'theta' in the lme4 documentation, but varying a great
> deal among references):
> length(fixef(model))+length(getME(model,"theta"))
>   in your case that's 3 parameters for the fixed effects and 6 RE
> parameters (you have a 3x3 variance-covariance matrix of the random
> effects, the matrix is symmetric, so counting the diagonal plus one
> triangle gives 3*(3+1)/2 = 6).  Generally one doesn't count the
> residual variance since that is estimated from the (penalized)
> residual sum of squares.
>   It really depends what you want to use the RSE for.  It may very
> well not have the properties you're expecting (i.e. the properties
> that it has in a simple (non-mixed) linear model ...)
>   There's a bit more about parameter-counting issues at
> http://glmm.wikidot.com/faq, I think ...
>> model<-lmer((log.mass)~centre.log.dbh+centre.log.height+
>> (1+centre.log.dbh+centre.log.height|species_site),
>> data=allometry_2,REML=T)
>>  model summary output table:
>> Linear mixed model fit by REML
>> Formula: (log.mass) ~ centre.log.dbh + centre.log.height +
>> (1 + centre.log.dbh + centre.log.height | species_site)
> [snip]
>> Random effects:
>>  Groups       Name              Variance  Std.Dev. Corr
>>  species_site (Intercept)       0.061636  0.24827
>>               centre.log.dbh    0.279362  0.52855   0.575
>>               centre.log.height 0.285753  0.53456  -0.441 -0.755
>>  Residual                       0.086963  0.29489
>> Number of obs: 337, groups: species_site, 18
> _______________________________________________
> R-sig-mixed-models at r-project.org
> <mailto:R-sig-mixed-models at r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.

More information about the R-sig-mixed-models mailing list