[R] standardized residuals (random effects) using nlme and ranef

Mon Jul 31 18:48:39 CEST 2006

Thank you for providing the reproducible example and the explanation
of what you are seeking to calculate.

On 7/31/06, Dirk Enzmann <dirk.enzmann at uni-hamburg.de> wrote:
> As suggested I try another post.
>
> First I give a reproducible example. The example data set has been
> provided by I. Plewis (1997), Statistics in Education. London: Arnold (I
> include the residuals obtained by MLWin):
>
> # ---------------------------------------------
>
> library(nlme)
>
> # Example data from Plewis
>
> BaseURL="http://www.ottersbek.de/"
> ds32 = read.fwf(paste(BaseURL,'ds32.dat',sep=''),
>                        widths=c(9,10,10,10,10,10),
>                        col.names=c('class','pupil','zcurric',
>                                    'curric','zmath1','math2'),
>                        colClasses=c('factor','factor','numeric',
>                                     'numeric','numeric','numeric'))
>
> # Random slopes model (p. 44):
> model.4b = lme(math2~zcurric,random=~zcurric|class,method='ML',data=ds32)
>
> # Random effects (intercept and slope residuals) of level 1 (class):
> RandEff = ranef(model.4b,level=1)
>
> # Results of MLWin (c300 = intercept residuals of level "class",
> #                   c301 = slope residuals of level "class",
> #                   c304 = standardized intercept residuals,
> #                   c305 = standardized slope residuals):
> MLWinRes = read.fwf(paste(BaseURL,'resid_ex32.txt',sep=''),
>                      widths=c(15,15,15,15),
>                      col.names=c('c300','c301','c304','c305'),
>                      colClasses=c('numeric','numeric','numeric','numeric'))
>
> # They are identical to the results of MLWin (c300, c301):
> round(cbind(RandEff,MLWinRes[,1:2]),5)
>
> # However, using option "standard" the results differ considerably:
> round(cbind(ranef(model.4b,level=1,standard=T),MLWinRes[,3:4]),5)

Your summation below of the cause of this inconsistency is correct.
As Harold explains the "standardized" random effects returned by ranef
in the nlme package are the "estimates" (actually the "predictors") of
the random effects divided by the estimated standard deviation of
those random effects, not by the estimated standard error as stated in
the documentation.  I will correct the documentation.

That is, "standardized random effects" in the nlme package are
produced by dividing all the intercept random effects by the same
number (2.9723) and all the slope random effects by 2.0014.
> rr1 <- ranef(model.4b)
> rr2 <- ranef(model.4b, standard = TRUE)
> rr1/rr2
          (Intercept)  zcurric
        8    2.972373 2.001491
       12    2.972373 2.001491
       17    2.972373 2.001491
       22    2.972373 2.001491
       23    2.972373 2.001491
       28    2.972373 2.001491
       29    2.972373 2.001491
       30    2.972373 2.001491
       31    2.972373 2.001491
       32    2.972373 2.001491
       33    2.972373 2.001491
       34    2.972373 2.001491
       35    2.972373 2.001491
       36    2.972373 2.001491
       37    2.972373 2.001491
       38    2.972373 2.001491
       39    2.972373 2.001491
       41    2.972373 2.001491
       42    2.972373 2.001491
       43    2.972373 2.001491
       44    2.972373 2.001491
       45    2.972373 2.001491
       46    2.972373 2.001491
       47    2.972373 2.001491

Another way of defining a standardization would be to use what Harold
Doran and others call the "posterior variance-covariance" of the
random effects.  On reflection I think I would prefer the term
"conditional variance" but I use "posterior" below.

Strictly speaking the random effects are not parameters in the model -
they are unobserved random variables.  Conditional on the values of
the parameters in the model and on the observed data, the random
effects are normally distributed with a mean and variance-covariance
that can be calculated.

Influenced by Harold I added an optional argument "postVar" to the
ranef extractor function for lmer models (but apparently I failed to
document it - I will correct that).  If you fit your model with lmer,
as you do below, you can standardize the  random effects according to
the conditional variances by

> fm1 <- lmer(math2 ~ zcurric + (zcurric|class), ds32, method='ML')
> rr <- ranef(model.4b, postVar = TRUE)
> str(rr)  # shows the structure of the object rr
Formal class 'ranef.lmer' [package "Matrix"] with 1 slots
  ..@ .Data:List of 1
  .. ..$ :`data.frame':	24 obs. of  2 variables:
  .. .. ..$ (Intercept): num [1:24]  0.860 -1.962 -1.105  4.631 -0.781 ...
  .. .. ..$ zcurric    : num [1:24]  0.2748 -1.3194  0.0841  0.5958  0.8080 ...
  .. .. ..- attr(*, "postVar")= num [1:2, 1:2, 1:24]  2.562 -0.585
-0.585  1.837  3.027 ...

The two sets of conditional standard deviations are

> sqrt(attr(rr1, "postVar")[1,1,])
 [1] 1.600589 1.739768 1.466261 1.542172 1.942629 1.506112 1.892815 1.665414
 [9] 1.377917 1.574244 1.755679 2.039292 1.887651 1.451916 1.732030 1.659786
[17] 1.987397 1.795273 1.542049 1.784441 1.629663 1.753223 1.585520 1.470161
> sqrt(attr(rr1, "postVar")[2,2,])
 [1] 1.355537 1.630075 1.506560 1.378282 1.790906 1.310406 1.341185 1.512708
 [9] 1.397492 1.808535 0.859909 1.867273 1.626857 1.669984 1.520560 1.489380
[17] 1.989123 1.326610 1.613399 1.074365 1.764128 1.455202 1.489311 1.942117

However, as you have seen, standardizing the conditional means by
these conditional standard deviations still does not reproduce the
results from MLWin.

If you can determine what is being calculated by MLWin or if you can
determine that there is a bug in the lmer calculations I would be
delighted to hear about it.

Thanks again for a very thorough report and for the reproducible example.

> # ---------------------------------------------
>
>  From the RSiteSearch("MLWin") there are two hits that seem to be
> relevant: One of Douglas Bates
>
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/53097.html
>
> where he explains why he regards getting standard errors of the
> estimates of variance components as not being important. I think (but
> I'm not sure) that this implies that I should not use standardized
> residuals, as well.
>
> Secondly a post of Roel de Jong
>
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/62280.html
>
> However, I can't figure out how I could make use of his suggestion to
> obtain the standardized residuals I am looking for.
>
> A part of the answer has been given in an answer to my previous post by
> Harold Doran. He showed that the so called "standardized residuals"
> obtained by ranef() using the option "standard=T" are the residuals
> devided by the standard deviation of the random effects, not divided by
> the "corresponding standard error" as stated in the ranef() help - if I
> understood him correctly.
>
> In the multilevel list he also showed how to create a caterpillar plot
> using lmer() and the errbar() function of Hmisc (I modified his example
> to adapt it to my data):
>
> # ------------------------------------
>
> detach(package:nlme)
> library(Matrix)
> library(Hmisc)
>
> # Fit a sample model:
> fm1 = lmer(math2 ~ zcurric + (zcurric|class), ds32, method='ML',
> control=list(gradient=FALSE, niterEM=0))
>
> .Call("mer_gradComp", fm1, PACKAGE = "Matrix")
>
> # extract random effects:
> fm1.blup = ranef(fm1)$class
>
> #extract variances and multiple by scale factor:
> fm1.post.var = fm1 at bVar$class * attr(VarCorr(fm1),"sc")**2
>
> # posterior variance of slope on fm1:
> fm1.post.slo = sqrt(fm1.post.var[2,2,])
>
> # Slopes:
> x = fm1.blup[,2]+outer(fm1.post.slo, c(-1.96,0,1.96))
>
> # This code creates a catepillar plot using the errbar function:
> x <- x[order(x[,2]) , ]
> print(errbar(1:dim(x)[1], x[,2],x[,3],x[,1],ylab='Slopes', xlab='Classes'))
> abline(h=0)
>
> # ------------------------------------
>
> Is it correct that I can obtain a respecive plot for the intercepts in a
> similar way?
>
> # ------------------------------------
>
> # posterior variance of intercept on fm1.post.int:
> fm1.post.int = sqrt(fm1.post.var[1,1,])
>
> # Intercepts:
> x = fm1.blup[,1]+outer(fm1.post.int, c(-1.96,0,1.96))
>
> # This code creates a catepillar plot using the errbar function
> x <- x[order(x[,2]) , ]
> print(errbar(1:dim(x)[1], x[,2],x[,3],x[,1],ylab='Intercepts',
> xlab='Classes'))
> abline(h=0)
>
> # ------------------------------------
>
> To sum up, I can't figure out how MLWin calculates the standardized
> residuals. But I understand that this is not a question for the R list.
> Nevertheless, it would help if someone could point me to some arguments
> why not to use them and stick to the results obtainable by ranef().
>
> Spencer Graves wrote:
>
> >       Have you tried RSiteSearch("MLWin")?  I just got 29 hits.  I
> > wonder if any one of these might relate to your question?
> >
> >       If you would like more help on this issue from this listserve,
> > please submit another post, preferably illustrating your question with
> > the simplest possible self-contained example that illustrates your
> > question, perhaps like the following:
> >
> >       fm1.16 <- lme(distance~age, data=Orthodont[1:16,],
> >            random=~age|Subject)
> >
> >       Hope this helps.
> >       Spencer Graves
> > p.s.  PLEASE do read the posting guide
> > "www.R-project.org/posting-guide.html" and provide commented, minimal,
> > self-contained, reproducible code.
> >
> > Dirk Enzmann wrote:
> >
> >> Using ranef() (package nlme, version 3.1-75) with an 'lme' object I
> >> can obtain random effects for intercept and slope of a certain level
> >> (say: 1) - this corresponds to (say level 1) "residuals" in MLWin.
> >> Maybe I'm mistaken here, but the results are identical.
> >>
> >> However, if I try to get the standardized random effects adding the
> >> paramter "standard=T" to the specification of ranef(), the results
> >> differ considerably from the results of MLWin (although MLWin defines
> >> "standardized" in the same way as "divided by its estimated
> >> (diagnostic) standard error").
> >>
> >> Why do the results differ although the estimates (random effects and
> >> thus their variances) are almost identical? I noticed that lme() does
> >> not compute the standard errors of the variances of the random effects
> >> - for several reasons, but if this is true, how does ranef() calculate
> >> the standardized random effects (the help says: '"standardized" (i.e.
> >> divided by the corresponding estimated standard error)').
> >>
> >> Is there a way to obtain similar results as in MLWin (or: should I
> >> prefer the results of ranef() for certain reasons)?
> >>
> >> Dirk
> >>
> >> -----------------------------
> >> R version: 2.3.1 Patched (2006-06-21 r38367)
>
> --
> *************************************************
> Dr. Dirk Enzmann
> Institute of Criminal Sciences
> Dept. of Criminology
> Edmund-Siemers-Allee 1
> D-20146 Hamburg
> Germany
>
> phone: +49-(0)40-42838.7498 (office)
>         +49-(0)40-42838.4591 (Billon)
> fax:   +49-(0)40-42838.2344
> email: dirk.enzmann at uni-hamburg.de
> www:
> http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>