[R-sig-ME] z transform versus random intercept

Thu Dec 3 16:12:42 CET 2009

As you can imagine, the two models handle the variability between the
subjects very differently.  The use of scaling is a rather crude way
of handling this variability, which is understandable in that it
originated at a time when computing resources were much less powerful
than they are today.  The problem with scaling is that each subject's
mean and standard deviation are computed from that subject's data
alone, so the result is very sensitive to the extreme observations,
which are the ones that are most likely to be outliers or in some way
problematic.
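
A small sketch of that sensitivity, using invented numbers rather than
the sleepstudy data: the vector `clean` is a hypothetical set of ten
reaction times for one subject, and `spiked` is the same vector with one
extreme value substituted.  It is only meant to show how a single
observation changes the z-scores of all the others.

```r
# Hypothetical reaction times (ms) for one subject; invented for illustration.
clean  <- c(250, 260, 255, 270, 265, 280, 275, 290, 285, 300)
spiked <- clean
spiked[10] <- 600   # replace the last value with one extreme observation

# z-transform each version, as scale() would do within a subject
z_clean  <- as.numeric(scale(clean))
z_spiked <- as.numeric(scale(spiked))

# The outlier shifts the mean and inflates the standard deviation ...
c(sd_clean = sd(clean), sd_spiked = sd(spiked))

# ... so the nine unchanged observations are squeezed into a much
# narrower band of z-scores than before.
round(cbind(clean = z_clean[1:9], spiked = z_spiked[1:9]), 2)
```

The nine observations that did not change end up with quite different
z-scores, which is the sense in which the scaling is driven by the
extremes.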

The mixed-effects model determines the distribution of the random
effects for the subjects to balance fidelity to the data with
complexity of the model.  John Tukey referred to this as "borrowing
strength" between the subjects.  You assume that the subjects come
from a population of varying abilities and damp down the individual
estimates toward the population means, as long as doing so can produce
a reasonable fit.
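
One way to see this damping numerically, sketched under the assumption
that lme4 is installed: fit the random-intercept model (b) from the
question and compare its per-subject intercepts with those from a fit
that gives every subject its own fixed intercept.  The fixed-intercept
lm() fit and the object names are introduced here only as an unpooled
baseline for comparison; they are not part of the original question.

```r
library(lme4)   # provides lmer() and the sleepstudy data

# Random-intercept model, model (b) in the question
fm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Unpooled baseline: a separate fixed intercept per subject, common slope
fixed <- lm(Reaction ~ 0 + Subject + Days, data = sleepstudy)

# Per-subject intercepts from each fit (subjects are in the same order)
days_col    <- match("Days", names(coef(fixed)))
per_subject <- coef(fixed)[-days_col]
shrunken    <- coef(fm)$Subject[["(Intercept)"]]

# The mixed-model intercepts are pulled toward the population intercept,
# so they are less spread out than the unpooled ones.
c(sd_unpooled = sd(per_subject), sd_mixed = sd(shrunken))
cbind(unpooled = per_subject, mixed = shrunken)
```

Subjects whose unpooled intercept is far from the average are moved the
most, which is the shrinkage described above.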

If, like me, you think in pictures, you might find the slides at
http://lme4.r-forge.r-project.org/slides/2009-07-21-Seewiesen/5LongitudinalD.pdf
helpful in understanding this shrinkage idea.

On Thu, Dec 3, 2009 at 5:22 AM, espesser <robert.espesser at lpl-aix.fr> wrote:
> Dear all,
>
> In my domain (phonetics), it is usual to z-transform the response
> (i.e. using  scale( ... ,scale=T), by subject for example)
> before doing a classical regression analysis. I do not know enough
> statistics to see and explain all the fundamental statistical
> differences between this approach and mixed models.
>
> For example, with the data "sleepstudy" from package lme4, is there
> something wrong or dubious with the following model?
>
>
> (a)    lm ( zreaction ~ Days ,data=sleepstudy)
> where zreaction is   Reaction   scaled by Subject.
>
>
> to be compared with:
>
> (b)    lmer( Reaction ~Days +(1|Subject), data=sleepstudy)
>
>
> Model (a) still treats all the z-measures as independent, and I think
> that it is still "dubious", despite the fact that after the scaling all
> the zreaction values have mean == 0 and sd == 1 within each subject.
> Am I right?
>
> Apparently, there are no differences between (a) and the following mixed
> model:
>
> (c) lmer(zreac ~ Days + (1 | Subject) ...)
>
> (Of course this last model found a null inter-subject variance.)
>
> Am I wrong to expect some (hidden) differences between (a) and (c)?
>
> Would differences appear if I ran simulations on (a), (b) and (c) to
> test the effect of Days?
>
> I'm looking for "good" arguments to convince my colleagues that the
> mixed model is a better way than the z-transform, even for such a simple
> model, and not merely an easier or more elegant way of doing the same
> thing.
>
> (I know that the "good model" is the mixed model with a random slope, and
> that in that case the "z-model" and the mixed one cannot be compared.)
>
>
> Thank you for your help.
>
>
> ###  output from classical lm on the z-scaled data, model (a)
>
>> summary( fm0z.lm)
>
> Call:
> lm(formula = zreac ~ Days, data = zsleep)
>
> Residuals:
>    Min       1Q   Median       3Q      Max
> -1.98864 -0.36035  0.01233  0.35292  2.55175
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) -1.06168    0.09249  -11.48   <2e-16 ***
> Days         0.23593    0.01733   13.62   <2e-16 ***
>
>
> Residual standard error: 0.6676 on 178 degrees of freedom
>
> ###  output from lmer on z-scaled data  , model (c)
>
>> summary( fm0z.lmer)
> Linear mixed model fit by REML
> Formula: zreac ~ Days + (1 | Subject)
>  Data: zsleep
>  AIC   BIC logLik deviance REMLdev
> 381.8 394.6 -186.9    363.4   373.8
> Random effects:
> Groups   Name        Variance Std.Dev.
> Subject  (Intercept) 0.00000  0.00000
> Residual             0.44574  0.66764
> Number of obs: 180, groups: Subject, 18
>
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept) -1.06168    0.09249  -11.48
> Days         0.23593    0.01733   13.62
>
> Correlation of Fixed Effects:
>      (Intr)
> Days -0.843
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>