[R-sig-ME] Expected correlation in a mixed model

S Ellison S.Ellison at lgc.co.uk
Tue Dec 21 17:46:34 CET 2010


Thanks, all, for the discussion on this.

My take-home message seems to be that messing around with lme or
lmer-like calculations is simply not very sensible with small numbers of
groups. Is that fair? 

(I already knew that anyone asking for variance information from four
instances is asking for very imprecise estimates indeed; the question is
more whether lmer or lme, as opposed to some classical estimate, is made
additionally unsafe by their sometimes inconvenient convergence
behaviour, over and above the imprecision already implied by the low
number of instances.)

Or is the message less restrictive: for example, that one should take
singular or zero variance/covariance estimates as a warning that there
really wasn't enough information in the data to form sensible estimates?
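
As a concrete version of that check -- a minimal sketch against the
lmer1 fit from my example below; the tolerances are arbitrary, the
attribute layout of VarCorr() may differ between lme4 versions, and I
gather newer releases expose an isSingular() helper for this:

vc <- VarCorr(lmer1)
run.corr <- attr(vc$run, "correlation")[1, 2]  # estimated correlation
run.vars <- diag(vc$run)                       # estimated variances
## treat a boundary estimate as "not enough information in the data"
if (abs(run.corr) > 1 - 1e-6 || any(run.vars < 1e-10))
    warning("random-effects variance-covariance estimate is singular")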

Steve Ellison


>>> Douglas Bates <bates at stat.wisc.edu> 08/12/2010 16:53:49 >>>
On Mon, Nov 22, 2010 at 9:07 AM, Ben Bolker <bbolker at gmail.com> wrote:
>  However: centering the concentration doesn't actually have much
> effect in this example (which it would if the remoteness from the
> origin were the problem), i.e.
>
> concsc <- scale(conc)
> (lmer1<-lmer(od~concsc+(concsc|run)))
>
>  This setup seems a little bit odd to me:
>  * the data set size is fairly small -- only 4 levels of the random
> effect ('run'), which often leads to this sort of collapse (zero
> variances and/or perfect correlations)

Exactly.  You are trying to estimate three variance components (two
variances and a covariance) from four groups.  That is not very much
information for what you are trying to estimate.
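
To get a feel for just how little information that is, here is a
minimal sketch (not your data -- a random-intercept-only model for
simplicity, so only one variance component) of how widely a
between-group standard deviation estimated from four groups scatters:

## sketch: distribution of a between-group SD estimated from 4 groups;
## true between-group SD is 0.25, residual SD is 0.2
library(lme4)
set.seed(1)
sd.hat <- replicate(200, {
    g <- gl(4, 15)                        # 4 groups of 15 observations
    y <- rnorm(4, sd = 0.25)[g] + rnorm(60, sd = 0.2)
    attr(VarCorr(lmer(y ~ 1 + (1 | g)))$g, "stddev")
})
summary(c(sd.hat))  # estimates scatter widely; some may collapse to zero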

Bear in mind that it is generally more difficult to estimate a
variance or a covariance than it is to estimate a coefficient in a
linear predictor.  If you use the verbose=TRUE option you can see the
path of the iterations with respect to the three relative variance
component parameters. In lme4a the verbose output shows

> (lmer1<-lmer(od~conc+(conc|run), verbose=TRUE))
npt = 7 , n =  3
rhobeg =  0.2 , rhoend =  2e-07
   0.020:  12:     -2.66406; 1.07341 0.0671561  0.00000
  0.0020:  19:     -2.73319; 1.12766 0.161299  0.00000
 0.00020:  27:     -2.73356; 1.11840 0.161578  0.00000
 2.0e-05:  34:     -2.73357; 1.11786 0.161331 2.74933e-05
 2.0e-06:  40:     -2.73357; 1.11777 0.161302  0.00000
 2.0e-07:  44:     -2.73357; 1.11777 0.161301  0.00000
At return
 48:    -2.7335660:  1.11777 0.161300 1.10253e-08
Linear mixed model fit by REML ['merMod']
Formula: od ~ conc + (conc | run)
REML criterion at convergence: -2.7336

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 run      (Intercept) 0.053559 0.23143
          conc        0.001115 0.03340  1.000
 Residual             0.042867 0.20704
Number of obs: 60, groups: run, 4

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.52654    0.12148   4.334
conc         0.92236    0.07701  11.977

Correlation of Fixed Effects:
     (Intr)
conc 0.001

The important thing about the iterations is that the first and third
parameters are constrained to be non-negative, and the third parameter
is driven to zero almost immediately.
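
(In the current lme4 these parameters can be pulled out directly; a
sketch, assuming a release where the getME() accessor is available --
the lme4a accessors differed:)

getME(lmer1, "theta")
## roughly the vector from the trace above, (1.11777, 0.16130, 0): the
## first and third components are the non-negative diagonal entries of
## the relative Cholesky factor of the random-effects covariance, the
## second its off-diagonal; the zero third component is the singularity
## that shows up as the estimated correlation of 1.000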

Both the REML and the ML criteria try to balance model complexity
against fidelity to the data.  It happens that, given the way complexity
is defined, the least complex models are those with a singular
variance-covariance matrix for the random effects.  Unless there is
sufficient information in the data to support a non-singular
variance-covariance matrix, the criterion will drive the estimate to
singularity.
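
One way to look at the trade-off directly is to fit the model with the
correlation constrained to zero (essentially Ben's lmer2 below, written
with the uncentered conc; lmer3 is just a name for this sketch) and
compare -- current lme4 refits both models under ML for the comparison:

lmer3 <- lmer(od ~ conc + (1 | run) + (0 + conc | run))
anova(lmer3, lmer1)  # likelihood-ratio comparison; with variance
                     # parameters on the boundary the usual chi-square
                     # reference distribution is only approximate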

Also, as Ben notes, you are not simulating from the model that you are
fitting.  You are simulating from a simpler model, and that is what the
fit tends towards.
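
(If you do want to simulate from the fitted model itself, a sketch,
assuming the simulate() and refit() methods of later lme4 releases:)

od.sim <- simulate(lmer1)[[1]]  # one response drawn from the fitted model
refit(lmer1, od.sim)            # refit the same model to the simulated od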

>  * there is no variation in slopes across runs (the only randomness
> here is the error term).  Perhaps what you're looking for is
>
> (lmer2<-lmer(od~concsc+(1|run) + (0+concsc|run)))
>
>  which fixes the correlation at zero.
>
>  * it's also the case here that the random effect on the intercept of
> 'run' is uniformly distributed, rather than normal -- I don't know if
> that would have an effect.
>
>  Ben Bolker
>
>
>
> On 11/22/2010 09:44 AM, Andrew Robinson wrote:
>> Yes indeed --- remoteness of the data from the origin is a plausible
>> explanation.
>>
>> Cheers
>>
>> Andrew
>>
>> On Mon, Nov 22, 2010 at 8:50 PM, S Ellison <S.Ellison at lgc.co.uk> wrote:
>>
>>> Forgive the possibly numb-brained question, but is there a reason why
>>> the correlation between random-effects coefficients in lmer should
>>> come out as identically 1.0 in a model of the form
>>>
>>> lmer(x ~ a + (a|b) )
>>>
>>> ?
>>>
>>> An example:
>>> set.seed(403)
>>> require(lme4)
>>> run <- gl(4, 15)                     # four runs of 15 observations
>>> conc <- rep(rep(c(0, 0.1, 0.2, 0.4, 1.0), 3), 4)
>>> boxplot(conc ~ run)                  # same design in every run
>>> run.offset <- 0.2 * as.numeric(run)  # equally spaced run intercepts
>>>                                      # ('offset' would mask stats::offset)
>>> od <- run.offset + conc + rnorm(60, 0, 0.2)  # common slope of 1
>>> plot(conc, od)
>>>
>>> (lmer1 <- lmer(od ~ conc + (conc | run)))
>>> VarCorr(lmer1)
>>>
>>>
>>> S Ellison
>>> LGC
>>>

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
