[R-sig-ME] questions about parsimonious mixed models

Wed Oct 26 16:13:44 CEST 2016

Hi there,

I tried to follow the suggestions in Bates et al.'s parsimonious mixed model
paper (link <https://arxiv.org/abs/1506.04967> ) and I'm having some
problems. Basically, I started with a maximal model (m0) and then a
zero-correlation-parameter model (m1), then I reduced the model by taking
out near-zero variance components (m2). Once I extended the reduced model
with correlation parameters (m3), the model becomes degenerate (some
correlation parameters are 1 or -1). However, the likelihood ratio test
(anova()) suggests that m3 provides a significantly better fit than m2. How
should I proceed? What would be the optimal model in cases like this? 

I'm not sure if I did something wrong in the process, so I'll detail what I
did below.

Thanks!

Wing-Yee Chow

==================================

My experiment has a 2x2 within-participant design with 24 participants and
48 items. Both factors (Congruity and SentenceType) are categorical and I
specified the contrasts this way:

> contrasts(tempdata$congruity) <- contr.sum(2)/2

> contrasts(tempdata$sentencetype) <- contr.sum(2)/2

I started with a maximal model, and rePCA() suggests that it's
over-parameterised:

> summary(m0 <- lmer(value ~ sentencetype * congruity + (sentencetype *
congruity|subj) + (sentencetype * congruity|item), REML=FALSE,

           data=tempdata))

Linear mixed model fit by maximum likelihood  ['lmerMod']

Formula: value ~ sentencetype * congruity + (sentencetype * congruity |
subj) + (sentencetype * congruity | item)

   Data: tempdata

     AIC      BIC   logLik deviance df.resid 

 14415.9  14540.2  -7182.9  14365.9     1045 

Scaled residuals: 

    Min      1Q  Median      3Q     Max 

-2.6228 -0.6275 -0.1785  0.4124  6.1246 

Random effects:

Groups   Name                     Variance Std.Dev. Corr             

 item     (Intercept)               5412.2   73.568                   

          sentencetype1             3316.3   57.588   0.92            

          congruity1                 866.0   29.429   0.58  0.85      

          sentencetype1:congruity1   138.1   11.752   0.51  0.15 -0.40

subj     (Intercept)              10789.8  103.874                   

          sentencetype1               56.2    7.497   1.00            

          congruity1                 835.1   28.898  -1.00 -1.00      

          sentencetype1:congruity1  2279.8   47.747   1.00  1.00 -1.00

Residual                          34511.5  185.773                   

Number of obs: 1070, groups:  item, 48; subj, 24

Fixed effects:

                         Estimate Std. Error t value

(Intercept)                388.84      24.39  15.944

sentencetype1               56.27      14.18   3.968

congruity1                 -18.19      13.51  -1.346

sentencetype1:congruity1   -20.90      24.82  -0.842

Correlation of Fixed Effects:

            (Intr) sntnc1 cngrt1

sentenctyp1  0.328              

congruity1  -0.300  0.109       

sntnctyp1:1  0.357  0.047 -0.186

>  summary(rePCA(m0))

$item

Importance of components:

                         [,1]    [,2]      [,3] [,4]

Standard deviation     0.5070 0.15787 1.228e-06    0

Proportion of Variance 0.9116 0.08838 0.000e+00    0

Cumulative Proportion  0.9116 1.00000 1.000e+00    1

$subj

Importance of components:

                        [,1]      [,2]      [,3] [,4]

Standard deviation     0.636 1.045e-06 1.895e-07    0

Proportion of Variance 1.000 0.000e+00 0.000e+00    0

Cumulative Proportion  1.000 1.000e+00 1.000e+00    1

Then, I converted the factors to numeric covariates using model.matrix() and
did the zero-correlation-parameter model:

> summary(m1 <- lmer(value ~ sentencetype * congruity + (cSenttype *
cCongruity||subj) + (cSenttype * cCongruity||item), REML=FALSE,

           data=tempdata))

Linear mixed model fit by maximum likelihood  ['lmerMod']

Formula: value ~ sentencetype * congruity + ((1 | subj) + (0 + cSenttype |  

    subj) + (0 + cCongruity | subj) + (0 + cSenttype:cCongruity |
subj)) + ((1 | item) + (0 + cSenttype | item) + (0 + cCongruity |  

    item) + (0 + cSenttype:cCongruity | item))

   Data: tempdata

     AIC      BIC   logLik deviance df.resid 

 14419.9  14484.5  -7196.9  14393.9     1057 

Scaled residuals: 

    Min      1Q  Median      3Q     Max 

-2.4709 -0.6110 -0.1823  0.4011  6.1156 

Random effects:

Groups   Name                 Variance  Std.Dev. 

 item     cSenttype:cCongruity 0.000e+00 0.000e+00

item.1   cCongruity           0.000e+00 0.000e+00

item.2   cSenttype            2.478e+03 4.978e+01

item.3   (Intercept)          5.268e+03 7.258e+01

subj     cSenttype:cCongruity 0.000e+00 0.000e+00

subj.1   cCongruity           3.473e+01 5.893e+00

subj.2   cSenttype            3.819e-11 6.180e-06

subj.3   (Intercept)          1.079e+04 1.039e+02

Residual                      3.543e+04 1.882e+02

Number of obs: 1070, groups:  item, 48; subj, 24

Fixed effects:

                         Estimate Std. Error t value

(Intercept)                388.41      24.34  15.957

sentencetype1               57.23      13.59   4.210

congruity1                 -17.72      11.60  -1.527

sentencetype1:congruity1   -20.81      23.07  -0.902

Correlation of Fixed Effects:

            (Intr) sntnc1 cngrt1

sentenctyp1 -0.002              

congruity1   0.000 -0.001       

sntnctyp1:1  0.000 -0.002 -0.009

Once again rePCA() suggests that the model is over-parameterised:

> summary(rePCA(m1))

$item

Importance of components:

                         [,1]   [,2] [,3] [,4]

Standard deviation     0.3856 0.2645    0    0

Proportion of Variance 0.6801 0.3200    0    0

Cumulative Proportion  0.6801 1.0000    1    1

$subj

Importance of components:

                         [,1]    [,2]      [,3] [,4]

Standard deviation     0.5518 0.03131 3.283e-08    0

Proportion of Variance 0.9968 0.00321 0.000e+00    0

Cumulative Proportion  0.9968 1.00000 1.000e+00    1

So then I reduced the model by taking out 4 variance components that are
zero/near-zero, and rePCA suggests that this reduced model is no longer
over-parameterised:

> summary(m2<-lmer(value ~ sentencetype * congruity + (cCongruity||subj) +
(cSenttype||item),REML=FALSE, data=tempdata))

Linear mixed model fit by maximum likelihood  ['lmerMod']

Formula: value ~ sentencetype * congruity + ((1 | subj) + (0 + cCongruity |
subj)) + ((1 | item) + (0 + cSenttype | item))

   Data: tempdata

     AIC      BIC   logLik deviance df.resid 

 14411.9  14456.6  -7196.9  14393.9     1061 

Scaled residuals: 

    Min      1Q  Median      3Q     Max 

-2.4709 -0.6110 -0.1823  0.4011  6.1156 

Random effects:

Groups   Name        Variance Std.Dev.

item     cSenttype    2478.42  49.784 

 item.1   (Intercept)  5267.94  72.581 

 subj     cCongruity     34.73   5.894 

 subj.1   (Intercept) 10785.48 103.853 

 Residual             35428.11 188.224 

Number of obs: 1070, groups:  item, 48; subj, 24

Fixed effects:

                         Estimate Std. Error t value

(Intercept)                388.41      24.34  15.957

sentencetype1               57.23      13.59   4.210

congruity1                 -17.72      11.60  -1.527

sentencetype1:congruity1   -20.81      23.07  -0.902

Correlation of Fixed Effects:

            (Intr) sntnc1 cngrt1

sentenctyp1 -0.002              

congruity1   0.000 -0.001       

sntnctyp1:1  0.000 -0.002 -0.009

> summary(rePCA(m2))

$item

Importance of components:

                         [,1]   [,2]

Standard deviation     0.3856 0.2645

Proportion of Variance 0.6801 0.3200

Cumulative Proportion  0.6801 1.0000

$subj

Importance of components:

                         [,1]    [,2]

Standard deviation     0.5518 0.03131

Proportion of Variance 0.9968 0.00321

Cumulative Proportion  0.9968 1.00000

Then I followed Bates et al's guidelines in extending this reduced model
with correlation parameters:

> summary(m3<-lmer(value ~ sentencetype * congruity + (cCongruity|subj) +
(cSenttype|item),REML=FALSE, data=tempdata))

Linear mixed model fit by maximum likelihood  ['lmerMod']

Formula: value ~ sentencetype * congruity + (cCongruity | subj) + (cSenttype
|      item)

   Data: tempdata

     AIC      BIC   logLik deviance df.resid 

 14395.2  14450.0  -7186.6  14373.2     1059 

Scaled residuals: 

    Min      1Q  Median      3Q     Max 

-2.6509 -0.6383 -0.1679  0.4151  5.9618 

Random effects:

Groups   Name        Variance Std.Dev. Corr 

 item     (Intercept)  5391.4   73.43        

          cSenttype    3085.8   55.55   1.00 

 subj     (Intercept) 10803.3  103.94        

          cCongruity    826.1   28.74   -1.00

Residual             35022.7  187.14        

Number of obs: 1070, groups:  item, 48; subj, 24

Fixed effects:

                         Estimate Std. Error t value

(Intercept)                388.76      24.40  15.933

sentencetype1               56.88      13.99   4.066

congruity1                 -18.10      12.88  -1.405

sentencetype1:congruity1   -20.94      22.93  -0.913

Correlation of Fixed Effects:

            (Intr) sntnc1 cngrt1

sentenctyp1  0.247              

congruity1  -0.396 -0.001       

sntnctyp1:1  0.000 -0.002 -0.007

The correlation parameters of 1 and -1 suggest that this model is
degenerate. This is consistent with the results of rePCA():

> summary(rePCA(m3))

$item

Importance of components:

                        [,1] [,2]

Standard deviation     0.492    0

Proportion of Variance 1.000    0

Cumulative Proportion  1.000    1

$subj

Importance of components:

                         [,1]      [,2]

Standard deviation     0.5762 1.678e-07

Proportion of Variance 1.0000 0.000e+00

Cumulative Proportion  1.0000 1.000e+00

However, the likelihood ration test suggests that m3 provides a
significantly better fit than m2:

> anova(m2, m3)

Data: tempdata

Models:

m2: value ~ sentencetype * congruity + ((1 | subj) + (0 + cCongruity | 

m2:     subj)) + ((1 | item) + (0 + cSenttype | item))

m3: value ~ sentencetype * congruity + (cCongruity | subj) + (cSenttype | 

m3:     item)

   Df   AIC   BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    

m2  9 14412 14457 -7196.9    14394                             

m3 11 14395 14450 -7186.6    14373 20.639      2  3.298e-05 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Questions: Should I have reduced the model further before extending it with
correlation parameters? How should I proceed (or what should I have done
differently)? 

	[[alternative HTML version deleted]]