[R-sig-ME] Anomalous results with glmer().
Rolf Turner
r.turner at auckland.ac.nz
Wed May 28 03:59:33 CEST 2014
Some months back I sent an inquiry to this list concerning the analysis
of some linguistics data with which I am involved. I am *still*
(psigh!!!) struggling with these data and am getting results which are
making no sense to me.
Basically if I fit a (reasonably sensible) model using an old version of
lme4 (0.999999-0) I get sensible looking estimates for the fixed effect
coefficients, but the estimates of the variances of the random effects
are essentially zero. Which is silly.
If I fit the same model using lme4 version 1.1-7 (and ignore the warning
about failure to converge) I get sensible looking estimates of the
variances of the random effects, but an impossibly wrong estimate
of at least one of the fixed effect coefficients. (The estimate says
that the success probability is larger for phoneme type "Mclus" than it
is for the baseline type "Fclus". However a raw tabulation show that
the success probability for Mclus is much, much smaller than for Fclus.
I have included more detail in the attached file notesME.txt for those
who are interested. This file induces explicit specification of the
model that I used. The results of the fit using version 0.999999-0 are
in the file oldLme4Rslts.txt; the results from version 1.1-7 are in
newLme4Rslts.txt.
The data set is a bit too big to attach; it has 62601 records. I have
therefore made it available (as a *.csv file) on my web page:
https://www.stat.auckland.ac.nz/~rolf
Click on "Linguistics data for R-SIG-ME".
I am really being driven nuts by this weirdness and would appreciate
some avuncular advice from the knowledgeable. (Ben???)
cheers,
Rolf
--
Rolf Turner
Technical Editor ANZJS
-------------- next part --------------
The model that I used in R with the lme4 package was:
fit <- glmer(y ~ sex + type + age + (1|student) + (1|word),
family=binomial,data=X)
The type variable is the type of phoneme. Of course sex has two
levels, M and F. The age variable consists of the midpoint of the
age intervals (5.0 - 5.5, 5.6 - 5.9, 6.0 - 6.5, etc.) into which the
students were classified. This variable is treated as a numerical
covariate in the model that I use. The variables student and word
are treated as random effects. The response y is binary; 1 for
correct pronunciation of the phoneme in question, 0 for incorrect.
The model given above is simplistic and I was previously advised
to include interactions between the fixed effects and the random
effects. However (a) this brings my computer to its knees, and
(b) sufficient unto the day is the evil thereof.
What is puzzling me is the estimates of the type coefficients.
When I fit the model using an elderly version of lme4 (0.999999-0)
I get what appear to me to be sensible results for these (and
for the other fixed effects) but the variance estimates for the
random effects are essentially both zero. This *can't* be so!
When I fit the model using the latest version of lme4 (1.1-7), installed
from the r-forge repository, I get sensible estimates for the variances
of the random effects, but something is very strange about the
estimates of the type coefficients. Explicitly the Mclus (medial
consonant cluster) coefficient is positive --- "significantly" so
at the 0.10 level (p-value = about 0.08).
This is saying that Mclus yields higher success rate than the
baseline type Fclus (which is the first, and hence reference,
level of the type factor). Note that I stick with the default
"treatment contrasts".
However when I tabulated successes by phoneme type and I got the
following table:
type
y Fclus Fcon Iclus Icon Mclus Mcon Vowel
0 245 1216 2451 793 1077 1113 768
1 733 9216 2439 8661 227 9645 24017
Note that the (raw) success rate for "Mclus" is ***MUCH*** smaller
than for any other phoneme type, in particular than for Fclus.
So how can the model produce an indication that it is larger than
the success rate for "Fclus"?
Admittedly the raw tabulation does not (of course) allow for the random
student effect and word effect which are included in the "formal" model.
Even so, I find it hard to believe that there could be such a huge
discrepancy between the "raw result" an the "model result".
Also: Why is there this complete discordance between the results from
lme4 0.999999-0 and those from lme4 1.1-7?
I am toadally mystified.
-------------- next part --------------
Results from lme4 version 0.999999-0:
=====================================
Generalized linear mixed model fit by the Laplace approximation
Formula: y ~ sex + type + age + (1 | student) + (1 | word)
Data: lingDat
AIC BIC logLik deviance
35707 35807 -17843 35685
Random effects:
Groups Name Variance Std.Dev.
student (Intercept) 3.8939e-13 6.2401e-07
word (Intercept) 0.0000e+00 0.0000e+00
Number of obs: 62601, groups: student, 326; word, 50
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.80082 0.13114 -6.107 1.02e-09 ***
sexM -0.10162 0.02838 -3.581 0.000343 ***
typeFcon 0.93906 0.08036 11.686 < 2e-16 ***
typeIclus -1.11890 0.07972 -14.035 < 2e-16 ***
typeIcon 1.30674 0.08309 15.726 < 2e-16 ***
typeMclus -2.69497 0.10455 -25.778 < 2e-16 ***
typeMcon 1.07398 0.08080 13.292 < 2e-16 ***
typeVowel 2.36251 0.08287 28.507 < 2e-16 ***
age 0.30475 0.01667 18.280 < 2e-16 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Correlation of Fixed Effects:
(Intr) sexM typFcn typIcl typIcn typMcl typMcn typVwl
sexM -0.170
typeFcon -0.533 -0.002
typeIclus -0.510 0.005 0.861
typeIcon -0.518 -0.003 0.827 0.833
typeMclus -0.370 0.009 0.656 0.663 0.635
typeMcon -0.531 -0.002 0.850 0.856 0.822 0.653
typeVowel -0.523 -0.003 0.829 0.835 0.802 0.636 0.824
age -0.815 0.057 0.012 -0.023 0.014 -0.041 0.013 0.019
-------------- next part --------------
Results from lme4 version 1.1-7:
================================
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: y ~ sex + type + age + (1 | student) + (1 | word)
Data: lingDat
AIC BIC logLik deviance df.resid
27456.3 27555.8 -13717.2 27434.3 62590
Scaled residuals:
Min 1Q Median 3Q Max
-37.166 0.049 0.122 0.267 10.350
Random effects:
Groups Name Variance Std.Dev.
student (Intercept) 0.166 0.4074
word (Intercept) 3.301 1.8170
Number of obs: 62601, groups: student, 326 word, 50
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.55421 0.37306 -9.53 <2e-16 ***
sexM -0.13841 0.05597 -2.47 0.0134 *
typeFcon 4.45879 0.17727 25.15 <2e-16 ***
typeIclus 1.47383 0.17061 8.64 <2e-16 ***
typeIcon 4.33003 0.18051 23.99 <2e-16 ***
typeMclus 0.33574 0.19876 1.69 0.0912 .
typeMcon 3.41373 0.17724 19.26 <2e-16 ***
typeVowel 5.67714 0.17745 31.99 <2e-16 ***
age 0.39553 0.03253 12.16 <2e-16 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Correlation of Fixed Effects:
(Intr) sexM typFcn typIcl typIcn typMcl typMcn typVwl
sexM -0.120
typeFcon -0.435 -0.005
typeIclus -0.410 -0.001 0.912
typeIcon -0.431 -0.005 0.947 0.883
typeMclus -0.375 0.001 0.838 0.804 0.841
typeMcon -0.429 -0.004 0.936 0.881 0.938 0.850
typeVowel -0.436 -0.006 0.953 0.906 0.943 0.846 0.939
age -0.568 0.068 0.020 0.002 0.019 -0.006 0.015 0.025
More information about the R-sig-mixed-models
mailing list