[R-sig-ME] Troublesome example of lmer fitting an overidentified model; Stata bad too.
Paul Johnson
pauljohn32 at gmail.com
Wed Feb 24 18:12:00 CET 2016
I'm teaching mixed models, working through a lot of homework problems
in the 2 volume set Multilevel and Longitudinal Modeling using Stata,
by S Rave-Hesketh and A. Skrondal. If you did not look at it yet, I
expect you'll find lots of useful, interesting exercises.
Their Exercise 4.4 is about yield from 10 wheat varieties. The variables are:
"plot" "variety" "yield" "moist"
variety is the grouping variable in the exercise, but it comes in as a
floating point number. I thought that was the root of the surprise
described here, but now I don't think so.
I was reading one student's homework in Stata and I was
stunned/surprised that Stata's "mixed" function would fit a random
effects model that included the grouping variable as a fixed effect
predictor as well. This appeared to be grossly overidentified to me.
Here's the stata input and output,
. mixed yield c.moist##i.variety || variety:moist, mle stddev
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -42.123671
Iteration 1: log likelihood = -41.540417
Iteration 2: log likelihood = -41.520012
Iteration 3: log likelihood = -41.520012
Computing standard errors:
Mixed-effects ML regression Number of obs = 60
Group variable: variety Number of groups = 10
Obs per group:
min = 6
avg = 6.0
max = 6
Wald chi2(19) = 37795.42
Log likelihood = -41.520012 Prob > chi2 = 0.0000
---------------------------------------------------------------------------------
yield | Coef. Std. Err. z P>|z| [95%
Conf. Interval]
----------------+----------------------------------------------------------------
moist | .6077128 .0124644 48.76 0.000 .583283
.6321425
|
variety |
2 | -2.844581 .8429087 -3.37 0.001 -4.496652
-1.192511
3 | -1.871277 .7903544 -2.37 0.018 -3.420343
-.3222105
4 | -.3432833 .7323671 -0.47 0.639 -1.778696
1.09213
5 | .2294764 1.121912 0.20 0.838 -1.96943
2.428383
6 | 3.46207 .690504 5.01 0.000 2.108707
4.815433
7 | -12.05622 .6729943 -17.91 0.000 -13.37527
-10.73718
8 | 1.175528 .7302294 1.61 0.107 -.2556957
2.606751
9 | -1.434985 .7917805 -1.81 0.070 -2.986846
.1168767
10 | 3.494313 1.392236 2.51 0.012 .7655807
6.223044
|
variety#c.moist |
2 | -.0354326 .0260328 -1.36 0.173 -.0864559
.0155908
3 | .1297872 .0190785 6.80 0.000 .0923941
.1671804
4 | .0264085 .0210603 1.25 0.210 -.0148689
.067686
5 | .0284318 .0240342 1.18 0.237 -.0186744
.075538
6 | .0790988 .0161154 4.91 0.000 .0475132
.1106844
7 | .1164752 .0189384 6.15 0.000 .0793566
.1535938
8 | .0791545 .0188487 4.20 0.000 .0422118
.1160972
9 | .080266 .0190236 4.22 0.000 .0429804
.1175516
10 | .0027604 .0282729 0.10 0.922 -.0526535
.0581742
|
_cons | 34.58378 .5478187 63.13 0.000 33.51007
35.65748
---------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
variety: Independent |
sd(moist) | 3.46e-15 2.27e-14 8.96e-21 1.34e-09
sd(_cons) | 1.18e-11 1.34e-07 0 .
-----------------------------+------------------------------------------------
sd(Residual) | .4833867 .0441285 .4041925 .5780976
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 0.00 Prob > chi2 = 1.0000
Note: LR test is conservative and provided only for reference.
As you can see, Stata deals with the overidentification by reporting
miniscule estimates for the random effects.
I think its a bug, wondered what lme4 would do. I predicted it would
refuse to fit at all, as soon as we declare variety as a factor
(varietyf). However, that does not happen. The full R runable example
is below, but here's the bad part:
> m.wrong <- lmer(yield ~ moist*varietyf + (moist|varietyf), data = dat)
> summary(m.wrong)
Linear mixed model fit by REML ['lmerMod']
Formula: yield ~ moist * varietyf + (moist | varietyf)
Data: dat
REML criterion at convergence: 157.7
Scaled residuals:
Min 1Q Median 3Q Max
-1.74078 -0.52638 -0.00039 0.56385 1.79231
Random effects:
Groups Name Variance Std.Dev. Corr
varietyf (Intercept) 0.3505 0.592
moist 0.3505 0.592 0.00
Residual 0.3505 0.592
Number of obs: 60, groups: varietyf, 10
Fixed effects:
Estimate Std. Error t value
(Intercept) 34.58378 0.89479 38.65
moist 0.60771 0.59222 1.03
varietyf2 -2.84458 1.32918 -2.14
varietyf3 -1.87128 1.27984 -1.46
varietyf4 -0.34328 1.22700 -0.28
varietyf5 0.22948 1.60904 0.14
varietyf6 3.46207 1.19003 2.91
varietyf7 -12.05622 1.17489 -10.26
varietyf8 1.17553 1.22509 0.96
varietyf9 -1.43498 1.28116 -1.12
varietyf10 3.49431 1.89960 1.84
moist:varietyf2 -0.03543 0.83786 -0.04
moist:varietyf3 0.12979 0.83758 0.15
moist:varietyf4 0.02641 0.83765 0.03
moist:varietyf5 0.02843 0.83777 0.03
moist:varietyf6 0.07910 0.83748 0.09
moist:varietyf7 0.11647 0.83757 0.14
moist:varietyf8 0.07915 0.83757 0.09
moist:varietyf9 0.08027 0.83757 0.10
moist:varietyf10 0.00276 0.83797 0.00
Here' s my full R
## Paul Johnson <pauljohn at ku.edu>
## 20160224
library(foreign)
dat <- read.dta("http://www.stata-press.com/data/mlmus3/wheat.dta")
## write.dta(dat, file = "whead.dta12")
## dat <- read.dta("wheat.dta12")
summary(dat)
## Variety should be an integer, so should plot, but we don't use it
dat$varietyf <- as.factor(dat$variety)
library(lme4)
## This is obviously overidentified/incoherent.
## lmer should bounce the user out.
m.wrong <- lmer(yield ~ moist*varietyf + (moist|varietyf), data = dat)
summary(m.wrong)
## Continue with the homework
m1 <- lmer(yield ~ moist + (1|varietyf), data = dat)
m2 <- lmer(yield ~ moist + (moist|varietyf), data = dat)
library(lattice)
dotplot(ranef(m2, condVar = TRUE))
## Here's a shocker
anova(m2, m1)
## Really? Look at the pictures!
## Here's a spaghetti plot.
plot(m2)
m2coef <- coef(m2)[["varietyf"]]
m2coef$variety <- rownames(m2coef)
plot(yield ~ moist, data = dat, col = dat$variety)
apply(m2coef, 1, function(x){ browser(); abline(a = x[1], b = x[2],
col = x["variety"])})
## If Var(e) is small enough, even trivial slope differences are
## "statistically significant".
--
Paul E. Johnson
Professor, Political Science Director
1541 Lilac Lane, Room 504 Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org http://crmda.ku.edu
More information about the R-sig-mixed-models
mailing list