# [R-sig-ME] Troublesome example of lmer fitting an overidentified model; Stata bad too.

Paul Johnson pauljohn32 at gmail.com
Wed Feb 24 18:12:00 CET 2016

```I'm teaching mixed models, working through a lot of homework problems
in the 2 volume set Multilevel and Longitudinal Modeling using Stata,
by S Rave-Hesketh and A. Skrondal. If you did not look at it yet, I
expect you'll find lots of useful, interesting exercises.

Their Exercise 4.4 is about yield from 10 wheat varieties.  The variables are:

"plot"     "variety"  "yield"    "moist"

variety is the grouping variable in the exercise, but it comes in as a
floating point number. I thought that was the root of the surprise
described here, but now I don't think so.

I was reading one student's homework in Stata and I was
stunned/surprised that Stata's "mixed" function would fit a random
effects model that included the grouping variable as a fixed effect
predictor as well.  This appeared to be grossly overidentified to me.

Here's the stata input and output,

. mixed yield c.moist##i.variety || variety:moist, mle stddev

Performing EM optimization:

Iteration 0:   log likelihood = -42.123671
Iteration 1:   log likelihood = -41.540417
Iteration 2:   log likelihood = -41.520012
Iteration 3:   log likelihood = -41.520012

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =         60
Group variable: variety                         Number of groups  =         10

Obs per group:
min =          6
avg =        6.0
max =          6

Wald chi2(19)     =   37795.42
Log likelihood = -41.520012                     Prob > chi2       =     0.0000

---------------------------------------------------------------------------------
yield |      Coef.   Std. Err.      z    P>|z|     [95%
Conf. Interval]
----------------+----------------------------------------------------------------
moist |   .6077128   .0124644    48.76   0.000      .583283
.6321425
|
variety |
2  |  -2.844581   .8429087    -3.37   0.001    -4.496652
-1.192511
3  |  -1.871277   .7903544    -2.37   0.018    -3.420343
-.3222105
4  |  -.3432833   .7323671    -0.47   0.639    -1.778696
1.09213
5  |   .2294764   1.121912     0.20   0.838     -1.96943
2.428383
6  |    3.46207    .690504     5.01   0.000     2.108707
4.815433
7  |  -12.05622   .6729943   -17.91   0.000    -13.37527
-10.73718
8  |   1.175528   .7302294     1.61   0.107    -.2556957
2.606751
9  |  -1.434985   .7917805    -1.81   0.070    -2.986846
.1168767
10  |   3.494313   1.392236     2.51   0.012     .7655807
6.223044
|
variety#c.moist |
2  |  -.0354326   .0260328    -1.36   0.173    -.0864559
.0155908
3  |   .1297872   .0190785     6.80   0.000     .0923941
.1671804
4  |   .0264085   .0210603     1.25   0.210    -.0148689
.067686
5  |   .0284318   .0240342     1.18   0.237    -.0186744
.075538
6  |   .0790988   .0161154     4.91   0.000     .0475132
.1106844
7  |   .1164752   .0189384     6.15   0.000     .0793566
.1535938
8  |   .0791545   .0188487     4.20   0.000     .0422118
.1160972
9  |    .080266   .0190236     4.22   0.000     .0429804
.1175516
10  |   .0027604   .0282729     0.10   0.922    -.0526535
.0581742
|
_cons |   34.58378   .5478187    63.13   0.000     33.51007
35.65748
---------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
variety: Independent         |
sd(moist) |   3.46e-15   2.27e-14      8.96e-21    1.34e-09
sd(_cons) |   1.18e-11   1.34e-07             0           .
-----------------------------+------------------------------------------------
sd(Residual) |   .4833867   .0441285      .4041925    .5780976
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 0.00                  Prob > chi2 = 1.0000

Note: LR test is conservative and provided only for reference.

As you can see, Stata deals with the overidentification by reporting
miniscule estimates for the random effects.

I think its a bug, wondered what lme4 would do. I predicted it would
refuse to fit at all, as soon as we declare variety as a factor
(varietyf).  However, that does not happen. The full R runable example
is below, but here's the bad part:

> m.wrong <- lmer(yield ~ moist*varietyf + (moist|varietyf), data = dat)
> summary(m.wrong)
Linear mixed model fit by REML ['lmerMod']
Formula: yield ~ moist * varietyf + (moist | varietyf)
Data: dat

REML criterion at convergence: 157.7

Scaled residuals:
Min       1Q   Median       3Q      Max
-1.74078 -0.52638 -0.00039  0.56385  1.79231

Random effects:
Groups   Name        Variance Std.Dev. Corr
varietyf (Intercept) 0.3505   0.592
moist       0.3505   0.592    0.00
Residual             0.3505   0.592
Number of obs: 60, groups:  varietyf, 10

Fixed effects:
Estimate Std. Error t value
(Intercept)       34.58378    0.89479   38.65
moist              0.60771    0.59222    1.03
varietyf2         -2.84458    1.32918   -2.14
varietyf3         -1.87128    1.27984   -1.46
varietyf4         -0.34328    1.22700   -0.28
varietyf5          0.22948    1.60904    0.14
varietyf6          3.46207    1.19003    2.91
varietyf7        -12.05622    1.17489  -10.26
varietyf8          1.17553    1.22509    0.96
varietyf9         -1.43498    1.28116   -1.12
varietyf10         3.49431    1.89960    1.84
moist:varietyf2   -0.03543    0.83786   -0.04
moist:varietyf3    0.12979    0.83758    0.15
moist:varietyf4    0.02641    0.83765    0.03
moist:varietyf5    0.02843    0.83777    0.03
moist:varietyf6    0.07910    0.83748    0.09
moist:varietyf7    0.11647    0.83757    0.14
moist:varietyf8    0.07915    0.83757    0.09
moist:varietyf9    0.08027    0.83757    0.10
moist:varietyf10   0.00276    0.83797    0.00

Here' s my full R

## Paul Johnson <pauljohn at ku.edu>
## 20160224
library(foreign)
## write.dta(dat, file = "whead.dta12")
## dat <- read.dta("wheat.dta12")
summary(dat)
## Variety should be an integer, so should plot, but we don't use it
dat\$varietyf <- as.factor(dat\$variety)

library(lme4)

## This is obviously overidentified/incoherent.
## lmer should bounce the user out.
m.wrong <- lmer(yield ~ moist*varietyf + (moist|varietyf), data = dat)
summary(m.wrong)

## Continue with the homework
m1 <-  lmer(yield ~ moist + (1|varietyf), data = dat)

m2 <- lmer(yield ~ moist + (moist|varietyf), data = dat)
library(lattice)
dotplot(ranef(m2, condVar = TRUE))

## Here's a shocker
anova(m2, m1)
## Really? Look at the pictures!

## Here's a spaghetti plot.
plot(m2)
m2coef <- coef(m2)[["varietyf"]]
m2coef\$variety <- rownames(m2coef)
plot(yield ~ moist, data = dat, col = dat\$variety)
apply(m2coef, 1, function(x){ browser();  abline(a = x, b = x,
col = x["variety"])})

## If Var(e) is small enough, even trivial slope differences are
## "statistically significant".

--
Paul E. Johnson
Professor, Political Science        Director
1541 Lilac Lane, Room 504      Center for Research Methods
University of Kansas                 University of Kansas
http://pj.freefaculty.org              http://crmda.ku.edu

```