[R-sig-ME] Number of random effects estimated with different lmer specifications

Wed May 8 10:51:12 CEST 2024

Hello!

I'm confused about an error the lmer function sometimes gives: "Error: number of observations (=n) <= number of random effects (=n) for term (x| id); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable".
Regarding this error, I'm confused about how the "number of random effects" is defined. My naive understanding was that if you have, say, 10 clusters, then a random intercept model estimates 10 random effects (i.e., a random intercept for all clusters). If you have one random intercept and one random slope, the model would estimate 10+10+1=21 random effects (random intercept and slope for all participants plus the correlation between them). With 2 random intercepts and 2 slopes it would be 10+10+10+10+2 (or 3, depending on the exact random effects syntax)=42 (or 43).

However, experimenting a little showed me that the correlations between random effects do not seem to be included in the number of random effects, so I'll forget them for now.

My main puzzlement comes from this: I generated a simple dataset of 10 clusters ("id") and 3 observations per cluster ("time"), as well as a level 1 predictor ("x") and ran the following models:

mod1<-lmer(y ~ x + time + (time|id) + (x|id), data=d)

This model converges (though the correlations between random effects are 1 and -1, which is probably just due to my sloppy data generation process) with no errors or warnings.

Then I ran this model:

mod2<-lmer(y ~ x + time + (time + x |id), data=d)

This model doesn't run at all; I get the error:

Error: number of observations (=30) <= number of random effects (=30) for term (time + x | id); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable

********************
I thought that the first model would estimate 40 random effects (two intercepts and two slopes for each of the 10 clusters), and the second model would estimate 30 (1 intercept and 2 slopes for each cluster). This seems to be correct regarding the second model, but why is the first model seemingly estimating fewer random effects (fewer than 40, and apparently also fewer than 30)?

I do apologize if this is very basic; I don't have a math or proper stats background, just applied stats. I did read the manual and several online discussions regarding this error before posting. I didn't find the answer in the manual (this may well be due to my own incompetence), and the online discussions (e.g. https://stats.stackexchange.com/questions/193678/number-of-random-effects-is-not-correct-in-lmer-model) seem to support my initial intuition, which, however, is clearly wrong. What am I missing?

Thank you in advance if someone can help!

-Sointu

*********************
My code for the above:

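library(lme4)   # provides lmer()

set.seed(12345)
id1<-c(1:10)
id<-rep(id1, each=3)      # 10 clusters, 3 rows each
t<-c(1:3)
time<-rep(t, times=10)    # 3 time points per cluster
x<-rnorm(30, 3,1)         # level 1 predictor
err<-rnorm(10,0,1)        # one random intercept per cluster
err2<-rep(err, each=3)
y<-3+0.2*x+0.3*time+rnorm(30)+err2
d<-data.frame(id, time, x, y)
mod1<-lmer(y ~ x + time + (time|id) + (x|id),data=d)
summary(mod1)
mod2<-lmer(y ~ x + time + (time + x|id),data=d)   # stops with the error quoted above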
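
The error message names a single term, which suggests that the comparison with the number of observations might be made separately for each random-effects term rather than for the model as a whole. Below is a minimal sketch of how the per-term counts could be inspected. It assumes that lme4's lFormula() returns one block per term in reTrms$Ztlist and that lmerControl(check.nobs.vs.nRE = "ignore") switches off the check that produces the error; the names ctrl, lf1, lf2, n_re_mod1 and n_re_mod2 are only illustrative, and the data are regenerated with the same layout but not the same random draws as above.

```r
library(lme4)

# Same layout as the example data: 10 clusters, 3 observations per cluster
set.seed(12345)
d <- data.frame(id   = rep(1:10, each = 3),
                time = rep(1:3, times = 10),
                x    = rnorm(30, 3, 1))
d$y <- 3 + 0.2 * d$x + 0.3 * d$time + rnorm(30) + rep(rnorm(10), each = 3)

# Assumption: with this check set to "ignore", lFormula() builds the
# random-effects structure instead of stopping with the error
ctrl <- lmerControl(check.nobs.vs.nRE = "ignore")

# Parse the two specifications without fitting anything
lf1 <- lFormula(y ~ x + time + (time | id) + (x | id), data = d, control = ctrl)
lf2 <- lFormula(y ~ x + time + (time + x | id), data = d, control = ctrl)

# Each element of Ztlist is the transposed design matrix of one term;
# its row count is (levels of id) x (intercept + slopes in that term)
n_re_mod1 <- sapply(lf1$reTrms$Ztlist, nrow)
n_re_mod2 <- sapply(lf2$reTrms$Ztlist, nrow)

print(n_re_mod1)   # one count per term of the first specification
print(n_re_mod2)   # a single count for the combined term
print(nrow(d))     # number of observations to compare against
```

If the per-term reading is right, each count printed for the first specification would be compared with the number of observations on its own, whereas the second specification has only one, larger term.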
