[R-sig-ME] recover simulated individual-level random effects with glmer?

Thierry Onkelinx thierry.onkelinx at inbo.be
Tue Jun 23 14:11:11 CEST 2015


Dear Ian,

Just a little remark on the variance of the observation level random
effect (ORLE). sd = 3 leads to a huge effect.

- We assume ORLE ~ N(0, sigma)
- Let's look at the 2.5% quantile for small values and the 97.5% for
high values.
- Hence the 2.5% and 97.5% quantiles are low = -1.96 * sigma and high
= 1.96 * sigma.
- high - low = 3.912 * sigma
- The ORLE is estimated on the log-scale. Backtransforming the
difference give the ratio between both quantiles:
- log(high - low) = log(high/low) = 3.912* sigma
- high / low = exp(3.912 * sigma)

The last expression yields 127999 when sigma = 3. So high = 127999 * low!!!!

Bottom-line: be very, very, very careful when you use an OLRE when its
variance is large. The tricky part is that even low numbers like sigma
= 3 trigger huge effects.
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey


2015-06-22 20:47 GMT+02:00 Ben Bolker <bbolker op gmail.com>:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 15-06-22 11:36 AM, Ian Carroll wrote:
>> I raised this question yesterday on Cross Validated (
>> stats.stackexchange.com/q/158043/43122), but now realize this might
>> be the more appropriate forum for help. Please excuse the
>> cross-post if you track both.
>>
>> I simulated count data with an observation level random effect,
>> then fit a Poisson-family GLMM using glmer (lme4 version 1.1-7).
>> The fitted random intercepts show a strange pattern when compared
>> to the simulated data, and produce a kinked QQ plot. If this is
>> expected, why is it okay? If not, what am I doing wrong?
>>
>> Simulated data and fitting:
>>> n <- 1000 df <- data.frame(obs=factor(1:n)) df$x <- rnorm(n) df$r
>>> <- rnorm(n, sd=3) df$y <- rpois(n=n, lambda=exp(2*df$x + df$r))
>>> glmer.fit <- glmer(y ~ x + (1|obs), family='poisson', data=df)
>>
>> Questioned results:
>>> df$r.est <- ranef(glmer.fit)$obs[ , '(Intercept)'] plot(df$r,
>>> df$r.est) qqnorm(df$r.est)
>>
>> My results (posted on CV) are two overlapping clusters of points:
>> in one r and r.est show the expected correlation, in the other
>> there is no correlation. The normal QQ plot makes it look like
>> different normal distributions are obtained for positive and
>> negative intercepts.
>>
>> Thanks, Ian
>
>
>   I think this is inevitable.  If you think about it, the distribution
> of the *conditional* modes of the observations with y=0 must be
> different from that of the conditional modes of the obs. with y>0.
> This is much easier to see with an even more trivial model that
> doesn't include a covariate (see below ...)
>
>   I don't think actually violates any of the assumptions we're using.
> If you're worried about its effects on estimation & inference you
> could do some simulation examples to see if it affects whatever it is
> you're interested in (type I error rate, bias/variance of parameter
> estimation, etc.)
>
> library(lme4)
> library(plyr) ## for mutate
> set.seed(101)
> n <- 1000
> df <- data.frame(obs=factor(1:n),x=rnorm(n))
> df <- mutate(df,
>              r=rnorm(n, sd=3),
>              y=rpois(n=n, lambda=exp(2*x + r)),
>              y2 = rpois(n=n,lambda=exp(r)))
>
>
> glmer.fit <- glmer(y ~ x + (1|obs), family='poisson', data=df)
> glmer.fit2 <- update(glmer.fit,nAGQ=20)
> glmer.fit3 <- update(glmer.fit, y2 ~ 1 + (1|obs))
>
> ## df$r.est <- ranef(glmer.fit2)$obs[ , '(Intercept)']
> df$r.est <- ranef(glmer.fit3)$obs[ , '(Intercept)']
> cvec <- c("#000000","#FF000080")
> cvec2 <- cvec[as.numeric(df$y2==0)+1]
> plot(df$r, df$r.est,col=cvec2)
>
> qqnorm(df$r.est,col=cvec[as.numeric(df$y2==0)+1])
>
> df$y2cat <- cut(df$y2,right=FALSE,include.lowest=TRUE,
>                     breaks=c(0:6,Inf))
> library("ggplot2"); theme_set(theme_bw())
> ggplot(df,aes(r.est,fill=y))+
>     geom_histogram(alpha=0.5,position="identity",
>                    aes(y = ..density..))
> ggplot(df,aes(r.est,fill=ycat))+
>     geom_density(alpha=0.5,position="identity")
>
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
>
> iQEcBAEBAgAGBQJViFhMAAoJEOCV5YRblxUH8gMIAMiLYfW2C57AYTyrVjb660wg
> uvAkJGQjZ9ORIdm/MqDRydKTuNV16r3WfwSPXWCdLDBJjpdDq2w3dO5CxsbYjOH+
> q7L0NWdX0KRjb1oKF2muLbz38dvH6Z4ExjOmJlW3funuVzCVvzhIGV7N/iIMqp5D
> yTnK2y/YeVySuO91MqBgbKNtH/mGTfH1iGMbmhrssF1y8EfeWrw6RF7/uti8vhWE
> cvAUDtkz74Q1gfJVyeCRPVRj7ADRb6tN7FuWbqukfEoK5etqYkO5QRIv6BLR4qoU
> Hs+zPCymno6JZZDpY/D1UG5HChyXJZjFN6Xj7e+gnPtAvBUa5UTIxfy+z1cblT4=
> =Ygc1
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> R-sig-mixed-models op r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



More information about the R-sig-mixed-models mailing list