[R-sig-ME] Simulating from a fitted generalised linear mixed model.

Rolf Turner r.turner at auckland.ac.nz
Tue Aug 9 07:19:48 CEST 2022

This is a second try at conveying what I am trying to do, my first try
("newdata" argument of simulate.merMod()) having failed.

John Maindonald tried to help me in respect of my first attempt, but I
think that his response missed the point.  Or maybe I'm just being
thick, as is so often the case.

My eventual goal is to take the parameters of a fitted generalised
linear mixed model (binomial) and, using those parameters, simulate data
from a different experimental design.  (Say, one with a greater number
of replicates and/or a larger number of binomial trials for each
observation.)

Note that the experimental design is determined by the predictors
in the data set in question, and the binomial "size", i.e. the
totals of successes and failures corresponding to each observation.

As I said in my previous (failed) attempt to explain myself, I thought
or hoped to be able to adjust the experimental design by providing a
"newdata" data set to simulate.merMod() where the new data set
conformed to the desired design.  However I have not been able to
get this to work.

In this current email I suppress the issue of dealing with the "newdata"
argument and simply focus on the problem of simulating data from
a fitted generalised linear mixed model.

I have implemented a "roll-your-own" procedure wherein I construct
a linear predictor using

    * fitted fixed effect coefficients from the fitted model
    * random effects simulated on the basis of the fitted
      random effects variances and covariances

I then form probabilities as the logistic function of the linear
predictor, and finally use rbinom() to generate a random sample
from these probabilities and the desired "sizes".

I illustrated this procedure in the file "demo.txt" that I attached
to my previous post.  I have reattached demo.txt to this current
post as "demo01.txt".

In demo01.txt the model is fitted (to the cbpp data from the lme4
package) using the formula:

    cbind(incidence, size - incidence) ~ 0 + period + (1 | herd)
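For reference, that fit can be reproduced directly from the built-in
cbpp data (a minimal sketch; demo01.txt has the full code):

```r
library(lme4)

## Fit the binomial GLMM: one fixed-effect coefficient per period
## (no intercept) plus a random intercept for each herd.
fit <- glmer(cbind(incidence, size - incidence) ~ 0 + period + (1 | herd),
             family = binomial, data = cbpp)

fixef(fit)    # the four period coefficients
VarCorr(fit)  # the estimated herd standard deviation
```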

The value of the linear predictor corresponding to the i-th row of the
underlying data frame is

    coef_j(i) + Z_k(i)

where coef_j(i) is the estimated fixed effect coefficient for the
j-th level of the fixed effect where the j-th level is that found
in the i-th row of the data.  Likewise Z_k(i) is the k-th entry of a
vector of N(0,sigma^2) values generated using rnorm(nrep,0,sigma) where
nrep is the number of replicates.  The index k(i) is the index of the
level of "Rep" that is found in the i-th row of the data.
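Concretely, for the cbpp fit the roll-your-own step looks like this (a
sketch; in the cbpp data the grouping factor is "herd" rather than
"Rep", and the seed value is arbitrary):

```r
library(lme4)

## Refit the demo01.txt model, then simulate by hand.
fit <- glmer(cbind(incidence, size - incidence) ~ 0 + period + (1 | herd),
             family = binomial, data = cbpp)

beta     <- fixef(fit)                         # fixed effect coefficients
sig.herd <- attr(VarCorr(fit)$herd, "stddev")  # fitted random effect SD
nherd    <- nlevels(cbpp$herd)

set.seed(42)
Z   <- rnorm(nherd, 0, sig.herd)         # one simulated effect per herd
eta <- beta[cbpp$period] + Z[cbpp$herd]  # coef_j(i) + Z_k(i), row by row
p   <- plogis(eta)                       # logistic function
sim <- rbinom(nrow(cbpp), size = cbpp$size, prob = p)
```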

For a *single* random effect the roll-your-own results agree with
the results from simulate.merMod(), provided that seeds are properly
set (and provided that the experimental design is unchanged from that of
the data set to which the model was fitted ---  no "newdata" involved.)
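The agreement check takes roughly the following shape (a sketch: my
understanding is that simulate.merMod() draws *spherical* random
effects and then scales them, so the by-hand stream is rnorm(nherd)
multiplied by sigma, with the same seed set before each call; that the
two streams line up is what demo01.txt verifies, not something I claim
in general):

```r
library(lme4)

fit <- glmer(cbind(incidence, size - incidence) ~ 0 + period + (1 | herd),
             family = binomial, data = cbpp)
beta     <- fixef(fit)
sig.herd <- attr(VarCorr(fit)$herd, "stddev")
nherd    <- nlevels(cbpp$herd)

set.seed(17)
s.mer <- simulate(fit)[[1]]   # a two-column (successes, failures) matrix

set.seed(17)
u     <- rnorm(nherd)         # spherical random effects, scaled afterwards
eta   <- beta[cbpp$period] + sig.herd * u[cbpp$herd]
s.ryo <- rbinom(nrow(cbpp), cbpp$size, plogis(eta))

## compare s.ryo with s.mer[, 1]
```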

However I cannot get roll-your-own to agree with simulate.merMod() when
there are *two* random effects in the model.  I have illustrated this in
the attached file demo02.txt.  In this example there is a numeric
predictor "x", a treatment factor "trtmnt" (with four levels) and
a random effect "batch".

The model is fitted using the formula

    cbind(Good,Bad) ~ (trtmnt + 0)/x + (x | batch)

so there are two random effects; an intercept and a slope corresponding
to each batch.

The linear predictor must (I think) take the form

    alpha_j(i) + beta_j(i)*x + Z[k(i),1] + Z[k(i),2]*x

Here Z is an nbatch x 2 matrix of randomly generated values produced
by mvrnorm(nbatch,c(0,0),Sigma) where Sigma is the estimated covariance
matrix of the random effects, obtained from the fitted model.  Of course
"nbatch" is the number of levels of the random batch effect.
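In code, the construction reads as below.  Since data.txt is not
reproduced here, the block fabricates a small data set with the same
column names purely for illustration; the names (trtmnt, batch, Good,
Bad, x) and all numeric values are my stand-ins, not the real data:

```r
library(lme4)
library(MASS)

## Hypothetical data resembling demo02.txt: four treatments, a numeric
## predictor x, and a random intercept and slope for each batch.
set.seed(1)
nbatch <- 10
dat <- expand.grid(trtmnt = factor(LETTERS[1:4]),
                   x      = seq(0, 1, by = 0.25),
                   batch  = factor(seq_len(nbatch)))
ZB   <- mvrnorm(nbatch, c(0, 0), matrix(c(0.5, 0.1, 0.1, 0.2), 2))
etaT <- -0.5 + 0.3 * dat$x + ZB[dat$batch, 1] + ZB[dat$batch, 2] * dat$x
dat$Good <- rbinom(nrow(dat), 20, plogis(etaT))
dat$Bad  <- 20 - dat$Good

fit2 <- glmer(cbind(Good, Bad) ~ (trtmnt + 0)/x + (x | batch),
              family = binomial, data = dat)

## Roll-your-own simulation from the fitted parameters.
fe    <- fixef(fit2)
alpha <- fe[paste0("trtmnt", levels(dat$trtmnt))]         # intercepts
slope <- fe[paste0("trtmnt", levels(dat$trtmnt), ":x")]   # slopes
Sigma <- VarCorr(fit2)$batch[1:2, 1:2]   # 2 x 2 covariance, attrs stripped
Z     <- mvrnorm(nbatch, c(0, 0), Sigma) # row k: (Z[k,1], Z[k,2])
k     <- as.integer(dat$batch)
eta   <- alpha[dat$trtmnt] + slope[dat$trtmnt] * dat$x +
         Z[k, 1] + Z[k, 2] * dat$x
sim   <- rbinom(nrow(dat), size = dat$Good + dat$Bad, prob = plogis(eta))
```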

The results from simulate.merMod() differ here from my roll-your-own
results, although they are not "too different".  Plotting s.mer against
s.ryo produces a scatter which is "close" to a straight line.  For some
value of "close".

It seems that simulate.merMod() is not doing (quite) what my
roll-your-own procedure does.  Why not? Am I misunderstanding what the
random effects are, in the second, slightly more complicated model?

I would like to be confident that my roll-your-own procedure is
actually correct, because I can then very easily apply it to a new
data set corresponding to a different experimental design.

I would be grateful for any enlightenment.


Rolf Turner

P.S. It is probably easier to understand the code in demo01.txt and
demo02.txt than it is to understand my somewhat prolix explanation
given above.

R. T.

Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: demo01.txt
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20220809/d3ff5e8f/attachment.txt>

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: demo02.txt
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20220809/d3ff5e8f/attachment-0001.txt>

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: data.txt
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20220809/d3ff5e8f/attachment-0002.txt>
