[R-sig-ME] Simulation of Multilevel (nested) data

Fri Dec 4 15:31:43 CET 2009

Hi everyone,

I asked this question right before Thanksgiving, so I wonder if maybe it slid under the radar, so I'll ask again.  Does anyone know of any available code or actual R modules that can facilitate the process of simulating multilevel data?  

I would like to generate this data to use for some simulation for an already developed method for looking into various patterns of missingness and the subsequent models with lmer.  Specifically, I am interested in 3-level models where I can specify the correlation between predictors within and across levels.  

If this is not in the correct place, I would welcome suggestions as to where else I could look for help.

Thanks,
Chris Swoboda
University of Wisconsin - Madison

----- Original Message -----
From: r-sig-mixed-models-request at r-project.org
Date: Friday, December 4, 2009 5:01 am
Subject: R-sig-mixed-models Digest, Vol 36, Issue 5
To: r-sig-mixed-models at r-project.org

> Send R-sig-mixed-models mailing list submissions to
> 	r-sig-mixed-models at r-project.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
> 	https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> or, via email, send a message with subject or body 'help' to
> 	r-sig-mixed-models-request at r-project.org
> 
> You can reach the person managing the list at
> 	r-sig-mixed-models-owner at r-project.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of R-sig-mixed-models digest..."
> 
> 
> Today's Topics:
> 
>    1. Re: z transform versus random intercept (Daniel Ezra Johnson)
>    2. Re: lmer models-confusing results - more information! (Ben Bolker)
>    3. Convergence problem with lmer with non-centered WWheat	data
>       (Phillip Chapman)
>    4. Re: Convergence problem with lmer with non-centered	WWheat
>       data (Douglas Bates)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Thu, 3 Dec 2009 11:09:49 -0500
> From: Daniel Ezra Johnson <danielezrajohnson at gmail.com>
> Subject: Re: [R-sig-ME] z transform versus random intercept
> To: Douglas Bates <bates at stat.wisc.edu>
> Cc: r-sig-mixed-models at r-project.org
> Message-ID:
> 	<a46630750912030809j3ce9856bv8f9f49011fa7bb27 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Is another, perhaps less important, difference that the z-scaling
> approach does "handle" (or remove) heteroscedasticity between subjects
> - at least in some fashion - whereas an lmer() model with random
> intercept would assume that each subject's observations have the same
> variance?
> 
> DEJ
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Thu, 03 Dec 2009 14:43:06 -0500
> From: Ben Bolker <bolker at ufl.edu>
> Subject: Re: [R-sig-ME] lmer models-confusing results - more
> 	information!
> To: Gwyneth Wilson <gwynwilson3 at hotmail.com>
> Cc: "r-sig-mixed-models at r-project.org"
> 	<r-sig-mixed-models at r-project.org>
> Message-ID: <4B1814CA.5000802 at ufl.edu>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Gwyneth Wilson wrote:
> > I have been running lmer models in R, looking at what effects
> > reproductive success in Ground Hornbills (a South African Bird). My
> > response variable is breeding success and is binomial (0-1) and my
> > random effect is group ID. My response variables include rainfall,
> > vegetation, group size, year, nests, and proportion of open woodland.
> > 
> > 
> > I have run numerous models with success but I am confused about what
> > the outputs are. When I run my first model with all my variables (all
> > additive) then i get a low AIC value with only a few of the variables
> > being significant. When i take out the varaibles that are not
> > significant then my AIC becomes higher but I have more significant
> > variables! When I keep taking out the unsignificant variables, I am
> > left with a model that has nests, open woodland, and group size as
> > being extremely significant BUT the AIC is high! Why is my AIC value
> > increasing when I have fewer varaibles that are all significant and
> > seem to be best explaining my data? Do i look at only the AIC when
> > choosing the 'best' model or do I look at only the p-values? or both?
> > The model with the lowest AIC at the moment has the most variables
> > and most are not significant?
> 
>    This happens a lot when you have correlated variables: although I
> don't agree with absolutely everything it says, Zuur et al 2009 is a
> good start for looking at this. When you have correlated variables, it's
> easy for them collectively to explain a lot of the pattern but
> individually not to explain much.
> 
> Zuur, A. F., E. N. Ieno, and C. S. Elphick. 2009. A protocol for data
> exploration to avoid common statistical problems. Methods in Ecology and
> Evolution. doi: 10.1111/j.2041-210X.2009.00001.x.
> 
>   In general, you should *either* (1)fit all sensible models and
> model-average the results (if you are interested in prediction) or (2)
> use the full model to evaluate p-values, test hypotheses etc. (providing
> you have _already_ removed correlated variables).  In general (although
> Murtaugh 2009 provides a counterexample of sorts), you should **not**
> select a model and then (afterwards) evaluate the significance of the
> parameters in the model ...
> 
> Murtaugh, P. A. 2009. Performance of several variable-selection methods
> applied to real ecological data. Ecology Letters 12:1061-1068. doi:
> 10.1111/j.1461-0248.2009.01361.x.
> 
> 
> > 
> > Please help. Any suggestions would be great!!
> > 
> > 
> > 
> > Here is some more information and some of my outputs:
> > 
> > 
> > 
> > The first model has all my variables included and i get a low AIC
> > with only grp.sz and wood being significant:
> >
> > model1<-lmer(br.su~factor(art.n)+factor(yr)+grp.sz+rain+veg+wood+(1|grp.id),data=hornbill,family=binomial)
> > 
> >> summary(model1)
> > Generalized linear mixed model fit by the Laplace approximation 
> > Formula: br.su ~ factor(art.n) + factor(yr) + grp.sz + rain + veg +
> > wood +      (1 | grp.id) Data: hornbill AIC   BIC    logLik
> > deviance 138.5 182.3  -55.26    110.5 Random effects: Groups Name
> > Variance Std.Dev. grp.id (Intercept) 1.2913   1.1364 Number of obs:
> > 169, groups: grp.id, 23
> > 
> > Fixed effects: Estimate      Std. Error  z value  Pr(>|z|) 
> > (Intercept)      -3.930736   3.672337  -1.070    0.2845 
> > factor(art.n)1  1.462829   0.903328   1.619     0.1054 factor(yr)2002
> > -2.592315   1.764551  -1.469   0.1418 factor(yr)2003 -3.169365
> > 1.759981  -1.801   0.0717 . factor(yr)2004  0.702210   1.341524
> > 0.523   0.6007 factor(yr)2005 -2.264257   1.722130  -1.315   0.1886
> >  factor(yr)2006  2.129728   1.270996   1.676   0.0938 . 
> > factor(yr)2007 -0.579961   1.390345  -0.417   0.6766 factor(yr)2008
> > -1.062933   1.640774  -0.648   0.5171 grp.sz             1.882616
> > 0.368317   5.111   3.2e-07 *** rain                -0.005896
> > 0.003561  -1.656   0.0977 . veg                 -1.993443   1.948738
> > -1.023   0.3063 wood               6.832543   3.050573   2.240
> > 0.0251 *
> > 
> > 
> > Then i carry on and remove varaibles i think are not having an
> > influence on breeding success like the year, vegetation and rain. And
> > i get this:
> > 
> > model3<-lmer(br.su~factor(art.n)+grp.sz+wood+(1|grp.id),data=hornbill,family=binomial)
> > 
> >> summary(model3)
> > Generalized linear mixed model fit by the Laplace approximation 
> > Formula: br.su ~ factor(art.n) + grp.sz + wood + (1 | grp.id) Data:
> > hornbill AIC    BIC    logLik deviance 143.8  159.4  -66.88    133.8 
> 
> > Random effects: Groups Name        Variance Std.Dev. grp.id
> > (Intercept)     0.75607  0.86953 Number of obs: 169, groups: grp.id,
> > 23
> > 
> > Fixed effects: Estimate Std. Error  z value   Pr(>|z|) (Intercept)
> > -8.6619     1.3528   -6.403   1.52e-10 *** factor(art.n)1   1.5337
> > 0.6420    2.389    0.0169 * grp.sz            1.6631     0.2968
> > 5.604    2.09e-08 *** wood              3.2177     1.5793    2.037
> > 0.0416 *
> > 
> > So all the variables are significant but the AIC value is higher!
> > 
> > I thought that with fewer variables and they are all showing
> > significance which means they are influencing breeding success-then
> > why is my AIC higher in this model?? Do i only look at the AIC values
> > and ignore the p-values? or only look at the p-values??
> > 
> > Thanks!!
> > 
> > 
> >  _________________________________________________________________
> > 
> > 
> > [[alternative HTML version deleted]]
> > 
> > _______________________________________________ 
> > R-sig-mixed-models at r-project.org mailing list 
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
> -- 
> Ben Bolker
> Associate professor, Biology Dep't, Univ. of Florida
> bolker at ufl.edu / www.zoology.ufl.edu/bolker
> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Thu, 03 Dec 2009 15:18:17 -0700
> From: Phillip Chapman <pchapman at stat.colostate.edu>
> Subject: [R-sig-ME] Convergence problem with lmer with non-centered
> 	WWheat	data
> To: "r-sig-mixed-models at r-project.org"
> 	<r-sig-mixed-models at r-project.org>
> Message-ID: <4B183929.7020904 at stat.colostate.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> Dear All,
> 
> I am trying to run a random coefficients regression using the winter 
> wheat data from the "SAS System for Mixed Models" book. lmer appears 
> to 
> be converging to an erroneous solution for the second model (diagonal 
> 
> cov matrix for random effects).   Is there a better way to specify my 
> 
> lmer model (h1 in the text below)?  Note: this convergence problem 
> goes 
> away when I center (i.e. subtract the mean of) the continuous 
> predictor, 
> Moisture.  I would usually center predictors in random coefficients 
> models anyway.
> 
> The data frame is WWheat from the "SASmixed" package.  The response is 
> 
> Yield; the continuous predictor is Moisture; and the factor with 10 
> levels is Variety.
> 
> I have no problem with the first model in the book; it fits random 
> intercept/slope pairs for each variety.  The following three analyses 
> 
> produce almost same results, out to a reasonable number of significant 
> 
> digits:
> 
> 1.   Using lme4:       g1 <- lmer(Yield ~ Moisture + (Moisture | 
> Variety), data=WWheat)
> 
> 2.   Using nlme:        g2 <- lme(Yield ~ Moisture, random = ~ 
> Moisture 
> | Variety, data=WWheat)
> 
> 3.  Using SAS:
>    proc mixed data=wheat scoring=8;
>    class variety;
>    model yield = moist/solution;
>    random int moist/sub=variety type=un solution G Gcorr;
>    run;
> 
> The second model restricts the covariance matrix of the 
> intercept/slope 
> pairs to be diagonal.  For this model I get the same result using lme 
> 
> and SAS, but lmer converges to a different solution, which appears to 
> be 
> erroneous:
> 
> 1.  Using lme4:       h1 <- lmer(Yield ~ Moisture + (1 | Variety) + 
> (Moisture -1 | Variety), data=WWheat)
> 
> 2. Using nlme (I could only figure out how to do this after creating a 
> 
> grouped data object):
>                       WWheat2 <- groupedData(Yield ~ Moisture | 
> Variety, 
> data=WWheat)
>                        h2 <- lme(Yield ~ Moisture, random =  pdDiag(~ 
> 
> Moisture), data=WWheat2)
> 
> 3. Using SAS:
>    proc mixed data=wheat scoring=8;
>    class variety;
>    model yield = moist/solution;
>    random int moist/sub=variety type=un(1) solution G Gcorr;
>    run;
> 
> I took the variance estimates from lmer and put them into SAS Proc 
> Mixed 
> to see if the REML deviances match: they do. Both SAS and lmer give 
> the 
> same deviance of 205.7, which is much greater than the optimum value 
> of 
> 187.1 produced by SAS and lme.  The fixed effects estimates from SAS 
> and 
> lmer also match when these variance estimates are used.
> proc mixed data=wheat scoring=8;
>    class variety;
>    model yield = moist/solution;
>    random int moist/sub=variety type=un(1) solution G Gcorr;
>   parms (1.6854e+01)(0) ( 2.6133e-16) (7.5295e-01)/noiter; *the 
> solution 
> from lmer;
>   run;
> 
> Thanks,
> 
> Phil Chapman
> 
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Thu, 3 Dec 2009 17:12:30 -0600
> From: Douglas Bates <bates at stat.wisc.edu>
> Subject: Re: [R-sig-ME] Convergence problem with lmer with
> 	non-centered	WWheat data
> To: Phillip Chapman <pchapman at stat.colostate.edu>
> Cc: "r-sig-mixed-models at r-project.org"
> 	<r-sig-mixed-models at r-project.org>
> Message-ID:
> 	<40e66e0b0912031512l1dd01ff8yca47c39666968711 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> On Thu, Dec 3, 2009 at 4:18 PM, Phillip Chapman
> <pchapman at stat.colostate.edu> wrote:
> > Dear All,
> >
> > I am trying to run a random coefficients regression using the winter 
> wheat
> > data from the "SAS System for Mixed Models" book. lmer appears to be
> > converging to an erroneous solution for the second model (diagonal cov
> > matrix for random effects). ? Is there a better way to specify my 
> lmer model
> > (h1 in the text below)? ?Note: this convergence problem goes away 
> when I
> > center (i.e. subtract the mean of) the continuous predictor, 
> Moisture. ?I
> > would usually center predictors in random coefficients models anyway.
> >
> > The data frame is WWheat from the "SASmixed" package. ?The response 
> is
> > Yield; the continuous predictor is Moisture; and the factor with 10 
> levels
> > is Variety.
> >
> > I have no problem with the first model in the book; it fits random
> > intercept/slope pairs for each variety. ?The following three analyses
> > produce almost same results, out to a reasonable number of significant
> > digits:
> >
> > 1. ? Using lme4: ? ? ? g1 <- lmer(Yield ~ Moisture + (Moisture | Variety),
> > data=WWheat)
> >
> > 2. ? Using nlme: ? ? ? ?g2 <- lme(Yield ~ Moisture, random = ~ 
> Moisture |
> > Variety, data=WWheat)
> >
> > 3. ?Using SAS:
> > ?proc mixed data=wheat scoring=8;
> > ?class variety;
> > ?model yield = moist/solution;
> > ?random int moist/sub=variety type=un solution G Gcorr;
> > ?run;
> >
> > The second model restricts the covariance matrix of the intercept/slope
> > pairs to be diagonal. ?For this model I get the same result using 
> lme and
> > SAS, but lmer converges to a different solution, which appears to be
> > erroneous:
> >
> > 1. ?Using lme4: ? ? ? h1 <- lmer(Yield ~ Moisture + (1 | Variety) +
> > (Moisture -1 | Variety), data=WWheat)
> > 2. Using nlme (I could only figure out how to do this after creating 
> a
> > grouped data object):
> > ? ? ? ? ? ? ? ? ? ? WWheat2 <- groupedData(Yield ~ Moisture | Variety,
> > data=WWheat)
> > ? ? ? ? ? ? ? ? ? ? ?h2 <- lme(Yield ~ Moisture, random = ?pdDiag(~
> > Moisture), data=WWheat2)
> >
> > 3. Using SAS:
> > ?proc mixed data=wheat scoring=8;
> > ?class variety;
> > ?model yield = moist/solution;
> > ?random int moist/sub=variety type=un(1) solution G Gcorr;
> > ?run;
> >
> > I took the variance estimates from lmer and put them into SAS Proc 
> Mixed to
> > see if the REML deviances match: they do. Both SAS and lmer give the 
> same
> > deviance of 205.7, which is much greater than the optimum value of 187.1
> > produced by SAS and lme. ?The fixed effects estimates from SAS and 
> lmer also
> > match when these variance estimates are used.
> > proc mixed data=wheat scoring=8;
> > ?class variety;
> > ?model yield = moist/solution;
> > ?random int moist/sub=variety type=un(1) solution G Gcorr;
> > ?parms (1.6854e+01)(0) ( 2.6133e-16) (7.5295e-01)/noiter; *the 
> solution from
> > lmer;
> > ?run;
> 
> It does look like a convergence problem because of the 2.6133e-16 for
> the variance of the random effects for slope.  Could you tell us what
> version of the lme4 package you are using?  It is easiest just to send
> the output of sessionInfo()
> 
> This problem will occur because of the need to use a constrained
> nonlinear optimizer to determine the ML or REML estimates.  The
> evaluation of the deviance is done very carefully so as to give the
> optimizer a good shot at the getting the optimum but it doesn't always
> work.
> 
> In the development branch of the lme4 package (the lme4a branch) there
> is the capability of switching to a different optimizer.  I have found
> the bobyqa optimizer in the package by Kate Mullen and John Nash to be
> more robust and faster in most cases than is nlminb, which is the
> default. However, even nlminb converges to the correct optimum in that
> branch.
> 
> 
> 
> ------------------------------
> 
> _______________________________________________
> R-sig-mixed-models mailing list
> R-sig-mixed-models at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
> End of R-sig-mixed-models Digest, Vol 36, Issue 5
> *************************************************