[R-sig-ME] simulation
Scott Raynaud
scott.raynaud at yahoo.com
Mon Dec 19 15:36:59 CET 2011
I played around with this some more and did some simple simulations of 75
level 2 units pairing temprature with each of my other predictors individually.
Still nothing converged. Everything is fine with the full model minus temperature.
Temperature is an imprtant predictor which I want to include. Can anyone think
of anything else I can try besides omitting temperature?
----- Original Message -----
From: Scott Raynaud <scott.raynaud at yahoo.com>
To: "r-sig-mixed-models at r-project.org" <r-sig-mixed-models at r-project.org>
Cc:
Sent: Friday, December 16, 2011 7:33 PM
Subject: simulation
I'm using an R program (which I did not write) to simulate multilevel data
(subjects in locations) used in power calculations. It uses lmer to fit a
mixed logistic model to the simulated data based on inputs of means,
variances, slopes and proportions:
###~~~~~~~~~~~~~~~~~ Required packages ~~~~~~~~~~~~~~~~~~~~~###
library(MASS)
library(lme4)
###~~~~~~~~~~~~~~~~~~~ Initial inputs ~~~~~~~~~~~~~~~~~~~~###
set.seed(12345)
siglevel<-0.03
z1score<-abs(qnorm(siglevel))
simus<-1000
n1list<-matrix(c(800,2239,3678,5117,6557,7996, 267,746,1226,1706,2186,2665, 160,448,736,1023,1311,1599, 114,320,525,731,937,1142),nrow=6,ncol=4)
sizeclass<-matrix(c( 5, 4, 3, 3, 3, 2, 15,12, 9, 9, 9, 6, 25,20,15,15,15,10, 35,28,21,21,21,14),nrow=6,ncol=4)
n2low<-20
n2high<-140
n2step<-40
npred<-10
randsize<-1
beta<-c(-1.586,-2.469,-1.197,-2.194,-2.949,-3.628,-5.227,0.198,-0.371,0.202,1.634000)
betasize<-length(beta)
effectbeta<-abs(beta)
sgnbeta<-sign(beta)
xtype<-c(2.000000,1.000000,1.000000,1.000000,1.000000,2.000000,1.000000,1.000000,1.000000,1.000000)
xprob<-c(0,0.000000,0.102000,0.141900,0.244900,0.441600,0.000000,0.533200,0.826000,0.261400,0.280000)
meanpred<-c(0,3.590,0.000,0.000,0.000,0.000,1.032,0.000,0.000,0.000,0.000)
varpred<-c(0,.010,0.000,0.000,0.000,0.000,0.083,0.000,0.000,0.000,0.000)
varpred2<-c(0,0.000,0.000,0.000,0.000,0.000,0.021,0.000,0.000,0.000,0.000)
sigma2u<-matrix(c(0.250),randsize,randsize)
n1range<-as.vector(n1list)
n2range<-seq(n2low,n2high,n2step)
n1size<-1
n2size<-length(n2range)
totalsize<-n1size*n2size
finaloutput<-matrix(0,totalsize,6*betasize)
rowcount<-1
n1col<-1
##------------------- Inputs for model fitting ---------------##
fixname<-c("x0","x1","x2","x3","x4","x5","x6","x7","x8","x9","x10")
fixform<-"1+x1+x2+x3+x4+x5+x6+x7+x8+x9+x10"
randform<-"(1|l2id)"
expression<-paste(c(fixform,randform),collapse="+")
modelformula<-formula(paste("y ~",expression))
data<-vector("list",2+length(fixname))
names(data)<-c("l2id","y",fixname)
...
## +++++++++++++++++++ Set up X matrix +++++++++++++++++++ ##
micpred<-rnorm(length,meanpred[2],sqrt(varpred[2]))
macpred<-rnorm(n2,0,sqrt(varpred2[2]))
x[,2]<-micpred+macpred[l2id]
micpred<-rmultinom(length,1,c(.102,.1419,.2449,.4416,.0696))
x[,3]<-micpred[1,]
x[,4]<-micpred[2,]
x[,5]<-micpred[3,]
x[,6]<-micpred[4,]
micpred<-rnorm(length,meanpred[7],sqrt(varpred[7]))
macpred<-rnorm(n2,0,sqrt(varpred2[7]))
x[,7]<-micpred+macpred[l2id]
x[,8]<-rbinom(length,1,xprob[8])
x[,9]<-rbinom(length,1,xprob[9])
x[,10]<-rbinom(length,1,xprob[10])
x[,11]<-rbinom(length,1,xprob[11])
##--------------------------------------------------------------##
u<-mvrnorm(n2,rep(0,randsize),sigma2u)
fixpart<-x%*%beta
randpart<-rowSums(z*u[l2id,])
binomprob<-exp(fixpart+randpart)/(1+exp(fixpart+randpart))
y<-rbinom(length,1,binomprob)
##------------------- Inputs for model fitting ---------------##
data$l2id<-as.factor(l2id)
data$y<-y
data$x0<-x[,1]
data$x1<-x[,2]
data$x2<-x[,3]
data$x3<-x[,4]
data$x4<-x[,5]
data$x5<-x[,6]
data$x6<-x[,7]
data$x7<-x[,8]
data$x8<-x[,9]
data$x9<-x[,10]
data$x10<-x[,11]
###~~~~~~~~~~ Fitting the model using lmer funtion ~~~~~~~~~~###
(fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=1))
...
where l2id is the level 2 sample size and the ellipses indicate loop code.
Locations are treated as random and the model is random intercept only.
The program is set to run 1000 simulations.
I have temperature, five levels of gestational age (GA), birth wieght (BW) and four
other categorical pedictors, all binary. I scaled everything so that all my slopes are in the
range of -5.2 to 1.6 and variances from .01 to .08. I have a couple of categories
of GA that have small probabilities (<.10). I'm using a structured sampling approach
looking at 20, 60, 100, and 140 locations with a total n=75k. The first looks like this:
# groups n
5 800
4 2239
3 3678
3 5117
3 6557
2 7996
Total 20 75000
As the level 2 sizes increase, the cell sizes decrease. When I run this model in
the simulation I get:
Warning: glm.fit: algorithm did not converge
every time the model is fit (I killed this long before it ran 1000 times).
I tried increasing the number of iterations to no avail. I suspected linear
dependencies among the predictors, so I took out GA (same result), put
GA back and took out BW (same result) and then took out both GA and
BW. This ran about half the time with th other half passing warnings
such as:
Warning: glm.fit: algorithm did not converge
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
or
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
in addition to some like the original warning.
If I leave everything in but temperature, then it runs fine. I also tested the full
model separately at 50 and 75 level 2 units each with total n=75k. Nothing converged.
I want to include temperature, but I'm not sure what else to try. I realize 20 level 2
units is probably too small. I think the threshold would be crossed by 60,
however. Any suggestions?
More information about the R-sig-mixed-models
mailing list