[R] Simulate binary data for a logistic regression Monte Carlo
Ben Bolker
bolker at ufl.edu
Wed Apr 8 03:27:12 CEST 2009
Rolf Turner-3 wrote:
>
>
> On 8/04/2009, at 1:26 AM, jjh21 wrote:
>
>>
>> Hello,
>>
>> I am trying to simulate binary outcome data for a logistic
>> regression Monte
>> Carlo study. I need to eventually be able to manipulate the
>> structure of the
>> error term to give groups of observations a random effect. Right
>> now I am
>> just doing a very basic set up to make sure I can recover the
>> parameters
>> properly. I am running into trouble with the code below. It works
>> if you
>> take out the object 'epsilon,' but I need that object because it is
>> where I
>> plan to add in the random effects eventually. Any suggestions? Thanks.
>
> Well, it's the epsilon term that's doing you in. The model
> that you are fitting takes no cognizance of this random effect.
> You can *probably* fit a model which does take cognizance of it
> using one of the variants of glm() (available in R) which include
> random effects, but don't ask me how!
>
> I don't know if it is possible to work out analytically what the
> bias should be if the epsilon term is ignored when fitting the
> model, but it might be. Or an approximation might be achieved.
>
> Good luck.
>
> cheers,
>
> Rolf Turner
>
I agree that that the individual-level random effect is probably the issue.
I played with this some today but didn't manage to resolve it --
tried JAGS/R2jags and glmer from lme4 but didn't manage to
get an estimate of epsilon that matched the input value. I'm
a little worried about binary data with an underlying random
effect, I think there's an identifiability problem there ...
set.seed(1)
nsim <- 100
N <- 1000
alpha <- alpha.lmer <- numeric(nsim) # Vector to store coefficient estimates
X1 <- X1.lmer <- numeric(nsim)
X2 <- X2.lmer <- numeric(nsim)
eps.lmer <- numeric(nsim)
library(lme4)
a <- 0 # True parameters
B1 <- .5
B2 <- .85
pb <- txtProgressBar()
for (i in 1:nsim){
setTxtProgressBar(pb,i/nsim)
x1 <- rnorm(N)
x2 <- rnorm(N)
epsilon <- rnorm(N)
LP <- a + B1*x1 + B2*x2 + epsilon # Linear predictor
y <- rbinom(length(LP), 1, plogis(LP)) # Binomial link
indiv <- 1:N
model <- glm(y ~ x1 + x2, family = binomial)
model2 <- glmer(y~ x1 +x2 +(1|indiv), family=binomial)
##
alpha[i] <- coef(model)[1]
X1[i] <- coef(model)[2]
X2[i] <- coef(model)[3]
##
alpha.lmer[i] <- fixef(model2)[1]
X1.lmer[i] <- fixef(model2)[2]
X2.lmer[i] <- fixef(model2)[3]
eps.lmer[i] <- attr(VarCorr(model2)$indiv,"stddev")
}
close(pb)
mean(X1) # Shows a bias downward (should be about .5)
mean(X2) # Shows a bias downward (should be about .85)
mean(alpha) # Pretty close (should be about 0)
res <- data.frame(X1,X2,alpha,X1.lmer,X2.lmer,alpha.lmer,eps.lmer)
summary(res)
means <- colMeans(res)
ses <- sapply(res,sd)/sqrt(nsim)
library(plotrix)
plotCI(1:7,means[c(1,4,2,5,3,6,7)],ses[c(1,4,2,5,3,6,7)],
ylim=c(0,1))
points(c(1.5,3.5,5.5,7),c(B1,B2,a,1),pch=16,col=2)
library(R2jags)
j1 <- jags(data=c("N","y","x1","x2"),
inits=list(list(a=0,B1=0.5,B2=0.85)),
n.iter=8000,
n.chain=1,parameters=c("a","B1","B2","tau.eps"),
model.file="logistmc.txt")
traceplot(j1,ask=FALSE,mfrow=c(2,2))
## logistmc.txt:
model {
for (i in 1:N) {
LP[i] <- a + B1*x1[i]+B2*x2[i]+epsilon[i]
prob[i] <- 1/(1+exp(-LP[i]))
y[i] ~ dbern(prob[i])
epsilon[i] ~ dnorm(0,tau.eps)
}
zero <- 0
tau.eps ~ dgamma(0.1,0.1)
a ~ dnorm(0,0.01)
B1 ~ dnorm(0,0.01)
B2 ~ dnorm(0,0.01)
}
--
View this message in context: http://www.nabble.com/Simulate-binary-data-for-a-logistic-regression-Monte-Carlo-tp22928872p22941504.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list