[R-sig-ME] MCMCglmm: predicting zeros and the rcov structure

Mon Oct 17 02:06:01 CEST 2011

Hi all,

I'm a relatively new MCMCglmm user and I've come across two obstacles
that I hope you might be able to help with. I have read the MCMCglmm
Course Notes, the Tutorial, the Overview, and relevant archived
messages from message boards about these topics, but haven't found (or
understood) answers.

My first question is regarding the posterior predictive check
described on pg.103 of the most recent Course Notes. I have a dataset
in which my response variable is count data and I have an enormous
number of zeros (about 10,000 out of 11,000 datapoints are zeros). A
couple people have advised using a zero-inflated model to compensate
for this, but after reading some comments on this topic on message
boards, I don't think I need a ZIP because I suspect I don't have two
separate processes causing zeros—just a really overdispersed dataset.
I tried using the posterior predictive check from the Course Notes to
test this hunch, and the check actually predicted that I would have
MORE zeros than I have. The Course Notes don't mention this as a
possibility and I'm not sure how to interpret this result. In the
meantime, I have just used a regular poisson model and it has seemed
to fit very well (good mixing, in contrast to the terrible mixing I
got from trying a ZIP and a ZAP, both with 4 million iterations). I am
tempted to just stick with the poisson, but if anyone has advice to
the contrary, I'm all ears.

Here's the code I used for the posterior predictive check:

prior<-list(G=list(G1=list(V=0.002, nu=1),
G2=list(V=0.002, nu=1)),
R=list(V=diag(1), nu=0.002))

modelA.1<-MCMCglmm(NumAgg~FoodPresent + AdultFPresent + AlienPresent +
Status + Tenure + Targets + SessLgth, random=~Hyena + Session,
family="poisson", data=master, prior=prior, nitt=60000, thin=40,
burnin=20000, pl=TRUE, verbose=TRUE, saveX= TRUE, DIC=TRUE)

#number of zeros in first 1000 iterations:
nz<-1:1000
#observed zeros:
oz<-sum(master$NumAgg==0)
#make sample size an object:
samplesize<-length(master$NumAgg)
#distribution of projected number of zeros:
for (i in 1:1000) {
	pred.1<-rnorm(samplesize, (modelA.1$X %*% modelA.1$Sol[i, ])@x,
sqrt(modelA.1$VCV[i]))
	nz[i]<-sum(rpois(samplesize, exp(pred.1)) == 0)
	}
hist(nz, breaks=seq(11200,12000,50))
#add the vertical line indicating the observed number of zeros to compare:
abline(v=oz)

*******
My second question concerns a new model I'm trying to run with about
1700 datapoints, a binary response variable ("BinIntens"), and roughly
an equal number of each of the two response categories. Based on the
advice in the Tutorial, I'm using family="categorical," and I also see
that the recommended rcov structure is us(trait):units. However, when
I try each of the following two model codes, I get the same error (see
below). I have three random effects: "Session," "Agg," and "Recip."
All the covariates are categorical; I have tried only using one or two
of them and this hasn't helped at all.

#prior (I have also tried other priors, but that doesn't change the
error—I've included the others as a post script to this message):
IJ <- (0.5) * (diag(1) + matrix(1, 1, 1))
prior<-list(G=list(G1=list(V=1, nu=1),
G2=list(V=1, nu=1),
G3=list(V=1, nu=1)),
R=list(V=IJ, fix=1))

#here's the first code I tried:
modelB1<-MCMCglmm(BinIntens~trait*(PrevAggYN + FoodPresent +
AdultFPresent + AlienPresent + AggStatus), random=~us(trait):Session +
~us(trait):Agg + ~us(trait):Recip, rcov=~us(trait):units,
family="categorical", data=master, prior=prior, nitt=300000, thin=130,
burnin=100000, pl=TRUE, verbose=TRUE, DIC=TRUE)

#and got this:
Error in substr(fformula, 1, closeB - 1) : invalid substring argument(s)

#here's the second code I tried:
modelB2<-MCMCglmm(BinIntens~trait - 1 + PrevAggYN + FoodPresent +
AdultFPresent + AlienPresent + AggStatus, random=~us(trait):Session +
~us(trait):Agg + ~us(trait):Recip, rcov=~us(trait):units,
family="categorical", data=master, prior=prior, nitt=300000, thin=130,
burnin=100000, pl=TRUE, verbose=TRUE, DIC=TRUE)

#and got the same error:
Error in substr(fformula, 1, closeB - 1) : invalid substring argument(s)

#I tried only including one random effect (Session) to see if random
effects were causing the error message:
modelB<-MCMCglmm(BinIntens~trait - 1 + PrevAggYN + FoodPresent +
AdultFPresent + AlienPresent + AggStatus, random=~us(trait):Session,
rcov=~us(trait):units,  family="categorical", data=master,
prior=prior, nitt=3000, thin=15, burnin=1000, pl=TRUE, verbose=TRUE,
DIC=TRUE)

#and got this different error message:
Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
  contrasts can be applied only to factors with 2 or more levels
#This error may not be relevant since I will have to include all three
random effects to avoid pseudoreplication.

###I WAS able to get the following code to run properly, but I'm
suspicious of its usefulness since it doesn't include the
rcov=~us(trait):units specification:

modelB<-MCMCglmm(BinIntens~PrevAggYN + FoodPresent + AdultFPresent +
AlienPresent + AggStatus, random=~Session + Agg + Recip,
family="categorical", data=master, prior=prior, nitt=300000, thin=130,
burnin=100000, pl=TRUE, verbose=TRUE, DIC=TRUE)

It's possible the answer to this second question is contained in the
Course Notes—I had some trouble understanding the sections on
parameter expansion with the "us" function. My apologies if this is
the case.

If there is information I haven't included here that is necessary to
address this question, please let me know and I can provide it (I'm
also happy to provide the dataset). Thanks in advance for your
help—I've found this listserv to be very informative and helpful thus
far.

-Leslie Curren

P.S. Here are the other prior options I'm considering....I'd welcome
any advice about these as well, since I'm an MCMCglmm rookie.

#three G structure options:

(1)
G=list(G1=list(V=1, nu=0.002),
G2=list(V=1, nu=0.002),
G3=list(V=1, nu=0.002))

(2)
G=list(G1=list(V=1, nu=1, alpha.mu=0, alpha.V=100),
G2=list(V=1, nu=1, alpha.mu=0, alpha.V=100),
G3=list(V=1, nu=1, alpha.mu=0, alpha.V=100))

(3)
G=list(G1=list(V=0.002, nu=1),
G2=list(V=0.002, nu=1),
G3=list(V=0.002, nu=1))

#four R structure options:

R=list(V=1, fix=1)

R=list(V=diag(2), nu=2, fix=1)

R=list(V=diag(2), nu=0.002, fix=1)

R=list(V=0.002, nu=1))

--
Leslie Curren
Ph.D. Candidate: Ecology, Evolutionary Biology, and Behavior
Michigan State University
https://www.msu.edu/user/currenle