[R-sig-ME] Estimating/fixing covariances in multi-response model with several zero-inflated variables?
Adrian Jaeggi
ajaeggi at anth.ucsb.edu
Fri Nov 28 15:19:40 CET 2014
Dear list,
I am trying to model exchanges of different commodities among
individuals. There are five commodities (meat, produce, labor,
childcare and sick care), each of which can be given and received.
Each row is a unique combination of individuals A and B but each
individual appears multiple times hence A and B are random effects.
Three of the variables are heavily zero-inflated count variables, two
of them are binary.
E.g. meat given from A to B
summary(data$MeatAToB)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 0.00 12.23 0.00 1534.00
table(data$MeatAToB == 0)
FALSE TRUE
264 2360
ppois(0, mean(data$MeatAToB))
[1] 4.866032e-06
So far my approach has been to model giving one commodity (e.g.
MeatAToB) as a function of receiving any of the other commodities
using zipoisson, with the zi intercept capturing all the excess 0's,
the variance of the zero-inflation intercept fixed using the fix=2
argument and no covariances estimated between zi and count units, i.e.
idh(trait):units (cf. MCMCglmm Course Notes 5.2.):
prior1<- list(R=list(V=diag(2),nu=0.002,fix=2),
G=list(G1=list(V=1,nu=0.002), G2=list(V=1,nu=0.002)))
model1<- MCMCglmm(MeatAToB~trait-1 + at.level(trait,1):MeatBToA +
at.level(trait,1):ProdBToA + at.level(trait,1):LaborBToA +
at.level(trait,1):ChhildCareBToA + at.level(trait,1):SickCareBToA,
rcov=~idh(trait):units,
random=~idh(at.level(trait,1)):A+idh(at.level(trait,1)):B,
family='zipoisson', prior=prior1, data=data)
For the binary variables I've used family="categorical", slice=TRUE
and priors as suggested in the Course Notes. All of these models show
reasonable mixing and biologically meaningful results and can be
extended to include random slopes etc. So far so good.
However, as it is I have to fit five seperate models for giving the
five different commodities. It is unreasonable to think that giving
one commodity is unrelated to giving another. This got me thinking of
multi-response models in which I can estimate the covariance between
the different response variables.
My idea was this:
prior2<- list(R=list(V=diag(8)*0.002,nu=9,fix=c(2,4,6)),
G=list(G1=list(V=1,nu=0.002), G2=list(V=1,nu=0.002)
model2<- MCMCglmm(cbind(MeatAToB, ProdAToB, LaborAToB, ChildCareAToB,
SickCareAToB)~trait-1 + at.level(trait,c(1,3,5,7,8)):MeatBToA +
at.level(trait,c(1,3,5,7,8)):ProdBToA +
at.level(trait,c(1,3,5,7,8)):LaborBToA +
at.level(trait,c(1,3,5,7,8)):ChildCareBToA +
at.level(trait,c(1,3,5,7,8)):SickCareBToA, rcov=us(trait):units,
random=~idh(at.level(trait,c(1,3,5,7,8))):A +
idh(at.level(trait,c(1,3,5,7,8))):B,
family=c('zipoisson','zipoisson','zipoisson','categorical','categorical'),
prior=prior2, data=data)
Trait then contains 8 columns, in the following order: 1
MeatAToB_CountPart, 2 MeatAToB_ZIPart, 3 ProdAToB_CountPart, 4
ProdAToB_ZIPart, 5 LaborAToB_CountPart, 6 LaborAToB_ZIPart, 7
ChildCareAToB, 8 SickCareAToB. I want all the fixed effects to only
interact with the count / binary parts but not with the ZI part, for
which I'm only estimating intercepts (the excess 0's); hence the
at.level(trait,c(1,3,5,7,8)) interactions
In the 8x8 variance covariance matrix of this model, I would like to
estimate covariances between the count and binary columns only and fix
the ZI parts (similar to the simple model above). My naive approach
based on the simple model was to use rcov=us(trait):units, which
estimates the full variance covariance matrix including the ZI parts,
but try to somehow fix the latter, i.e. columns 2,4,6 using the fix
argument in the prior.
However, the above code gives the following ProdAToBmessages:
Error in MCMCglmm(cbind(MeatAToB, HortCalsAToBPerDay6round, :
fix term in priorG/priorR must be at least one less than the dimension of V
In addition: Warning messages:
1: In if (prior$fix != 0) { :
the condition has length > 1 and only the first element will be used
2: In prior$fix:nrow(prior$V) :
numerical expression has 3 elements: only the first used
3: In prior$fix:nrow(prior$V) :
numerical expression has 3 elements: only the first used
4: In if (sum(CM != 0) > nrow(prior$V) & prior$fix > 1) { :
the condition has length > 1 and only the first element will be used
5: In if (prior$fix != 1) { :
the condition has length > 1 and only the first element will be used
6: In if (det(prior$V) < 1e-08 & prior$fix != 0) { :
the condition has length > 1 and only the first element will be used
7: In if ((prior$fix > y | prior$fix < x) & prior$fix != 0) { :
the condition has length > 1 and only the first element will be used
8: In split > nfl :
longer object length is not a multiple of shorter object length
So it seems like the fix argument cannot be used to do what I'm trying
to do. Running the model without fixing the ZI parts will estimate the
full 8x8 VCV matrix, which I don't think makes sense (or does it??)
because the any given data point cannot contribute information to both
the ZI and the count process (cf. Course Notes).
My question to the list is therefore if anyone can tell me how to best
specify the variance - covariance structure in this multi-response
model with several zero-inflated variables? Should I go for the full
8x8 matrix using us(trait):units or should (and can?) the ZI parts
somehow be fixed?
Many thanks in advance,
Adrian
--
Adrian Jaeggi, PhD
Postdoctoral Researcher and Lecturer
Department of Anthropology
University of California Santa Barbara
Santa Barbara, CA 93106-3210
Phone: 805-455-8587
www.adrianjaeggi.com
More information about the R-sig-mixed-models
mailing list