[R-sig-ME] Estimating/fixing covariances in multi-response model with several zero-inflated variables?

Fri Nov 28 15:19:40 CET 2014

Dear list,

I am trying to model exchanges of different commodities among  
individuals. There are five commodities (meat, produce, labor,  
childcare and sick care), each of which can be given and received.  
Each row is a unique combination of individuals A and B but each  
individual appears multiple times hence A and B are random effects.  
Three of the variables are heavily zero-inflated count variables, two  
of them are binary.

E.g. meat given from A to B

summary(data$MeatAToB)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    0.00    0.00    0.00   12.23    0.00 1534.00

table(data$MeatAToB == 0)
FALSE  TRUE
   264  2360

ppois(0, mean(data$MeatAToB))
[1] 4.866032e-06

So far my approach has been to model giving one commodity (e.g.  
MeatAToB) as a function of receiving any of the other commodities  
using zipoisson, with the zi intercept capturing all the excess 0's,  
the variance of the zero-inflation intercept fixed using the fix=2  
argument and no covariances estimated between zi and count units, i.e.  
idh(trait):units (cf. MCMCglmm Course Notes 5.2.):

prior1<- list(R=list(V=diag(2),nu=0.002,fix=2),  
G=list(G1=list(V=1,nu=0.002), G2=list(V=1,nu=0.002)))

model1<- MCMCglmm(MeatAToB~trait-1 + at.level(trait,1):MeatBToA +  
at.level(trait,1):ProdBToA + at.level(trait,1):LaborBToA +  
at.level(trait,1):ChhildCareBToA + at.level(trait,1):SickCareBToA,  
rcov=~idh(trait):units,  
random=~idh(at.level(trait,1)):A+idh(at.level(trait,1)):B,  
family='zipoisson', prior=prior1, data=data)

For the binary variables I've used family="categorical", slice=TRUE  
and priors as suggested in the Course Notes. All of these models show  
reasonable mixing and biologically meaningful results and can be  
extended to include random slopes etc. So far so good.

However, as it is I have to fit five seperate models for giving the  
five different commodities. It is unreasonable to think that giving  
one commodity is unrelated to giving another. This got me thinking of  
multi-response models in which I can estimate the covariance between  
the different response variables.

My idea was this:

prior2<- list(R=list(V=diag(8)*0.002,nu=9,fix=c(2,4,6)),  
G=list(G1=list(V=1,nu=0.002), G2=list(V=1,nu=0.002)

model2<- MCMCglmm(cbind(MeatAToB, ProdAToB, LaborAToB, ChildCareAToB,  
SickCareAToB)~trait-1 + at.level(trait,c(1,3,5,7,8)):MeatBToA +  
at.level(trait,c(1,3,5,7,8)):ProdBToA +  
at.level(trait,c(1,3,5,7,8)):LaborBToA +  
at.level(trait,c(1,3,5,7,8)):ChildCareBToA +  
at.level(trait,c(1,3,5,7,8)):SickCareBToA, rcov=us(trait):units,  
random=~idh(at.level(trait,c(1,3,5,7,8))):A +  
idh(at.level(trait,c(1,3,5,7,8))):B,  
family=c('zipoisson','zipoisson','zipoisson','categorical','categorical'),  
prior=prior2, data=data)

Trait then contains 8 columns, in the following order: 1  
MeatAToB_CountPart, 2 MeatAToB_ZIPart, 3 ProdAToB_CountPart, 4  
ProdAToB_ZIPart, 5 LaborAToB_CountPart, 6 LaborAToB_ZIPart, 7  
ChildCareAToB, 8 SickCareAToB. I want all the fixed effects to only  
interact with the count / binary parts but not with the ZI part, for  
which I'm only estimating intercepts (the excess 0's); hence the  
at.level(trait,c(1,3,5,7,8)) interactions

In the 8x8 variance covariance matrix of this model, I would like to  
estimate covariances between the count and binary columns only and fix  
the ZI parts (similar to the simple model above). My naive approach  
based on the simple model was to use rcov=us(trait):units, which  
estimates the full variance covariance matrix including the ZI parts,  
but try to somehow fix the latter, i.e. columns 2,4,6 using the fix  
argument in the prior.

However, the above code gives the following ProdAToBmessages:
Error in MCMCglmm(cbind(MeatAToB, HortCalsAToBPerDay6round,  :
   fix term in priorG/priorR must be at least one less than the dimension of V
In addition: Warning messages:
1: In if (prior$fix != 0) { :
   the condition has length > 1 and only the first element will be used
2: In prior$fix:nrow(prior$V) :
   numerical expression has 3 elements: only the first used
3: In prior$fix:nrow(prior$V) :
   numerical expression has 3 elements: only the first used
4: In if (sum(CM != 0) > nrow(prior$V) & prior$fix > 1) { :
   the condition has length > 1 and only the first element will be used
5: In if (prior$fix != 1) { :
   the condition has length > 1 and only the first element will be used
6: In if (det(prior$V) < 1e-08 & prior$fix != 0) { :
   the condition has length > 1 and only the first element will be used
7: In if ((prior$fix > y | prior$fix < x) & prior$fix != 0) { :
   the condition has length > 1 and only the first element will be used
8: In split > nfl :
   longer object length is not a multiple of shorter object length

So it seems like the fix argument cannot be used to do what I'm trying  
to do. Running the model without fixing the ZI parts will estimate the  
full 8x8 VCV matrix, which I don't think makes sense (or does it??)  
because the any given data point cannot contribute information to both  
the ZI and the count process (cf. Course Notes).

My question to the list is therefore if anyone can tell me how to best  
specify the variance - covariance structure in this multi-response  
model with several zero-inflated variables? Should I go for the full  
8x8 matrix using us(trait):units or should (and can?) the ZI parts  
somehow be fixed?

Many thanks in advance,
Adrian

-- 
Adrian Jaeggi, PhD
Postdoctoral Researcher and Lecturer
Department of Anthropology
University of California Santa Barbara
Santa Barbara, CA 93106-3210
Phone: 805-455-8587
www.adrianjaeggi.com