[R-sig-ME] Estimating/fixing covariances in multi-response model with several zero-inflated variables?

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Mon Dec 1 09:29:28 CET 2014


Dear Adrian,

Have a look at the BradleyTerry2 package. Bradley Terry models are designed the handle data where several individuals/teams 'compete' with each other.

Best regards,


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx op inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

________________________________________
Van: R-sig-mixed-models [r-sig-mixed-models-bounces op r-project.org] namens Adrian Jaeggi [ajaeggi op anth.ucsb.edu]
Verzonden: vrijdag 28 november 2014 15:19
Aan: r-sig-mixed-models op r-project.org
Onderwerp: [R-sig-ME] Estimating/fixing covariances in multi-response model with several zero-inflated variables?

Dear list,

I am trying to model exchanges of different commodities among
individuals. There are five commodities (meat, produce, labor,
childcare and sick care), each of which can be given and received.
Each row is a unique combination of individuals A and B but each
individual appears multiple times hence A and B are random effects.
Three of the variables are heavily zero-inflated count variables, two
of them are binary.

E.g. meat given from A to B

summary(data$MeatAToB)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    0.00    0.00    0.00   12.23    0.00 1534.00

table(data$MeatAToB == 0)
FALSE  TRUE
   264  2360

ppois(0, mean(data$MeatAToB))
[1] 4.866032e-06

So far my approach has been to model giving one commodity (e.g.
MeatAToB) as a function of receiving any of the other commodities
using zipoisson, with the zi intercept capturing all the excess 0's,
the variance of the zero-inflation intercept fixed using the fix=2
argument and no covariances estimated between zi and count units, i.e.
idh(trait):units (cf. MCMCglmm Course Notes 5.2.):

prior1<- list(R=list(V=diag(2),nu=0.002,fix=2),
G=list(G1=list(V=1,nu=0.002), G2=list(V=1,nu=0.002)))

model1<- MCMCglmm(MeatAToB~trait-1 + at.level(trait,1):MeatBToA +
at.level(trait,1):ProdBToA + at.level(trait,1):LaborBToA +
at.level(trait,1):ChhildCareBToA + at.level(trait,1):SickCareBToA,
rcov=~idh(trait):units,
random=~idh(at.level(trait,1)):A+idh(at.level(trait,1)):B,
family='zipoisson', prior=prior1, data=data)

For the binary variables I've used family="categorical", slice=TRUE
and priors as suggested in the Course Notes. All of these models show
reasonable mixing and biologically meaningful results and can be
extended to include random slopes etc. So far so good.

However, as it is I have to fit five seperate models for giving the
five different commodities. It is unreasonable to think that giving
one commodity is unrelated to giving another. This got me thinking of
multi-response models in which I can estimate the covariance between
the different response variables.

My idea was this:

prior2<- list(R=list(V=diag(8)*0.002,nu=9,fix=c(2,4,6)),
G=list(G1=list(V=1,nu=0.002), G2=list(V=1,nu=0.002)

model2<- MCMCglmm(cbind(MeatAToB, ProdAToB, LaborAToB, ChildCareAToB,
SickCareAToB)~trait-1 + at.level(trait,c(1,3,5,7,8)):MeatBToA +
at.level(trait,c(1,3,5,7,8)):ProdBToA +
at.level(trait,c(1,3,5,7,8)):LaborBToA +
at.level(trait,c(1,3,5,7,8)):ChildCareBToA +
at.level(trait,c(1,3,5,7,8)):SickCareBToA, rcov=us(trait):units,
random=~idh(at.level(trait,c(1,3,5,7,8))):A +
idh(at.level(trait,c(1,3,5,7,8))):B,
family=c('zipoisson','zipoisson','zipoisson','categorical','categorical'),
prior=prior2, data=data)

Trait then contains 8 columns, in the following order: 1
MeatAToB_CountPart, 2 MeatAToB_ZIPart, 3 ProdAToB_CountPart, 4
ProdAToB_ZIPart, 5 LaborAToB_CountPart, 6 LaborAToB_ZIPart, 7
ChildCareAToB, 8 SickCareAToB. I want all the fixed effects to only
interact with the count / binary parts but not with the ZI part, for
which I'm only estimating intercepts (the excess 0's); hence the
at.level(trait,c(1,3,5,7,8)) interactions

In the 8x8 variance covariance matrix of this model, I would like to
estimate covariances between the count and binary columns only and fix
the ZI parts (similar to the simple model above). My naive approach
based on the simple model was to use rcov=us(trait):units, which
estimates the full variance covariance matrix including the ZI parts,
but try to somehow fix the latter, i.e. columns 2,4,6 using the fix
argument in the prior.

However, the above code gives the following ProdAToBmessages:
Error in MCMCglmm(cbind(MeatAToB, HortCalsAToBPerDay6round,  :
   fix term in priorG/priorR must be at least one less than the dimension of V
In addition: Warning messages:
1: In if (prior$fix != 0) { :
   the condition has length > 1 and only the first element will be used
2: In prior$fix:nrow(prior$V) :
   numerical expression has 3 elements: only the first used
3: In prior$fix:nrow(prior$V) :
   numerical expression has 3 elements: only the first used
4: In if (sum(CM != 0) > nrow(prior$V) & prior$fix > 1) { :
   the condition has length > 1 and only the first element will be used
5: In if (prior$fix != 1) { :
   the condition has length > 1 and only the first element will be used
6: In if (det(prior$V) < 1e-08 & prior$fix != 0) { :
   the condition has length > 1 and only the first element will be used
7: In if ((prior$fix > y | prior$fix < x) & prior$fix != 0) { :
   the condition has length > 1 and only the first element will be used
8: In split > nfl :
   longer object length is not a multiple of shorter object length

So it seems like the fix argument cannot be used to do what I'm trying
to do. Running the model without fixing the ZI parts will estimate the
full 8x8 VCV matrix, which I don't think makes sense (or does it??)
because the any given data point cannot contribute information to both
the ZI and the count process (cf. Course Notes).

My question to the list is therefore if anyone can tell me how to best
specify the variance - covariance structure in this multi-response
model with several zero-inflated variables? Should I go for the full
8x8 matrix using us(trait):units or should (and can?) the ZI parts
somehow be fixed?

Many thanks in advance,
Adrian




--
Adrian Jaeggi, PhD
Postdoctoral Researcher and Lecturer
Department of Anthropology
University of California Santa Barbara
Santa Barbara, CA 93106-3210
Phone: 805-455-8587
www.adrianjaeggi.com

_______________________________________________
R-sig-mixed-models op r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document.
The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.



More information about the R-sig-mixed-models mailing list