[R] Multivariate binary response analysis

Emmanuel Charpentier emm.charpentier at free.fr
Tue Dec 14 22:27:07 CET 2010

Dear Derek, dear list,

A couple suggestions (and more) :

0) check that your problem can't be expressed in terms of what glmer() 
(lme4(a|b)? package(s)) can solve. I use this often, and am quite 
impressed with its abilities.

1) Why not modeling that directly in BUGS ? That might not be easy, will 
almost certainly not be fast to converge and will require lots of post-
convergence analysis but offers a direct solution.

Current BUGS implementations (OpenBUGS 3.1.x, JAGS 2.x) are well-
interfaced to R. My current favorite is JAGS + rjags, but this preference 
might come from years of frustration about (Win|Open)BUG on Wine on 
Linux, and the current offering of OpenBUGS + BRugs, currently in beta on 
the "Community" page of the OpenBUGS website, seems quite decent. Memory 
use might have been a  problem in the "(not so) good old days", but in a 
era of multigigabyte netbooks, I doubt this is still relevant. 
Computation time is more of a concern : although Lunn & al. (2009)(1) 
allude to a parallelized version of WinBUGS(2), I am not aware of any 
publicly-accessible parallelized BUGS interpreter, which is a shame in 
times where even said netbooks get multicore CPUs ...).

2) The nice MCMCglmm package is able to run a large subclass of DAG 
Bayesian models expressible as generalizations of (possibly 
multidependent) multivaried mixed-model GLMs, *way* faster than current 
BUGS implementations (the author claims an acceleration factor of about 
30 on his test problem, and my (limited) experience does not disprove 

It seems (to me) that your problem *might* be expressible in terms 
palatable to MCMCglm(). Be aware, however, that using this package will 
*require* some serious understanding of the author's notation used in his 
(excellent) documentation. Furthermore, if you're interested in the 
"random" effects values, you're SOL : I have not (yet) been able to 
retrieve traces of their distributions in the resultant object. But 
again, my understanding of this package is limited.

My personal bias goes to BUGS, as you might have guessed, but the "public 
health warning" that the Spiegelhalter gang put in front of the WinBUGS 
manual still apply : "MCMC sampling can be dangerous". To which I add 
"model checking, convergence assessment and validation might eat more 
time than modeling itself".


					Emmanuel Charpentier


1. David Lunn et al., “Rejoinder to commentaries on ‘The BUGS project: 
Evolution, critique and future directions’,” Statistics in Medicine 28, 
no. 25 (11, 2009): 3081-3082.
2. http://www.page-meeting.org/pdf_assets/6232-PARALLEL%
20Girgis_POSTER_final.pdf. Accessed on Dec, 14, 2010.

On Mon, 13 Dec 2010 15:31:53 -0800, Janszen, Derek B wrote :

> Greetings ~
> I need some assistance determining an appropriate approach to analyzing
> multivariate binary response data, and how to do it in R.
> The setting: Data from an assay, wherein 6-hours-post-fertilization
> zebrafish embryos (n=20, say) are exposed in a vial to a chemical
> (replicated 3 times, say), and 5 days later observed for the
> presence/absence (1/0) of defects in several organ systems (yolk sac,
> jaw, snout, body axis, pericardial edema, etc.) for each fish. The assay
> can be used as a screen (in which case a single response 'any' is first
> analyzed) as well as for comparing the responses of different classes of
> chemicals (a MANOVA-type analysis). Interest is focused on where
> response(s) are occurring, any associations among responses, and
> ultimately on possible biological mechanisms (the fish are tested for
> behavioral activity prior to this assay, and then ground up to provide
> RNA for microarray assays. A-lotta-data!).
> What I *wish* I could do is something like glm(response.matrix ~
> treat/vial, family=binomial(logit), data=zf.dat) but I know this can't
> be done. I came across the baymvb (Bayesian analysis of multivariate
> binary data) package in the R contributed packages archives, but it is
> no longer supported and the author is incommunicado. In the baymvb
> function the model is specified as single.stacked.response ~
> structure.factor + linear.model.terms. Huh? This looks suspiciously
> similar to analyzing repeated measures data in SAS as a univariate
> response with a split-plot structure (which forces the response
> covariance matrix to have a compound symmetric structure). If this is
> what's happening with this function it is definitely not appropriate.
> How about a GEE approach (I'm not familiar with the liter
>  ature, or with any of the R packages that implement it)? Any other
>  recommendations? (NB: just discovered the bayesm package. Don't know if
>  this will work for me or not.)
> Any help would be greatly appreciated. Derek Janszen, PhD
> Sr Research Biostatistician
> Computational Biology & Bioinformatics Pacific Northwest National
> Laboratory Richland, WA  99352 USA
> derek.janszen at pnl.gov
> 	[[alternative HTML version deleted]]

More information about the R-help mailing list