[R-SIG-Mac] would parallel computing help? - summary of responses

Tue Mar 8 10:27:59 CET 2011

I want to thank Ken Beath and Prof. Ripley for replying to my query.  
For anyone else who might be interested, and as not all emails were cc-ed to r-sig-mac, the responses are copied below along with my reply.
Thanks,
Alan Kelly

Original posting:
> Dear all, I'm running a number of Bayesian binomial regression models using JAGS (interfacing with R via R2jags) on a Mac server with a quad-core processor running at 2.66 GHz with 6 GB memory under Snow Leopard (session info below).  As the models contain around 30 predictors and between 5 and 15 thousand observations, the time required to run a single model with 3 chains with an adequate number of iterations to ensure convergence is around 2 hours.  While I can live with this for the occasional run, it will be a problem when I need to run several dozen different models.
> Perhaps some of you have relevant experience and can advise if this run time could be significantly reduced using, for example, one of the parallel computing packages?  And if so, which one?  I should add that I'm not clear if jags can directly avail of multicore processing even if available - it might be necessary to program a Gibbs or Metropolis sampler directly in R.....
> Any thoughts/suggestions?
> Best wishes,
> Alan Kelly

Reply by Ken
__________________________

It would be easier to run multiple copies of R.

Ken
__________________________

Reply by Prof. Ripley
__________________________
This is an example of 'embarrassingly parallel' computation.  Simply 
run each chain in a separate process in parallel.  Packages such as 
snow or multicore can organize that for you.

However, if you mean logistic regression (there are other binomial 
regressions such as probit), first check how you are doing this in 
JAGS.  Using 'module glm' often makes a large difference in speed, and 
my recollection is that this is still not particularly fast compared 
to, say, MCMCpack. And in any case the recommended way to run JAGS 
with R is rjags (recommended by the author of JAGS, amongst others).
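
The "one chain per process" idea can be sketched in a few lines of R. This is a minimal illustration, not code from the reply: it uses the base 'parallel' package (which came after this thread and grew out of snow and multicore), toy simulated data, and a hand-written random-walk Metropolis sampler for a logistic model in place of JAGS. The names run_chain and log_post, the prior, and the tuning values are all assumptions made for the example. With JAGS, the body of run_chain would instead fit one chain via rjags.

```r
# Sketch only: one chain per worker process, using the base 'parallel' package.
library(parallel)

set.seed(1)
n <- 500
x <- cbind(1, rnorm(n))                      # intercept + one predictor (toy data)
beta_true <- c(-0.5, 1)
y <- rbinom(n, 1, plogis(x %*% beta_true))

# Log-posterior for logistic regression with an assumed N(0, 10^2) prior on each beta
log_post <- function(beta, x, y) {
  eta <- drop(x %*% beta)
  sum(y * eta - log1p(exp(eta))) + sum(dnorm(beta, 0, 10, log = TRUE))
}

# One random-walk Metropolis chain; 'seed' gives each chain its own random stream
run_chain <- function(seed, n_iter = 2000, step = 0.15) {
  set.seed(seed)
  beta <- rnorm(ncol(x))                     # dispersed starting values
  lp <- log_post(beta, x, y)
  draws <- matrix(NA_real_, n_iter, ncol(x))
  for (i in seq_len(n_iter)) {
    prop <- beta + rnorm(length(beta), 0, step)
    lp_prop <- log_post(prop, x, y)
    if (log(runif(1)) < lp_prop - lp) {      # Metropolis accept/reject step
      beta <- prop
      lp <- lp_prop
    }
    draws[i, ] <- beta
  }
  draws
}

# mclapply forks one process per chain on macOS/Linux; Windows cannot fork,
# so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 3L
chains <- mclapply(1:3, run_chain, mc.cores = n_cores)

# Posterior means per chain after discarding the first half as burn-in
post_means <- sapply(chains, function(d) colMeans(d[-(1:1000), ]))
print(round(post_means, 2))
```

Because the chains share nothing, the wall-clock time for three chains should be close to that of one, provided at least three cores are free.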

Follow-up query by Prof Ripley
__________________________

> It would be easier to run multiple copies of R.

What do you think the various 'parallel computing packages' actually 
do?

Comment by Ken
__________________________
They run multiple copies of R, but for a one-off job it seems a lot easier just to open several terminal windows.

Ken

My reply
__________________________
Brian - many thanks for this.  Some of the models are indeed logistic regression, but many are binomial. 
I did discover the enormous advantage of loading module "glm" after the original run took some 24 hours for a logistic model!
MCMClogit can deliver the answer in around 2-3 minutes.  Clearly a nice solution in many circumstances. However, MCMClogit does not facilitate some within-model programming to test various parameters, quality of fit, etc. (as in Chapter 24 of Gelman and Hill), although some of these issues can certainly be tackled with the posterior distribution for the betas. I'm considering my options at this stage before attempting the full range of analyses required.
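
As a rough illustration of working from the posterior draws of the betas, the sketch below runs a simple posterior predictive check on the total number of successes. Everything in it is an assumption made for the example: the data are simulated, and the matrix 'draws' is a stand-in built from a normal approximation around a glm fit rather than real MCMC output. With a real sampler, 'draws' would be its matrix of sampled coefficients (one row per iteration).

```r
# Sketch only: 'draws' stands in for posterior samples of the betas.
set.seed(2)
n <- 300
x <- cbind(1, rnorm(n))                      # intercept + one predictor (toy data)
y <- rbinom(n, 1, plogis(x %*% c(0.3, -0.8)))

# Stand-in for MCMC output: normal approximation around the maximum-likelihood fit
fit <- glm(y ~ x[, 2], family = binomial)
R <- chol(vcov(fit))                         # upper triangular, t(R) %*% R = vcov
draws <- matrix(rnorm(1000 * 2), 1000, 2) %*% R
draws <- sweep(draws, 2, coef(fit), "+")     # centre the draws on the estimates

# Fitted probabilities for every draw: rows = draws, columns = observations
p <- plogis(draws %*% t(x))

# Replicate the data once per draw and compare total successes with the observed total
y_rep_total <- apply(p, 1, function(pr) sum(rbinom(n, 1, pr)))
ppp <- mean(y_rep_total >= sum(y))           # posterior predictive p-value
print(ppp)
```

A value of ppp near 0 or 1 would suggest the model reproduces this summary of the data poorly; other test quantities can be swapped in for the total.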

In response to your follow-on question about what parallel packages actually do - well I have no direct experience of this in R (although I have used parallel Mathematica across linked servers previously) - but my supposition is that I can gain some advantage in time if one of the parallel computing packages can handle the multiple chains, and from your previous reply it clearly can.
Many thanks for this,
Alan