[R-sig-ME] parallel MCMCglmm, RNGstreams, starting values & priors

Ruben Arslan rubenarslan at gmail.com
Mon Aug 25 16:58:06 CEST 2014


Dear list,

sorry for bumping my old post, I hope to elicit a response with a more focused question:

When does MCMCglmm automatically start from different values when using doMPI/foreach?

I have done some tests with models of varying complexity. For example, the script in my last
post (using "zapoisson") yielded 40 identical chains:
> identical(mcmclist[1], mcmclist[30])
TRUE 

A simpler (?) model (using "ztpoisson" and no specified prior), however, yielded different chains 
and I could use them to calculate gelman.diag()

Changing my script to the version below, i.e. seeding foreach using .options.mpi=list( seed= 1337) 
so as to make RNGstreams reproducible (or so I  thought), led to different chains even for the 
"zapoisson" model.

In no case have I (successfully) tried to supplant the default of MCMCglmm's "start" argument.
Is starting my models from different RNGsubstreams inadequate compared to manipulating
the start argument explicitly? If so, is there any worked example of explicit starting value manipulation
in parallel computation?
I've browsed the MCMCglmm source to understand how the default starting values are generated,
but didn't find any differences with respect to RNG for the two families "ztpoisson" and "zapoisson"
(granted, I did not dig very deep).

Best regards,

Ruben Arslan


# bsub -q mpi -W 12:00 -n 41 -R np20 mpirun -H localhost -n 41 R --slave -f "/usr/users/rarslan/rpqa/rpqa_main/rpqa_children_parallel.R"

library(doMPI)
cl <- startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/rpqa_main/")
registerDoMPI(cl)
Children_mcmc1 = foreach(i=1:clusterSize(cl),.options.mpi = list(seed=1337) ) %dopar% {
	library(MCMCglmm);library(data.table)
	load("/usr/users/rarslan/rpqa/rpqa1.rdata")
	
	nitt = 130000; thin = 100; burnin = 30000
	prior.m5d.2 = list(
		R = list(V = diag(c(1,1)), nu = 0.002),
		G=list(list(V=diag(c(1,1e-6)),nu=0.002))
	)
	
	rpqa.1 = na.omit(rpqa.1[spouses>0, list(idParents, children, male, urban, spouses, paternalage.mean, paternalage.factor)])
	(m1 = MCMCglmm( children ~ trait * (male + urban + spouses + paternalage.mean + paternalage.factor),
						rcov=~us(trait):units,
						random=~us(trait):idParents,
						family="zapoisson",
						prior = prior.m5d.2,
						data=rpqa.1, 
						pr = F, saveX = F, saveZ = F,
						nitt=nitt,thin=thin,burnin=burnin))
}

library(coda)
mcmclist = mcmc.list(lapply(Children_mcmc1,FUN=function(x) { x$Sol}))
save(Children_mcmc1,mcmclist, file = "/usr/users/rarslan/rpqa/rpqa_main/rpqa_mcmc_kids_za.rdata")
closeCluster(cl)
mpi.quit()

 

On 04 Aug 2014, at 20:25, Ruben Arslan <rubenarslan at gmail.com> wrote:

> Dear list,
> 
> would someone be willing to share her or his efforts in parallelising a MCMCglmm analysis?
> 
> I had something viable using harvestr that seemed to properly initialise
> the starting values from different random number streams (which is desirable,
> as far as I could find out), but I ended up being unable to use harvestr, because
> it uses an old version of plyr, where parallelisation works only for multicore, not for
> MPI.
> 
> I pasted my working version, that does not do anything about starting values or RNG
> at the end of this email. I can try to fumble further in the dark or try to update harvestr,
> but maybe someone has gone through all this already.
> 
> I'd also appreciate any tips for elegantly post-processing such parallel data, as some of my usual
> extraction functions and routines are hampered by the fact that some coda functions
> do not aggregate results over chains. (What I get from a single-chain summary in MCMCglmm
> is a bit more comprehensive, than what I managed to cobble together with my own extraction
> functions).
> 
> The reason I'm parallelising my analyses is that I'm having trouble getting a good effective
> sample size for any parameter having to do with the many zeroes in my data.
> Any pointers are very appreciated, I'm quite inexperienced with MCMCglmm.
> 
> Best wishes
> 
> Ruben
> 
> # bsub -q mpi-short -W 2:00 -n 42 -R np20 mpirun -H localhost -n 41 R --slave -f "rpqa/rpqa_main/rpqa_children_parallel.r"
> library(doMPI)
> cl <- startMPIcluster()
> registerDoMPI(cl)
> Children_mcmc1 = foreach(i=1:40) %dopar% {
> 	library(MCMCglmm)
> 	load("rpqa1.rdata")
> 	
> 	nitt = 40000; thin = 100; burnin = 10000
> 	prior = list(
> 		R = list(V = diag(c(1,1)), nu = 0.002),
> 		G=list(list(V=diag(c(1,1e-6)),nu=0.002))
> 	)
> 	
> 	MCMCglmm( children ~ trait -1 + at.level(trait,1):male + at.level(trait,1):urban + at.level(trait,1):spouses + at.level(trait,1):paternalage.mean + at.level(trait,1):paternalage.factor,
> 		rcov=~us(trait):units,
> 		random=~us(trait):idParents,
> 		family="zapoisson",
> 		prior = prior,
> 		data=rpqa.1, 
> 		pr = F, saveX = T, saveZ = T,
> 		nitt=nitt,thin=thin,burnin=burnin)
> }
> 
> library(coda)
> mcmclist = mcmc.list(lapply(Children_mcmc1,FUN=function(x) { x$Sol}))
> save(Children_mcmc1,mcmclist, file = "rpqa_mcmc_kids_za.rdata")
> closeCluster(cl)
> mpi.quit()
> 
> 
> --
> Ruben C. Arslan
> 
> Georg August University Göttingen
> Biological Personality Psychology and Psychological Assessment
> Georg Elias Müller Institute of Psychology
> Goßlerstr. 14
> 37073 Göttingen
> Germany
> Tel.: +49 551 3920704
> https://psych.uni-goettingen.de/en/biopers/team/arslan
> 
> 
> 


	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list