[R-sig-ME] parallel MCMCglmm, RNGstreams, starting values & priors

Jarrod Hadfield j.hadfield at ed.ac.uk
Mon Aug 25 17:29:07 CEST 2014


Hi Ruben,

I do not think the issue is with the starting values, because even if  
the same starting values were used the chains would still differ  
because of the randomness in the Markov Chain (if I interpret your  
`identical' test correctly). This just involves a call to  
GetRNGstate() in the C++ code (L 871 of MCMCglmm.cc) so I think for  
some reason doMPI/foreach is not doing what you expect. I am not  
familiar with doMPI and am in the middle of writing lectures so  
haven't got time to look into it carefully. Outside of the context of  
doMPI I get the behaviour I expect:


l<-rnorm(200, -1, sqrt(1))
t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
y<-rpois(200,exp(l)-t)+1
# generate zero-truncated data with an intercept of -1

dat<-data.frame(y=y)
set.seed(1)
m1<-MCMCglmm(y~1, family="ztpoisson", data=dat)
set.seed(2)
m2<-MCMCglmm(y~1, family="ztpoisson", data=dat)
set.seed(2)
m3<-MCMCglmm(y~1, family="ztpoisson", data=dat)

plot(mcmc.list(m1$Sol, m2$Sol))
# different, as expected
plot(mcmc.list(m2$Sol, m3$Sol))
# the same, as expected





Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25 Aug 2014  
16:58:06 +0200:

> Dear list,
>
> sorry for bumping my old post, I hope to elicit a response with a  
> more focused question:
>
> When does MCMCglmm automatically start from different values when  
> using doMPI/foreach?
>
> I have done some tests with models of varying complexity. For  
> example, the script in my last
> post (using "zapoisson") yielded 40 identical chains:
>> identical(mcmclist[1], mcmclist[30])
> TRUE
>
> A simpler (?) model (using "ztpoisson" and no specified prior),  
> however, yielded different chains
> and I could use them to calculate gelman.diag()
>
> Changing my script to the version below, i.e. seeding foreach using  
> .options.mpi=list( seed= 1337)
> so as to make RNGstreams reproducible (or so I  thought), led to  
> different chains even for the
> "zapoisson" model.
>
> In no case have I (successfully) tried to supplant the default of  
> MCMCglmm's "start" argument.
> Is starting my models from different RNGsubstreams inadequate  
> compared to manipulating
> the start argument explicitly? If so, is there any worked example of  
> explicit starting value manipulation
> in parallel computation?
> I've browsed the MCMCglmm source to understand how the default  
> starting values are generated,
> but didn't find any differences with respect to RNG for the two  
> families "ztpoisson" and "zapoisson"
> (granted, I did not dig very deep).
>
> Best regards,
>
> Ruben Arslan
>
>
> # bsub -q mpi -W 12:00 -n 41 -R np20 mpirun -H localhost -n 41 R  
> --slave -f  
> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_children_parallel.R"
>
> library(doMPI)
> cl <- startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/rpqa_main/")
> registerDoMPI(cl)
> Children_mcmc1 = foreach(i=1:clusterSize(cl),.options.mpi =  
> list(seed=1337) ) %dopar% {
> 	library(MCMCglmm);library(data.table)
> 	load("/usr/users/rarslan/rpqa/rpqa1.rdata")
>
> 	nitt = 130000; thin = 100; burnin = 30000
> 	prior.m5d.2 = list(
> 		R = list(V = diag(c(1,1)), nu = 0.002),
> 		G=list(list(V=diag(c(1,1e-6)),nu=0.002))
> 	)
>
> 	rpqa.1 = na.omit(rpqa.1[spouses>0, list(idParents, children, male,  
> urban, spouses, paternalage.mean, paternalage.factor)])
> 	(m1 = MCMCglmm( children ~ trait * (male + urban + spouses +  
> paternalage.mean + paternalage.factor),
> 						rcov=~us(trait):units,
> 						random=~us(trait):idParents,
> 						family="zapoisson",
> 						prior = prior.m5d.2,
> 						data=rpqa.1,
> 						pr = F, saveX = F, saveZ = F,
> 						nitt=nitt,thin=thin,burnin=burnin))
> }
>
> library(coda)
> mcmclist = mcmc.list(lapply(Children_mcmc1,FUN=function(x) { x$Sol}))
> save(Children_mcmc1,mcmclist, file =  
> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_mcmc_kids_za.rdata")
> closeCluster(cl)
> mpi.quit()
>
>
>
> On 04 Aug 2014, at 20:25, Ruben Arslan <rubenarslan at gmail.com> wrote:
>
>> Dear list,
>>
>> would someone be willing to share her or his efforts in  
>> parallelising a MCMCglmm analysis?
>>
>> I had something viable using harvestr that seemed to properly initialise
>> the starting values from different random number streams (which is  
>> desirable,
>> as far as I could find out), but I ended up being unable to use  
>> harvestr, because
>> it uses an old version of plyr, where parallelisation works only  
>> for multicore, not for
>> MPI.
>>
>> I pasted my working version, that does not do anything about  
>> starting values or RNG
>> at the end of this email. I can try to fumble further in the dark  
>> or try to update harvestr,
>> but maybe someone has gone through all this already.
>>
>> I'd also appreciate any tips for elegantly post-processing such  
>> parallel data, as some of my usual
>> extraction functions and routines are hampered by the fact that  
>> some coda functions
>> do not aggregate results over chains. (What I get from a  
>> single-chain summary in MCMCglmm
>> is a bit more comprehensive, than what I managed to cobble together  
>> with my own extraction
>> functions).
>>
>> The reason I'm parallelising my analyses is that I'm having trouble  
>> getting a good effective
>> sample size for any parameter having to do with the many zeroes in my data.
>> Any pointers are very appreciated, I'm quite inexperienced with MCMCglmm.
>>
>> Best wishes
>>
>> Ruben
>>
>> # bsub -q mpi-short -W 2:00 -n 42 -R np20 mpirun -H localhost -n 41  
>> R --slave -f "rpqa/rpqa_main/rpqa_children_parallel.r"
>> library(doMPI)
>> cl <- startMPIcluster()
>> registerDoMPI(cl)
>> Children_mcmc1 = foreach(i=1:40) %dopar% {
>> 	library(MCMCglmm)
>> 	load("rpqa1.rdata")
>>
>> 	nitt = 40000; thin = 100; burnin = 10000
>> 	prior = list(
>> 		R = list(V = diag(c(1,1)), nu = 0.002),
>> 		G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>> 	)
>>
>> 	MCMCglmm( children ~ trait -1 + at.level(trait,1):male +  
>> at.level(trait,1):urban + at.level(trait,1):spouses +  
>> at.level(trait,1):paternalage.mean +  
>> at.level(trait,1):paternalage.factor,
>> 		rcov=~us(trait):units,
>> 		random=~us(trait):idParents,
>> 		family="zapoisson",
>> 		prior = prior,
>> 		data=rpqa.1,
>> 		pr = F, saveX = T, saveZ = T,
>> 		nitt=nitt,thin=thin,burnin=burnin)
>> }
>>
>> library(coda)
>> mcmclist = mcmc.list(lapply(Children_mcmc1,FUN=function(x) { x$Sol}))
>> save(Children_mcmc1,mcmclist, file = "rpqa_mcmc_kids_za.rdata")
>> closeCluster(cl)
>> mpi.quit()
>>
>>
>> --
>> Ruben C. Arslan
>>
>> Georg August University G�ttingen
>> Biological Personality Psychology and Psychological Assessment
>> Georg Elias M�ller Institute of Psychology
>> Go�lerstr. 14
>> 37073 G�ttingen
>> Germany
>> Tel.: +49 551 3920704
>> https://psych.uni-goettingen.de/en/biopers/team/arslan
>>
>>
>>
>
>
> 	[[alternative HTML version deleted]]
>
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



More information about the R-sig-mixed-models mailing list