[R-sig-ME] parallel MCMCglmm, RNGstreams, starting values & priors

Jarrod Hadfield j.hadfield at ed.ac.uk
Fri Aug 29 12:02:59 CEST 2014


Hi,

You need to make sure that they are fixed at the same value, not just fixed.

Cheers,

Jarrod



Quoting Ruben Arslan <rubenarslan at gmail.com> on Fri, 29 Aug 2014  
11:07:21 +0200:

> Hi Jarrod,
>
> that was exactly it. I hadn't checked the $VCV mcmclist, but I'll do  
> so in the future
> as the mistake is blindingly obvious that way: http://imgur.com/nlA0QwZ
>
> After adding fix=2 to R1 in my starting values, my parallel chains  
> converged as well.
>
> For future amateurs reading this:
> prior <- list(
> 	R=list(V=diag(2), nu=1.002, fix=2),
> 	G=list(G1=list(V=diag(2), nu=1, alpha.mu=c(0,0), alpha.V=diag(2)*1000))
> )
>
> start <- list(
> 	liab=c(rnorm( nrow(krmh.1)*2 )),
> 	R = list(R1 = rIW(diag(2), 10 , fix = 2)), ### FIX=2 HERE AS WELL
> 	G = list(G1 = rIW(diag(2), 10 ))
> )
>
> Thank you so much. This sort of remote diagnosis with this little information
> seems borderline psychic to me.
> Now, onwards to modelling the pedigrees :-)
>
> Best wishes,
>
> Ruben
>
> PS.: I never actually figured out what kind of variability to use  
> for rIW() in my starting values,
> but it doesn't seem to matter and that's what I should want I  
> suppose. Actually, even with the mis-specified
> starting values my posterior looked pretty similar to now.
>
> On 29 Aug 2014, at 09:13, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>
>> Hi Ruben,
>>
>> Actually I might know what it is. When you sample different  
>> starting values do you inadvertently sample a new residual variance  
>> for the unidentified za part? You need to make sure that this is  
>> always fixed at the same value (otherwise the model is different).  
>> This is not a problem under the trait:units specification.
>>
>> Cheers,
>>
>> Jarrod.
>>
>>
>>
>> Quoting Jarrod Hadfield <j.hadfield at ed.ac.uk> on Fri, 29 Aug 2014  
>> 08:04:42 +0100:
>>
>>> Hi Ruben,
>>>
>>> Can you share your data and I will take a look. Its definitely not  
>>> Monte Carlo error.
>>>
>>> Cheers,
>>>
>>> Jarrod
>>>
>>>
>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014  
>>> 22:44:47 +0200:
>>>
>>>> Hi Jarrod,
>>>>
>>>> those two matched up quite well yes. I just completed another 20  
>>>> chains, using more variable
>>>> starting values. There's still two fixed effects  
>>>> traitza_children:spouses  and :male which haven't converged
>>>> according to multi-chain (gelman), but have according to geweke.
>>>> The offending traces: http://imgur.com/Qm6Ovfr
>>>> These specific effects aren't of interest to me, so if this  
>>>> doesn't affect the rest of my estimates, I can be happy
>>>> with this, but I can't conclude that, can I?
>>>>
>>>> I'm now also doing a run to see how it deals with the more  
>>>> intensely zero-inflated data when including
>>>> the unmarried.
>>>>
>>>> Thanks a lot for all that help,
>>>>
>>>> Ruben
>>>>
>>>>> gelman.diag(mcmclist)
>>>> Potential scale reduction factors:
>>>>
>>>>                                          Point est. Upper C.I.
>>>> (Intercept)                                      1.00       1.00
>>>> traitza_children                                 1.40       1.65
>>>> male                                             1.00       1.00
>>>> spouses                                          1.00       1.00
>>>> paternalage.mean                                 1.00       1.00
>>>> paternalage.factor(25,30]                        1.00       1.00
>>>> paternalage.factor(30,35]                        1.00       1.00
>>>> paternalage.factor(35,40]                        1.00       1.00
>>>> paternalage.factor(40,45]                        1.00       1.00
>>>> paternalage.factor(45,50]                        1.00       1.00
>>>> paternalage.factor(50,55]                        1.00       1.00
>>>> paternalage.factor(55,90]                        1.00       1.00
>>>> traitza_children:male                            1.33       1.54
>>>> traitza_children:spouses                         2.21       2.83
>>>> traitza_children:paternalage.mean                1.01       1.02
>>>> traitza_children:paternalage.factor(25,30]       1.05       1.08
>>>> traitza_children:paternalage.factor(30,35]       1.08       1.13
>>>> traitza_children:paternalage.factor(35,40]       1.15       1.25
>>>> traitza_children:paternalage.factor(40,45]       1.15       1.26
>>>> traitza_children:paternalage.factor(45,50]       1.26       1.43
>>>> traitza_children:paternalage.factor(50,55]       1.15       1.25
>>>> traitza_children:paternalage.factor(55,90]       1.14       1.23
>>>>
>>>> Multivariate psrf
>>>>
>>>> 8.99
>>>>
>>>>> summary(mcmclist)
>>>>
>>>> Iterations = 100001:149951
>>>> Thinning interval = 50
>>>> Number of chains = 20
>>>> Sample size per chain = 1000
>>>>
>>>> 1. Empirical mean and standard deviation for each variable,
>>>>  plus standard error of the mean:
>>>>
>>>>                                              Mean      SD  Naive  
>>>> SE Time-series SE
>>>> (Intercept)                                 1.36326 0.04848  
>>>> 0.0003428      0.0003542
>>>> traitza_children                           -0.76679 0.28738  
>>>> 0.0020321      0.0016682
>>>> male                                        0.09980 0.01633  
>>>> 0.0001155      0.0001222
>>>> spouses                                     0.12333 0.01957  
>>>> 0.0001384      0.0001414
>>>> paternalage.mean                            0.07215 0.02194  
>>>> 0.0001551      0.0001596
>>>> paternalage.factor(25,30]                  -0.03381 0.04184  
>>>> 0.0002959      0.0003066
>>>> paternalage.factor(30,35]                  -0.08380 0.04270  
>>>> 0.0003019      0.0003118
>>>> paternalage.factor(35,40]                  -0.16502 0.04569  
>>>> 0.0003231      0.0003289
>>>> paternalage.factor(40,45]                  -0.16738 0.05090  
>>>> 0.0003599      0.0003697
>>>> paternalage.factor(45,50]                  -0.18383 0.05880  
>>>> 0.0004158      0.0004242
>>>> paternalage.factor(50,55]                  -0.18241 0.07277  
>>>> 0.0005146      0.0005302
>>>> paternalage.factor(55,90]                  -0.40612 0.09875  
>>>> 0.0006983      0.0007467
>>>> traitza_children:male                       0.12092 0.08223  
>>>> 0.0005815      0.0004697
>>>> traitza_children:spouses                    0.64881 0.21132  
>>>> 0.0014942      0.0008511
>>>> traitza_children:paternalage.mean          -0.02741 0.08550  
>>>> 0.0006046      0.0006221
>>>> traitza_children:paternalage.factor(25,30] -0.17296 0.18680  
>>>> 0.0013209      0.0013750
>>>> traitza_children:paternalage.factor(30,35] -0.19027 0.19267  
>>>> 0.0013624      0.0013901
>>>> traitza_children:paternalage.factor(35,40] -0.24911 0.21282  
>>>> 0.0015049      0.0014391
>>>> traitza_children:paternalage.factor(40,45] -0.29772 0.23403  
>>>> 0.0016548      0.0015956
>>>> traitza_children:paternalage.factor(45,50] -0.51782 0.28589  
>>>> 0.0020215      0.0017602
>>>> traitza_children:paternalage.factor(50,55] -0.46126 0.32064  
>>>> 0.0022673      0.0021397
>>>> traitza_children:paternalage.factor(55,90] -0.38612 0.41461  
>>>> 0.0029317      0.0027396
>>>>
>>>> 2. Quantiles for each variable:
>>>>
>>>>                                              2.5%      25%       
>>>> 50%      75%      97.5%
>>>> (Intercept)                                 1.26883  1.33106   
>>>> 1.36322  1.39575  1.4589722
>>>> traitza_children                           -1.20696 -0.95751  
>>>> -0.81076 -0.63308 -0.0365042
>>>> male                                        0.06785  0.08878   
>>>> 0.09970  0.11085  0.1320168
>>>> spouses                                     0.08467  0.11030   
>>>> 0.12343  0.13643  0.1617869
>>>> paternalage.mean                            0.02950  0.05751   
>>>> 0.07202  0.08683  0.1153881
>>>> paternalage.factor(25,30]                  -0.11581 -0.06174  
>>>> -0.03397 -0.00574  0.0473783
>>>> paternalage.factor(30,35]                  -0.16656 -0.11250  
>>>> -0.08358 -0.05519  0.0003065
>>>> paternalage.factor(35,40]                  -0.25518 -0.19530  
>>>> -0.16500 -0.13440 -0.0757366
>>>> paternalage.factor(40,45]                  -0.26887 -0.20164  
>>>> -0.16675 -0.13335 -0.0677407
>>>> paternalage.factor(45,50]                  -0.30080 -0.22320  
>>>> -0.18339 -0.14440 -0.0687967
>>>> paternalage.factor(50,55]                  -0.32663 -0.23034  
>>>> -0.18227 -0.13317 -0.0415547
>>>> paternalage.factor(55,90]                  -0.60202 -0.47303  
>>>> -0.40454 -0.33994 -0.2139128
>>>> traitza_children:male                      -0.01083  0.06634   
>>>> 0.11024  0.16109  0.3295892
>>>> traitza_children:spouses                    0.37857  0.51072   
>>>> 0.59398  0.71395  1.2127940
>>>> traitza_children:paternalage.mean          -0.19138 -0.08250  
>>>> -0.02985  0.02493  0.1468989
>>>> traitza_children:paternalage.factor(25,30] -0.57457 -0.28481  
>>>> -0.16489 -0.05151  0.1728148
>>>> traitza_children:paternalage.factor(30,35] -0.61499 -0.30350  
>>>> -0.17736 -0.06299  0.1555147
>>>> traitza_children:paternalage.factor(35,40] -0.74251 -0.36752  
>>>> -0.22966 -0.10777  0.1151897
>>>> traitza_children:paternalage.factor(40,45] -0.84165 -0.42691  
>>>> -0.27729 -0.14322  0.1032436
>>>> traitza_children:paternalage.factor(45,50] -1.21782 -0.66568  
>>>> -0.48420 -0.32873 -0.0476720
>>>> traitza_children:paternalage.factor(50,55] -1.21327 -0.63623  
>>>> -0.43432 -0.24957  0.0955360
>>>> traitza_children:paternalage.factor(55,90] -1.33772 -0.62227  
>>>> -0.35364 -0.11050  0.3361684
>>>>
>>>>> effectiveSize(mcmclist)
>>>>                              (Intercept)                           
>>>>  traitza_children
>>>>                                 18814.05                           
>>>>          16359.33
>>>>                                     male                           
>>>>           spouses
>>>>                                 18132.98                           
>>>>          19547.05
>>>>                         paternalage.mean                   
>>>> paternalage.factor(25,30]
>>>>                                 19238.72                           
>>>>          18974.81
>>>>                paternalage.factor(30,35]                   
>>>> paternalage.factor(35,40]
>>>>                                 18874.33                           
>>>>          19406.63
>>>>                paternalage.factor(40,45]                   
>>>> paternalage.factor(45,50]
>>>>                                 19075.18                           
>>>>          19401.77
>>>>                paternalage.factor(50,55]                   
>>>> paternalage.factor(55,90]
>>>>                                 18960.11                           
>>>>          17893.23
>>>>                    traitza_children:male                    
>>>> traitza_children:spouses
>>>>                                 18545.55                           
>>>>          14438.51
>>>>        traitza_children:paternalage.mean  
>>>> traitza_children:paternalage.factor(25,30]
>>>>                                 18464.09                           
>>>>          16943.43
>>>> traitza_children:paternalage.factor(30,35]  
>>>> traitza_children:paternalage.factor(35,40]
>>>>                                 16827.44                           
>>>>          17230.04
>>>> traitza_children:paternalage.factor(40,45]  
>>>> traitza_children:paternalage.factor(45,50]
>>>>                                 17144.78                           
>>>>          18191.67
>>>> traitza_children:paternalage.factor(50,55]  
>>>> traitza_children:paternalage.factor(55,90]
>>>>                                 17466.60                           
>>>>          18540.59
>>>>
>>>>
>>>> ### current script:
>>>>
>>>> # bsub -q mpi -W 24:00 -n 21 -R np20 mpirun -H localhost -n 21 R  
>>>> --slave -f "/usr/users/rarslan/rpqa/krmh_main/children.R"
>>>> setwd("/usr/users/rarslan/rpqa/")
>>>> library(doMPI)
>>>> cl <-  
>>>> startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/krmh_main/")
>>>> registerDoMPI(cl)
>>>> Children = foreach(i=1:clusterSize(cl),.options.mpi =  
>>>> list(seed=1337) ) %dopar% {
>>>> 	library(MCMCglmm);library(data.table)
>>>>   setwd("/usr/users/rarslan/rpqa/krmh_main/")
>>>> 	source("../1 - extraction functions.r")
>>>>   load("../krmh1.rdata")
>>>>
>>>> 	krmh.1 = recenter.pat(na.omit(krmh.1[spouses>0, list(idParents,  
>>>> children, male, spouses, paternalage)]))
>>>>
>>>> 	samples = 1000
>>>> 	thin = 50; burnin = 100000
>>>> 	nitt = samples * thin + burnin
>>>>
>>>> 	prior <- list(
>>>> 		R=list(V=diag(2), nu=1.002, fix=2),
>>>> 		G=list(G1=list(V=diag(2), nu=1, alpha.mu=c(0,0), alpha.V=diag(2)*1000))
>>>> 	)
>>>>
>>>> 	start <- list(
>>>> 		liab=c(rnorm( nrow(krmh.1)*2 )),
>>>> 		R = list(R1 = rIW(diag(2), 10 )),
>>>> 		G = list(G1 = rIW(diag(2), 10 ))
>>>> 	)
>>>>
>>>> 	( m1 = MCMCglmm( children ~ trait * (male + spouses +  
>>>> paternalage.mean + paternalage.factor),
>>>> 						rcov=~idh(trait):units,
>>>> 						random=~idh(trait):idParents,
>>>> 						family="zapoisson",
>>>> 						start = start,
>>>> 						prior = prior,
>>>> 						data=krmh.1,
>>>> 						pr = F, saveX = F, saveZ = F,
>>>> 						nitt=nitt,thin=thin,burnin=burnin)
>>>> 	)
>>>> 		m1$Residual$nrt<-2
>>>> 	m1
>>>> }
>>>>
>>>> save(Children,file = "Children.rdata")
>>>> closeCluster(cl)
>>>> mpi.quit()
>>>>
>>>> On 28 Aug 2014, at 20:59, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The posteriors for the two models look pretty close to me. Are  
>>>>> the scale reduction factors really as high as previously  
>>>>> reported? Before you had 1.83 for traitza_children:spouses, but  
>>>>> the plot suggests that it should be close to 1?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Jarrod
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014  
>>>>> 19:59:16 +0200:
>>>>>
>>>>>> Sure! Thanks a lot.
>>>>>> I am using ~idh(trait):units already, sorry for saying that  
>>>>>> incorrectly in my last email.
>>>>>> These models aren't the final thing, I will replace the  
>>>>>> paternalage.factor variable
>>>>>> with its linear equivalent if that seems defensible (does so  
>>>>>> far) and in this model it seems
>>>>>> okay to remove the za-effects for all predictors except spouses.
>>>>>> So a final model would have fewer fixed effects. I also have  
>>>>>> datasets of 200k+ and 5m+,
>>>>>> but I'm learning MCMCglmm with this smaller one because my  
>>>>>> wrong turns take less time.
>>>>>>
>>>>>> I've uploaded a comparison coef plot of two models:
>>>>>> http://i.imgur.com/sHUfnmd.png
>>>>>> m7 is with the default starting values, m1 is with the  
>>>>>> specification I sent in my last email. I don't
>>>>>> know if such differences are something to worry about.
>>>>>>
>>>>>> I don't know what qualifies as highly overdispersed, here's a  
>>>>>> plot of the outcome for ever
>>>>>> married people (slate=real data):
>>>>>> http://imgur.com/14MywgZ
>>>>>> here's with everybody born (incl. some stillborn etc.):
>>>>>> http://imgur.com/knRGa1v
>>>>>> I guess my approach (generating an overdispersed poisson with  
>>>>>> the parameters from
>>>>>> the data and checking if it has as excess zeroes) is not the  
>>>>>> best way to diagnose zero-inflation,
>>>>>> but especially in the second case it seems fairly clear-cut.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Ruben
>>>>>>
>>>>>>> summary(m1)
>>>>>>
>>>>>> Iterations = 50001:149951
>>>>>> Thinning interval  = 50
>>>>>> Sample size  = 2000
>>>>>>
>>>>>> DIC: 31249.73
>>>>>>
>>>>>> G-structure:  ~idh(trait):idParents
>>>>>>
>>>>>>                    post.mean  l-95% CI u-95% CI eff.samp
>>>>>> children.idParents     0.006611 4.312e-08   0.0159    523.9
>>>>>> za_children.idParents  0.193788 7.306e-02   0.3283    369.3
>>>>>>
>>>>>> R-structure:  ~idh(trait):units
>>>>>>
>>>>>>                post.mean l-95% CI u-95% CI eff.samp
>>>>>> children.units       0.1285   0.1118   0.1452    716.1
>>>>>> za_children.units    0.9950   0.9950   0.9950      0.0
>>>>>>
>>>>>> Location effects: children ~ trait * (male + spouses +  
>>>>>> paternalage.mean + paternalage.factor)
>>>>>>
>>>>>>                                          post.mean   l-95% CI    
>>>>>> u-95% CI eff.samp  pMCMC
>>>>>> (Intercept)                                 1.3413364   
>>>>>> 1.2402100  1.4326099     1789 <5e-04 ***
>>>>>> traitza_children                           -0.8362879  
>>>>>> -1.2007980 -0.5016730     1669 <5e-04 ***
>>>>>> male                                        0.0994902   
>>>>>> 0.0679050  0.1297394     2000 <5e-04 ***
>>>>>> spouses                                     0.1236033   
>>>>>> 0.0839000  0.1624939     2000 <5e-04 ***
>>>>>> paternalage.mean                            0.0533892   
>>>>>> 0.0119569  0.0933960     2000  0.015 *
>>>>>> paternalage.factor(25,30]                  -0.0275822  
>>>>>> -0.1116421  0.0537359     1842  0.515
>>>>>> paternalage.factor(30,35]                  -0.0691025  
>>>>>> -0.1463214  0.0122393     1871  0.097 .
>>>>>> paternalage.factor(35,40]                  -0.1419933  
>>>>>> -0.2277379 -0.0574678     1845 <5e-04 ***
>>>>>> paternalage.factor(40,45]                  -0.1364952  
>>>>>> -0.2362714 -0.0451874     1835  0.007 **
>>>>>> paternalage.factor(45,50]                  -0.1445342  
>>>>>> -0.2591767 -0.0421178     1693  0.008 **
>>>>>> paternalage.factor(50,55]                  -0.1302972  
>>>>>> -0.2642965  0.0077061     2000  0.064 .
>>>>>> paternalage.factor(55,90]                  -0.3407879  
>>>>>> -0.5168972 -0.1493652     1810 <5e-04 ***
>>>>>> traitza_children:male                       0.0926888  
>>>>>> -0.0147379  0.2006142     1901  0.098 .
>>>>>> traitza_children:spouses                    0.5531197   
>>>>>> 0.3870616  0.7314289     1495 <5e-04 ***
>>>>>> traitza_children:paternalage.mean           0.0051463  
>>>>>> -0.1279396  0.1460099     1617  0.960
>>>>>> traitza_children:paternalage.factor(25,30] -0.1538957  
>>>>>> -0.4445749  0.1462955     1781  0.321
>>>>>> traitza_children:paternalage.factor(30,35] -0.1747883  
>>>>>> -0.4757851  0.1162476     1998  0.261
>>>>>> traitza_children:paternalage.factor(35,40] -0.2261843  
>>>>>> -0.5464379  0.0892582     1755  0.166
>>>>>> traitza_children:paternalage.factor(40,45] -0.2807543  
>>>>>> -0.6079678  0.0650281     1721  0.100 .
>>>>>> traitza_children:paternalage.factor(45,50] -0.4905843  
>>>>>> -0.8649214 -0.1244174     1735  0.010 **
>>>>>> traitza_children:paternalage.factor(50,55] -0.4648579  
>>>>>> -0.9215759 -0.0002083     1687  0.054 .
>>>>>> traitza_children:paternalage.factor(55,90] -0.3945406  
>>>>>> -1.0230155  0.2481568     1793  0.195
>>>>>> ---
>>>>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>>>>
>>>>>>> describe(krmh.1[spouses>0,])
>>>>>>                  vars    n mean   sd median trimmed  mad   min   
>>>>>>  max range skew kurtosis   se
>>>>>> children               2 6829 3.81 2.93   4.00    3.61 2.97   
>>>>>> 0.00 16.00 16.00 0.47    -0.46 0.04
>>>>>> male                   3 6829 0.46 0.50   0.00    0.45 0.00   
>>>>>> 0.00  1.00  1.00 0.14    -1.98 0.01
>>>>>> spouses                4 6829 1.14 0.38   1.00    1.03 0.00   
>>>>>> 1.00  4.00  3.00 2.87     8.23 0.00
>>>>>> paternalage            5 6829 3.65 0.80   3.57    3.60 0.80   
>>>>>> 1.83  7.95  6.12 0.69     0.70 0.01
>>>>>> paternalage_c          6 6829 0.00 0.80  -0.08   -0.05 0.80  
>>>>>> -1.82  4.30  6.12 0.69     0.70 0.01
>>>>>> paternalage.mean       7 6829 0.00 0.68  -0.08   -0.05 0.59  
>>>>>> -1.74  4.30  6.04 0.95     1.97 0.01
>>>>>> paternalage.diff       8 6829 0.00 0.42   0.00   -0.01 0.38  
>>>>>> -1.51  1.48  2.99 0.17     0.17 0.01
>>>>>>
>>>>>>> table(krmh.1$paternalage.factor)
>>>>>>
>>>>>> [0,25] (25,30] (30,35] (35,40] (40,45] (45,50] (50,55] (55,90]
>>>>>>  309    1214    1683    1562    1039     623     269     130
>>>>>>
>>>>>> On 28 Aug 2014, at 19:05, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>>>>>
>>>>>>> Hi Ruben,
>>>>>>>
>>>>>>> It might be hard to detect (near) ECPs with so many fixed  
>>>>>>> effects (can you post the model summary (and give us the mean  
>>>>>>> and standard deviation of any continuous covariates)). Also,  
>>>>>>> the complementary log-log link (which is the za specification)  
>>>>>>> is non-symmetric and runs into problems outside the range -35  
>>>>>>> to 3.5 so there may be a problem there, particularly if you  
>>>>>>> use rcov=~trait:units and the Poisson part is highly  
>>>>>>> over-dispersed.  You could try rcov=~idh(trait):units and fix  
>>>>>>> the non-identifiable za residual variance to something smaller  
>>>>>>> than 1 (say 0.5)  - it will mix slower but it will reduce the  
>>>>>>> chance of over/underflow.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Jarrod
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug  
>>>>>>> 2014 18:45:30 +0200:
>>>>>>>
>>>>>>>> Hi Jarrod,
>>>>>>>>
>>>>>>>>> 1) it did not return an error with rcov = ~trait:units  
>>>>>>>>> because you used R1=rpois(2,1)+1 and yet this specification  
>>>>>>>>> only fits a single variance (not a 2x2 covariance matrix).  
>>>>>>>>> R1=rpois(2,1)+1 is a bit of a weird specification since it  
>>>>>>>>> has to be integer. I would obtain starting values using rIW().
>>>>>>>>
>>>>>>>> I agree it's a weird specification, I was a bit lost and  
>>>>>>>> thought I could get away with just putting some random  
>>>>>>>> numbers in the starting value.
>>>>>>>> I didn't do R1=rpois(2,1)+1 though, I did  
>>>>>>>> R1=diag(rpois(2,1)+1), so I got a 2x2 matrix, but yes, bound  
>>>>>>>> to be integer.
>>>>>>>> I didn't know starting values should come from a conjugate  
>>>>>>>> distribution, though that probably means I didn't think about  
>>>>>>>> it much.
>>>>>>>>
>>>>>>>> I'm now doing
>>>>>>>> start <- list(
>>>>>>>> 	liab=c(rnorm( nrow(krmh.1)*2 )),
>>>>>>>> 	R = list(R1 = rIW( diag(2), nrow(krmh.1)) ),
>>>>>>>> 	G = list(G1 = rIW( diag(2), nrow(krmh.1)) )
>>>>>>>> )
>>>>>>>>
>>>>>>>> Is this what you had in mind?
>>>>>>>> I am especially unsure if I am supposed to use such a low  
>>>>>>>> sampling variability (my sample size is probably not even  
>>>>>>>> relevant for the starting values) and if I should start from  
>>>>>>>> diag(2).
>>>>>>>>
>>>>>>>> And, I am still happily confused that this specification  
>>>>>>>> still doesn't lead to errors with respect to rcov =  
>>>>>>>> ~trait:units . Does this mean I'm doing it wrong?
>>>>>>>>
>>>>>>>>> 3) a) how many effective samples do you have for each  
>>>>>>>>> parameter? and b) are you getting extreme category  
>>>>>>>>> problems/numerical issues? If you store the latent variables  
>>>>>>>>> (pl=TUE) what is their range for the Zi/za part?
>>>>>>>>
>>>>>>>> My parallel run using the above starting values isn't finished yet.
>>>>>>>> a) After applying the above starting values I get, for the  
>>>>>>>> location effects 1600-2000 samples for a 2000 sample chain  
>>>>>>>> (with thin set to 50). G and R-structure are from 369  
>>>>>>>> (za_children.idParents) to 716 (and 0 for the fixed part).
>>>>>>>> Effective sample sizes were similar for my run using the  
>>>>>>>> starting values for G/R that I drew from rpois, and using 40  
>>>>>>>> chains I of course get
>>>>>>>> b) I don't think I am getting extreme categories. I would  
>>>>>>>> probably be getting extreme categories if I included the  
>>>>>>>> forever-alones (they almost never have children), but this  
>>>>>>>> way no.
>>>>>>>> I wasn't sure how to examine the range of the latents  
>>>>>>>> separately for the za part, but for a single chain it looks  
>>>>>>>> okay:
>>>>>>>>> quantile(as.numeric(m1$Liab),probs = c(0,0.01,0,0.99,1))
>>>>>>>>    0%        1%        0%       99%      100%
>>>>>>>> -4.934111 -1.290728 -4.934111  3.389847  7.484206
>>>>>>>>
>>>>>>>> Well, all considered now that I use the above starting value  
>>>>>>>> specification I get slightly different estimates for all  
>>>>>>>> za-coefficients. Nothing major, but still leading me to
>>>>>>>> think my estimates aren't exactly independent of the starting  
>>>>>>>> values I use. I'll see what the parallel run yields.
>>>>>>>>
>>>>>>>> Thanks a lot,
>>>>>>>>
>>>>>>>> Ruben
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Jarrod
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Wed, 27 Aug  
>>>>>>>>> 2014 19:23:42 +0200:
>>>>>>>>>
>>>>>>>>>> Hi Jarrod,
>>>>>>>>>>
>>>>>>>>>> thanks again. I was able to get it running with your advice.
>>>>>>>>>> Some points of confusion remain:
>>>>>>>>>>
>>>>>>>>>> - You wrote that zi/za models would return an error with  
>>>>>>>>>> rcov = ~trait:units + starting values. This did not happen  
>>>>>>>>>> in my case, so I didn't build MCMCglmm myself with your  
>>>>>>>>>> suggested edits. Also, have you considered putting your own  
>>>>>>>>>> MCMCglmm repo on Github? Your users would be able to  
>>>>>>>>>> install pre-releases and I'd think you'd get some  
>>>>>>>>>> time-saving pull requests too.
>>>>>>>>>> - In my attempts to get my models to run properly, I messed  
>>>>>>>>>> up a prior and did not use fix=2 in my prior specification  
>>>>>>>>>> for my za models. This led to crappy convergence, it's much  
>>>>>>>>>> better now and for some of my simpler models I think I  
>>>>>>>>>> won't need parallel chains. I'm reminded of Gelman's folk  
>>>>>>>>>> theorem of statistical computing.
>>>>>>>>>> - I followed your advice, but of course I could not set the  
>>>>>>>>>> true values as starting values, but wanted to set random,  
>>>>>>>>>> bad starting values. I pasted below what I arrived at, I'm  
>>>>>>>>>> especially unsure whether I specified the starting values  
>>>>>>>>>> for G and R properly (I think not).
>>>>>>>>>> 	start <- list(
>>>>>>>>>> 		liab=c(rnorm( nrow(krmh.1)*2 )),
>>>>>>>>>> 		R = list(R1 = diag(rpois(2, 1)+1)),
>>>>>>>>>> 		G = list(G1 = diag(rpois(2, 1)+1))
>>>>>>>>>> 	)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> However, even though I may not need multiple chains for  
>>>>>>>>>> some of my simpler models, I've now run into conflicting  
>>>>>>>>>> diagnostics. The geweke.diag for each chain (and  
>>>>>>>>>> examination of the traces) gives
>>>>>>>>>> satisfactory diagnostics. Comparing multiple chains using  
>>>>>>>>>> gelman.diag, however, leads to one bad guy, namely the  
>>>>>>>>>> traitza_children:spouses interaction.
>>>>>>>>>> I think this implies that I've got some starting value  
>>>>>>>>>> dependence for this parameter, that won't be easily  
>>>>>>>>>> rectified through longer burnin?
>>>>>>>>>> Do you have any ideas how to rectify this?
>>>>>>>>>> I am currently doing sequential analyses on episodes of  
>>>>>>>>>> selection and in historical human data only those who marry  
>>>>>>>>>> have a chance at having kids. I exclude the unmarried
>>>>>>>>>> from my sample where I predict number of children, because  
>>>>>>>>>> I examine that in a previous model and the zero-inflation  
>>>>>>>>>> (65% zeros, median w/o unmarried = 4) when including the  
>>>>>>>>>> unmarried is so excessive.
>>>>>>>>>> Number of spouses is easily the strongest predictor in the  
>>>>>>>>>> model, but only serves as a covariate here. Since my other  
>>>>>>>>>> estimates are stable across chains and runs and agree well  
>>>>>>>>>> across models and with theory, I'm
>>>>>>>>>> inclined to shrug this off. But probably I shouldn't ignore  
>>>>>>>>>> this sign of non-convergence?
>>>>>>>>>>
>>>>>>>>>>> gelman.diag(mcmc_1)
>>>>>>>>>> Potential scale reduction factors:
>>>>>>>>>>
>>>>>>>>>>                                       Point est. Upper C.I.
>>>>>>>>>> (Intercept)                                      1.00       1.00
>>>>>>>>>> traitza_children                                 1.27       1.39
>>>>>>>>>> male                                             1.00       1.00
>>>>>>>>>> spouses                                          1.00       1.00
>>>>>>>>>> paternalage.mean                                 1.00       1.00
>>>>>>>>>> paternalage.factor(25,30]                        1.00       1.00
>>>>>>>>>> paternalage.factor(30,35]                        1.00       1.00
>>>>>>>>>> paternalage.factor(35,40]                        1.00       1.00
>>>>>>>>>> paternalage.factor(40,45]                        1.00       1.00
>>>>>>>>>> paternalage.factor(45,50]                        1.00       1.00
>>>>>>>>>> paternalage.factor(50,55]                        1.00       1.00
>>>>>>>>>> paternalage.factor(55,90]                        1.00       1.00
>>>>>>>>>> traitza_children:male                            1.22       1.32
>>>>>>>>>> traitza_children:spouses                         1.83       2.13
>>>>>>>>>> traitza_children:paternalage.mean                1.02       1.02
>>>>>>>>>> traitza_children:paternalage.factor(25,30]       1.03       1.05
>>>>>>>>>> traitza_children:paternalage.factor(30,35]       1.05       1.08
>>>>>>>>>> traitza_children:paternalage.factor(35,40]       1.10       1.15
>>>>>>>>>> traitza_children:paternalage.factor(40,45]       1.12       1.17
>>>>>>>>>> traitza_children:paternalage.factor(45,50]       1.19       1.28
>>>>>>>>>> traitza_children:paternalage.factor(50,55]       1.12       1.18
>>>>>>>>>> traitza_children:paternalage.factor(55,90]       1.11       1.17
>>>>>>>>>>
>>>>>>>>>> Multivariate psrf
>>>>>>>>>>
>>>>>>>>>> 7.27
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> Ruben
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 26 Aug 2014, at 13:04, Jarrod Hadfield  
>>>>>>>>>> <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>
>>>>>>>>>>> There are 400 liabilities in a zapoisson model (2 per  
>>>>>>>>>>> datum). This code should work:
>>>>>>>>>>>
>>>>>>>>>>> g <-sample(letters[1:10], size = 200, replace = T)
>>>>>>>>>>> pred <- rnorm(200)
>>>>>>>>>>>
>>>>>>>>>>> l1<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>> l2<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>
>>>>>>>>>>> y<-VGAM::rzapois(200, exp(l1), exp(-exp(l2)))
>>>>>>>>>>>
>>>>>>>>>>> # generate zero-altered data with an intercept of -1  
>>>>>>>>>>> (because the intercept and variance are the same for both  
>>>>>>>>>>> processes this is just standard Poisson)
>>>>>>>>>>>
>>>>>>>>>>> dat<-data.frame(y=y, g = g, pred = pred)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> start.1<-list(liab=c(l1,l2), R = list(R1=diag(2)),  
>>>>>>>>>>> G=list(G1=diag(2)))
>>>>>>>>>>> prior.1<-list(R=list(V=diag(2), nu=1.002, fix=2),  
>>>>>>>>>>> G=list(G1=list(V=diag(2), nu=2, alpha.mu=c(0,0),  
>>>>>>>>>>> alpha.V=diag(2)*1000)))
>>>>>>>>>>>
>>>>>>>>>>> m1<-MCMCglmm(y~trait + pred:trait, random=~us(trait):g,  
>>>>>>>>>>> family="zapoisson",rcov=~idh(trait):units, data=dat,  
>>>>>>>>>>> prior=prior.1, start= start.1)
>>>>>>>>>>>
>>>>>>>>>>> However, there are 2 bugs in the current version of  
>>>>>>>>>>> MCMCglmm that return an error message when the  
>>>>>>>>>>> documentation implies it should be fine:
>>>>>>>>>>>
>>>>>>>>>>> a) it should be possible to have R=diag(2) rather than R =  
>>>>>>>>>>> list(R1=diag(2)). This bug cropped up when I implemented  
>>>>>>>>>>> block-diagonal R structures. It can be fixed by inserting:
>>>>>>>>>>>
>>>>>>>>>>>     if(!is.list(start$R)){
>>>>>>>>>>>        start$R<-list(R1=start$R)
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>> on L514 of MCMCglmm.R below
>>>>>>>>>>>
>>>>>>>>>>>     if(!is.list(prior$R[[1]])){
>>>>>>>>>>>        prior$R<-list(R1=prior$R)
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>> b) rcov=~trait:units models for zi/za models will return  
>>>>>>>>>>> an error when passing starting values. To fix this insert
>>>>>>>>>>>
>>>>>>>>>>>    if(diagR==3){
>>>>>>>>>>>      if(dim(start)[1]!=1){
>>>>>>>>>>>        stop("V is the wrong dimension for some  
>>>>>>>>>>> strart$G/start$R elements")
>>>>>>>>>>>      }
>>>>>>>>>>>      start<-diag(sum(nfl))*start[1]
>>>>>>>>>>>    }
>>>>>>>>>>>
>>>>>>>>>>> on L90 of priorfromat.R below
>>>>>>>>>>>
>>>>>>>>>>>    if(is.matrix(start)==FALSE){
>>>>>>>>>>>      start<-as.matrix(start)
>>>>>>>>>>>    }
>>>>>>>>>>>
>>>>>>>>>>> I will put these in the new version.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Jarrod
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25  
>>>>>>>>>>> Aug 2014 21:52:30 +0200:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Jarrod,
>>>>>>>>>>>>
>>>>>>>>>>>> thanks for these pointers.
>>>>>>>>>>>>
>>>>>>>>>>>>>> You will need to provide over-dispersed starting values  
>>>>>>>>>>>>>> for multiple-chain convergence diagnostics to be useful  
>>>>>>>>>>>>>> (GLMM are so simple I am generally happy if the output  
>>>>>>>>>>>>>> of a single run looks reasonable).
>>>>>>>>>>>>
>>>>>>>>>>>> Oh, I would be happy with single chains, but since  
>>>>>>>>>>>> computation would take weeks this way, I wanted to  
>>>>>>>>>>>> parallelise and I would use the multi-chain convergence  
>>>>>>>>>>>> as a criterion that my parallelisation was proper
>>>>>>>>>>>> and is as informative as a single long chain. There don't  
>>>>>>>>>>>> seem to be any such checks built-in – I was analysing my  
>>>>>>>>>>>> 40 chains for a bit longer than I like to admit until I  
>>>>>>>>>>>> noticed they were identical (effectiveSize
>>>>>>>>>>>> and summary.mcmc.list did not yell at me for this).
>>>>>>>>>>>>
>>>>>>>>>>>>>> # use some very bad starting values
>>>>>>>>>>>> I get that these values are bad, but that is the goal for  
>>>>>>>>>>>> my multi-chain aim, right?
>>>>>>>>>>>>
>>>>>>>>>>>> I can apply this to my zero-truncated model, but am again  
>>>>>>>>>>>> getting stuck with the zero-altered one.
>>>>>>>>>>>> Maybe I need only specify the Liab values for this?
>>>>>>>>>>>> At least I'm getting nowhere with specifying R and G  
>>>>>>>>>>>> starting values here. When I got an error, I always
>>>>>>>>>>>> went to the MCMCglmm source to understand why the checks  
>>>>>>>>>>>> failed, but I didn't always understand
>>>>>>>>>>>> what was being checked and couldn't get it to work.
>>>>>>>>>>>>
>>>>>>>>>>>> Here's a failing example:
>>>>>>>>>>>>
>>>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>>>> g = sample(letters[1:10], size = 200, replace = T)
>>>>>>>>>>>> pred = rnorm(200)
>>>>>>>>>>>> y<-rpois(200,exp(l)-t)
>>>>>>>>>>>> y[1:40] = 0
>>>>>>>>>>>> # generate zero-altered data with an intercept of -1
>>>>>>>>>>>>
>>>>>>>>>>>> dat<-data.frame(y=y, g = g, pred = pred)
>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>> start_true = list(Liab=l, R = 1, G = 1 )
>>>>>>>>>>>> m1<-MCMCglmm(y~1 + pred,random = ~ g,  
>>>>>>>>>>>> family="zapoisson",rcov=~us(trait):units, data=dat,  
>>>>>>>>>>>> start= start_true)
>>>>>>>>>>>>
>>>>>>>>>>>> # use true latent variable as starting values
>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>> # use some very bad starting values
>>>>>>>>>>>> start_rand = list(Liab=rnorm(200), R = rpois(1, 1)+1, G =  
>>>>>>>>>>>> rpois(1, 1)+1 )
>>>>>>>>>>>> m2<-MCMCglmm(y~1 + pred,random = ~  
>>>>>>>>>>>> g,rcov=~us(trait):units,  family="zapoisson", data=dat,  
>>>>>>>>>>>> start = start_rand)
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>>
>>>>>>>>>>>> Ruben
>>>>>>>>>>>>
>>>>>>>>>>>> On 25 Aug 2014, at 18:29, Jarrod Hadfield  
>>>>>>>>>>>> <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry  - I was wrong when I said that everything is  
>>>>>>>>>>>>> Gibbs sampled conditional on the latent variables. The  
>>>>>>>>>>>>> location effects (fixed and random effects) are also  
>>>>>>>>>>>>> sampled conditional on the (co)variance components so  
>>>>>>>>>>>>> you should add them to the starting values. In the case  
>>>>>>>>>>>>> where the true values are used:
>>>>>>>>>>>>>
>>>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat,  
>>>>>>>>>>>>> start=list(Liab=l,R=1))
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jarrod
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Quoting Jarrod Hadfield <j.hadfield at ed.ac.uk> on Mon, 25  
>>>>>>>>>>>>> Aug 2014 17:14:14 +0100:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You will need to provide over-dispersed starting values  
>>>>>>>>>>>>>> for multiple-chain convergence diagnostics to be useful  
>>>>>>>>>>>>>> (GLMM are so simple I am generally happy if the output  
>>>>>>>>>>>>>> of a single run looks reasonable).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With non-Gaussian data everything is Gibbs sampled  
>>>>>>>>>>>>>> conditional on the latent variables, so you only need  
>>>>>>>>>>>>>> to pass them:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>>>>>> y<-rpois(200,exp(l)-t)+1
>>>>>>>>>>>>>> # generate zero-truncated data with an intercept of -1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> dat<-data.frame(y=y)
>>>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat,  
>>>>>>>>>>>>>> start=list(Liab=l))
>>>>>>>>>>>>>> # use true latent variable as starting values
>>>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>>>> m2<-MCMCglmm(y~1, family="ztpoisson", data=dat,  
>>>>>>>>>>>>>> start=list(Liab=rnorm(200)))
>>>>>>>>>>>>>> # use some very bad starting values
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> plot(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>>>> # not identical despite the same seed because of  
>>>>>>>>>>>>>> different starting values but clearly sampling the same  
>>>>>>>>>>>>>> posterior distribution:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gelman.diag(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jarrod
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25  
>>>>>>>>>>>>>> Aug 2014 18:00:08 +0200:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dear Jarrod,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> thanks for the quick reply. Please, don't waste time  
>>>>>>>>>>>>>>> looking into doMPI – I am happy that I
>>>>>>>>>>>>>>> get the expected result, when I specify that  
>>>>>>>>>>>>>>> reproducible seed, whyever that may be.
>>>>>>>>>>>>>>> I'm pretty sure that is the deciding factor, because I  
>>>>>>>>>>>>>>> tested it explicitly, I just have no idea
>>>>>>>>>>>>>>> how/why it interacts with the choice of family.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That said, is setting up different RNG streams for my  
>>>>>>>>>>>>>>> workers (now that it works) __sufficient__
>>>>>>>>>>>>>>> so that I get independent chains and can use  
>>>>>>>>>>>>>>> gelman.diag() for convergence diagnostics?
>>>>>>>>>>>>>>> Or should I still tinker with the starting values myself?
>>>>>>>>>>>>>>> I've never found a worked example of supplying  
>>>>>>>>>>>>>>> starting values and am thus a bit lost.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry for sending further questions, I hope someone  
>>>>>>>>>>>>>>> else takes pity while
>>>>>>>>>>>>>>> you're busy with lectures.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ruben
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 25 Aug 2014, at 17:29, Jarrod Hadfield  
>>>>>>>>>>>>>>> <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I do not think the issue is with the starting values,  
>>>>>>>>>>>>>>>> because even if the same starting values were used  
>>>>>>>>>>>>>>>> the chains would still differ because of the  
>>>>>>>>>>>>>>>> randomness in the Markov Chain (if I interpret your  
>>>>>>>>>>>>>>>> `identical' test correctly). This just involves a  
>>>>>>>>>>>>>>>> call to GetRNGstate() in the C++ code (L 871  
>>>>>>>>>>>>>>>> ofMCMCglmm.cc) so I think for some reason  
>>>>>>>>>>>>>>>> doMPI/foreach is not doing what you expect. I am not  
>>>>>>>>>>>>>>>> familiar with doMPI and am in the middle of writing  
>>>>>>>>>>>>>>>> lectures so haven't got time to look into it  
>>>>>>>>>>>>>>>> carefully. Outside of the context of doMPI I get the  
>>>>>>>>>>>>>>>> behaviour I expect:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>>>>>>>> y<-rpois(200,exp(l)-t)+1
>>>>>>>>>>>>>>>> # generate zero-truncated data with an intercept of -1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> dat<-data.frame(y=y)
>>>>>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>>>> set.seed(2)
>>>>>>>>>>>>>>>> m2<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>>>> set.seed(2)
>>>>>>>>>>>>>>>> m3<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> plot(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>>>>>> # different, as expected
>>>>>>>>>>>>>>>> plot(mcmc.list(m2$Sol, m3$Sol))
>>>>>>>>>>>>>>>> # the same, as expected
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon,  
>>>>>>>>>>>>>>>> 25 Aug 2014 16:58:06 +0200:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Dear list,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> sorry for bumping my old post, I hope to elicit a  
>>>>>>>>>>>>>>>>> response with a more focused question:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When does MCMCglmm automatically start from  
>>>>>>>>>>>>>>>>> different values when using doMPI/foreach?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have done some tests with models of varying  
>>>>>>>>>>>>>>>>> complexity. For example, the script in my last
>>>>>>>>>>>>>>>>> post (using "zapoisson") yielded 40 identical chains:
>>>>>>>>>>>>>>>>>> identical(mcmclist[1], mcmclist[30])
>>>>>>>>>>>>>>>>> TRUE
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> A simpler (?) model (using "ztpoisson" and no  
>>>>>>>>>>>>>>>>> specified prior), however, yielded different chains
>>>>>>>>>>>>>>>>> and I could use them to calculate gelman.diag()
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Changing my script to the version below, i.e.  
>>>>>>>>>>>>>>>>> seeding foreach using .options.mpi=list( seed= 1337)
>>>>>>>>>>>>>>>>> so as to make RNGstreams reproducible (or so I   
>>>>>>>>>>>>>>>>> thought), led to different chains even for the
>>>>>>>>>>>>>>>>> "zapoisson" model.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In no case have I (successfully) tried to supplant  
>>>>>>>>>>>>>>>>> the default of MCMCglmm's "start" argument.
>>>>>>>>>>>>>>>>> Is starting my models from different RNGsubstreams  
>>>>>>>>>>>>>>>>> inadequate compared to manipulating
>>>>>>>>>>>>>>>>> the start argument explicitly? If so, is there any  
>>>>>>>>>>>>>>>>> worked example of explicit starting value manipulation
>>>>>>>>>>>>>>>>> in parallel computation?
>>>>>>>>>>>>>>>>> I've browsed the MCMCglmm source to understand how  
>>>>>>>>>>>>>>>>> the default starting values are generated,
>>>>>>>>>>>>>>>>> but didn't find any differences with respect to RNG  
>>>>>>>>>>>>>>>>> for the two families "ztpoisson" and "zapoisson"
>>>>>>>>>>>>>>>>> (granted, I did not dig very deep).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ruben Arslan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # bsub -q mpi -W 12:00 -n 41 -R np20 mpirun -H  
>>>>>>>>>>>>>>>>> localhost -n 41 R --slave -f  
>>>>>>>>>>>>>>>>> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_children_parallel.R"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> library(doMPI)
>>>>>>>>>>>>>>>>> cl <-  
>>>>>>>>>>>>>>>>> startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/rpqa_main/")
>>>>>>>>>>>>>>>>> registerDoMPI(cl)
>>>>>>>>>>>>>>>>> Children_mcmc1 =  
>>>>>>>>>>>>>>>>> foreach(i=1:clusterSize(cl),.options.mpi =  
>>>>>>>>>>>>>>>>> list(seed=1337) ) %dopar% {
>>>>>>>>>>>>>>>>> 	library(MCMCglmm);library(data.table)
>>>>>>>>>>>>>>>>> 	load("/usr/users/rarslan/rpqa/rpqa1.rdata")
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 	nitt = 130000; thin = 100; burnin = 30000
>>>>>>>>>>>>>>>>> 	prior.m5d.2 = list(
>>>>>>>>>>>>>>>>> 		R = list(V = diag(c(1,1)), nu = 0.002),
>>>>>>>>>>>>>>>>> 		G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>>>>>>>>>>>>>>>>> 	)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 	rpqa.1 = na.omit(rpqa.1[spouses>0, list(idParents,  
>>>>>>>>>>>>>>>>> children, male, urban, spouses, paternalage.mean,  
>>>>>>>>>>>>>>>>> paternalage.factor)])
>>>>>>>>>>>>>>>>> 	(m1 = MCMCglmm( children ~ trait * (male + urban +  
>>>>>>>>>>>>>>>>> spouses + paternalage.mean + paternalage.factor),
>>>>>>>>>>>>>>>>> 						rcov=~us(trait):units,
>>>>>>>>>>>>>>>>> 						random=~us(trait):idParents,
>>>>>>>>>>>>>>>>> 						family="zapoisson",
>>>>>>>>>>>>>>>>> 						prior = prior.m5d.2,
>>>>>>>>>>>>>>>>> 						data=rpqa.1,
>>>>>>>>>>>>>>>>> 						pr = F, saveX = F, saveZ = F,
>>>>>>>>>>>>>>>>> 						nitt=nitt,thin=thin,burnin=burnin))
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> library(coda)
>>>>>>>>>>>>>>>>> mcmclist =  
>>>>>>>>>>>>>>>>> mcmc.list(lapply(Children_mcmc1,FUN=function(x) {  
>>>>>>>>>>>>>>>>> x$Sol}))
>>>>>>>>>>>>>>>>> save(Children_mcmc1,mcmclist, file =  
>>>>>>>>>>>>>>>>> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_mcmc_kids_za.rdata")
>>>>>>>>>>>>>>>>> closeCluster(cl)
>>>>>>>>>>>>>>>>> mpi.quit()
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 04 Aug 2014, at 20:25, Ruben Arslan  
>>>>>>>>>>>>>>>>> <rubenarslan at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Dear list,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> would someone be willing to share her or his  
>>>>>>>>>>>>>>>>>> efforts in parallelising a MCMCglmm analysis?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I had something viable using harvestr that seemed  
>>>>>>>>>>>>>>>>>> to properly initialise
>>>>>>>>>>>>>>>>>> the starting values from different random number  
>>>>>>>>>>>>>>>>>> streams (which is desirable,
>>>>>>>>>>>>>>>>>> as far as I could find out), but I ended up being  
>>>>>>>>>>>>>>>>>> unable to use harvestr, because
>>>>>>>>>>>>>>>>>> it uses an old version of plyr, where  
>>>>>>>>>>>>>>>>>> parallelisation works only for multicore, not for
>>>>>>>>>>>>>>>>>> MPI.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I pasted my working version, that does not do  
>>>>>>>>>>>>>>>>>> anything about starting values or RNG
>>>>>>>>>>>>>>>>>> at the end of this email. I can try to fumble  
>>>>>>>>>>>>>>>>>> further in the dark or try to update harvestr,
>>>>>>>>>>>>>>>>>> but maybe someone has gone through all this already.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd also appreciate any tips for elegantly  
>>>>>>>>>>>>>>>>>> post-processing such parallel data, as some of my  
>>>>>>>>>>>>>>>>>> usual
>>>>>>>>>>>>>>>>>> extraction functions and routines are hampered by  
>>>>>>>>>>>>>>>>>> the fact that some coda functions
>>>>>>>>>>>>>>>>>> do not aggregate results over chains. (What I get  
>>>>>>>>>>>>>>>>>> from a single-chain summary in MCMCglmm
>>>>>>>>>>>>>>>>>> is a bit more comprehensive, than what I managed to  
>>>>>>>>>>>>>>>>>> cobble together with my own extraction
>>>>>>>>>>>>>>>>>> functions).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The reason I'm parallelising my analyses is that  
>>>>>>>>>>>>>>>>>> I'm having trouble getting a good effective
>>>>>>>>>>>>>>>>>> sample size for any parameter having to do with the  
>>>>>>>>>>>>>>>>>> many zeroes in my data.
>>>>>>>>>>>>>>>>>> Any pointers are very appreciated, I'm quite  
>>>>>>>>>>>>>>>>>> inexperienced with MCMCglmm.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ruben
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # bsub -q mpi-short -W 2:00 -n 42 -R np20 mpirun -H  
>>>>>>>>>>>>>>>>>> localhost -n 41 R --slave -f  
>>>>>>>>>>>>>>>>>> "rpqa/rpqa_main/rpqa_children_parallel.r"
>>>>>>>>>>>>>>>>>> library(doMPI)
>>>>>>>>>>>>>>>>>> cl <- startMPIcluster()
>>>>>>>>>>>>>>>>>> registerDoMPI(cl)
>>>>>>>>>>>>>>>>>> Children_mcmc1 = foreach(i=1:40) %dopar% {
>>>>>>>>>>>>>>>>>> 	library(MCMCglmm)
>>>>>>>>>>>>>>>>>> 	load("rpqa1.rdata")
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 	nitt = 40000; thin = 100; burnin = 10000
>>>>>>>>>>>>>>>>>> 	prior = list(
>>>>>>>>>>>>>>>>>> 		R = list(V = diag(c(1,1)), nu = 0.002),
>>>>>>>>>>>>>>>>>> 		G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>>>>>>>>>>>>>>>>>> 	)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 	MCMCglmm( children ~ trait -1 +  
>>>>>>>>>>>>>>>>>> at.level(trait,1):male + at.level(trait,1):urban +  
>>>>>>>>>>>>>>>>>> at.level(trait,1):spouses +  
>>>>>>>>>>>>>>>>>> at.level(trait,1):paternalage.mean +  
>>>>>>>>>>>>>>>>>> at.level(trait,1):paternalage.factor,
>>>>>>>>>>>>>>>>>> 		rcov=~us(trait):units,
>>>>>>>>>>>>>>>>>> 		random=~us(trait):idParents,
>>>>>>>>>>>>>>>>>> 		family="zapoisson",
>>>>>>>>>>>>>>>>>> 		prior = prior,
>>>>>>>>>>>>>>>>>> 		data=rpqa.1,
>>>>>>>>>>>>>>>>>> 		pr = F, saveX = T, saveZ = T,
>>>>>>>>>>>>>>>>>> 		nitt=nitt,thin=thin,burnin=burnin)
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> library(coda)
>>>>>>>>>>>>>>>>>> mcmclist =  
>>>>>>>>>>>>>>>>>> mcmc.list(lapply(Children_mcmc1,FUN=function(x) {  
>>>>>>>>>>>>>>>>>> x$Sol}))
>>>>>>>>>>>>>>>>>> save(Children_mcmc1,mcmclist, file =  
>>>>>>>>>>>>>>>>>> "rpqa_mcmc_kids_za.rdata")
>>>>>>>>>>>>>>>>>> closeCluster(cl)
>>>>>>>>>>>>>>>>>> mpi.quit()
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Ruben C. Arslan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Georg August University G�ttingen
>>>>>>>>>>>>>>>>>> Biological Personality Psychology and Psychological  
>>>>>>>>>>>>>>>>>> Assessment
>>>>>>>>>>>>>>>>>> Georg Elias M�ller Institute of Psychology
>>>>>>>>>>>>>>>>>> Go�lerstr. 14
>>>>>>>>>>>>>>>>>> 37073 G�ttingen
>>>>>>>>>>>>>>>>>> Germany
>>>>>>>>>>>>>>>>>> Tel.: +49 551 3920704
>>>>>>>>>>>>>>>>>> https://psych.uni-goettingen.de/en/biopers/team/arslan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 	[[alternative HTML version deleted]]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> The University of Edinburgh is a charitable body,  
>>>>>>>>>>>>>>>> registered in
>>>>>>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>> Scotland, with registration number SC005336.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



More information about the R-sig-mixed-models mailing list