[R-sig-ME] strong effect of prior on residual variance

Thu May 5 15:08:39 CEST 2016

Hi Jarrod,
Thanks for the speedy response,
There is just one measure of T per cohort. T is sexual dimorphism in a trait, so it's a cohort-level value (individuals can't be sexually dimorphic). It was calculated by using MCMCglmm to estimate cohort-specific posterior distributions for both male and female trait values, then randomising the order of the samples in each distribution and dividing them (Male[sample_i]/Female[sample_i], where i is 1:1000). I then took the mean of the resulting posterior for each cohort as my response variable. It's much the same as what one could do to derive a posterior distribution of heritability when given posterior distributions of additive and phenotypic variation (as also stated on page 5 of the course notes). Perhaps it would be more appropriate to use the entire posterior distributions of sexual dimorphism (SD) for all 90 cohorts rather than just the means (so there would be 1000 samples of SD per cohort)?
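To make the calculation concrete, here is a minimal sketch for a single cohort. The posterior draws are simulated stand-ins (normal, with made-up means and standard deviations) rather than real MCMCglmm output, and all object names are hypothetical.

```r
# Hypothetical stand-ins for one cohort's posterior draws (1000 samples each);
# in the real analysis these would come from the fitted MCMCglmm models.
set.seed(1)
n_samp <- 1000
male   <- rnorm(n_samp, mean = 10.5, sd = 0.3)
female <- rnorm(n_samp, mean = 10.0, sd = 0.3)

# Randomise the order of each set of draws, then divide pairwise
# (Male[sample_i] / Female[sample_i], with i in 1:1000).
dimorphism <- sample(male) / sample(female)

# Posterior mean, used as the cohort's single value of T
T_cohort <- mean(dimorphism)

# Interval from the full posterior: the uncertainty that the mean alone discards
T_interval <- quantile(dimorphism, c(0.025, 0.975))
```

Keeping `dimorphism` for every cohort, rather than only `T_cohort`, is what using the entire posterior would amount to.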
B is a numerical variable; if expressed as a factor it has 65 levels (not all years have a measure of B). Should it be converted to a factor?
Cheers,
Rob
> Subject: Re: [R-sig-ME] strong effect of prior on residual variance
> To: robgriffin247 at hotmail.com; r-sig-mixed-models at r-project.org
> From: j.hadfield at ed.ac.uk
> Date: Thu, 5 May 2016 13:47:37 +0100
> 
> Hi Rob,
> 
> 1) how many observations are there per cohort on average?
> 2) how many levels does B have in it?
> 3) nu=1 is typical in a parameter expanded prior, as this is flat for 
> the standard deviation.
> 
> Cheers,
> 
> Jarrod
> 
> 
> 
> On 05/05/2016 13:34, Rob Griffin wrote:
> >
> >
> > Dear list members,
> > I'm using MCMCglmm to model variance among ~90 cohorts as a result of an environmental factor ("B", numerical). "T" is the response variable, which is formed as a ratio, has a mean of ~1, and is normally distributed. The cohorts come from two different populations, where each cohort is defined by the place and year of birth (e.g. one cohort is all individuals born in one area, A1, in 2014), such that there is one value of T per cohort. B is measured on a larger level, so there is one score of B per year, regardless of population. I include Area as a fixed effect (factor with two levels) because in some years only one area is measured, so it may induce sampling bias of the environmental effect (e.g. one year where only one area is measured has an extreme B score). From a biological perspective I expect B to have a small impact on the among-cohort variance in T, so I've used parameter expanded priors for the random effect and inverse-Wishart for the residual (as suggested in a previous thread I started last year on priors for small variance components: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q1/023370.html).
> > When I used a similar setup to my previous model (see code below) I found that, contrary to my expectation, B has a relatively large variance estimate compared to the residual (both the mean and the median of the posterior for B are 10x higher than for the residual). It seems unlikely that 90% of the variance in T is explained by B.
> > This prompted me to fiddle with the prior specification to make sure nothing was wrong... Settings used were nu = 0.002 or 2, & V = 1 or 10 (for the priors in R with all four combinations of V and nu tested), also producing 4 independent chains with 100k iterations, 25k burnin, and thinning interval of 50 for each; autocorrelation is low (<0.1 between successive samples), convergence appears good for both B and Residual. The estimate of variance (both the median and mean of the posterior) is similar across the four chains, within each combination of nu and V, for both B and Residual.
> > In the previous thread it is pointed out that "Usually the data overwhelm the prior for the residual variance so you can probably be pretty relaxed about that." I find that estimates of B and residual (units) are generally insensitive to changes in any of the parameters in G, but highly sensitive to changes in both the belief and variance inputs for R (increases in nu and V both increase the estimate of residual, while also absorbing variance from the random effect B). Given the statement that data usually overwhelm the prior, should I be concerned that the prior is strongly affecting the estimate of residual (and random effects) variance in this case? How should this be interpreted and dealt with? The consistency across independent chains suggests to me that the model is able to estimate B and residual variances well, but is drawing too much information from the prior rather than the data, and therefore I'm using the wrong prior.
> > Thanks,
> > Rob
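> > ####################
> > library(MCMCglmm)
> >
> > # MCMC settings as described above: 100k iterations, 25k burn-in, thinning interval of 50
> > nitt   = 100000
> > burnin = 25000
> > thin   = 50
> >
> > # Parameter expanded prior for the random effect (G), inverse-Wishart for the residual (R)
> > prior1 = list(
> >   G = list(G1 = list(V = 1, nu = 0.001, alpha.mu = 0, alpha.V = 1000)),
> >   R = list(V = 1, nu = 0.001)
> > )
> >
> > M1A = MCMCglmm(
> >   T ~ 1 + Area,
> >   random = ~ B,
> >   data   = DF3,
> >   nitt   = nitt,
> >   burnin = burnin,
> >   thin   = thin,
> >   family = "gaussian",
> >   prior  = prior1
> > )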
> >
> >
> >
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
> 
> 
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
 		 	   		  