[R-sig-ME] strong effect of prior on residual variance

Thu May 5 14:34:24 CEST 2016

Dear list members,
I'm using MCMCglmm to model variance among ~90 cohorts as a result of an environmental factor ("B", numerical). "T" is the response variable; it is formed as a ratio, has a mean of ~1, and is normally distributed. The cohorts come from two different populations, where each cohort is defined by the place and year of birth (e.g. one cohort is all individuals born in one area, A1, in 2014), such that there is one value of T per cohort. B is measured on a larger scale, so there is one score of B per year, regardless of population. I include Area as a fixed effect (a factor with two levels) because in some years only one area is measured, which may induce sampling bias in the environmental effect (e.g. a year in which only one area is measured has an extreme B score). From a biological perspective I expect B to have a small impact on the among-cohort variance in T, so I've used parameter-expanded priors for the random effect and an inverse-Wishart prior for the residual (as suggested in a previous thread I started last year on priors for small variance components: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q1/023370.html).
When I use a similar setup to my previous model (see code below), I find that, contrary to my expectation, B has a relatively large variance estimate compared to the residual (both the mean and the median of the posterior for B are 10x higher than for the residual). It seems unlikely that 90% of the variance in T is explained by B.
This prompted me to fiddle with the prior specification to make sure nothing was wrong. The settings used were nu = 0.002 or 2, and V = 1 or 10 (for the prior on the R structure, with all four combinations of V and nu tested). For each combination I ran 4 independent chains with 100k iterations, 25k burn-in, and a thinning interval of 50; autocorrelation is low (<0.1 between successive samples) and convergence appears good for both B and Residual. The estimate of variance (both the median and the mean of the posterior) is similar across the four chains, within each combination of nu and V, for both B and Residual.
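A minimal sketch of how the four R-prior combinations described above could be set up. The names `prior_grid`, `make_prior` and `priors` are made up for illustration, the G prior is copied from prior1 in the code at the end of this message, and the MCMCglmm call is left commented out because it needs DF3 and the nitt/burnin/thin values.

```r
# Grid of the residual-prior settings described above
# (V = 1 or 10, nu = 0.002 or 2).
prior_grid <- expand.grid(V = c(1, 10), nu = c(0.002, 2))

# Hypothetical helper: builds one MCMCglmm-style prior list.
# The G structure is the parameter-expanded prior from prior1; only R varies.
make_prior <- function(V, nu) {
  list(G = list(G1 = list(V = 1, nu = 0.001, alpha.mu = 0, alpha.V = 1000)),
       R = list(V = V, nu = nu))
}

priors <- Map(make_prior, prior_grid$V, prior_grid$nu)
names(priors) <- paste0("V", prior_grid$V, "_nu", prior_grid$nu)

# Each prior would then be passed to the same model, e.g. (not run here;
# assumes the MCMCglmm package, DF3, and nitt/burnin/thin are available):
# fits <- lapply(priors, function(p)
#   MCMCglmm(T ~ 1 + Area, random = ~ B, data = DF3, family = "gaussian",
#            nitt = nitt, burnin = burnin, thin = thin, prior = p))
```

Keeping the priors in a named list makes it straightforward to tabulate posterior summaries by prior setting afterwards.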
In the previous thread it is pointed out that "Usually the data overwhelm the prior for the residual variance so you can probably be pretty relaxed about that." I find that estimates of B and residual (units) are generally insensitive to changes in any of the parameters in G, but highly sensitive to changes in both the belief (nu) and variance (V) inputs for R (increases in nu and V both increase the estimate of the residual, which also absorbs variance from the random effect B). Given the statement that data usually overwhelm the prior, should I be concerned that the prior is strongly affecting the estimates of the residual (and random effect) variance in this case? How should this be interpreted and dealt with? The consistency across independent chains suggests to me that the model is able to estimate the B and residual variances well, but is drawing too much information from the prior rather than the data, and that I am therefore using the wrong prior.
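One way to get a feel for how much weight a given R prior carries is the simplest conjugate case. The sketch below is not the fitted model: it assumes a normal sample with known mean and a single variance, and uses the fact that, for a scalar variance, an inverse-Wishart prior with parameters V and nu corresponds to an inverse-gamma with shape nu/2 and scale nu*V/2. The sample size (90) matches the approximate number of cohorts, but the data variance of 0.01 is a purely hypothetical value, picked only because T is a ratio with mean ~1. `post_mean_var` is an illustrative name.

```r
# Posterior mean of a variance under an inverse-gamma prior with
# shape = nu/2 and scale = nu*V/2, for n normal observations with known
# mean and sum of squared deviations ss (toy conjugate case only).
post_mean_var <- function(V, nu, n, ss) {
  shape <- (nu + n) / 2        # posterior shape
  scale <- (nu * V + ss) / 2   # posterior scale
  scale / (shape - 1)          # mean of an inverse-gamma (needs shape > 1)
}

n  <- 90          # roughly the number of cohorts
ss <- n * 0.01    # HYPOTHETICAL sum of squares (data variance 0.01)

grid <- expand.grid(V = c(1, 10), nu = c(0.002, 2))
grid$post_mean <- with(grid, post_mean_var(V, nu, n, ss))
print(grid)
```

In this toy setting nu*V enters as a prior sum of squares, so the prior matters whenever nu*V is not small relative to the sum of squares in the data. Whether that carries over to the full model with the random effect would need checking.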
Thanks,
Rob
####################
prior1 = list(G = list(G1 = list(V = 1, nu = 0.001, alpha.mu = 0, alpha.V = 1000)),
              R = list(V = 1, nu = 0.001))

M1A = MCMCglmm(T ~ 1 + Area,
               random = ~ B,
               data   = DF3,
               nitt   = nitt,
               burnin = burnin,
               thin   = thin,
               family = "gaussian",
               prior  = prior1)
