[R-sig-ME] estimates of low variance in mcmcglmm (cont.)
Robert Griffin
robgriffin247 at hotmail.com
Wed May 6 10:22:34 CEST 2015
I am resending the email below because it appears it did not reach the list
when I sent it a few weeks back (it has also been edited since).
It follows up on this thread (
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q1/023370.html)
-------------
Hi Jarrod - thanks for the tip,
What concerns me is that I have now tried this out with dummy data, to
test whether a variance signal for line is detected when no line variance
exists. From that I get a tiny estimate of the line variance, but the
quantiles are all positive (suggesting, to me, that it is significantly
larger than zero, though very close).
2.5% 25% 50% 75% 97.5%
line_variance 7.751e-07 8.486e-05 3.801e-04 1.039e-03 3.396e-03
Examining the plots of the chain VCV, however, shows that the density plot
for line is pushed up against zero, which leads me to believe that the
variance is *not* different from zero.
I am trying to test whether or not there is line variance - but given that
variance estimates cannot be negative, the quantiles will never overlap
zero. Would it make sense to randomly reassign the line labels among
individuals in my real data and then compare the density plots of the
posterior distributions for the real and randomised data? If the plot for
the real data does not stack up against zero while the randomised one
does, it would appear that the line variance signal is genuine. Is there a
better way to test (and report) this?
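A minimal base-R sketch of the label-shuffling idea (illustrative only; the actual check would refit the MCMCglmm model on the permuted data and compare the posterior density plots of the line variance against those from the observed data):

```r
# Build a small demo data set and shuffle its line labels; the permutation
# destroys any line-score association while leaving both marginal
# distributions untouched.
set.seed(1)
df_demo = data.frame(score = rnorm(300, 50, 3),
                     line  = sample(1:10, 300, replace = TRUE))
df_perm = df_demo
df_perm$line = sample(df_perm$line)   # random reassignment of line labels
# Label counts are unchanged, so only the line-score pairing is broken:
table(df_demo$line)
table(df_perm$line)
```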
Rob
Here is the dummy script (*run1* and *run2* need to be set to "yes", and
*path* needs setting to the relevant file directory, if you are repeating
the analysis); the output is just below.
require("MCMCglmm")
set.seed(62346)
# Run chains
run1 = "no"
run2 = "no"
# File path
path = "C://Users//"
# Construct data
n = 50000 # number of samples
l = 33 # number of lines
b = 4 # number of blocks
v = 8 # number of vials
df = data.frame(
round(rnorm(n, 50, 3),0), # Response variable
sample(1:l, n, replace = T), # Assign lines
sample(1:b, n, replace = T), # Assign block
sample(1:v, n, replace = T) # Assign vial
)
colnames(df) = c("score", "line", "block", "vial")
# Check even distributions
hist(df$score, breaks = 25)
hist(df$line, breaks = 25)
hist(df$block, breaks = 25)
hist(df$vial, breaks = 25)
# Visual approximation for line effect
boxplot(df$score ~ df$line)
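As a complementary check in the other direction, one could inject a known line effect into data with the same structure and confirm that the between-line variance becomes detectable. This hypothetical positive control is not part of the original script:

```r
# Hypothetical positive control: same structure as the dummy data above,
# but with a true line effect of sd 1 added, so the between-line variance
# in mean scores should be roughly 1 rather than essentially zero.
set.seed(7)
n2 = 50000; l2 = 33
df2 = data.frame(score = round(rnorm(n2, 50, 3), 0),
                 line  = sample(1:l2, n2, replace = TRUE))
line_effects = rnorm(l2, mean = 0, sd = 1)   # one true deviation per line
df2$score_signal = df2$score + line_effects[df2$line]
var(tapply(df2$score,        df2$line, mean))   # near zero: no line signal
var(tapply(df2$score_signal, df2$line, mean))   # roughly 1: injected signal
```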
# Chain parameters
nitt = 20000
burnin = 2000
thin = 10
# Prior
prior = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)),
R = list(V = 1, nu=0.002))
# MCMC chains
if(run1 == "yes"){
chain100 = MCMCglmm(score ~1 + block,
random = ~line + vial,
rcov = ~units,
nitt = nitt,
thin = thin,
burnin = burnin,
prior = prior,
family = "gaussian",
start = list(QUASI = FALSE),
data = df)
file = paste(path, "chain100.rda", sep = "")
save(chain100, file = file)
}
if(run2 == "yes"){
chain101 = MCMCglmm(score ~1 + block,
random = ~line + vial,
rcov = ~units,
nitt = nitt,
thin = thin,
burnin = burnin,
prior = prior,
family = "gaussian",
start = list(QUASI = FALSE),
data = df)
file = paste(path, "chain101.rda", sep = "")
save(chain101, file = file)
}
# Load chains; load() restores the saved objects (chain100, chain101) into
# the workspace by name, and its return value is only a character vector
# of those names, so no assignment is needed
load(paste(path, "chain100.rda", sep = ""))
load(paste(path, "chain101.rda", sep = ""))
# Chain function
chain_fun1 = function(chainX, chainSol){
  # Collect the raw posterior columns into one mcmc object
  chainX = mcmc(data = cbind(
      mean_score    = chainSol[, "(Intercept)"],  # Mean score
      line_variance = chainX[, "line"],           # Line variance
      vial_variance = chainX[, "vial"],           # Vial variance
      unit_variance = chainX[, "units"],          # Residual variance
      # Phenotypic variance
      phen_variance = chainX[, "line"] + chainX[, "vial"] + chainX[, "units"]
    ),
    start = attr(chainX, "mcpar")[1],
    end   = attr(chainX, "mcpar")[2],
    thin  = attr(chainX, "mcpar")[3])
  # Append derived quantities computed sample-by-sample from the chain
  chainX = mcmc(data = cbind(chainX,
      # Heritability
      heritability = chainX[, "line_variance"] / chainX[, "phen_variance"],
      # Coefficient of genetic variance
      gene.coeff = sqrt(chainX[, "line_variance"]) / chainX[, "mean_score"],
      # Coefficient of VP
      phen.coeff = sqrt(chainX[, "phen_variance"]) / chainX[, "mean_score"]
    ),
    start = attr(chainX, "mcpar")[1],
    end   = attr(chainX, "mcpar")[2],
    thin  = attr(chainX, "mcpar")[3])
  return(chainX)
}
results1 = mcmc.list(chain_fun1(chain100$VCV, chain100$Sol),
                     chain_fun1(chain101$VCV, chain101$Sol))
summary(results1)
plot(chain100$VCV[,1])
plot(chain101$VCV[,1])
## Output ##
> summary(results1)
Iterations = 2001:19991
Thinning interval = 10
Number of chains = 2
Sample size per chain = 1800
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
mean_score 5.002e+01 0.0344920 5.749e-04 5.748e-04
line_variance 7.412e-04 0.0009462 1.577e-05 1.509e-05
vial_variance 5.333e-04 0.0009973 1.662e-05 1.696e-05
unit_variance 9.006e+00 0.0564273 9.405e-04 9.405e-04
phen_variance 9.008e+00 0.0564594 9.410e-04 9.410e-04
heritability 8.228e-05 0.0001050 1.750e-06 1.674e-06
gene.coeff 4.440e-04 0.0003149 5.248e-06 5.040e-06
phen.coeff 6.000e-02 0.0001936 3.226e-06 3.226e-06
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
mean_score 4.995e+01 5.000e+01 5.002e+01 5.005e+01 5.009e+01
line_variance 7.751e-07 8.486e-05 3.801e-04 1.039e-03 3.396e-03
vial_variance 3.095e-07 3.938e-05 1.884e-04 6.070e-04 3.010e-03
unit_variance 8.895e+00 8.969e+00 9.005e+00 9.046e+00 9.115e+00
phen_variance 8.895e+00 8.970e+00 9.006e+00 9.047e+00 9.116e+00
heritability 8.656e-08 9.481e-06 4.203e-05 1.154e-04 3.755e-04
gene.coeff 1.761e-05 1.840e-04 3.899e-04 6.441e-04 1.164e-03
phen.coeff 5.962e-02 5.987e-02 6.000e-02 6.013e-02 6.037e-02
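One way to report a strictly positive variance component whose posterior piles up at zero is a highest posterior density (HPD) interval rather than equal-tailed quantiles, since an HPD lower bound can sit at (numerically) zero; coda's HPDinterval() works directly on chains like these. Below is a base-R sketch of the idea, with a hypothetical hpd() helper and simulated draws standing in for the line_variance column:

```r
# Hypothetical helper: shortest interval containing a fraction prob of the
# sorted draws (coda::HPDinterval does this properly for mcmc objects).
hpd = function(x, prob = 0.95) {
  x = sort(x)
  n = length(x)
  k = floor(prob * n)
  width = x[(k + 1):n] - x[1:(n - k)]  # widths of all candidate intervals
  i = which.min(width)                 # shortest one is the HPD interval
  c(lower = x[i], upper = x[i + k])
}
set.seed(2)
draws = rgamma(2000, shape = 0.5, rate = 500)  # stand-in for line_variance
hpd(draws)  # lower bound sits essentially at zero for such a posterior
```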
----------------------
Previous message:
Hi Rob,
Parameter expanded priors are probably a better option:
prior = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G3 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)),
R = list(V = 1, nu=0.002))
This prior is approximately flat for the standard deviation. Note that
this does not imply it is flat for the variance; it still puts quite a
bit of mass at zero, but less than the inverse-Wishart with low degree
of belief. If you want something flat on the variance scale (e.g. a
uniform distribution) you'll have to move to WinBUGS or JAGS.
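In the univariate case, the parameter-expanded prior above with V = 1, nu = 1 and alpha.mu = 0 implies a half-Cauchy prior on the standard deviation with scale sqrt(alpha.V) (Gelman 2006). A quick simulation sketch of the implied prior, assuming MCMCglmm's parameterisation of the 1x1 inverse-Wishart as an inverse-gamma with shape nu/2 and rate (nu*V)/2:

```r
# Simulate from the implied prior: sd = |alpha| * sqrt(v), with
# alpha ~ N(alpha.mu, alpha.V) and v ~ IW(V = 1, nu = 1).
set.seed(3)
alpha = rnorm(1e5, mean = 0, sd = sqrt(1000))     # alpha.mu = 0, alpha.V = 1000
v     = 1 / rgamma(1e5, shape = 1/2, rate = 1/2)  # 1x1 inverse-Wishart draw
sd_prior  = abs(alpha) * sqrt(v)                  # half-Cauchy(scale sqrt(1000))
var_prior = sd_prior^2                            # still puts mass near zero
median(sd_prior)   # close to the half-Cauchy scale, sqrt(1000) ~ 31.6
```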
Parameter expansion is not implemented for the residuals so I've left
the inverse-Wishart prior on the residual variance. Usually the data
overwhelm the prior for the residual variance so you can probably be
pretty relaxed about that.
Cheers,
Jarrod
Quoting Robert Griffin <robgriffin247 at hotmail.com> on Fri, 27 Mar 2015
15:33:15 +0100:
> Dear list,
>
> I have sampled ~35 copies of a chromosome from a population because I want
> to estimate how much contribution that part of the genome makes to the
> variance in the traits. Therefore I want to estimate the additive genetic
> variance. I will do this by making a univariate response model in
> MCMCglmm. The data for the trait was collected from 300-500 offspring per
> sampled chromosome, and measured in males. This was done across 4
> experimental *blocks*, and within each *line* and *block* there were 4
> *vials* of many individuals, each sourced from one of two *sets* of parents.
>
> Within the model there are *block*, parental set (*set*), *vial*, and
> *line* effects to model. I have done this in the following way:
>
> chain1 = MCMCglmm(trait ~ 1 + block,
>                   random = ~line + set + vial,
>                   rcov = ~units,
>                   nitt = nitt,
>                   thin = thin,
>                   burnin = burnin,
>                   prior = prior,
>                   family = "gaussian",
>                   start = list(QUASI = FALSE),
>                   data = df1)
>
> However, the phenotypic variance in this trait is large [var(trait) =
> ~150], and I am expecting an extremely large part of the variance to be
> environmental & measurement error (residual), and the variables of line,
> set, block, and vial to contribute very little (probably <5% of total
> variation each) - visual examination of the data suggests that there is
> almost no variance among lines, blocks, vials, or parental sets. Which
> leads me to my call for help.
>
> I am mainly concerned about how to choose priors for variances which are
> expected to be near zero (when the aim is to test whether line variance is
> not 0) - can this affect the outcome of the model? How should I define my
> priors in such a case? Currently my best estimate from reading the
> literature is to use the following:
>
> prior = list(G = list( G1 = list(V = var(trait)/4, nu=0.002),
>                        G2 = list(V = var(trait)/4, nu=0.002),
>                        G3 = list(V = var(trait)/4, nu=0.002)),
>              R = list(V = var(trait)/4, nu=0.002))
>
> Advice about the priors (and the model in general if you happen to spot
> anything - e.g. should the family be Gaussian?) would be greatly
> appreciated,
>
> Rob
>
> -----------------------------
> Robert Griffin
> PhD candidate, Uppsala University
> griffinevo.wordpress.com