[R-sig-ME] Interaction variance from mcmcglmm

Tue Nov 3 16:01:32 CET 2015

Dear list members,

I recently published work using mcmcglmm to test for Y linked genetic
variance, finding variance within a population. This data was collected by
measuring phenotypic data from randomly sampled Y chromosomes from within a
population, tested in a single genetic background, using lifespan as the
response, and Y chromosome as a random effect.

I have been trying to think of how I would improve such a study and
concluded that a more complete study would be done using
- Y chromosomes from multiple source populations (to test for among
population variance)
- multiple genetic backgrounds (to distinguish between [direct] genetic
effects, and epistatic effects)...

This got me thinking about the analysis of such data. I would need to
estimate how much of the Y-linked variance within and among populations is
genetic and epistatic. This has lead me to two main questions where I could
use some guidance:

1) can mcmcglmm be used to estimate the interaction (epistatic) variance?
- I think the answer is yes but I have as yet been unable to find clear
advice on this, how can I add interaction variance term in to mcmcglmm?
- I can add the variance explained by the genetic background by adding this
as a random effect too, but (how) can I find the interaction between two
random explanatory variables?

2) could I do it all from a single model or would I need multiple models?
- If I built a model where the trait (Lifespan) is the response variable,
with random effects of Y chromosome, Y chromosome source population,
genetic background, and the interactions between Y/Y-source and genetic
background, would I be able to get estimates of the direct genetic and
epistatic variances within and among the source populations?
- I could build two models, one with Y chromosome and genetic background as
random effects (including the interaction), and the other with Y-chromosome
source population and genetic background as random effects (including the
interaction). These two models would separately estimate the within and
among population variances. I think this would be the correct approach.
However, I'm unconvinced that the latter would give the desired division in
to direct and epistatic Y-linked variation.

Thanks,
Rob

*Please kindly note*, advice on prior specification when estimating
interaction effects would be also very highly appreciated, remembering that
the variance components are likely very small. I have included a
demonstration script which makes some dummy data, and shows some of the
proposed models. The script includes a filepath which you can edit if you
wish to save and load chains, and switches at line 56 to turn the chains
on/off when running scripts.

##############
## R SCRIPT ##
##############

# Y-background interaction dummy data
rm(list=ls())
library("lme4")
library("MCMCglmm")
set.seed(24)

# Set filepath (CUSTOMISE THIS IF YOU WISH TO BE ABLE TO SAVE AND LOAD
CHAINS)
filepath = paste0("C:\\Users\\Rob\\Notes\\Y_epistasis\\")

#### Description of dummy data ####
# 20 Y chromosomes randomly sampled from 3 populations (A1:A20, B1:B20,
C1:C20, n = 60)
# Test in genetic backgrounds (BG) from three populations (Canton [CN],
Zimbabwe [ZW], and Dahomey [DH])
# Measure a trait (Lifepsan) in 90 individuals per combination of Y & BG
# Measured across three replicate blocks (BL1, BL2, BL3), with two vials
each per block

n1 = 20 # Number of Y's sampled per source population
n2 = 30 # Number of measurements per block/line
n3 = 3  # Number of blocks (BL1, BL2, BL3)
n4 = 3 # Number of genetic backgrounds (CN, ZW, DH)
n5 = 3  # Number of Y source populations (A, B, C)
n6 = 2  # Number of vials per block/Y/BG

# Construction of data
dummy = data.frame(
as.factor(rep(c(paste0("A",1:n1), paste0("B",1:n1), paste0("C",1:n1)), each
= n2*n3, times = n4)),
as.factor(rep(c("A","B","C"), each = n1*n2*n3, times = n5)),
as.factor(rep(c("CN","ZW","DH"), each = n1*n2*n3*n4, times = 1)),
as.factor(rep(c("BL1","BL2","BL3"), each = n2)),
as.factor(rep(c("A","B"), each = n2 / n6)),
as.numeric(abs(round(rnorm(n1*n2*n3*n4*n5, 50, 15),0)))
)
colnames(dummy) = c("Y", "Y.Source", "BG", "Block", "Vial", "Age")
dummy$Vial = paste(dummy$Y, dummy$Y.Source, dummy$BG, dummy$Block,
dummy$Vial, sep = "_")
head(dummy)

# Mean zero unit variance for Age (response variable)
dummy$Score = (dummy$Age - mean(dummy$Age)) / sd(dummy$Age)

# Subset DFs (Only one source population and one genetic background, as in
previous JEB paper)
dummy_A = data.frame(split(dummy , f = dummy$Y.Source)[1])
dummy_A_CN = data.frame(split(dummy_A , f = dummy_A$A.BG)[1])
colnames(dummy_A) = c("Y", "Y.Source", "BG", "Block", "Vial", "Age",
"Score")
colnames(dummy_A_CN) = c("Y", "Y.Source", "BG", "Block", "Vial", "Age",
"Score")

# MCMC Priors
prior1 = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)),
R  = list(V = 1, nu=0.002))

prior2 = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G3 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)),
R  = list(V = 1, nu=0.002))

# MCMC Switches
testing = "y"
runChain_A_CN1 = "y"
runChain_A1 = "y"
runChain_1 = "y"

# MCMC Details
if(testing == "y"){
nitt = 10000
burn = 1000
thin = 10
}else{
nitt = 100000
burn = 10000
thin = 100
}

# MCMC Chain (as in JEB Paper: One Y source population, One genetic
background)
# Within-population Y-linked variation
if(runChain_A_CN1=="y"){
Chain_A_CN1 = MCMCglmm(Score ~1 + Block,
random = ~Y + Vial,
rcov = ~units,
nitt = nitt,
thin = thin,
burnin = burn,
prior = prior1,
family = "gaussian",
start = list(QUASI = FALSE),
data = dummy_A_CN)
file = paste(filepath, "Chain_A_CN1.rda", sep = "")
save(Chain_A_CN1, file = file)
}else{}

##############################
#### Proposed MCMC Chains ####
##############################

# Within-population Y-linked variation split in to direct and epistatic
if(runChain_A1=="y"){
Chain_A1 = MCMCglmm(Score ~1 + Block,
random = ~Y + BG + Vial,
rcov = ~units,
nitt = nitt,
thin = thin,
burnin = burn,
prior = prior2,
family = "gaussian",
start = list(QUASI = FALSE),
data = dummy_A)
file = paste(filepath, "Chain_A1.rda", sep = "")
save(Chain_A1, file = file)
}else{}

# Among-population Y-linked variation split in to direct and epistatic
if(runChain_1=="y"){
Chain_1 = MCMCglmm(Score ~1 + Block,
random = ~Y.Source + BG + Vial,
rcov = ~units,
nitt = nitt,
thin = thin,
burnin = burn,
prior = prior2,
family = "gaussian",
start = list(QUASI = FALSE),
data = dummy)
file = paste(filepath, "Chain_1.rda", sep = "")
save(Chain_1, file = file)
}else{}

Chain.A_CN1 = load(paste(filepath, "Chain_A_CN1.rda", sep= ""))
Chain.A1 = load(paste(filepath, "Chain_A1.rda", sep= ""))
Chain.1 = load(paste(filepath, "Chain_1.rda", sep= ""))

	[[alternative HTML version deleted]]