[R-sig-ME] Interaction variance from mcmcglmm

Robert Griffin robgriffin247 at hotmail.com
Fri Nov 13 10:48:14 CET 2015


Thanks Paul,

I have tried with the suggested expanded model, using simulated data where
I have added Y-source * Background interactions. See below. Is it correct
then to say that the variance estimate of "BG:Y.Source" is variance in
lifespan explained by the interaction between the genetic background
(autosomes and X source) and the source of the Y? Then the phenotypic
variance would be the sum of all components (2.910), and the interaction
BG*Y.Source would explain 100*(0.902/2.910) = 30.98% of the variance in
lifespan?

> colMeans(NestChain_Mod2$VCV)
           BG      Y.Source    Y.Source:Y   BG:Y.Source BG:Y.Source:Y
   Vial         units
 1.0029972054  0.4037536287  0.0008707007  0.9017016510  0.0002926541
 0.0013600115  0.5993455117


I also have a follow up on the nesting, I may need some explanation of the
syntax in the model you proposed (what are : and / doing?), but I read that
as having Y nested within Y.Source, which is then nested within the
background (BG). I understand that Y should be nested within Y.Source
(because each Y tested can only be from one source population), however,
given that all Y chromosomes are tested in all backgrounds (autosome and X
source), wouldn't it be inappropriate to nest within BG? i.e. Y's A1-A34
are all tested in CN, DH, and ZW, Y's B1-B34 are also all tested in CN, DH,
and ZW...

Thanks,
Rob

Here's the script I used:

############
prior3 = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G3 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G4 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G5 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G6 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)),
R  = list(V = 1, nu=0.002))

# Y source * BG variance
dummy$Age2 = ifelse(dummy$Y.Source=="A"&dummy$BG=="CN", dummy$Age+15,
dummy$Age)
dummy$Age2 = ifelse(dummy$Y.Source=="B"&dummy$BG=="CN", abs(dummy$Age-15),
dummy$Age2)
dummy$Age2 = ifelse(dummy$Y.Source=="C"&dummy$BG=="DH", dummy$Age+15,
dummy$Age2)
dummy$Age2 = ifelse(dummy$Y.Source=="C"&dummy$BG=="ZW", abs(dummy$Age-15),
dummy$Age2)
dummy$Age2 = ifelse(dummy$Y.Source=="A"&dummy$BG=="DH", abs(dummy$Age-15),
dummy$Age2)
dummy$Age2 = ifelse(dummy$Y.Source=="B"&dummy$BG=="ZW", dummy$Age+15,
dummy$Age2)

boxplot(dummy$Age2 ~ dummy$Y.Source*dummy$BG)
dummy$Score2 = (dummy$Age2 - mean(dummy$Age2)) / sd(dummy$Age2)

# Chain
NestChain_Mod2 = MCMCglmm(Score2 ~1 + Block,
random = ~BG + Y.Source + Y.Source:Y + BG:Y.Source + BG:Y.Source:Y + Vial,
rcov = ~units,
nitt = 5000,
thin = 5,
burnin = 1000,
prior = prior3,
family = "gaussian",
start = list(QUASI = FALSE),
data = dummy)
file = paste(filepath, "NestChain_mod2.rda", sep = "")
save(NestChain_Mod2, file = file)

NestChain_mod2 = load(paste(filepath, "NestChain_mod2.rda", sep= ""))

# Variance components
colMeans(NestChain_Mod2$VCV)

colMeans(NestChain_Mod2$VCV)[1]/sum(colMeans(NestChain_Mod2$VCV))*100
colMeans(NestChain_Mod2$VCV)[2]/sum(colMeans(NestChain_Mod2$VCV))*100
colMeans(NestChain_Mod2$VCV)[3]/sum(colMeans(NestChain_Mod2$VCV))*100
colMeans(NestChain_Mod2$VCV)[4]/sum(colMeans(NestChain_Mod2$VCV))*100
colMeans(NestChain_Mod2$VCV)[5]/sum(colMeans(NestChain_Mod2$VCV))*100
colMeans(NestChain_Mod2$VCV)[6]/sum(colMeans(NestChain_Mod2$VCV))*100
colMeans(NestChain_Mod2$VCV)[7]/sum(colMeans(NestChain_Mod2$VCV))*100


---------------------------




---
Robert Griffin
PhD
griffinevo.wordpress.com

On Tue, Nov 3, 2015 at 6:29 PM, paul debes <paul.debes at utu.fi> wrote:

> Dear Rob,
>
> Unfortunately, I cannot help with the MCMCglmm prior specifications and
> hope someone else will do this, but maybe with the model.
>
> I think that fitting two separate models might lead to different results
> than one common model when you have a significant but ignored covariance
> between effects across populations when estimating only the within
> population variance.
>
> As the Y effects are nested within Y.Source effects (they are, right?),
> one initial full model to estimate within and across population variance,
> and including the interaction with the genetic background effects, could be:
>
> fixed = Score ~ 1 + Block,
> random = ~ BG + Y.Source/Y + BG:Y.Source/Y + Vial,
>
> expanded:
> random = ~ BG + Y.Source + Y.Source:Y + BG:Y.Source + BG:Y.Source:Y + Vial,
>
>
> Best,
> Paul
>
>
>
> On Tue, 03 Nov 2015 17:01:32 +0200, Robert Griffin <
> robgriffin247 at hotmail.com> wrote:
>
> Dear list members,
>>
>> I recently published work using mcmcglmm to test for Y linked genetic
>> variance, finding variance within a population. This data was collected by
>> measuring phenotypic data from randomly sampled Y chromosomes from within
>> a
>> population, tested in a single genetic background, using lifespan as the
>> response, and Y chromosome as a random effect.
>>
>> I have been trying to think of how I would improve such a study and
>> concluded that a more complete study would be done using
>> - Y chromosomes from multiple source populations (to test for among
>> population variance)
>> - multiple genetic backgrounds (to distinguish between [direct] genetic
>> effects, and epistatic effects)...
>>
>> This got me thinking about the analysis of such data. I would need to
>> estimate how much of the Y-linked variance within and among populations is
>> genetic and epistatic. This has lead me to two main questions where I
>> could
>> use some guidance:
>>
>> 1) can mcmcglmm be used to estimate the interaction (epistatic) variance?
>> - I think the answer is yes but I have as yet been unable to find clear
>> advice on this, how can I add interaction variance term in to mcmcglmm?
>> - I can add the variance explained by the genetic background by adding
>> this
>> as a random effect too, but (how) can I find the interaction between two
>> random explanatory variables?
>>
>> 2) could I do it all from a single model or would I need multiple models?
>> - If I built a model where the trait (Lifespan) is the response variable,
>> with random effects of Y chromosome, Y chromosome source population,
>> genetic background, and the interactions between Y/Y-source and genetic
>> background, would I be able to get estimates of the direct genetic and
>> epistatic variances within and among the source populations?
>> - I could build two models, one with Y chromosome and genetic background
>> as
>> random effects (including the interaction), and the other with
>> Y-chromosome
>> source population and genetic background as random effects (including the
>> interaction). These two models would separately estimate the within and
>> among population variances. I think this would be the correct approach.
>> However, I'm unconvinced that the latter would give the desired division
>> in
>> to direct and epistatic Y-linked variation.
>>
>> Thanks,
>> Rob
>>
>> *Please kindly note*, advice on prior specification when estimating
>>
>> interaction effects would be also very highly appreciated, remembering
>> that
>> the variance components are likely very small. I have included a
>> demonstration script which makes some dummy data, and shows some of the
>> proposed models. The script includes a filepath which you can edit if you
>> wish to save and load chains, and switches at line 56 to turn the chains
>> on/off when running scripts.
>>
>> ##############
>> ## R SCRIPT ##
>> ##############
>>
>> # Y-background interaction dummy data
>> rm(list=ls())
>> library("lme4")
>> library("MCMCglmm")
>> set.seed(24)
>>
>> # Set filepath (CUSTOMISE THIS IF YOU WISH TO BE ABLE TO SAVE AND LOAD
>> CHAINS)
>> filepath = paste0("C:\\Users\\Rob\\Notes\\Y_epistasis\\")
>>
>> #### Description of dummy data ####
>> # 20 Y chromosomes randomly sampled from 3 populations (A1:A20, B1:B20,
>> C1:C20, n = 60)
>> # Test in genetic backgrounds (BG) from three populations (Canton [CN],
>> Zimbabwe [ZW], and Dahomey [DH])
>> # Measure a trait (Lifepsan) in 90 individuals per combination of Y & BG
>> # Measured across three replicate blocks (BL1, BL2, BL3), with two vials
>> each per block
>>
>> n1 = 20 # Number of Y's sampled per source population
>> n2 = 30 # Number of measurements per block/line
>> n3 = 3  # Number of blocks (BL1, BL2, BL3)
>> n4 = 3 # Number of genetic backgrounds (CN, ZW, DH)
>> n5 = 3  # Number of Y source populations (A, B, C)
>> n6 = 2  # Number of vials per block/Y/BG
>>
>> # Construction of data
>> dummy = data.frame(
>> as.factor(rep(c(paste0("A",1:n1), paste0("B",1:n1), paste0("C",1:n1)),
>> each
>> = n2*n3, times = n4)),
>> as.factor(rep(c("A","B","C"), each = n1*n2*n3, times = n5)),
>> as.factor(rep(c("CN","ZW","DH"), each = n1*n2*n3*n4, times = 1)),
>> as.factor(rep(c("BL1","BL2","BL3"), each = n2)),
>> as.factor(rep(c("A","B"), each = n2 / n6)),
>> as.numeric(abs(round(rnorm(n1*n2*n3*n4*n5, 50, 15),0)))
>> )
>> colnames(dummy) = c("Y", "Y.Source", "BG", "Block", "Vial", "Age")
>> dummy$Vial = paste(dummy$Y, dummy$Y.Source, dummy$BG, dummy$Block,
>> dummy$Vial, sep = "_")
>> head(dummy)
>>
>> # Mean zero unit variance for Age (response variable)
>> dummy$Score = (dummy$Age - mean(dummy$Age)) / sd(dummy$Age)
>>
>> # Subset DFs (Only one source population and one genetic background, as in
>> previous JEB paper)
>> dummy_A = data.frame(split(dummy , f = dummy$Y.Source)[1])
>> dummy_A_CN = data.frame(split(dummy_A , f = dummy_A$A.BG)[1])
>> colnames(dummy_A) = c("Y", "Y.Source", "BG", "Block", "Vial", "Age",
>> "Score")
>> colnames(dummy_A_CN) = c("Y", "Y.Source", "BG", "Block", "Vial", "Age",
>> "Score")
>>
>> # MCMC Priors
>> prior1 = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
>> G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)),
>> R  = list(V = 1, nu=0.002))
>>
>> prior2 = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
>> G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
>> G3 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)),
>> R  = list(V = 1, nu=0.002))
>>
>> # MCMC Switches
>> testing = "y"
>> runChain_A_CN1 = "y"
>> runChain_A1 = "y"
>> runChain_1 = "y"
>>
>> # MCMC Details
>> if(testing == "y"){
>> nitt = 10000
>> burn = 1000
>> thin = 10
>> }else{
>> nitt = 100000
>> burn = 10000
>> thin = 100
>> }
>>
>> # MCMC Chain (as in JEB Paper: One Y source population, One genetic
>> background)
>> # Within-population Y-linked variation
>> if(runChain_A_CN1=="y"){
>> Chain_A_CN1 = MCMCglmm(Score ~1 + Block,
>> random = ~Y + Vial,
>> rcov = ~units,
>> nitt = nitt,
>> thin = thin,
>> burnin = burn,
>> prior = prior1,
>> family = "gaussian",
>> start = list(QUASI = FALSE),
>> data = dummy_A_CN)
>> file = paste(filepath, "Chain_A_CN1.rda", sep = "")
>> save(Chain_A_CN1, file = file)
>> }else{}
>>
>> ##############################
>> #### Proposed MCMC Chains ####
>> ##############################
>>
>> # Within-population Y-linked variation split in to direct and epistatic
>> if(runChain_A1=="y"){
>> Chain_A1 = MCMCglmm(Score ~1 + Block,
>> random = ~Y + BG + Vial,
>> rcov = ~units,
>> nitt = nitt,
>> thin = thin,
>> burnin = burn,
>> prior = prior2,
>> family = "gaussian",
>> start = list(QUASI = FALSE),
>> data = dummy_A)
>> file = paste(filepath, "Chain_A1.rda", sep = "")
>> save(Chain_A1, file = file)
>> }else{}
>>
>> # Among-population Y-linked variation split in to direct and epistatic
>> if(runChain_1=="y"){
>> Chain_1 = MCMCglmm(Score ~1 + Block,
>> random = ~Y.Source + BG + Vial,
>> rcov = ~units,
>> nitt = nitt,
>> thin = thin,
>> burnin = burn,
>> prior = prior2,
>> family = "gaussian",
>> start = list(QUASI = FALSE),
>> data = dummy)
>> file = paste(filepath, "Chain_1.rda", sep = "")
>> save(Chain_1, file = file)
>> }else{}
>>
>> Chain.A_CN1 = load(paste(filepath, "Chain_A_CN1.rda", sep= ""))
>> Chain.A1 = load(paste(filepath, "Chain_A1.rda", sep= ""))
>> Chain.1 = load(paste(filepath, "Chain_1.rda", sep= ""))
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>
> --
> Paul Debes
> DFG Research Fellow
> University of Turku
> Department of Biology
> Itäinen Pitkäkatu 4
> 20520 Turku
> Finland
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>

	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list