[R-sig-ME] Interaction variance from mcmcglmm

Fri Nov 13 14:14:17 CET 2015

Hi Rob,

I'm not entirely certain about the biological interpretation (I don't work  
at the level of the chromosomes) but your interpretation appears right to  
me, although variance estimates are very unreliable given that some are  
based on only three levels (Y.source and BG).

In the case you have considered using many more levels than three for the  
BG factor in your actual experiment, you could also employ a more detailed  
covariance structure and test for differences in BG variance among  
Y.source levels and for different correlations between BG effects among  
Y.source pairs.
For a genetic model it can make sense to test for the first when  
demographic events within populations have been different (e.g., different  
population sizes / past events leading to different genetic diversities),  
and to test for the second when the genetic differentiation differs  
between population pairs (e.g., differences in shared ancestry /  
migration). Of course, both processes relate to each other and population  
size differences can also drive differentiation, etc. I think the present  
model assumes that the first and second are equal from a genetic  
perspective, but it is the easiest model to obtain the BG variance  
estimates within and among Y.source. I'm not sure, though, if a more  
complicated model fits with the interaction question you asked.  
Practically, you could simulate the BG effects for the Y.source levels  
drawn from a multivariate normal distribution.
You could then consider (there are several options here, this is one):

random = ~ Y.Source:Y + BG:us(Y.Source) + BG:Y.Source:Y + Vial

As for the other question:
'a/b' is short for 'a + a:b',
'a:b' does not include 'a' effects, in contrast to 'a/b',
and none of them does include b effects (b is nested).

In your case writing 'Y.source/Y' or 'Y.source + Y.source:Y' is even the  
same as 'Y.source + Y' because each Y does only occur in one level of  
Y.source (nesting by design). Writing 'Y.source + Y.source:Y' (or short:  
'Y.source/Y') makes it just visually clearer that 'Y' is within 'Y.source'  
but how you write it should not affect the model estimates (maybe in some  
programmes?). Because 'Y.source/Y' is like writing 'Y' this should solve  
the remaining question about the nesting of 'Y' within 'BG:Y.source' too.  
This means 'BG:Y.source:Y' is in your case the same as 'BG:Y'. Try it out,  
should give the same result.
Sorry if that confused you.

Hope this helps,
Paul

On Fri, 13 Nov 2015 11:48:14 +0200, Robert Griffin  
<robgriffin247 at hotmail.com> wrote:

> Thanks Paul,
>
> I have tried with the suggested expanded model, using simulated data  
> where
> I have added Y-source * Background interactions. See below. Is it correct
> then to say that the variance estimate of "BG:Y.Source" is variance in
> lifespan explained by the interaction between the genetic background
> (autosomes and X source) and the source of the Y? Then the phenotypic
> variance would be the sum of all components (2.910), and the interaction
> BG*Y.Source would explain 100*(0.902/2.910) = 30.98% of the variance in
> lifespan?
>
>> colMeans(NestChain_Mod2$VCV)
>            BG      Y.Source    Y.Source:Y   BG:Y.Source BG:Y.Source:Y
>    Vial         units
>  1.0029972054  0.4037536287  0.0008707007  0.9017016510  0.0002926541
>  0.0013600115  0.5993455117
>
>
> I also have a follow up on the nesting, I may need some explanation of  
> the
> syntax in the model you proposed (what are : and / doing?), but I read  
> that
> as having Y nested within Y.Source, which is then nested within the
> background (BG). I understand that Y should be nested within Y.Source
> (because each Y tested can only be from one source population), however,
> given that all Y chromosomes are tested in all backgrounds (autosome and  
> X
> source), wouldn't it be inappropriate to nest within BG? i.e. Y's A1-A34
> are all tested in CN, DH, and ZW, Y's B1-B34 are also all tested in CN,  
> DH,
> and ZW...
>
> Thanks,
> Rob
>
> Here's the script I used:
>
> ############
> prior3 = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
> G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
> G3 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
> G4 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
> G5 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
> G6 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)),
> R  = list(V = 1, nu=0.002))
>
> # Y source * BG variance
> dummy$Age2 = ifelse(dummy$Y.Source=="A"&dummy$BG=="CN", dummy$Age+15,
> dummy$Age)
> dummy$Age2 = ifelse(dummy$Y.Source=="B"&dummy$BG=="CN",  
> abs(dummy$Age-15),
> dummy$Age2)
> dummy$Age2 = ifelse(dummy$Y.Source=="C"&dummy$BG=="DH", dummy$Age+15,
> dummy$Age2)
> dummy$Age2 = ifelse(dummy$Y.Source=="C"&dummy$BG=="ZW",  
> abs(dummy$Age-15),
> dummy$Age2)
> dummy$Age2 = ifelse(dummy$Y.Source=="A"&dummy$BG=="DH",  
> abs(dummy$Age-15),
> dummy$Age2)
> dummy$Age2 = ifelse(dummy$Y.Source=="B"&dummy$BG=="ZW", dummy$Age+15,
> dummy$Age2)
>
> boxplot(dummy$Age2 ~ dummy$Y.Source*dummy$BG)
> dummy$Score2 = (dummy$Age2 - mean(dummy$Age2)) / sd(dummy$Age2)
>
> # Chain
> NestChain_Mod2 = MCMCglmm(Score2 ~1 + Block,
> random = ~BG + Y.Source + Y.Source:Y + BG:Y.Source + BG:Y.Source:Y +  
> Vial,
> rcov = ~units,
> nitt = 5000,
> thin = 5,
> burnin = 1000,
> prior = prior3,
> family = "gaussian",
> start = list(QUASI = FALSE),
> data = dummy)
> file = paste(filepath, "NestChain_mod2.rda", sep = "")
> save(NestChain_Mod2, file = file)
>
> NestChain_mod2 = load(paste(filepath, "NestChain_mod2.rda", sep= ""))
>
> # Variance components
> colMeans(NestChain_Mod2$VCV)
>
> colMeans(NestChain_Mod2$VCV)[1]/sum(colMeans(NestChain_Mod2$VCV))*100
> colMeans(NestChain_Mod2$VCV)[2]/sum(colMeans(NestChain_Mod2$VCV))*100
> colMeans(NestChain_Mod2$VCV)[3]/sum(colMeans(NestChain_Mod2$VCV))*100
> colMeans(NestChain_Mod2$VCV)[4]/sum(colMeans(NestChain_Mod2$VCV))*100
> colMeans(NestChain_Mod2$VCV)[5]/sum(colMeans(NestChain_Mod2$VCV))*100
> colMeans(NestChain_Mod2$VCV)[6]/sum(colMeans(NestChain_Mod2$VCV))*100
> colMeans(NestChain_Mod2$VCV)[7]/sum(colMeans(NestChain_Mod2$VCV))*100
>
>
> ---------------------------
>
>
>
>
> ---
> Robert Griffin
> PhD
> griffinevo.wordpress.com
>
> On Tue, Nov 3, 2015 at 6:29 PM, paul debes <paul.debes at utu.fi> wrote:
>
>> Dear Rob,
>>
>> Unfortunately, I cannot help with the MCMCglmm prior specifications and
>> hope someone else will do this, but maybe with the model.
>>
>> I think that fitting two separate models might lead to different results
>> than one common model when you have a significant but ignored covariance
>> between effects across populations when estimating only the within
>> population variance.
>>
>> As the Y effects are nested within Y.Source effects (they are, right?),
>> one initial full model to estimate within and across population  
>> variance,
>> and including the interaction with the genetic background effects,  
>> could be:
>>
>> fixed = Score ~ 1 + Block,
>> random = ~ BG + Y.Source/Y + BG:Y.Source/Y + Vial,
>>
>> expanded:
>> random = ~ BG + Y.Source + Y.Source:Y + BG:Y.Source + BG:Y.Source:Y +  
>> Vial,
>>
>>
>> Best,
>> Paul
>>
>>
>>
>> On Tue, 03 Nov 2015 17:01:32 +0200, Robert Griffin <
>> robgriffin247 at hotmail.com> wrote:
>>
>> Dear list members,
>>>
>>> I recently published work using mcmcglmm to test for Y linked genetic
>>> variance, finding variance within a population. This data was  
>>> collected by
>>> measuring phenotypic data from randomly sampled Y chromosomes from  
>>> within
>>> a
>>> population, tested in a single genetic background, using lifespan as  
>>> the
>>> response, and Y chromosome as a random effect.
>>>
>>> I have been trying to think of how I would improve such a study and
>>> concluded that a more complete study would be done using
>>> - Y chromosomes from multiple source populations (to test for among
>>> population variance)
>>> - multiple genetic backgrounds (to distinguish between [direct] genetic
>>> effects, and epistatic effects)...
>>>
>>> This got me thinking about the analysis of such data. I would need to
>>> estimate how much of the Y-linked variance within and among  
>>> populations is
>>> genetic and epistatic. This has lead me to two main questions where I
>>> could
>>> use some guidance:
>>>
>>> 1) can mcmcglmm be used to estimate the interaction (epistatic)  
>>> variance?
>>> - I think the answer is yes but I have as yet been unable to find clear
>>> advice on this, how can I add interaction variance term in to mcmcglmm?
>>> - I can add the variance explained by the genetic background by adding
>>> this
>>> as a random effect too, but (how) can I find the interaction between  
>>> two
>>> random explanatory variables?
>>>
>>> 2) could I do it all from a single model or would I need multiple  
>>> models?
>>> - If I built a model where the trait (Lifespan) is the response  
>>> variable,
>>> with random effects of Y chromosome, Y chromosome source population,
>>> genetic background, and the interactions between Y/Y-source and genetic
>>> background, would I be able to get estimates of the direct genetic and
>>> epistatic variances within and among the source populations?
>>> - I could build two models, one with Y chromosome and genetic  
>>> background
>>> as
>>> random effects (including the interaction), and the other with
>>> Y-chromosome
>>> source population and genetic background as random effects (including  
>>> the
>>> interaction). These two models would separately estimate the within and
>>> among population variances. I think this would be the correct approach.
>>> However, I'm unconvinced that the latter would give the desired  
>>> division
>>> in
>>> to direct and epistatic Y-linked variation.
>>>
>>> Thanks,
>>> Rob
>>>
>>> *Please kindly note*, advice on prior specification when estimating
>>>
>>> interaction effects would be also very highly appreciated, remembering
>>> that
>>> the variance components are likely very small. I have included a
>>> demonstration script which makes some dummy data, and shows some of the
>>> proposed models. The script includes a filepath which you can edit if  
>>> you
>>> wish to save and load chains, and switches at line 56 to turn the  
>>> chains
>>> on/off when running scripts.
>>>
>>> ##############
>>> ## R SCRIPT ##
>>> ##############
>>>
>>> # Y-background interaction dummy data
>>> rm(list=ls())
>>> library("lme4")
>>> library("MCMCglmm")
>>> set.seed(24)
>>>
>>> # Set filepath (CUSTOMISE THIS IF YOU WISH TO BE ABLE TO SAVE AND LOAD
>>> CHAINS)
>>> filepath = paste0("C:\\Users\\Rob\\Notes\\Y_epistasis\\")
>>>
>>> #### Description of dummy data ####
>>> # 20 Y chromosomes randomly sampled from 3 populations (A1:A20, B1:B20,
>>> C1:C20, n = 60)
>>> # Test in genetic backgrounds (BG) from three populations (Canton [CN],
>>> Zimbabwe [ZW], and Dahomey [DH])
>>> # Measure a trait (Lifepsan) in 90 individuals per combination of Y &  
>>> BG
>>> # Measured across three replicate blocks (BL1, BL2, BL3), with two  
>>> vials
>>> each per block
>>>
>>> n1 = 20 # Number of Y's sampled per source population
>>> n2 = 30 # Number of measurements per block/line
>>> n3 = 3  # Number of blocks (BL1, BL2, BL3)
>>> n4 = 3 # Number of genetic backgrounds (CN, ZW, DH)
>>> n5 = 3  # Number of Y source populations (A, B, C)
>>> n6 = 2  # Number of vials per block/Y/BG
>>>
>>> # Construction of data
>>> dummy = data.frame(
>>> as.factor(rep(c(paste0("A",1:n1), paste0("B",1:n1), paste0("C",1:n1)),
>>> each
>>> = n2*n3, times = n4)),
>>> as.factor(rep(c("A","B","C"), each = n1*n2*n3, times = n5)),
>>> as.factor(rep(c("CN","ZW","DH"), each = n1*n2*n3*n4, times = 1)),
>>> as.factor(rep(c("BL1","BL2","BL3"), each = n2)),
>>> as.factor(rep(c("A","B"), each = n2 / n6)),
>>> as.numeric(abs(round(rnorm(n1*n2*n3*n4*n5, 50, 15),0)))
>>> )
>>> colnames(dummy) = c("Y", "Y.Source", "BG", "Block", "Vial", "Age")
>>> dummy$Vial = paste(dummy$Y, dummy$Y.Source, dummy$BG, dummy$Block,
>>> dummy$Vial, sep = "_")
>>> head(dummy)
>>>
>>> # Mean zero unit variance for Age (response variable)
>>> dummy$Score = (dummy$Age - mean(dummy$Age)) / sd(dummy$Age)
>>>
>>> # Subset DFs (Only one source population and one genetic background,  
>>> as in
>>> previous JEB paper)
>>> dummy_A = data.frame(split(dummy , f = dummy$Y.Source)[1])
>>> dummy_A_CN = data.frame(split(dummy_A , f = dummy_A$A.BG)[1])
>>> colnames(dummy_A) = c("Y", "Y.Source", "BG", "Block", "Vial", "Age",
>>> "Score")
>>> colnames(dummy_A_CN) = c("Y", "Y.Source", "BG", "Block", "Vial", "Age",
>>> "Score")
>>>
>>> # MCMC Priors
>>> prior1 = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0,  
>>> alpha.V=1000),
>>> G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)),
>>> R  = list(V = 1, nu=0.002))
>>>
>>> prior2 = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0,  
>>> alpha.V=1000),
>>> G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
>>> G3 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)),
>>> R  = list(V = 1, nu=0.002))
>>>
>>> # MCMC Switches
>>> testing = "y"
>>> runChain_A_CN1 = "y"
>>> runChain_A1 = "y"
>>> runChain_1 = "y"
>>>
>>> # MCMC Details
>>> if(testing == "y"){
>>> nitt = 10000
>>> burn = 1000
>>> thin = 10
>>> }else{
>>> nitt = 100000
>>> burn = 10000
>>> thin = 100
>>> }
>>>
>>> # MCMC Chain (as in JEB Paper: One Y source population, One genetic
>>> background)
>>> # Within-population Y-linked variation
>>> if(runChain_A_CN1=="y"){
>>> Chain_A_CN1 = MCMCglmm(Score ~1 + Block,
>>> random = ~Y + Vial,
>>> rcov = ~units,
>>> nitt = nitt,
>>> thin = thin,
>>> burnin = burn,
>>> prior = prior1,
>>> family = "gaussian",
>>> start = list(QUASI = FALSE),
>>> data = dummy_A_CN)
>>> file = paste(filepath, "Chain_A_CN1.rda", sep = "")
>>> save(Chain_A_CN1, file = file)
>>> }else{}
>>>
>>> ##############################
>>> #### Proposed MCMC Chains ####
>>> ##############################
>>>
>>> # Within-population Y-linked variation split in to direct and epistatic
>>> if(runChain_A1=="y"){
>>> Chain_A1 = MCMCglmm(Score ~1 + Block,
>>> random = ~Y + BG + Vial,
>>> rcov = ~units,
>>> nitt = nitt,
>>> thin = thin,
>>> burnin = burn,
>>> prior = prior2,
>>> family = "gaussian",
>>> start = list(QUASI = FALSE),
>>> data = dummy_A)
>>> file = paste(filepath, "Chain_A1.rda", sep = "")
>>> save(Chain_A1, file = file)
>>> }else{}
>>>
>>> # Among-population Y-linked variation split in to direct and epistatic
>>> if(runChain_1=="y"){
>>> Chain_1 = MCMCglmm(Score ~1 + Block,
>>> random = ~Y.Source + BG + Vial,
>>> rcov = ~units,
>>> nitt = nitt,
>>> thin = thin,
>>> burnin = burn,
>>> prior = prior2,
>>> family = "gaussian",
>>> start = list(QUASI = FALSE),
>>> data = dummy)
>>> file = paste(filepath, "Chain_1.rda", sep = "")
>>> save(Chain_1, file = file)
>>> }else{}
>>>
>>> Chain.A_CN1 = load(paste(filepath, "Chain_A_CN1.rda", sep= ""))
>>> Chain.A1 = load(paste(filepath, "Chain_A1.rda", sep= ""))
>>> Chain.1 = load(paste(filepath, "Chain_1.rda", sep= ""))
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>>
>> --
>> Paul Debes
>> DFG Research Fellow
>> University of Turku
>> Department of Biology
>> Itäinen Pitkäkatu 4
>> 20520 Turku
>> Finland
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>>

-- 
Paul Debes
DFG Research Fellow
University of Turku
Department of Biology
Itäinen Pitkäkatu 4
20520 Turku
Finland