[R-sig-ME] Modelling heterogeneous variances for an interaction term in MCMCglmm?
Jarrod Hadfield
j.hadfield at ed.ac.uk
Tue Feb 18 18:20:11 CET 2014
Hi Jackie,
1/ You don't need to recode pop.bin in the CSV file. The intercept in
the model is the mu term for pop.bin #1, and the intercept + the second
coefficient is the mu term for pop.bin #2. Alternatively, you could fit
~pop.bin-1 rather than ~pop.bin in the fixed part of the model; in that
case the four fixed effects are the four mu terms, without having to add
anything together.
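As a minimal sketch (assuming the fitted object is called model1 and has
~pop.bin in the fixed formula, as in your code below; the column positions
and names are illustrative, so check colnames(model1$Sol)):

mu.bin1 <- model1$Sol[, 1]                    # intercept = mu term for pop.bin #1
mu.bin2 <- model1$Sol[, 1] + model1$Sol[, 2]  # intercept + pop.bin2 contrast = mu term for #2

# or refit with the cell-means parameterisation so no adding is needed:
# model1b <- MCMCglmm(grad.lin.value ~ pop.bin - 1, ...)  # '...' = the same arguments as before
# mu.bin2 <- model1b$Sol[, "pop.bin2"]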
2/ You don't need the var.fnorm function if you just want the mean magnitude.
3/ You can condition on any random term you like. For example, if you
wanted to know what the magnitude within a study was, but averaged over
populations, you would discard the study variance. However, if you wanted
to know the magnitude for a particular study, then you should add the
study random effect (not the variance but the effect, so use pr=TRUE in
the call to MCMCglmm) to the mu term. If you think that the residual
variation is not biological variation (e.g. measurement error) then you
should leave it out of the calculations. This is what Michael and I did.
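A rough sketch of 3/, reusing the mu.fnorm function and the VCV column
numbers from your message below (the indices and names here are
illustrative, not definitive; check them against colnames(model1$Sol)
and colnames(model1$VCV)):

# magnitude for bin #1 averaged over studies and populations, excluding
# the residual (treating it as measurement error): keep the study and
# pop variances in sigma
mag.bin1 <- mu.fnorm(model1$Sol[, 1], sqrt(rowSums(model1$VCV[, c(1, 5)])))

# magnitude for bin #1 in one particular study: refit with pr = TRUE, add
# that study's effect to mu, and drop the study variance from sigma
# (the Sol column name below is hypothetical; find the real one with grep)
j <- grep("study.IDR.STUDY_X", colnames(model1$Sol), fixed = TRUE)
mag.bin1.sX <- mu.fnorm(model1$Sol[, 1] + model1$Sol[, j],
                        sqrt(model1$VCV[, 5]))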
Cheers,
Jarrod
Quoting Jackie Wood <jackiewood7 at gmail.com> on Tue, 18 Feb 2014
10:33:51 -0500:
> Hi Jarrod,
>
> I discussed the code you provided last week with a colleague, and we just
> want to be sure that we are interpreting and executing it properly. As an
> example, say we specify a simple model with only pop.bin (which has four
> levels) allowing for heterogeneous variances at each level:
>
> prior1 = list(R = list(V = diag(4), n = 0.002),
>               G = list(G1 = list(V = diag(4), n = 0.002), G2 = list(V = diag(4), n = 0.002)))
> model1 = MCMCglmm(grad.lin.value ~ pop.bin, mev = mev, data = data, prior = prior1,
>                   random = ~idh(pop.bin):study.IDR + idh(pop.bin):pop.IDR,
>                   rcov = ~idh(pop.bin):units, family = "gaussian",
>                   nitt = 300000, burnin = 100000, thin = 50)
>
> In the object model1$Sol there are four columns, and we are interested in the
> first (the intercept, which corresponds to population bin #1); hence we use
> "model1$Sol[,1]" as you said.
>
> If we include all the sources of variance (study, pop, and units), we
> specify a vector of the column numbers that contain the random effects
> variance estimates for the intercept:
>
> m1 = c(1, 5, 10)
> model1$VCV[,m1]
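> (An aside: because mev contributes its own column to VCV, positional
> indices like c(1, 5, 10) can quietly point at the wrong components, so it
> may be safer to pick the bin-1 columns by name, e.g. something like
>
> m1 = grep("pop.bin1", colnames(model1$VCV), fixed = TRUE)
> colnames(model1$VCV)[m1]   # check that only the bin-1 study/pop/units columns were caught
>
> where the exact names depend on the factor levels and random terms in the fit.)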
>
> We then estimate the posterior mode and HPD credible intervals on the
> folded normal distribution with the following:
>
> mu.fnorm<-function(mu,sigma){dnorm(mu,0,sigma)*2*sigma^2+mu*(2*pnorm(mu,0,sigma)-1)}
>
> newMode = mu.fnorm(model1$Sol[,1], sqrt(rowSums(model1$VCV[,m1])))
> HPDinterval(newMode)
> posterior.mode(newMode)
>
> If we are correct, the resulting output should be the posterior mode and
> HPD credible intervals for the magnitude of selection for population bin #1?
>
> If we wanted to then calculate the magnitude of selection for population
> bin #2, we would recode the bins in the CSV file such that population bin
> #2 becomes the intercept and then follow the same method as for population
> bin #1 to obtain the magnitude of selection for population bin #2 (there
> could be an easier way to do this of course, but we just want to make sure
> this is getting at what we're interested in).
>
>
> There is an additional function provided by Dr. Morrissey, "var.fnorm",
> which gives the variance of selection under the folded normal. We are
> assuming that this is not necessary to compute the magnitude of selection
> and its credible intervals (we only need the mu.fnorm function for that),
> and that it is simply useful if one is also interested in the variance of
> selection within each of the pop size bins.
>
>
> Finally, we were hoping you could clarify one more issue for us
> regarding which sources of variance to include. Our model is structured to
> produce three variance estimates: variance attributable to study,
> population, and the residual variance. Study and population essentially
> represent "nuisance parameters" to us that were included in the model to
> account for issues of non-independence in our dataset. If we were
> interested in calculating the magnitude of selection after accounting for
> these sources of variation, should we include only the residual variance
> estimate as "sigma" in the mu.fnorm function? Or should we still include
> the estimates of study-level and population-level variance in our final
> calculations?
>
>
> Jackie
>
>
> On Fri, Feb 14, 2014 at 1:53 AM, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>
>> Hi Jackie,
>>
>> The function takes the standard deviation, not the variance. You also
>> need to decide which sources of variance to include. Let's imagine you want
>> the expected absolute value of a random draw from the first of the 12
>> combinations, and you want to include the variance due to the two random
>> terms and the residual. Then,
>>
>> mu.fnorm(m1$Sol[,1], sqrt(rowSums(m1$VCV[,seq(1,12*3,12)])))
>>
>> or something close to it should work.
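>> (As a quick check on the indexing, which assumes the 36 VCV columns come in
>> three blocks of 12, one block per random term plus the residual:
>>
>> seq(1, 12*3, 12)                      # 1 13 25: the first combination in each block
>> colnames(m1$VCV)[seq(1, 12*3, 12)]    # confirm these are the intended components
>>
>> with mev in the model there is an extra column, so checking the names is
>> safer than trusting the positions.)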
>>
>> Jarrod
>>
>>
>>
>>
>> Quoting Jackie Wood <jackiewood7 at gmail.com> on Thu, 13 Feb 2014 18:41:34
>> -0500:
>>
>>> Hi Jarrod,
>>>
>>> Thanks so much for the information regarding R-structure and model
>>> specification. Your explanation for that cleared things up substantially.
>>>
>>> In regards to the folded normal, I completely understand what you're saying,
>>> though I'm unsure how that would be coded in MCMCglmm. My experience with
>>> MCMCglmm thus far has been estimating heritability and Qst, so I would think
>>> it would be something along the lines of:
>>>
>>> posterior.fnorm<-mu.fnorm(model$Sol,model$VCV),
>>> var.fnorm(model$Sol,model$VCV)
>>> posterior.mode(posterior.fnorm)
>>> HPDinterval(posterior.fnorm,0.95)
>>>
>>> The first line of code doesn't work (and is probably pretty messed up),
>>> but
>>> I'm just wondering if it is in the right direction. I think evaluating the
>>> functions for the folded normal for each MCMC iteration as you suggested
>>> involves applying "mu.fnorm" and "var.fnorm" to the posterior distribution
>>> of solutions and the posterior distribution of variance matrices given by
>>> an MCMCglmm model, but how the two functions would be combined to give
>>> "posterior.fnorm" in this case is not clear to me.
>>>
>>> Thanks,
>>> Jackie
>>>
>>>
>>>
>>>
>>> On Thu, Feb 13, 2014 at 1:28 PM, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>>
>>>> Hi Jackie,
>>>>
>>>> The reason that it does not run is that each observation needs to be
>>>> associated with a single effect. For ease, imagine two factors fac1 and
>>>> fac2 have two levels each (A and B, and C and D respectively). MCMCglmm
>>>> removes the intercept from the formula inside variance functions (e.g. idh),
>>>> so the R-structure model is
>>>>
>>>> fac1*fac2-1
>>>>
>>>> which for the four types of observation gives:
>>>>
>>>>    fac1A fac1B fac2D fac1B:fac2D
>>>> AC     1     0     0           0
>>>> AD     1     0     1           0
>>>> BC     0     1     0           0
>>>> BD     0     1     1           1
>>>>
>>>> This means that the (residual) variance for AD is actually
>>>> V[1,1]+V[3,3]+V[1,3]+V[3,1] where V is the estimated covariance matrix.
>>>> This makes interpretation difficult, and actually MCMCglmm does not allow
>>>> you to fit this type of R-structure (although it does allow the
>>>> G-structures to be of this form).
>>>>
>>>>
>>>> Perhaps a better way of doing it is to fit idh(fac1:fac2) because this
>>>> gives
>>>>
>>>>    fac1A:fac2C fac1A:fac2D fac1B:fac2C fac1B:fac2D
>>>> AC           1           0           0           0
>>>> AD           0           1           0           0
>>>> BC           0           0           1           0
>>>> BD           0           0           0           1
>>>>
>>>> which means that there is a one-to-one mapping between the diagonal
>>>> elements of V and the variance within each factor combination. MCMCglmm
>>>> will allow you to fit this type of R-structure. The next version of
>>>> MCMCglmm (which will probably be released next week) will issue more
>>>> sensible warnings when invalid R-structures are specified.
>>>>
>>>> Note, however, that if you fit idh(fac1:fac2) for all sources of variance
>>>> (residual and random effects) and you have fac1:fac2-1 in the fixed formula,
>>>> then the analysis is exactly equivalent to fitting separate models to the
>>>> data from each factor combination.
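>>>> (A small sketch to see the two codings side by side, using the toy
>>>> factors from above:
>>>>
>>>> d <- data.frame(fac1 = factor(c("A", "A", "B", "B")),
>>>>                 fac2 = factor(c("C", "D", "C", "D")))
>>>> model.matrix(~ fac1*fac2 - 1, d)   # first table: row AD spans two columns of V
>>>> model.matrix(~ fac1:fac2 - 1, d)   # second table: one indicator column per combination
>>>>
>>>> which is why idh(fac1:fac2) gives a cleanly interpretable variance for
>>>> each factor combination.)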
>>>>
>>>> Applying a function to a posterior distribution results in a valid new
>>>> posterior distribution. It therefore makes more sense to evaluate the
>>>> functions for the folded normal at each MCMC iteration to give a posterior
>>>> distribution for the expected absolute value of selection. This is
>>>> advantageous because mean(f(x)) is not always equal to f(mean(x)) when f is
>>>> non-linear, and it also allows you to use things like HPDinterval in order
>>>> to determine the uncertainty in your inferences. Note that it is hard to
>>>> know what influence a prior placed directly on x will have on inferences
>>>> about x, let alone on f(x).
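>>>> (A toy illustration of the mean(f(x)) != f(mean(x)) point, with mu.fnorm
>>>> as the non-linear f and made-up numbers standing in for the MCMC output:
>>>>
>>>> mu.fnorm <- function(mu, sigma){dnorm(mu, 0, sigma)*2*sigma^2 + mu*(2*pnorm(mu, 0, sigma) - 1)}
>>>> mu.post <- rnorm(1000, 0.05, 0.1)          # pretend posterior sample of a gradient
>>>> sigma.post <- abs(rnorm(1000, 0.3, 0.05))  # pretend posterior sample of a standard deviation
>>>> mean(mu.fnorm(mu.post, sigma.post))        # posterior mean of the transformed quantity
>>>> mu.fnorm(mean(mu.post), mean(sigma.post))  # generally a different number
>>>>
>>>> and applying HPDinterval() and posterior.mode() to
>>>> coda::as.mcmc(mu.fnorm(mu.post, sigma.post)) gives the uncertainty on the
>>>> transformed scale.)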
>>>>
>>>> Cheers,
>>>>
>>>> Jarrod
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Quoting Jackie Wood <jackiewood7 at gmail.com> on Thu, 13 Feb 2014 12:44:24
>>>> -0500:
>>>>
>>>>> Hi Jarrod,
>>>>>
>>>>> Thanks for your response! I made the change you suggested and the model
>>>>> stopped after 3000 iterations and asked for a stronger prior, I used
>>>>> n=1,
>>>>> and it seemed to run ok after that.
>>>>>
>>>>> I was wondering if I might impose on you to ask a follow-up question? My
>>>>> response variable "grad.lin" contains linear selection gradients, which
>>>>> therefore have directionality, but we are also interested in the magnitude.
>>>>> So we are essentially investigating the potential effect of population size on
>>>>> selection. I have read a number of previous articles and syntheses
>>>>> regarding selection and understand that this requires the use of a
>>>>> folded
>>>>> normal distribution. In fact, one of those articles was one that you
>>>>> collaborated on with Michael Morrissey. I contacted Dr. Morrissey, and
>>>>> he
>>>>> was kind enough to provide an example of functions that could be used to
>>>>> obtain the corresponding values for the magnitude of selection once the
>>>>> posterior mode and variance were obtained from an MCMCglmm model using
>>>>> the
>>>>> selection gradients "straight up" as it were. The functions were as
>>>>> follows:
>>>>>
>>>>> mu.fnorm <- function(mu, sigma){
>>>>>   dnorm(mu, 0, sigma)*2*sigma^2 + mu*(2*pnorm(mu, 0, sigma) - 1)
>>>>> }
>>>>> var.fnorm <- function(mu, sigma){
>>>>>   mu^2 + sigma^2 - (sigma*sqrt(2/pi)*exp((-1*mu^2)/(2*sigma^2)) +
>>>>>                     mu*(1 - 2*pnorm(-1*mu/sigma, 0, 1)))^2
>>>>> }
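>>>>> As a quick sanity check, the formulas can be compared against brute-force
>>>>> simulation (the numbers here are arbitrary):
>>>>>
>>>>> x <- rnorm(1e6, mean = 0.2, sd = 0.3)
>>>>> c(mean(abs(x)), mu.fnorm(0.2, 0.3))   # should agree to a couple of decimals
>>>>> c(var(abs(x)), var.fnorm(0.2, 0.3))   # likewise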
>>>>>
>>>>> I assumed on my own that the value for the posterior mode of selection
>>>>> that
>>>>> is plugged into the functions above is given for each level of "pop.bin"
>>>>> in
>>>>> the object:
>>>>>
>>>>> posterior.mode(model$Sol)
>>>>>
>>>>> and that the posterior variance of selection to use in the functions
>>>>> comes
>>>>> from the units variance (after the random effects "study" and "pop" are
>>>>> accounted for) given in:
>>>>>
>>>>> posterior.mode(model$VCV)
>>>>>
>>>>> In any case, to make a long story short, the reason why I was attempting
>>>>> to
>>>>> use the prior with "R=list(V=diag(12),n=0.002)" is to obtain estimates
>>>>> of
>>>>> units variance for the different levels of the interaction term for use
>>>>> in
>>>>> the folded normal functions to obtain values for the magnitude of
>>>>> selection. So the change to the model you provided works, but only one
>>>>> overall estimate of units variance is given.
>>>>>
>>>>> Anyway, perhaps this is not a problem if I am totally incorrect in the
>>>>> assumption that units variance is what I'm looking for.
>>>>>
>>>>> I checked the data and all three of the groups (fish, plants, and birds)
>>>>> are represented in each of the pop.size bins though it's admittedly not
>>>>> an
>>>>> even spread. For example, there is more data for birds than for fish or
>>>>> plants in each of the bins.
>>>>>
>>>>> Thanks again,
>>>>> Jackie
>>>>>
>>>>>
>>>>> On Thu, Feb 13, 2014 at 5:15 AM, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>>>>
>>>>>> Hi Jackie,
>>>>>>
>>>>>> Unless you have a very large amount of data I would be careful about
>>>>>> fitting 3 12x12 covariance matrices. However, that in itself should not
>>>>>> generate the error.
>>>>>>
>>>>>> Does
>>>>>>
>>>>>> prior = list(R = list(V = diag(1), n = 0.002),
>>>>>>              G = list(G1 = list(V = diag(12), n = 0.002),
>>>>>>                       G2 = list(V = diag(12), n = 0.002)))
>>>>>>
>>>>>> model = MCMCglmm(grad.lin ~ pop.bin*group, mev = mev, data = data, prior = prior,
>>>>>>                  random = ~idh(pop.bin*group):study + idh(pop.bin*group):pop,
>>>>>>                  rcov = ~units, nitt = 300000, burnin = 100000, thin = 50)
>>>>>>
>>>>>> run? If so, can you let me know? If not, can you make sure that both
>>>>>> pop.bin and group are coded as factors?
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Jarrod
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Quoting Jackie Wood <jackiewood7 at gmail.com> on Wed, 12 Feb 2014
>>>>>> 18:04:22
>>>>>> -0500:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I am using MCMCglmm and attempting to create a model which allows
>>>>>>> heterogeneous variances for different levels of a moderator variable
>>>>>>> where
>>>>>>> the moderator is an interaction term. I attempted the run the
>>>>>>> following
>>>>>>> model:
>>>>>>>
>>>>>>> prior = list(R = list(V = diag(12), n = 0.002),
>>>>>>>              G = list(G1 = list(V = diag(12), n = 0.002),
>>>>>>>                       G2 = list(V = diag(12), n = 0.002)))
>>>>>>>
>>>>>>> model = MCMCglmm(grad.lin ~ pop.bin*group, mev = mev, data = data, prior = prior,
>>>>>>>                  random = ~idh(pop.bin*group):study + idh(pop.bin*group):pop,
>>>>>>>                  rcov = ~idh(pop.bin*group):units,
>>>>>>>                  nitt = 300000, burnin = 100000, thin = 50)
>>>>>>>
>>>>>>> where "pop.bin" corresponds to bins of population size and has 4
>>>>>>> levels,
>>>>>>> and "group" has 3 levels (Fish, Birds, Plants).
>>>>>>>
>>>>>>> When I run the above model I receive the error:
>>>>>>> Error in priorformat(if (NOpriorG) { :
>>>>>>>   V is the wrong dimension for some prior$G/prior$R elements
>>>>>>>
>>>>>>> I tried running a simpler model with just "pop.bin" instead of the
>>>>>>> interaction term and "V=diag(4)" in the prior, and it seems to run
>>>>>>> just
>>>>>>> fine.
>>>>>>>
>>>>>>> I'm unsure whether the problem is simply too many levels and I perhaps
>>>>>>> don't have enough data or statistical power to run this model or
>>>>>>> whether I
>>>>>>> made a simple (or huge) error in the model or prior specification. If
>>>>>>> anyone could shed some light on this it would be a huge help.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jackie
>>>>>>>
>
> --
> Jacquelyn L.A. Wood, MSc.
> PhD Candidate
> Biology Dept.
> Concordia University
> 7141 Sherbrooke St. West
> Montreal, QC
> H4B 1R6
> Phone: (514) 293-7255
> Fax: (514) 848-2881
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.