[R-sig-ME] fitting beta and zero mixture model containing both nested and crossed random effects

Paul Johnson pcdjohn@on @ending from gm@il@com
Wed Jun 13 18:35:05 CEST 2018

Hi Meng,

Is it possible for you to model the raw data (the counts of alleles within each individual site/assay/etc) rather than the calculated allele frequency (= proportion*)? If you have access to this data it would simplify your model greatly. I guess the reason for having zeroes is that there were no copies of an allele in a subpopulation (site, etc), giving an estimated proportion of zero? If so, you need a model with a binomial distribution for the responses (e.g. cbind(n.minor.allele, n.major.allele)**, assuming two alleles), and you could have a beta distribution for the allele frequencies. Then the model will see zeroes as being drawn from a binomial distribution with a non-zero allele frequency. This is the Balding-Nichols model, which gives a natural way of gauging the strength of random effects via f, which is an analogue of Wright’s Fst (which measures correlation between alleles in a subpopulation relative to the total population):

p.sub ~ Beta(shape1 = (1 - f) / f * p, shape2 = (1 - f) / f * (1 - p))

…for 0 < f < 1. p is the mean allele frequency, p.sub are the subpopulation (site, assay, etc) allele frequencies.

And the distribution of the number of alleles in subpopulation i is: 

n.minor.allele[i] ~ Binom(2n[i], p.sub[i])

...n[i] is the number of diploid individuals in subpopulation i.

Alternatively, I guess there’s nothing to stop you having normally distributed allele frequencies on the logit scale, in which case you could fit the model with glmer.

Best wishes,

*It’s a stats/population genetics language difference — for statisticians frequency means count, while for population geneticists an allele frequency is a proportion (I have a foot in each camp).
**Pooling alleles in this way, i.e. ignoring the individual level, assumes HWE. If you don’t want to do that you can treat a diploid individual as a level nested within site/assay/etc, and estimate an individual level f, which is analogous to Fis, the inbreeding coefficient.

> On 13 Jun 2018, at 15:47, Meng Liu <liumeng using usc.edu> wrote:
> Hi Ben and Guilaume,
> Thank you for reply. I am working on a precision experiment design, in
> which a sample will be tested using different assay, by different operator
> at different site. The measurement is allele frequency of DNA, which is a
> continuous proportion outcome. I originally plan to run a beta distribution
> random effects model, among which assay, operator and site are all random
> factors. However, because I found there are some zeros in the response
> data, that's why I am trying to run a zero-inflated beta random effects
> model, with random factors in both zero part and non zero part. I.e., we
> assume there will be variance from each factor in terms of predicting zero,
> and variance from each factor in terms of continuous data. However, the
> final research question would be evaluating the total variance contributed
> from each factor.I can see here it is more complex for just generalized
> linear model because of random effects from two different models. I am
> wondering if you have any idea on this or do you know anybody who might
> have thoughts on this?
> Thank you again for all help!
> Best regards,
> Meng
> On Wed, Jun 13, 2018 at 9:38 AM, Ben Bolker <bbolker using gmail.com> wrote:
>> I'm not sure how this (variance decomposition based on a
>> zero-inflated model) would work.
>> What is your subject-area/scientific question?
>> On Wed, Jun 13, 2018 at 4:26 AM, Guillaume Chaumet
>> <guillaumechaumet using gmail.com> wrote:
>>> My bad, I replied to you the first time without including the list.
>>> Regarding your last question, perhaps the list and/or Ben could
>>> provide a more accurate answer than me.
>>> I'm also curious to know how glmmTMB could do that
>>> 2018-06-13 0:09 GMT+02:00 Meng Liu <liumeng using usc.edu>:
>>>> Hi Guillaume,
>>>> Thank you so much for this! I just have another question: for example
>> if I
>>>> have random factor A and B in both logistic model part and beta model
>> part,
>>>> then after I fit the whole model and got variance component estimation
>> of
>>>> random effect for factor A and B for both logistic model part and beta
>> model
>>>> model part, will there be any way to combine variance together? I.e. I
>> can
>>>> estimate a total variance from factor A, and a total variance from
>> factor B
>>>> (i.e. only differ by factor, not model)? Something like variance
>>>> decomposition but I believe here is more complex as this is a mixture
>> model.
>>>> Thank you again for all your help
>>>> Best regards,
>>>> Meng
>>>> On Sun, Jun 10, 2018 at 11:03 AM, Guillaume Chaumet
>>>> <guillaumechaumet using gmail.com> wrote:
>>>>> brms:
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cran.r-
>> 2Dproject.org_web_packages_brms_index.html&d=DwIBaQ&c=
>> clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=
>> Ij73g98b5MaGitndhxmoIw&m=Uy-z_keMG1SfZG-g8FxVqzfz-Ghl2OHun7TY7tfexwo&s=
>> Gfi89kd1PSimpIhWBglYPuJRn3_FF_uNBGvzVDvWe4A&e=
>>>>> 2018-06-09 21:06 GMT+02:00 Meng Liu <liumeng using usc.edu>:
>>>>>> To whom it may concern,
>>>>>> I am trying to fit a model for a data among which the response value
>> is
>>>>>> within [0,1). I am thinking about fitting the zeros as a complete
>>>>>> separate
>>>>>> category from the non-zero data, i.e. a binomial (Bernoulli) model to
>>>>>> "==0
>>>>>> vs >0" and a Beta model to the >0 responses. Also, my data contains
>> both
>>>>>> nested factors and crossed factors, which means I need to add nested
>>>>>> random
>>>>>> effects and crossed random effects to both logistic model part and
>> beta
>>>>>> model model. However, I didn't find any R packages can do exactly
>> what I
>>>>>> want (By far I found gamlss, glmmTMB, zoib but they either can only
>>>>>> assume
>>>>>> random zero or they can only fit repeated measures/clustered data but
>>>>>> not
>>>>>> nested and crossed design). Therefore, I am wondering if any one
>> know if
>>>>>> there is any available package or function can do this.
>>>>>> Thank you very much for your help!
>>>>>> Best regards
>>>>>> Meng
>>>>>>      [[alternative HTML version deleted]]
>>>>>> _______________________________________________
>>>>>> R-sig-mixed-models using r-project.org mailing list
>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.
>> ethz.ch_mailman_listinfo_r-2Dsig-2Dmixed-2Dmodels&d=DwIBaQ&c=
>> clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=
>> Ij73g98b5MaGitndhxmoIw&m=Uy-z_keMG1SfZG-g8FxVqzfz-Ghl2OHun7TY7tfexwo&s=
>> FMNtOORgf7OlXhD5m8VHoGCnuWlt5NLqtXxalxQOhQw&e=
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.
>> ethz.ch_mailman_listinfo_r-2Dsig-2Dmixed-2Dmodels&d=DwIFaQ&c=
>> clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=Ij73g98b5MaGitndhxmoIw&m=
>> 2BcoGwnjTH2lMtoXsQy3PazIJQlKthmPnzhLeUAe8Iw&s=
>> ni4tfiCBB6TzaOnxJfYkjSxo5Ztgo3sx1FyYE-Qc2AY&e=
> 	[[alternative HTML version deleted]]
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

More information about the R-sig-mixed-models mailing list