[R-sig-ME] zero-inflation and multimodal count distribution

Tue May 2 18:31:51 CEST 2017

On 17-05-02 12:20 PM, simone santoro wrote:
> Hi all,
> 
> I am trying to test a hypothesis regarding the different contribution of
> sons and daughters to parents’ fitness. I have a number of (bird) nests of
> which I have measured a feature of parents related to their quality
> (continuous variable) that I hypothesize affects the future lifetime
> fecundity of their sons and daughters.
> 
> Specifically, my hypothesis is that at high values of parents’ quality sons
> will be more fecund than their sisters over their entire lives and, vice
> versa, at low values of parents’ quality daughters will be more fecund than
> their brothers.
> 
> 
> Note that the sons and daughters of a nest, whose lifetime fecundity I have
> recorded, are all born in the same year. Thus, year of birth (of sons and
> daughters) is a random intercept I want to control for, as is nest
> identity. The data set may be arranged in two ways: one row per nest, or
> one row per offspring (son or daughter).
> 
> 
> In case 1 (row = nest), I have these variables: FN, family name; YEAR,
> birth year of sons and daughters; nDescBySons, lifetime total number of
> progeny generated by sons (pooled); nDescByDaughs, lifetime total number
> of progeny generated by daughters (pooled); nSons, number of sons; nDaughs,
> number of daughters; parQuality, parents’ quality.
> 
> In case 2 (row = son or daughter), I have these variables: FN, family name;
> YEAR, birth year of sons and daughters; nDesc, lifetime total number of
> progeny generated by the individual; sex, son or daughter; nestSize, total
> number of sons and daughters at nest; parQuality, parents’ quality.
> 
> 
> I think the second arrangement of the data is easier to analyze for
> testing my hypothesis (comments/suggestions on this?). It gives me direct
> information on the individual-level lifetime fecundity of sons and
> daughters, so I do not necessarily have to account for how many sons and
> daughters were at the nest.
> However, I have a lot of zeros (many sons and daughters disappear, i.e. die
> or emigrate, and have no recorded descendants at all), and beyond the zero
> mode the data have a kind of bimodal distribution (see the image below):
> 
> https://drive.google.com/open?id=0BwsTfIcebsrOZnljSW9uQXF2UU0
> 
> 
> Thus, I would use a zero-inflated GLMM, for instance with the glmmTMB
> package in R. Something like this:
> 
> glmmTMB(nDesc ~ parQuality*sex + (1|FN) + (1|YEAR), …, ziformula = ~1)
> 
> But what about that ‘ugly’ multimodal distribution? I thought I might try
> different distributions (e.g. poisson, compois, any other?) and compare
> model fit by AIC.
> 
> Any advice on this would be extremely appreciated.
> 
> 
> Simone
> 

   My main thought is that your plots show the *marginal* distribution
of the data.  Differences among families/years or odd shapes of the
parental quality distribution could drive this pattern without any need
to assume the *conditional* distribution is multimodal.  Fit a sensible
model (like the one you suggest) and then check diagnostics in various
ways (if you have enough data, you could consider interactions between
sex and parental quality and the random effects -- e.g. does parental
quality matter more in some birth years than others?)
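
   To make the marginal-vs-conditional point concrete, here is a minimal
sketch in R. Everything in it is an assumption made for illustration: the
data are simulated (not yours), the effect sizes and the 60% non-zero
probability are arbitrary, nbinom2 is just one plausible alternative to
poisson, and DHARMa is one of several ways to check diagnostics. The
variable names follow your "case 2" layout.

```r
library(glmmTMB)

set.seed(101)

## Hypothetical simulated data, one row per offspring ("case 2" layout)
n_fam <- 200
fam <- data.frame(
  FN         = factor(seq_len(n_fam)),
  YEAR       = factor(sample(2001:2010, n_fam, replace = TRUE)),
  parQuality = rnorm(n_fam),
  nestSize   = sample(2:5, n_fam, replace = TRUE)
)
dat <- fam[rep(seq_len(n_fam), fam$nestSize), ]
dat$sex <- factor(sample(c("daughter", "son"), nrow(dat), replace = TRUE))

## Conditional distribution: plain (unimodal) Poisson given the covariates
## and random effects, multiplied by a 0/1 indicator for structural zeros
year_eff <- rnorm(nlevels(dat$YEAR), sd = 0.8)
fam_eff  <- rnorm(n_fam, sd = 0.5)
eta <- 1 + 0.4 * dat$parQuality * ifelse(dat$sex == "son", 1, -1) +
  year_eff[as.integer(dat$YEAR)] + fam_eff[as.integer(dat$FN)]
dat$nDesc <- rpois(nrow(dat), exp(eta)) * rbinom(nrow(dat), 1, 0.6)

## Marginal distribution of the non-zero counts: year and family effects
## alone can make this look lumpy even though no multimodal conditional
## distribution was simulated
hist(dat$nDesc[dat$nDesc > 0], breaks = 30,
     main = "Marginal distribution of non-zero counts")

## Zero-inflated GLMMs with a constant zero-inflation probability
fit_pois <- glmmTMB(nDesc ~ parQuality * sex + (1 | FN) + (1 | YEAR),
                    ziformula = ~1, family = poisson, data = dat)
fit_nb   <- update(fit_pois, family = nbinom2)
AIC(fit_pois, fit_nb)

## Does parental quality matter more in some birth years than others?
fit_slope <- glmmTMB(nDesc ~ parQuality * sex + (1 | FN) +
                       (1 + parQuality | YEAR),
                     ziformula = ~1, family = nbinom2, data = dat)
AIC(fit_nb, fit_slope)

## Simulation-based residual checks, if DHARMa is installed
if (requireNamespace("DHARMa", quietly = TRUE)) {
  res <- DHARMa::simulateResiduals(fit_nb)
  plot(res)
  DHARMa::testZeroInflation(res)
}
```

   With real data the random-slope model may well be too complex to fit
reliably (ten or so birth years is not much to estimate a slope variance
and a correlation from), so treat that step as something to try rather
than something to expect to work.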