[R-sig-ME] Correct specification of nested binomial mixed model with custom intercept to infer variance components and intraclass correlations

Thu Jan 26 16:11:20 CET 2017

Dear all,
Just to ask a bit of advice about the correct way to specify a nested binomial GLM, in the context of estimating variance components / intraclass correlations to infer heritabilities of a behavioural trait.

The behavioural trait is a binary one (eating an egg or not, 1 or 0), and is performed by known individual honeybees (individually numbered, “individual_ID”) of a known father line (“patriline”) of a given hive (“colony”). Several subsequent egg eating events could be performed by the same individuals. Of each experiment with each colony several trials were done, and for each trial we have data on how many eggs were eaten in total, so we could analyse as a dependent variable the proportion of those eggs that were eaten by a given individual. In addition, we also genotyped a bunch of bees of each colony, which gave us the patriline distribution within each colony (“expected_proportion_patriline”), i.e. the proportion that each patriline makes up in the colony, which I thought should affect the a priori probability that bees of a given patriline would be observed eating eggs.

My question is what mixed model syntax would make most sense to analyse this data, and allows us to infer variance components and intraclass correlations as a basis for a heritability estimate of this egg eating behaviour?

One model I thought of was to include the expected proportion of each patriline that is present as a custom offset, using
library(afex)
set_sum_contrasts() # use effect coding
data$baseline=qlogis(data$expected_proportion_patriline) # custom intercept (qlogis=logit)
fit1=glmer(cbind(eggs_eaten_by_individual, eggs_eaten_in_total_intrial -eggs_eaten_by_individual)~-1+(1|colony/patriline/ID), offset=baseline, data=data, family=binomial)
but would this model make sense as a basis to estimate the variance components and intraclass correlation?

In another model I worked with the mean eggs eaten by bees of a given patriline and then fitted the model
data$baseline=qlogis(data$expected_proportion_patriline) # custom intercept (qlogis=logit)
fit2=glmer(cbind(eggs_eaten_by_patriline, eggs_eaten_in_total_intrial – eggs_eaten_by_patriline)~-1+(1|colony/patriline), offset=baseline, data=data, family=binomial)

Again though I am not sure if such a model would make sense, and neither of my two models take trial explicitly as a factor. Would anybody have any advice by any chance about the most sensible model given my experimental design (proportion data, with individuals nested in patriline nested in colony, with repeated trials and a priori info on the proportion of eggs that would be expected to be eaten by each patriline based on independent genotyping, which could perhaps be included as a custom intercept or a covariate)?

Best regards,
Tom Wenseleers
Dept of Biology
University of Leuven
Belgium

_____