[R-sig-ME] possible collinearity between random and fixed effects in a linear mixed model

Thu Sep 10 17:33:19 CEST 2015

Dear all,

I recently submitted a manuscript in which I used a linear mixed model to
describe the variation in the date of reproduction of female wild boars.
For each female (n=350, culled by hunters in 27 relatively small [nearly 200
ha] hunting areas over 8 hunting seasons), we analyzed the uterus and
estimated the age of the fetuses, from which we obtained an estimate of the
conception date.
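The conception date is simply the culling date minus the estimated fetal age. A minimal Python sketch, assuming fetal age is expressed in whole days; the function name and the example dates are hypothetical, not taken from the study:

```python
from datetime import date, timedelta


def conception_date(cull_date: date, fetal_age_days: int) -> date:
    """Back-calculate conception by subtracting the estimated fetal age (in days) from the culling date."""
    return cull_date - timedelta(days=fetal_age_days)


# Hypothetical female culled on 15 January 2010 carrying fetuses aged about 40 days
est = conception_date(date(2010, 1, 15), 40)
print(est)  # 2009-12-06
```

The uncertainty in the fetal-age estimate carries over one-to-one into the conception date.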

Since I was especially interested in the synchrony of reproduction among
females belonging to the same social group (a phenomenon previously
observed in wild boar female groups and in many other mammals), I fitted a
model with date of reproduction as the dependent variable and, as a random
factor, groups of females culled in both the same "hunting area" and the
same hunting season (i.e., females with a non-negligible probability of
belonging to the same social group; my random term is therefore a rough
proxy for social group).
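The random term described above groups females by the combination of hunting area and hunting season, so each area-season pair gets its own random intercept (in lme4 syntax this corresponds to a term like `(1 | area:season)`). A minimal Python sketch of how such a grouping label could be built; the record keys `area` and `season` and the example records are hypothetical:

```python
from collections import Counter


def social_group_proxy(records):
    """Label each female by the (hunting area, hunting season) pair she was culled in.

    `records` is a list of dicts with hypothetical keys "area" and "season".
    Returns one label per record and the number of females in each group.
    """
    labels = [f"{r['area']}:{r['season']}" for r in records]
    return labels, Counter(labels)


females = [
    {"area": "A01", "season": 2008},
    {"area": "A01", "season": 2008},
    {"area": "A01", "season": 2009},
    {"area": "A02", "season": 2008},
]
labels, sizes = social_group_proxy(females)
print(labels)             # ['A01:2008', 'A01:2008', 'A01:2009', 'A02:2008']
print(sizes["A01:2008"])  # 2
```

The same area in a different season (here `A01:2009`) is a separate group, which is what distinguishes this term from a plain area or year random effect.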

As independent variables, I included many environmental and individual
variables in my models.

I tested whether to include the random term (full model with the random
term vs. full model without it), and it was extremely significant. I then
performed model selection, which retained some of the environmental factors.
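Testing whether a random-intercept variance is zero puts the null value on the boundary of the parameter space, so a common correction is to halve the naive chi-square(1) p-value (i.e., use a 50:50 mixture of chi-square distributions with 0 and 1 df). A sketch in Python, assuming the two log-likelihoods come from fits with identical fixed effects; the log-likelihood values in the example are made up:

```python
import math


def lrt_random_intercept(loglik_with: float, loglik_without: float) -> tuple[float, float]:
    """Likelihood-ratio test for one random-intercept variance (H0: variance = 0).

    Both fits must share the same fixed effects; only the random term differs.
    """
    stat = max(0.0, 2.0 * (loglik_with - loglik_without))
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p_naive = math.erfc(math.sqrt(stat / 2.0))
    # Boundary correction: null distribution is 0.5 * chi2(0) + 0.5 * chi2(1)
    return stat, 0.5 * p_naive


stat, p = lrt_random_intercept(-1200.0, -1210.0)  # hypothetical log-likelihoods
print(stat)       # 20.0
print(p < 0.001)  # True
```

Halving the p-value only matters for borderline results; with a very large statistic the random term is clearly supported either way.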

The problem: most of my environmental data are "poor". For example, for
rainfall and seed production I have a single value per hunting season. For
other variables, such as tree coverage, I have a single value shared by all
observations from the same "hunting area", etc. Therefore, I have n=350
sows but only 8 different values for rainfall, only 27 values for tree
coverage, etc.
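One way to make this structure explicit is to check, for each environmental variable, the grouping level at which it is constant and how many distinct values it takes compared with the number of area-season groups. A Python sketch with hypothetical toy records (the keys and values are invented for illustration):

```python
from collections import defaultdict


def constant_within(records, covariate, group_keys):
    """True if `covariate` takes exactly one value inside every group defined by `group_keys`."""
    values_by_group = defaultdict(set)
    for r in records:
        values_by_group[tuple(r[k] for k in group_keys)].add(r[covariate])
    return all(len(v) == 1 for v in values_by_group.values())


females = [
    {"area": "A01", "season": 2008, "rain": 610.0, "tree_cover": 0.42},
    {"area": "A01", "season": 2008, "rain": 610.0, "tree_cover": 0.42},
    {"area": "A02", "season": 2008, "rain": 610.0, "tree_cover": 0.75},
    {"area": "A01", "season": 2009, "rain": 480.0, "tree_cover": 0.42},
    {"area": "A02", "season": 2009, "rain": 480.0, "tree_cover": 0.75},
]

print(constant_within(females, "rain", ["season"]))      # True: one value per season
print(constant_within(females, "tree_cover", ["area"]))  # True: one value per area
print(constant_within(females, "rain", ["area"]))        # False: rain varies across seasons
n_groups = len({(r["area"], r["season"]) for r in females})
n_rain = len({r["rain"] for r in females})
print(n_groups, n_rain)  # 4 2
```

Because both variables are constant within each area-season group, they can only explain differences between groups; with fewer distinct values than groups, a handful of slope parameters cannot in general reproduce every group's mean, and the leftover between-group variation is what the random intercept still captures.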

I included these "poor" environmental variables (and some of them were
selected) because an earlier version of the manuscript was rejected by a
journal for its lack of environmental data, which are known to influence
reproduction.

In this new submission, however, the referee told me that I cannot use
both the random term and the fixed effects because:

<<some of the fixed factors are equivalent to the random term year (e.g.
rainfall, temperatures, seed crop) or hunting ground (habitat, tree species
abundance). This is not correct because environmental and inter-annual
heterogeneity (which I guess the authors are interested in) is accounted for
by random factors. So one can either use hunting ground- and year-based
predictors as fixed terms to study their effects or as random terms, to
eliminate their effects. I assume that the authors prefer the former.>>

In short, the referee suggested that I drop the random term. However,
the random term explains a large amount of variance and is in perfect
agreement with biological knowledge about wild boar reproduction (i.e.,
synchrony of reproduction among females living near each other). I think it
would be a shame to drop it, and I want to be sure that I really cannot use
both the environmental data and the random term for the reasons given above
(i.e., because several groups share the same values of the environmental
variables). Furthermore, it is not only a matter of "dropping". I actually
think that results without the random term would be extremely misleading.
Since there seems to BE a "social group effect", ignoring it would push
systematic between-group variation into the residuals, violating the
assumption of independent errors and potentially leading to biased
inference.
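One concrete consequence of dropping the random term concerns the group-level predictors themselves. With equal group sizes m and intraclass correlation ρ, the Kish design effect 1 + (m − 1)ρ gives the factor by which the variance of a group-level slope is understated when the grouping is ignored. A Python sketch; the group size and ICC in the example are hypothetical, not estimates from these data:

```python
import math


def design_effect(group_size: float, icc: float) -> float:
    """Kish design effect 1 + (m - 1) * icc for a predictor that is constant within groups."""
    return 1.0 + (group_size - 1.0) * icc


def se_understatement(group_size: float, icc: float) -> float:
    """Factor by which an analysis that ignores the grouping underestimates the standard error."""
    return math.sqrt(design_effect(group_size, icc))


# Hypothetical: about 5 females per area-season group and an ICC of 0.5
print(design_effect(5, 0.5))                # 3.0
print(round(se_understatement(5, 0.5), 3))  # 1.732
```

With unequal group sizes, plugging in the mean group size gives only an approximation.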

If the referee's suggestion is questionable, I would like to argue the
point in my reply and, if possible, keep my model. However, my limited
statistical knowledge makes me unsure. Can anyone offer any advice?

Thanks in advance,

Antonio
