[R-sig-ME] possible collinearity between random and fixed effects in a linear mixed model

Thierry Onkelinx thierry.onkelinx at inbo.be
Fri Sep 11 10:50:20 CEST 2015

Dear Antonello,

IMHO the referee is wrong. I've illustrated this issue with a simple
example at http://rpubs.com/INBOstats/both_fixed_random

To put it simply: the random effects only model things that the fixed
effects can't model. In your case the random effect will tell you something
about the social group effect after correcting for the fixed effects.
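
A minimal simulation sketch of this point, in Python with only the standard
library. It is not taken from the linked example, and all names and numbers
in it are made up for illustration. A covariate that is constant within a
group (such as tree coverage per hunting area) can only explain the part of
the between-group variation that follows the covariate; the group effect left
over after that is what the random intercept estimates. The sketch uses a
simple method-of-moments estimator for a balanced design instead of a real
mixed-model fit; in R one would fit something like
lmer(y ~ x + (1 | group)) with lme4.

```python
import random
from statistics import mean

random.seed(1)

# All settings below are hypothetical, chosen only for illustration.
N_GROUPS = 27               # e.g. one group per hunting area
N_PER_GROUP = 13            # roughly 350 animals in total
BETA0, BETA1 = 100.0, 0.2   # assumed intercept and slope of the covariate
SD_GROUP, SD_RESID = 5.0, 3.0

# Simulate y = BETA0 + BETA1 * x + u_group + e, with x constant within group.
groups = []
for g in range(N_GROUPS):
    x = 100.0 * g / (N_GROUPS - 1)      # group-level covariate (0..100)
    u = random.gauss(0.0, SD_GROUP)     # group effect NOT explained by x
    ys = [BETA0 + BETA1 * x + u + random.gauss(0.0, SD_RESID)
          for _ in range(N_PER_GROUP)]
    groups.append((x, ys))

# Step 1: pooled within-group variance estimates the residual variance.
within_ss = 0.0
for _, ys in groups:
    m = mean(ys)
    within_ss += sum((y - m) ** 2 for y in ys)
var_resid_hat = within_ss / (N_GROUPS * (N_PER_GROUP - 1))

# Step 2: ordinary least squares of the group means on the covariate.
xs = [x for x, _ in groups]
ms = [mean(ys) for _, ys in groups]
x_bar, m_bar = mean(xs), mean(ms)
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (m - m_bar) for x, m in zip(xs, ms))
slope_hat = sxy / sxx
intercept_hat = m_bar - slope_hat * x_bar

# Step 3: the group means still scatter around the fitted line. Their
# residual variance estimates var_group + var_resid / n, so subtracting
# the second term leaves the between-group variance the covariate
# could not account for.
rss = sum((m - intercept_hat - slope_hat * x) ** 2 for x, m in zip(xs, ms))
var_means_resid = rss / (N_GROUPS - 2)
var_group_hat = max(0.0, var_means_resid - var_resid_hat / N_PER_GROUP)

print("slope estimate:            ", round(slope_hat, 3))
print("residual variance estimate:", round(var_resid_hat, 2))
print("group variance estimate:   ", round(var_group_hat, 2))
```

Under these assumed settings the group variance estimate is expected to be
clearly above zero even though the covariate is in the model: the fixed
effect takes the trend along x, and the group term takes what is left.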

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-09-10 17:33 GMT+02:00 Antonello Canu <antonellocanu1982 at gmail.com>:

> Dear all,
> I recently submitted a manuscript in which I used a linear mixed model
> to describe the variation in the date of reproduction of female wild boars.
> For each female (n=350, culled by hunters in 27 relatively small [nearly
> 200 ha] hunting areas over 8 hunting seasons), we analyzed the uterus and
> estimated the age of the fetuses, from which we obtained an estimate of
> the conception date.
> Since I was especially interested in the synchrony of reproduction among
> females belonging to the same social group (a phenomenon observed in the
> past in wild boar female groups and many other mammals), I fitted a model
> with date of reproduction as the dependent variable, and groups of females
> culled in both the same "hunting area" and the same hunting season as the
> random factor (i.e., females with a non-negligible probability of belonging
> to the same social group; my random term is therefore a rough proxy for
> social group).
> As independent variables, I included in my models many environmental and
> individual variables.
> I tested for the inclusion of the random term (full model with the random
> term vs. full model without it), which was extremely significant. Then I
> performed model selection, which selected some of the environmental factors.
> The problem: most of my environmental data are "poor". For example, for
> rainfall and seed production I have a single value for each hunting season.
> For other variables such as tree coverage I have a single value for all the
> observations belonging to the same "hunting area", etc. Therefore, I have
> n=350 sows but only 8 different values for rainfall, only 27 values for
> tree coverage, etc.
> I included the "poor" environmental variables (and some of them were
> selected) because a first version of the ms was rejected by a journal
> for lacking environmental data, which are known to influence
> reproduction.
> Anyway: in this new submission the referee told me that I cannot use both
> the random term and the fixed effect because:
> <<some of the fixed factors are equivalent to the random term year (e.g
> rainfall, temperatures, seed crop) or hunting ground (habitat, tree species
> abundance). This is not correct because environmental and inter-annual
> heterogeneity (which I guess the authors are interested in) is accounted
> for by random factors. So one can either use hunting ground- and year-base
> predictors as fixed terms to study their effects or as a random terms, to
> eliminate their effects. I assume that the authors prefer the former.>>
> In short, the referee suggested that I drop the random term. However,
> the random term explains a huge amount of variance and is in perfect
> agreement with the biological knowledge about wild boar reproduction (i.e.,
> synchrony of reproduction among females near each other). I think it would
> be a shame to drop it, and I want to be sure whether I really cannot use
> both the environmental data and the random term for the abovementioned
> reasons (namely that a certain number of groups share the same values for
> the environmental variables). Furthermore, it is not only a matter of
> "dropping". I actually think that results without the random term would be
> extremely misleading, since there does seem to be a "social group effect":
> if I ignored it, systematic variation would end up in the residuals,
> leading to potentially biased inference.
> If the referee's suggestion is questionable, I want to try to discuss this
> aspect, and possibly keep my model. My lack of statistical knowledge,
> however, makes me unsure. Can anyone suggest anything?
> Thanks in advance,
> Antonio
> _______________________________________________
> R-sig-mixed-models op r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
