[R-sig-ME] GLMM & lack of linearity on the logit

Mon Jul 5 02:57:54 CEST 2010

<puts on stats instructor hat>

When you categorize a continuous variable, you are making an untested
theoretical claim that there are only two kinds of nests: Those of high
volume and those of low volume, and that it does not matter what the volume
is beyond knowing that it is "high" or "low." In psychology, this is rarely
true.

For more than two levels of a factor, the same logic holds: Why do those
levels differ from each other in a manner that makes the continuous
underlying distribution irrelevant?  If your cutoffs are at 2, 4, and 6,
then why would you insist _a priori_ that 3.9999 is wholly different from
4.0001 while being exactly the same as 2.000?

If you have such a reason, you should make this reason clear and apparent in
the methods section of your paper and let the reviewer pick on that. :)
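
To make the cutoff point concrete, here is a minimal Python sketch of what
binning at 2, 4, and 6 does to those three values. The function name
bin_index and the rule that a value exactly on a cutoff goes to the upper
bin are assumptions made for illustration only.

```python
from bisect import bisect_right

# Cutoffs taken from the example in the text.
CUTOFFS = [2, 4, 6]

def bin_index(x, cutoffs=CUTOFFS):
    """Return the 0-based bin for x.

    Assumption: a value exactly equal to a cutoff is placed in the upper bin.
    """
    return bisect_right(cutoffs, x)

# Compare the three values discussed in the text.
for x in (2.000, 3.9999, 4.0001):
    print(x, "-> bin", bin_index(x))
```

After binning, 2.000 and 3.9999 carry the same code, while 3.9999 and 4.0001
carry different codes even though they differ by only 0.0002.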

</hat>

In terms of lack of linearity, you can probably just transform your
continuous data into a more-linear continuous distribution.  But if
quadratic and cubic functions don't help, it may just be that only the
linear component of your nonlinear variable is predictive of death... but
others may disagree with me on this point.
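
As a rough illustration of what a transformation can buy you, here is a small
Python sketch. It assumes, purely for the sake of the example, that the logit
of death is linear in log(volume); the coefficients, the volumes, and the
function names are all hypothetical and are not taken from the gull data.

```python
import math

# Hypothetical coefficients and egg volumes, chosen only for illustration.
INTERCEPT, SLOPE = 4.0, -1.0
volumes = [20.0, 40.0, 80.0, 160.0]

def death_logit(volume):
    """Assumed true model: the logit of death is linear in log(volume)."""
    return INTERCEPT + SLOPE * math.log(volume)

def successive_slopes(xs, ys):
    """Slope of y against x between each pair of neighbouring points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
            for i in range(len(xs) - 1)]

logits = [death_logit(v) for v in volumes]

# Against raw volume the slope changes from one interval to the next.
raw_slopes = successive_slopes(volumes, logits)

# Against log(volume) the slope is constant, i.e. linear on the logit.
log_slopes = successive_slopes([math.log(v) for v in volumes], logits)

print("slopes against raw volume: ", [round(s, 4) for s in raw_slopes])
print("slopes against log(volume):", [round(s, 4) for s in log_slopes])
```

Under that assumption the relationship looks curved against raw volume and
straight against log(volume), so a single transformed continuous term would do
the job. Whether a log (or any other) transform suits the real data is
something only a plot of the observed logits can tell you.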

--Adam

On Sun, 4 Jul 2010, Luciano La Sala wrote:

> Dear R-people,
>
> I have just received from reviewers of a manuscript some harsh comments on
> the statistical procedures. I'm studying risk factors of mortality at the
> nest level among Olrog's Gull nest mates, which is why I used mixed models
> with random intercepts (Nest ID). The outcome of interest is "Death"
> (yes/no) and one of my explanatory variables is "Egg Volume" (continuous).
> Since violation of linearity on the logit was evident I created 4 categories
> using the quartiles of the distribution and modeled them as dummies.
>
> However, one reviewer stated: "It is unclear why you used volume of eggs as
> a factor (i.e. categorized variable) in the analyses. Incorporating this
> predictor as a continuous variable, as was originally measured, would make
> analysis more informative. You stated that you made so "to relax the
> linearity assumption". GLMM are sufficiently robust to accept a continuous
> variable into a categorized model that, with the correct link function and
> the variable transformation, would support well the linearity assumption."
>
> That said, I wonder if (1) categorization is such a bad thing on the one
> hand, and (2) lack of linearity on the logit scale can be handled well by
> GLMM.
>
> In my case, adding quadratic and cubic terms after assessment of the shape
> of the x-y relationship did not improve the fit, so I decided to use dummies
> and thus relax the linearity assumption.
>
> Thank you very much in advance.
>
> Luciano