[R-sig-ME] GLMM & lack of linearity on the logit
Jonathan Baron
baron at psych.upenn.edu
Mon Jul 5 03:45:20 CEST 2010
Another issue is that when people split a variable into several
categories, the categories are often treated as a factor (i.e., names,
not numbers). We don't know whether that was done in this case, but
doing so causes two problems.
First, the test then looks for any difference between bins, not just a
monotonic effect. We can get a significant result because (say) the
middle bin is higher than the rest, but a pattern like that may really
belong to the null hypothesis, not to the effect we are looking for.
Second, the hypothesis test is much broader: any difference at all,
not just a monotonic effect with higher numbers in the predictor
associated with higher numbers in the dependent variable. Because it
is broader, the test loses power and can fail to detect a real
difference that would be detected even by falsely assuming that the
linear model fits. (And often the deviation from linearity can be
corrected by transforming variables, as noted by others, so it is just
a matter of scaling.)
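
A rough simulation may make both points concrete. This is only a
sketch under assumptions that are mine, not from the thread: four
equal-sized bins of 10 observations each, normal errors with SD 1,
made-up bin means, and 5% critical values of F copied from standard
tables for exactly that design. The function names are hypothetical.
It compares how often the 1-df linear-trend test and the 3-df factor
test reject, once for a monotonic pattern and once for a "middle bins
higher" pattern.

```python
import random

# Tabulated 5% critical values of F, valid only for N = 40 observations
# in 4 bins (assumed from standard F tables; change them if you change
# the design below).
F_CRIT_LINEAR = 4.098   # F(1, 38): slope of y on the numeric bin code
F_CRIT_FACTOR = 2.866   # F(3, 36): bins treated as an unordered factor

def f_statistics(xs, ys, n_bins):
    """Return (F for the linear trend, F for the omnibus factor test)."""
    n = len(ys)
    grand = sum(ys) / n
    ss_total = sum((y - grand) ** 2 for y in ys)

    # Linear model: regress y on the numeric bin code (1 df).
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - grand) for x, y in zip(xs, ys))
    ss_linear = sxy ** 2 / sxx
    f_linear = ss_linear / ((ss_total - ss_linear) / (n - 2))

    # Factor model: one-way ANOVA with a separate mean per bin (n_bins - 1 df).
    ss_between = 0.0
    for b in range(1, n_bins + 1):
        group = [y for x, y in zip(xs, ys) if x == b]
        ss_between += len(group) * (sum(group) / len(group) - grand) ** 2
    ss_within = ss_total - ss_between
    f_factor = (ss_between / (n_bins - 1)) / (ss_within / (n - n_bins))
    return f_linear, f_factor

def rejection_rates(bin_means, n_per_bin=10, n_sims=2000, seed=1):
    """Share of simulated data sets in which each test rejects at 5%."""
    rng = random.Random(seed)
    n_bins = len(bin_means)
    xs = [b for b in range(1, n_bins + 1) for _ in range(n_per_bin)]
    hits_linear = hits_factor = 0
    for _ in range(n_sims):
        ys = [bin_means[x - 1] + rng.gauss(0.0, 1.0) for x in xs]
        f_linear, f_factor = f_statistics(xs, ys, n_bins)
        hits_linear += f_linear > F_CRIT_LINEAR
        hits_factor += f_factor > F_CRIT_FACTOR
    return hits_linear / n_sims, hits_factor / n_sims

# Monotonic effect: means rise steadily across the bins.
monotonic = rejection_rates([0.3, 0.6, 0.9, 1.2])
# Non-monotonic pattern: the middle bins are higher than the ends.
humped = rejection_rates([0.0, 0.8, 0.8, 0.0])
print("monotonic (linear, factor):", monotonic)
print("humped    (linear, factor):", humped)
```

Under these assumptions the linear test should reject more often than
the factor test when the effect is monotonic (the power argument), and
the factor test should reject far more often for the humped pattern,
which is exactly the kind of "significant" result that may not be the
effect we were looking for.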
On 07/04/10 17:57, Adam D. I. Kramer wrote:
> <puts on stats instructor hat>
>
> When you categorize a continuous variable, you are making an untested
> theoretical claim that there are only two kinds of nests: Those of high
> volume and those of low volume, and that it does not matter what the volume
> is beyond knowing that it is "high" or "low." In psychology, this is rarely
> true.
An interesting real example is here:
http://journal.sjdm.org/10/10202/jdm10202.html
or http://journal.sjdm.org/10/10202/jdm10202.pdf
A previously published paper found a result based on a split, but the
effect went away with almost any other split, or with a linear model.
> For more than two levels of a factor, the same logic holds: Why do those
> levels differ from each other in a manner that makes the continuous
> underlying distribution irrelevant? If your cutoffs are at 2, 4, and 6,
> then why would you insist _a priori_ that 3.9999 is wholly different from
> 4.0001 while being exactly the same as 2.000?
>
> If you have such a reason, you should make this reason clear and apparent in
> the methods section of your paper and let the reviewer pick on that. :)
Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)