[R-sig-ME] what about "zero-inflated" predictors

Sat Dec 29 17:47:20 CET 2012

Edwin Lebrija Trejos <elebrija at ...> writes:

> Dear mixed-modelers, I am analyzing seedling survival (0/1) as a
> function of the density of neighbors with different levels of
> relatedness to the focal species, that is, the density of neighbors
> of the same species, of the same genus, family and so on. Focal
> species rarely occur with neighbors of the same genus, so 92% of the
> cases correspond to 0 neighbors of the same genus (the frequency
> table of the density of neighbors of the same genus is copied at the
> end). I have run a model using glmer where both the Wald Z test and
> the likelihood ratio test of the models, with and without the fixed
> effect, show a significant increase of the seedling survival odds
> with an increase in the density of neighbors from the same genus. I
> have found lots of information discussing problems with response
> variables containing many zeroes but almost nothing about
> predictors with many zeroes. In a book by Simon Sheather (2009) on
> regression with R there is a brief section that discusses the
> transformation of predictors in logistic regression for binary
> data. Sheather (2009) shows how there is a need to transform skewed
> variables to maintain the linear relationship between the predictor
> and the log odds. He also shows that when the predictor variable has
> a Poisson distribution the log odds remain a linear function of the
> predictor. My variable in question cannot be normalized with any
> transformation and does not precisely follow a Poisson
> distribution. In classic regression my data would certainly
> invalidate the analysis but I am wondering if this is also the case
> for mixed models fit by glmer. Thank you very much for your
> attention to this problem, Edwin

  The reason that there's very little attention given to the
distribution of the predictors is that in general the definition
of standard statistical models such as GLMMs **does not say anything
about the distribution of the predictors**.  In particular, as far
as I am aware your statement that "in classic regression my data
would certainly invalidate the analysis" is not true -- at least
if we're only talking about the distribution of the predictor.
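
  To make that concrete, here is a sketch of how a binary-response
GLMM with one predictor and one random intercept is usually written.
The indices (i for seedling, j for a grouping unit such as plot) and
the single random intercept are assumptions for illustration only;
the original question does not state its random-effects structure.

```latex
y_{ij} \mid x_{ij}, b_j \sim \mathrm{Bernoulli}(p_{ij}), \qquad
\operatorname{logit}(p_{ij}) = \beta_0 + \beta_1 x_{ij} + b_j, \qquad
b_j \sim \mathcal{N}(0, \sigma_b^2)
```

  Distributional assumptions appear for the response and for the
random effect, but none for x: the model is conditional on whatever
predictor values were observed.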

   The main importance of the distribution of the predictor is that
it affects the power of the test -- obviously if most of your
predictor data are zeros, they won't give you very much information
about how the response changes as a function of the predictor.
I haven't read Sheather's book, but the purpose of transforming the
predictor in this context is to handle a response that is *not*
log-odds-linear on the original scale of the predictor, but (e.g.)
might be log-odds-linear when the predictor is on a log scale.
Thus the transformation is *not* fixing a problem with the
distribution of the predictor, but rather with the linearity
of the response.
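
  The point above about power can be illustrated with a rough
sketch. It is a simplification: an ordinary logistic regression with
no random effects, using the expected Fisher information rather than
fitted data. The two designs, the coefficient values (intercept 0,
slope 0.5) and the function name slope_se are all made up for
illustration; only the 92% zeros figure comes from the question.

```python
import math


def slope_se(xs, beta0, beta1):
    """Asymptotic standard error of the slope in a logistic regression.

    Uses the expected Fisher information X'WX with W = p(1 - p),
    evaluated at the assumed true coefficients. No data are simulated.
    """
    i00 = i01 = i11 = 0.0
    for x in xs:
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        w = p * (1.0 - p)
        i00 += w          # intercept-intercept entry
        i01 += w * x      # intercept-slope entry
        i11 += w * x * x  # slope-slope entry
    det = i00 * i11 - i01 ** 2
    # Slope variance is the (1,1) element of the inverse 2x2 matrix.
    return math.sqrt(i00 / det)


# Hypothetical designs with the same sample size (n = 1000).
sparse = [0] * 920 + [1] * 50 + [2] * 20 + [3] * 10   # 92% zeros
even = [0] * 250 + [1] * 250 + [2] * 250 + [3] * 250  # evenly spread

se_sparse = slope_se(sparse, 0.0, 0.5)
se_even = slope_se(even, 0.0, 0.5)
print(f"SE(slope), 92% zeros:   {se_sparse:.3f}")
print(f"SE(slope), even spread: {se_even:.3f}")
```

  With the same number of observations, the zero-heavy design gives
the larger standard error for the slope. That is lower power, not an
invalid model.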

  As always I'm happy to be corrected by others on the list ...