[R] analyzing binomial data with spatially correlated errors

Thu Mar 20 14:08:10 CET 2008

On Wed, Mar 19, 2008 at 3:02 PM, Ben Bolker <bolker at ufl.edu> wrote:
> Jean-Baptiste Ferdy <Jean-Baptiste.Ferdy <at> univ-montp2.fr> writes:
>
>  >
>  > Dear R users,
>  >
>  > I want to explain binomial data by a serie of fixed effects. My problem is
>  > that my binomial data  are spatially correlated. Naively, I thought I could
>  > found something similar to gls to analyze such data. After some reading, I
>  > decided that lmer is probably to tool I need. The model I want to fit would
>  > look like
>  >
>  > lmer ( cbind(n.success,n.failure) ~ (x1 + x2 + ... + xn)^2 , family=binomial,
>  > correlation=corExp(1,form=~longitude+latitude))
>  >
>  > This doesn't work because lmer says it needs a random effect in the model.
>  > And, apart from the spatial random effect that I want to capture by computing
>  > the correlation matrix, I have no other random effect.
>  >
>  > There must be something I do not understand here... I can't get why gls can do
>  > this on gaussian data but lmer can't on binomial ones.
>  >
>
>   This is a hard problem.  The proximal issue is that lmer does not yet
>  include a correlation term (I'm a little surprised you didn't get an
>  error to that effect), and won't for some time since it is still in heavy
>  development for more basic features.

The lmer function does have a ... argument that will swallow up the
correlation argument (and proceed to ignore it).  I think there are
advantages to including a ... argument but one of the disadvantages is
this quietly ignoring arguments that are not defined in the function
(and are not mentioned in any documentation about the function).

> If your data were normal you could
>  use gls from the nlme package, but nlme doesn't do generalized LMMs
>  (only LMMs and NLMMs).  You could *almost* use glmmPQL from the MASS package,
>  which allows you to fit any lme model structure
>  within a GLM 'wrapper', but as far as I know it wraps only lme (
>  which requires at least one random effect) and not gls.

I don't think it is at all trivial to define a model for a binary or
binomial response with a spatial correlation structure.  It may be
well-known; it's just that I don't know how to do it easily.  I just
saw the post by Roger Bivand who gave a reference on one approach
(although usually when one finds oneself using something like a group
factor with only one level to define the random effects it is a sign
that something suspicious is underway.  Expecting to estimate a
variance from a single observation should raise a few red flags.)

The way Jose and I approached correlation structures in lme is to
pre-whiten the data and the model matrices, conditional on the
parameters that determine the (additional) marginal correlation of the
responses.  That works when the conditional distribution of the
response is normal.  I don't see how it could work for a binary or
binomial response.  A linear combination of normals is normal.  A
non-trivial linear combination of binomials is not binomial.

In writing lmer I have found that I must think about the model very
carefully before I can determine a suitable computational method.  I
spent most of the month of January staring at a sequence of
transformations on the whiteboard in my office trying to determine
what should go where and how to implement the whole chain in data
structures and algorithms.

The normal distribution and linear predictors fit together in such a
way that one can factor out correlation structures or variance
functions.  Get away from the normal distribution and things start to
break.

Think of the typical way in which we write a linear model:

y = X b + e

where y is an n-dimensional response, X is an n by p model matrix, b
is a p-dimensional coefficient vector to be estimated and e is the
"noise" term with a multivariate normal distribution that has mean 0.
Now, try to write a generalized linear model that way.  It doesn't
work.  You must express a GLM differently, relating the mean to the
liner predictor, and taking into account the way the variance relates
to the mean.

This is more than a notational difference.  In a linear model the
effect of b is limited to the linear predictor and, through that, the
mean.  The variance-covariance specification can be separated from the
mean and, hence, can be specified separately.  It is easy to fool
yourself into thinking that the same should be true for generalized
linear models, just like it is easy to fool yourself into thinking
that all the arguments for the lme function will work unchanged in
lmer.

>   You could try gee or geoRglm -- neither trivially easy, I think ...
>
>   Ben Bolker
>
>
>
>  ______________________________________________
>  R-help at r-project.org mailing list
>  https://stat.ethz.ch/mailman/listinfo/r-help
>  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>  and provide commented, minimal, self-contained, reproducible code.
>