[R-sig-eco] terminology for binomial regression

Sat Mar 5 21:31:44 CET 2011

On 11-03-05 02:59 PM, Matthew Forister wrote:
> Hi all,
> 
> I have been frustrated by what seems to me like inconsistent terminology
> associated with binomial regression.  There are two questions I'd love to
> have answered, below.
> 
> For context, I have been using glm with binomial error, logit link.  The
> response variable is "successes and failures" -- the successes are the days
> on which a species is observed in a year, and the failures are days in which
> it is not observed.  So the code
> is  glm(cbind(DaysPresent,DaysAbsent)~years,binomial).  I'm interested in
> the coefficient associated with years as a way to express the decline in the
> number of days a species is observed over time.
> 
> Question:
> 
> (1) This probably seems silly, but is "logistic regression" the same as a
> glm with binomial error?  This is where I have found some frustrating
> inconsistency in the ecological literature.

  In my opinion, it would be reasonable to use 'logistic regression' to
mean any GLM (generalized linear model) with a logit link, most probably
(though not necessarily) with the binomial family. My impression is that
people most commonly use 'logistic regression' to mean a GLM with
*binary* data and a logit link, and 'binomial regression' to denote
non-binary data (more than one trial per observation), but I don't have
any references.
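
  To illustrate that the two names describe the same underlying model, here
is a minimal sketch with simulated data (the numbers, the variable names and
the "true" slope are all made up for illustration; they are not from the
original question). It fits the aggregated form and the expanded 0/1 form
and compares the coefficients.

```r
set.seed(1)
## simulated, hypothetical data: 10 years, 100 survey days per year
years  <- 1:10
n_days <- 100
present <- rbinom(length(years), n_days, plogis(1 - 0.14 * years))
absent  <- n_days - present

## aggregated form (successes and failures per year)
fit_agg <- glm(cbind(present, absent) ~ years, family = binomial)

## the same data expanded to one 0/1 row per survey day
binary <- data.frame(
  years = rep(rep(years, 2), times = c(present, absent)),
  seen  = rep(c(1, 0), times = c(sum(present), sum(absent)))
)
fit_bin <- glm(seen ~ years, family = binomial, data = binary)

## the coefficient estimates should agree up to numerical tolerance
max(abs(coef(fit_agg) - coef(fit_bin)))
```

The residual deviances of the two fits will differ, because the saturated
model is different in each case, but the estimated coefficients and their
standard errors should match.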

> (2) What's the most straightforward way to interpret the coefficients from a
> predictor variable in a model like the one specified above?  For example, a
> species in decline (observed in fewer days over time) will have a years
> coefficient of -0.14.  I'd like a verbal interpretation of that number.
>  Rather than give you my understanding, I'll just ask and hope someone can
> help me out!

  I would suggest Gelman and Hill for this. The coefficients are
statements of changes on the logit scale ("log-odds" is a synonym).
Unfortunately, the interpretation in terms of probability outcomes
depends on the baseline probability.  Rules of thumb are:

 (1) for small (near zero) baseline probabilities, the logistic
resembles an exponential and so the interpretations of logit-scale and
log-scale coefficients are similar, i.e. for small changes they can be
interpreted as proportional changes.  For your example above, this would
correspond to a PROPORTIONAL decline of approximately 14% per year for a
species that was already fairly rare.  (More precisely, a decline of
(1-exp(-0.14))=0.13.)  (I want to emphasize that this is a change
relative to the original frequency of the species.)

 (2) for baseline probabilities near 0.5, the rule of thumb is that the
change in probability of occurrence is about r/4, where r is the
logit-scale coefficient, so if your species were originally present in
about half of the samples a coefficient of -0.14 would correspond to a
decline of about 3.5 percentage points per year (this is absolute
rather than proportional).

 (3) For baseline probabilities near 1.0 (common species), rule (1) applies,
but this time to the probability of non-occurrence. For example, suppose
we have a species that occurs 95% of the time.

## transform to logit scale
qlogis(0.95)  ## 2.944, call it approx 2.95
plogis(2.95-0.14) ## 0.943

## compare this with the change in the original probability of
## non-occurrence (0.05), which *increases* by 14%
1-0.05*1.14  ## 0.943
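
  As a rough numerical check of the three rules of thumb above, here is a
minimal sketch. The helper delta_p and the baseline probabilities 0.02, 0.5
and 0.95 are chosen just for illustration; only the coefficient -0.14 comes
from the example in the question.

```r
b <- -0.14  ## logit-scale coefficient from the example

## hypothetical helper: change in probability after adding b on the
## logit scale, starting from baseline probability p0
delta_p <- function(p0, b) plogis(qlogis(p0) + b) - p0

## (1) rare species: proportional change is close to exp(b) - 1
rel_rare <- delta_p(0.02, b) / 0.02
c(exact = rel_rare, rule = exp(b) - 1)

## (2) baseline near 0.5: absolute change is close to b/4
abs_mid <- delta_p(0.5, b)
c(exact = abs_mid, rule = b / 4)

## (3) common species: proportional change in NON-occurrence
## is close to exp(-b) - 1, i.e. roughly +14%
q0 <- 1 - 0.95
q1 <- 1 - plogis(qlogis(0.95) + b)
rel_nonocc <- q1 / q0 - 1
c(exact = rel_nonocc, rule = exp(-b) - 1)
```

The approximations in (1) and (3) get worse as the baseline probability
moves away from 0 or 1, so for intermediate cases it is safer to compute the
change directly with plogis() and qlogis() as in delta_p.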