[R-sig-eco] terminology for binomial regression

Matthew Forister forister at gmail.com
Sun Mar 6 01:52:20 CET 2011


Ben, thank you.  I did not realize the interpretation was dependent on the
baseline probabilities, but I think I get it now.  One follow-up question...

Assume for a minute that I'm not interested in converting those values into
statements of probability.  Rather, I'm interested in making comparisons
among species.  For example, a species with a value of -0.25 (for the
coefficient associated with years) is in more severe decline than a species
with a value of -0.14.

Empirically, this seems to work out just fine.  If you take a look at the
attached pdf, you'll see examples of the fit of the binomial regression
models.  The numbers on the outside are the years-coefficients.  Seems to me
that those numbers do a good job at indicating the rate of decline, even
though the starting frequencies are different for different species.

Am I making any mistake in thinking about comparisons among species based on
the years-coefficient like this?
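
One way to probe this comparison is with simulated data.  The sketch below
assumes two hypothetical species surveyed on 200 days a year for 20 years,
both declining at -0.14 per year on the logit scale but starting from
different frequencies (0.80 and 0.10).  The helper sim_species and all of
the numbers are illustrative assumptions, not taken from the real data set.

```r
## Hypothetical data: same logit-scale decline, different baselines
set.seed(1)
years <- 0:19
ndays <- 200   # assumed number of survey days per year

sim_species <- function(p0, slope) {
  p <- plogis(qlogis(p0) + slope * years)          # true yearly probability
  present <- rbinom(length(years), size = ndays, prob = p)
  data.frame(years = years,
             DaysPresent = present,
             DaysAbsent = ndays - present)
}

common <- sim_species(p0 = 0.80, slope = -0.14)
rare   <- sim_species(p0 = 0.10, slope = -0.14)

fit_common <- glm(cbind(DaysPresent, DaysAbsent) ~ years,
                  family = binomial, data = common)
fit_rare   <- glm(cbind(DaysPresent, DaysAbsent) ~ years,
                  family = binomial, data = rare)

## years coefficients (change in log-odds per year)
coef(fit_common)["years"]
coef(fit_rare)["years"]

## first-year absolute change in probability implied by the same slope
plogis(qlogis(c(0.80, 0.10)) - 0.14) - c(0.80, 0.10)
```

In this setup the two fitted coefficients should come out close to each
other even though the absolute changes in probability differ, which suggests
that the years-coefficient compares species on the log-odds scale rather
than in raw days per year.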

thanks!
Matt





On Sat, Mar 5, 2011 at 12:31 PM, Ben Bolker <bbolker at gmail.com> wrote:

> On 11-03-05 02:59 PM, Matthew Forister wrote:
> > Hi all,
> >
> > I have been frustrated by what seems to me like inconsistent terminology
> > associated with binomial regression.  There are two questions I'd love to
> > have answered, below.
> >
> > For context, I have been using glm with binomial error, logit link.  The
> > response variable is "successes and failures" -- the successes are the
> > days on which a species is observed in a year, and the failures are days
> > in which it is not observed.  So the code is
> > glm(cbind(DaysPresent,DaysAbsent)~years,binomial).  I'm interested in
> > the coefficient associated with years as a way to express the decline in
> > the number of days a species is observed over time.
> >
> > Questions:
> >
> > (1) This probably seems silly, but is "logistic regression" the same as a
> > glm with binomial error?  This is where I have found some frustrating
> > inconsistency in the ecological literature.
>
>   In my opinion, it would be reasonable to use 'logistic regression' to
> mean any GLM (generalized linear model) with a logit link, although
> most probably with the binomial family. My impression is that people
> most commonly use 'logistic regression' to mean a GLM with
> *binary* data and a logit link and 'binomial regression' to denote
> non-binary data, but I don't have any references.
>
> > (2) What's the most straightforward way to interpret the coefficients
> > from a predictor variable in a model like the one specified above?  For
> > example, a species in decline (observed in fewer days over time) will
> > have a years coefficient of -0.14.  I'd like a verbal interpretation of
> > that number.  Rather than give you my understanding, I'll just ask and
> > hope someone can help me out!
>
>   I would suggest Gelman and Hill for this.  In brief, the coefficients
> are statements of changes on the logit scale ("log-odds" is a synonym).
> Unfortunately, the interpretation in terms of probability outcomes
> depends on the baseline probability.  Rules of thumb are:
>
>  (1) for small (near zero) baseline probabilities, the logistic
> resembles an exponential and so the interpretations of logit-scale and
> log-scale coefficients are similar, i.e. for small changes they can be
> interpreted as proportional changes.  For your example above, this would
> correspond to a PROPORTIONAL decline of approximately 14% per year for a
> species that was already fairly rare.  (More precisely a decline of
> (1-exp(-0.14))=0.13.)  (I want to emphasize that this is a change
> relative to the original frequency of the species.)
>
>  (2) for baseline probabilities near 0.5, the rule of thumb is that the
> change in probability of occurrence is about r/4, where r is the
> coefficient, so if your species were originally present in about half of
> the samples a coefficient of -0.14 would correspond to a decline of about
> 3.5 percentage points per year (this is absolute rather than
> proportional).
>
>  (3) For baseline probabilities near 1.0 (common species), #1 applies
> but this time to the probability of non-occurrence. For example, suppose
> we have a species that occurs 95% of the time.
>
> ## transform to logit scale
>  qlogis(0.95)  ## 2.944, call it approx 2.95
>  plogis(2.95-0.14) ## 0.943
>
> ## compare this with the change in the original probability of
> ## non-occurrence (0.05), which *increases* by 14%
> 1-0.05*1.14  ## 0.943
>
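
Rules of thumb (1) and (2) quoted above can be checked numerically in the
same way as the worked example for (3).  This is only a sketch: the rare
baseline of 0.02 is an assumed value chosen for illustration, and b is the
-0.14 years coefficient from the example.

```r
b <- -0.14   # years coefficient from the example

## Rule 1: rare species (assumed baseline 0.02), proportional change per year
p0 <- 0.02
p1 <- plogis(qlogis(p0) + b)
p1 / p0 - 1          # close to exp(b) - 1, roughly -0.13

## Rule 2: baseline 0.5, absolute change per year
p1_half <- plogis(qlogis(0.5) + b)
p1_half - 0.5        # close to b / 4 = -0.035
```

Both approximations get worse as the coefficient grows in magnitude or as
the baseline moves away from the value each rule assumes.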



-- 
Matthew L Forister
Assistant Professor
Dept. of Biology / MS 314
1664 N. Virginia St.
University of Nevada, Reno
Reno, Nevada 89557
--

