[R] Fwd: Re: Poisson GLM using non-integer response/predictors?
Ben Bolker
bbolker at gmail.com
Fri Dec 30 20:50:33 CET 2011
Matthias Gondan <matthias-gondan <at> gmx.de> writes:
>
> Hi,
>
> Use offset variables if you count occurrences of an event and you want to
> account for the observation time.
>
> glm(count ~ predictors + offset(log(observation_time)), family=poisson)
>
> If you want to compare durations, look at library(survival), ?coxph
>
> If tnoise_sqrt is the square root of tourist noise, your example seems
> incorrect, because it is a predictor, not the dependent variable
>
> tnoise_sqrt ~ lengthfeeding_log
>
> Best wishes,
>
> Matthias
>
> On 30.12.2011 16:29, Lucy Dablin wrote:
> > Great lists, I always find them useful, thank you to
> > everyone who contributes to them.
> >
> > My question is regarding non-integer values from some data I
> > collected on parrots when using the poisson GLM. I observed the
> > parrots on a daily basis to see if they were affected by tourist
> > presence. My key predictors are tourist noise (averaged over a day
> > period so decimal value, square root to adjust for skew), tourist
> > number (the number of tourists at a site, square root), and the
> > number of boats passing the site in a day (log). These are
> > compared with predictors: total number of birds (count data,
> > square root), average time devoted to foraging at site (log),
> > species richness (sqrt), and the number of flushes per day. Apart
> > from the last one they are all non-integer values. When I run a
> > glm for example:

Your description sounds like you might already have transformed
your response variables: generally speaking, you don't want to do
that before running a GLM (the variance function incorporated in the
GLM takes care of heteroscedasticity, and the link function takes
care of nonlinearity in the response).

I suspect you want total number of birds, number of flushes per day,
and species richness to be modeled as Poisson (or negative binomial --
see ?glm.nb in the MASS package). Species richness *might* be
binomial, or more complicated, if you are drawing from a limited
species pool (e.g. if there are only 5 possible species and you
sometimes see 4 or 5 of them in a day). Is the total number
of birds really non-integer *before* you square-root transform it?

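A minimal sketch of what fitting the raw counts might look like.
The data are simulated and the names (d, tourists, flushes) are
made up for illustration; they are not taken from the original
data set.

```r
## Sketch only: simulated data, hypothetical variable names
library(MASS)  # provides glm.nb

set.seed(101)
n <- 100
d <- data.frame(tourists = rpois(n, lambda = 20))
## simulate overdispersed daily counts that increase with tourist number
d$flushes <- rnbinom(n, mu = exp(1 + 0.03 * d$tourists), size = 2)

## Poisson GLM on the raw (untransformed) counts;
## the log link is the default for family = poisson
m_pois <- glm(flushes ~ tourists, family = poisson, data = d)

## negative binomial alternative for overdispersed counts
m_nb <- glm.nb(flushes ~ tourists, data = d)

## rough overdispersion check: a ratio well above 1 suggests
## the Poisson variance assumption is too restrictive
deviance(m_pois) / df.residual(m_pois)

## compare the two fits
AIC(m_pois, m_nb)
```

The response goes in untransformed in both models; the log link
handles the nonlinearity.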
Time devoted to foraging at the site is most easily
modeled as log-normal (i.e., log-transform as you have
already done and use lm), unless the response includes zeros,
or possibly as Gamma-distributed (although you may want to
use a log link instead of the default inverse link).

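The two options for a positive, continuous response could be
sketched roughly as follows. Again the data are simulated, and
feedtime and tnoise are hypothetical names, not the original
variables.

```r
## Sketch only: simulated data, hypothetical variable names
set.seed(102)
n <- 100
d <- data.frame(tnoise = runif(n, 40, 80))
## strictly positive, right-skewed foraging times whose mean
## declines with noise: mean = shape / rate = exp(4 - 0.02 * tnoise)
d$feedtime <- rgamma(n, shape = 3,
                     rate = 3 / exp(4 - 0.02 * d$tnoise))

## log-normal: log-transform the response and fit an ordinary
## linear model (only works if there are no zeros)
m_ln <- lm(log(feedtime) ~ tnoise, data = d)

## Gamma GLM with a log link instead of the default inverse link
m_gam <- glm(feedtime ~ tnoise, family = Gamma(link = "log"),
             data = d)

coef(m_ln)
coef(m_gam)
```

With the log link the slopes of both models are on the log scale,
so they can be compared directly; the intercepts differ because
lm models the mean of log(y) while the GLM models the log of the
mean of y.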
As Matthias said, offsets are used for the specific case of
non-uniform sampling effort (e.g. if you sampled different areas,
or for different lengths of time, every day).

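For the unequal-effort case, a sketch along the lines of
Matthias's formula might look like this. The data are simulated;
birds, tourists and obs_hours are hypothetical names, and
obs_hours is assumed to be the number of hours observed each day.

```r
## Sketch only: simulated data, hypothetical variable names
set.seed(103)
n <- 100
d <- data.frame(tourists  = rpois(n, lambda = 20),
                obs_hours = runif(n, 2, 8))
## expected count is proportional to the time spent observing
d$birds <- rpois(n, lambda = d$obs_hours * exp(1 - 0.02 * d$tourists))

## offset(log(obs_hours)) fixes the coefficient of log(effort) at 1,
## so the remaining coefficients describe the rate per hour
m_off <- glm(birds ~ tourists + offset(log(obs_hours)),
             family = poisson, data = d)
coef(m_off)
```

If every day had the same observation time, the offset would be a
constant and would simply be absorbed into the intercept.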
You may be interested in r-sig-ecology at r-project.org, which
is an R mailing list specifically devoted to ecological questions.