[R-sig-eco] R-sig-ecology Digest, Vol 19, Issue 2

Peter Solymos solymos at ualberta.ca
Fri Oct 2 18:23:24 CEST 2009


Dear All,

I admit that overdispersion can be a problem. But you can't compare
Poisson with quasi-Poisson based on logLik, because the likelihood is
not defined for quasi* models. The quasi-Poisson fit estimates a
dispersion parameter on top of the Poisson fit; the coefficients are
the same, only the SEs and p-values are corrected:

## some random data
y <- rpois(100, 3)
x <- rnorm(100)
## GLMs
m1 <- glm(y ~ x, family = poisson)
m2 <- glm(y ~ x, family = quasipoisson)
## coefficients are equal
all.equal(coef(m1), coef(m2))
## SEs are not
rbind(pois = coef(summary(m1))[, 2], qpois = coef(summary(m2))[, 2])
## p-values are not
rbind(pois = coef(summary(m1))[, 4], qpois = coef(summary(m2))[, 4])
## logLik for Poisson: OK
logLik(m1)
## logLik for quasi-Poisson: NA
logLik(m2)
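
To see where the correction comes from, here is a minimal sketch on
simulated overdispersed counts. The seed, sample size and simulation
parameters are my own arbitrary choices, not from any real data. It
checks that the dispersion reported for the quasi-Poisson fit is the
Pearson chi-square of the Poisson fit divided by the residual degrees
of freedom, and that the quasi-Poisson SEs are the Poisson SEs scaled
by the square root of that value:

```r
## simulated overdispersed counts (all values here are illustrative)
set.seed(1)
x <- rnorm(100)
y <- rnbinom(100, mu = exp(1 + 0.3 * x), size = 1.5)
m1 <- glm(y ~ x, family = poisson)
m2 <- glm(y ~ x, family = quasipoisson)
## Pearson chi-square / residual df from the Poisson fit
phi <- sum(residuals(m1, type = "pearson")^2) / df.residual(m1)
## should match the dispersion reported by the quasi-Poisson summary
all.equal(phi, summary(m2)$dispersion)
## quasi-Poisson SEs are Poisson SEs times sqrt(phi)
se1 <- coef(summary(m1))[, "Std. Error"]
se2 <- coef(summary(m2))[, "Std. Error"]
all.equal(se1 * sqrt(phi), se2)
```

A value of phi well above 1 is the usual informal sign of
overdispersion relative to the Poisson.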

The pscl package provides negative binomial models with zero inflation
too (see Achim Zeileis, Christian Kleiber, Simon Jackman:
Regression Models for Count Data in R, JSS, http://www.jstatsoft.org/v27/i08).
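
A rough sketch of such a fit, assuming the pscl package is installed.
The data are simulated and the variable names, seed and parameter
values are made up for illustration only. The part of the formula
after "|" describes the zero-inflation component (intercept only
here):

```r
## zero-inflated negative binomial with pscl (illustrative, simulated data)
if (requireNamespace("pscl", quietly = TRUE)) {
  set.seed(2)
  n <- 500
  x <- rnorm(n)
  ## structural zeros with probability 0.3, NB counts otherwise
  zero <- rbinom(n, 1, 0.3)
  y <- ifelse(zero == 1, 0, rnbinom(n, mu = exp(0.5 + 0.5 * x), size = 2))
  d <- data.frame(y = y, x = x)
  ## count model: y ~ x; zero-inflation model: intercept only
  zinb <- pscl::zeroinfl(y ~ x | 1, data = d, dist = "negbin")
  print(summary(zinb))
  ## a proper likelihood, so logLik is defined here
  print(logLik(zinb))
}
```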

If you have fancier (say GLMM) models, you can do a likelihood ratio
test, but that can be quite advanced (see José Miguel
Ponciano, Mark L. Taper, Brian Dennis, Subhash R. Lele (2009)
Hierarchical models in ecology: confidence intervals, hypothesis
testing, and model selection using data cloning. Ecology: Vol. 90, No.
2, pp. 356-362.).
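
For plain GLMs without random effects the likelihood ratio test is
much simpler. The sketch below is only that simple case, not the data
cloning approach of the paper above. It compares a Poisson fit with a
negative binomial fit from MASS on simulated data (seed and parameters
are arbitrary). Because the Poisson is the limit of the negative
binomial as theta goes to infinity, the null sits on the boundary of
the parameter space, and I assume the usual fix of halving the
chi-square(1) p-value:

```r
## LRT: Poisson vs negative binomial (illustrative, simulated data)
library(MASS)
set.seed(3)
x <- rnorm(200)
y <- rnbinom(200, mu = exp(1 + 0.4 * x), size = 1.2)
m_pois <- glm(y ~ x, family = poisson)
m_nb <- glm.nb(y ~ x)
## likelihood ratio statistic; both models have a proper likelihood
lr <- as.numeric(2 * (logLik(m_nb) - logLik(m_pois)))
## boundary null: halve the chi-square(1) tail probability
p_value <- pchisq(lr, df = 1, lower.tail = FALSE) / 2
c(LR = lr, p = p_value)
```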

Yours,

Peter

Péter Sólymos
Alberta Biodiversity Monitoring Institute
Department of Biological Sciences
CW 405, Biological Sciences Bldg
University of Alberta
Edmonton, Alberta, T6G 2E9, Canada
Phone: 780.492.8534
email <- paste("solymos", "ualberta.ca", sep = "@")



On Fri, Oct 2, 2009 at 9:53 AM, Nicholas Lewin-Koh <nikko at hailmail.net> wrote:
>  No No No No No!
> The log likelihoods of the Poisson and the Gaussian are not comparable.
> One is a discrete distribution and the other continuous; you can get
> into all sorts of trouble there, and not just in pathological cases.
> They are on totally different scales.
>
> You need to decide whether you want to model the MEAN species richness
> as continuous, and not worry about answers like 3.1 species (you are
> modeling the mean), or go with a discrete distribution like Poisson or
> quasi-Poisson; you can test for overdispersion within a discrete family
> of distributions. As someone mentioned before, if your counts are away
> from zero, the Poisson is very symmetric and goes asymptotically to a
> normal. For practical purposes your results should be similar. For
> small samples, i.e. with categorical predictors and few counts per
> cell, it can make a difference.
>
> So, if you want to do model selection, you have to first choose
> discrete or continuous, then within that set compare log likelihoods.
> (you are on firmer ground if the models are somehow nested).
>
> Nicholas
>> Message: 10
>> Date: Fri, 02 Oct 2009 08:29:10 +0200
>> From: Carsten Dormann <carsten.dormann at ufz.de>
>> Subject: Re: [R-sig-eco] Negative binomial
>> To: "Canning-Clode, Joao" <Canning-ClodeJ at si.edu>
>> Cc: "r-sig-ecology at r-project.org" <r-sig-ecology at r-project.org>
>> Message-ID: <4AC59DB6.9030001 at ufz.de>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> Dear Joao,
>>
>> I propose you do the following (and wait for the outcry-responses to
>> this email to see if it is a reasonable proposal):
>>
>> Fit your model with different types of distributions and compare their
>> logLik-values:
>> logLik(glm(y ~ x1+x2+x3+I(x1^2) + x1:x3, family=gaussian))
>> logLik(glm(y ~ x1+x2+x3+I(x1^2) + x1:x3, family=poisson))
>> logLik(glm(y ~ x1+x2+x3+I(x1^2) + x1:x3, family=quasipoisson))
>> logLik(glm.nb(y ~ x1+x2+x3+I(x1^2) + x1:x3)) # require(MASS)
>>
>> The model with the highest log-likelihood is the distribution of choice
>> and you can defend it against reviewers.
>>
>> A few notes:
>> 1. You obviously cannot do this when one of the models uses transformed
>> responses (e.g. log(y)), because the LL will then be completely
>> different.
>> 2. When you use a more complex model (say a GLMM), you can approximate
>> the neg.bin through a two-step procedure: 1. fit a (wrongly structured)
>> glm.nb and extract the theta value from the summary of the model, say
>> theta=4.5 (that is the second parameter of the neg.bin distribution).
>> Then fit the GLMM again, giving as family the argument:
>> negative.binomial(theta=4.5) (again from package MASS). The same holds
>> for GAMs and other models requiring a specification of family.
>> 3. You may want to dig around for books recommending the above
>> procedure. I think I got this as advice from someone else, but haven't
>> bothered yet to look it up (obviously MASS would be a good starting
>> place, in their description of the neg.bin). I saw a paper that does
>> this (using the minimum AIC but otherwise this approach), but it is not
>> a statistical, but rather an ecological paper (although the analyst in
>> the author group is a biometrician whom I fully trust): Weigelt, A.,
>> Schumacher, J., Walther, T., Bartelheimer, M., Steinlein, T., Beyschlag,
>> W. (2006) Identifying mechanisms of competition in multispecies
>> communities. Journal of Ecology 95:53-64
>>
>> HTH,
>>
>> Carsten
>>
>>
>> Canning-Clode, Joao wrote:
>> > Hi all,
>> >
>> > 1st time user here!
>> > I am an ecologist working with marine fouling assemblages. I just got a paper back for revision. I am working with count data (species richness). I have used a linear model, but the reviewers are recommending the use of negative binomial or Poisson. As far as I could understand from the literature, these more complex models should be used when the distribution is skewed (lots of zeros). Well, my data are perfectly normally distributed. My main question is: can I still use negative binomial or Poisson even if my data are normal? Does that make sense?
>> >
>> > Thanks in advance
>> >
>> > João Canning Clode, PhD
>> > Postdoctoral Fellow
>> > Marine Invasions Research Lab
>> > Smithsonian Environmental Research Center
>> > 647 Contees Wharf Road
>> > Edgewater, MD 21037
>> >
>> > Email: canning-clodej at si.edu
>> > Web: www.canning-clode.com
>> > Tel: 443-482-2354
>> >
>> > _______________________________________________
>> > R-sig-ecology mailing list
>> > R-sig-ecology at r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>> >
>> >
>>
>> --
>> Dr. Carsten F. Dormann
>> Department of Computational Landscape Ecology
>> Helmholtz Centre for Environmental Research-UFZ
>> Permoserstr. 15
>> 04318 Leipzig
>> Germany
>>
>> Tel: ++49(0)341 2351946
>> Fax: ++49(0)341 2351939
>> Email: carsten.dormann at ufz.de
>> internet: http://www.ufz.de/index.php?de=4205
>>
>>
>>
>


