[R-sig-ME] Model validation for Presence / Absence, (binomial) GLMs
Ken Knoblauch
ken.knoblauch at inserm.fr
Fri Jul 5 12:48:02 CEST 2013
Ben Bolker <bbolker at ...> writes:
> Highland Statistics Ltd <highstat <at> ...> writes:
> > >> This is something I always battle with given the plethora of great model
> > >> fitting methods available for other models.
> > >>
> > >> I always use a variant of Hugh's suggestion and look at the % of correct
> > >> predictions between models as a quick model fitting statistic.
> > >>
> > >> And for overdispersion I believe one way is to fit individual level random
> > >> effects and see if this is a substantively better model. There is more on
> > >> this in the wiki http://glmm.wikidot.com/faq
> > > Yes, but this is unidentifiable for Bernoulli responses (as also
> > > explained there).
> >
> > The statement on 'unidentifiable for Bernoulli
> > responses'....well...apparently this is not that trivial.
> > See: http://www.highstat.com/BGGLM.htm
> >
> > Follow the link to: the Discussion Board....
> > Go to: Chapter 1 Introduction to generalized linear models
> > And see the topic: Can binary logistic models be overdispersed?
> >
> > Alain
>
> That's an interesting document: I think the bottom line is:
>
> * if the Bernoulli data can be grouped, i.e. if there are
> in general multiple observations with the same set of covariates,
> then overdispersion can be identified, because the data are
> really equivalent to a binomial response within the groups.
>
> For example, the trivial example
>
> grp resp
> A 1
> A 0
> A 1
> B 0
> B 0
> B 1
>
> is equivalent to:
>
> grp successes total
> A 2 3
> B 1 3
>
Agreed that this is very interesting, but it is still a bit mysterious,
as everything looks the same on the surface.
The log-likelihoods differ only by the sum of the logs of the binomial
coefficients, as can easily be verified on Ben's example above and as
expected from the likelihood equations:
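# Binary (ungrouped) responses: one row per Bernoulli trial
UnGrpd <- read.table(header = TRUE, text = "grp resp
A 1
A 0
A 1
B 0
B 0
B 1")
# Grouped responses: one row per covariate pattern
Grpd <- read.table(header = TRUE, text = "grp successes total
A 2 3
B 1 3")
# Binomial log-likelihood minus Bernoulli log-likelihood ...
logLik(glm(cbind(successes, total - successes) ~ grp, binomial, Grpd)) -
  logLik(glm(resp ~ grp, binomial, UnGrpd))
# ... equals the sum of the log binomial coefficients
with(Grpd, sum(log(choose(total, successes))))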
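The identity checked numerically above can also be written out. This is only a sketch, in notation that does not appear in the original post: $y_{gi}$ is the $i$-th binary response in group $g$, $n_g$ the number of trials in the group, $y_g = \sum_i y_{gi}$ the number of successes, and $p_g$ the success probability shared by the group.

```latex
\begin{align*}
\ell_{\text{Bernoulli}}
  &= \sum_g \sum_{i=1}^{n_g} \bigl[ y_{gi} \log p_g + (1 - y_{gi}) \log(1 - p_g) \bigr] \\
  &= \sum_g \bigl[ y_g \log p_g + (n_g - y_g) \log(1 - p_g) \bigr], \\
\ell_{\text{binomial}}
  &= \sum_g \Bigl[ \log \binom{n_g}{y_g} + y_g \log p_g + (n_g - y_g) \log(1 - p_g) \Bigr], \\
\ell_{\text{binomial}} - \ell_{\text{Bernoulli}}
  &= \sum_g \log \binom{n_g}{y_g}.
\end{align*}
```

The difference does not involve $p_g$, so both forms give the same maximum likelihood estimates. For the toy data it is $\log\binom{3}{2} + \log\binom{3}{1} = 2 \log 3 \approx 2.197$.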
However, looking at the glm outputs, the residual degrees of freedom
differ: 4 for the binary responses and 0 for the binomial response.
Should the degrees of freedom really be computed differently in the two cases,
given that it is easy to transform one form into the other?
And, if so, what does that mean?
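For anyone wanting to see where the 4 and the 0 come from, here is a minimal sketch on the same toy data. The object names (`binary`, `grouped`, `bin_fit`, `grp_fit`) are made up for illustration. It only shows how glm arrives at the numbers (rows of data minus estimated coefficients) and does not settle which count is the right one.

```r
# Toy data from the example above, in both layouts
binary  <- data.frame(grp  = rep(c("A", "B"), each = 3),
                      resp = c(1, 0, 1, 0, 0, 1))
grouped <- data.frame(grp       = c("A", "B"),
                      successes = c(2, 1),
                      total     = c(3, 3))

bin_fit <- glm(resp ~ grp, family = binomial, data = binary)
grp_fit <- glm(cbind(successes, total - successes) ~ grp,
               family = binomial, data = grouped)

# The coefficient estimates agree (up to convergence tolerance)
cbind(binary = coef(bin_fit), grouped = coef(grp_fit))

# Residual df = number of rows - number of estimated coefficients
c(rows = nobs(bin_fit), coefs = length(coef(bin_fit)), df = df.residual(bin_fit))
c(rows = nobs(grp_fit), coefs = length(coef(grp_fit)), df = df.residual(grp_fit))

# The residual deviances differ too: the grouped fit is saturated
# (two rows, two coefficients), so its deviance is essentially zero
c(binary = deviance(bin_fit), grouped = deviance(grp_fit))
```

The two fits share their estimates but count "observations" differently, six rows versus two, which is where the different residual degrees of freedom and deviances come from.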
Ken
--
Kenneth Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html