[R-sig-ME] Model validation for Presence / Absence, (binomial) GLMs

Fri Jul 5 12:48:02 CEST 2013

Ben Bolker <bbolker at ...> writes:
> Highland Statistics Ltd <highstat <at> ...> writes:
> > >> This is something I always battle with given the plethora of great
> > >> model fitting methods available for other models.
> > >>
> > >> I always use a variant of Hugh's suggestion and look at the % of
> > >> correct predictions between models as a quick model fitting statistic.
> > >>
> > >> And for overdispersion I believe one way is to fit individual level
> > >> random effects and see if this is a substantively better model.
> > >> There is more on this in the wiki http://glmm.wikidot.com/faq
> > >    Yes, but this is unidentifiable for Bernoulli responses (as also
> > > explained there).
> > 
> > The statement on  'unidentifiable for Bernoulli 
> > responses'....well...apparently this is not that trivial.
> > See:  http://www.highstat.com/BGGLM.htm
> > 
> > Follow the link to: the Discussion Board....
> > Go to: Chapter 1 Introduction to generalized linear models
> > And see the topic: Can binary logistic models be overdispersed?
> > 
> > Alain
> 
>   That's an interesting document: I think the bottom line is:
> 
>   * if the Bernoulli data can be grouped, i.e. if there are
> in general multiple observations with the same set of covariates,
> then overdispersion can be identified, because the data are
> really equivalent to a binomial response within the groups.
> 
>   For example, the trivial example
> 
> grp  resp
> A    1
> A    0
> A    1
> B    0
> B    0
> B    1
> 
> is equivalent to:
> 
> grp  successes total
> A    2         3
> B    1         3
> 
Agreed that this is very interesting, but still a bit mysterious,
since the two fits look the same on the surface.
The log-likelihoods differ only by the sum of the logs of the binomial
coefficients, as can easily be verified on Ben's example above and as
expected from the likelihood equations:

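# One row per Bernoulli observation (binary response).
# Note: despite its name, Grpd holds the ungrouped data.
Grpd <- read.table(
textConnection("grp  resp
A    1
A    0
A    1
B    0
B    0
B    1"), TRUE)

# One row per group (binomial response); UnGrpd holds the grouped counts.
UnGrpd <- read.table(
textConnection("grp  successes total
A    2         3
B    1         3"), TRUE)

# Difference between the two maximized log-likelihoods
-logLik(glm(resp ~ grp, binomial, Grpd)) +
logLik(glm(cbind(successes, total - successes) ~ grp, binomial, UnGrpd))

# Sum of the log binomial coefficients; should match the difference above
with(UnGrpd, sum(log(choose(total, successes))))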
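
The identity checked above can be sketched directly from the two
likelihoods. The notation is an assumption made for illustration and
does not come from the thread: group g has n_g observations, y_g
successes, and success probability p_g.

```latex
\begin{align*}
\ell_{\text{binary}}
  &= \sum_g \bigl[\, y_g \log p_g + (n_g - y_g) \log(1 - p_g) \,\bigr] \\
\ell_{\text{binomial}}
  &= \sum_g \Bigl[\, \log \binom{n_g}{y_g}
       + y_g \log p_g + (n_g - y_g) \log(1 - p_g) \,\Bigr] \\
\ell_{\text{binomial}} - \ell_{\text{binary}}
  &= \sum_g \log \binom{n_g}{y_g}
\end{align*}
```

The extra term does not involve p_g, so both forms give the same
estimates. For the toy data, n_g = 3 in both groups and
choose(3, 2) = choose(3, 1) = 3, so the difference is 2 * log(3),
about 2.197.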

However, looking at the glm outputs, the residual degrees of freedom
differ: 4 for the binary responses and 0 for the binomial response.
Should the degrees of freedom really be computed differently in the two
cases, given that it is easy to transform one form into the other?
And, if so, what does that mean?
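
To make the comparison concrete, here is a minimal sketch that refits
both forms of the toy data and puts the coefficients, residual degrees
of freedom and deviances side by side. The object names (binary,
grouped, fit_binary, fit_grouped) are chosen here for illustration only.
It shows where the numbers come from; it does not settle which of them
is the "right" one.

```r
# Toy data from the thread, in both layouts (names are illustrative).
binary  <- data.frame(grp  = rep(c("A", "B"), each = 3),
                      resp = c(1, 0, 1, 0, 0, 1))
grouped <- data.frame(grp       = c("A", "B"),
                      successes = c(2, 1),
                      total     = c(3, 3))

fit_binary  <- glm(resp ~ grp, family = binomial, data = binary)
fit_grouped <- glm(cbind(successes, total - successes) ~ grp,
                   family = binomial, data = grouped)

# Same linear predictor, so the coefficient estimates should agree.
all.equal(coef(fit_binary), coef(fit_grouped), tolerance = 1e-6)

# Residual df = number of rows in the data - number of coefficients,
# so it depends on the layout: 6 - 2 = 4 versus 2 - 2 = 0.
c(binary = df.residual(fit_binary), grouped = df.residual(fit_grouped))

# The residual deviances differ too, because each is measured against
# the saturated model for its own layout.
c(binary = deviance(fit_binary), grouped = deviance(fit_grouped))
```

With only two groups and a two-parameter model, the grouped fit is
saturated, which is why no residual degrees of freedom are left there.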

Ken

-- 
Kenneth Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html