[R-sig-ME] Model validation for Presence / Absence, (binomial) GLMs
Gabriel Baud-Bovy
baud-bovy.gabriel at hsr.it
Fri Jul 5 13:34:39 CEST 2013
On 05/07/2013 12:48 PM, Ken Knoblauch wrote:
> Ben Bolker <bbolker at ...> writes:
>> Highland Statistics Ltd <highstat <at> ...> writes:
>>>>> This is something I always battle with given the plethora of great
>>>>> model fitting methods available for other models.
>>>>>
>>>>> I always use a variant of Hugh's suggestion and look at the % of
>>>>> correct predictions between models as a quick model fitting statistic.
>>>>> And for overdispersion I believe one way is to fit individual level
>>>>> random effects and see if this is a substantively better model.
>>>>> There is more on this in the wiki http://glmm.wikidot.com/faq
>>>> Yes, but this is unidentifiable for Bernoulli responses (as also
>>>> explained there).
>>> The statement on 'unidentifiable for Bernoulli
>>> responses'....well...apparently this is not that trivial.
>>> See: http://www.highstat.com/BGGLM.htm
>>>
>>> Follow the link to: the Discussion Board....
>>> Go to: Chapter 1 Introduction to generalized linear models
>>> And see the topic: Can binary logistic models be overdispersed?
>>> Alain
>> That's an interesting document: I think the bottom line is:
>>
>> * if the Bernoulli data can be grouped, i.e. if there are
>> in general multiple observations with the same set of covariates,
>> then overdispersion can be identified, because the data are
>> really equivalent to a binomial response within the groups.
>>
>> For example, the trivial example
>>
>> grp resp
>> A 1
>> A 0
>> A 1
>> B 0
>> B 0
>> B 1
>>
>> is equivalent to:
>>
>> grp successes total
>> A 2 3
>> B 1 3
>>
> Agreed that this is very interesting but still a bit mysterious
> as everything looks the same on the surface.
> The likelihoods only differ by the log of the binomial coefficients
> as can easily be verified on Ben's example above and as expected
> from the likelihood equations:
>
> Grpd <- read.table(
> textConnection("grp resp
> A 1
> A 0
> A 1
> B 0
> B 0
> B 1"), TRUE)
>
> UnGrpd <- read.table(
> textConnection("grp successes total
> A 2 3
> B 1 3"), TRUE)
>
> -logLik(glm(resp ~ grp, binomial, Grpd)) +
> logLik(glm(cbind(successes, total - successes) ~ grp, binomial, UnGrpd))
>
> with(UnGrpd, sum(log(choose(total, successes))))
>
> However, looking at the outputs of the glm, the degrees of freedom
> differ, being 4 on the binary responses and 0 for the binomial response.
> Should degrees of freedom really be computed differently in the two cases
> since it is easy to transform the two cases back and forth?
> And, if so, what does that mean?
>
> Ken
>
The DoF issue seems similar to the one that arises when assessing goodness
of fit for logistic regression, which can be done with grouped data (or by
defining bins, as in the Hosmer-Lemeshow test, which is based on the same
idea of pooling data). I would have to look back at the references below
to see how these goodness-of-fit tests are affected by overdispersion.
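
To make the binning idea concrete, here is a minimal sketch in base R of a
Hosmer-Lemeshow-type statistic. It is only an illustration under stated
assumptions: the data are simulated, the helper name hl_test and its
argument g are hypothetical (not taken from any package), and the g - 2
reference degrees of freedom are the conventional choice rather than
something derived here.

```r
# Sketch of a Hosmer-Lemeshow-type goodness-of-fit check (base R only).
# Assumptions: simulated data; hl_test() and g are illustrative names.
set.seed(1)
n   <- 500
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + dat$x))

fit <- glm(y ~ x, family = binomial, data = dat)

hl_test <- function(y, p, g = 10) {
  # Pool the binary observations into bins defined by quantiles of the
  # fitted probabilities, so each bin behaves like a binomial count.
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  bin    <- cut(p, breaks = breaks, include.lowest = TRUE)
  obs    <- tapply(y, bin, sum)     # observed successes per bin
  expd   <- tapply(p, bin, sum)     # expected successes per bin
  n_bin  <- tapply(y, bin, length)  # number of observations per bin
  # Pearson-type statistic on the pooled data
  stat <- sum((obs - expd)^2 / (expd * (1 - expd / n_bin)))
  df   <- length(obs) - 2           # conventional reference df
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

res <- hl_test(dat$y, fitted(fit))
res
```

The residual degrees of freedom of the binary fit play no role here: the
reference distribution depends on the number of bins, which is the sense in
which the test works on pooled rather than individual observations.
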
Hosmer et al. (1991). The Importance of Assessing the Fit of Logistic
Regression Models: A Case Study. American Journal of Public Health,
December 1991, Vol. 81, No. 12.
Hosmer DW, Lemeshow S. A goodness-of-fit test for the multiple logistic
regression model. Commun Stat. 1980;A10:1043-1069.
Lemeshow S, Hosmer DW. A review of goodness-of-fit statistics for use in
the development of logistic regression models. Am J Epidemiol.
1982;115:92-106.
Hosmer DW, Lemeshow S, Klar J. Goodness-of-fit testing for multiple
logistic regression analysis when the estimated probabilities are small.
Biometrical J. 1988;30(7):1-14.
Gabriel
--
---------------------------------------------------------------------
Gabriel Baud-Bovy tel.: (+39) 02 2643 4839 (office)
UHSR University (+39) 02 2643 3429 (laboratory)
via Olgettina, 58 (+39) 02 2643 4891 (secretary)
20132 Milan, Italy fax: (+39) 02 2643 4892