[R-sig-eco] GLM mixed model with quasibinomial family

Fri Jul 30 22:22:35 CEST 2010

  [cc'ing back to r-sig-ecology]
  [Please keep sending replies to r-sig-ecology so that others may
benefit from the conversation, and so that others can answer if I
can't or am too busy (!)]

  I don't know what you mean by the "degree of fit to your data". Are
you trying to do a goodness-of-fit test? The standard deviance-based
test for goodness of fit/overdispersion that you may be thinking of
(e.g. checking whether residual deviance/residual df is approximately 1,
or testing the residual deviance against a chi-squared distribution with
df = residual df) only applies to NON-overdispersed models. Once you
fit a quasi-likelihood model the dispersion is estimated from the data,
so the residual deviance no longer works as a test statistic.
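
  As a rough sketch of the deviance-based check described above, here is
how it might look for an ordinary binomial GLM. The data frame `d` and
its columns (`group`, `time`, `successes`, `failures`) are simulated
placeholders chosen for illustration, not the poster's data, and the
check is only meaningful for the plain binomial fit, not for a
quasibinomial or glmmPQL fit.

```r
# Hypothetical data: 8 sites x 2 time points, counts of type x out of N = 50.
set.seed(1)
d <- data.frame(
  group = factor(rep(1:8, each = 2)),
  time  = factor(rep(c("before", "after"), times = 8),
                 levels = c("before", "after"))
)
N <- 50
d$successes <- rbinom(nrow(d), size = N, prob = 0.4)
d$failures  <- N - d$successes

# Ordinary (non-overdispersed) binomial GLM, ignoring the grouping.
fit <- glm(cbind(successes, failures) ~ time, family = binomial, data = d)

# Ratio of residual deviance to residual df; values near 1 are
# consistent with no overdispersion.
ratio <- deviance(fit) / df.residual(fit)

# Upper-tail chi-squared probability of the residual deviance
# on the residual degrees of freedom.
p_gof <- pchisq(deviance(fit), df = df.residual(fit), lower.tail = FALSE)

print(c(ratio = ratio, p_gof = p_gof))
```

  A ratio well above 1 (or a very small p_gof) would suggest
overdispersion, which is the situation where this test stops being
applicable to the quasi- model itself.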

  You might want to get the book by Zuur et al. on mixed models in ecology.

  Ben Bolker

On Fri, Jul 30, 2010 at 11:49 AM, Javier Martinez
<javi.martinez.lopez at gmail.com> wrote:
> Hello again Mr. Bolker,
>
> I have now tried glmmPQL and it looks very promising, because it does
> in fact give the results I expected. Since I do not get a deviance
> value from these models, I cannot assess their degree of fit to my
> data, so I was wondering whether it would be possible to assess it
> somehow by fitting a linear model between the expected and fitted
> values from the resulting glmmPQL model. Does that make sense to you?
>
> Thank you very much for any advice, and regards,
>
> Javier
>
> On Thu, Jul 29, 2010 at 6:25 PM, Ben Bolker <bbolker at gmail.com> wrote:
>>  A little more information would probably be helpful.  Here's what
>> I'm guessing:
>>
>>   You have no 'treatment' except the passage of time, and only two
>> time points (say, before/after). You have a total of 16 measurements
>> (2 each at 8 sites), they are like binomial data (number of counts of
>> type x out of a total number N counted) but overdispersed.  You want
>> to test whether the proportion of type x changed between 'before' and
>> 'after'.  If the data were normally distributed, you could use a paired
>> t-test.
>>
>>  Is that a correct description?
>>
>>  If so, then time should be treated as a fixed factor, group as random.
>> 8 samples is probably enough (just).
>>
>>   If your counts are fairly large (i.e. the minimum of the numbers
>> of 'successes' and 'failures' in a typical group is >5) then you could
>> safely use glmmPQL in the MASS package:
>>
>>  glmmPQL(cbind(successes,failures)~time,random=~1|group,
>>              family="quasibinomial",data=...)
>>
>>  Have you thought about simply using a nonparametric test on the
>> proportions (i.e. wilcox.test(prop.before, prop.after,paired=TRUE) ... ?)
>>
>> On Thu, Jul 29, 2010 at 12:07 PM, Javier Martinez
>> <javi.martinez.lopez at gmail.com> wrote:
>>> Thanks to all of you! I did know the e-mail by Bates, which is beyond
>>> my understanding, but I did not know the wiki on mixed models and the
>>> manuscript by Bolker! My data are based on 2 temporal samples from 8
>>> different sites. I use mixed models because I want to avoid
>>> pseudo-replication by including the grouping factor in my model,
>>> thus looking for the trends within each group rather than treating
>>> the data as if they were independent. The question is, can I really
>>> use a mixed model if I only have two cases per group? In the end there
>>> are 16 cases in the regression plot, but I am not sure if such a
>>> grouped analysis is right!
>>>
>>> Thank you again for your help!
>>>
>>> Javier
>>>
>>> On Wed, Jul 28, 2010 at 6:44 PM, Javier Martinez
>>> <javi.martinez.lopez at gmail.com> wrote:
>>>> Dear R-users,
>>>>
>>>> I am using the 'lmer' function from package 'lme4', looking for a
>>>> regression model which takes into account the grouped nature of my
>>>> data. I am using frequencies as the dependent variable and percentages
>>>> as the independent one. After some reading I think I should use the
>>>> 'quasibinomial' family because there is 'overdispersion' in my data
>>>> set (residual deviance greater than residual degrees of freedom). So,
>>>> I fit this regression model but I do not get a p-value for the
>>>> significance of the regression! I have to test many different
>>>> regressions with different data, so how can I assess the significance
>>>> of each of them?
>>>>
>>>> Thank you very much for your help!
>>>>
>>>> Javier
>>>>
>>>
>>
>
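
  For reference, the two approaches suggested earlier in this thread
(glmmPQL with a random group effect, and a paired Wilcoxon test on the
proportions) might be sketched as follows. The data are simulated under
the guessed design (8 sites, 2 time points); the data frame `d`, the
effect sizes, and N = 50 are assumptions made up for illustration only.

```r
library(MASS)  # provides glmmPQL (which relies on the nlme package)

# Hypothetical data: 8 sites, each measured 'before' and 'after'.
set.seed(1)
d <- data.frame(
  group = factor(rep(1:8, each = 2)),
  time  = factor(rep(c("before", "after"), times = 8),
                 levels = c("before", "after"))
)
N <- 50
site_eff <- rnorm(8, sd = 0.5)                    # assumed site-level variation
eta <- -0.5 + 0.6 * (d$time == "after") + site_eff[as.integer(d$group)]
d$successes <- rbinom(nrow(d), size = N, prob = plogis(eta))
d$failures  <- N - d$successes

# Time as a fixed factor, group as a random intercept.
fit <- glmmPQL(cbind(successes, failures) ~ time, random = ~ 1 | group,
               family = quasibinomial, data = d, verbose = FALSE)
tt <- summary(fit)$tTable   # estimates, SEs, df, t- and p-values
print(tt)

# Nonparametric alternative: paired test on the per-site proportions.
# Rows are ordered by site, so the two vectors are paired by site.
prop <- d$successes / (d$successes + d$failures)
prop.before <- prop[d$time == "before"]
prop.after  <- prop[d$time == "after"]
wt <- wilcox.test(prop.before, prop.after, paired = TRUE)
print(wt$p.value)
```

  The row of `tt` for the time effect gives the approximate Wald test of
a before/after change; the Wilcoxon result is a cruder check that makes
fewer distributional assumptions.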