[R] lmer and a response that is a proportion

Mon Dec 4 00:30:46 CET 2006

Dear Brian and John,

Thanks for your insight.  I'll clarify a couple of things in case it changes
your advice.

My response is a ratio of two measures taken during a bird's path, which
varies from 0 to 1, so I cannot convert it to columns of the number of
successes and failures.  It has to be reported as the proportion.  I could
logit transform it to make it normal, but I am trying to avoid that so I can
analyze it directly.
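
For reference, the logit transform mentioned above can be sketched as follows.
This is only an illustration in Python (standard library only); the ratio
values are made up, and the helper names logit and inv_logit are not from any
package. The transform is undefined at exactly 0 and 1, so such values would
need separate handling, which is left open here.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is undefined at 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


# Made-up ratios for illustration only.
ratios = [0.12, 0.5, 0.93]
transformed = [logit(p) for p in ratios]
back = [inv_logit(x) for x in transformed]

for p, x in zip(ratios, transformed):
    print(f"ratio={p:.2f}  logit={x:.3f}")
```

The round trip through inv_logit recovers the original ratios up to
floating-point error.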

The subjects are individual birds and I have a range of sample sizes from
each bird (from 8 to >200, average of about 75 measurements/bird).

Thanks!
Cam

On 12/3/06 3:47 PM, "Prof Brian Ripley" <ripley at stats.ox.ac.uk> wrote:

> On Sun, 3 Dec 2006, John Fox wrote:
> 
>> Dear Cameron,
>> 
>>> -----Original Message-----
>>> From: r-help-bounces at stat.math.ethz.ch
>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Cameron Gillies
>>> Sent: Sunday, December 03, 2006 1:58 PM
>>> To: r-help at stat.math.ethz.ch
>>> Subject: [R] lmer and a response that is a proportion
>>> 
>>> Greetings all,
>>> 
>>> I am using lmer (lme4 package) to analyze data where the
>>> response is a proportion (0 to 1).  It appears to work, but I
>>> am wondering if the analysis is treating the response
>>> appropriately - i.e. can lmer do this?
>>> 
>> 
>> As far as I know, you can specify the response as a proportion, in which
>> case the binomial counts would be given via the weights argument -- at least
>> that's how it's done in glm(). An alternative that should be equivalent is
>> to specify a two-column matrix with counts of "successes" and "failures" as
>> the response. Simply giving the proportion of successes without the counts
>> wouldn't be appropriate.
>> 
>>> I have used both family=binomial and quasibinomial - is one
>>> more appropriate when the response is a proportion?  The
>>> coefficient estimates are identical, but the standard errors
>>> are larger with family=binomial.
>>> 
>> 
>> The difference is that in the binomial family the dispersion is fixed to 1,
>> while in the quasibinomial family it is estimated as a free parameter. If
>> the standard errors are larger with family=binomial, then that suggests that
>> the data are underdispersed (relative to the binomial); if the difference is
>> substantial -- the factor is just the square root of the estimated
>> dispersion -- then the binomial model is probably not appropriate for the
>> data.
> 
> John's last deduction is appropriate to a GLM, but not necessarily to a
> GLMM. I don't have detailed experience with lmer for binomial, but I do
> for various other fitting routines for GLMM.  Remember there are at least
> two sources of randomness in a GLMM, and let us keep it simple and have
> just a subject effect and a measurement error.  Then if over-dispersion is
> happening within subjects, forcing the binomial dispersion (at the
> measurement level) to 1 tends to increase the estimate of the
> subject-level variance component to compensate, and in turn increase some
> of the standard errors.
> 
> (Please note the 'tends' in that para, as the details of the design do
> matter.  For cognoscenti, think about plot and sub-plot treatments in a
> split-plot design.)
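
The square-root-of-dispersion factor John describes can be checked by hand for
the ordinary GLM case (not the GLMM case Brian's caveat concerns). The sketch
below is in Python (standard library only) and assumes an intercept-only
binomial model with made-up grouped data; all variable names are illustrative.
It also shows why the counts matter: a proportion together with its number of
trials carries the same information as the success and failure counts, while a
bare proportion does not.

```python
import math

# Made-up grouped data: observed proportion and number of trials per group.
proportions = [0.5, 0.4, 0.5, 0.45]
trials = [10, 20, 30, 40]

# Proportion plus weights is equivalent to success/failure counts.
successes = [round(p * n) for p, n in zip(proportions, trials)]
failures = [n - s for n, s in zip(trials, successes)]

# Intercept-only binomial fit: the MLE of the common probability is pooled.
total = sum(trials)
p_hat = sum(successes) / total

# Standard error of the logit-scale intercept with dispersion fixed at 1.
se_binomial = 1.0 / math.sqrt(total * p_hat * (1.0 - p_hat))

# Pearson chi-square over residual degrees of freedom estimates the dispersion.
pearson = sum(
    (s - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
    for s, n in zip(successes, trials)
)
dispersion = pearson / (len(trials) - 1)

# Quasi-likelihood standard error: binomial SE scaled by sqrt(dispersion).
se_quasi = math.sqrt(dispersion) * se_binomial

print(f"p_hat={p_hat:.3f}  dispersion={dispersion:.3f}")
print(f"SE binomial={se_binomial:.4f}  SE quasibinomial={se_quasi:.4f}")
```

With these made-up numbers the estimated dispersion is below 1, so the
quasi-likelihood standard error comes out smaller than the binomial one. That
is the underdispersion pattern described above for a plain GLM.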