[R] lmer and a response that is a proportion
Simon Blomberg
blomsp at ozemail.com.au
Mon Dec 4 03:46:26 CET 2006
Would beta regression solve your problem? (package betareg)
Simon.
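
For concreteness, here is a minimal sketch of what a beta regression fits, written in base R on simulated data so it runs without extra packages. The data, the parameter values and the function name negll are all made up for illustration, and the sketch ignores the per-bird grouping in the original question. It assumes the response lies strictly inside (0, 1).

```r
# Sketch: beta regression by maximum likelihood (simulated, illustrative data).
set.seed(1)
n   <- 500
x   <- rnorm(n)
mu  <- plogis(-0.5 + 0.5 * x)    # true mean, linear on the logit scale
phi <- 20                         # true precision
y   <- rbeta(n, mu * phi, (1 - mu) * phi)

# Negative log-likelihood in the mean/precision parameterisation:
#   y ~ Beta(mu * phi, (1 - mu) * phi),  logit(mu) = b0 + b1 * x
negll <- function(par) {
  m <- plogis(par[1] + par[2] * x)
  p <- exp(par[3])                # log scale keeps the precision positive
  -sum(dbeta(y, m * p, (1 - m) * p, log = TRUE))
}

fit <- optim(c(0, 0, 0), negll, method = "BFGS")
est <- c(intercept = fit$par[1], slope = fit$par[2], phi = exp(fit$par[3]))
print(round(est, 2))

# With the betareg package installed, the analogous call would be:
# betareg::betareg(y ~ x)
```

The estimates should land near the simulated values (-0.5, 0.5 and 20). Exact zeros or ones in the response would make the beta log-likelihood undefined, so they would need separate handling.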
John Fox wrote:
> Dear Cameron,
>
> Given your description, I thought that this might be the case.
>
> I'd first examine the distribution of the response variable to see what it
> looks like. If the values don't push the boundaries of 0 and 1, and their
> distribution is unimodal and reasonably symmetric, I'd consider analyzing
> them directly using normally distributed errors. If the values do stack up
> near 0, 1, or both, I'd consider a transformation, or perhaps a different
> family (depending on the pattern); in particular, if they stack up near both
> 0 and 1, a logit or similar transformation could help. Finally, if you have
> many values of 0, 1, or both, then a transformation isn't promising (and,
> indeed, the logit wouldn't be defined for these values). In any event, I'd
> check diagnostics after a preliminary fit.
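
A small illustration of the point about boundary values, using made-up proportions: the logit is finite inside (0, 1) but not at exactly 0 or 1. The vector p is hypothetical, not Cameron's data.

```r
# Sketch: the logit transformation and its behaviour at the boundaries.
p <- c(0, 0.02, 0.5, 0.98, 1)    # illustrative proportions
logit_p <- qlogis(p)             # qlogis(p) is log(p / (1 - p))
print(logit_p)

# Flag the values for which the transformation is usable
usable <- is.finite(logit_p)
print(usable)

# plogis() is the inverse, mapping fitted values back to the (0, 1) scale
back <- plogis(logit_p[usable])
print(back)
```

Before choosing any transformation, a histogram of the raw response, e.g. hist(p), shows whether values pile up near 0, 1, or both.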
>
> I hope this helps,
> John
>
> --------------------------------
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox
> --------------------------------
>
>
>> -----Original Message-----
>> From: Cameron Gillies [mailto:cgillies at ualberta.ca]
>> Sent: Sunday, December 03, 2006 6:31 PM
>> To: Prof Brian Ripley; John Fox
>> Cc: r-help at stat.math.ethz.ch
>> Subject: Re: [R] lmer and a response that is a proportion
>>
>> Dear Brian and John,
>>
>> Thanks for your insight. I'll clarify a couple of things
>> in case it changes your advice.
>>
>> My response is a ratio of two measures taken during a bird's
>> path, which varies from 0 to 1, so I cannot convert it to
>> columns of the number of successes. It has to be reported as
>> the proportion. I could logit transform it to make it
>> normal, but I am trying to avoid that so I can analyze it directly.
>>
>> The subjects are individual birds and I have a range of
>> sample sizes from each bird (from 8 to >200, average of about
>> 75 measurements/bird).
>>
>> Thanks!
>> Cam
>>
>>
>> On 12/3/06 3:47 PM, "Prof Brian Ripley" <ripley at stats.ox.ac.uk> wrote:
>>
>>
>>> On Sun, 3 Dec 2006, John Fox wrote:
>>>
>>>
>>>> Dear Cameron,
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: r-help-bounces at stat.math.ethz.ch
>>>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Cameron
>>>>> Gillies
>>>>> Sent: Sunday, December 03, 2006 1:58 PM
>>>>> To: r-help at stat.math.ethz.ch
>>>>> Subject: [R] lmer and a response that is a proportion
>>>>>
>>>>> Greetings all,
>>>>>
>>>>> I am using lmer (lme4 package) to analyze data where the response is
>>>>> a proportion (0 to 1). It appears to work, but I am wondering if
>>>>> the analysis is treating the response appropriately - i.e. can lmer
>>>>> do this?
>>>>>
>>>>>
>>>> As far as I know, you can specify the response as a proportion, in
>>>> which case the binomial counts would be given via the weights
>>>> argument -- at least that's how it's done in glm(). An alternative
>>>> that should be equivalent is to specify a two-column matrix with
>>>> counts of "successes" and "failures" as the response. Simply giving
>>>> the proportion of successes without the counts wouldn't be appropriate.
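
The equivalence of the two glm() specifications can be checked directly. This is a sketch on simulated counts; the variable names and parameter values are invented for the example, and it uses glm() rather than lmer.

```r
# Sketch: proportion + weights versus a two-column (successes, failures) response.
set.seed(42)
n_trials  <- sample(8:200, 30, replace = TRUE)   # binomial denominators
x         <- rnorm(30)
successes <- rbinom(30, n_trials, plogis(-0.3 + 0.8 * x))
failures  <- n_trials - successes
prop      <- successes / n_trials

# Form 1: proportion as response, denominators supplied as weights
fit_w <- glm(prop ~ x, family = binomial, weights = n_trials)

# Form 2: two-column matrix of successes and failures
fit_m <- glm(cbind(successes, failures) ~ x, family = binomial)

same_coef <- all.equal(coef(fit_w), coef(fit_m), tolerance = 1e-6)
print(same_coef)
```

Both forms need the denominators, which is why a bare ratio with no underlying counts does not fit the binomial framework.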
>>
>>>>> I have used both family=binomial and quasibinomial - is one more
>>>>> appropriate when the response is a proportion? The coefficient
>>>>> estimates are identical, but the standard errors are larger with
>>>>> family=binomial.
>>>>>
>>>>>
>>>> The difference is that in the binomial family the dispersion is fixed
>>>> to 1, while in the quasibinomial family it is estimated as a free
>>>> parameter. If the standard errors are larger with family=binomial,
>>>> then that suggests that the data are underdispersed (relative to the
>>>> binomial); if the difference is substantial -- the factor is just the
>>>> square root of the estimated dispersion -- then the binomial model is
>>>> probably not appropriate for the data.
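
The square-root relationship can be seen in an ordinary GLM. This is a sketch on simulated data, deliberately overdispersed, so here the quasibinomial standard errors come out larger (the reverse of Cameron's case). All names and values are illustrative, and as noted elsewhere in the thread the same reasoning need not carry over to a GLMM.

```r
# Sketch: binomial vs quasibinomial standard errors in glm().
set.seed(7)
n_trials  <- rep(50, 40)
x         <- rnorm(40)
# Extra observation-level noise on the logit scale induces overdispersion
eta       <- -0.2 + 0.6 * x + rnorm(40, sd = 0.7)
successes <- rbinom(40, n_trials, plogis(eta))
resp      <- cbind(successes, n_trials - successes)

fit_b <- glm(resp ~ x, family = binomial)        # dispersion fixed at 1
fit_q <- glm(resp ~ x, family = quasibinomial)   # dispersion estimated

disp <- summary(fit_q)$dispersion
se_b <- coef(summary(fit_b))[, "Std. Error"]
se_q <- coef(summary(fit_q))[, "Std. Error"]

print(disp)
print(se_q / se_b)    # each ratio equals sqrt(disp)
print(sqrt(disp))
```

The coefficient estimates are the same in both fits; only the scaling of the standard errors changes.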
>>>>
>>> John's last deduction is appropriate to a GLM, but not necessarily to
>>> a GLMM. I don't have detailed experience with lmer for binomial, but I
>>> do for various other fitting routines for GLMM. Remember there are at
>>> least two sources of randomness in a GLMM, and let us keep it simple
>>> and have just a subject effect and a measurement error. Then if
>>> over-dispersion is happening within subjects, forcing the binomial
>>> dispersion (at the measurement level) to 1 tends to increase the
>>> estimate of the subject-level variance component to compensate, and in
>>> turn increase some of the standard errors.
>>>
>>> (Please note the 'tends' in that para, as the details of the design do
>>> matter. For cognoscenti, think about plot and sub-plot treatments in
>>> a split-plot design.)
>>>
>
>
>
--
Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
Centre for Resource and Environmental Studies
The Australian National University
Canberra ACT 0200
Australia
T: +61 2 6125 7800 email: Simon.Blomberg_at_anu.edu.au
F: +61 2 6125 0757
CRICOS Provider # 00120C
The combination of some data and an aching desire for
an answer does not ensure that a reasonable answer
can be extracted from a given body of data.
- John Tukey.