[R] lmer and a response that is a proportion
Cameron Gillies
cgillies at ualberta.ca
Mon Dec 4 05:38:31 CET 2006
Hello Simon and John,
I'm afraid I need to include random effects, both a random intercept and
possibly random coefficients, and it doesn't look like betareg can do that.
John, the data are spread along the range of 0 to 1 with most values closer
to 1, so they do transform well using the logit transformation. I was
trying to avoid that, though, because I was not sure what impact the
transformation would have on the random effects or the interpretation of the
coefficients.
Thanks again!
Cam
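
The concern about interpreting coefficients after a logit transformation can be made concrete with a small sketch. Everything below is an illustrative assumption rather than anything from the thread: the data are simulated, and the names (d, bird, x, prop) are hypothetical. The mixed-model step only runs if lme4 happens to be installed.

```r
# Hypothetical simulated data: proportions strictly inside (0, 1), grouped by bird.
set.seed(1)
n_birds <- 10
n_obs <- 30
bird <- factor(rep(seq_len(n_birds), each = n_obs))
x <- rnorm(n_birds * n_obs)
bird_eff <- rnorm(n_birds, sd = 0.5)[as.integer(bird)]
eta <- 1.5 + 0.4 * x + bird_eff + rnorm(n_birds * n_obs, sd = 0.7)
d <- data.frame(bird = bird, x = x, prop = plogis(eta))

# Logit-transform the response; qlogis(p) is log(p / (1 - p)).
d$logit_prop <- qlogis(d$prop)

# Fixed-effects-only fit in base R, to show the scale of the coefficients.
fit_lm <- lm(logit_prop ~ x, data = d)

# If lme4 is available, add a random intercept per bird.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_lmm <- lme4::lmer(logit_prop ~ x + (1 | bird), data = d)
  print(lme4::fixef(fit_lmm))
}

# Coefficients and predictions are on the logit scale;
# plogis() maps predictions back to the (0, 1) scale.
pred_logit <- predict(fit_lm, newdata = data.frame(x = c(0, 1)))
pred_prop <- plogis(pred_logit)
```

With this approach the random-effect variance and the slopes are both expressed in logit units. A back-transformed prediction is not the mean proportion; with roughly symmetric errors on the logit scale it is closer to a median.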
On 12/3/06 7:46 PM, "Simon Blomberg" <blomsp at ozemail.com.au> wrote:
> Would beta regression solve your problem? (package betareg)
>
> Simon.
>
> John Fox wrote:
>> Dear Cameron,
>>
>> Given your description, I thought that this might be the case.
>>
>> I'd first examine the distribution of the response variable to see what it
>> looks like. If the values don't push the boundaries of 0 and 1, and their
>> distribution is unimodal and reasonably symmetric, I'd consider analyzing
>> them directly using normally distributed errors. If the values do stack up
>> near 0, 1, or both, I'd consider a transformation, or perhaps a different
>> family (depending on the pattern); in particular, if they stack up near both
>> 0 and 1, a logit or similar transformation could help. Finally, if you have
>> many values of 0, 1, or both, then a transformation isn't promising (and,
>> indeed, the logit wouldn't be defined for these values). In any event, I'd
>> check diagnostics after a preliminary fit.
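
The first check described above (how the values sit relative to 0 and 1) might look roughly like this in base R. The vector prop is hypothetical simulated data with a few exact boundary values appended on purpose; it is not the data discussed in the thread.

```r
# Hypothetical proportions: mostly near 1, plus a few exact 0s and 1s.
set.seed(4)
prop <- c(rbeta(200, 5, 1.5), 0, 1, 1)

# Numerical summary and a coarse text histogram of the response.
print(summary(prop))
bins <- table(cut(prop, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE))
print(bins)

# Count exact boundary values; the logit is not finite for these.
n_boundary <- sum(prop == 0 | prop == 1)
boundary_logits <- qlogis(c(0, 1))
print(n_boundary)
print(boundary_logits)
```

If n_boundary is more than a handful, a plain logit transformation is not usable without some adjustment, which matches the caveat in the message.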
>>
>> I hope this helps,
>> John
>>
>> --------------------------------
>> John Fox
>> Department of Sociology
>> McMaster University
>> Hamilton, Ontario
>> Canada L8S 4M4
>> 905-525-9140x23604
>> http://socserv.mcmaster.ca/jfox
>> --------------------------------
>>
>>
>>> -----Original Message-----
>>> From: Cameron Gillies [mailto:cgillies at ualberta.ca]
>>> Sent: Sunday, December 03, 2006 6:31 PM
>>> To: Prof Brian Ripley; John Fox
>>> Cc: r-help at stat.math.ethz.ch
>>> Subject: Re: [R] lmer and a response that is a proportion
>>>
>>> Dear Brian and John,
>>>
>>> Thanks for your insight. I'll clarify a couple of things
>>> in case it changes your advice.
>>>
>>> My response is a ratio of two measures taken during a bird's
>>> path, which varies from 0 to 1, so I cannot convert it into
>>> columns of the number of successes. It has to be reported as
>>> the proportion. I could logit transform it to make it
>>> normal, but I am trying to avoid that so I can analyze it directly.
>>>
>>> The subjects are individual birds and I have a range of
>>> sample sizes from each bird (from 8 to >200, average of about
>>> 75 measurements/bird).
>>>
>>> Thanks!
>>> Cam
>>>
>>>
>>> On 12/3/06 3:47 PM, "Prof Brian Ripley" <ripley at stats.ox.ac.uk> wrote:
>>>
>>>
>>>> On Sun, 3 Dec 2006, John Fox wrote:
>>>>
>>>>
>>>>> Dear Cameron,
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: r-help-bounces at stat.math.ethz.ch
>>>>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Cameron
>>>>>> Gillies
>>>>>> Sent: Sunday, December 03, 2006 1:58 PM
>>>>>> To: r-help at stat.math.ethz.ch
>>>>>> Subject: [R] lmer and a response that is a proportion
>>>>>>
>>>>>> Greetings all,
>>>>>>
>>>>>> I am using lmer (lme4 package) to analyze data where the response is
>>>>>> a proportion (0 to 1). It appears to work, but I am wondering if
>>>>>> the analysis is treating the response appropriately - i.e. can lmer
>>>>>> do this?
>>>>>>
>>>>>>
>>>>> As far as I know, you can specify the response as a proportion, in
>>>>> which case the binomial counts would be given via the weights
>>>>> argument -- at least that's how it's done in glm(). An alternative
>>>>> that should be equivalent is to specify a two-column matrix with
>>>>> counts of "successes" and "failures" as the response. Simply giving
>>>>> the proportion of successes without the counts wouldn't be
>>>>> appropriate.
>>>>>
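
A minimal sketch of the two glm() specifications described above, using simulated counts. The variable names (succ, trials, prop) and all numbers are assumptions for illustration; the thread's own data have no underlying counts, which is exactly why this route was not open to the original poster.

```r
# Hypothetical binomial data with a known number of trials per observation.
set.seed(2)
n <- 40
x <- rnorm(n)
trials <- sample(5:30, n, replace = TRUE)
succ <- rbinom(n, size = trials, prob = plogis(-0.3 + 0.8 * x))
prop <- succ / trials

# Specification 1: proportion as response, binomial counts via weights.
fit_w <- glm(prop ~ x, family = binomial, weights = trials)

# Specification 2: two-column matrix of successes and failures.
fit_m <- glm(cbind(succ, trials - succ) ~ x, family = binomial)

# The two fits should agree on the coefficient estimates.
same_coef <- isTRUE(all.equal(coef(fit_w), coef(fit_m), tolerance = 1e-6))
print(same_coef)
```

Dropping the weights argument from the first call would treat every proportion as if it came from a single trial, which is the inappropriate case mentioned in the message.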
>>>>>> I have used both family=binomial and quasibinomial - is one more
>>>>>> appropriate when the response is a proportion? The coefficient
>>>>>> estimates are identical, but the standard errors are larger with
>>>>>> family=binomial.
>>>>>>
>>>>>>
>>>>> The difference is that in the binomial family the dispersion is fixed
>>>>> to 1, while in the quasibinomial family it is estimated as a free
>>>>> parameter. If the standard errors are larger with family=binomial,
>>>>> then that suggests that the data are underdispersed (relative to the
>>>>> binomial); if the difference is substantial -- the factor is just the
>>>>> square root of the estimated dispersion -- then the binomial model is
>>>>> probably not appropriate for the data.
>>>>>
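
The square-root relationship mentioned above can be checked directly for an ordinary GLM. This is only a sketch on simulated data with hypothetical names; it uses glm(), not lmer, and so it does not address the GLMM case raised later in the thread.

```r
# Hypothetical binomial data, 20 trials per observation.
set.seed(3)
n <- 60
x <- rnorm(n)
trials <- rep(20, n)
succ <- rbinom(n, size = trials, prob = plogis(0.2 + 0.5 * x))

# Same linear predictor, dispersion fixed at 1 versus estimated.
fit_b <- glm(cbind(succ, trials - succ) ~ x, family = binomial)
fit_q <- glm(cbind(succ, trials - succ) ~ x, family = quasibinomial)

# Estimated dispersion from the quasibinomial fit.
phi <- summary(fit_q)$dispersion

se_b <- coef(summary(fit_b))[, "Std. Error"]
se_q <- coef(summary(fit_q))[, "Std. Error"]

# Point estimates agree; standard errors differ by a factor of sqrt(phi).
same_est <- isTRUE(all.equal(coef(fit_b), coef(fit_q), tolerance = 1e-6))
se_scaled <- isTRUE(all.equal(se_q, se_b * sqrt(phi), tolerance = 1e-6))
print(c(phi = phi, same_est = same_est, se_scaled = se_scaled))
```

An estimated phi below 1 makes the quasibinomial standard errors smaller than the binomial ones, which is the pattern reported in the original question.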
>>>> John's last deduction is appropriate to a GLM, but not necessarily to
>>>> a GLMM. I don't have detailed experience with lmer for binomial, but I
>>>> do for various other fitting routines for GLMM. Remember there are at
>>>> least two sources of randomness in a GLMM, and let us keep it simple
>>>> and have just a subject effect and a measurement error. Then if
>>>> over-dispersion is happening within subjects, forcing the binomial
>>>> dispersion (at the measurement level) to 1 tends to increase the
>>>> estimate of the subject-level variance component to compensate, and in
>>>> turn increase some of the standard errors.
>>>>
>>>> (Please note the 'tends' in that para, as the details of the design do
>>>> matter. For cognoscenti, think about plot and sub-plot treatments in
>>>> a split-plot design.)
>>>>
>>