[R] lmer and a response that is a proportion

Simon Blomberg blomsp at ozemail.com.au
Mon Dec 4 03:46:26 CET 2006


Would beta regression solve your problem? (package betareg)
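
For what it's worth, here is a minimal sketch of what a beta regression fits, assuming a logit link for the mean and a single constant precision parameter. The data are simulated, and the names (negloglik, x, phi) are made up for illustration. It uses only base R so that it runs without extra packages. As far as I know, betareg(y ~ x, data = d) is the packaged way to fit this kind of model. Note that the sketch ignores the grouping by bird.

```r
set.seed(1)

# Simulated proportions strictly inside (0, 1); all values are hypothetical.
n   <- 500
x   <- runif(n, -1, 1)
mu  <- plogis(0.5 + 1.0 * x)   # mean, linear on the logit scale
phi <- 20                      # precision: larger means less spread
y   <- rbeta(n, mu * phi, (1 - mu) * phi)

# Mean negative log-likelihood of a beta regression with logit link.
# par = c(intercept, slope, log(precision))
negloglik <- function(par, x, y) {
  m <- plogis(par[1] + par[2] * x)
  p <- exp(par[3])
  -mean(dbeta(y, m * p, (1 - m) * p, log = TRUE))
}

fit <- optim(c(0, 0, 0), negloglik, x = x, y = y, method = "BFGS")
est <- c(intercept = fit$par[1], slope = fit$par[2], phi = exp(fit$par[3]))
print(round(est, 2))
```

The estimates should land near the simulated values (0.5, 1, 20). Like the logit transformation, the beta density is of no use for responses that are exactly 0 or 1.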

Simon.

John Fox wrote:
> Dear Cameron,
>
> Given your description, I thought that this might be the case. 
>
> I'd first examine the distribution of the response variable to see what it
> looks like. If the values don't push the boundaries of 0 and 1, and their
> distribution is unimodal and reasonably symmetric, I'd consider analyzing
> them directly using normally distributed errors. If the values do stack up
> near 0, 1, or both, I'd consider a transformation, or perhaps a different
> family (depending on the pattern); in particular, if they stack up near both
> 0 and 1, a logit or similar transformation could help. Finally, if you have
> many values of 0, 1, or both, then a transformation isn't promising (and,
> indeed, the logit wouldn't be defined for these values). In any event, I'd
> check diagnostics after a preliminary fit.
>
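
To make the logit point concrete, here is a small sketch with made-up values (the vector p is hypothetical, not Cameron's data). It counts the values sitting exactly on the boundaries and shows that the logit is infinite there:

```r
# Hypothetical proportions, including values on the boundaries
p <- c(0, 0.02, 0.15, 0.5, 0.85, 0.98, 1)

# How many values sit exactly on 0 or 1?
n_boundary <- sum(p == 0 | p == 1)

# qlogis() is the logit, log(p / (1 - p)); it is infinite at 0 and 1
z <- qlogis(p)
print(z)

# Only the interior values can be carried to the logit scale
interior <- p > 0 & p < 1
z_ok <- z[interior]
```

With real data, hist(p) and summary(p) would be the obvious first look before choosing between the raw scale, a transformation, or a different family.
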
> I hope this helps,
>  John
>
> --------------------------------
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox 
> -------------------------------- 
>
>   
>> -----Original Message-----
>> From: Cameron Gillies [mailto:cgillies at ualberta.ca] 
>> Sent: Sunday, December 03, 2006 6:31 PM
>> To: Prof Brian Ripley; John Fox
>> Cc: r-help at stat.math.ethz.ch
>> Subject: Re: [R] lmer and a response that is a proportion
>>
>> Dear Brian and John,
>>
>> Thanks for your insight.  I'll clarify a couple of things 
>> in case it changes your advice.
>>
>> My response is a ratio of two measures taken during a bird's 
>> path, which varies from 0 to 1, so I cannot convert it to 
>> columns of the number of successes.  It has to be reported as 
>> the proportion.  I could logit transform it to make it 
>> normal, but I am trying to avoid that so I can analyze it directly.
>>
>> The subjects are individual birds and I have a range of 
>> sample sizes from each bird (from 8 to >200, average of about 
>> 75 measurements/bird).
>>
>> Thanks!
>> Cam
>>
>>
>> On 12/3/06 3:47 PM, "Prof Brian Ripley" <ripley at stats.ox.ac.uk> wrote:
>>
>>     
>>> On Sun, 3 Dec 2006, John Fox wrote:
>>>
>>>       
>>>> Dear Cameron,
>>>>
>>>>         
>>>>> -----Original Message-----
>>>>> From: r-help-bounces at stat.math.ethz.ch 
>>>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Cameron 
>>>>> Gillies
>>>>> Sent: Sunday, December 03, 2006 1:58 PM
>>>>> To: r-help at stat.math.ethz.ch
>>>>> Subject: [R] lmer and a response that is a proportion
>>>>>
>>>>> Greetings all,
>>>>>
>>>>> I am using lmer (lme4 package) to analyze data where the response is
>>>>> a proportion (0 to 1).  It appears to work, but I am wondering if
>>>>> the analysis is treating the response appropriately - i.e. can lmer
>>>>> do this?
>>>>>
>>>>>           
>>>> As far as I know, you can specify the response as a proportion, in
>>>> which case the binomial counts would be given via the weights
>>>> argument -- at least that's how it's done in glm(). An alternative
>>>> that should be equivalent is to specify a two-column matrix with
>>>> counts of "successes" and "failures" as the response. Simply giving
>>>> the proportion of successes without the counts wouldn't be
>>>> appropriate.
>>>>
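
The equivalence John describes for glm() can be checked with a short sketch. The data are simulated and every name here (x, n, succ, prop, fit_w, fit_m) is hypothetical. It illustrates a plain glm() only, not lmer:

```r
set.seed(42)

# Hypothetical grouped binomial data: n trials and the number of successes
x    <- seq(-2, 2, length.out = 20)
n    <- rep(c(10, 25), 10)
succ <- rbinom(20, n, plogis(0.3 + 0.8 * x))
prop <- succ / n

# (a) proportion as the response, counts supplied through weights
fit_w <- glm(prop ~ x, family = binomial, weights = n)

# (b) two-column matrix of successes and failures as the response
fit_m <- glm(cbind(succ, n - succ) ~ x, family = binomial)

print(cbind(weights = coef(fit_w), matrix = coef(fit_m)))
```

Both specifications need the counts. A ratio of two continuous measures, as in Cameron's case, has no such counts to supply.
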
>>>>> I have used both family=binomial and quasibinomial - is one more 
>>>>> appropriate when the response is a proportion?  The coefficient 
>>>>> estimates are identical, but the standard errors are larger with 
>>>>> family=binomial.
>>>>>
>>>>>           
>>>> The difference is that in the binomial family the dispersion is fixed
>>>> to 1, while in the quasibinomial family it is estimated as a free
>>>> parameter. If the standard errors are larger with family=binomial,
>>>> then that suggests that the data are underdispersed (relative to the
>>>> binomial); if the difference is substantial -- the factor is just the
>>>> square root of the estimated dispersion -- then the binomial model is
>>>> probably not appropriate for the data.
>>>>
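
The square-root relationship can be seen in an ordinary glm() with a short sketch. The data are simulated and the names (fit_b, fit_q, disp, ratio) are hypothetical. It says nothing about how lmer behaves:

```r
set.seed(7)

# Hypothetical grouped binomial data
x    <- seq(-2, 2, length.out = 30)
n    <- rep(20, 30)
succ <- rbinom(30, n, plogis(-0.2 + 0.6 * x))

fit_b <- glm(cbind(succ, n - succ) ~ x, family = binomial)
fit_q <- glm(cbind(succ, n - succ) ~ x, family = quasibinomial)

# Same point estimates under both families
same_coef <- isTRUE(all.equal(coef(fit_b), coef(fit_q)))

# Standard errors differ by the square root of the estimated dispersion
se_b  <- sqrt(diag(vcov(fit_b)))
se_q  <- sqrt(diag(vcov(fit_q)))
disp  <- summary(fit_q)$dispersion
ratio <- unname(se_q / se_b)

print(c(dispersion = disp, sqrt_dispersion = sqrt(disp), se_ratio = ratio[1]))
```

An estimated dispersion below 1 gives a ratio below 1, which is the pattern Cameron reports (larger standard errors with family=binomial).
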
>>> John's last deduction is appropriate to a GLM, but not necessarily to
>>> a GLMM. I don't have detailed experience with lmer for binomial, but I
>>> do for various other fitting routines for GLMM.  Remember there are at
>>> least two sources of randomness in a GLMM, and let us keep it simple
>>> and have just a subject effect and a measurement error.  Then if
>>> over-dispersion is happening within subjects, forcing the binomial
>>> dispersion (at the measurement level) to 1 tends to increase the
>>> estimate of the subject-level variance component to compensate, and in
>>> turn increase some of the standard errors.
>>>
>>> (Please note the 'tends' in that para, as the details of the design do
>>> matter.  For cognoscenti, think about plot and sub-plot treatments in
>>> a split-plot design.)
>>>
>


-- 
Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat. 
Centre for Resource and Environmental Studies
The Australian National University              
Canberra ACT 0200                               
Australia                                       
T: +61 2 6125 7800 email: Simon.Blomberg_at_anu.edu.au
F: +61 2 6125 0757
CRICOS Provider # 00120C

The combination of some data and an aching desire for 
an answer does not ensure that a reasonable answer 
can be extracted from a given body of data.
- John Tukey.



