[R] logistic regression or not?

Tue Dec 21 18:26:29 CET 2010

On 10-12-21 12:20 PM, array chip wrote:
> Thank you Ben, Steve and Peter.
>  
> Ben, my last question was to see if there are other ways of analyzing
> this type of data where the response variable is a proportion, in
> addition to binomial regression.
>  
> BTW, I also found that the following is an equivalent model, using the
> percentage directly:
>  
> glm(log(percentage/(1-percentage))~treatment,data=test)
>  
> Thanks
>  
> John
> 

  Yes, but this is a different model, not an equivalent one.

  The model you have here uses Gaussian errors (it is in fact an
identical model, although not necessarily quite an identical algorithm
(?), to just using lm()).  It will fail if you have any percentages that
are 0 or 1, because the logit transform is infinite there.  See Stuart's
comment about how things were done in the "old days".
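
  A minimal sketch of both points. The data frame `demo` and its values
are made up for illustration (they are not the poster's data), and
`percentage` is assumed to be a proportion between 0 and 1:

```r
# Hypothetical data: proportions strictly between 0 and 1
demo <- data.frame(
  treatment  = factor(rep(c("A", "B", "C"), each = 3)),
  percentage = c(0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.65, 0.70)
)

# glm() defaults to family = gaussian, so this is least squares
# on the logit-transformed response
fit_glm <- glm(log(percentage / (1 - percentage)) ~ treatment, data = demo)
fit_lm  <- lm(log(percentage / (1 - percentage)) ~ treatment, data = demo)

# The coefficient estimates agree up to numerical tolerance
same_coefs <- isTRUE(all.equal(coef(fit_glm), coef(fit_lm)))

# The logit is infinite at the boundaries, which breaks the fit
logit_at_0 <- log(0 / (1 - 0))   # -Inf
logit_at_1 <- log(1 / (1 - 1))   #  Inf
```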

  Beta regression (see e.g. the betareg package) is another way of
handling analysis of proportions.
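
  A rough sketch of what that might look like, assuming the betareg
package is installed. The data are hypothetical, and note that the
response has to lie strictly between 0 and 1:

```r
# Only runs if betareg is available; the data below are made up
if (requireNamespace("betareg", quietly = TRUE)) {
  demo <- data.frame(
    treatment  = factor(rep(c("A", "B", "C"), each = 3)),
    percentage = c(0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.65, 0.70)
  )

  # Beta regression of the proportion on treatment (default logit link)
  fit_beta <- betareg::betareg(percentage ~ treatment, data = demo)
  print(summary(fit_beta))
}
```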

> 
> ------------------------------------------------------------------------
> *From:* Ben Bolker <bbolker at gmail.com>
> *To:* r-help at stat.math.ethz.ch
> *Sent:* Tue, December 21, 2010 5:08:34 AM
> *Subject:* Re: [R] logistic regression or not?
> 
> array chip <arrayprofile <at> yahoo.com> writes:
> 
> [snip]
> 
>> I can think of analyzing this data using glm() with the attached dataset:
>>
>> test<-read.table('test.txt',sep='\t')
>> fit<-glm(cbind(positive,total-positive)~treatment,test,family=binomial)
>> summary(fit)
>> anova(fit, test='Chisq')
> 
>> First, is this still called logistic regression or something else? I thought
>> with logistic regression, the response variable is a binary factor?
> 
>   Sometimes I've seen it called "binomial regression", or just
> "a binomial generalized linear model".
> 
>> Second, then summary(fit) and anova(fit, test='Chisq') gave me different p
>> values, why is that? which one should I use?
> 
>   summary(fit) gives you p-values from a Wald test.
>   anova() gives you tests based on the Likelihood Ratio Test.
>   In general the LRT is more accurate.
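
  To see the two sets of p-values side by side, here is a small sketch
with made-up counts (the data frame `demo` is hypothetical). Note that
the Wald tests are per coefficient, each comparing one treatment level
with the reference level, while anova() tests the treatment term as a
whole:

```r
# Hypothetical counts: two replicates per treatment
demo <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 2)),
  positive  = c(5, 7, 12, 10, 20, 18),
  total     = c(30, 28, 30, 25, 30, 32)
)

fit <- glm(cbind(positive, total - positive) ~ treatment,
           data = demo, family = binomial)

# Wald z-tests: one p-value per coefficient
wald_p <- coef(summary(fit))[, "Pr(>|z|)"]

# Likelihood ratio test: one p-value for the whole treatment term
# (row 1 of the table is the null model, row 2 is treatment)
lrt   <- anova(fit, test = "Chisq")
lrt_p <- lrt[["Pr(>Chi)"]][2]

print(wald_p)
print(lrt_p)
```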
> 
>> Third, is there an equivalent model where I can use variable "percentage"
>> instead of "positive" & "total"?
> 
>   glm(percentage~treatment,weights=total,data=test,family=binomial)
> 
> is equivalent to the model you fitted above.
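
  A quick check of that equivalence on made-up counts (the data frame
`demo` is hypothetical, and `percentage` is assumed to be the proportion
positive/total, not a value on a 0-100 scale):

```r
# Hypothetical counts: two replicates per treatment
demo <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 2)),
  positive  = c(5, 7, 12, 10, 20, 18),
  total     = c(30, 28, 30, 25, 30, 32)
)
demo$percentage <- demo$positive / demo$total

# Two-column (successes, failures) response
fit_counts <- glm(cbind(positive, total - positive) ~ treatment,
                  data = demo, family = binomial)

# Proportion response with the number of trials as weights
fit_prop <- glm(percentage ~ treatment, weights = total,
                data = demo, family = binomial)

# Coefficients and residual deviance agree up to numerical tolerance
same_coefs    <- isTRUE(all.equal(coef(fit_counts), coef(fit_prop)))
same_deviance <- isTRUE(all.equal(deviance(fit_counts), deviance(fit_prop)))
```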
>>
>> Finally, what is the best way to analyze this kind of dataset
>> where it's almost the same as ANOVA except that the response variable
>>  is a proportion (or success and failure)?
> 
>   Don't quite know what you mean here.  How is the situation "almost
> the same as ANOVA" different from the situation you described above?
> Do you mean when there are multiple factors? or ???
> 