[R] logistic regression or not?
peter dalgaard
pdalgd at gmail.com
Tue Dec 21 17:33:14 CET 2010
On Dec 21, 2010, at 14:22, S Ellison wrote:
> A possible caveat here.
>
> Traditionally, logistic regression was performed on the
> logit-transformed proportions, with the standard errors based on the
> residuals for the resulting linear fit. This accommodates overdispersion
> naturally, but without telling you that you have any.
>
> glm with a binomial family does not allow for overdispersion unless
> you use the quasibinomial family. If you have overdispersion, standard
> errors from glm will be unrealistically small. Make sure your model fits
> in glm before you believe the standard errors, or use the quasibinomial
> family.
...and before you believe in overdispersion, make sure you have a credible explanation for it. All too often, what you really have is a model that doesn't fit your data properly.
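For what it's worth, a quick way to see whether overdispersion is even plausible is to compare the residual deviance to its degrees of freedom. The sketch below reuses the poster's column names (treatment, positive, total) but simulates a stand-in data frame, since test.txt is not attached here:

```r
## Stand-in for the poster's test.txt (not available here):
## two treatment groups, 20 trials per unit.
set.seed(1)
test <- data.frame(
  treatment = rep(c("A", "B"), each = 10),
  total     = rep(20, 20)
)
test$positive <- rbinom(20, size = test$total,
                        prob = ifelse(test$treatment == "A", 0.3, 0.5))

fit <- glm(cbind(positive, total - positive) ~ treatment,
           data = test, family = binomial)

## Rough overdispersion check: residual deviance should be comparable
## to its degrees of freedom if the binomial model fits.
ratio <- deviance(fit) / df.residual(fit)
ratio  # values well above 1 suggest overdispersion -- or a misspecified model

## quasibinomial refits the same model but estimates a dispersion
## parameter, inflating the standard errors accordingly.
qfit <- glm(cbind(positive, total - positive) ~ treatment,
            data = test, family = quasibinomial)
```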
>
> Steve Ellison
> LGC
>
>
>
>>>> Ben Bolker <bbolker at gmail.com> 21/12/2010 13:08:34 >>>
> array chip <arrayprofile <at> yahoo.com> writes:
>
> [snip]
>
>> I can think of analyzing this data using glm() with the attached
>> dataset:
>>
>> test <- read.table('test.txt', sep='\t')
>> fit <- glm(cbind(positive, total-positive) ~ treatment, test,
>>            family=binomial)
>> summary(fit)
>> anova(fit, test='Chisq')
>
>> First, is this still called logistic regression or something else? I
>> thought with logistic regression, the response variable is a binary
>> factor?
>
> Sometimes I've seen it called "binomial regression", or just
> "a binomial generalized linear model".
>
>> Second, summary(fit) and anova(fit, test='Chisq') gave me different
>> p-values. Why is that? Which one should I use?
>
> summary(fit) gives you p-values from a Wald test.
> anova() gives you tests based on the Likelihood Ratio Test.
> In general the LRT is more accurate.
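The difference can be seen directly by extracting both p-values for the treatment term. This sketch simulates a stand-in data frame with the poster's column names, since test.txt is not available here:

```r
## Stand-in for the poster's test.txt (not available here).
set.seed(1)
test <- data.frame(
  treatment = rep(c("A", "B"), each = 10),
  total     = rep(20, 20)
)
test$positive <- rbinom(20, size = test$total,
                        prob = ifelse(test$treatment == "A", 0.3, 0.5))

fit <- glm(cbind(positive, total - positive) ~ treatment,
           data = test, family = binomial)

## Wald p-value for the treatment coefficient, from summary():
p_wald <- coef(summary(fit))["treatmentB", "Pr(>|z|)"]

## Likelihood ratio test p-value for the same term, from anova():
p_lrt <- anova(fit, test = "Chisq")["treatment", "Pr(>Chi)"]

c(wald = p_wald, lrt = p_lrt)  # close, but generally not identical
```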
>
>> Third, is there an equivalent model where I can use the variable
>> "percentage" instead of "positive" & "total"?
>
> glm(percentage ~ treatment, weights = total, data = test, family = binomial)
>
> is equivalent to the model you fitted above, provided "percentage" is
> positive/total.
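The equivalence is easy to verify: the proportion-plus-weights form gives the same fitted coefficients as the two-column (successes, failures) form. A sketch with simulated stand-in data, since the poster's test.txt is not available here:

```r
## Stand-in for the poster's test.txt (not available here).
set.seed(1)
test <- data.frame(
  treatment = rep(c("A", "B"), each = 10),
  total     = rep(20, 20)
)
test$positive <- rbinom(20, size = test$total,
                        prob = ifelse(test$treatment == "A", 0.3, 0.5))
test$percentage <- test$positive / test$total

## Two-column response form:
fit1 <- glm(cbind(positive, total - positive) ~ treatment,
            data = test, family = binomial)

## Proportion response with binomial denominators as weights:
fit2 <- glm(percentage ~ treatment, weights = total,
            data = test, family = binomial)

all.equal(coef(fit1), coef(fit2))  # TRUE: identical coefficients
```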
>>
>> Finally, what is the best way to analyze this kind of dataset
>> where it's almost the same as ANOVA except that the response
>> variable is a proportion (or success and failure)?
>
> Don't quite know what you mean here. How is the situation "almost
> the same as ANOVA" different from the situation you described above?
> Do you mean when there are multiple factors? or ???
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com