[R] logistic regression or not?

S Ellison S.Ellison at lgc.co.uk
Tue Dec 21 14:22:10 CET 2010


A possible caveat here.

Traditionally, logistic regression was performed on the
logit-transformed proportions, with the standard errors based on the
residuals for the resulting linear fit. This accommodates overdispersion
naturally, but without telling you that you have any.
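
For illustration, the traditional approach might be sketched roughly as below. The data frame is made up for this example (it is not the dataset from the thread), and the sketch assumes no observed proportion is exactly 0 or 1, where the logit is undefined.

```r
# Made-up example data (hypothetical; not the poster's test.txt)
test <- data.frame(
  treatment = factor(rep(c("ctrl", "trt"), each = 4)),
  positive  = c(12, 15, 10, 14, 22, 25, 19, 24),
  total     = rep(40, 8)
)

# Logit-transform the observed proportions: log(p / (1 - p))
test$logit_p <- qlogis(test$positive / test$total)

# Ordinary linear fit on the logit scale. The standard errors come from
# the residual scatter, so extra-binomial variation is absorbed silently.
fit_lm <- lm(logit_p ~ treatment, data = test)
summary(fit_lm)
```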

glm with a binomial family does not allow for overdispersion unless
you use the quasibinomial family. If you have overdispersion, standard
errors from glm will be unrealistically small. Check that your model fits
adequately in glm (for grouped data, a residual deviance roughly in line
with its degrees of freedom) before you believe the standard errors, or
use the quasibinomial family.
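
A minimal sketch of that check, using made-up counts that are deliberately more variable than a binomial model would predict (the data are hypothetical, not the poster's). The Pearson chi-square divided by the residual degrees of freedom is one common rough estimate of the dispersion; values well above 1 suggest overdispersion.

```r
# Made-up, deliberately overdispersed counts (hypothetical data)
test <- data.frame(
  treatment = factor(rep(c("A", "B"), each = 4)),
  positive  = c(5, 18, 9, 25, 20, 35, 12, 40),
  total     = rep(50, 8)
)

fit_b <- glm(cbind(positive, total - positive) ~ treatment,
             data = test, family = binomial)

# Rough dispersion estimate: Pearson chi-square / residual df
phi <- sum(residuals(fit_b, type = "pearson")^2) / df.residual(fit_b)
phi

# Same model with the quasibinomial family: identical coefficients,
# standard errors inflated by sqrt(phi)
fit_q <- glm(cbind(positive, total - positive) ~ treatment,
             data = test, family = quasibinomial)

se_b <- coef(summary(fit_b))[, "Std. Error"]
se_q <- coef(summary(fit_q))[, "Std. Error"]
cbind(binomial = se_b, quasibinomial = se_q, ratio = se_q / se_b)
```

The ratio column should equal sqrt(phi), since quasibinomial only rescales the binomial standard errors by the estimated dispersion.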

Steve Ellison
LGC



>>> Ben Bolker <bbolker at gmail.com> 21/12/2010 13:08:34 >>>
array chip <arrayprofile <at> yahoo.com> writes:

[snip]

> I can think of analyzing this data using glm() with the attached dataset:
>
> test<-read.table('test.txt',sep='\t')
> fit<-glm(cbind(positive,total-positive)~treatment,test,family=binomial)
> summary(fit)
> anova(fit, test='Chisq')
 
> First, is this still called logistic regression or something else? I thought
> with logistic regression, the response variable is a binary factor?

  Sometimes I've seen it called "binomial regression", or just
"a binomial generalized linear model".

> Second, then summary(fit) and anova(fit, test='Chisq') gave me different p
> values, why is that? which one should I use?

  summary(fit) gives you p-values from a Wald test.
  anova() gives you tests based on the Likelihood Ratio Test.
  In general the LRT is more accurate.
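
  As a rough sketch of where the two p-values come from (the counts below are made up for illustration, not the poster's data): the Wald p-value is read from the coefficient table, and the likelihood ratio statistic is the drop in deviance between the null model and the fitted model.

```r
# Made-up example data (hypothetical; not the poster's test.txt)
test <- data.frame(
  treatment = factor(rep(c("ctrl", "trt"), each = 3)),
  positive  = c(3, 5, 4, 9, 8, 11),
  total     = rep(15, 6)
)

fit <- glm(cbind(positive, total - positive) ~ treatment,
           data = test, family = binomial)

# Wald test: estimate / standard error, compared with a standard normal
wald_p <- coef(summary(fit))["treatmenttrt", "Pr(>|z|)"]

# Likelihood ratio test as reported by anova()
lrt   <- anova(fit, test = "Chisq")
lrt_p <- lrt[["Pr(>Chi)"]][2]

# The same LRT by hand: drop in deviance on 1 df (two-level factor)
lr_stat     <- fit$null.deviance - deviance(fit)
lrt_by_hand <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

c(wald = wald_p, lrt = lrt_p, lrt_by_hand = lrt_by_hand)
```

  The two tests agree asymptotically, so differences between them tend to be more noticeable with small samples or large effects.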

> Third, is there an equivalent model where I can use variable "percentage"
> instead of "positive" & "total"?

  glm(percentage~treatment,weights=total,data=test,family=binomial)

 is equivalent to the model you fitted above, provided "percentage" is
 the proportion positive/total (between 0 and 1, not 0 to 100).
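
  A quick way to convince yourself of the equivalence, sketched with made-up counts (hypothetical data; the column name "proportion" is introduced here only for the example):

```r
# Made-up example data (hypothetical; not the poster's test.txt)
test <- data.frame(
  treatment = factor(rep(c("ctrl", "trt"), each = 3)),
  positive  = c(3, 5, 4, 9, 8, 11),
  total     = c(15, 20, 15, 20, 15, 20)
)
test$proportion <- test$positive / test$total  # on the 0-1 scale

# Two-column (successes, failures) response
fit1 <- glm(cbind(positive, total - positive) ~ treatment,
            data = test, family = binomial)

# Proportion response, with the number of trials as prior weights
fit2 <- glm(proportion ~ treatment, weights = total,
            data = test, family = binomial)

all.equal(coef(fit1), coef(fit2))
all.equal(deviance(fit1), deviance(fit2))
```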
>
> Finally, what is the best way to analyze this kind of dataset
> where it's almost the same as ANOVA except that the response variable
> is a proportion (or success and failure)?

  Don't quite know what you mean here.  How is the situation "almost
the same as ANOVA" different from the situation you described above?
Do you mean when there are multiple factors? or ???
