[R] logistic regression or not?
peter dalgaard
pdalgd at gmail.com
Tue Dec 21 17:33:14 CET 2010
On Dec 21, 2010, at 14:22, S Ellison wrote:
> A possible caveat here.
>
> Traditionally, logistic regression was performed on the
> logit-transformed proportions, with the standard errors based on the
> residuals for the resulting linear fit. This accommodates overdispersion
> naturally, but without telling you that you have any.
>
> glm with a binomial family does not allow for overdispersion unless
> you use the quasibinomial family. If you have overdispersion, standard
> errors from glm will be unrealistically small. Make sure your model fits
> in glm before you believe the standard errors, or use the quasibinomial
> family.
...and before you believe in overdispersion, make sure you have a credible explanation for it. All too often, what you really have is a model that doesn't fit your data properly.
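For what it's worth, a quick way to see whether overdispersion is even plausible is to compare the residual deviance to its degrees of freedom. The sketch below reuses the poster's column names (treatment, positive, total) but simulates a stand-in data frame, since test.txt is not attached here:

```r
## Stand-in for the poster's test.txt (not available here):
## two treatment groups, 20 trials per unit.
set.seed(1)
test <- data.frame(
  treatment = rep(c("A", "B"), each = 10),
  total     = rep(20, 20)
)
test$positive <- rbinom(20, size = test$total,
                        prob = ifelse(test$treatment == "A", 0.3, 0.5))

fit <- glm(cbind(positive, total - positive) ~ treatment,
           data = test, family = binomial)

## Rough overdispersion check: residual deviance should be comparable
## to its degrees of freedom if the binomial model fits.
ratio <- deviance(fit) / df.residual(fit)
ratio  # values well above 1 suggest overdispersion -- or a misspecified model

## quasibinomial refits the same model but estimates a dispersion
## parameter, inflating the standard errors accordingly.
qfit <- glm(cbind(positive, total - positive) ~ treatment,
            data = test, family = quasibinomial)
```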
>
> Steve Ellison
> LGC
>
>
>
>>>> Ben Bolker <bbolker at gmail.com> 21/12/2010 13:08:34 >>>
> array chip <arrayprofile <at> yahoo.com> writes:
>
> [snip]
>
>> I can think of analyzing this data using glm() with the attached
>> dataset:
>>
>> test <- read.table('test.txt', sep='\t')
>> fit <- glm(cbind(positive, total-positive) ~ treatment, test,
>>            family=binomial)
>> summary(fit)
>> anova(fit, test='Chisq')
>
>> First, is this still called logistic regression or something else? I
>> thought with logistic regression, the response variable is a binary
>> factor?
>
> Sometimes I've seen it called "binomial regression", or just
> "a binomial generalized linear model".
>
>> Second, summary(fit) and anova(fit, test='Chisq') gave me different
>> p-values. Why is that? Which one should I use?
>
> summary(fit) gives you p-values from a Wald test.
> anova() gives you tests based on the Likelihood Ratio Test.
> In general the LRT is more accurate.
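The difference can be seen directly by extracting both p-values for the treatment term. This sketch simulates a stand-in data frame with the poster's column names, since test.txt is not available here:

```r
## Stand-in for the poster's test.txt (not available here).
set.seed(1)
test <- data.frame(
  treatment = rep(c("A", "B"), each = 10),
  total     = rep(20, 20)
)
test$positive <- rbinom(20, size = test$total,
                        prob = ifelse(test$treatment == "A", 0.3, 0.5))

fit <- glm(cbind(positive, total - positive) ~ treatment,
           data = test, family = binomial)

## Wald p-value for the treatment coefficient, from summary():
p_wald <- coef(summary(fit))["treatmentB", "Pr(>|z|)"]

## Likelihood ratio test p-value for the same term, from anova():
p_lrt <- anova(fit, test = "Chisq")["treatment", "Pr(>Chi)"]

c(wald = p_wald, lrt = p_lrt)  # close, but generally not identical
```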
>
>> Third, is there an equivalent model where I can use the variable
>> "percentage" instead of "positive" & "total"?
>
> glm(percentage ~ treatment, weights = total, data = test, family = binomial)
>
> is equivalent to the model you fitted above, provided "percentage" is
> positive/total.
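The equivalence is easy to verify: the proportion-plus-weights form gives the same fitted coefficients as the two-column (successes, failures) form. A sketch with simulated stand-in data, since the poster's test.txt is not available here:

```r
## Stand-in for the poster's test.txt (not available here).
set.seed(1)
test <- data.frame(
  treatment = rep(c("A", "B"), each = 10),
  total     = rep(20, 20)
)
test$positive <- rbinom(20, size = test$total,
                        prob = ifelse(test$treatment == "A", 0.3, 0.5))
test$percentage <- test$positive / test$total

## Two-column response form:
fit1 <- glm(cbind(positive, total - positive) ~ treatment,
            data = test, family = binomial)

## Proportion response with binomial denominators as weights:
fit2 <- glm(percentage ~ treatment, weights = total,
            data = test, family = binomial)

all.equal(coef(fit1), coef(fit2))  # TRUE: identical coefficients
```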
>>
>> Finally, what is the best way to analyze this kind of dataset
>> where it's almost the same as ANOVA except that the response
>> variable is a proportion (or success and failure)?
>
> Don't quite know what you mean here. How is the situation "almost
> the same as ANOVA" different from the situation you described above?
> Do you mean when there are multiple factors? or ???
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com