[R] Question on binomial data
ehud cohen
ehudco.list at gmail.com
Tue Apr 21 23:31:45 CEST 2009
I thought of testing the difference in deviance between the null model
and the fitted model, assuming it is distributed as chi-sq. However,
Faraway writes that if the outcome is binary, the deviance
distribution is far from chisq.
I've done a permutation test:
N <- 5000  # towards the upper limit, as there are only choose(17, 5) = 6,188
           # distinct arrangements of the T/F data I have
for (i in 1:N) {
and the outcome is 0.005, which is close to the t-test result.
I've repeated the same procedure, computing the statistic from the z-value
in summary(l1) each time instead of the deviance, and got a comparable
result.
I think this means that David is right: the Pr(>|z|) in the glm output does
not mean much here. I still don't know what it does mean.
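
For reference, the permutation test described above can be sketched roughly as follows. This is only a sketch under assumptions: the statistic is taken to be the drop in deviance from the null model to the fitted model, the helper name dev_drop and the seed are made up for illustration, and the T/F labels are shuffled at random rather than enumerated.

```r
# Data from the original post
p <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
       TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
w <- c(53, 67, 59, 59, 53, 89, 72, 56, 65, 63, 62, 58, 59, 72, 61, 68, 63)

# Hypothetical helper: drop in deviance between the null and fitted model.
# Warnings are suppressed because some shuffles separate the data perfectly.
dev_drop <- function(y, x) {
  fit <- suppressWarnings(glm(y ~ x, family = binomial))
  fit$null.deviance - fit$deviance
}

set.seed(1)   # arbitrary seed, only for reproducibility
N <- 5000     # number of random shuffles, as in the post

obs_stat  <- dev_drop(p, w)
perm_stat <- replicate(N, dev_drop(sample(p), w))

# Share of shuffles at least as extreme as the observed statistic;
# the small tolerance counts arrangements that tie with the observed one.
perm_p <- mean(perm_stat >= obs_stat - 1e-8)
perm_p
```

Since there are only choose(17, 5) = 6,188 distinct arrangements, one could also enumerate them all with combn() instead of sampling, which would give an exact permutation p-value.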
Regarding your suggestion of using car's Anova:
> Anova(l)
Anova Table (Type II tests)
Response: p
  LR Chisq Df Pr(>Chisq)
w   9.4008  1   0.002169 **
which is identical to:
which seems to be too low, probably because the response is binary.
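
To make the comparison concrete, here is a small sketch of how the likelihood ratio statistic reported by Anova can be computed by hand from the fitted model, next to the Wald p-value from the summary table. The object names (fit, lr_stat, lr_p, wald_p) are chosen for illustration, and the chi-square reference distribution is exactly the assumption that is in doubt for a binary response with 17 observations.

```r
# Data from the original post
p <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
       TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
w <- c(53, 67, 59, 59, 53, 89, 72, 56, 65, 63, 62, 58, 59, 72, 61, 68, 63)

fit <- glm(p ~ w, family = binomial)

# Likelihood ratio test: drop in deviance from the null model, 1 df
lr_stat <- fit$null.deviance - fit$deviance
lr_p    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# Wald test: the Pr(>|z|) column of the coefficient table
wald_p <- coef(summary(fit))["w", "Pr(>|z|)"]

c(LR.stat = lr_stat, LR.p = lr_p, Wald.p = wald_p)
```

Both tests rely on large-sample approximations, so with so few observations it is not surprising that they disagree with each other.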
Would you think the permutation method is appropriate to use in this
case? And could it be extended to a case with several covariates?
On Tue, Apr 21, 2009 at 10:34 PM, <markleeds at verizon.net> wrote:
> hi: i would wait for one of the guRus to say something but my take ( take it
> with a grain of salt ) is that the results
> are not so contradictory. the test of the significance of the coefficient in
> the GLM gives a p-value of 0.06, and the test that the
> means are different gives a p-value of 0.004. a couple of reasons why
> this might not be so contradictory:
> A) the test gives greater significance in the t-test case but it's not
> really testing the same thing. the t-test is only testing that
> the means are different. the glm is testing whether the log odds of passing
> ( rather than failing ) are linearly related to a covariate.
> b) your t-test is a little weird because it's only got a sample of five in
> one of the 2 samples and I'm not clear on whether it's assuming equal
> variances and then pooling ( I think there's a pooled = TRUE option for
> t.test but I don't know the default value ).
> definitely that's not a large sample size regardless of the pooling issue.
> c) when you test the significance in a glm you need to compare the deviance
> of the model to the deviance of the nested null model.
> John Fox's book describes this but I don't think it's the same as looking
> at the significance in the table output of glm. that's
> a wald test and not the same as the deviance comparison ( essentially a
> likelihood ratio test i think ). with small sample sizes, i think the
> differences between these various tests can be large. check out john fox's
> text for a nice description of testing in the generalized linear model
> framework. you can use Anova from his car package to do this.
> hopefully someone else will say something though because i'd be curious to
> see where i'm wrong/right or something new.
> good luck.
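
On point b): in t.test the relevant argument is var.equal rather than pooled, and it defaults to FALSE, so the call in the original post runs Welch's unequal-variance test. A minimal sketch comparing the two variants on the posted data, with the object names welch and pooled chosen for illustration:

```r
# Data from the original post
p <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
       TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
w <- c(53, 67, 59, 59, 53, 89, 72, 56, 65, 63, 62, 58, 59, 72, 61, 68, 63)

# Default: Welch's test, variances not assumed equal (var.equal = FALSE)
welch <- t.test(w[p], w[!p])

# Classical two-sample test with a pooled variance estimate
pooled <- t.test(w[p], w[!p], var.equal = TRUE)

c(welch = welch$p.value, pooled = pooled$p.value)
```

The two groups have quite different spreads here, so the choice between the two variants changes the p-value noticeably.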
> On Apr 21, 2009, ehud cohen <ehudco.list at gmail.com> wrote:
> Hi,
> We have an experiment with pass/fail outcome, and a continuous
> parameter which may contribute to the outcome.
> First, we've analyzed it by:
> p=c(F,T,F,F,F,T,T,T,T,T,T,T,F,T,T,T,T);
> w=c(53,67,59,59,53,89,72,56,65,63,62,58,59,72,61,68,63);
> l<-glm(p~w,family=binomial)
> summary(l)
> Which turned out to be non significant.
> Then, we thought of comparing the parameters of the two groups (passed
> vs. failed)
> t.test(w[which(p)],w[which(!p)],alternative="two.sided")
> which turned highly significant.
> I'd appreciate some insight...
> Thanks, Ehud.
