[R] Question on binomial data

Tue Apr 21 23:31:45 CEST 2009

I thought of testing the difference in deviance between the null model
and the fitted model, assuming it follows a chi-squared distribution.
However, Faraway writes that if the outcome is binary, the deviance
distribution is far from chi-squared.
So I've done a permutation test:

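N <- 5000  # Towards the upper limit: there are only choose(17, 5) = 6,188
           # distinct arrangements of the T/F data I have.
dev <- numeric(N)
for (i in 1:N) {
	# refit with the outcomes shuffled, breaking any link between p and w
	l1 <- glm(sample(p) ~ w, family = binomial)
	dev[i] <- l1$deviance
}
# share of permuted fits with a lower deviance than the observed fit l
print(mean(dev < l$deviance))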

and the outcome is 0.005, which is close to the t-test result.
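
Since there are only choose(17, 5) = 6,188 distinct arrangements, the
random sampling could in principle be replaced by a full enumeration.
The following is only a sketch under that assumption, not something that
was run for this post: the data are copied from the original message
quoted below, and the names obs_dev, fail_sets, exact_dev and exact_p are
made up for illustration. Warnings are suppressed because some
arrangements separate the groups almost perfectly.

```r
# Data copied from the original post (quoted further down)
p <- c(F,T,F,F,F,T,T,T,T,T,T,T,F,T,T,T,T)
w <- c(53,67,59,59,53,89,72,56,65,63,62,58,59,72,61,68,63)

# Deviance of the model fitted to the observed outcomes
obs_dev <- glm(p ~ w, family = binomial)$deviance

# Every way of choosing which 5 of the 17 trials are the failures;
# one column per arrangement
fail_sets <- combn(length(p), sum(!p))

# Refit the model once per arrangement and keep its deviance
exact_dev <- apply(fail_sets, 2, function(idx) {
  p_perm <- rep(TRUE, length(p))
  p_perm[idx] <- FALSE
  suppressWarnings(glm(p_perm ~ w, family = binomial)$deviance)
})

# Share of arrangements that fit at least as well as the observed one
# (small tolerance so the observed arrangement counts itself)
exact_p <- mean(exact_dev <= obs_dev + 1e-8)
print(exact_p)
```

This uses <= rather than <, so the observed arrangement is counted and
the result can never be exactly zero.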

I repeated this using the z-value from summary(l1) as the test statistic
each time instead of the deviance, and got a comparable result.
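
For reference, the z-value variant might look roughly like the sketch
below. It is an assumption about how such a loop could be written, not
the code that was actually run: the helper z_of, the seed, the number of
permutations and the two-sided comparison are all illustrative choices.

```r
# Data copied from the original post (quoted further down)
p <- c(F,T,F,F,F,T,T,T,T,T,T,T,F,T,T,T,T)
w <- c(53,67,59,59,53,89,72,56,65,63,62,58,59,72,61,68,63)

set.seed(1)  # assumption: fixed seed, only to make the sketch repeatable

# Hypothetical helper: Wald z statistic for the slope of w
z_of <- function(y) {
  fit <- glm(y ~ w, family = binomial)
  coef(summary(fit))["w", "z value"]
}

z_obs <- z_of(p)

# Recompute the z statistic after shuffling the outcomes
N <- 2000
z_perm <- replicate(N, suppressWarnings(z_of(sample(p))))

# Two-sided permutation p-value: how often a shuffled |z| is at least
# as large as the observed |z|
perm_p_z <- mean(abs(z_perm) >= abs(z_obs))
print(perm_p_z)
```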

I think this means that David is right: the Pr(>|z|) in the glm output
does not mean much here. I still don't know what it does mean.

Regarding your suggestion of using car's Anova:

> Anova(l)
Anova Table (Type II tests)

Response: p
  LR Chisq Df Pr(>Chisq)
w   9.4008  1   0.002169 **

which is identical to:

pchisq(l$null.deviance - l$deviance, 1, lower.tail = FALSE)

This p-value seems too low, which is probably due to the binary response.
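
The same likelihood ratio comparison can also be obtained from base R by
fitting the null model explicitly and calling anova() with test = "Chisq".
A small sketch, with l0, lr_stat, p_manual and p_anova as illustrative
names and the data copied from the original message quoted below:

```r
# Data copied from the original post (quoted further down)
p <- c(F,T,F,F,F,T,T,T,T,T,T,T,F,T,T,T,T)
w <- c(53,67,59,59,53,89,72,56,65,63,62,58,59,72,61,68,63)

l  <- glm(p ~ w, family = binomial)  # fitted model
l0 <- glm(p ~ 1, family = binomial)  # intercept-only (null) model

# Likelihood ratio statistic: the drop in deviance from adding w
lr_stat  <- l$null.deviance - l$deviance
p_manual <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# The same test via the analysis-of-deviance table; the p-value sits in
# the last column of the second row
tab     <- anova(l0, l, test = "Chisq")
p_anova <- tab[2, ncol(tab)]

print(c(manual = p_manual, anova = p_anova))
```

Both routes rest on the same chi-squared approximation, so they agree
with each other without saying whether that approximation is good for 17
binary observations.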

Would you think the permutation method is appropriate to use in this
case? Could it also be extended to a case with several covariates?

On Tue, Apr 21, 2009 at 10:34 PM,  <markleeds at verizon.net> wrote:
> hi: i would wait for one of the guRus to say something but my take ( take it
> with a grain of salt ) is that the results
> are not so contradictory. the test of the significance of the coefficient in
> the GLM gives 0.06, and the test that the
> means are different gives a p-value of 0.004.  a couple of reasons why
> this might not be so contradictory:
>
> a) the test gives greater significance in the t-test case but it's not
> really testing the same thing. the t-test is only testing that
> the means are different. the glm is testing whether the log odds of a
> pass ( versus a fail ) are linearly related to
> a covariate.
>
> b) your t-test is a little weird because it's only got a sample of five in
> one of the 2 samples, and it's worth checking whether it assumes equal
> variances and pools ( t.test has a var.equal argument for this; its
> default is FALSE, which gives the unpooled Welch test ).
> definitely that's not a large sample size regardless of the pooling issue.
>
> c) when you test the significance in a glm you need to compare the deviance
> of the model to the deviance of the nested null model.
> John Fox's book describes this but I don't think it's the same as looking
> at the significance in the table output of glm. that's
> a wald test and not the same as the deviance comparison ( essentially a
> likelihood ratio test i think ). with small sample sizes, i think the
> differences between these various tests can be large. check out john fox's
> text for a nice description of testing in the generalized linear model
> framework. you can use Anova from his car package to do this.
>
> hopefully someone else will say something though because i'd be curious to
> see where i'm wrong/right or something new.
> good luck.
>
> On Apr 21, 2009, ehud cohen <ehudco.list at gmail.com> wrote:
>
> Hi,
>
> We have an experiment with pass/fail outcome, and a continuous
> parameter which may contribute to the outcome.
>
> First, we've analyzed it by:
>
> p=c(F,T,F,F,F,T,T,T,T,T,T,T,F,T,T,T,T);
> w=c(53,67,59,59,53,89,72,56,65,63,62,58,59,72,61,68,63);
> l<-glm(p~w,family=binomial)
> summary(l)
>
> Which turned out to be non significant.
>
> Then, we thought of comparing the parameters of the two groups (passed
> vs. failed)
>
> t.test(w[which(p)],w[which(!p)],alternative="two.sided")
>
> which turned highly significant.
>
> I'd appreciate some insight...
>
> Thanks, Ehud.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>