[R] Question on binomial data

David Winsemius dwinsemius at comcast.net
Wed Apr 22 01:04:44 CEST 2009


Surely Faraway does not suggest using the Wald statistic in preference  
to the deviance?

Even if the distribution of deviance is not exactly chi-square, it
appears generally accepted that comparing the difference in
deviance to a chi-square distribution is better than using the ratio of
beta to se(beta), which is what that "Pr(>|z|)" number is based on.
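A minimal R sketch of the two tests side by side, assuming only base R (stats) and the data from the original post quoted further down. The "Pr(>|z|)" column is the Wald test, z = beta/se(beta) referred to a standard normal. The deviance comparison refers the drop in deviance from the intercept-only model to a chi-square distribution. Names such as z_wald and p_lr are only illustrative.

```r
# Data from the original post: p = pass/fail outcome, w = continuous covariate
p <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
       TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
w <- c(53, 67, 59, 59, 53, 89, 72, 56, 65, 63, 62, 58, 59, 72, 61, 68, 63)
l <- glm(p ~ w, family = binomial)

# Wald test: z = beta / se(beta) with a two-sided normal p-value.
# This reproduces the "Pr(>|z|)" entry for w in summary(l).
cf <- coef(summary(l))
z_wald <- cf["w", "Estimate"] / cf["w", "Std. Error"]
p_wald <- 2 * pnorm(-abs(z_wald))

# Likelihood-ratio test: drop in deviance from the null (intercept-only)
# model, referred to a chi-square with df = number of added parameters (1).
lr_stat <- l$null.deviance - l$deviance
p_lr <- pchisq(lr_stat, df = l$df.null - l$df.residual, lower.tail = FALSE)

print(c(wald = p_wald, lr = p_lr))
```

On this data the thread reports roughly 0.06 for the Wald p-value and 0.002169 for the likelihood-ratio test, so with 17 observations the two can disagree noticeably.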

Your permutation results look sensible and could conceivably be  
considered the gold standard.

-- 
David


On Apr 21, 2009, at 5:31 PM, ehud cohen wrote:

> I thought of testing the difference in deviance between the null model
> and the fitted model, assuming it is distributed as chi-square. However,
> Faraway writes that if the outcome is binary, the deviance
> distribution is far from chi-square.
> I've done a permutation test:
>
> N <- 5000  # towards the upper limit, as there are only
>            # choose(17, 5) = 6188 combinations of the T/F data I have
> dev <- rep(0, N)
> for (i in 1:N) {
>   l1 <- glm(sample(p) ~ w, family = binomial)
>   dev[i] <- deviance(l1)
> }
> print(mean(dev < deviance(l)))
>
> and the outcome is 0.005, which is close to the t-test result.
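Because there are only choose(17, 5) = 6188 ways to place the five failures, the random sampling above could in principle be replaced by full enumeration. Below is a sketch of that exact version in base R, assuming the same p and w as in the original post. It uses combn() to list every set of failure positions, refits the glm for each, and counts arrangements whose deviance is no larger than the observed one. It uses <= rather than the < above, so the observed arrangement itself is counted. Some arrangements separate pass from fail perfectly, which makes glm() warn about fitted probabilities of 0 or 1; those warnings are suppressed here.

```r
p <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
       TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
w <- c(53, 67, 59, 59, 53, 89, 72, 56, 65, 63, 62, 58, 59, 72, 61, 68, 63)
l <- glm(p ~ w, family = binomial)

n <- length(p)
n_fail <- sum(!p)
fail_sets <- combn(n, n_fail)  # one column per set of failure positions

# Deviance of the refitted model for every possible arrangement
perm_dev <- apply(fail_sets, 2, function(idx) {
  p_perm <- rep(TRUE, n)
  p_perm[idx] <- FALSE
  # Separated arrangements trigger fitting warnings; deviance is still returned
  suppressWarnings(deviance(glm(p_perm ~ w, family = binomial)))
})

# Exact permutation p-value: share of arrangements fitting at least as well
p_exact <- mean(perm_dev <= deviance(l))
print(p_exact)
```

This refits 6188 models, so it may take a few seconds.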
>
> I've repeated the same procedure using the z-value from summary(l1)
> as the statistic each time instead of the deviance, and got a comparable
> result.
>
> I think it means that David is right: the Pr(>|z|) in the glm output does
> not mean much. I still don't know what it does mean.
>
> Regarding your suggestion of using car's Anova:
>
>> Anova(l)
> Anova Table (Type II tests)
>
> Response: p
>  LR Chisq Df Pr(>Chisq)
> w   9.4008  1   0.002169 **
>
> which is identical to:
>
> pchisq(l$null.deviance - l$deviance, df = 1, lower.tail = FALSE)
>
> which seems to be too low, probably because of the binary
> response.
>
> Do you think the permutation method is appropriate in this
> case? Could it also be extended to a case with several covariates?
>
>
>
> On Tue, Apr 21, 2009 at 10:34 PM,  <markleeds at verizon.net> wrote:
>> hi: i would wait for one of the guRus to say something, but my take
>> (take it with a grain of salt) is that the results are not so
>> contradictory. the test of the significance of the coefficient in
>> the GLM gives 0.06, and the test that the means are different gives
>> a p-value of 0.004. a couple of reasons why this might not be so
>> contradictory:
>>
>> a) the test gives greater significance in the t-test case, but it's not
>> really testing the same thing. the t-test only tests whether the
>> means of w in the two groups (pass and fail) are different. the glm
>> tests whether the log odds of passing are linearly related to the
>> covariate.
>>
>> b) your t-test is a little weird because one of the 2 samples has only
>> five observations, and I'm not clear on whether it's assuming equal
>> variances and then pooling (I think there's a var.equal option for
>> t.test, but I don't know the default value). that's definitely not a
>> large sample size regardless of the pooling issue.
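On the pooling question: t.test() has a var.equal argument whose default is FALSE, which gives the Welch test without pooling. Setting var.equal = TRUE gives the classical pooled-variance test. A short sketch comparing the two on the posted data (base R only):

```r
p <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
       TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
w <- c(53, 67, 59, 59, 53, 89, 72, 56, 65, 63, 62, 58, 59, 72, 61, 68, 63)

welch  <- t.test(w[p], w[!p])                    # default var.equal = FALSE
pooled <- t.test(w[p], w[!p], var.equal = TRUE)  # pooled variance, df = 12 + 5 - 2

print(c(welch = welch$p.value, pooled = pooled$p.value))
print(welch$method)
```

With groups of 12 and 5, the Welch degrees of freedom can fall well below the pooled 15, so the two p-values may differ.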
>>
>> c) when you test the significance in a glm you need to compare the
>> deviance of the model to the deviance of the nested null model. John
>> Fox's book describes this, but I don't think it's the same as looking
>> at the significance in the table output of glm. that's a wald test,
>> not the same as the deviance comparison (essentially a likelihood
>> ratio test, i think). with small sample sizes, i think the differences
>> between these various tests can be large. check out john fox's text
>> for a nice description of testing in the generalized linear model
>> framework. you can use Anova from his car package to do this.
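The nested-model comparison can also be done in base R without car: fit the intercept-only model explicitly and pass both fits to anova() with test = "Chisq". A sketch, assuming the posted data. For a single covariate, the p-value should agree with the pchisq() calculation on the deviance difference, which the thread reports as identical to car's Anova output.

```r
p <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
       TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
w <- c(53, 67, 59, 59, 53, 89, 72, 56, 65, 63, 62, 58, 59, 72, 61, 68, 63)

l0 <- glm(p ~ 1, family = binomial)  # nested null model (intercept only)
l  <- glm(p ~ w, family = binomial)  # model with the covariate

# Likelihood-ratio (deviance) test of l0 against l
cmp <- anova(l0, l, test = "Chisq")
print(cmp)
```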
>>
>> hopefully someone else will say something though, because i'd be
>> curious to see where i'm wrong/right or something new.
>> good luck.
>>
>> On Apr 21, 2009, ehud cohen <ehudco.list at gmail.com> wrote:
>>
>> Hi,
>>
>> We have an experiment with pass/fail outcome, and a continuous
>> parameter which may contribute to the outcome.
>>
>> First, we've analyzed it by:
>>
>> p <- c(F,T,F,F,F,T,T,T,T,T,T,T,F,T,T,T,T)
>> w <- c(53,67,59,59,53,89,72,56,65,63,62,58,59,72,61,68,63)
>> l <- glm(p ~ w, family = binomial)
>> summary(l)
>>
>> Which turned out to be non-significant.
>>
>> Then, we thought of comparing the values of w between the two groups
>> (passed vs. failed):
>>
>> t.test(w[p], w[!p], alternative = "two.sided")
>>
>> which turned out to be highly significant.
>>
>> I'd appreciate some insight...
>>
>> Thanks, Ehud.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
