# [R] Is it safe? Cochran etc

Frederico Zanqueta Poleto fred-l at poleto.com
Sat Oct 9 20:56:00 CEST 2004

```
Dan,

I don't know the theory behind this "hybrid" option, nor what the
Cochran conditions consist of.

However, I think that even if you suppose the asymptotic distribution is
not very accurate, because one of your cells has a count of only 1, the
association between A and B is strong enough to show up with conservative
methods, such as the Yates continuity correction or the Wald/Neyman test
of the log odds (which usually fails to reject the null hypothesis of no
interaction more often than the Pearson/score test and the likelihood
ratio test, in this order).
Both procedures inflate the p-values, but not enough to change your
conclusion, as you can see from:

> chisq.test(dat,correct=FALSE)

Pearson's Chi-squared test

data:  dat

X-squared = 6.0115, df = 1, p-value = 0.01421

> chisq.test(dat)

Pearson's Chi-squared test with Yates' continuity correction

data:  dat

X-squared = 5.1584, df = 1, p-value = 0.02313

> # Wald test of null log odds
> 1-pchisq( (log(878702/(13714*506))^2)/(1+1/878702+1/13714+1/506) ,1)

[1] 0.03898049

The book "Categorical Data Analysis" by Agresti (2002) has an ample
discussion of tests like these in chapters 1 (basics and one sample)
and 3 (two variables). You may look there if you still have doubts about
these tests.

Sincerely,

--
Frederico Zanqueta Poleto
fred at poleto.com
--
"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem." J. W. Tukey

Dan Bolser wrote:

>Why can't I just use Log odds? Does the standard error of the logs score
>depend on a similar chisq assumption?
>
>
>
>On Sat, 9 Oct 2004, Dan Bolser wrote:
>
>
>
>>I have the following contingency table
>>
>>dat <- matrix(c(1,506,13714,878702),nr=2)
>>
>>And I want to test if there is an association between events
>>
>>A:{a,not(a)} and B:{b,not(b)}
>>
>>       | b   | not(b) |
>>-------+-----+--------+
>>a      |   1 |  13714 |
>>-------+-----+--------+
>>not(a) | 506 | 878702 |
>>-------+-----+--------+
>>
>>I am worried that prop.test and chisq.test are not valid given the low
>>counts and low probabilities associated with 'success' in each category.
>>
>>Is it safe to use them, and what is the alternative? (given that
>>fisher.test can't handle this data... hold the phone...
>>
>>I just found fisher.test can handle this data if the test is one-tailed
>>and not two-tailed.
>>
>>I don't understand the difference between chisq.test, prop.test and
>>fisher.test when the hybrid=1 option is used for the fisher.test.
>>
>>I was using the binomial distribution to test the 'extremity' of the
>>observed data, but now I think I know why that is inappropriate, however,
>>with the binomial (and its approximation) at least I know what I am
>>doing. And I can do it in perl easily...
>>
>>Generally, how should I calculate fisher.test in perl (i.e. what are its
>>principles). When is it safe to approximate fisher to chisq?
>>
>>I cannot get insight into this problem...
>>
>>How come if I do...
>>
>>dat <- matrix(c(50,60,100,100),nr=2)
>>
>>prop.test(dat)$p.value
>>chisq.test(dat)$p.value
>>fisher.test(dat)$p.value
>>
>>I get
>>
>>[1] 0.5173269
>>[1] 0.5173269
>>[1] 0.4771358
>>
>>When I looked at the binomial distribution and the normal approximation
>>thereof with similar counts I never had a p-value difference > 0.004
>>
>>I am so fed up with this problem :(
>>
>>
>>
>
>

```
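
The thread asks what principle fisher.test rests on, and the reply gives a one-line Wald test of the log odds. The following is a minimal sketch of both, not code from either poster. It assumes the one-sided ("less") alternative, since that is the case the original post says fisher.test could handle, and it relies on the standard result that, with the table margins fixed, the top-left count follows a hypergeometric distribution under the null. The variable names (`log_or`, `se`, `wald_p`, `fisher_less`) are made up for illustration.

```r
# 2x2 table from the thread (rows: a, not(a); columns: b, not(b)).
dat <- matrix(c(1, 506, 13714, 878702), nrow = 2)

# Wald test that the log odds ratio is zero.
# The usual large-sample variance of the log odds ratio is the sum of the
# reciprocals of the four cell counts.
log_or <- log(dat[1, 1] * dat[2, 2] / (dat[1, 2] * dat[2, 1]))
se     <- sqrt(sum(1 / dat))
wald_p <- pchisq((log_or / se)^2, df = 1, lower.tail = FALSE)
print(wald_p)  # the thread reports 0.03898049 for the same quantity

# One-sided Fisher exact test from the hypergeometric distribution.
# Think of an urn with m "b" items and n "not(b)" items, from which
# k items (the "a" row) are drawn without replacement.
m <- sum(dat[, 1])  # column total for b
n <- sum(dat[, 2])  # column total for not(b)
k <- sum(dat[1, ])  # row total for a
fisher_less <- phyper(dat[1, 1], m, n, k)  # P(count <= observed count)
print(fisher_less)

# Cross-check against the built-in one-sided test.
print(fisher.test(dat, alternative = "less", conf.int = FALSE)$p.value)
```

Because the exact test only needs hypergeometric tail probabilities, the same calculation can be ported to another language by summing hypergeometric point probabilities, ideally on the log scale to avoid overflow with counts this large.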