[R] Is it safe? Cochran etc
Kjetil Brinchmann Halvorsen
kjetil at acelerate.com
Sat Oct 9 21:16:58 CEST 2004
Dan Bolser wrote:
>I have the following contingency table
>
>dat <- matrix(c(1,506,13714,878702),nr=2)
>
>And I want to test if their is an association between events
>
>A:{a,not(a)} and B:{b,not(b)}
>
> | b | not(b) |
>--------+-----+--------+
> a | 1 | 13714 |
>--------+-----+--------+
> not(a) | 506 | 878702 |
>--------+-----+--------+
>
>I am worried that prop.test and chisq.test are not valid given the low
>counts and low probabilites associated with 'sucess' in each category.
>
>
> test <- matrix( c(1,506, 13714, 878702), 2,2)
> test
[,1] [,2]
[1,] 1 13714
[2,] 506 878702
> chisq.test(test)
Pearson's Chi-squared test with Yates' continuity correction
data: test
X-squared = 5.1584, df = 1, p-value = 0.02313
> chisq.test(test,sim=TRUE)
Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)
data: test
X-squared = 6.0115, df = NA, p-value = 0.02099
> chisq.test(test,sim=TRUE, B=200000)
# To rule out simulation uncertainty
Pearson's Chi-squared test with simulated p-value (based on 200000
replicates)
data: test
X-squared = 6.0115, df = NA, p-value = 0.01634
> N <- sum(test)
> rows <- rowSums(test)
> cols <- colSums(test)
> E <- rows %o% cols/N
> E
[,1] [,2]
[1,] 7.787351 13707.21
[2,] 499.212649 878708.79
>
None of the expected'eds are lesser than 5, an often used rule of thumb
(which might even be
to conservative. Just to check on the distribution:
rows <- round(rows)
cols <- round(cols)
> pvals <- sapply(r2dtable(100000, rows, cols), function(x)
chisq.test(x)$p.value)
> hist(pvals)
# not very good approximation to uniform histogram, but:
> sum(pvals < 0.05)/100000
[1] 0.03068
> sum(pvals < 0.01)/100000
[1] 0.00669
So the true levels are not very far from the calculated by te chisq
approximation, and it seems safe to use it.
All of this to show that with R you are not anymore dependent on old
"rules os thumb",
you can investigate for yourself.
Kjetil
>Is it safe to use them, and what is the alternative? (given that
>fisher.test can't handle this data... hold the phone...
>
>I just found fisher.test can handle this data if the test is one-tailed
>and not two-tailed.
>
>I don't understand the difference between chisq.test, prop.test and
>fisher.test when the hybrid=1 option is used for the fisher.test.
>
>I was using the binomial distribution to test the 'extremity' of the
>observed data, but now I think I know why that is inapropriate, however,
>with the binomial (and its approximation) at least I know what I am
>doing. And I can do it in perl easily...
>
>Generally, how should I calculate fisher.test in perl (i.e. what are its
>principles). When is it safe to approximate fisher to chisq?
>
>I cannot get insight into this problem...
>
>How come if I do...
>
>dat <- matrix(c(50,60,100,100),nr=2)
>
>prop.test(dat)$p.value
>chisq.test(dat)$p.value
>fisher.test(dat)$p.value
>
>I get
>
>[1] 0.5173269
>[1] 0.5173269
>[1] 0.4771358
>
>When I looked at the binomial distribution and the normal approximation
>thereof with similar counts I never had a p-value difference > 0.004
>
>I am so fed up with this problem :(
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
>
>
>
--
Kjetil Halvorsen.
Peace is the most effective weapon of mass construction.
-- Mahdi Elmandjra
More information about the R-help
mailing list