[R] fisher.test - can I use non-integer expected values?

Wed Dec 11 13:33:06 CET 2013

On 11 Dec 2013, at 06:37 , Peter Langfelder <peter.langfelder at gmail.com> wrote:

>> 
>> Expected values are needed to test a null hypothesis against observed
>> counts, but if total observed counts are 20 for 3 categories, then a null
>> hypothesis of a random effect would use expected values = 6.67 in each of
>> the 3 categories (20/3).
>> 
>> Yes, fisher.test is for count data and so is chisq.test, but chisq.test
>> allows 6.67 to be input as expected values in each of 3 categories, while
>> fisher.test does not seem to allow this?
> 
> To the best of my knowledge (which may be limited) you never put
> expected counts as input in Fisher Exact Test, you need to put actual
> observed counts. Fisher test tests the independence of two different
> random variables, each of which has a set of categorical outcomes.
> 
> From what you wrote it appears that you have only one random variable
> that can take 3 different values, and you want a statistical test for
> whether the frequencies are the same. You can use chisq.test for this
> by specifying the probabilities (argument p) and running it as a
> goodness-of-fit test. I am not aware of goodness-of-fit way of using
> fisher.test.

A couple of additional notes: 

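(a) If you think you can feed expected values like 6.67 to chisq.test anywhere, I think you are doing it wrong. chisq.test takes the observed counts (x) and, for a goodness-of-fit test, the null probabilities (p); it works out the expected values itself. Passing expected values in might give you an answer, but not likely a correct one.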
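
For the data in this thread, a minimal sketch of the goodness-of-fit call could look like the following. The names obs and fit are only illustrative; the point is that 6.67 shows up in the output, not in the input.

```r
obs <- c(1, 10, 9)                       # observed counts, not expected values
fit <- chisq.test(obs, p = rep(1/3, 3))  # null hypothesis: equal probabilities
fit$expected                             # derived internally as sum(obs) * p
fit$statistic                            # Pearson X-squared for these counts
fit$p.value                              # asymptotic chi-square p-value
```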

(b) There is an exact test for equidistribution or goodness of fit in general, but that is not what fisher.test does. You can "cheat" and get an approximation by claiming that you are comparing your data to a much larger set of equidistributed data, e.g.:

> fisher.test(cbind(c(1,10,9),c(10000,10000,10000)))

	Fisher's Exact Test for Count Data

data:  cbind(c(1, 10, 9), c(10000, 10000, 10000))
p-value = 0.01465
alternative hypothesis: two.sided

(c) It's not massively hard to generate the 231 configurations of 20 items into 3 groups and use that to calculate the exact p-value directly, as the total probability of all configurations that are no more likely than the observed one:

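# Probability of every table (i, j, 20 - i - j); impossible cells get 0
tab <- outer(0:20, 0:20,
             Vectorize(function(i, j)
               if (i + j <= 20)
                 dmultinom(c(i, j, 20 - i - j), prob = c(1, 1, 1)/3)
               else 0
             ))
# Probability of the observed table
pp <- dmultinom(c(1, 10, 9), prob = c(1, 1, 1)/3)
# p-value: total probability of tables no more likely than the observed one
sum(tab[tab <= pp])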

## [1] 0.01468422
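
If you want this for other counts or other null probabilities, the same enumeration can be wrapped in a function. This is only a sketch: exact_gof3 is a made-up name, not something in base R, and the rel_tol argument is my own guard so that outcomes with equal probability are not split by floating-point rounding.

```r
# Hypothetical helper (not part of base R): exact goodness-of-fit test for
# three categories, ordering outcomes by their multinomial probability.
exact_gof3 <- function(obs, p = rep(1/3, 3), rel_tol = 1e-7) {
  stopifnot(length(obs) == 3, length(p) == 3)
  n <- sum(obs)
  # All (i, j, n - i - j) with non-negative entries
  grid <- expand.grid(i = 0:n, j = 0:n)
  grid <- grid[grid$i + grid$j <= n, ]
  probs <- mapply(function(i, j) dmultinom(c(i, j, n - i - j), prob = p),
                  grid$i, grid$j)
  p_obs <- dmultinom(obs, prob = p)
  # Sum the outcomes that are no more probable than the observed one,
  # treating near-equal probabilities as ties
  sum(probs[probs <= p_obs * (1 + rel_tol)])
}

exact_gof3(c(1, 10, 9))
```

With equal probabilities this should reproduce the 0.01468422 above; with unequal p the function enumerates the same set of tables but weights them differently.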

(d) Another option is to use the simulate.p.value option to chisq.test():

> chisq.test(c(1, 10, 9), simulate=TRUE, B=10000)

	Chi-squared test for given probabilities with simulated p-value (based
	on 10000 replicates)

data:  c(1, 10, 9)
X-squared = 7.3, df = NA, p-value = 0.0252

(The p-values _will_ differ because chi-square critical regions are slightly different from those based on the point probabilities.)
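
To see the two critical regions side by side, one can enumerate all outcomes once and sum their probabilities under each ordering. This is a sketch under the same equal-probability null; the variable names and the small tolerances used to catch ties are my own choices.

```r
obs <- c(1, 10, 9)
n   <- sum(obs)
p0  <- rep(1/3, 3)

# All outcomes (i, j, n - i - j) with non-negative entries
grid   <- expand.grid(i = 0:n, j = 0:n)
grid   <- grid[grid$i + grid$j <= n, ]
counts <- cbind(grid$i, grid$j, n - grid$i - grid$j)

# Point probability and Pearson statistic of every outcome
expected <- n * p0
probs    <- apply(counts, 1, dmultinom, prob = p0)
stats    <- apply(counts, 1, function(x) sum((x - expected)^2 / expected))

prob_obs <- dmultinom(obs, prob = p0)
stat_obs <- sum((obs - expected)^2 / expected)

# Ordering by point probability (as in (c))
p_by_prob  <- sum(probs[probs <= prob_obs * (1 + 1e-7)])
# Ordering by the chi-square statistic (what (d) simulates)
p_by_chisq <- sum(probs[stats >= stat_obs - 1e-7])

c(by_prob = p_by_prob, by_chisq = p_by_chisq)
```

The chi-square ordering should come out at about 0.0254, in line with the simulated 0.0252. The extra mass comes from outcomes such as (2, 6, 12) and (3, 4, 13), which have a larger X-squared than the observed table but also a larger point probability.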

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com