[R] correlation between categorical data

Sat Jan 24 21:08:49 CET 2015

On Jan 23, 2015, at 5:54 PM, JohnDee wrote:

> Heinz Tuechler wrote
>> At 07:40 21.06.2009, J Dougherty wrote:
>> 
>> [...]
>>> There are other ways of regarding the FET.  Since it is precisely what it
>>> says - an exact test - you can argue that you should avoid carrying over any
>>> conclusions drawn about the small population the test was applied to and
>>> employing them in a broader context.  In so far as the test is concerned, the
>>> "sample" data and the contingency table it is arrayed in are the entire
>>> universe.  In that sense, the FET can't be "conservative" or "liberal."  It
>>> isn't actually a hypothesis test and should not be thought of as one or used
>>> in the place of one.
>>>
>>> JDougherty
>> 
>> Could you give some reference supporting this view, which I find
>> surprising? I don't see a necessary connection between an exact test and 
>> the idea that it does not test a hypothesis.
>> 
>> Thanks,
>> Heinz
>> 
> 

> Fisher's Exact Test is a nonparametric "test."  It tests the distribution in
> the contingency table against the total possible arrangements and gives you
> the precise likelihood of that many items being arranged in that manner.

That's not the way I understand the construction of the result. With the marginals held constant, the p-value is a ratio of two counts: the number of permutations of the data that are as extreme or more extreme than the observed table (as measured by the odds ratio), divided by the total number of possible permutations of the data.
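
To make that counting concrete, here is a minimal sketch in Python (standard library only), not a reference implementation. It assumes a 2x2 table and a one-sided alternative, where "more extreme" means a larger top-left count; with the margins fixed, that is the same ordering as a larger odds ratio. The function name and the example table are made up for illustration.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided ("greater") p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d   # row totals, held fixed
    c1 = a + c              # first column total, held fixed
    n = r1 + r2
    hi = min(r1, c1)        # largest top-left count the margins allow

    def ways(k):
        # Number of arrangements that place k of the c1 column-1 units in row 1.
        return comb(r1, k) * comb(r2, c1 - k)

    # Arrangements at least as extreme as the observed table ...
    extreme = sum(ways(k) for k in range(a, hi + 1))
    # ... divided by all arrangements compatible with the margins.
    total = comb(n, c1)
    return extreme / total

p = fisher_one_sided(3, 1, 1, 3)
print(p)
```

For this table the numerator is 16 + 1 = 17 arrangements and the denominator is choose(8, 4) = 70. In R, fisher.test() with alternative = "greater" should report the same one-sided value.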

>  No
> more and no less.  You could argue about the greater population from which
> your sample is drawn, but FET makes no assumptions at all about any greater
> sample universe.

The test is conditional on the margins, so the set of tables sharing those margins is the description of the "universe".
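
As a rough illustration of what conditioning on the margins means, the sketch below (Python, standard library only) lists every 2x2 table compatible with a given set of margins, together with its hypergeometric probability. The function name and the margins used are hypothetical, chosen only to keep the output short.

```python
from math import comb

def reference_set(row_totals, col_totals):
    """Yield (table, probability) for every 2x2 table with the given margins."""
    r1, r2 = row_totals
    c1, c2 = col_totals
    assert r1 + r2 == c1 + c2, "margins must describe the same total"
    n = r1 + r2
    # The top-left count a determines the whole table once the margins are fixed.
    for a in range(max(0, c1 - r2), min(r1, c1) + 1):
        table = ((a, r1 - a), (c1 - a, r2 - c1 + a))
        prob = comb(r1, a) * comb(r2, c1 - a) / comb(n, c1)
        yield table, prob

tables = list(reference_set((4, 4), (4, 4)))
for table, prob in tables:
    print(table, round(prob, 4))
```

With both row totals and both column totals equal to 4, only five tables are possible, and their probabilities sum to 1. That finite set is the whole reference distribution the test works with.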

>  Also, since the "population" being used in FET is strictly
> limited to the members of the contingency table, the results are a subset of
> a finite group of possible results that are relevant to that specific
> arrangement of data.  You are not "estimating" parameters of a parent
> population or making any assumptions about the parent distribution.  You can
> designate a "p" value such as 0.05 as a level of significance, but there is
> no "error" term in the FET result.  Fisher stated that the test DOES assume
> a null hypothesis of independence to a hypergeometric distribution of the
> cell members.  But that creates other issues if you are attempting to use
> the results in conjunction with assumptions about a broader sample universe
> than that in the test.  For instance you have to carry the assumption of a
> hypergeometric distribution over into the land of reality your sample is
> drawn from and you then have to justify that.  
> 

And this is off-topic on R-help.
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/correlation-between-categorical-data-tp888975p4702235.html
> Sent from the R help mailing list archive at Nabble.com.

David Winsemius
Alameda, CA, USA