[R] correlation between categorical data
Jason Morgan
jwm-r-help at skepsi.net
Sat Jun 20 21:05:45 CEST 2009
On 2009.06.19 14:04:59, Michael wrote:
> Hi all,
>
> In a data-frame, I have two columns of data that are categorical.
>
> How do I form some sort of measure of correlation between these two columns?
>
> For numerical data, I just need to regress one to the other, or do
> some pairs plot.
>
> But for categorical data, how do I find and/or visualize correlation
> between the two columns of data?
As Dylan mentioned, using crosstabs may be the easiest way. Also, a
simple correlation between the two variables may be informative. If
both variables are ordinal, you can use Kendall's tau-b (square table)
or tau-c (rectangular table). The former you can calculate with ?cor
(set method="kendall"; cor() needs numeric input, so convert ordered
factors with as.numeric() first). For the latter you may have to hack
something together yourself, though there is code on the Internet to
do this. If the data are nominal, then a simple chi-squared test
(large n, see ?chisq.test) or Fisher's exact test (small n, see
?fisher.test) may be more appropriate. There are rules about which to
use when one variable is ordinal and one is nominal, but I don't have
my notes in front of me. Maybe someone else can provide more
assistance (and correct me if I'm wrong :).
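
A rough sketch of the above in base R, in case it helps. The data
frame, column names (edu, health) and levels are made up for
illustration, and kendall_tau_c() is my own hand-rolled helper using
Stuart's formula, not a function from any package, so please check it
against a reference before relying on it.

```r
# Made-up example data: two ordered categorical columns (assumption).
set.seed(1)
d <- data.frame(
  edu    = factor(sample(c("low", "mid", "high"), 200, replace = TRUE),
                  levels = c("low", "mid", "high"), ordered = TRUE),
  health = factor(sample(c("poor", "fair", "good"), 200, replace = TRUE),
                  levels = c("poor", "fair", "good"), ordered = TRUE)
)

# Crosstab of the two columns, plus a quick visual check.
tab <- table(d$edu, d$health)
print(tab)
mosaicplot(tab, main = "edu vs. health")

# Ordinal case: Kendall's tau via cor(), which needs numeric codes.
tau <- cor(as.numeric(d$edu), as.numeric(d$health), method = "kendall")

# Nominal case: chi-squared test on the full table (large n) ...
chi <- chisq.test(tab)

# ... or Fisher's exact test, here on a small subsample (small n).
small <- d[1:30, ]
fis <- fisher.test(table(small$edu, small$health))

# Hypothetical helper: Stuart's tau-c computed from a contingency table
# whose rows and columns are both in their natural (ordinal) order.
kendall_tau_c <- function(tab) {
  tab <- as.matrix(tab)
  nr <- nrow(tab)
  nc <- ncol(tab)
  conc <- 0  # concordant pairs
  disc <- 0  # discordant pairs
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      below <- seq_len(nr) > i
      conc <- conc + tab[i, j] * sum(tab[below, seq_len(nc) > j])
      disc <- disc + tab[i, j] * sum(tab[below, seq_len(nc) < j])
    }
  }
  n <- sum(tab)
  m <- min(nr, nc)
  2 * m * (conc - disc) / (n^2 * (m - 1))
}

tau_c <- kendall_tau_c(tab)
```

With real data you would replace d with your own data frame and look
at tau, chi$p.value, fis$p.value and tau_c. Treating the columns as
ordered only makes sense if the categories really have an order.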
Cheers,
~Jason
--
Jason W. Morgan
Graduate Student
Department of Political Science
*The Ohio State University*
154 North Oval Mall
Columbus, Ohio 43210