[R] correlation between categorical data

Jason Morgan jwm-r-help at skepsi.net
Sat Jun 20 21:05:45 CEST 2009


On 2009.06.19 14:04:59, Michael wrote:
> Hi all,
> 
> In a data-frame, I have two columns of data that are categorical.
> 
> How do I form some sort of measure of correlation between these two columns?
> 
> For numerical data, I just need to regress one to the other, or do
> some pairs plot.
> 
> But for categorical data, how do I find and/or visualize correlation
> between the two columns of data?

As Dylan mentioned, a cross-tabulation (see ?table) may be the easiest
place to start. A simple rank correlation between the two variables
may also be informative. If both variables are ordinal, you can use
Kendall's tau-b (square table) or tau-c (rectangular table). The
former you can calculate with ?cor (set method="kendall"; cor() needs
numeric input, so convert ordered factors to their integer codes
first). The latter you may have to hack together yourself, though
there is code on the Internet to do this. If the data are nominal,
then a simple chi-squared test (large n, see ?chisq.test) or Fisher's
exact test (small n, see ?fisher.test) may be more appropriate. There
are rules about which to use when one variable is ordinal and one is
nominal, but I don't have my notes in front of me. Maybe someone else
can provide more assistance (and correct me if I'm wrong :).
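
A minimal sketch of the options above, in case it helps. The counts
are made up, the column names x and y are placeholders, and
kendall_tau_c() is a hypothetical helper written here for
illustration (it is not part of base R). It uses Stuart's formula,
2*m*(C - D) / (n^2 * (m - 1)), with C and D the numbers of concordant
and discordant pairs and m the smaller table dimension.

```r
# Made-up counts for two ordinal variables (rows = x, columns = y).
lev <- c("low", "mid", "high")
counts <- matrix(c(12,  5,  2,
                    5, 10,  5,
                    2,  5, 12),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(x = lev, y = lev))

# Expand to one row per observation, as in a typical data frame.
long <- as.data.frame(as.table(counts))
d <- long[rep(seq_len(nrow(long)), long$Freq), c("x", "y")]
d$x <- factor(d$x, levels = lev, ordered = TRUE)
d$y <- factor(d$y, levels = lev, ordered = TRUE)

# Cross-tabulation of the two columns.
tab <- table(d$x, d$y)
print(tab)
# mosaicplot(tab)  # uncomment for a quick visual check

# Ordinal data: Kendall's tau-b. cor() needs numeric input, so use
# the integer codes of the ordered factors.
tau_b <- cor(as.integer(d$x), as.integer(d$y), method = "kendall")

# Ordinal data, rectangular table: tau-c (hypothetical helper).
kendall_tau_c <- function(tab) {
  tab <- matrix(as.numeric(tab), nrow = nrow(tab))
  nr <- nrow(tab)
  nc <- ncol(tab)
  conc <- 0
  disc <- 0
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      # Cells below and to the right form concordant pairs with (i, j).
      if (i < nr && j < nc)
        conc <- conc + tab[i, j] * sum(tab[(i + 1):nr, (j + 1):nc])
      # Cells below and to the left form discordant pairs with (i, j).
      if (i < nr && j > 1)
        disc <- disc + tab[i, j] * sum(tab[(i + 1):nr, 1:(j - 1)])
    }
  }
  n <- sum(tab)
  m <- min(nr, nc)
  2 * m * (conc - disc) / (n^2 * (m - 1))
}
tau_c <- kendall_tau_c(tab)

# Nominal data: tests of independence on the same table.
chi <- chisq.test(tab)   # large n
fis <- fisher.test(tab)  # small n

print(c(tau_b = tau_b, tau_c = tau_c))
print(c(chisq_p = chi$p.value, fisher_p = fis$p.value))
```

Note that the two tests only tell you whether the columns look
independent; they do not measure how strong the association is, which
is what the tau statistics are for.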

Cheers,

~Jason


-- 
Jason W. Morgan
Graduate Student
Department of Political Science
*The Ohio State University*
154 North Oval Mall
Columbus, Ohio 43210



