[R] testing independence of categorical variables

Petr PIKAL petr.pikal at precheza.cz
Fri Dec 7 08:46:20 CET 2007


Hi

Well, R does exactly what its help page says. From ?chisq.test:

"Otherwise, x and y must be vectors or factors of the same length"

I do not know SAS but I presume that

> tables bloodtype*state

gives you something like

tab <- table(bloodtype, state)

and

chisq.test(tab)

should give you the expected result. You can also call 
chisq.test(bloodtype, state) directly. What you cannot do is test vectors 
of unequal **lengths**, and that is what he did. I believe you cannot do 
that in SAS either.
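
A minimal sketch of the table-then-test route, assuming the counts from the SAS datalines quoted further down in this message. The data frame name `blood` is made up for illustration. Because the data come as one row per cell with a count column, xtabs() is used here in place of table() to build the contingency table.

```r
# Hypothetical data frame holding the cell counts from the quoted SAS datalines
blood <- data.frame(
  bloodtype = rep(c("A", "B", "AB", "O"), times = 3),
  state     = rep(c("FL", "IA", "MO"), each = 4),
  count     = c(122, 117, 19, 244,
                1781, 351, 289, 3301,
                353, 269, 60, 713)
)

# Sum the counts into a bloodtype x state contingency table
tab <- xtabs(count ~ bloodtype + state, data = blood)
tab

# Pearson chi-squared test of independence on the 4 x 3 table
res <- chisq.test(tab)
res
res$expected   # expected cell counts under independence
```

The `count ~ bloodtype + state` formula plays roughly the role of the `weight count;` statement in the SAS code.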
 
> x <- sample(letters[1:3], 10, replace=TRUE)
> x
 [1] "c" "a" "c" "c" "a" "c" "a" "c" "a" "a"
> y <- sample(1:5, 20, replace=TRUE)
> y
 [1] 2 5 1 1 2 5 2 3 1 5 5 5 1 5 5 3 2 2 5 1
> chisq.test(x,y)
Error in chisq.test(x, y) : 'x' and 'y' must have the same length
> x <- sample(letters[1:3], 20, replace=TRUE)

> chisq.test(x,y)

        Pearson's Chi-squared test

data:  x and y 
X-squared = 4.7937, df = 6, p-value = 0.5705

Warning message:
In chisq.test(x, y) : Chi-squared approximation may be incorrect
>
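
To make the distinction explicit, here is a small sketch separating the number of observations (length) from the number of categories (levels). The seed is arbitrary and only there to make the sketch repeatable. The explicit `levels` arguments are an assumption added so the table keeps its 3 x 5 shape even if some category is never drawn.

```r
set.seed(1)  # arbitrary seed, for repeatability only

# Two factors with the same length but a different number of levels
x <- factor(sample(letters[1:3], 20, replace = TRUE), levels = letters[1:3])
y <- factor(sample(1:5, 20, replace = TRUE), levels = 1:5)

length(x) == length(y)   # must be TRUE: one (x, y) pair per observation
nlevels(x)               # 3 categories
nlevels(y)               # 5 categories; this need not match nlevels(x)

# The cross-tabulation has one row per level of x and one column per level of y
tab <- table(x, y)
dim(tab)
```

So the requirement is on the lengths, that is, every observation needs a value for both variables. The numbers of levels only determine the shape of the table.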

Regards
Petr


r-help-bounces at r-project.org wrote on 06.12.2007 23:09:24:

> 
> The chi-square test does not require your two categorical variables to
> have the same number of levels, nor does it limit the number of levels.
> 
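> The chi-square procedure is as follows:
> \chi^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
> 
> Expected cell count: E_{ij} = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i C_j}{n},
> where O_{ij} is the observed count, R_i the i-th row total, C_j the j-th
> column total, and n the grand total
> 
> Degrees of freedom: df = (rows - 1)(columns - 1)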
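
The quoted procedure can be checked by hand in R. This is a sketch only, using the counts from the SAS datalines in this message. The variable names (`counts`, `expected`, `chi2`) are chosen for illustration.

```r
# Counts from the quoted SAS datalines, one column per state
counts <- matrix(
  c(122, 117, 19, 244,      # FL: A, B, AB, O
    1781, 351, 289, 3301,   # IA
    353, 269, 60, 713),     # MO
  nrow = 4,
  dimnames = list(bloodtype = c("A", "B", "AB", "O"),
                  state     = c("FL", "IA", "MO"))
)

n <- sum(counts)

# Expected count per cell: (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / n

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected
chi2 <- sum((counts - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail p-value from the chi-squared distribution
p <- pchisq(chi2, df, lower.tail = FALSE)

# Compare with the built-in test
ref <- chisq.test(counts)
all.equal(chi2, unname(ref$statistic))
```

For tables larger than 2 x 2, chisq.test() applies no continuity correction, so the hand-computed statistic should agree with the built-in one.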
> 
> This should not give you any errors if your calculations are all
> correct. I usually use SAS for calculations like this. Below is some
> sample code I wrote to test whether US state and blood type are
> independent. You can modify it for your data, and it should give you
> no error.
> 
> data bloodtype;
> input bloodtype$ state$ count@@;
> datalines;
> A FL 122 B FL 117
> AB FL 19 O FL 244
> A IA 1781 B IA 351
> AB IA 289 O IA 3301
> A MO 353 B MO 269
> AB MO 60 O MO 713
> ;
> proc freq data=bloodtype;
> tables bloodtype*state
> / cellchi2 chisq expected norow nocol nopercent;
> weight count;
> quit;
> 
> 
> Best
> Ramin
> Gainesville
> 
> 
> 
> Shoaaib Mehmood wrote:
> > 
> > hi,
> > 
> > is there a way of measuring dependence between two
> > categorical variables? i tried using the chi-square test to test for
> > independence but i got an error saying that the lengths of the two
> > vectors don't match. Suppose X and Y are two factors. X has 5 levels
> > and Y has 7 levels. This is what i tried doing
> > 
> >>temp<-chisq.test(x,y)
> > 
> > but got error "the lengths of the two vectors don't match". any help
> > will be appreciated
> > -- 
> > Regards,
> > Rana Shoaaib Mehmood
> > 
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> > 
> > 
> 


