[R] How to Get Categorical Correlation Coefficient
Kum-Hoe Hwang
phdhwang at gmail.com
Thu Oct 12 12:08:56 CEST 2006
I made a mistake in my earlier email.
I have corrected the error by dropping "na.omit" from data.frame().
The corrected correlations and output follow:
------------------------------------------------------------------------------
#
> nrow(sdi)
[1] 65613
> print(corridor1[65600:65613])
[1] C C C C F
[6] F F F B B
[11] F F B B
Levels: B C D E A F
> print(corridor2[65600:65613])
[1] 4 4 4 4 2 2 2 2 1 1 2 2 1 1
> summary(corridor1)
    B     C     D     E     A     F
15092 13456  6652  1611  1796 27006
> summary(corridor2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    0.0     1.0     2.0     2.3     3.0     5.0
> summary(as.numeric(as.factor(corridor1))-as.numeric(as.factor(corridor1)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0       0       0       0       0       0
> table(corridor1,corridor2)
         corridor2
corridor1     0     1     2     3     4     5
        B     0 15092     0     0     0     0
        C     0     0     0     0 13456     0
        D     0     0     0  6652     0     0
        E     0     0     0     0     0  1611
        A  1796     0     0     0     0     0
        F     0     0 27006     0     0     0
>
---------------------------------------------------------------------------------------
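Two things stand out in the output above. The summary() call subtracts as.numeric(as.factor(corridor1)) from itself, so its all-zero result says nothing about corridor2. And the table pairs B with 1, C with 4, D with 3, E with 5, A with 0 and F with 2, while the levels of corridor1 are ordered B C D E A F. The following is a minimal sketch on synthetic data (not the data from this thread; the names labels, f1, x2 and y are made up for illustration) of how two one-to-one codings of the same categories can give different values from cor():

```r
# Synthetic data only: six categories, with the factor levels in the
# non-alphabetical order shown by "Levels: B C D E A F".
set.seed(1)
labels <- sample(c("A", "B", "C", "D", "E", "F"), 200, replace = TRUE)
f1 <- factor(labels, levels = c("B", "C", "D", "E", "A", "F"))

# Numeric recoding read off the table(corridor1, corridor2) output.
recode <- c(A = 0, B = 1, F = 2, D = 3, C = 4, E = 5)
x2 <- unname(recode[labels])

# as.numeric() on a factor returns the position of each level in
# levels(f1), i.e. B = 1, C = 2, D = 3, E = 4, A = 5, F = 6.
codes1 <- as.numeric(f1)

# The two codings match one-to-one, but not in the same order.
print(table(codes1, x2))

# A numeric variable that depends on the category (an assumption
# made purely so that there is something to correlate).
y <- x2 + rnorm(200)

r1 <- cor(y, codes1)  # correlation with the level positions
r2 <- cor(y, x2)      # correlation with the recoded numbers
print(c(r1 = r1, r2 = r2))
```

Both r1 and r2 describe the same grouping; they differ only because Pearson correlation depends on which number is attached to each category.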
The following results still give different correlation coefficients.
Are there any functions or packages for a categorical correlation?
> cor(jh1_1, corridor1)
[1] 0.02753303
> cor(jh1_1, as.factor(corridor2))
[1] -0.3682788
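One measure that does not depend on how the categories are numbered is the correlation ratio (eta squared), which can be obtained as the R-squared of a one-way linear model. The sketch below uses synthetic data and a hypothetical helper eta_sq(); it is one possible approach under those assumptions, not the only answer to the question above:

```r
# Synthetic data only; labels, group_means and y are illustrative.
set.seed(2)
labels <- sample(c("A", "B", "C", "D", "E", "F"), 300, replace = TRUE)

# The same categories under two different labelings.
recode <- c(A = 0, B = 1, F = 2, D = 3, C = 4, E = 5)
g1 <- factor(labels)                  # letter labels
g2 <- factor(unname(recode[labels]))  # numeric labels, as a factor

# A numeric variable whose mean differs by category (assumed values).
group_means <- c(A = 0, B = 3, C = 1, D = 5, E = 2, F = 4)
y <- unname(group_means[labels]) + rnorm(300)

# Hypothetical helper: share of the variance of y explained by the
# grouping g, taken from the R-squared of lm(y ~ g).
eta_sq <- function(y, g) summary(lm(y ~ g))$r.squared

e1 <- eta_sq(y, g1)
e2 <- eta_sq(y, g2)
print(c(letters = e1, numbers = e2))
```

Because the model estimates one mean per category, relabeling or reordering the levels leaves the result unchanged, which is not true of cor() applied to numeric codes.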
Thanks for your kindness,
Kum
On 12 Oct 2006 10:25:33 +0200, Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
> "Kum-Hoe Hwang" <phdhwang at gmail.com> writes:
>
> > Howdy Gurus !
> >
> > I get different correlation results from the same data. The
> > string variable "corridor1" is expressed as a number in the
> > numeric variable "corridor2".
> > --------------------------------------------------------------------------
> > > levels(corridor1)
> > [1] "A" "B" "C" "D" "E" "F"
> > > levels(as.factor(corridor2))
> > [1] "0" "1" "2" "3" "4"
> > >
> > ------------------------------------------------------------------------------------------
> > I get the following correlation results using the cor() function.
> > ------------------------------------------------------------------------------------------
> > > cor(jh1_1, as.factor(corridor1))
> > [1] 0.01528538
> > > cor(jh1_1, as.factor(corridor2))
> > [1] -0.4972571
> > ------------------------------------------------------------------------------------------
> > I do not know why the above correlation coefficients, computed from
> > the same data, are different.
> > They are 0.015 from as.factor(corridor1) and -0.497 from as.factor(corridor2).
> > The string variable "corridor1" holds the same categorical data as the
> > variable corridor2.
> > The difference is that "A" is replaced with "0", "B" with "1", "C"
> > with "2", .....
> >
> > Could you tell me why they are different, and which correlation
> > coefficient is correct?
>
> One thing that strikes me is that corridor1 has 6 levels and corridor2
> has 5...
>
> In general correlations are not expected to work on factors so I'd be
> explicit about taking as.numeric(). A glance at
> table(corridor1,corridor2) should be informative too, as would a
> summary(as.numeric(as.factor(corridor1))-as.numeric(as.factor(corridor1)))
>
> --
> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
>
--
Kum-Hoe Hwang, Ph.D. Phone: 82-31-250-3516 Email: phdhwang at gmail.com