[R] How to Get Categorical Correlation Coefficient
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Thu Oct 12 12:29:42 CEST 2006
"Kum-Hoe Hwang" <phdhwang at gmail.com> writes:
> There was a mistake in my earlier email.
> I have corrected the error by dropping "ns.omit" from data.frame().
>
> I have added a newly corrected correlation; the output follows:
>
> ------------------------------------------------------------------------------
> #
> > nrow(sdi)
> [1] 65613
>
> > print(corridor1[65600:65613])
>  [1] C C C C F F F F B B F F B B
> Levels: B C D E A F
>
> > print(corridor2[65600:65613])
> [1] 4 4 4 4 2 2 2 2 1 1 2 2 1 1
>
> > summary(corridor1)
>     B     C     D     E     A     F
> 15092 13456  6652  1611  1796 27006
> > summary(corridor2)
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> 0.0 1.0 2.0 2.3 3.0 5.0
>
> > summary(as.numeric(as.factor(corridor1))-as.numeric(as.factor(corridor1)))
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> 0 0 0 0 0 0
One of the two terms should of course be corridor2, otherwise the
difference is trivially zero. (That was my typo, but...)
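For the record, here is a sketch of the check I had in mind. The two
vectors below are small made-up stand-ins for the real corridor1 and
corridor2 (an assumption for illustration, not your data):

```r
# Toy stand-ins for the real variables (hypothetical values).
corridor1 <- factor(c("C", "F", "B", "A"),
                    levels = c("B", "C", "D", "E", "A", "F"))
corridor2 <- c(4, 2, 1, 0)

# Difference of the integer codes of the two codings.
# If the codings agreed, every entry would be zero.
d <- as.numeric(as.factor(corridor1)) - as.numeric(as.factor(corridor2))
summary(d)
```

Anything other than an all-zero summary says the two variables are
numbered differently.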
> > table(corridor1,corridor2)
>          corridor2
> corridor1     0     1     2     3     4     5
>         B     0 15092     0     0     0     0
>         C     0     0     0     0 13456     0
>         D     0     0     0  6652     0     0
>         E     0     0     0     0     0  1611
>         A  1796     0     0     0     0     0
>         F     0     0 27006     0     0     0
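Notice that the levels are not in the same order! as.numeric(corridor1) will
have 1 for B, ..., 5 for A, 6 for F, whereas corridor2 has 1 for B, 4 for C,
3 for D, 5 for E, 0 for A and 2 for F. The two codings are different
numberings of the same categories, so cor() gives different answers.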
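A small sketch of how the level order drives the integer codes. The
vector x is made up; only the level order B C D E A F is taken from your
"Levels:" line:

```r
x <- c("A", "B", "C", "D", "E", "F")   # hypothetical values, one per category

# Level order as shown in the Levels: line of the quoted output.
f_yours <- factor(x, levels = c("B", "C", "D", "E", "A", "F"))
# Default behaviour: levels are the sorted unique values.
f_alpha <- factor(x)

codes_yours <- as.numeric(f_yours)  # 5 1 2 3 4 6
codes_alpha <- as.numeric(f_alpha)  # 1 2 3 4 5 6
data.frame(x, codes_yours, codes_alpha)
```

as.numeric() on a factor returns the position of each value in levels(),
not anything derived from the label itself.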
> ---------------------------------------------------------------------------------------
> The following results give different correlation coefficients.
> Are there any functions or packages for a categorical correlation?
>
> > cor(jh1_1, corridor1)
> [1] 0.02753303
> > cor(jh1_1, as.factor(corridor2))
> [1] -0.3682788
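If you want a measure that does not depend on how the categories happen
to be numbered, one possibility (my suggestion, not something taken from
your output) is the R-squared from a one-way layout, i.e. the squared
correlation ratio. A sketch on simulated data, since I do not have
jh1_1; every name and value below is made up:

```r
set.seed(1)
g <- factor(sample(c("A", "B", "C", "D", "E", "F"), 200, replace = TRUE))
y <- rnorm(200) + as.numeric(g == "F")   # simulated response (hypothetical)

# The same categories with a different, arbitrary level order.
g2 <- factor(g, levels = c("B", "C", "D", "E", "A", "F"))

# cor() on the integer codes depends on the arbitrary numbering ...
r1 <- cor(y, as.numeric(g))
r2 <- cor(y, as.numeric(g2))

# ... whereas R-squared with the factor as predictor does not.
eta1 <- summary(lm(y ~ g))$r.squared
eta2 <- summary(lm(y ~ g2))$r.squared
c(r1 = r1, r2 = r2, eta1 = eta1, eta2 = eta2)
```

Reordering the levels changes the dummy coding but not the fitted group
means, so the two R-squared values agree while the two cor() values need
not.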
>
>
> Thanks for your kindness,
>
> Kum
>
>
> On 12 Oct 2006 10:25:33 +0200, Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
> > "Kum-Hoe Hwang" <phdhwang at gmail.com> writes:
> >
> > > Howdy Gurus !
> > >
> > > I get different correlation results from the same data. The
> > > string variable "corridor1" is coded as a number in the numeric
> > > variable "corridor2".
> > > --------------------------------------------------------------------------
> > > > levels(corridor1)
> > > [1] "A" "B" "C" "D" "E" "F"
> > > > levels(as.factor(corridor2))
> > > [1] "0" "1" "2" "3" "4"
> > > >
> > > ------------------------------------------------------------------------------------------
> > > I get the following correlation results using the cor() function.
> > > ------------------------------------------------------------------------------------------
> > > > cor(jh1_1, as.factor(corridor1))
> > > [1] 0.01528538
> > > > cor(jh1_1, as.factor(corridor2))
> > > [1] -0.4972571
> > > ------------------------------------------------------------------------------------------
> > > I do not know why the above correlation coefficients, computed from
> > > the same data, are different.
> > > They are 0.015 from as.factor(corridor1), -0.497 from as.factor(corridor2).
> > > The string variable "corridor1" holds the same category data as the
> > > variable corridor2.
> > > The difference is that "A" is replaced with "0", "B" with "1", "C"
> > > with "2", .....
> > >
> > > Could you tell me why they are different, and which correlation
> > > coefficient is correct?
> >
> > One thing that strikes me is that corridor1 has 6 levels and corridor2
> > has 5...
> >
> > In general correlations are not expected to work on factors so I'd be
> > explicit about taking as.numeric(). A glance at
> > table(corridor1,corridor2) should be informative too, as would a
> > summary(as.numeric(as.factor(corridor1))-as.numeric(as.factor(corridor1)))
> >
> > --
> > O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
> > c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
> > (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
> >
>
>
> --
> Kum-Hoe Hwang, Ph.D.
> Phone: 82-31-250-3516
> Email: phdhwang at gmail.com
>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907