[R] cor(data.frame) infelicities

Gabor Grothendieck ggrothendieck at gmail.com
Mon Dec 3 20:05:36 CET 2007


You are right, but I was just trying to stick to the same example.
In reality it would be OK as long as it's an ordered factor.  One could
restrict it to columns of class "ordered".
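
A minimal sketch of that restriction, assuming a hypothetical helper
cor_ordered() (the name and the function are made up for illustration; nothing
like it exists in stats).  It keeps numeric columns and ordered factors,
drops everything else, and converts the ordered factors to their integer
codes before calling cor():

```r
# Hypothetical helper, not part of stats: keep numeric columns and
# factors of class "ordered", then rank-correlate on the integer codes.
cor_ordered <- function(df, method = "kendall", ...) {
  keep <- vapply(df, function(col) is.numeric(col) || is.ordered(col),
                 logical(1))
  # For an ordered factor the level order is meaningful, so replacing
  # it by its integer codes gives well-defined ranks.
  x <- lapply(df[keep], function(col) {
    if (is.ordered(col)) as.integer(col) else col
  })
  cor(as.data.frame(x), method = method, ...)
}

# iris$Species is an unordered factor, so it is dropped here.
dim(cor_ordered(iris))

# Declaring Species as ordered (only sensible if the order means
# something) brings it back into the result.
iris_ord <- transform(iris, Species = factor(Species, ordered = TRUE))
dim(cor_ordered(iris_ord))
```

With this, an unordered factor is silently excluded rather than correlated
on arbitrary codes; whether it should be dropped silently or with a warning
is a separate design choice.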


On Dec 3, 2007 1:58 PM, Liaw, Andy <andy_liaw at merck.com> wrote:
> I'd call that another infelicity.  Species is supposed to be nominal,
> not ordinal, so rank correlation wouldn't make much sense.  So what does
> cor(, method="kendall") do?  It looks like it simply uses the underlying
> numeric code.  (Change Species to numerics and you'll see the same
> answer.)  However, reordering the levels changes the result:
>
> R> iris2 <- iris
> R> iris2$Species <- factor(iris2$Species, levels = levels(iris2$Species)[c(2, 1, 3)])
> R> cor(iris2, method = "kendall")
>             Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> Sepal.Length   1.00000000 -0.07699679    0.7185159   0.6553086 0.1897778
> Sepal.Width   -0.07699679  1.00000000   -0.1859944  -0.1571257 0.1439793
> Petal.Length   0.71851593 -0.18599442    1.0000000   0.8068907 0.2677154
> Petal.Width    0.65530856 -0.15712566    0.8068907   1.0000000 0.2724843
> Species        0.18977778  0.14397927    0.2677154   0.2724843 1.0000000
>
> To me, this is dangerous!
>
> Andy
>
>
> From: Gabor Grothendieck
>
> >
> > You can calculate the Kendall rank correlation with such a matrix
> > so you would not want to exclude factors in that case:
> >
> > > cor(iris, method = "kendall")
> >              Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
> > Sepal.Length   1.00000000 -0.07699679    0.7185159   0.6553086  0.6704444
> > Sepal.Width   -0.07699679  1.00000000   -0.1859944  -0.1571257 -0.3376144
> > Petal.Length   0.71851593 -0.18599442    1.0000000   0.8068907  0.8229112
> > Petal.Width    0.65530856 -0.15712566    0.8068907   1.0000000  0.8396874
> > Species        0.67044444 -0.33761438    0.8229112   0.8396874  1.0000000
> >
> >
> > On Dec 3, 2007 9:27 AM, Michael Friendly <friendly at yorku.ca> wrote:
> > > In using cor(data.frame), it is annoying that you have to explicitly
> > > filter out non-numeric columns, and when you don't, the error message
> > > is misleading:
> > >
> > >  > cor(iris)
> > > Error in cor(iris) : missing observations in cov/cor
> > > In addition: Warning message:
> > > In cor(iris) : NAs introduced by coercion
> > >
> > > It would be nicer if stats:::cor() did the equivalent *itself* of the
> > > following for a data.frame:
> > >  > cor(iris[,sapply(iris, is.numeric)])
> > >              Sepal.Length Sepal.Width Petal.Length Petal.Width
> > > Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
> > > Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
> > > Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
> > > Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
> > >  >
> > >
> > > A change could be implemented here:
> > >     if (is.data.frame(x))
> > >         x <- as.matrix(x)
> > >
> > > Second, the default, use="all.obs", throws an error if there are any
> > > NAs.  It would be nicer if the default was use="complete.obs",
> > > which would generate warnings instead.  Most other statistical
> > > software is more tolerant of missing data.
> > >
> > >  > library(corrgram)
> > >  > data(auto)
> > >  > cor(auto[,sapply(auto, is.numeric)])
> > > Error in cor(auto[, sapply(auto, is.numeric)]) :
> > >   missing observations in cov/cor
> > >  > cor(auto[,sapply(auto, is.numeric)],use="complete")
> > > # works; output elided
> > >
> > > -Michael
> > >
> > > --
> > > Michael Friendly     Email: friendly AT yorku DOT ca
> > > Professor, Psychology Dept.
> > > York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
> > > 4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
> > > Toronto, ONT  M3J 1P3 CANADA
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
>
>


