[R] cor(data.frame) infelicities
Michael Friendly
friendly at yorku.ca
Mon Dec 3 15:27:07 CET 2007
In using cor(data.frame), it is annoying that you have to explicitly
filter out non-numeric columns, and when you don't, the error message
is misleading:
> cor(iris)
Error in cor(iris) : missing observations in cov/cor
In addition: Warning message:
In cor(iris) : NAs introduced by coercion
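For what it's worth, the message seems to come from the as.matrix() conversion rather than from any real missing data. A small sketch of what I assume is happening (the variable names are only for illustration):

```r
# as.matrix() on a data frame with a factor column yields a character matrix.
m <- as.matrix(iris)
matrix_type <- typeof(m)

# Coercing the species labels to numeric turns every one of them into NA,
# which is where "NAs introduced by coercion" would come from.
species_as_number <- suppressWarnings(as.numeric(m[, "Species"]))
all_na <- all(is.na(species_as_number))
```

So the "missing observations" are manufactured by the coercion; iris itself has none.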
It would be nicer if stats::cor() itself did the equivalent of the
following for a data.frame:
> cor(iris[,sapply(iris, is.numeric)])
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
>
A change could be implemented at this point in cor():
    if (is.data.frame(x))
        x <- as.matrix(x)
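As a rough sketch of the behaviour I have in mind, written as a wrapper rather than a patch to stats (cor_numeric is a made-up name, and warning about dropped columns is my own assumption about what would be sensible):

```r
# Hypothetical helper, not part of stats: correlate only the numeric
# columns of a data frame, warning about any columns that are dropped.
cor_numeric <- function(x, ...) {
  if (is.data.frame(x)) {
    is_num <- vapply(x, is.numeric, logical(1))
    if (!all(is_num)) {
      warning("dropping non-numeric columns: ",
              paste(names(x)[!is_num], collapse = ", "))
      x <- x[, is_num, drop = FALSE]
    }
  }
  # Everything else (use=, method=, ...) is passed through unchanged.
  stats::cor(x, ...)
}

# Species is dropped (with a warning); the four measurements are kept.
r <- suppressWarnings(cor_numeric(iris))
```

With this, cor_numeric(iris) gives the same 4 x 4 matrix as cor(iris[,sapply(iris, is.numeric)]).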
Second, the default, use="all.obs", throws an error if there are any
NAs. It would be nicer if the default were use="complete.obs",
which would generate warnings instead. Most other statistical
software is more tolerant of missing data.
> library(corrgram)
> data(auto)
> cor(auto[,sapply(auto, is.numeric)])
Error in cor(auto[, sapply(auto, is.numeric)]) :
missing observations in cov/cor
> cor(auto[,sapply(auto, is.numeric)],use="complete")
# works; output elided
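To make the difference between the use= settings concrete, here is a small sketch on a made-up data frame (d and its values are invented purely for illustration):

```r
# Made-up data: column a is missing in row 6, column b in row 5.
d <- data.frame(a = c(1, 2, 3, 4, 5, NA),
                b = c(2, 1, 4, 3, NA, 6),
                c = c(6, 5, 3, 4, 2, 1))

# "all.obs" refuses to proceed when any value is missing.
res_all <- tryCatch(cor(d, use = "all.obs"),
                    error = function(e) conditionMessage(e))

# Rows with no missing values: the only rows "complete.obs" uses.
n_complete <- sum(complete.cases(d))

# Listwise deletion: every correlation is computed from the same rows.
r_complete <- cor(d, use = "complete.obs")

# Pairwise deletion: each pair of columns uses all rows where both
# values are present, so different cells may rest on different rows.
r_pairwise <- cor(d, use = "pairwise.complete.obs")
```

Note that neither "complete.obs" nor "pairwise.complete.obs" reports how many rows were dropped, which is why a warning would be a useful addition.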
-Michael
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA