[Rd] cor() fails with big dataframe
Mayeul KAUFFMANN
mayeul.kauffmann at tiscali.fr
Thu Sep 16 13:33:31 CEST 2004
Thanks all for your answers.
#The difference between the following two commands may puzzle even
intermediate users. (I give the explanation below.)
> cor(x[,4],x[,5])
[1] -0.4352342
> cor(x[,4:5])
Error in cor(x[, 4:5]) : missing observations in cov/cor
In addition: Warning message:
NAs introduced by coercion
From: "Martin Maechler" <maechler at stat.math.ethz.ch>
To: "Mayeul KAUFFMANN" <mayeul.kauffmann at tiscali.fr>
> Mayeul> #I found the obvious workaround:
> Mayeul> COR <- matrix(rep(0, 81),9,9)
> Mayeul> for (i in 1:9) for (j in 1:9) {if (i>j) COR[i,j] <- cor(x[,i],x[,j])}
> Mayeul> #which works fine, with no warning
> Mayeul> #looks like a "cor()" bug.
Martin Maechler wrote:
> quite improbably.
If it is wrong, could you say what is wrong and then propose an alternative
workaround? (Or should I ask on r-help?)
> What does
> sapply(x, function(u)all(is.finite(u)))
> return ?
sapply(x2, function(u)all(is.finite(u)))
  jntdem smldepnp lrgdepnp contigkb logdstab  majdyds  alliesr  lncaprt     GATT
    TRUE     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE
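Why did this check pass? is.finite() returns TRUE for every non-missing logical value, so logical columns go through unnoticed. A minimal sketch, using a small made-up data frame `d` in place of `x` (its column names are hypothetical), suggests that looking at the column classes is what exposes them:

```r
# Made-up toy data standing in for the data frame `x` of this thread
d <- data.frame(num  = c(1.5, 2.0, 3.2, 4.8),
                flag = c(TRUE, FALSE, TRUE, TRUE))

# The check suggested above: every column passes, the logical one included
finite_ok <- sapply(d, function(u) all(is.finite(u)))
print(finite_ok)

# Checking the column classes instead reveals the logical column
col_class <- sapply(d, class)
print(col_class)

# Names of the logical columns, i.e. the ones as.matrix() may not turn into numbers
logical_cols <- names(d)[sapply(d, is.logical)]
print(logical_cols)
```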
_______________________________________________
But I have now found the explanation. It is not due to size.
#Tony Plate wrote:
#I would suspect that your dataframe has columns that result in NA's when it
#is coerced to a matrix
That's not yet the explanation, but you are close to it.
All columns are numeric, except 3 that are logical (I thought they would
be coerced to 0 and 1, which they are with cor(x[,4],x[,5]) but not with
cor(x[,4:5])).
The logical columns do not change to NAs or infinite values; instead, ALL
columns change to TEXT:
?as.matrix
'as.matrix' is a generic function. The method for data frames will
convert any non-numeric/complex column into a character vector
using 'format' and so return a character matrix, except that
all-logical data frames will be coerced to a logical matrix.
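A minimal sketch of that rule, with a made-up data frame holding one numeric and one character column: as.matrix() returns a character matrix, which cor() cannot use. A character column is used here because it is turned into text in any case; later versions of R document a logical < integer < double coercion hierarchy for data frames without other non-numeric columns, so whether a logical column still triggers the problem may depend on the R version.

```r
# Made-up toy data: one numeric column and one non-numeric (character) column
d <- data.frame(num   = c(1.5, 2.0, 3.2),
                label = c("a", "b", "c"),
                stringsAsFactors = FALSE)

m <- as.matrix(d)
print(m)           # every cell is now a string, the numbers included
print(typeof(m))   # "character"

# cor() cannot use a character matrix; catch the error to show its message
res <- tryCatch(cor(m), error = function(e) conditionMessage(e))
print(res)
```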
> as.matrix(x[1:3,1:9])
jntdem smldepnp lrgdepnp contigkb logdstab majdyds alliesr
1 "400" "0.01420874" "0.2156945" "TRUE" "5.820108" "TRUE" "TRUE"
2 "400" "0.01534535" "0.2496879" "TRUE" "5.820108" "TRUE" "TRUE"
3 "400" "0.01585586" "0.2570493" "TRUE" "5.820108" "TRUE" "TRUE"
lncaprt GATT
1 "2.883204" "1"
2 "2.906521" "1"
3 "2.833357" "1"
?cor says it accepts data frames. In fact, it does so only if they have no
logical columns, or only logical ones (cor(x[,6:7]) works).
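For the all-logical case, a small sketch with made-up data: as.matrix() gives a logical matrix rather than text, and cor() then treats TRUE/FALSE as 1/0.

```r
# Made-up data frame whose columns are all logical
d <- data.frame(p = c(TRUE, FALSE, TRUE, FALSE),
                q = c(TRUE, TRUE, FALSE, FALSE))

m <- as.matrix(d)
print(typeof(m))   # "logical": no conversion to text here

# cor() accepts the logical matrix, using TRUE = 1 and FALSE = 0
print(cor(m))      # p and q happen to be uncorrelated in this toy example
```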
Computing cor() between a logical (a dummy variable) and a numeric is
perhaps not as sensible as between two numerics, but it may still be
useful for exploring data.
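The 0/1 coding is also what gives such a correlation a familiar meaning: the Pearson correlation between a 0/1 dummy and a numeric variable is usually called the point-biserial correlation. A small sketch with made-up vectors, checking that cor() gives the same value whether the dummy is passed as logical or coded explicitly as 0/1:

```r
# Made-up data: a dummy variable stored as logical, and a numeric variable
flag <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
y    <- c(2.1, 0.4, 1.8, 2.5, 0.9, 0.3)

r_logical <- cor(flag, y)              # logical vector passed directly
r_numeric <- cor(as.numeric(flag), y)  # explicit 0/1 coding

print(c(r_logical, r_numeric))         # the two values agree
```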
Perhaps one could either change the documentation in ?cor, or stop relying
on as.matrix() to convert the data frame when some columns are logical.
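In the meantime, a possible workaround is to convert the logical columns yourself before calling cor(). The sketch below wraps this in a helper; the name cor_df is hypothetical and not part of R. It also checks that the result matches the lower triangle produced by the pairwise loop workaround quoted earlier. In current R, data.matrix(df) is another way to get a numeric matrix, since it converts logical columns to integers.

```r
# Hypothetical helper (not part of R): turn logical columns into 0/1 numerics,
# so that as.matrix() yields a numeric matrix, then call cor() on it.
cor_df <- function(df, ...) {
  stopifnot(is.data.frame(df))
  df[] <- lapply(df, function(u) if (is.logical(u)) as.numeric(u) else u)
  cor(as.matrix(df), ...)   # extra arguments such as use = go to cor()
}

# Made-up data: two numeric columns and one logical dummy
set.seed(1)
d <- data.frame(a = rnorm(20), b = rnorm(20))
d$flag <- rnorm(20) > 0

full <- cor_df(d)
print(round(full, 3))

# Pairwise loop, as in the workaround quoted earlier (lower triangle only)
loop <- matrix(0, 3, 3)
for (i in 1:3) for (j in 1:3) if (i > j) loop[i, j] <- cor(d[[i]], d[[j]])

print(all.equal(full[lower.tri(full)], loop[lower.tri(loop)]))
```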
Cheers,
Mayeul
More information about the R-devel
mailing list