[R] Spearman correlation and missing observations
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Nov 26 15:04:10 CET 2003
Nicolas STRANSKY <Nicolas.Stransky at curie.fr> writes:
> Hi,
>
> I am using R 1.8.1 on WinXP. I encounter a problem when trying to
> compute a Spearman correlation under certain conditions (at least I
> think there is a problem, but maybe this is the normal behavior).
>
> > X<-array(0,c(20,2))
> >
> > X[,1]<-c(runif(10),rep(NA,10))
> > X[,2]<-c(runif(10),rep(NA,10))
> >
> > Y<-X[1:10,]
> >
> > cor(Y,method="s",use="complete.obs")
>           [,1]      [,2]
> [1,] 1.0000000 0.3939394
> [2,] 0.3939394 1.0000000
> > cor(X,method="s",use="complete.obs")
>          [,1]     [,2]
> [1,] 1.000000 0.924812
> [2,] 0.924812 1.000000
>
>
> The problem is that I do not get the same results depending on whether
> there are NAs in the dataset or not. Perhaps I misunderstand the use of
> "complete.obs" and "pairwise.complete.obs" for dealing with missing data;
> if so, please tell me how I could get the same result in the end.
>
> On the other hand, the same type of commands with a Pearson correlation
> gives exactly the same result for X and Y:
>
> > cor(Y,method="p",use="complete.obs")
>           [,1]      [,2]
> [1,] 1.0000000 0.3109109
> [2,] 0.3109109 1.0000000
> > cor(X,method="p",use="complete.obs")
>           [,1]      [,2]
> [1,] 1.0000000 0.3109109
> [2,] 0.3109109 1.0000000
>
> Thanks for your help
Oh, d*mn....
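The problem is that cor ranks the data before it deals with missing
values, and rank by default puts NAs last, giving them ranks of their own: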
> rank(c(runif(10),rep(NA,10)))
[1] 4 8 6 5 9 10 2 3 7 1 11 12 13 14 15 16 17 18 19 20
and we want
> rank(c(runif(10),rep(NA,10)),na.last="keep")
[1] 6 2 9 5 8 3 7 1 10 4 NA NA NA NA NA NA NA NA NA NA
so inside cor, we need to add na.last="keep" in two places:
    if (method != "pearson") {
        Rank <- function(u) if (is.matrix(u))
            apply(u, 2, rank, na.last="keep")
        else rank(u, na.last="keep")
        x <- Rank(x)
        if (!is.null(y))
            y <- Rank(y)
    }
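
To see what the change amounts to, here is a minimal sketch that does the
ranking by hand and hands the ranks to a Pearson correlation. It is not the
actual source of cor: spearman_keep is a hypothetical helper name, and the
seed is arbitrary. It assumes only that rank(), cor() and complete.cases()
behave as documented.

```r
# Sketch only: Spearman correlation computed as Pearson correlation on ranks,
# keeping NAs as NA so that use = "complete.obs" can still drop those rows.
set.seed(1)  # arbitrary seed, for reproducibility only

X <- cbind(c(runif(10), rep(NA, 10)),
           c(runif(10), rep(NA, 10)))
Y <- X[1:10, ]

# Hypothetical helper, not part of R
spearman_keep <- function(m) {
    # Rank each column; na.last = "keep" leaves NA entries as NA
    r <- apply(m, 2, rank, na.last = "keep")
    cor(r, method = "pearson", use = "complete.obs")
}

res_X <- spearman_keep(X)
res_Y <- spearman_keep(Y)
print(all.equal(res_X, res_Y))

# Workaround that does not rely on how cor treats NAs internally:
# drop the incomplete rows first, then ask for Spearman directly.
ok <- complete.cases(X)
res_W <- cor(X[ok, ], method = "spearman")
print(all.equal(res_X, res_W))
```

Both comparisons should print TRUE, since the ten complete rows of X are
exactly the rows of Y. Until cor itself is fixed, dropping the incomplete
rows before the call, as in the last few lines, is probably the simplest
way to get the same answer for X and Y.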
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907