[R] bug? in stats::cor for use=complete.obs with NAs
hugh.genin at thomsonreuters.com
Wed Jun 9 19:36:22 CEST 2010
Arrrrr,
I think I've found a bug in the behavior of the stats::cor function when
NAs are present, but in case I'm missing something, could you look over
this example and let me know what you think:
> a = c(1,3,NA,1,2)
> b = c(1,2,1,1,4)
> cor(a,b,method="spearman", use="complete.obs")
[1] 0.8164966
> cor(a,b,method="spearman", use="pairwise.complete.obs")
[1] 0.7777778
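A possible workaround, assuming the goal is simply to get the complete-case value on 2.11.1, is to drop the incomplete pairs yourself with complete.cases() and only then call cor(). Once no NAs are left, the use argument has nothing to remove, so the ranks are computed on the same four pairs that cor.test() works with. This is a sketch of a workaround, not a fix for cor() itself; ok and rho_filtered are illustrative names only.

```r
a <- c(1, 3, NA, 1, 2)
b <- c(1, 2, 1, 1, 4)

# TRUE only at positions where both a and b are observed
ok <- complete.cases(a, b)

# No NAs remain, so ranking happens on the four complete pairs only
rho_filtered <- cor(a[ok], b[ok], method = "spearman")
print(rho_filtered)
```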
My understanding is that, when the inputs are vectors (but not
necessarily when they're matrices), the "complete.obs" and
"pairwise.complete.obs" arguments should give identical Spearman
correlations. The above example clearly shows they do not in my version
of R (2.11.1). However, cor.test gives the expected value:
> cor.test(a,b,method="spearman", use="complete.obs")

        Spearman's rank correlation rho

data:  a and b
S = 2.2222, p-value = 0.2222
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.7777778
So cor and cor.test do not agree, which seems very likely to be a bug.
When calculating by hand, I also get 0.7777778. Additionally, when
using an old version of R (2.5.0), both the complete.obs and
pairwise.complete.obs versions give 0.7777778. This strongly suggests
that either 2.5.0 or 2.11.1 has a bug. Is this a bug? If so, has it
already been reported? (I found a related but confusing email thread
from 2004 in the R archives, but I could not find how that bug report
was resolved.)
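The hand calculation can be written out in R. With ties given average ranks, Spearman's rho is the Pearson correlation of the ranks, so the sketch below computes it in two orders. Dropping the NA pair first and then ranking gives 0.7777778. Ranking each full vector first and only then dropping the NA pair reproduces the 0.8164966 returned by cor(..., use="complete.obs"): rank() keeps the NA and places it last by default, so a's remaining ranks are unchanged, but b is ranked over five values, including the 1 that is paired with the missing a. That this is what cor() does internally in 2.11.1 is only a guess consistent with the numbers, not something confirmed against the R sources. pearson() is a hypothetical helper written just for this example.

```r
a <- c(1, 3, NA, 1, 2)
b <- c(1, 2, 1, 1, 4)
ok <- complete.cases(a, b)

# Hypothetical helper: Pearson correlation written out explicitly
pearson <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Expected order: drop incomplete pairs, then rank (ties get average ranks)
rho_drop_then_rank <- pearson(rank(a[ok]), rank(b[ok]))

# Guessed faulty order: rank the full vectors (NA ranked last, since
# na.last = TRUE by default), then drop the incomplete pair
rho_rank_then_drop <- pearson(rank(a)[ok], rank(b)[ok])

print(c(rho_drop_then_rank, rho_rank_then_drop))
# [1] 0.7777778 0.8164966
```

In exact terms the two values are 7/9 and sqrt(2/3).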
Additional info:
Platform = Windows XP
> sessionInfo()
R version 2.11.1 (2010-05-31)
i386-pc-mingw32
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.11.1
> Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
Thanks,
--Hugh
More information about the R-help mailing list