[R] cor vs cor.test
Godmar Back
godmar at gmail.com
Tue Jul 7 16:25:39 CEST 2009
Hi,
I am trying to use R for some survey analysis, and need to compute the
significance of some correlations. I read the man pages for cor and
cor.test, but I am confused about:
- whether these functions are intended to work the same way
- how these functions handle NA values
- whether cor.test supports use = "complete.obs".
Some example output may explain why I am confused:
-----------------------------------------------
WORKS:
> cor(q[[9]], q[[10]])
                  perceivedlearningcurve
overallimpression              0.7440637
-----------------------------------------------
DOES NOT WORK:
> cor.test(q[[9]], q[[10]])
Error in `[.data.frame`(x, OK) : undefined columns selected
-----------------------------------------------
(I assume that's because of R's generous type coercions. Does R
have a "typeof" operator to learn what the type of q[[9]] is?)
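
For reference, base R has several ways to inspect an object's type. The sketch below uses a made-up one-column data frame as a hypothetical stand-in for q[[9]] (the real object is not shown in this post, so its structure is an assumption based on the error message above).

```r
# Hypothetical stand-in for q[[9]]: a data frame with a single column.
x <- data.frame(overallimpression = c(4, 5, 3, 4, 2))

class(x)          # "data.frame"
typeof(x)         # "list": a data frame is stored as a list of columns
is.data.frame(x)  # TRUE
str(x)            # compact summary of the object's structure

# Extracting the single column yields a plain numeric vector.
v <- x[, 1]
class(v)          # "numeric"
typeof(v)         # "double"
is.vector(v)      # TRUE
```

class() and str() are usually the more informative calls here, since typeof() only reports the underlying storage mode.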
-----------------------------------------------
WORKS:
> cor.test(q[[9]][,1], q[[10]][,1])
        Pearson's product-moment correlation

data:  q[[9]][, 1] and q[[10]][, 1]
t = 12.9877, df = 136, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6588821 0.8104055
sample estimates:
      cor
0.7440637
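
The contrast between the failing and the working call can be reproduced with made-up data. This is only a sketch: the data frames a and b are hypothetical stand-ins for q[[9]] and q[[10]]. The error message shown earlier points at an x[OK] subsetting step, and a logical vector with one entry per row means something different for a data frame than for a vector.

```r
# Hypothetical data standing in for q[[9]] and q[[10]].
set.seed(1)
a <- data.frame(overallimpression = rnorm(20))
b <- data.frame(perceivedlearningcurve = a[[1]] + rnorm(20))

# cor() accepts data frames and returns a correlation matrix.
m <- cor(a, b)

# cor.test() expects numeric vectors; passing data frames raises an error.
res_df <- tryCatch(cor.test(a, b), error = function(e) e)

# One TRUE/FALSE per row, marking rows without NAs.
ok <- complete.cases(a, b)

# With a single index, a data frame treats 'ok' as a column selector,
# so 20 logical values against 1 column fails.
idx_df <- tryCatch(a[ok], error = function(e) e)

# On a plain vector the same index selects elements, which is what is wanted.
idx_vec <- a[[1]][ok]

# Passing the columns as vectors works and agrees with cor().
res_vec <- cor.test(a[[1]], b[[1]])
```

Using a[[1]] (or a[, 1]) before calling cor.test sidesteps the problem, at the cost of having to know that the object is a data frame in the first place.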
-----------------------------------------------
WORKS, but propagates NAs:
> cor(q[[9]], q[[51]])
                  usefulnessautodetectionbox_ord
overallimpression                             NA
-----------------------------------------------
WORKS, and uses complete observations only
> cor(q[[9]], q[[51]], use="complete.obs")
                  usefulnessautodetectionbox_ord
overallimpression                      0.2859895
-----------------------------------------------
WORKS, apparently, but does not require 'use="complete.obs"' (!?)
> cor.test(q[[9]][,1], q[[51]][,1])
        Pearson's product-moment correlation

data:  q[[9]][, 1] and q[[51]][, 1]
t = 3.1016, df = 108, p-value = 0.002456
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.1043351 0.4491779
sample estimates:
      cor
0.2859895
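
A small check on made-up data (the vectors x and y below are hypothetical, not the survey data) suggests that cor.test on two vectors silently keeps only the complete pairs: its estimate matches cor(..., use = "complete.obs") and its degrees of freedom equal the number of complete pairs minus 2. The drop from df = 136 to df = 108 in the output above is consistent with that.

```r
# Hypothetical data with a few missing values in y.
set.seed(42)
x <- rnorm(30)
y <- x + rnorm(30)
y[c(3, 11, 27)] <- NA

# Default use = "everything": any NA makes the result NA.
r_default <- cor(x, y)

# Only pairs where both values are present.
r_complete <- cor(x, y, use = "complete.obs")

# cor.test is called without any 'use' argument.
ct <- cor.test(x, y)

# Number of pairs with no missing value.
n_complete <- sum(complete.cases(x, y))

is.na(r_default)                             # TRUE
all.equal(unname(ct$estimate), r_complete)   # TRUE
unname(ct$parameter) == n_complete - 2       # TRUE
```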
-----------------------------------------------
The help page for cor.test states that getOption("na.action")
describes the action taken on NAs:
> getOption("na.option")
NULL
-----------------------------------------------
No action is taken, yet cor.test appears to only use complete observations (!?)
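
One thing worth checking here is the option name: the help page mentions "na.action", while the call above queries "na.option", and getOption returns NULL for any name that is not set. The sketch below compares the two names and, on a hypothetical data frame d, compares the formula interface with the vector interface. As far as I can tell from the help page, the na.action argument belongs to the formula method, so treat that reading as an assumption.

```r
# getOption() returns NULL for a name that is not set.
opt_typo <- getOption("na.option")
# The option named on the help page; "na.omit" in a default R session.
opt_real <- getOption("na.action")
print(opt_typo)
print(opt_real)

# Hypothetical data with missing values in both columns.
d <- data.frame(x = c(1, 2, 3, 4, 5, NA, 7, 8),
                y = c(2, 1, 4, 3, NA, 6, 8, 7))

# Formula interface: rows with NAs are handled by na.action.
ct_formula <- cor.test(~ x + y, data = d)

# Vector interface: incomplete pairs are dropped as well.
ct_default <- cor.test(d$x, d$y)

all.equal(ct_formula$estimate, ct_default$estimate)
```

With the default na.action of "na.omit", both interfaces end up using the same six complete rows.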
Others believe that cor.test accepts 'use=complete.obs':
http://markmail.org/message/nuzqeouqhbb7f6ok
--------------
Needless to say, this makes writing robust code very hard.
I'm wondering what the rationale for the inconsistencies between cor
and cor.test is.
Thanks!
- Godmar