[R] cor vs cor.test

Tue Jul 7 16:25:39 CEST 2009

Hi,

I am trying to use R for some survey analysis, and need to compute the
significance of some correlations. I read the man pages for cor and
cor.test, but I am confused about

- whether these functions are intended to work the same way
- how these functions handle NA values
- whether cor.test supports 'use = complete.obs'.

Some example output may explain why I am confused:

-----------------------------------------------
WORKS:
> cor(q[[9]], q[[10]])
                  perceivedlearningcurve
overallimpression              0.7440637
-----------------------------------------------

DOES NOT WORK:
> cor.test(q[[9]], q[[10]])
Error in `[.data.frame`(x, OK) : undefined columns selected
-----------------------------------------------

(I assume that's because of R's generous type coercions. Does R
have a "typeof" operator to learn what the type of q[[9]] is?)
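
For what it's worth, here is a minimal sketch of how the type of such an object could be inspected. The list 'q' below is a made-up stand-in for my survey data (an assumption, not the real data set), holding one-column data frames like the ones the output above suggests:

```r
# Hypothetical stand-in for the survey data: a list of one-column data frames
q <- list(data.frame(overallimpression = c(4, 5, 3, NA, 2)))

x <- q[[1]]
class(x)            # class attribute of the object
typeof(x)           # internal storage type
str(x)              # compact display of the structure
is.numeric(x)       # a data frame as a whole is not numeric
is.numeric(x[, 1])  # the extracted column is a plain numeric vector

# Passing the data frame itself to cor.test is expected to fail;
# the exact error message may differ between R versions.
res <- tryCatch(cor.test(x, x), error = function(e) e)
inherits(res, "error")
```

If this is representative, it would explain why extracting the column with [,1] makes cor.test work, while cor accepts the data frame directly.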

-----------------------------------------------
WORKS:
> cor.test(q[[9]][,1], q[[10]][,1])

        Pearson's product-moment correlation

data:  q[[9]][, 1] and q[[10]][, 1]
t = 12.9877, df = 136, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6588821 0.8104055
sample estimates:
      cor
0.7440637
-----------------------------------------------

WORKS, but propagates NAs:
> cor(q[[9]], q[[51]])
                  usefulnessautodetectionbox_ord
overallimpression                             NA
-----------------------------------------------
WORKS, and uses complete observations only:

> cor(q[[9]], q[[51]], use="complete.obs")
                  usefulnessautodetectionbox_ord
overallimpression                      0.2859895
-----------------------------------------------
WORKS, apparently, but does not require 'use="complete.obs"' (!?)

> cor.test(q[[9]][,1], q[[51]][,1])

        Pearson's product-moment correlation

data:  q[[9]][, 1] and q[[51]][, 1]
t = 3.1016, df = 108, p-value = 0.002456
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.1043351 0.4491779
sample estimates:
      cor
0.2859895
-----------------------------------------------

The help page for cor.test states that 'getOption('na.action')'
describes the action taken on NAs:

> getOption("na.option")
NULL
-----------------------------------------------

No action is taken, yet cor.test appears to use only complete observations (!?)
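
The behaviour can be reproduced on a small scale. The sketch below uses made-up vectors 'x' and 'y' (not my survey data) and compares cor.test against a correlation computed by hand on the complete pairs only. Note that it queries the option under the name given on the help page, "na.action", whereas my call above used "na.option". As far as I can tell, that option is only documented for the formula interface of cor.test, so treat this as an illustration under those assumptions, not as a statement of how cor.test is specified:

```r
# Made-up data with missing values, for illustration only
x <- c(1, 2, 3, 4, 5, NA, 7, 8)
y <- c(2, 1, 4, NA, 5, 6, 8, 7)

# Option name as written on the help page
getOption("na.action")

# Keep only the pairs in which both values are observed
ok <- complete.cases(x, y)
r_manual <- cor(x[ok], y[ok])

# The same through cor's own 'use' argument
r_use <- cor(x, y, use = "complete.obs")

# cor.test called without any 'use' argument
ct <- cor.test(x, y)

# Compare the three estimates side by side
c(manual = r_manual, use = r_use, cor.test = unname(ct$estimate))

# Degrees of freedom: number of complete pairs minus 2
ct$parameter
sum(ok) - 2
```

If the three estimates agree and the degrees of freedom match the count of complete pairs, cor.test is dropping incomplete pairs on its own, which is consistent with the df = 108 in the output above.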

Others believe that cor.test accepts 'use=complete.obs':
http://markmail.org/message/nuzqeouqhbb7f6ok

--------------

Needless to say, this makes writing robust code very hard.

I'm wondering what the rationale for the inconsistencies between cor
and cor.test is.

Thanks!

 - Godmar