[R] robust method to obtain a correlation coeff?
dwinsemius at comcast.net
Mon Aug 24 17:38:21 CEST 2009
On Aug 24, 2009, at 11:26 AM, (Ted Harding) wrote:
> On 24-Aug-09 14:47:02, Christian Meesters wrote:
>> Being an R newbie I am wondering how to calculate a correlation
>> coefficient (preferably with an associated p-value) for data like:
>>  25.5 25.3 25.1 NA 23.3 21.5 23.8 23.2 24.2 22.7 27.6 24.2 ...
>>  0.0 11.1 0.0 NA 0.0 10.1 10.6 9.5 0.0 57.9 0.0 0.0 ...
>> Apparently corr(d) from the boot-library fails with NAs in the data,
> Yes, apparently corr() has no option for dealing with NAs.
>> also cor.test cannot cope with a different number of NAs.
> On the other hand, cor.test() does have an option "na.action"
> which, by default, is the same as what is in getOption("na.action").
> In my R installation, this, by default, is "na.omit". This has the
> effect that, for any pair in (x,y) where at least one of the pair
> is NA, that pair will be omitted from the calculation. For example,
> basing two vectors x,y on your data above, and a third z which is y
> with an extra NA:
> x<-c(25.5,25.3,25.1,NA,23.3,21.5,23.8,23.2,24.2,22.7,27.6,24.2)
> y<-c( 0.0,11.1, 0.0,NA, 0.0,10.1,10.6, 9.5, 0.0,57.9, 0.0, 0.0)
> z<-y; z[8]<-NA
> I get
> <snipped unneeded output>
> # sample estimates:
> # cor
> # -0.4298726
> So it has worked in both cases (see the difference in 'df'), despite
> the different numbers of NAs in x and z.
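The pairwise deletion described above can be checked directly. This is only a minimal sketch: it assumes x is the first row of the posted data and that the extra NA in z sits at position 8, which is the position that reproduces the quoted estimate. Note that the default method of cor.test() drops incomplete pairs on its own; the na.action argument belongs to the formula interface.

```r
# Vectors rebuilt from the data quoted in the original question
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
z <- y
z[8] <- NA   # assumed position of the extra NA

# cor.test() keeps only the pairs in which both values are present
res.xy <- cor.test(x, y)
res.xz <- cor.test(x, z)

# Estimate, degrees of freedom (pairs used minus 2) and p-value
print(c(res.xy$estimate, res.xy$parameter, p = res.xy$p.value))
print(c(res.xz$estimate, res.xz$parameter, p = res.xz$p.value))
```

The two results differ by one degree of freedom because z loses one more pair than y does.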
You may not need to go through the material that follows. There is
already a set of functions to handle such concerns:
?na.omit will bring up a help page describing:
na.fail(object, ...)
na.omit(object, ...)
na.exclude(object, ...)
It reminded me that:
na.action: the name of a function for treating missing values (NA's)
for certain situations.
... but I do not know what those "certain situations" really are.
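One such situation, as far as I can tell, is the formula interface of modelling and testing functions. Here is a small sketch using the posted numbers; the data frame name d and the result names are only for illustration.

```r
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
d <- data.frame(x = x, y = y)

# na.omit() drops every row that contains at least one NA (row 4 here)
d.complete <- na.omit(d)
n.complete <- nrow(d.complete)

# na.fail() returns the object unchanged only if it has no NA,
# otherwise it signals an error
failed <- inherits(try(na.fail(d), silent = TRUE), "try-error")

# The formula method of cor.test() accepts na.action explicitly
r.formula <- cor.test(~ x + y, data = d, na.action = na.omit)$estimate

# cor() offers the same listwise deletion through its 'use' argument
r.use <- cor(x, y, use = "complete.obs")

print(c(n.complete = n.complete, r.formula = unname(r.formula), r.use = r.use))
```

na.exclude() removes the same rows as na.omit(); the difference only shows up in functions such as residuals() and predict(), which pad their results back to the original length.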
> For functions such as corr() which do not have provision for omitting
> NAs, you can fix it up for yourself before calling the function.
> In the case of your two series d[,1], d[,2] you could use an index
> variable to select cases:
> ix <- (!is.na(d[,1]))&(!is.na(d[,2]))
> corr(d[ix,])
> With my variables x,y,z I get
> ix.1 <- (!is.na(x))&(!is.na(y))
> ix.2 <- (!is.na(x))&(!is.na(z))
> d.1 <- cbind(x,y)
> corr(d.1[ix.1,])
> # [1] -0.422542 ## (and -0.422542 from cor.test above as well)
> d.2 <- cbind(x,z)
> corr(d.2[ix.2,])
> # [1] -0.4298726 ## (and -0.4298726 from cor.test above as well)
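The index-variable approach above can also be written with complete.cases(), which builds the same logical index in one call. This is a sketch under the assumption that d is a two-column matrix made from the posted data; the corr() call is guarded because the boot package may not be installed.

```r
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
d <- cbind(x, y)

# TRUE for rows in which no column is NA
ix <- complete.cases(d)

# The hand-written index from the message, for comparison
ix.manual <- (!is.na(d[, 1])) & (!is.na(d[, 2]))
same.index <- identical(ix, ix.manual)

# Pearson correlation on the complete rows only
r <- cor(d[ix, 1], d[ix, 2])
print(r)

# corr() from boot needs the NA rows removed before it is called
if (requireNamespace("boot", quietly = TRUE)) {
  r.boot <- boot::corr(d[ix, ])
  print(r.boot)
}
```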
> Hoping this helps,
>> Is there a
>> solution to this problem (calculating a correlation coefficient and
>> ignoring different number of NAs), e.g. Pearson's corr coeff?
>> If so, please point me to the relevant piece of documentation.
David Winsemius, MD
West Hartford, CT