[R] robust method to obtain a correlation coeff?
dwinsemius at comcast.net
Mon Aug 24 17:53:17 CEST 2009
On Aug 24, 2009, at 11:38 AM, David Winsemius wrote:
> On Aug 24, 2009, at 11:26 AM, (Ted Harding) wrote:
>> On 24-Aug-09 14:47:02, Christian Meesters wrote:
>>> Being a R-newbie I am wondering how to calculate a correlation
>>> coefficient (preferably with an associated p-value) for data like:
>>>  25.5 25.3 25.1 NA 23.3 21.5 23.8 23.2 24.2 22.7 27.6 24.2 ...
>>>  0.0 11.1 0.0 NA 0.0 10.1 10.6 9.5 0.0 57.9 0.0 0.0 ...
>>> Apparently corr(d) from the boot-library fails with NAs in the data,
>> Yes, apparently corr() has no option for dealing with NAs.
>>> also cor.test cannot cope with a different number of NAs.
>> On the other hand, cor.test() does have an option "na.action"
>> which, by default, is the same as what is in getOption("na.action").
>> In my R installation, this, by default, is "na.omit". This has the
>> effect that, for any pair in (x,y) where at least one of the pair
>> is NA, that pair will be omitted from the calculation. For example,
>> basing two vectors x,y on your data above, and a third z which is y
>> with an extra NA:
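>> x<-c(25.5,25.3,25.1,NA,23.3,21.5,23.8,23.2,24.2,22.7,27.6,24.2)
>> y<-c( 0.0,11.1, 0.0,NA, 0.0,10.1,10.6, 9.5, 0.0,57.9, 0.0, 0.0)
>> z<-y; z[8]<-NA
>> cor.test(x,y)
>> cor.test(x,z)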
>> I get
>> <snipped unneeded output>
>> # sample estimates:
>> # cor
>> # -0.4298726
>> So it has worked in both cases (see the difference in 'df'), despite
>> the different numbers of NAs in x and z.
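The behaviour described above can be checked with a short sketch. It assumes x holds the twelve temperature values shown in the original question, and that the extra NA in z sits at position 8 (an assumption, chosen because it reproduces the second estimate quoted above). The default method of cor.test() keeps only the complete pairs, so the degrees of freedom drop by one for each additional incomplete pair.

```r
# Data as quoted in the thread; the NA position in z is an assumption.
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
z <- y
z[8] <- NA

ct_xy <- cor.test(x, y)   # incomplete pairs are dropped before testing
ct_xz <- cor.test(x, z)

# Number of pairs actually used in each test
n_xy <- sum(complete.cases(x, y))
n_xz <- sum(complete.cases(x, z))

ct_xy$estimate    # Pearson correlation on the complete pairs
ct_xy$p.value     # associated p-value
ct_xy$parameter   # df = n_xy - 2
ct_xz$parameter   # df = n_xz - 2, one fewer than above
```

For a rank-based alternative, cor.test() also accepts method = "spearman" or method = "kendall".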
> You may not need to go through the material that follows. There is
> already a set of functions to handle such concerns:
> ?na.omit will bring a help page describing:
> na.fail(object, ...) na.omit(object, ...) na.exclude(object, ...)
> na.pass(object, ...)
Apologies; this was a bit hastily constructed. What I quoted in
what follows came from the "Options set in package stats" section
of the ?options help page.
> na.action: the name of a function for treating missing values (NA's)
> for certain situations.
> ... but I do not know what those "certain situations" really are.
So there are some functions that may be affected by settings of
options("na.action") but I cannot tell you where to find a list of
them.
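One documented case is lm(), whose na.action argument defaults to the value of that option. The sketch below is only an illustration of how the four handlers differ in that one function, not a list of affected functions; the data frame d is built from the values quoted in the thread (x being an assumption based on the first data row in the question).

```r
# Illustrative data frame; one row is incomplete.
d <- data.frame(
  x = c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2),
  y = c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
)

getOption("na.action")   # the session-wide default handler

# na.omit: incomplete rows are dropped before fitting
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)

# na.exclude: same fit, but residuals are padded back to the original length
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

# na.fail: any NA is treated as an error
res_fail <- tryCatch(
  lm(y ~ x, data = d, na.action = na.fail),
  error = function(e) "error"
)

length(residuals(fit_omit))   # one per complete row
length(residuals(fit_excl))   # one per original row, NA where a row was dropped
```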
>> For functions such as corr() which do not have provision for omitting
>> NAs, you can fix it up for yourself before calling the function.
>> In the case of your two series d[,1], d[,2] you could use an index
>> variable to select cases:
>> ix <- (!is.na(d[,1]))&(!is.na(d[,2]))
>> With my variables x,y,z I get
>> ix.1 <- (!is.na(x))&(!is.na(y))
>> ix.2 <- (!is.na(x))&(!is.na(z))
>> d.1 <-cbind(x,y)
>> corr(d.1[ix.1,])
>> # [1] -0.422542 ## (and -0.422542 from cor.test above as well)
>> d.2 <- cbind(x,z)
>> corr(d.2[ix.2,])
>> # [1] -0.4298726 ## (and -0.4298726 from cor.test above as well)
>> Hoping this helps,
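The index-variable approach can also be written with complete.cases(), and cor() has a use argument that does the same filtering internally. The following is a minimal sketch under the assumption that x and y are the twelve-value vectors quoted in the thread; the call to boot::corr() is guarded because the boot package may not be installed.

```r
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
d <- cbind(x, y)

# TRUE for rows where neither column is NA;
# equivalent to (!is.na(d[,1])) & (!is.na(d[,2]))
ix <- complete.cases(d)

# Correlation on the complete rows, selected by hand
r_manual <- cor(d[ix, 1], d[ix, 2])

# Same result, letting cor() discard incomplete pairs itself
r_use <- cor(x, y, use = "complete.obs")

# corr() from the boot package has no NA handling, so pass filtered rows
if (requireNamespace("boot", quietly = TRUE)) {
  r_boot <- boot::corr(d[ix, ])
}

r_manual   # about -0.4225, in line with the value quoted in the thread
```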
David Winsemius, MD
West Hartford, CT