[Rd] Incorrect handling of NA's in cor() (PR#6750)

msa at biostat.mgh.harvard.edu msa at biostat.mgh.harvard.edu
Fri Apr 9 19:22:43 CEST 2004


Dear Uwe,

You are wrong. First, I read the help file before
submitting the report. For two variables,
use="pairwise.complete.obs" and use="complete.obs" should be
equivalent, shouldn't they? Of course, the results can
differ when there are more than two variables. Second, with the
call you proposed I also get an incorrect result:

> cor(x, y, use="pairwise.complete.obs", method="s")
[1] -0.1428571

The correct result is -0.4, which is what cor.test() returns and
what cor() gives once the incomplete pairs are removed by hand.
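
For reference, the -0.4 figure can be checked by hand. The sketch
below assumes only base R; the names ok, rx, ry, rho_by_hand,
rho_from_ranks, rho_cor_test and rho_na_last are mine, chosen for
illustration. It drops the incomplete pairs, ranks what is left, and
applies the no-ties formula 1 - 6*sum(d^2)/(n*(n^2 - 1)). The last
line is an observation about the arithmetic only, not a statement
about what cor() does internally: ranking the NAs last and keeping
all six pairs happens to give -1/7 = -0.1428571.

```r
x <- c(1, 2, 3, NA, 5, 6)
y <- c(4, NA, 2, 5, 1, 3)

# Keep only the pairs in which both values are observed.
ok <- !is.na(x) & !is.na(y)
rx <- rank(x[ok])
ry <- rank(y[ok])

# Spearman's rho without ties: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
n <- sum(ok)
rho_by_hand <- 1 - 6 * sum((rx - ry)^2) / (n * (n^2 - 1))

# The same quantity, computed as the Pearson correlation of the ranks.
rho_from_ranks <- cor(rx, ry)

# cor.test() discards incomplete pairs before ranking.
rho_cor_test <- unname(cor.test(x, y, method = "spearman")$estimate)

# Arithmetic observation only: rank the NAs last and keep all six pairs.
rho_na_last <- cor(rank(x, na.last = TRUE), rank(y, na.last = TRUE))

print(c(rho_by_hand, rho_from_ranks, rho_cor_test, rho_na_last))
```

The first three values agree at -0.4 on the four complete pairs
(1,4), (3,2), (5,1), (6,3); only the last one differs.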

Regards

Marek Ancukiewicz



> Date: Fri, 09 Apr 2004 19:06:47 +0200
> From: Uwe Ligges <ligges at statistik.uni-dortmund.de>
> Organization: Fachbereich Statistik, Universitaet Dortmund
> Cc: R-bugs at biostat.ku.dk
> 
> msa at biostat.mgh.harvard.edu wrote:
> > Full_Name: Marek Ancukiewicz
> > Version: 1.8.1
> > OS: Linux
> > Submission from: (NULL) (132.183.12.87)
> > 
> > 
> > Function cor() incorrectly handles missing observations with method="spearman":
> > 
> > 
> > > x <- c(1,2,3,NA,5,6)
> > > y <- c(4,NA,2,5,1,3)
> > > cor(x,y,use="complete.obs",method="s")
> > [1] -0.1428571
> > 
> > > cor(x[!is.na(x)&!is.na(y)],y[!is.na(x)&!is.na(y)],method="s")
> > [1] -0.4
> > 
> > These two results should be the same.
> > 
> 
> 
> No! Please read at least the help file, ?cor, before submitting a bug 
> report:
> 
> 
> "If use is "complete.obs" then missing values are handled by casewise 
> deletion. Finally, if use has the value "pairwise.complete.obs" then the 
> correlation between each pair of variables is computed using all 
> complete pairs of observations on those variables."
> 
> 
> Hence
>    cor(x, y, use="pairwise.complete.obs", method="s")
> is what you expect ...
> 
> Uwe Ligges
>


