[R] cor and missing values. Bug?

Frank E Harrell Jr feh3k at spamcop.net
Wed May 26 21:16:38 CEST 2004


On 27 May 2004 00:20:17 +0200
Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:

> "Robert W. Baer, Ph.D." <rbaer at atsu.edu> writes:
> 
> > > Not to put too fine a point on it, but did you consider checking the
> > > NEWS file for the most recent version (1.9.0,
> > > http://cran.r-project.org/src/base/NEWS)?
> > >
> > >     o   The cor() function did not remove missing values in the
> > >         non-Pearson case.
> > 
> > 
> > There is still something a little strange in version 1.9.0. What is
> > the source of the discrepancy between cor() and cor.test()?
> 
> One ranks x and y before removing missing values, the other one
> removes them first and then ranks. It is not really desirable, but a
> better solution is nontrivial (esp. in the "pairwise.complete.obs"
> case) and we did document it in ?cor:
> 
>                   Notice also that the ranking is (currently) done
>      removing only cases that are missing on the variable itself,
>      which may not be what you expect if you let 'use' be
>      '"complete.obs"' or '"pairwise.complete.obs"'.
> 
> 
> -- 
>    O__  ---- Peter Dalgaard             Blegdamsvej 3  
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
>  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
> 
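
The two orderings Peter describes can be sketched in base R. This is
only an illustration on made-up data (x, y and the names rho_* are
hypothetical), computing the rank correlation by hand rather than
showing what cor() or cor.test() do internally in any given R version.

```r
# Made-up data: x has one missing value, and the y value paired with it
# lies in the middle of y's range, so dropping it changes y's ranks.
x <- c(1, 2, NA, 4, 5, 6)
y <- c(2, 1, 3, 6, 5, 4)

# Pairs in which both x and y are observed.
ok <- complete.cases(x, y)

# (a) Remove incomplete pairs first, then rank what is left.
rho_remove_first <- cor(rank(x[ok]), rank(y[ok]))

# (b) Rank each variable on its own non-missing values, then drop
#     the incomplete pairs and correlate the surviving ranks.
rx <- rank(x, na.last = "keep")
ry <- rank(y, na.last = "keep")
rho_rank_first <- cor(rx[ok], ry[ok])

print(c(remove_first = rho_remove_first, rank_first = rho_rank_first))
```

In (b) the surviving ranks of y are no longer the consecutive integers
1..n, which is why the two numbers disagree.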

Some of you may want to look at the old rcorr function in the Hmisc
package, which works on pairwise-complete observations, uses some C code
for Spearman correlation, and is fast for large matrices.
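
A minimal usage sketch, assuming Hmisc is installed; the data are made
up, and the call is guarded so that the snippet does nothing if the
package is missing.

```r
# Made-up data with one missing value in x.
x <- c(1, 2, NA, 4, 5, 6)
y <- c(2, 1, 3, 6, 5, 4)

if (requireNamespace("Hmisc", quietly = TRUE)) {
  # rcorr() takes a numeric matrix (or x and y) and a 'type' argument.
  res <- Hmisc::rcorr(cbind(x, y), type = "spearman")
  print(res$r)  # matrix of correlations
  print(res$n)  # number of observations used for each pair
  print(res$P)  # matrix of p-values
}
```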

Frank

---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
