[R] Inconsistent behaviour of function cor()

rforge seifert.reinhard at gmail.com
Wed Jan 6 11:08:42 CET 2010


Odd behaviour of function cor() in R-2.10.1-64bit-Unix

In a dataset with 1366 patients and 244 clinical variables, I calculated
Spearman's rho for some fatty acids and BMI and came across something rather
odd:

R seems to calculate Rho differently on 2.10.1-64bit-Unix and
2.9.0-32bit-Windows when I calculate the complete (244x244) correlation
matrix and then pick out the values I am interested in!


The 2.9.0-32bit-Windows version calculates rho for pairwise complete
observations, as I expected, but 2.10.1-64bit-Unix does not.
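
To make concrete what "pairwise complete observations" means here, a minimal
sketch on made-up data (not the patient data) follows. The vectors x and y
and the positions of their missing values are invented for illustration. The
expectation that cor() ranks the values only after dropping incomplete pairs
is an assumption based on the documented behaviour of current R versions, so
the three numbers may differ on other versions:

```r
# Made-up data: two variables with missing values in different rows
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
x[c(3, 10)] <- NA
y[c(7, 10, 20)] <- NA

# Rows in which both variables are observed
ok <- complete.cases(x, y)

# Rho by hand: drop incomplete pairs, rank what is left, then correlate
rho_manual <- cor(rank(x[ok]), rank(y[ok]))

# Rho from cor() with pairwise deletion requested explicitly
rho_pairwise <- cor(x, y, method = "spearman", use = "pairwise.complete.obs")

# Rho from cor() with the default use = "everything"
rho_default <- cor(x, y, method = "spearman")

print(c(manual = rho_manual, pairwise = rho_pairwise, default = rho_default))
```

If the first two values agree, pairwise deletion behaves as described above.
With the default 'use', a missing value in either variable is expected to
propagate to the result.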

I compared four ways of computing rho:
  A) calculating rho for each pair of variables in a loop, which forces
     pairwise complete observations
  B) calculating the correlation matrix for a small selection of variables
     and then picking one column of that matrix
  C) calculating the complete 244x244 correlation matrix and then picking
     the relevant rhos
  D) as C), but with 'use = "pairwise.complete.obs"'
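
For anyone without access to the data, the four approaches can be sketched
on a small made-up data frame. Everything here is hypothetical: the data
frame 'toy', its column names and the pattern of missing values are invented,
and A) and B) pass 'use = "pairwise.complete.obs"' explicitly, which the
script further down does not do. It is a sketch for comparison, not a
reproduction of the reported results:

```r
# Made-up stand-in for the patient data: BMI, five fatty acids, one extra column
set.seed(42)
n <- 200
toy <- data.frame(bmi = rnorm(n, 26, 4))
for (k in 1:5) toy[[paste0("fa", k)]] <- 0.1 * k * toy$bmi + rnorm(n)
toy$other <- rnorm(n)

# Put missing values in different rows of every column
for (k in seq_along(toy)) toy[sample(n, 10 + k), k] <- NA

fa_cols <- 2:6  # columns holding the fatty acids

# A) one pair of variables at a time
rho_a <- sapply(fa_cols, function(i)
  cor(toy[, 1], toy[, i], method = "spearman", use = "pairwise.complete.obs"))

# B) correlation matrix of a small selection, then pick the BMI column
rho_b <- cor(toy[, c(1, fa_cols)], method = "spearman",
             use = "pairwise.complete.obs")[-1, 1]

# C) complete correlation matrix with the default 'use'
rho_c <- cor(toy, method = "spearman")[fa_cols, 1]

# D) complete correlation matrix with pairwise deletion
rho_d <- cor(toy, method = "spearman",
             use = "pairwise.complete.obs")[fa_cols, 1]

print(round(cbind(rho_a, rho_b, rho_c, rho_d), 3))
print(all.equal(unname(rho_a), unname(rho_b)))
print(all.equal(unname(rho_a), unname(rho_d)))
```

If A), B) and D) are all pairwise complete, the two all.equal() checks should
print TRUE. A column that differs points to the approach that treats missing
values differently.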

I initially used D) and obtained wrong results.

I included the code and output:


________________________________ R-code ________________________________

## Read data using UNIX-path to USB-disk
data <- read.table("/media/disk/ONYG/fatty_acids/mito/BECAC_MITO_23okt09.txt",
                   header = TRUE, dec = ",", sep = ";")

## Read data using Windows-path to USB-disk
# data <- read.table("E:/ONYG/fatty_acids/mito/BECAC_MITO_23okt09.txt",
#                    header = TRUE, dec = ",", sep = ";")

## Usage of "use" in cor()
#     use: an optional character string giving a method for computing
#          covariances in the presence of missing values.  This must be
#          (an abbreviation of) one of the strings '"everything"',
#          '"all.obs"', '"complete.obs"', '"na.or.complete"', or
#          '"pairwise.complete.obs"'.


## Four ways of calculating Spearman's rho for a selection of variables
## (column 7 of the data is BMI and columns 104:108 are some fatty acids)

# __________________ A _____________________ 
cori <- array(0,151)
for (i in 104:151) {
  cori[i] <- cor(data[, 7], data[, i], method = "spearman")
}
cor.a <- as.numeric(round(cori[104:108] , 3))

# __________________ B _____________________ 
cor.b <- as.numeric(round(
  cor(data[, c(7, 104:108)], method = "spearman")[-1, 1], 3))

# __________________ C _____________________ 
cor.c <- as.numeric(round(cor( data , method = "spearman") [104:108,7] , 3))

# __________________ D _____________________ 
cor.d <- as.numeric(round(
  cor(data, method = "spearman", use = "pairwise.complete.obs")[104:108, 7], 3))


## Dump the R and OS version and the results into a text file
capture.output( {print(date())
                 print(as.data.frame(unlist(R.Version())))
                 cbind( cor.a , cor.b , cor.c , cor.d)} ,
               file = "Cor_output.txt" ,
               append = TRUE)



_________________________________ Output _________________________________


[1] "Tue Jan  5 15:40:05 2010"
                         unlist(R.Version())
platform                 x86_64-pc-linux-gnu
arch                                  x86_64
os                                 linux-gnu
system                     x86_64, linux-gnu
status                                      
major                                      2
minor                                   10.1
year                                    2009
month                                     12
day                                       14
svn rev                                50720
language                                   R
version.string R version 2.10.1 (2009-12-14)

The rows denote Spearman's Rho for 5 different fatty acids against BMI, the
columns the 4 different ways (a,b,c,d) I calculated the Rho.
     cor.a cor.b  cor.c  cor.d
[1,] 0.062 0.062  0.057  0.057
[2,] 0.107 0.107 -0.013 -0.013
[3,] 0.226 0.226  0.215  0.215
[4,] 0.232 0.232  0.157  0.157
[5,] 0.179 0.179  0.178  0.178



[1] "Tue Jan 05 15:49:34 2010"
                        unlist(R.Version())
platform                    i386-pc-mingw32
arch                                   i386
os                                  mingw32
system                        i386, mingw32
status                                     
major                                     2
minor                                   9.0
year                                   2009
month                                    04
day                                      17
svn rev                               48333
language                                  R
version.string R version 2.9.0 (2009-04-17)

     cor.a cor.b cor.c cor.d
[1,] 0.062 0.062 0.062 0.062
[2,] 0.107 0.107 0.107 0.107
[3,] 0.226 0.226 0.226 0.226
[4,] 0.232 0.232 0.232 0.232
[5,] 0.179 0.179 0.179 0.179
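
One way to check whether missing values play a role in a discrepancy like
this is to count them per column, count the rows each pair of variables has
in common, and measure how far the per-pair rho lies from the full-matrix
rho. The sketch below does this on a made-up data frame. The name 'toy', its
columns and its missing-value pattern are hypothetical, and nothing here was
run on the patient data:

```r
# Made-up stand-in for the data; swap in the real data frame and columns
set.seed(7)
toy <- as.data.frame(matrix(rnorm(100 * 4), ncol = 4,
                            dimnames = list(NULL, c("bmi", "fa1", "fa2", "fa3"))))
toy$fa1[1:15] <- NA
toy$fa3[90:100] <- NA

# Missing values per column
n_missing <- colSums(is.na(toy))

# Number of rows in which both variables of a pair are observed
observed <- ifelse(is.na(as.matrix(toy)), 0, 1)
n_pairwise <- crossprod(observed)

# Per-pair rho versus full-matrix rho, each against the first column
per_pair <- sapply(2:4, function(i)
  cor(toy[, 1], toy[, i], method = "spearman", use = "pairwise.complete.obs"))
full <- cor(toy, method = "spearman", use = "pairwise.complete.obs")[2:4, 1]
max_diff <- max(abs(per_pair - full))

print(n_missing)
print(n_pairwise)
print(max_diff)
```

A 'max_diff' clearly above rounding error would mean that the two
calculations do not use the same observations or the same ranks.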


Best regards,
Reinhard
-- 
View this message in context: http://n4.nabble.com/Unconsistent-behaviour-of-function-cor-tp999702p999702.html
Sent from the R help mailing list archive at Nabble.com.


