[R] Inconsistent behaviour of function cor()
rforge
seifert.reinhard at gmail.com
Wed Jan 6 11:08:42 CET 2010
Odd behaviour of function cor() in R-2.10.1-64bit-Unix
In a dataset with 1366 patients and 244 clinical variables, I calculated
Spearman's rho between some fatty acids and BMI and came across something
rather odd:
R seems to calculate Rho differently on 2.10.1-64bit-Unix and
2.9.0-32bit-Windows when I calculate the complete (244x244) correlation
matrix and then pick out the values I am interested in!
The 2.9.0-32bit-Windows version calculates rho for pairwise complete
observations, as I expected, but 2.10.1-64bit-Unix does not.
I compared 4 ways of producing the Rho:
A) calculating the rho for each pair of variables in a loop -> forcing
pairwise complete obs
B) calculating a matrix of a small selection of variables and then picking
one column of the correlation matrix
C) calculating the complete 244x244 correlation matrix and then picking
the relevant rho's
D) as C but with 'use = "pairwise.complete.obs"'
I initially used D) and got wrong results.
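A minimal sketch of what "pairwise complete observations" means for a single pair of variables, on toy data rather than the patient dataset (the vectors x and y and the helper name ok are made up for illustration): rho is the Pearson correlation of the ranks, computed after dropping every row in which either value is missing. The `use` argument is passed explicitly, because the default `use = "everything"` returns NA as soon as a value is missing. On a current R version the two results below should agree.

```r
## Toy data (hypothetical, not the patient dataset)
x <- c(1.2, 3.4, NA, 2.2, 5.1, 4.0, 0.7)
y <- c(2.0, NA, 1.5, 2.9, 4.8, 5.0, 1.1)

## Keep only the rows where both values are present
ok <- complete.cases(x, y)

## Spearman's rho by hand: Pearson correlation of the ranks of the complete pairs
rho.manual <- cor(rank(x[ok]), rank(y[ok]))

## The same quantity via cor() with an explicit 'use' argument
rho.cor <- cor(x, y, method = "spearman", use = "pairwise.complete.obs")

print(c(manual = rho.manual, cor = rho.cor))
```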
I included the code and output:
________________________________ R-code ________________________________
## Read data using UNIX-path to USB-disk
data <- read.table("/media/disk/ONYG/fatty_acids/mito/BECAC_MITO_23okt09.txt",
                   header = TRUE, dec = ",", sep = ";")
## Read data using Windows-path to USB-disk
# data <- read.table("E:/ONYG/fatty_acids/mito/BECAC_MITO_23okt09.txt",
#                    header = TRUE, dec = ",", sep = ";")
## Usage of "use" in cor()
# use: an optional character string giving a method for computing
# covariances in the presence of missing values. This must be
# (an abbreviation of) one of the strings '"everything"',
# '"all.obs"', '"complete.obs"', '"na.or.complete"', or
# '"pairwise.complete.obs"'.
## four ways of calculating Spearman's Rho for a selection of variables
## (column 7 of the data relates to BMI and columns 104:108 to some fatty
## acids)
# __________________ A _____________________
cori <- array(0,151)
for (i in 104:151)
  cori[i] <- cor(data[, 7], data[, i], method = "spearman")
cor.a <- as.numeric(round(cori[104:108] , 3))
# __________________ B _____________________
cor.b <- as.numeric(round(
  cor(data[, c(7, 104:108)], method = "spearman")[-1, 1], 3))
# __________________ C _____________________
cor.c <- as.numeric(round(cor( data , method = "spearman") [104:108,7] , 3))
# __________________ D _____________________
cor.d <- as.numeric(round(
  cor(data, method = "spearman", use = "pairwise.complete.obs")[104:108, 7], 3))
## dump the R- and OS-version and the results into textfile
capture.output( {print(date())
print(as.data.frame(unlist(R.Version())))
cbind( cor.a , cor.b , cor.c , cor.d)} ,
file = "Cor_output.txt" ,
append = TRUE)
_________________________________ Output _________________________________
[1] "Tue Jan 5 15:40:05 2010"
               unlist(R.Version())
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          10.1
year           2009
month          12
day            14
svn rev        50720
language       R
version.string R version 2.10.1 (2009-12-14)
The rows denote Spearman's Rho for 5 different fatty acids against BMI, the
columns the 4 different ways (a,b,c,d) I calculated the Rho.
     cor.a cor.b  cor.c  cor.d
[1,] 0.062 0.062  0.057  0.057
[2,] 0.107 0.107 -0.013 -0.013
[3,] 0.226 0.226  0.215  0.215
[4,] 0.232 0.232  0.157  0.157
[5,] 0.179 0.179  0.178  0.178
[1] "Tue Jan 05 15:49:34 2010"
               unlist(R.Version())
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          9.0
year           2009
month          04
day            17
svn rev        48333
language       R
version.string R version 2.9.0 (2009-04-17)
     cor.a cor.b cor.c cor.d
[1,] 0.062 0.062 0.062 0.062
[2,] 0.107 0.107 0.107 0.107
[3,] 0.226 0.226 0.226 0.226
[4,] 0.232 0.232 0.232 0.232
[5,] 0.179 0.179 0.179 0.179
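
A self-contained way to check whether a given R installation shows this kind of discrepancy, without access to the patient data, might look like the sketch below. Everything in it is an assumption made for illustration: the data frame `toy`, its size, the seed, the number of missing values, and the column positions `target` and `others` (standing in for BMI and the fatty acids). It compares the pair-by-pair route, the small-selection route, and the full-matrix route, each with `use = "pairwise.complete.obs"` given explicitly.

```r
## Synthetic stand-in for the patient data (hypothetical: 200 rows, 12 columns)
set.seed(1)
toy <- as.data.frame(matrix(rnorm(200 * 12), nrow = 200))

## Scatter 20 missing values over each column
for (j in 1:12) toy[sample(200, 20), j] <- NA

target <- 1      # plays the role of BMI (column 7 in the post)
others <- 8:12   # plays the role of the fatty acids (columns 104:108)

## A) pair by pair, with 'use' given explicitly
rho.a <- sapply(others, function(j)
  cor(toy[, target], toy[, j], method = "spearman",
      use = "pairwise.complete.obs"))

## B) correlation matrix of a small selection of columns
rho.b <- cor(toy[, c(target, others)], method = "spearman",
             use = "pairwise.complete.obs")[-1, 1]

## D) full correlation matrix, then pick the relevant entries
rho.d <- cor(toy, method = "spearman",
             use = "pairwise.complete.obs")[others, target]

## TRUE if the three routes give the same values
agree <- isTRUE(all.equal(rho.a, unname(rho.b))) &&
         isTRUE(all.equal(rho.a, unname(rho.d)))

print(round(cbind(rho.a, rho.b, rho.d), 3))
print(agree)
```

If `agree` is FALSE on some installation, that would reproduce on synthetic data the kind of mismatch between the columns shown in the first output above.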
Best regards,
Reinhard
--
View this message in context: http://n4.nabble.com/Unconsistent-behaviour-of-function-cor-tp999702p999702.html
Sent from the R help mailing list archive at Nabble.com.