[R] correlation matrix only if enough non-NA values

jeff6868 geoffrey_klein at etu.u-bourgogne.fr
Tue May 29 11:03:12 CEST 2012


Hi everybody.

I'm trying to compute correlation matrices from a list of files. Each file
contains 2 columns: "capt1" and "capt2". For this example, I merged them all
into one data.frame. My data also contain many missing values. The aim is to
compute one correlation matrix per variable, of course (one for capt1 and
another for capt2).
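Rather than hard-coding column positions, the two matrices could be built by matching the "_capt1" / "_capt2" name suffix. This is a minimal sketch assuming every column is named <station>_<variable> as in the example data; `vars` and `cormats` are illustrative names, and `set.seed()` is added only for reproducibility.

```r
# Example data from the question; set.seed() is added only for reproducibility
set.seed(1)
table <- data.frame(ST1_capt1 = rnorm(10), ST1_capt2 = c(1, 2, 3, 4, NA, NA, 7:9, NA),
                    ST2_capt1 = c(rep(NA, 5), 6:10), ST2_capt2 = c(21, NA, NA, NA, 25:30),
                    ST3_capt1 = c(1, NA, NA, 4:10), ST3_capt2 = rep(NA_real_, 10))

# Assumes column names end in "_capt1" or "_capt2"
vars <- c("capt1", "capt2")
cormats <- lapply(vars, function(v) {
  cols <- grep(paste0("_", v, "$"), names(table))  # columns for this variable
  cor(table[, cols, drop = FALSE], use = "pairwise.complete.obs")
})
names(cormats) <- vars
```

Here `cormats$capt1` is the same matrix as `cor(table[, c(1, 3, 5)], use = "pairwise.complete.obs")`.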
For the moment, I have a correlation matrix that works (for capt1 or capt2).
But its coefficients are calculated regardless of the number of missing
values per column.
What I want is exactly the same correlation matrix, but with coefficients
calculated only for columns in which at least half of the values are
non-missing (in the example, at least 5 non-NA values out of 10).
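The "at least half non-missing" rule comes down to counting non-NA values per column. A short sketch with base R's `colSums(!is.na(...))`; `n_ok`, `min_n` and `enough` are illustrative names, and "half" is assumed to round up (`ceiling`).

```r
set.seed(1)
table <- data.frame(ST1_capt1 = rnorm(10), ST1_capt2 = c(1, 2, 3, 4, NA, NA, 7:9, NA),
                    ST2_capt1 = c(rep(NA, 5), 6:10), ST2_capt2 = c(21, NA, NA, NA, 25:30),
                    ST3_capt1 = c(1, NA, NA, 4:10), ST3_capt2 = rep(NA_real_, 10))

# is.na() gives a logical matrix; colSums() counts the TRUEs in each column
n_ok <- colSums(!is.na(table))
# Threshold: at least half of the rows (5 of 10 here)
min_n <- ceiling(nrow(table) / 2)
enough <- n_ok >= min_n
n_ok    # 10, 7, 5, 7, 8, 0 for ST1_capt1 ... ST3_capt2
enough  # only ST3_capt2 is FALSE
```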

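set.seed(1)  # make rnorm() reproducible
table <- data.frame(ST1_capt1 = rnorm(10), ST1_capt2 = c(1, 2, 3, 4, NA, NA, 7:9, NA),
  ST2_capt1 = c(rep(NA, 5), 6:10), ST2_capt2 = c(21, NA, NA, NA, 25:30),
  ST3_capt1 = c(1, NA, NA, 4:10), ST3_capt2 = rep(NA_real_, 10))

# capt1 columns only (1, 3, 5); each coefficient uses the rows where both columns are non-NA
cormatrix <- cor(table[, c(1, 3, 5)], use = "pairwise.complete.obs")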
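With use = "pairwise.complete.obs", each coefficient is computed from the rows where both columns are non-missing, so different cells of `cormatrix` can rest on different numbers of observations. One way to see those counts is a cross-product of the non-missing indicator matrix; this is a sketch, and `ok` and `n_pairs` are illustrative names.

```r
set.seed(1)
table <- data.frame(ST1_capt1 = rnorm(10), ST1_capt2 = c(1, 2, 3, 4, NA, NA, 7:9, NA),
                    ST2_capt1 = c(rep(NA, 5), 6:10), ST2_capt2 = c(21, NA, NA, NA, 25:30),
                    ST3_capt1 = c(1, NA, NA, 4:10), ST3_capt2 = rep(NA_real_, 10))

x <- table[, c(1, 3, 5)]
cormatrix <- cor(x, use = "pairwise.complete.obs")

# 1 where a value is present, 0 where it is NA
ok <- 1 * !is.na(x)
# Entry [i, j] = rows where columns i and j are both non-missing,
# i.e. the number of observations behind cormatrix[i, j]
n_pairs <- crossprod(ok)
n_pairs  # rows: 10 5 8 / 5 5 5 / 8 5 8
```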

To solve this problem, I think a check like this would be needed before
calculating the correlation matrix:

for each column j: if sum(!is.na(table[, j])) >= 5, calculate its correlation
coefficients; otherwise (fewer than 5 non-NA values), put NA in row j and
column j of the correlation matrix.
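The per-column rule above could be sketched by computing the usual pairwise matrix first and then blanking out the rows and columns of variables with too few values. `cor_min_obs` is a hypothetical helper, not a base R function, and its default threshold assumes "half the rows, rounded up".

```r
set.seed(1)
table <- data.frame(ST1_capt1 = rnorm(10), ST1_capt2 = c(1, 2, 3, 4, NA, NA, 7:9, NA),
                    ST2_capt1 = c(rep(NA, 5), 6:10), ST2_capt2 = c(21, NA, NA, NA, 25:30),
                    ST3_capt1 = c(1, NA, NA, 4:10), ST3_capt2 = rep(NA_real_, 10))

# Hypothetical helper: pairwise-complete correlation matrix in which every
# coefficient involving a column with fewer than min_n non-missing values is NA
cor_min_obs <- function(x, min_n = ceiling(nrow(x) / 2)) {
  x <- as.matrix(x)
  r <- cor(x, use = "pairwise.complete.obs")
  too_few <- colSums(!is.na(x)) < min_n  # per-column check
  r[too_few, ] <- NA                     # blank the row ...
  r[, too_few] <- NA                     # ... and the column of each failing variable
  r
}

cor_capt1 <- cor_min_obs(table[, c(1, 3, 5)])            # all capt1 columns have >= 5 values
cor_capt2 <- cor_min_obs(table[, c(2, 4, 6)])            # ST3_capt2 (0 values) becomes NA
r6 <- cor_min_obs(table[, c(1, 3, 5)], min_n = 6)        # stricter: ST2_capt1 (5 values) is dropped
```

If the threshold should instead apply to the number of complete pairs behind each coefficient, the two blanking lines could be replaced by `r[crossprod(1 * !is.na(x)) < min_n] <- NA`.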

I've tried to combine all of this, but it doesn't work. Could somebody
help me do this?
Many thanks!



--
View this message in context: http://r.789695.n4.nabble.com/correlation-matrix-only-if-enough-non-NA-values-tp4631666.html
Sent from the R help mailing list archive at Nabble.com.