[R] Pairwise n for large correlation tables?
    Adam D. I. Kramer 
    adik at ilovebacon.org
       
    Tue Aug  8 04:03:41 CEST 2006
    
    
  
Hello,
I'm using a very large data set (n > 100,000 for 7 columns), for which I'm
pretty happy dealing with pairwise-deleted correlations to populate my
correlation table. E.g.,
a <- cor(cbind(col1, col2, col3),use="pairwise.complete.obs")
...however, I am interested in the number of cases used to compute each
cell of the correlation table. I am unable to find such a function via
google searches, so I wrote one of my own. This turns out to be highly
inefficient (e.g., it takes much, MUCH longer than the correlations do). Any
hints, regarding other functions to use or ways to make this speedier, would
be much appreciated!
pairwise.n <- function(df=stop("Must provide data frame!")) {
   if (!is.data.frame(df)) {
     df <- as.data.frame(df)
   }
   colNum <- ncol(df)
   result <- matrix(data=NA,nrow=colNum,ncol=colNum,dimnames=list(colnames(df),colnames(df)))
   for(i in 1:colNum) {
     for (j in i:colNum) {
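       # The one-column logical mask is recycled over every column of df,
       # so the selected length is colNum times the pairwise count.
       result[i,j] <- length(df[!is.na(df[i])&!is.na(df[j])])/colNum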
     }
   }
   result
}
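
One possible way to avoid the double loop, offered as a sketch rather than a tested solution for the full data set: turn the data into a 0/1 matrix of "is this value present?" and take its cross-product, so that cell [i,j] is the number of rows where both column i and column j are non-missing. The name pairwise.n.fast and the small data frame d are made up for illustration; as.matrix(), is.na() and crossprod() are base R.

```r
# Sketch: pairwise-complete counts for all column pairs in one step.
# pairwise.n.fast is a hypothetical name, not an existing R function.
pairwise.n.fast <- function(df) {
  # 1 where a value is present, 0 where it is NA; column names are kept
  ok <- (!is.na(as.matrix(df))) + 0
  # t(ok) %*% ok: entry [i,j] counts rows where columns i and j are both present
  crossprod(ok)
}

# Tiny made-up example data
d <- data.frame(a = c(1, NA, 3, 4),
                b = c(1, 2, NA, 4),
                c = c(NA, 2, 3, 4))
res <- pairwise.n.fast(d)
res
```

Unlike the loop above, this fills the whole symmetric matrix, and the diagonal holds the number of non-missing values in each column.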
--
Adam D. I. Kramer
University of Oregon
    
    