[R] Pairwise n for large correlation tables?
Adam D. I. Kramer
adik at ilovebacon.org
Tue Aug 8 04:03:41 CEST 2006
Hello,
I'm using a very large data set (n > 100,000 for 7 columns), for which I'm
pretty happy dealing with pairwise-deleted correlations to populate my
correlation table. E.g.,
a <- cor(cbind(col1, col2, col3),use="pairwise.complete.obs")
...however, I am interested in the number of cases used to compute each
cell of the correlation table. I am unable to find such a function via
google searches, so I wrote one of my own. This turns out to be highly
inefficient (e.g., it takes much, MUCH longer than the correlations do). Any
hints, regarding other functions to use or ways to make this speedier, would
be much appreciated!
pairwise.n <- function(df=stop("Must provide data frame!")) {
  if (!is.data.frame(df)) {
    df <- as.data.frame(df)
  }
  colNum <- ncol(df)
  result <- matrix(data=NA,nrow=colNum,ncol=colNum,dimnames=list(colnames(df),colnames(df)))
  # Fill the diagonal and upper triangle with the pairwise counts
  for(i in 1:colNum) {
    for (j in i:colNum) {
      # The logical mask is recycled over all colNum columns, hence the division
      result[i,j] <- length(df[!is.na(df[i])&!is.na(df[j])])/colNum
    }
  }
  result
}
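
For comparison, here is a minimal sketch of one way the same counts might be obtained without the double loop. It assumes the data fit in memory as a single matrix; the name pairwise.n.fast is hypothetical and not an existing R function. The idea is that if each cell is coded 1 when a value is present and 0 when it is NA, then the cross-product of that indicator matrix with itself holds, in cell [i,j], the number of rows where columns i and j are both present.

```r
# Hypothetical helper (name is an assumption, not part of base R).
pairwise.n.fast <- function(x) {
  # 1 where a value is present, 0 where it is NA; dimnames are kept
  present <- (!is.na(as.matrix(x))) * 1
  # t(present) %*% present: cell [i,j] counts rows complete for both i and j
  crossprod(present)
}

# Small illustrative data set (made up for this sketch)
d <- data.frame(a = c(1, NA, 3, 4),
                b = c(1, 2, NA, 4),
                c = c(NA, NA, 3, 4))
pairwise.n.fast(d)
```

Unlike the loop above, this fills both triangles of the table, and the diagonal gives the number of non-missing values in each column.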
--
Adam D. I. Kramer
University of Oregon