[R] Pairwise n for large correlation tables?

Adam D. I. Kramer adik at ilovebacon.org
Fri Aug 11 08:02:02 CEST 2006


On Tue, 8 Aug 2006, ggrothendieck at gmail.com wrote:

> Try this:
>
> # mat is test matrix
> mat <- matrix(1:25, 5)
> mat[2,2] <- mat[3,4] <- NA
> crossprod(!is.na(mat))

Exactly what I was looking for! Thanks.

--Adam
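Why the one-liner works: !is.na(mat) is a logical indicator matrix (TRUE where a value is observed), and crossprod(X) computes t(X) %*% X, treating TRUE/FALSE as 1/0. Cell (i, j) is therefore the number of rows in which both column i and column j are observed, which is the n used by a pairwise-complete correlation for that pair. The diagonal holds each column's own non-missing count. Below is a short R check against a brute-force double loop on the same test matrix. The names obs, n_fast and n_slow are illustrative only.

```r
# Test matrix from the reply above, with two missing cells
mat <- matrix(1:25, 5)
mat[2, 2] <- mat[3, 4] <- NA

# Indicator matrix: TRUE where a value is observed
obs <- !is.na(mat)

# Vectorized pairwise n: t(obs) %*% obs, with TRUE/FALSE treated as 1/0
n_fast <- crossprod(obs)

# Brute-force reference: count rows where both columns are observed
n_slow <- matrix(0, ncol(mat), ncol(mat))
for (i in seq_len(ncol(mat))) {
  for (j in seq_len(ncol(mat))) {
    n_slow[i, j] <- sum(obs[, i] & obs[, j])
  }
}

print(all(n_fast == n_slow))  # TRUE
print(diag(n_fast))           # 5 4 5 4 5
print(n_fast[2, 4])           # 3: column 2 lacks row 2, column 4 lacks row 3
```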

>
>
> On 8/7/06, Adam D. I. Kramer <adik at ilovebacon.org> wrote:
>> Hello,
>>
>> I'm using a very large data set (n > 100,000 rows, 7 columns), for which I'm
>> pretty happy dealing with pairwise-deleted correlations to populate my
>> correlation table. E.g.,
>>
>> a <- cor(cbind(col1, col2, col3),use="pairwise.complete.obs")
>>
>> ...however, I am interested in the number of cases used to compute each
>> cell of the correlation table. I was unable to find such a function via
>> Google searches, so I wrote my own. It turns out to be highly
>> inefficient (it takes much, MUCH longer than the correlations do). Any
>> hints regarding other functions to use, or ways to make this faster, would
>> be much appreciated!
>>
>> pairwise.n <- function(df = stop("Must provide data frame!")) {
>>   if (!is.data.frame(df)) {
>>     df <- as.data.frame(df)
>>   }
>>   colNum <- ncol(df)
>>   # Square table of counts, labelled like the correlation matrix
>>   result <- matrix(data = NA_integer_, nrow = colNum, ncol = colNum,
>>                    dimnames = list(colnames(df), colnames(df)))
>>   for (i in 1:colNum) {
>>     for (j in i:colNum) {
>>       # Rows where both column i and column j are observed; fill both triangles
>>       result[i, j] <- result[j, i] <- sum(!is.na(df[[i]]) & !is.na(df[[j]]))
>>     }
>>   }
>>   result
>> }
>>
>> --
>> Adam D. I. Kramer
>> University of Oregon
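The same idea can replace the loop in pairwise.n entirely. This is a minimal sketch, assuming the data arrive as a data frame or a numeric matrix. The name pairwise_n is hypothetical and not part of base R. Because crossprod() does all the counting in a single matrix product, it should scale to the n > 100,000 by 7 case far better than an R-level double loop, though no timings are claimed here. The result carries the same row and column names as cor(..., use = "pairwise.complete.obs"), so the two tables line up cell for cell.

```r
# Hypothetical helper name; a vectorized stand-in for pairwise.n().
# Accepts a data frame or a matrix: is.na() returns a logical matrix
# for both, with the column names carried over.
pairwise_n <- function(x) {
  obs <- !is.na(x)
  n <- crossprod(obs)          # t(obs) %*% obs, returned as a double matrix
  storage.mode(n) <- "integer" # counts are whole numbers
  n
}

# Small example data with scattered missing values
set.seed(1)
df <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))
df$a[c(1, 5)] <- NA
df$b[c(5, 9, 12)] <- NA

# Pairwise-deleted correlations and the n behind each cell
r <- cor(df, use = "pairwise.complete.obs")
n <- pairwise_n(df)
print(round(r, 2))
print(n)
```

Here a and b are jointly observed in 16 rows (rows 1, 5, 9 and 12 are lost), while each column paired with itself gives its own non-missing count on the diagonal.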


