[R] create a pairwise co-occurrence matrix

David Winsemius dwinsemius at comcast.net
Thu Nov 11 16:21:15 CET 2010


On Nov 11, 2010, at 4:44 AM, Stefan Evert wrote:

Pasted and realigned from original posting:
>>       term1 term2 term3 term4 term5
>> term1     0     2     0     1     3
>> term2     2     0     0     1     2
>> term3     0     0     0     0     0
>> term4     1     1     0     0     1
>> term5     3     2     0     1     1
>> Any ideas on how to do that?

> If I understood you correctly, you have this matrix of indicator  
> variables for occurrences of terms in documents:
>
>  A <- matrix(c(1,1,0,0,1,1,1,0,1,1,1,0,0,0,1), nrow=3, byrow=TRUE,  
> dimnames=list(paste("doc",1:3), paste("term",1:5)))
>  A
>
> and want to determine co-occurrence counts for pairs of terms,  
> right? (The formatting of your matrices was messed up, and some of  
> your co-occurrence counts don't make sense to me.)
>
> The fastest and easiest solution is
>
>  t(A) %*% A

That is really elegant. (I wish I could remember my linear algebra
lessons from forty years ago that well.) I checked it against the
specified output and found that, with one exception, the OP had
planned for the diagonal to be filled with zeroes. That can be done
with a simple modification:

temp <- t(A) %*% A
diag(temp) <- 0
temp
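
For anyone who wants to check why the product works, here is a minimal
sketch using the matrix A from the quoted message. Entry [i, j] of
t(A) %*% A is the sum over documents of A[d, i] * A[d, j], which for 0/1
indicators is the number of documents containing both terms. The names
cooc and doc_freq are my own choices for illustration, not anything
from the thread.

```r
# Indicator matrix from the quoted message: rows are documents, columns are terms.
A <- matrix(c(1, 1, 0, 0, 1,
              1, 1, 0, 1, 1,
              1, 0, 0, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste("doc", 1:3), paste("term", 1:5)))

# Co-occurrence counts: cell [i, j] counts documents containing both term i and term j.
cooc <- t(A) %*% A

# crossprod(A) is base R's shortcut for the same product.
same_as_crossprod <- isTRUE(all.equal(cooc, crossprod(A)))

# One cell computed by hand from its definition, for comparison with the matrix entry.
by_hand <- sum(A[, "term 1"] * A[, "term 5"])

# The diagonal holds each term's own document count; keep a copy before zeroing it.
doc_freq <- diag(cooc)
diag(cooc) <- 0
cooc
```

Saving the diagonal first is optional, but those per-term document
counts are often wanted later and are lost once the diagonal is zeroed.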

-- 
David Winsemius, MD
West Hartford, CT


