[R] create a pairwise co-occurrence matrix
David Winsemius
dwinsemius at comcast.net
Thu Nov 11 16:21:15 CET 2010
On Nov 11, 2010, at 4:44 AM, Stefan Evert wrote:
Pasted and realigned from original posting:
>>       term1 term2 term3 term4 term5
>> term1     0     2     0     1     3
>> term2     2     0     0     1     2
>> term3     0     0     0     0     0
>> term4     1     1     0     0     1
>> term5     3     2     0     1     1
>> Any ideas on how to do that?
> If I understood you correctly, you have this matrix of indicator
> variables for occurrences of terms in documents:
>
> A <- matrix(c(1,1,0,0,1,1,1,0,1,1,1,0,0,0,1), nrow=3, byrow=TRUE,
> dimnames=list(paste("doc",1:3), paste("term",1:5)))
> A
>
> and want to determine co-occurrence counts for pairs of terms,
> right? (The formatting of your matrices was messed up, and some of
> your co-occurrence counts don't make sense to me.)
>
> The fastest and easiest solution is
>
> t(A) %*% A
That is really elegant. (Wish I could remember my linear algebra
lessons as well from forty years ago.) I checked it against the
specified output and found that it matches except on the diagonal,
which the OP had (with one exception) planned to fill with zeroes.
So that could be handled by a simple modification:
temp <- t(A) %*% A
diag(temp) <- 0
temp
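For anyone wanting to see why the matrix product yields the counts, here is a minimal sketch using the same A as in the quoted message. Entry [i, j] of t(A) %*% A is the sum over documents of A[d, i] * A[d, j], which is 1 only when both terms occur in document d. The names cooc and by_hand are mine, and using crossprod() in place of the explicit t(A) %*% A is my substitution, not something from the posts above.

```r
# Indicator matrix from the quoted message: rows are documents, columns are terms.
A <- matrix(c(1,1,0,0,1,
              1,1,0,1,1,
              1,0,0,0,1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste("doc", 1:3), paste("term", 1:5)))

# crossprod(A) computes t(A) %*% A without forming the transpose explicitly.
cooc <- crossprod(A)

# Check one entry by hand: count the documents containing both term 1 and term 5.
by_hand <- sum(A[, "term 1"] * A[, "term 5"])
stopifnot(by_hand == cooc["term 1", "term 5"])

# The diagonal holds each term's document frequency; zero it as in the
# modification above if only pairwise counts are wanted.
diag(cooc) <- 0
cooc
```

This assumes A holds only 0/1 indicators. If it held raw term counts instead, one would presumably convert it first, for example with (A > 0) * 1, before taking the product.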
--
David Winsemius, MD
West Hartford, CT