# [R] generalization of tabulate()

Robin Hankin rksh1 at cam.ac.uk
Fri Oct 16 10:29:35 CEST 2009

```
Hi

I want a generalization of tabulate() which works on rows of a matrix.
Suppose I have an integer matrix 'observation':

> observation

y1 y2 y3
1 4 0
1 4 0
2 0 3
4 1 0
0 5 0
0 1 4
2 0 3

Each row corresponds to a (multivariate) observation.  Note that the
first two rows are identical: this means that data "c(1,4,0)" was
observed twice.

Now suppose I can list the sample space:

> S

 [1,] 5 0 0
 [2,] 4 1 0
 [3,] 3 2 0
 [4,] 2 3 0
 [5,] 1 4 0
 [6,] 0 5 0
 [7,] 4 0 1
 [8,] 3 1 1
 [9,] 2 2 1
[10,] 1 3 1
[11,] 0 4 1
[12,] 3 0 2
[13,] 2 1 2
[14,] 1 2 2
[15,] 0 3 2
[16,] 2 0 3
[17,] 1 1 3
[18,] 0 2 3
[19,] 1 0 4
[20,] 0 1 4
[21,] 0 0 5

(thus each row corresponds to a point in my sample space).

What I need is a new matrix, built from S and the 'observation'
matrix above, that acts as a sort of table:

> desired

      y1 y2 y3 d
 [1,]  5  0  0 0
 [2,]  4  1  0 1
 [3,]  3  2  0 0
 [4,]  2  3  0 0
 [5,]  1  4  0 2
 [6,]  0  5  0 1
 [7,]  4  0  1 0
 [8,]  3  1  1 0
 [9,]  2  2  1 0
[10,]  1  3  1 0
[11,]  0  4  1 0
[12,]  3  0  2 0
[13,]  2  1  2 0
[14,]  1  2  2 0
[15,]  0  3  2 0
[16,]  2  0  3 2
[17,]  1  1  3 0
[18,]  0  2  3 0
[19,]  1  0  4 0
[20,]  0  1  4 1
[21,]  0  0  5 0

Thus the 'd' column counts the number of times that each row occurs in
variable 'observation'.  So desired[5,4]=2 because the observation
corresponding to desired[5,1:3] (viz c(1,4,0)) occurred twice.  And
desired[1,4]=0 because the observation corresponding to desired[1,1:3]
(viz c(5,0,0)) did not occur at all (it was never observed).

In my application I have dim(S) ~= c(5,4e6).

I've tried merge(), stack(), and reshape(), but the best I can do
is the following (derisory) double loop:

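library(partitions)   # provides compositions()

# The seven observations, one per row
obs <- matrix(as.integer(c(
  1, 4, 0,
  1, 4, 0,
  2, 0, 3,
  4, 1, 0,
  0, 5, 0,
  0, 1, 4,
  2, 0, 3
)), ncol = 3, byrow = TRUE)

S <- t(compositions(5, 3))   # sample space: one point per row
d <- rep(0, nrow(S))         # d[j] counts how often row j of S was observed

# Brute force: compare every observation with every sample point
for (i in seq_len(nrow(obs))) {
  for (j in seq_len(nrow(S))) {
    if (all(obs[i, ] == S[j, ])) {
      d[j] <- d[j] + 1
    }
  }
}

S <- cbind(S, d)   # append the counts as a final column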

Anyone got anything better before I try C?

--
Robin K. S. Hankin
Uncertainty Analyst
University of Cambridge
19 Silver Street
Cambridge CB3 9EP
01223-764877

```
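
One possible vectorised alternative to the double loop above, offered as a sketch and not as the poster's or the list's answer: collapse each row to a string key with paste(), look the observation keys up among the sample-space keys with match(), and count the hits with tabulate(). The helper name row_key is hypothetical, and S is built by hand with expand.grid() (an assumption, so that the example runs without the partitions package) in the same row order as the S shown in the post.

```r
# The seven observations from the post, one per row
obs <- matrix(as.integer(c(
  1, 4, 0,
  1, 4, 0,
  2, 0, 3,
  4, 1, 0,
  0, 5, 0,
  0, 1, 4,
  2, 0, 3
)), ncol = 3, byrow = TRUE)

# Sample space: all (y1, y2, y3) >= 0 with y1 + y2 + y3 == 5.
# Assumption: built by hand instead of with partitions::compositions().
g <- expand.grid(y2 = 0:5, y3 = 0:5)
g <- g[g$y2 + g$y3 <= 5, ]
S <- cbind(y1 = 5L - g$y2 - g$y3, y2 = g$y2, y3 = g$y3)

# Hypothetical helper: turn each row of a matrix into one string key
row_key <- function(m) {
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j])
  do.call(paste, c(cols, sep = ","))
}

# For each observation, the row of S it equals (NA if it is not in S)
idx <- match(row_key(obs), row_key(S))
if (anyNA(idx)) warning("some observations are not in the sample space")

# Count how often each row index occurs, including zero counts
d <- tabulate(idx[!is.na(idx)], nbins = nrow(S))

desired <- cbind(S, d = d)
print(desired)
```

On the example data this gives d equal to 2 for rows 5 and 16, 1 for rows 2, 6 and 20, and 0 elsewhere, matching the 'desired' matrix in the post. The lookup replaces nrow(obs) * nrow(S) row comparisons with a single match() call, which may help for a sample space with millions of rows, although the key vectors themselves cost memory.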